Audio-Only Phonetic Segment Classification Using Embeddings Learned From Audio and Ultrasound Tongue Imaging Data

Publisher

IEEE-Inst Electrical Electronics Engineers Inc

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

This paper presents a phonetic segment classification method based on joint embeddings learned from Ultrasound Tongue Imaging (UTI) and audio data. To construct the embeddings, we compiled a dataset of ultrasound images synchronized with audio that covers common speech scenarios. The embeddings are obtained from artificial neural network models trained on this dataset. At test time, our model processes only audio data, which makes it practical for speech therapy because no ultrasound imaging is required. Experiments show that our method performs comparably to methods that use audio and UTI data simultaneously, and that it outperforms methods that use only audio or only UTI data in real-time classification.

Keywords

Speech therapy, ultrasound tongue imaging (UTI), phonetics, phonetic segment classification, phonetic embedding space

Source

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume

32
