Audio-Only Phonetic Segment Classification Using Embeddings Learned From Audio and Ultrasound Tongue Imaging Data
Publisher
IEEE-Inst Electrical Electronics Engineers Inc
Access Rights
info:eu-repo/semantics/closedAccess
Abstract
This paper presents a phonetic segment classification method based on joint embeddings learned from Ultrasound Tongue Imaging (UTI) and audio data. To construct the embeddings, we compiled a dataset of ultrasound images synchronized with audio that covers common speech scenarios. The embeddings are obtained from artificial neural network models trained on this dataset. At test time, our model processes only audio data, which makes it practical for speech therapy because no ultrasound imaging is required. Experiments show that our method performs comparably to methods that use audio and UTI data simultaneously, and that it outperforms methods that use only audio or only UTI data in real-time classification.
Keywords
Speech therapy, ultrasound tongue imaging (UTI), phonetics, phonetic segment classification, phonetic embedding space
Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume
32








