Minimal feature set in language identification and finding suitable classification method with it

Publisher

Elsevier Science Bv

Access Rights

info:eu-repo/semantics/openAccess

Abstract

Language identification (LI) is a stage of natural language processing. Although LI has been studied before, there is still considerable room to improve performance. The purpose of this study is to present a low-dimensional feature set, built from letters and diacritics, together with a classification algorithm (C-SVC, MLP, or LDA) that suits it and delivers high performance. In addition, a weight factor has been integrated into the language identification system to increase performance. Experiments were carried out on the ECI corpus. The weight factor increased the classification accuracies. For our feature set, C-SVC is both the most accurate and the fastest method. (C) 2011 Published by Elsevier Ltd.
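The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the alphabet, the way the weight factor is applied (here, scaling the diacritic dimensions), and the toy sentences are all assumptions made for illustration, and a simple nearest-centroid rule stands in for C-SVC, MLP, or LDA so that the example needs only the Python standard library.

```python
from collections import Counter
import math

# Hypothetical feature alphabet: ASCII letters plus a few diacritic letters.
BASE = "abcdefghijklmnopqrstuvwxyz"
DIACRITICS = "çğıöşüäßéèêàñ"
ALPHABET = BASE + DIACRITICS


def features(text, weight=3.0):
    """Relative letter frequencies, one dimension per letter in ALPHABET.

    Diacritic dimensions are multiplied by `weight` (an assumed scheme;
    the abstract does not say how its weight factor is defined).
    """
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) or 1  # avoid division by zero on empty input
    return [
        (counts[ch] / total) * (weight if ch in DIACRITICS else 1.0)
        for ch in ALPHABET
    ]


def train(samples, weight=3.0):
    """samples maps language -> list of texts; returns a mean vector per language."""
    centroids = {}
    for lang, texts in samples.items():
        vecs = [features(t, weight) for t in texts]
        centroids[lang] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return centroids


def identify(text, centroids, weight=3.0):
    """Return the language whose centroid is closest in Euclidean distance."""
    vec = features(text, weight)
    return min(centroids, key=lambda lang: math.dist(vec, centroids[lang]))


# Toy training data (invented sentences, not the ECI corpus).
SAMPLES = {
    "tr": ["Bugün hava çok güzel, dışarı çıkıp yürüyüş yapacağız."],
    "de": ["Die Straße ist heute wegen des schönen Wetters überfüllt."],
    "en": ["The weather is lovely today and we will go for a walk."],
}

centroids = train(SAMPLES)
print(identify("Kışın çocuklar dışarıda oynamayı çok sever.", centroids))
```

The feature vector here has one dimension per letter (39 in total), which keeps it low-dimensional in the spirit of the abstract; swapping the nearest-centroid rule for a real classifier would only change `train` and `identify`.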

Description

1st World Conference on Innovation and Software Development (INSODE) -- OCT 02-10, 2011 -- Bahcesehir Univ, Istanbul, TURKEY

Keywords

language identification, feature based methods, letter features, weighting factor, classification algorithms

Source

First World Conference on Innovation and Computer Sciences (Insode 2011)

Volume

1
