Letter based text scoring method for language identification

Date

Journal Title

Journal ISSN

Volume Title

Publisher

Springer-Verlag Berlin

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In recent years, the volume of text documents on the internet, intranets, digital libraries, and newsgroups has grown far beyond expectations. Extracting useful information and meaningful patterns from these documents is an important task. Identifying the language of these documents is an important problem that many researchers have studied. Most of these studies use words (terms) for language identification, following either linguistic or statistical approaches. This work proposes the Letter Based Text Scoring Method for language identification. The method is based on the letter distributions of texts: a score is calculated for each new text document from its letter distribution, and that score is used to identify the document's language. In addition to its acceptable accuracy, the proposed method is easier and faster than the short-terms and n-gram methods.
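
The abstract does not give the scoring formula, so the following is only a minimal sketch of what letter-based scoring could look like, not the authors' method. It assumes that each language is represented by the relative letter frequencies of a sample text, and that a new text's score for a language is the mean profile frequency of its letters. The names `letter_distribution`, `score`, `identify`, `SAMPLES`, and `PROFILES`, as well as the two tiny sample sentences, are hypothetical and chosen for illustration.

```python
from collections import Counter


def letter_distribution(text: str) -> dict[str, float]:
    """Relative frequency of each alphabetic character in `text` (lowercased)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    total = len(letters)
    return {ch: count / total for ch, count in Counter(letters).items()}


def score(text: str, profile: dict[str, float]) -> float:
    """Assumed scoring rule: mean profile frequency over the letters of `text`.

    Letters that never occur in the language profile contribute 0.
    """
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(profile.get(ch, 0.0) for ch in letters) / len(letters)


def identify(text: str, profiles: dict[str, dict[str, float]]) -> str:
    """Return the language whose profile gives `text` the highest score."""
    return max(profiles, key=lambda lang: score(text, profiles[lang]))


# Hypothetical training samples; a real profile would need far more text.
SAMPLES = {
    "english": (
        "the quick brown fox jumps over the lazy dog and then "
        "it runs with the other dogs through the wood"
    ),
    "turkish": (
        "hızlı kahverengi tilki tembel köpeğin üzerinden atlar ve sonra "
        "diğer köpeklerle ormanın içinden koşar"
    ),
}
PROFILES = {lang: letter_distribution(sample) for lang, sample in SAMPLES.items()}

if __name__ == "__main__":
    for text in ("which language is this", "bu metin hangi dilde yazıldı"):
        print(text, "->", identify(text, PROFILES))
```

Because only single-letter counts are needed, building a profile and scoring a text are each one pass over the characters, which is consistent with the abstract's claim that the approach is simpler than word-based or n-gram methods. With profiles this small the result on ordinary sentences is not reliable; letters that occur in only one of the sample texts (such as `ş` or `q` here) dominate the decision.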

Description

3rd International Conference on Advances in Information Systems, October 20-22, 2004, Izmir, Turkey

Keywords

Source

Advances in Information Systems, Proceedings

WoS Quartile

Scopus Quartile

Volume

3261

Issue

Citation
