Centroid-based language identification using letter feature set

Publisher

Springer-Verlag Berlin

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

In recent years, the volume of text documents on the internet, on intranets, in digital libraries and in newsgroups has grown unexpectedly fast. To obtain useful information and meaningful patterns from these documents, a great deal of research, known as text mining, has been carried out. One part of this research is text categorization, which covers the problem of classifying documents according to their similarities. One of the techniques applied in this area is the centroid-based document classification method. All research on text categorization relies on the notion of frequency in one way or another. In this study, letter frequencies (LF) are used for text categorization: centroid-based document classification is carried out using letter frequency information. An experiment was conducted on language detection for text documents. Its results suggest that letter-based text categorization should be performed prior to term-based text categorization.
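The idea summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the alphabet, the similarity measure, or how centroids are built, so the 26-letter Latin alphabet (`ALPHABET`), cosine similarity, mean-vector centroids, and all function names below are assumptions made for illustration only.

```python
from collections import Counter
from math import sqrt
import string

# Assumption: only the 26 basic Latin letters are used as features.
ALPHABET = string.ascii_lowercase


def letter_frequencies(text):
    """Return the relative frequency of each ALPHABET letter in `text`."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(ALPHABET)
    return [counts[ch] / total for ch in ALPHABET]


def centroid(vectors):
    """Return the component-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(labelled_docs):
    """Build one centroid per label from (label, text) pairs."""
    by_label = {}
    for label, text in labelled_docs:
        by_label.setdefault(label, []).append(letter_frequencies(text))
    return {label: centroid(vecs) for label, vecs in by_label.items()}


def classify(text, centroids):
    """Return the label whose centroid is most similar to `text`."""
    vec = letter_frequencies(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))
```

In use, `train` would be given a list of `(language, document)` pairs and `classify` would be called on an unseen document with the resulting centroids. Reliable language detection would need far more training text per language than a toy example, and languages with non-Latin or accented letters would need a wider feature alphabet.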

Description

5th International Conference on Intelligent Text Processing and Computational Linguistics, February 15-21, 2004, Seoul, South Korea

Source

Computational Linguistics and Intelligent Text Processing

Volume

2945
