An Integrated Architecture for Processing Business Documents in Turkish

Publisher

Springer-Verlag Berlin

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

This paper presents the first research on the automatic processing of business documents in Turkish. In contrast to traditional information extraction systems, which process input text as a linear sequence of words and focus on semantic aspects, the proposed approach takes document layout information into account and benefits from the hints provided by layout analysis. In addition, the approach not only checks the relations of entities across the document to verify its integrity, but also verifies the extracted information against real-world data (e.g. a customer database). This rule-based approach uses a morphological analyzer for Turkish, a lexicon-integrated domain ontology, a document layout analyzer, an extraction ontology and a template mining module. Conceptual sentence analysis, based on the extraction ontology, increases portability: it requires only domain concepts, whereas comparable information extraction systems rely on a large set of linguistic patterns.
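
As a rough illustration of two ideas in the abstract (using layout hints during extraction, and verifying extracted values against real-world data), the following minimal Python sketch may help. It is not the paper's implementation: the `Block` type, the `CONCEPTS` table, the region labels, the Turkish field patterns and the in-memory `CUSTOMER_DB` are all hypothetical stand-ins chosen for this example, and plain regular expressions take the place of the morphological analyzer and ontologies.

```python
import re
from dataclasses import dataclass


@dataclass
class Block:
    """A text block tagged with a layout region (hypothetical labels)."""
    region: str  # e.g. "header", "body", "footer"
    text: str


# Hypothetical concept table: each concept names the layout region where it
# is expected and a simple pattern for its value. A real system would derive
# this from an extraction ontology rather than hard-coded regexes.
CONCEPTS = {
    "customer_id": ("header", re.compile(r"Müşteri No:\s*(\d+)")),
    "amount": ("body", re.compile(r"Tutar:\s*([\d.,]+)\s*TL")),
}

# Hypothetical stand-in for a customer database.
CUSTOMER_DB = {"10457": "Acme Ltd."}


def extract(blocks):
    """Extract each concept only from blocks in its expected layout region."""
    found = {}
    for concept, (region, pattern) in CONCEPTS.items():
        for block in blocks:
            if block.region != region:
                continue  # layout hint: skip regions where the concept is not expected
            match = pattern.search(block.text)
            if match:
                found[concept] = match.group(1)
                break
    return found


def verify(fields, customer_db):
    """Check the extracted customer id against the (hypothetical) database."""
    return fields.get("customer_id") in customer_db
```

In this sketch a customer number that appears in a footer block is ignored, because the concept table expects it in the header; the extracted id is then accepted only if it exists in the customer table.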

Description

10th International Conference on Intelligent Text Processing and Computational Linguistics -- MAR 01-07, 2009 -- Mexico City, MEXICO

Keywords

Information Extraction

Source

Computational Linguistics and Intelligent Text Processing

Volume

5449
