Author Identification in Turkish Documents with Ridge Regression Analysis

Publisher

IEEE

Access Rights

info:eu-repo/semantics/closedAccess

Abstract

The volume of documents grows in proportion to the accelerating pace of technological development, which creates a need for effective classification methods that categorize these documents and make them easier to access. In addition to printed documents, hundreds of thousands of texts are published on digital media every day, which leads to problems such as texts being circulated with incorrect or missing author attribution amid widespread information pollution. In this study, to address the author recognition problem, features were extracted by applying Tf-Idf weighting to word n-grams (n = 1 to 3) and character n-grams (n = 2 to 6), then combined and represented in a vector space. A Ridge Regression model is trained for each author, and each trained model produces a prediction value for the documents in the test set. The author whose model yields the highest value is taken as the final prediction. The model was built on opinion columns from the national newspapers Hurriyet and Sabah: it was trained on 100 different opinion columns of 237 different writers from the last 20 years and tested on a test set of 20 different opinion columns for each author. With an accuracy of 89.6%, the model outperformed the best results reported in the literature on the same dataset.
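
The pipeline described in the abstract (combined word and character n-gram Tf-Idf features, one Ridge Regression model per author, highest value wins) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The abstract does not state the regression targets, the regularization strength, the Tf-Idf variant, or whether an intercept is used, so the 1/0 targets, `alpha=1.0`, smoothed and L2-normalized Tf-Idf, and intercept-free model below are all assumptions. The ridge weights are computed in dual form, which gives the same predictions as the usual primal solution when no intercept is fitted. All function and class names are hypothetical.

```python
import math
from collections import Counter


def ngrams(text, word_range=(1, 3), char_range=(2, 6)):
    """Count word 1-3-grams and character 2-6-grams in one feature space."""
    counts = Counter()
    lowered = text.lower()
    words = lowered.split()
    for n in range(word_range[0], word_range[1] + 1):
        for i in range(len(words) - n + 1):
            counts[("w", " ".join(words[i:i + n]))] += 1
    for n in range(char_range[0], char_range[1] + 1):
        for i in range(len(lowered) - n + 1):
            counts[("c", lowered[i:i + n])] += 1
    return counts


def fit_idf(doc_counts):
    """Smoothed inverse document frequency (an assumed Tf-Idf variant)."""
    n_docs = len(doc_counts)
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())
    return {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf(counts, idf):
    """Sparse, L2-normalized Tf-Idf vector; unseen n-grams are dropped."""
    vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(ra) + list(rb) for ra, rb in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [v - f * pv for v, pv in zip(M[r], M[col])]
    return [row[n:] for row in M]


class RidgeAuthorClassifier:
    """One ridge regressor per author (1/0 targets); highest score wins."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # assumed regularization strength

    def fit(self, texts, authors):
        counts = [ngrams(t) for t in texts]
        self.idf = fit_idf(counts)
        self.X = [tfidf(c, self.idf) for c in counts]
        self.authors = sorted(set(authors))
        n = len(texts)
        # Dual ridge: solve (X X^T + alpha I) D = Y, one column of Y per author.
        K = [[dot(self.X[i], self.X[j]) + (self.alpha if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        Y = [[1.0 if a == author else 0.0 for author in self.authors]
             for a in authors]
        self.dual = solve(K, Y)
        return self

    def scores(self, text):
        """Predicted value of each author's ridge model for one document."""
        x = tfidf(ngrams(text), self.idf)
        k = [dot(x, xi) for xi in self.X]
        return {author: sum(k[i] * self.dual[i][j] for i in range(len(k)))
                for j, author in enumerate(self.authors)}

    def predict(self, text):
        s = self.scores(text)
        return max(s, key=s.get)
```

Calling `RidgeAuthorClassifier().fit(texts, authors)` on a list of column texts and a parallel list of author labels, then `predict(new_text)`, returns the author whose model produces the largest value. The dense solve is only practical for small toy collections; at the scale described in the abstract a sparse linear algebra library would be the more realistic choice.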

Description

27th Signal Processing and Communications Applications Conference (SIU) -- APR 24-26, 2019 -- Sivas Cumhuriyet Univ, Sivas, TURKEY

Keywords

ridge regression, author recognition, tf-idf, natural language processing

Source

2019 27th Signal Processing and Communications Applications Conference (SIU)
