Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual online dictionaries, and WordNet

This study proposes a model that determines synonymy by combining several resources. The model extracts features from monolingual online dictionaries, a bilingual online dictionary, WordNet, and a monolingual Turkish corpus. After building a candidate list for a given word, it uses those features to decide which candidates are synonyms. All of these resources and approaches are evaluated. With all features combined and machine learning algorithms applied, the model achieves an F-measure of 81.4%. The study contributes to the literature by integrating several resources and by attempting the first corpus-driven synonym detection system for Turkish.
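As a rough illustration of the pipeline described above (per-resource features for each candidate, a decision over those features, and evaluation by F-measure), the following is a minimal sketch and not the study's implementation. The feature names, the vote-counting rule standing in for a learned classifier, the thresholds, and the toy feature values are all assumptions made for this example; none of them come from the study.

```python
from dataclasses import dataclass


@dataclass
class CandidatePair:
    """A target word and one synonym candidate with hypothetical features."""
    target: str
    candidate: str
    in_mono_dictionary: bool   # assumed: candidate appears in the target's dictionary entry
    shares_translation: bool   # assumed: both words share a bilingual translation
    in_wordnet_synset: bool    # assumed: both words occur in one WordNet synset
    corpus_similarity: float   # assumed: corpus-based similarity score in [0, 1]


def predict_synonym(pair, corpus_threshold=0.5, min_votes=2):
    """Toy stand-in for a learned classifier: count how many resources agree."""
    votes = sum([
        pair.in_mono_dictionary,
        pair.shares_translation,
        pair.in_wordnet_synset,
        pair.corpus_similarity >= corpus_threshold,
    ])
    return votes >= min_votes


def f_measure(predicted, gold):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented feature values for illustration only; not data from the study.
pairs = [
    CandidatePair("siyah", "kara", True, True, True, 0.8),
    CandidatePair("siyah", "beyaz", False, False, False, 0.7),
    CandidatePair("beyaz", "ak", True, False, False, 0.3),
    CandidatePair("siyah", "koyu", False, True, False, 0.6),
]
gold = [True, False, True, False]

predictions = [predict_synonym(p) for p in pairs]
score = f_measure(predictions, gold)
print(predictions, score)
```

In a real system the vote-counting rule would be replaced by a classifier trained on labelled pairs, with the same feature vector as input.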

Keywords:

Synonym, dependency relations, corpus-based statistics


Turkish Journal of Electrical Engineering and Computer Science

ISSN: 1300-0632
Publication frequency: 6 issues per year
Publisher: TÜBİTAK
