Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual online dictionaries, and WordNet

In this study, a model is proposed to determine synonymy by incorporating several resources. The model extracts features from monolingual online dictionaries, a bilingual online dictionary, WordNet, and a monolingual Turkish corpus. After building a candidate list, it uses those features to determine which candidates are synonyms of a given word. All of these resources and approaches are evaluated. Using all features together with machine learning algorithms, the model achieves an F-measure of 81.4%. The study contributes to the literature by integrating several resources and making the first attempt at a corpus-driven synonym detection system for Turkish.
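The abstract describes a pipeline in which each (word, candidate) pair is turned into a feature vector drawn from the four resources, a classifier decides synonymy, and the result is scored by F-measure. A minimal sketch of that idea follows, assuming toy in-memory stand-ins for every resource (the definitions, English glosses, synset ids, and co-occurrence counts are invented for illustration) and plain logistic regression as the learner. The specific features, the classifier, and the helper names (`features`, `train_logreg`, `f_measure`) are hypothetical and are not the study's actual method.

```python
import math
from collections import Counter

# Hypothetical toy stand-ins for the four resource types named in the abstract.
# None of these entries come from the study; they only illustrate the idea.
MONO_DICT = {            # monolingual dictionary: headword -> definition text
    "hızlı": "çabuk olan süratli",
    "ev": "konut barınak",
}
BILINGUAL = {            # bilingual dictionary: Turkish word -> English glosses
    "hızlı": {"fast", "quick", "rapid"},
    "çabuk": {"quick", "fast"},
    "yavaş": {"slow"},
    "ev": {"house", "home"},
    "konut": {"residence", "house"},
}
SYNSETS = {              # WordNet-style lookup: word -> synset ids
    "hızlı": {"s1"}, "çabuk": {"s1"}, "yavaş": {"s2"},
    "ev": {"s3"}, "konut": {"s3"},
}
CONTEXTS = {             # corpus co-occurrence counts: word -> context-word counts
    "hızlı": Counter({"araba": 3, "koşmak": 2, "tren": 1}),
    "çabuk": Counter({"araba": 1, "koşmak": 2, "gel": 1}),
    "yavaş": Counter({"araba": 2, "yürümek": 3}),
    "ev": Counter({"kira": 2, "oda": 3}),
    "konut": Counter({"kira": 3, "oda": 1, "kredi": 1}),
}


def jaccard(a: set, b: set) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def features(word: str, cand: str) -> list[float]:
    """One hypothetical feature per resource for the (word, candidate) pair."""
    return [
        # 1. candidate appears in the word's monolingual definition
        1.0 if cand in MONO_DICT.get(word, "").split() else 0.0,
        # 2. overlap of English translations (bilingual dictionary)
        jaccard(BILINGUAL.get(word, set()), BILINGUAL.get(cand, set())),
        # 3. the pair shares a WordNet-style synset
        1.0 if SYNSETS.get(word, set()) & SYNSETS.get(cand, set()) else 0.0,
        # 4. distributional similarity of corpus contexts
        cosine(CONTEXTS.get(word, Counter()), CONTEXTS.get(cand, Counter())),
    ]


def train_logreg(X, y, lr=0.5, epochs=500):
    """Stochastic-gradient logistic regression, used here as a stand-in learner."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            z = max(min(z, 30.0), -30.0)  # clamp to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            grad = p - yi
            w = [wj - lr * grad * xj for wj, xj in zip(w, xi)]
            b -= lr * grad
    return w, b


def predict(w, b, x) -> int:
    return int(b + sum(wj * xj for wj, xj in zip(w, x)) > 0)


def f_measure(gold, pred) -> float:
    """Harmonic mean of precision and recall for the positive (synonym) class."""
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


pairs = [("hızlı", "çabuk"), ("hızlı", "yavaş"), ("ev", "konut"), ("ev", "yavaş")]
labels = [1, 0, 1, 0]
X = [features(word, cand) for word, cand in pairs]
w, b = train_logreg(X, labels)
preds = [predict(w, b, x) for x in X]
print(preds, f_measure(labels, preds))
```

On this toy data the antonym pair hızlı/yavaş still gets a corpus cosine of about 0.44, illustrating why a distributional signal alone may not separate synonyms from merely related words, and why dictionary and WordNet features can be useful alongside it.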

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK