Turkish synonym identification from multiple resources: monolingual corpus, mono/bilingual online dictionaries, and WordNet
In this study, a model is proposed to determine synonymy by incorporating several resources. The model extracts features from monolingual online dictionaries, a bilingual online dictionary, WordNet, and a monolingual Turkish corpus. Once it has built a candidate list, it determines synonymy for a given word by means of those features. All of these resources and approaches are evaluated. Taking all features into account and applying machine learning algorithms, the model achieves an F-measure of 81.4%. The study contributes to the literature by integrating several resources and attempting the first corpus-driven synonym detection system for Turkish.
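As a rough illustration of the reported metric, the sketch below computes precision, recall, and F-measure over predicted synonym pairs. It assumes the balanced F1 variant (`beta=1.0`) and an evaluation over unordered word pairs; neither detail is stated in the abstract. The function name `f_measure` and the toy word pairs are hypothetical and are not taken from the study's data.

```python
def f_measure(predicted, gold, beta=1.0):
    """Return (precision, recall, F) for two collections of word pairs."""
    # Synonymy is symmetric, so (a, b) and (b, a) count as the same pair.
    pred = {frozenset(p) for p in predicted}
    true = {frozenset(g) for g in gold}
    tp = len(pred & true)  # correctly predicted synonym pairs
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(true) if true else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


# Hypothetical toy data, for illustration only.
gold = [("güzel", "hoş"), ("siyah", "kara"), ("cevap", "yanıt")]
predicted = [("kara", "siyah"), ("cevap", "yanıt"), ("güzel", "iyi")]

p, r, f = f_measure(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Here two of the three predicted pairs match the gold list, so precision, recall, and F all come out to 2/3.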