A Survey of Stylometry Research on Turkish Texts and a Study Quantifying the Fidelity of Translations of My Name Is Red

This article introduces stylometry, an important application of computing in the humanities, and presents an original study that measures how faithful translations are to their source text. Stylometry comprises approaches that perform various classification tasks in information and document management and that yield observations in literary research that close reading cannot provide. For researchers who want to work on Turkish texts, the article first explains how stylometric analysis can be adapted to Turkish and offers a comprehensive survey of studies carried out on Turkish texts. It examines the purposes of stylometric analysis with examples and covers pre-processing and feature extraction, classification approaches, performance evaluation, and supporting software tools. The original study, on stylistic consistency between Orhan Pamuk's novel My Name Is Red (Benim Adım Kırmızı) and its translations, uses a new approach that examines how the novel's characters are distributed on the principal components plane. The statistically significant observations indicate that the author's style is preserved in the translations.
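The idea of placing text segments on a principal components plane can be illustrated with a minimal sketch. It is not the study's actual pipeline. The function names `word_length_profile`, `top_component`, and `pca_2d` are hypothetical, the word-length features are an assumption chosen because they can be computed the same way in any language, and power iteration with deflation stands in for a library PCA routine.

```python
import math
import re
from collections import Counter


def word_length_profile(text, max_len=10):
    """Relative frequency of token lengths 1..max_len (longer tokens fall in the last bin)."""
    # No lowercasing: lengths do not depend on case, and lowercasing the Turkish
    # capital "İ" in Python yields two code points, which would distort lengths.
    # Note that \w+ splits words at apostrophes.
    tokens = re.findall(r"\w+", text)
    counts = Counter(min(len(t), max_len) for t in tokens)
    total = len(tokens) or 1
    return [counts.get(k, 0) / total for k in range(1, max_len + 1)]


def top_component(cov, iterations=500):
    """Approximate the dominant eigenvector and eigenvalue of cov by power iteration."""
    d = len(cov)
    # Non-uniform start vector: relative frequencies sum to 1 per segment, so the
    # all-ones direction carries no variance and would collapse to zero.
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-15:  # no variance left in any direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, eigenvalue


def pca_2d(rows):
    """Project feature rows (at least two) onto their first two principal components."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    d = len(means)
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(2):
        v, lam = top_component(cov)
        components.append(v)
        # Deflation: remove the variance already explained by v.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(x * c for x, c in zip(row, comp)) for comp in components]
            for row in centered]
```

Given a list `segments` of text strings, for instance one per narrator and per edition, `pca_2d([word_length_profile(t) for t in segments])` returns one pair of coordinates per segment. Comparing where the same narrator's segments fall for the original and for each translation is one way such a plane could be read. The signs of the axes are arbitrary.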

Türk Kütüphaneciliği
  • First published: 1952
  • Publisher: Türk Kütüphaneciler Derneği