We used the Lemur Toolkit, an open-source toolkit designed for information retrieval research, for automated indexing and retrieval experiments on a TREC-like test collection for the Turkish language. We investigated the effectiveness of three retrieval models supported by Lemur, in particular the language modeling approach to information retrieval, combined with language-specific preprocessing techniques. Our experiments show that language-specific preprocessing significantly improves retrieval performance for all retrieval models. The language modeling approach is also the best-performing retrieval model when language-specific preprocessing is applied.

