The Effect of Metadata on Full-Text Information Retrieval Performance: An Experimental Study on a Small-Scale Turkish Corpus

Information institutions use text-based information retrieval systems to store, index and provide access to metadata, full-text, or hybrid content that combines metadata and full text. The aim of this research is to evaluate the effect of these content types on information retrieval performance. To this end, metadata (ÜBES), full-text (TBES) and hybrid (MBES) content information retrieval systems using the default Lucene information retrieval model were developed for a small-scale Turkish corpus. To evaluate the performance of these three systems, "precision-recall" and "normalized recall" tests were carried out. The experimental findings showed no significant difference between ÜBES and TBES in mean average precision. By contrast, the mean average precision of MBES was significantly higher than that of ÜBES and TBES. When information retrieval performance was evaluated from a user-centered perspective, the normalized recall performance of ÜBES and MBES was significantly higher than that of TBES. In addition, no significant difference was found among the three systems in the mean number of relevant documents retrieved. Processing different content types such as metadata and full text in information retrieval systems has certain advantages and disadvantages in terms of term management. Hybrid content processing (MBES) brought the advantages together and improved information retrieval performance.

Keywords:

Information retrieval, indexing, automatic indexing, metadata, performance evaluation, Türk Kütüphaneciliği

Impact of Metadata on Full-text Information Retrieval Performance: An Experimental Research on a Small Scale Turkish Corpus

Information institutions use text-based information retrieval systems to store, index and retrieve metadata, full-text, or hybrid content that combines metadata and full text. The aim of this research was to evaluate the impact of these content types on information retrieval performance. For this purpose, metadata (MIR), full-text (FIR) and hybrid (HIR) content information retrieval systems were developed using the default Lucene information retrieval model for a small-scale Turkish corpus. In order to evaluate the performance of these three systems, “precision-recall” and “normalized recall” tests were conducted. Experimental findings showed no significant difference between MIR and FIR in mean average precision (MAP) performance. On the other hand, the MAP performance of HIR was significantly higher than that of MIR and FIR. When information retrieval performance was evaluated from a user-centered perspective, the “normalized recall” performances of MIR and HIR were significantly higher than that of FIR. Additionally, there were no significant differences among the systems in the mean number of relevant documents retrieved. Processing different types of content, such as metadata and full text, has some advantages and disadvantages for information retrieval systems in terms of term management. Hybrid content processing (HIR) brought these advantages together and improved information retrieval performance.
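The abstract names its two evaluation measures without stating their formulas. The Python sketch below shows how mean average precision and normalized recall are commonly computed, assuming the textbook definitions and a ranking that covers the whole collection. The function names and the toy document identifiers are hypothetical. They are not taken from the study and are not part of the Lucene API.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of precision@k over the ranks k that hold a relevant document."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores)


def normalized_recall(ranking: Sequence[str], relevant: Set[str]) -> float:
    """R_norm = 1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n)).

    Assumes (not stated in the abstract) that `ranking` orders all N documents
    of the collection and contains every relevant document.
    """
    N = len(ranking)
    ranks = [k for k, doc in enumerate(ranking, start=1) if doc in relevant]
    n = len(ranks)
    if n == 0:
        raise ValueError("no relevant document appears in the ranking")
    if n == N:
        return 1.0  # every document is relevant, so any ordering is ideal
    ideal = n * (n + 1) // 2  # ranks 1..n
    return 1.0 - (sum(ranks) - ideal) / (n * (N - n))


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4", "d5"]  # toy ranking, hypothetical ids
    print(average_precision(ranking, {"d1", "d3"}))
    print(normalized_recall(ranking, {"d1", "d3"}))
```

In this toy ranking, relevant documents at ranks 1 and 3 give 5/6 for both measures. Normalized recall falls to 0 when the relevant documents occupy the last ranks and reaches 1 when they occupy the first ones.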

Keywords: