Latent Dirichlet Allocation for Medical Dataset (Tıp Veri Kümesi için Gizli Dirichlet Ayrımı)

Reviewing the literature of the relevant field is an important stage of any scientific study. When the literature is reviewed manually, a comprehensive review is either not feasible or takes a very long time. Automatic literature search, on the other hand, does not by itself support in-depth semantic analysis. In this study, Latent Dirichlet Allocation (LDA), a topic modelling method, is applied to perform automatic, semantic analysis of medical articles published by researchers in Turkey. The experimental study was carried out by year on articles from the last 11 years of the medical literature, retrieved from the PubMed medical database. The experimental results show that the research topics that were trending over the last 11 years were discovered successfully.
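To make the method concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling on tokenized abstracts. It is an illustration under stated assumptions, not the implementation used in the study: the abstract does not name the inference procedure, and the toy corpus, the function names `lda_gibbs` and `top_words`, and the values of `alpha`, `beta`, and `num_topics` are all hypothetical.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling on a list of tokenized documents."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assign = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assign.append(z_doc)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assign[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assign[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word, topic_total


def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]


# Hypothetical toy corpus standing in for preprocessed PubMed abstracts.
docs = [
    "cancer tumor chemotherapy cancer tumor".split(),
    "tumor chemotherapy cancer patient".split(),
    "diabetes insulin glucose diabetes insulin".split(),
    "glucose insulin diabetes patient".split(),
]

doc_topic, topic_word, topic_total = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

A trend analysis like the one described above could then be approximated by fitting the model once and averaging each topic's share of `doc_topic` over the documents of each publication year; that step is likewise an assumption about the workflow rather than a description of it.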

___

Dokuz Eylül Üniversitesi Mühendislik Fakültesi Fen ve Mühendislik Dergisi
  • ISSN: 1302-9304
  • Publication frequency: 3 issues per year
  • First published: 1999
  • Publisher: Dokuz Eylül Üniversitesi Mühendislik Fakültesi