Cervical Cancer Prediction Using SMOTE Algorithm and Machine Learning Approaches

Cervical cancer is among the cancers treated most successfully when diagnosed early. This study aims to detect and classify the disease by applying data mining methods to a digitized dataset obtained from Pap smear tests. A two-stage architecture is proposed for the diagnosis of cervical cancer. In the first stage, missing data were removed from the dataset; in the second stage, a new dataset was produced with the Synthetic Minority Oversampling Technique (SMOTE) algorithm to balance the target classes. The majority voting (MV) method was applied to reduce the dataset's four target variables to a single target variable. Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN) algorithms were then applied to both datasets, and the results from the original dataset were compared with those from the dataset produced with SMOTE. Judged by classification success and F-score, ANN was the best method, and the majority-voted target variable in the balanced dataset produced with SMOTE gave the most successful result. The experimental results showed that using MV and SMOTE together increased classification success from 93% to 99%.
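The two preprocessing ideas named in the abstract, majority voting over several target columns and SMOTE-style oversampling, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names (`majority_vote`, `smote`, `balance`), the tie rule (a 2-2 split counts as positive), the choice of label 1 as the minority class, and the toy data are all assumptions made for illustration; the interpolation step follows the general idea described by Chawla et al. (2002).

```python
import random

def majority_vote(targets, tie_positive=True):
    """Collapse several binary target columns into one label per row.
    Assumption: a tie (e.g. 2 of 4 votes) is treated as positive."""
    labels = []
    for row in targets:
        votes = sum(row)
        half = len(row) / 2
        positive = votes > half or (tie_positive and votes == half)
        labels.append(1 if positive else 0)
    return labels

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each on the line segment between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

def balance(X, y, k=5, seed=0):
    """Oversample label 1 (assumed minority) until both classes are equal."""
    minority = [x for x, label in zip(X, y) if label == 1]
    n_new = max((len(y) - len(minority)) - len(minority), 0)
    new_points = smote(minority, n_new, k=k, seed=seed)
    return X + new_points, y + [1] * len(new_points)

# Toy example: four hypothetical binary target columns per record.
targets = [[1, 1, 0, 0], [0, 0, 0, 1], [1, 1, 1, 0],
           [0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
X = [[0.0, 1.0], [1.0, 0.0], [0.2, 0.8],
     [0.9, 0.1], [1.0, 0.2], [0.8, 0.0]]
y = majority_vote(targets)
X_bal, y_bal = balance(X, y)
```

In a real pipeline the oversampling would normally be applied to the training split only, so that synthetic points do not leak into the data used to report classification success and F-score.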

___

Iğdır Üniversitesi Fen Bilimleri Enstitüsü Dergisi
  • ISSN: 2146-0574
  • Publication frequency: 4 issues per year
  • Start: 2011
  • Publisher: -