Enhancement of Breast Cancer Diagnosis Accuracy with Deep Learning

Breast cancer is one of the most common diseases among women and a major cause of death. This study proposes a new approach that uses deep learning, a machine learning technique, to improve the accuracy of breast cancer diagnosis. The method uses the Breast Cancer Wisconsin (Original) data set from the University of California, Irvine (UCI) Machine Learning Repository. The data set contains 699 records, each with 10 independent variables and 1 dependent variable. The 16 records containing defective values were corrected so that the entire data set could be used. The data were normalized to reduce training time. The data set was split into 80% for training, 10% for validation, and 10% for testing. An artificial neural network was designed as the deep learning model. The network consists of 5 layers in total: an input layer with 10 neurons, 3 hidden layers with 1000 neurons each, and an output layer with 3 neurons. The software was written in Python using the Spyder interactive development environment and the Keras neural network API. The performance of the model was evaluated with a confusion matrix and ROC (Receiver Operating Characteristic) analysis. The results on the test data obtained after training show that the model performs successfully. The proposed method is expected to contribute to improving the accuracy of breast cancer diagnosis.
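The preprocessing and network dimensions described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' code: the abstract does not state which normalization was used, how the split sizes were rounded, or whether the records were shuffled, so min-max scaling, floor rounding, and a seeded shuffle are assumptions made here, and the function names `min_max_normalize`, `split_indices`, and `dense_parameter_count` are hypothetical. The parameter count assumes fully connected layers with one bias per neuron.

```python
import random


def min_max_normalize(rows):
    """Scale each column to [0, 1] (assumed scheme); constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(column) for column in columns]
    highs = [max(column) for column in columns]
    return [
        [(value - low) / (high - low) if high > low else 0.0
         for value, low, high in zip(row, lows, highs)]
        for row in rows
    ]


def split_indices(n_records, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle record indices and cut them into train/validation/test parts.

    Fractions are rounded down (an assumption); the remainder goes to the test part.
    """
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = int(n_records * train_frac)
    n_val = int(n_records * val_frac)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_val],
        indices[n_train + n_val:],
    )


def dense_parameter_count(layer_sizes):
    """Number of weights plus biases in a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# 699 records split 80/10/10, as stated in the abstract.
train_idx, val_idx, test_idx = split_indices(699)

# Layer sizes from the abstract: 10 inputs, three hidden layers of 1000, 3 outputs.
n_params = dense_parameter_count([10, 1000, 1000, 1000, 3])
```

Under these assumptions the split yields 559 training, 69 validation, and 71 test records, and the network has 2,016,003 trainable parameters, almost all of them in the two 1000-by-1000 hidden-layer connections.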
