Implementation of XGBoost Method for Healthcare Fraud Detection

Health care systems are rapidly adopting digital health records, which will exponentially increase the quantity of medical data. These systems already face unsustainable costs and a large volume of electronic medical data, so more efficient research, practices, and real-world applications are needed to realize the benefits of that data. One strategy for curbing rising costs is fraud detection. In this paper, XGBoost, an implementation of gradient-boosted decision trees, is employed alongside more traditional supervised algorithms, including Random Forest and k-Nearest Neighbors. The List of Excluded Individuals/Entities (LEIE) database, which contains information on excluded providers, is used to label providers in the Medicare Part B dataset as fraudulent, which makes the data usable with supervised methods. According to the experimental results, XGBoost outperformed the traditional machine learning algorithms.
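The labeling step described above can be illustrated with a minimal sketch. It assumes that providers are matched to exclusion records on a shared National Provider Identifier; the abstract does not state the matching key, so the `npi` field name, the `fraud` label column, the placeholder rule, and the toy records below are all hypothetical and not taken from the paper.

```python
from typing import Dict, Iterable, List, Set


def excluded_npis(leie_rows: Iterable[Dict[str, str]]) -> Set[str]:
    """Collect the provider identifiers found in the exclusion records.

    Assumption: a missing identifier appears as an empty string or as a
    string of zeros, and such rows cannot be matched, so they are skipped.
    """
    npis: Set[str] = set()
    for row in leie_rows:
        npi = row.get("npi", "").strip()
        if npi and set(npi) != {"0"}:
            npis.add(npi)
    return npis


def label_providers(
    provider_rows: Iterable[Dict[str, object]], excluded: Set[str]
) -> List[Dict[str, object]]:
    """Return copies of the provider rows with a binary 'fraud' label.

    A provider is labeled 1 if its identifier is in the excluded set,
    otherwise 0. The input rows are left unmodified.
    """
    labeled = []
    for row in provider_rows:
        out = dict(row)
        out["fraud"] = 1 if row["npi"] in excluded else 0
        labeled.append(out)
    return labeled


# Toy, made-up records for illustration only.
leie = [{"npi": "1111111111"}, {"npi": "0000000000"}, {"npi": ""}]
providers = [
    {"npi": "1111111111", "services": 120},
    {"npi": "2222222222", "services": 45},
]

labeled = label_providers(providers, excluded_npis(leie))
for row in labeled:
    print(row)
```

Once each provider row carries a binary label of this kind, the table can be passed to any supervised classifier, which is the role XGBoost, Random Forest, and k-Nearest Neighbors play in the study.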
