A PERSONALIZED ONCOLOGY MOBILE APPLICATION INTEGRATING CLINICAL AND GENOMIC FEATURES TO PREDICT THE RISK STRATIFICATION OF LUNG CANCER PATIENTS VIA MACHINE LEARNING

Predicting the risk status of lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) patients is a crucial step in precision oncology. In current clinical practice, clinicians and patients are informed about the patient's risk group only through cancer staging. Several machine learning approaches for stratifying LUAD and LUSC patients have recently been described; however, no study has yet compared the integrated modeling of clinical and genetic data from these two lung cancer types. In our work, we used a prognostic prediction model based on clinical and somatically altered gene features from 1026 patients to assess the relevance of features based on their impact on risk classification. Integrating the clinical features and somatically mutated genes of patients yielded the highest accuracy: 93% for LUAD and 89% for LUSC. Our second finding is that new prognostic genes, such as KEAP1 for LUAD and CSMD3 for LUSC, and new clinical factors, such as the site of resection, are significantly associated with risk stratification and can be integrated into clinical decision-making. We validated the most important features on an independent RNA-seq dataset from NCBI GEO with survival information (GSE81089) and integrated our model into a user-friendly mobile application. Using this machine learning model and mobile application, clinicians and patients can assess survival risk from each patient's own clinical and molecular feature set.
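To make the idea of integrating clinical and somatic-mutation features concrete, the following is a minimal sketch of how one patient's record might be encoded as a single feature vector before being passed to a classifier. It is not the study's implementation: the function names, the stage categories, and the two-gene panel are assumptions chosen for illustration (the genes are taken from the findings above), and the example patient is invented.

```python
from typing import Dict, List, Set

# Hypothetical vocabularies for illustration; the study's full feature set
# is not specified in this abstract.
STAGES: List[str] = ["I", "II", "III", "IV"]
GENE_PANEL: List[str] = ["KEAP1", "CSMD3"]


def feature_names() -> List[str]:
    """Column names of the integrated feature vector, in order."""
    return [f"stage_{s}" for s in STAGES] + [f"mut_{g}" for g in GENE_PANEL]


def integrate(clinical: Dict[str, str], mutated_genes: Set[str]) -> List[int]:
    """Concatenate one-hot clinical stage with binary mutation indicators."""
    stage_one_hot = [1 if clinical["stage"] == s else 0 for s in STAGES]
    mutation_flags = [1 if g in mutated_genes else 0 for g in GENE_PANEL]
    return stage_one_hot + mutation_flags


# Invented example patient: stage II with a somatic KEAP1 mutation.
vector = integrate({"stage": "II"}, {"KEAP1"})
print(dict(zip(feature_names(), vector)))
```

A risk classifier trained on such vectors sees clinical and genomic columns side by side, which is what allows the relevance of both kinds of feature to be compared within one model.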
