Document Type
Dissertation
Date of Award
12-31-2019
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
Usman W. Roshan
Second Advisor
Ioannis Koutis
Third Advisor
Michael Christopher Schatz
Fourth Advisor
Zhi Wei
Fifth Advisor
Chase Qishi Wu
Abstract
Accurate cancer risk and survival time prediction are important problems in personalized medicine, where disease diagnosis and prognosis are tailored to individuals based on their genetic material. Cancer risk prediction informs decisions about regular screening, which helps detect disease at an early stage and therefore increases the probability of successful treatment. It is a challenging problem: lifestyle, environment, family history, and genetic predisposition are among the factors that influence disease onset. Cancer risk prediction based on predisposing genetic variants has been studied extensively, and most studies have examined the predictive ability of variants in known mutated genes for specific cancers. However, previous studies have not explored the predictive ability of collective genomic variants from whole-exome sequencing data. It is also crucial to train a model on one study and test it on a related, independent study to ensure that the predictive model generalizes to other datasets. Survival time prediction allows patients and physicians to evaluate treatment feasibility and helps chart treatment plans. Many studies have concluded that clinicians are inaccurate and often optimistic in predicting patients’ survival time; there is therefore a growing need for automated survival time prediction from genomic and medical imaging data.
For cancer risk prediction, this dissertation explores how ranking genomic variants in whole-exome sequencing data with univariate feature selection methods affects the predictive capability of machine learning classifiers. Cross-study experiments in chronic lymphocytic leukemia, glioma, and kidney cancers show that the top-ranked variants achieve better accuracy than the full set of genomic variants.
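The cross-study idea can be illustrated with a minimal sketch: rank variants with a univariate score on one study, keep the top-ranked ones, train a classifier on that study, and evaluate on a second, independent study. The abstract does not name the specific ranking method or classifier, so the chi-square statistic, the nearest-centroid classifier, the synthetic data, and all function names below are assumptions made purely for illustration.

```python
import random

def chi2_score(column, labels):
    """Chi-square statistic for a 2x2 table: variant present/absent vs. case/control."""
    n = len(labels)
    a = sum(1 for x, y in zip(column, labels) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(column, labels) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(column, labels) if x == 0 and y == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # constant column or single-class labels carry no signal
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def rank_variants(X, y):
    """Return variant indices sorted from highest to lowest univariate score."""
    scores = [chi2_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)

def fit_centroids(X, y, cols):
    """Nearest-centroid model (an assumed stand-in classifier) on selected columns."""
    centroids = {}
    for label in (0, 1):
        rows = [r for r, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return centroids

def predict(centroids, row, cols):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def dist(label):
        return sum((row[j] - c) ** 2 for j, c in zip(cols, centroids[label]))
    return min(centroids, key=dist)

def accuracy(centroids, X, y, cols):
    return sum(predict(centroids, r, cols) == lab for r, lab in zip(X, y)) / len(y)

def make_study(rng, n, n_variants, informative):
    """Synthetic study (hypothetical): informative variants are more common in cases."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = []
        for j in range(n_variants):
            p = 0.8 if (j in informative and label == 1) else 0.2
            row.append(1 if rng.random() < p else 0)
        X.append(row)
        y.append(label)
    return X, y

rng = random.Random(0)
informative = {0, 1, 2}
X_train, y_train = make_study(rng, 200, 50, informative)  # "study A"
X_test, y_test = make_study(rng, 200, 50, informative)    # independent "study B"

# Ranking and training use study A only; study B is used only for evaluation.
ranking = rank_variants(X_train, y_train)
top = ranking[:3]
all_cols = list(range(50))
acc_top = accuracy(fit_centroids(X_train, y_train, top), X_test, y_test, top)
acc_all = accuracy(fit_centroids(X_train, y_train, all_cols), X_test, y_test, all_cols)
print("top-ranked variants:", sorted(top))
print("cross-study accuracy, top-ranked:", acc_top)
print("cross-study accuracy, all variants:", acc_all)
```

Ranking on the training study alone keeps the test study fully independent, which is the point of a cross-study evaluation. The synthetic numbers say nothing about the dissertation's actual results.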
For survival time prediction, many studies have devised 3D convolutional neural networks (CNNs) to classify glioma patients into survival categories more accurately from structural magnetic resonance imaging (MRI) volumes. This dissertation proposes a new multi-path convolutional neural network with single nucleotide polymorphism (SNP) and demographic features to predict glioblastoma survival groups with a one-year threshold, improving upon existing machine learning methods. The dissertation also proposes a multi-path neural network system to predict glioblastoma survival categories with a 14-year threshold from a heterogeneous combination of genomic variations, messenger ribonucleic acid (mRNA) expression, 3D post-contrast T1 MRI volumes, and 2D post-contrast T1 MRI scans that show the malignancy. In 10-fold cross-validation, the mean accuracy of the proposed network with handpicked 2D MRI slices (those that show the tumor), mRNA expression, and SNPs slightly improves upon that of each data source individually.
Recommended Citation
Aljouie, Abdulrhman Fahad M, "Cancer risk prediction with whole exome sequencing and machine learning" (2019). Dissertations. 1428.
https://digitalcommons.njit.edu/dissertations/1428