Machine learning based prediction of gliomas with germline mutations obtained from whole exome sequences from TCGA and 1000 Genomes Project
Document Type
Conference Proceeding
Publication Date
10-1-2019
Abstract
Germline variants can be useful early predictors of cancer risk. Here we present cross-study validation and cross-validation results for predicting two brain cancers: Glioblastoma Multiforme (GBM) and Lower Grade Glioma (LGG). We obtained whole exome germline sequences of European ancestry individuals with these cancers from The Cancer Genome Atlas and of European ancestry control individuals from the 1000 Genomes Project. We applied a rigorous, quality-controlled GATK procedure to obtain variants, which we then used in cross-study and cross-validation experiments. We find our germline variants to be highly predictive of both cancers in cross-study validation as well as in cross-validation. Training on GBM+controls and predicting LGG+controls gives 89% accuracy, and the reverse gives 88% accuracy, both with a linear support vector machine classifier. We find that most of the accuracy comes from the SNP rs10792053, which lies in the gene OR9G1. In controls, this SNP is in Hardy-Weinberg equilibrium and its allele frequencies are similar to previously published values, but neither holds in our cases. Our manual inspection of alignments reveals nothing unusual in the cases. We find that our other top-ranked SNPs lie in genes known to be connected to brain cancer and to cancer in general. Our study shows a highly discriminative germline SNP for GBM and LGG, but replication studies are required to verify this finding.
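The Hardy-Weinberg check mentioned in the abstract can be illustrated with a minimal sketch. It assumes a biallelic SNP and a standard chi-square goodness-of-fit test with one degree of freedom; the abstract does not say which test the authors used, and the function name `hwe_chi_square` and the genotype counts below are hypothetical, not taken from the study.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test for Hardy-Weinberg equilibrium at a biallelic SNP.

    n_aa, n_ab, n_bb are observed genotype counts (hypothetical inputs).
    Returns (chi2, p_value) using a chi-square distribution with 1 degree
    of freedom. This is an illustrative sketch, not the authors' procedure.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype is required")

    # Allele frequency of A estimated from the genotype counts.
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p

    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)

    # Skip classes with zero expected count (monomorphic site).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only.
    for label, counts in [("balanced", (25, 50, 25)), ("no heterozygotes", (50, 0, 50))]:
        chi2, p_value = hwe_chi_square(*counts)
        print(f"{label}: chi2={chi2:.3f}, p={p_value:.3g}")
```

A small p-value in one group but not the other, as the abstract reports for cases versus controls, is the kind of signal that would motivate the manual alignment inspection and replication studies the authors describe.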
Identifier
85078258051 (Scopus)
ISBN
9781728100036
Publication Title
2019 3rd International Conference on Intelligent Computing in Data Sciences ICDS 2019
External Full Text Location
https://doi.org/10.1109/ICDS47004.2019.8942246
Fund Ref
Nanjing Institute of Technology
Recommended Citation
Aljouie, Abdulrhman; Schatz, Michael; and Roshan, Usman, "Machine learning based prediction of gliomas with germline mutations obtained from whole exome sequences from TCGA and 1000 Genomes Project" (2019). Faculty Publications. 7314.
https://digitalcommons.njit.edu/fac_pubs/7314
