Author ORCID Identifier
0009-0000-6610-4142
Document Type
Dissertation
Date of Award
12-31-2024
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
Zhi Wei
Second Advisor
James M. Calvin
Third Advisor
Chase Qishi Wu
Fourth Advisor
Lijing Wang
Fifth Advisor
Gao Zhang
Abstract
While immunotherapies have achieved remarkable success in treating various cancers, only a subset of patients achieves a durable clinical response, and many exhibit innate or acquired resistance. Precision medicine aims to tailor treatments to individual patients based on specific biological markers, so that each patient receives the therapy most likely to be effective. Predictive biomarkers and gene signatures enable more personalized treatment strategies by identifying the patients likely to benefit. Recent studies suggest that gene signatures, i.e., sets of genes, hold predictive value for certain clinical variables. Typically derived from biological expert knowledge, these signatures show substantial predictive potential, although their accuracy remains imperfect. Improving this accuracy with advanced machine learning techniques could close the gap, improving patient selection and optimizing therapy use.
This dissertation investigates predictive analytics and advanced modeling approaches for precision medicine, focusing on two problems: cancer immunotherapy response prediction and drug-drug interaction (DDI) prediction. In cancer immunotherapy response prediction, existing predictive signatures, often based on limited gene sets, frequently lack reproducibility and generalizability, which motivates the construction and validation of pathway-based signatures. Focusing on the response of metastatic melanoma to anti-PD1 immunotherapy, this dissertation develops a novel predictive framework that identifies pathway-based super signatures using penalized regression models. Signature construction follows a computational pipeline consisting of differentially expressed gene (DEG) analysis, gene set enrichment analysis (GSEA), filtering of candidate pathways, and training and validation of an Elastic-Net penalized Logistic Regression (ENLR) model. Specifically, RNA sequencing data from independent public cohorts with both pre-treatment and on-treatment tumor samples were used. DEG analysis was first conducted to identify genes significantly differentially expressed between pre-treatment and on-treatment samples, followed by GSEA to screen for candidate pathway signatures enriched in responders compared to non-responders. Next, single-sample GSEA (ssGSEA) was used to calculate a score for each of the top-ranked pathways from its leading-edge genes. Using these pathway scores, an ENLR model was fit to identify the pathways with the highest predictive accuracy. The proposed framework accurately predicts responses to anti-PD1 therapy from on-treatment data, supporting the use of pathway-based signatures to assess therapeutic responses. For DDI prediction, this dissertation introduces SMG-DDI, a self-supervised, multi-view graph representation learning model.
Unlike existing hierarchical graph-based models that are evaluated on randomly split datasets, SMG-DDI uses scaffold-based data splitting, in which drugs sharing a molecular scaffold are kept in the same partition, providing a more realistic evaluation of model performance. SMG-DDI integrates both molecular graph representations and interaction network embeddings, leveraging pretrained graph convolutional networks for efficient feature extraction. The model employs a Central Moment Discrepancy (CMD) regularizer to minimize the distribution discrepancy between the inter-view drug molecular graph and the intra-view drug interaction network graph. SMG-DDI outperforms state-of-the-art methods across multiple public DDI datasets, demonstrating strong generalization capabilities. This model offers a computationally efficient and highly predictive tool for identifying potential DDIs, contributing to safer clinical drug use.
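To make the pathway-scoring step of the immunotherapy pipeline described above more concrete, the following is a minimal, simplified sketch of a single-sample GSEA (ssGSEA) enrichment score for one sample and one pathway gene set (for example, its leading-edge genes). It follows the general ssGSEA recipe (rank genes by expression, weight in-set genes by rank raised to an exponent, and sum the difference between the in-set and out-of-set running distributions), but the function name `ssgsea_score`, the default exponent `alpha=0.25`, and the omission of cross-sample score normalization are assumptions for illustration, not the dissertation's implementation.

```python
from typing import Dict, Iterable


def ssgsea_score(expression: Dict[str, float], gene_set: Iterable[str], alpha: float = 0.25) -> float:
    """Simplified ssGSEA enrichment score for a single sample (hypothetical helper).

    expression: gene -> expression value for one sample.
    gene_set: genes in the pathway, e.g. its leading-edge genes.
    alpha: exponent applied to expression ranks when weighting in-set genes.
    """
    members = set(gene_set) & set(expression)
    n_total = len(expression)
    n_in = len(members)
    if n_in == 0 or n_in == n_total:
        raise ValueError("gene set must overlap the profile without covering all genes")

    # Order genes from highest to lowest expression; the top gene gets rank n_total.
    ordered = sorted(expression, key=expression.get, reverse=True)
    ranks = {gene: n_total - pos for pos, gene in enumerate(ordered)}

    in_norm = sum(ranks[g] ** alpha for g in members)  # normalizer for the in-set ECDF
    out_step = 1.0 / (n_total - n_in)                   # each out-of-set gene adds an equal step

    score, p_in, p_out = 0.0, 0.0, 0.0
    for gene in ordered:
        if gene in members:
            p_in += ranks[gene] ** alpha / in_norm
        else:
            p_out += out_step
        # ssGSEA sums the gap between the two running distributions at every position.
        score += p_in - p_out
    return score


if __name__ == "__main__":
    sample = {"A": 10.0, "B": 8.0, "C": 5.0, "D": 1.0}
    print(ssgsea_score(sample, ["A", "B"]))  # highly expressed set: positive score
    print(ssgsea_score(sample, ["C", "D"]))  # lowly expressed set: negative score
```

In a pipeline of this kind, one such score per pathway and per sample would form the feature matrix on which a penalized classifier, such as an elastic-net logistic regression, is trained; that downstream fitting step is not shown here.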
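Scaffold splitting, as contrasted with random splitting above, can be illustrated with a short sketch. It assumes each drug has already been mapped to a scaffold key (for example, a Bemis-Murcko scaffold SMILES computed with RDKit's scaffold utilities, which are not called here). It then assigns whole scaffold groups, largest first, to the training, validation, and test partitions, so that no scaffold appears in more than one partition. The function `scaffold_split` and its default fractions are hypothetical, and the sketch splits drugs only. How SMG-DDI assigns drug pairs to partitions is not specified in this abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scaffold_split(
    scaffold_of: Dict[str, str], frac_train: float = 0.8, frac_valid: float = 0.1
) -> Tuple[List[str], List[str], List[str]]:
    """Split drug IDs so each scaffold group lands in exactly one partition (hypothetical helper)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for drug, scaffold in scaffold_of.items():
        groups[scaffold].append(drug)

    # Place the largest scaffold groups first; break ties by scaffold key for determinism.
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))

    n = len(scaffold_of)
    train: List[str] = []
    valid: List[str] = []
    test: List[str] = []
    for _, drugs in ordered:
        if len(train) + len(drugs) <= frac_train * n:
            train.extend(drugs)
        elif len(valid) + len(drugs) <= frac_valid * n:
            valid.extend(drugs)
        else:
            test.extend(drugs)  # remaining, typically rarer, scaffolds form the test set
    return train, valid, test
```

Because the smaller, rarer scaffold groups tend to end up in the test set, test drugs are structurally dissimilar from the training drugs. This is why such splits are usually considered a harder and more realistic benchmark than random splits.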
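The CMD regularizer mentioned above compares two distributions through their means and higher-order central moments. Following the standard definition, for embeddings bounded in an interval $[a, b]$, the discrepancy is $\frac{1}{|b-a|}\lVert \mathbb{E}(X)-\mathbb{E}(Y)\rVert_2 + \sum_{k=2}^{K} \frac{1}{|b-a|^k}\lVert c_k(X)-c_k(Y)\rVert_2$, where $c_k$ is the per-dimension $k$-th central moment. The sketch below computes this quantity in plain Python for two sets of embeddings, such as the molecular-graph and interaction-network embeddings of the same drugs. The function name `cmd`, the default `k_max=5`, and the bounds `[0, 1]` are illustrative assumptions, not SMG-DDI's actual settings.

```python
import math
from typing import List, Sequence


def _column_means(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-dimension mean over a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cmd(
    x: Sequence[Sequence[float]],
    y: Sequence[Sequence[float]],
    k_max: int = 5,
    a: float = 0.0,
    b: float = 1.0,
) -> float:
    """Central Moment Discrepancy between two samples of d-dimensional embeddings.

    Assumes every feature lies in [a, b], e.g. after a sigmoid activation.
    """
    span = abs(b - a)
    mean_x = _column_means(x)
    mean_y = _column_means(y)
    # First-order term: Euclidean distance between the two mean vectors.
    total = math.dist(mean_x, mean_y) / span
    for k in range(2, k_max + 1):
        # k-th order central moments, computed separately for each feature dimension.
        cm_x = _column_means([[(v - m) ** k for v, m in zip(row, mean_x)] for row in x])
        cm_y = _column_means([[(v - m) ** k for v, m in zip(row, mean_y)] for row in y])
        total += math.dist(cm_x, cm_y) / span ** k
    return total


if __name__ == "__main__":
    view_a = [[0.0], [1.0]]  # same mean as view_b, but much larger spread
    view_b = [[0.5], [0.5]]
    print(cmd(view_a, view_b))  # nonzero: the second and fourth moments differ
```

In training, a term like this would typically be added to the DDI prediction loss with a weighting coefficient, pushing the two views toward matching distributions. The weighting and the layer at which it is applied are assumptions here rather than details given in the abstract.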
Recommended Citation
Du, Kuang, "Machine learning methods for pattern recognition analysis of genomic and molecular data" (2024). Dissertations. 1804.
https://digitalcommons.njit.edu/dissertations/1804
Included in
Bioinformatics Commons, Biostatistics Commons, Computer Sciences Commons, Immunology and Infectious Disease Commons, Pharmacy and Pharmaceutical Sciences Commons