Author ORCID Identifier

0009-0000-6610-4142

Document Type

Dissertation

Date of Award

12-31-2024

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Zhi Wei

Second Advisor

James M. Calvin

Third Advisor

Chase Qishi Wu

Fourth Advisor

Lijing Wang

Fifth Advisor

Gao Zhang

Abstract

Although immunotherapies have achieved remarkable success in treating various cancers, only a subset of patients achieve a durable clinical response, and many exhibit innate or acquired resistance. Precision medicine aims to tailor treatments to individual patients based on specific biological markers, so that each patient receives the therapy most likely to be effective. Predictive biomarkers and gene signatures offer potential for more personalized treatment strategies by identifying patients likely to benefit. Recent studies suggest that gene signatures, comprising sets of genes, hold predictive value for certain clinical variables. Typically derived from expert biological knowledge, these signatures demonstrate substantial predictive potential, though their accuracy remains imperfect. Improving this accuracy with advanced machine learning techniques could close that gap, improving patient selection and optimizing therapy use.

This dissertation investigates predictive analytics and advanced modeling approaches to improve precision medicine, focusing specifically on enhancing cancer immunotherapy response prediction and drug-drug interaction (DDI) prediction. In cancer immunotherapy response prediction, existing predictive signatures, often based on limited gene sets, frequently lack reproducibility and generalizability. These limitations motivate the construction and validation of pathway-based signatures. Concentrating on predicting the response of metastatic melanoma to anti-PD1 immunotherapy, this dissertation develops a novel predictive framework to identify pathway-based super signatures using penalized regression models. The pathway-based signature construction follows a computational pipeline consisting of differentially expressed gene (DEG) analysis, gene set enrichment analysis (GSEA), filtering of candidate pathways, and training and validation of an Elastic-Net penalized Logistic Regression (ENLR) model. Specifically, RNA sequencing data from independent public cohorts with both pre- and on-treatment tumor samples are used. DEG analysis is first conducted to identify genes significantly differentially expressed between pre-treatment and on-treatment samples, followed by GSEA to screen for candidate pathway signatures enriched in responders compared to non-responders. Next, single-sample GSEA (ssGSEA) is used to calculate a score for each of the top-ranked pathways from its leading-edge genes. Using these pathway scores, an ENLR model is fit to identify the pathways with the highest predictive accuracy. The proposed predictive framework accurately predicts responses to anti-PD1 therapy from on-treatment data, supporting the use of pathway-based signatures to assess therapeutic responses.

For DDI prediction, this dissertation introduces SMG-DDI, a self-supervised, multi-view graph representation learning model. Unlike publicly available hierarchical graph-based models that rely on randomly split datasets, SMG-DDI employs a scaffold-based data splitting approach, providing a more realistic evaluation of model performance. SMG-DDI integrates both molecular graph representations and interaction network embeddings, leveraging pretrained graph convolutional networks for efficient feature extraction. The model employs a Central Moment Discrepancy (CMD) regularizer to minimize distribution discrepancies between the inter-view drug molecular graph and the intra-view drug interaction network graph. SMG-DDI outperforms state-of-the-art methods across multiple public DDI datasets, demonstrating strong generalization capabilities. This model offers a computationally efficient and highly predictive tool for identifying potential DDIs, contributing to safer clinical drug use.
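
As an illustration of the CMD regularizer mentioned above, the following is a minimal NumPy sketch of the standard Central Moment Discrepancy between two sets of embeddings. It is not taken from the dissertation: the function name `cmd`, the default of five moments, and the assumption that every embedding coordinate lies in the interval [a, b] = [0, 1] are illustrative choices, and the implementation used in SMG-DDI may differ.

```python
import numpy as np


def cmd(x, y, n_moments=5, a=0.0, b=1.0):
    """Central Moment Discrepancy between two samples of shape (n, d).

    Assumes (hypothetically) that every feature lies in [a, b].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    span = abs(b - a)

    mean_x, mean_y = x.mean(axis=0), y.mean(axis=0)
    # First-order term: distance between the two sample means.
    total = np.linalg.norm(mean_x - mean_y) / span

    # Centre each sample once, then compare higher-order central moments.
    cx, cy = x - mean_x, y - mean_y
    for k in range(2, n_moments + 1):
        # k-th central moment, computed separately for each feature.
        mk_x = (cx ** k).mean(axis=0)
        mk_y = (cy ** k).mean(axis=0)
        total += np.linalg.norm(mk_x - mk_y) / span ** k
    return float(total)


# Toy embeddings standing in for the two views of the same drugs.
rng = np.random.default_rng(0)
view_a = rng.uniform(0.0, 1.0, size=(200, 8))
view_b = 0.5 * view_a  # same drugs, differently distributed embedding
penalty = cmd(view_a, view_b)
```

In a training loop, a quantity like `penalty` would typically be added to the prediction loss with a weighting coefficient, so that the embedding distributions of the two views are pulled toward each other.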

Recommended Citation

Du, Kuang, "Machine learning methods for pattern recognition analysis of genomic and molecular data" (2024). Dissertations. 1804.
https://digitalcommons.njit.edu/dissertations/1804

Included in

Bioinformatics Commons, Biostatistics Commons, Computer Sciences Commons, Immunology and Infectious Disease Commons, Pharmacy and Pharmaceutical Sciences Commons
