Dissertations

Semi-supervised learning for annotation and representation of single-cell RNA sequencing and spatial transcriptomics data

Haoran Liu, New Jersey Institute of Technology

Document Type

Dissertation

Date of Award

12-31-2025

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Zhi Wei

Second Advisor

Guiling Wang

Third Advisor

Ioannis Koutis

Fourth Advisor

Pan Xu

Fifth Advisor

Nan Gao

Abstract

Semi-supervised learning has emerged as a powerful paradigm for analyzing single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics (ST) data, where full annotation is often costly or impractical. scRNA-seq technologies measure the expression of thousands of genes across tens of thousands of cells, whereas ST additionally captures the spatial coordinates of gene expression within intact tissue sections. Annotation is a key step in both scRNA-seq and ST analysis pipelines, aiming to identify cell types, spatial domains, and latent biological structures. However, most existing annotation approaches rely on separate clustering methods that are typically fully unsupervised and fail to leverage side information available in real-world experiments, such as partial labels or known marker genes. This dissertation focuses on developing semi-supervised learning frameworks that incorporate minimal supervision to enhance the interpretability, robustness, and accuracy of scRNA-seq and ST analyses.

First, a semi-supervised active learning framework is proposed for annotating scRNA-seq data. By iteratively selecting and labeling the most informative cells, the model achieves superior annotation performance with only a few labeled samples compared with conventional unsupervised methods. This framework demonstrates efficient label utilization and strong potential for practical applications in resource-constrained biological studies. For spatial transcriptomics, a method called MGGNN (Marker Gene-Guided Graph Neural Network) is introduced. MGGNN constructs a spatial graph of tissue spots and learns representations through a self-supervised contrastive learning strategy, followed by fine-tuning with a small number of marker-gene-derived labels. It achieves state-of-the-art annotation results on both simulated and real ST datasets and maintains robustness under noisy supervision. Furthermore, a model named SCDRL (Semi-Supervised Disentangled Representation Learning) is developed for scRNA-seq data. SCDRL separates latent representations into interpretable components, such as cell type, batch, and biological signature, while isolating residual variation. With only 5% labeled data, SCDRL consistently outperforms existing methods on multiple benchmark datasets by producing disentangled and biologically meaningful latent spaces. Collectively, these methods demonstrate the versatility and effectiveness of semi-supervised learning in scRNA-seq and ST data analysis. They provide scalable and interpretable solutions that bridge domain knowledge with deep learning, offering practical utility for uncovering cellular heterogeneity under realistic, annotation-scarce conditions.
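As an illustration of the active learning idea mentioned in the abstract (iteratively selecting the most informative cells for labeling), the following is a minimal sketch only. The abstract does not state which selection criterion the dissertation uses; predictive entropy is assumed here as a common stand-in, and the function names, the toy probabilities, and the three cell types are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one cell's predicted cell-type distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_most_informative(pred_probs, labeled, k):
    """Return indices of the k unlabeled cells with the highest predictive entropy.

    Assumption: "most informative" is approximated by classifier uncertainty.
    """
    candidates = [i for i in range(len(pred_probs)) if i not in labeled]
    candidates.sort(key=lambda i: entropy(pred_probs[i]), reverse=True)
    return candidates[:k]


# Toy example: 4 cells, 3 hypothetical cell types (values are made up).
pred_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.60, 0.30, 0.10],  # already labeled, excluded from selection
    [0.50, 0.50, 0.00],  # split between two types
]
labeled = {2}
query = select_most_informative(pred_probs, labeled, k=2)
print(query)  # indices of the cells to send for annotation next
```

In a full loop, the queried cells would be labeled, added to the labeled set, and the classifier retrained before the next round of selection.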

Recommended Citation

Liu, Haoran, "Semi-supervised learning for annotation and representation of single-cell RNA sequencing and spatial transcriptomics data" (2025). Dissertations. 1865.
https://digitalcommons.njit.edu/dissertations/1865

Included in

Bioinformatics Commons, Data Science Commons
