Author ORCID Identifier

0000-0002-7634-5780

Document Type

Dissertation

Date of Award

12-31-2023

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Zhi Wei

Second Advisor

Ioannis Koutis

Third Advisor

Wenge Guo

Fourth Advisor

Junwen Wang

Fifth Advisor

Nan Gao

Sixth Advisor

Yao Ma

Abstract

Clustering analysis has been conducted extensively in single-cell RNA sequencing (scRNA-seq) studies. scRNA-seq can profile the activity of tens of thousands of genes within a single cell, and a typical scRNA-seq experiment captures thousands or tens of thousands of cells simultaneously. Biologists cluster these cells to explore and elucidate cell types or subtypes. Numerous methods have been designed for clustering scRNA-seq data. Yet single-cell technologies have developed so fast in the past few years that existing methods have not kept up with these changes and fail to fully realize their potential. For instance, besides profiling the transcriptional expression levels of genes, recent single-cell technologies can capture auxiliary information at the single-cell level, such as protein expression (multi-omics scRNA-seq) and cells' spatial locations (spatially resolved scRNA-seq). Most existing clustering methods for scRNA-seq are unsupervised and fail to exploit this available side information to improve clustering performance.

This dissertation focuses on developing novel computational methods for clustering scRNA-seq data. The basic models are built on a deep autoencoder (AE) framework coupled with a zero-inflated negative binomial (ZINB) loss, which characterizes the zero-inflated and over-dispersed scRNA-seq count data. To integrate multi-omics scRNA-seq data, a multimodal autoencoder (MAE) is employed. It applies one encoder to the multimodal inputs and two decoders, one to reconstruct each omics layer. This model is named scMDC (Single-Cell Multi-omics Deep Clustering). In addition, cells in spatial proximity are expected to be of the same cell type. To exploit the cellular spatial information available in spatially resolved scRNA-seq (sp-scRNA-seq) data, a novel model, DSSC (Deep Spatial-constrained Single-cell Clustering), is developed. DSSC integrates the spatial information of cells into the clustering process in two steps: 1) the spatial information is encoded using a graph neural network model; 2) cell-to-cell constraints are built from the spatial expression patterns of marker genes and added to the model to guide the clustering process. DSSC is the first model that can use information from both the spatial coordinates and the marker genes to guide cell/spot clustering. For both scMDC and DSSC, a clustering loss is optimized on the bottleneck layer of the autoencoder jointly with the learning of the feature representation. Extensive experiments on both simulated and real datasets demonstrate that scMDC and DSSC boost clustering performance significantly while costing no extra time or space during training. These models hold great promise as valuable tools for harnessing the full potential of state-of-the-art single-cell data.
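
To make the ZINB loss mentioned above concrete, the following is a minimal sketch of the per-count negative log-likelihood. The abstract does not state the parameterization, so the sketch assumes the common mean/dispersion form; the function names and the symbols `mu` (mean), `theta` (dispersion), and `pi` (zero-inflation probability) are illustrative assumptions, not taken from the dissertation. In an autoencoder setting these three quantities would typically be produced by the decoder for each gene in each cell.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial
    with mean mu > 0 and dispersion theta > 0 (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of a single count x under a ZINB:
    with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    if x == 0:
        # A zero can come from either the inflation component or the NB itself.
        p = pi + (1.0 - pi) * nb
    else:
        p = (1.0 - pi) * nb
    return -math.log(p)


# Example: loss for an observed zero and an observed count of 5.
loss_zero = zinb_nll(0, mu=2.0, theta=1.5, pi=0.3)
loss_five = zinb_nll(5, mu=2.0, theta=1.5, pi=0.3)
```

A training loss would sum this quantity over all genes and cells; a practical implementation would work in log space throughout for numerical stability rather than exponentiating as this sketch does.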
