Model-based autoencoders for imputing discrete single-cell RNA-seq data
Document Type
Article
Publication Date
8-1-2021
Abstract
Deep neural networks have been widely applied to missing data imputation. However, most existing studies have focused on imputing continuous data, while discrete data imputation remains under-explored. Discrete data are common in the real world, especially in bioinformatics, genetics, and biochemistry. In particular, much recent genomic data consists of discrete counts generated by single-cell RNA sequencing (scRNA-seq) technology. Most scRNA-seq studies produce a discrete count matrix in which ‘false’ zero counts (missing values) are prevalent. To make downstream analyses more effective, imputation, which recovers the missing values, is often conducted as the first step in pre-processing scRNA-seq data. In this paper, we propose a novel Zero-Inflated Negative Binomial (ZINB) model-based autoencoder for imputing discrete scRNA-seq data. The novelties of our method are twofold. First, in addition to optimizing the ZINB likelihood, we propose to explicitly model the dropout events that cause missing values by using the Gumbel-Softmax distribution. Second, the zero-inflated reconstruction is further optimized with respect to the raw count matrix. Extensive experiments on simulated datasets demonstrate that the zero-inflated reconstruction significantly improves imputation accuracy. Experiments on real data show that the proposed imputation can improve the separation of different cell types and the accuracy of differential expression analysis.
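The two ingredients named in the abstract, the ZINB likelihood and the Gumbel-Softmax distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the mean/dispersion parameterization (mu, theta), the dropout probability pi, the temperature tau, and all function names below are assumptions chosen for illustration, using the standard textbook definitions.

```python
import math
import random


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu and dispersion theta (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated NB.
    With probability pi the count is a dropout (forced zero);
    otherwise it is drawn from the negative binomial."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def gumbel_softmax_sample(logits, tau, rng):
    """Draw one relaxed one-hot sample from the Gumbel-Softmax
    distribution with temperature tau."""
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so the logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax over the perturbed logits.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical parameters for a single gene in a single cell.
    print(zinb_log_pmf(0, mu=2.0, theta=1.5, pi=0.3))
    # Relaxed sample over two states, e.g. (dropout, no dropout).
    print(gumbel_softmax_sample([0.0, 1.0], tau=0.5, rng=random.Random(0)))
```

In a model of this kind, the negative ZINB log-likelihood summed over all entries of the count matrix would serve as a reconstruction loss, and a low temperature tau pushes the Gumbel-Softmax sample toward a near-binary dropout indicator while keeping it differentiable.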
Identifier
85092251117 (Scopus)
Publication Title
Methods
External Full Text Location
https://doi.org/10.1016/j.ymeth.2020.09.010
e-ISSN
1095-9130
ISSN
1046-2023
PubMed ID
32971193
First Page
112
Last Page
119
Volume
192
Grant
CIE160021
Fund Ref
National Science Foundation
Recommended Citation
Tian, Tian; Min, Martin Renqiang; and Wei, Zhi, "Model-based autoencoders for imputing discrete single-cell RNA-seq data" (2021). Faculty Publications. 3912.
https://digitalcommons.njit.edu/fac_pubs/3912