Model-based autoencoders for imputing discrete single-cell RNA-seq data

Document Type

Article

Publication Date

8-1-2021

Abstract

Deep neural networks have been widely applied to missing data imputation. However, most existing studies have focused on imputing continuous data, while discrete data imputation remains under-explored. Discrete data is common in the real world, especially in bioinformatics, genetics, and biochemistry. In particular, large amounts of recent genomic data are discrete count data generated by single-cell RNA sequencing (scRNA-seq) technology. Most scRNA-seq studies produce a discrete matrix with prevalent ‘false’ zero-count observations (missing values). To make downstream analyses more effective, imputation, which recovers the missing values, is often conducted as the first step in pre-processing scRNA-seq data. In this paper, we propose a novel Zero-Inflated Negative Binomial (ZINB) model-based autoencoder for imputing discrete scRNA-seq data. The novelties of our method are twofold. First, in addition to optimizing the ZINB likelihood, we propose to explicitly model the dropout events that cause missing values by using the Gumbel-Softmax distribution. Second, the zero-inflated reconstruction is further optimized with respect to the raw count matrix. Extensive experiments on simulated datasets demonstrate that the zero-inflated reconstruction significantly improves imputation accuracy. Real data experiments show that the proposed imputation can improve the separation of different cell types and the accuracy of differential expression analysis.
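
The two ingredients named in the abstract, the ZINB likelihood and Gumbel-Softmax sampling of dropout events, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the parameterization, so the code assumes the common mean/dispersion form of the negative binomial (mean `mu`, dispersion `theta`), a dropout probability `pi`, and a two-category (dropout vs. no dropout) relaxation with temperature `tau`. All function names are hypothetical.

```python
import math
import random


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu and dispersion theta (assumed parameterization)."""
    return (
        math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
        + theta * math.log(theta / (theta + mu))
        + x * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(x, mu, theta, pi):
    """Log-probability of count x under a zero-inflated NB, where pi is
    the probability that an observation is a dropout ('false' zero)."""
    nb = nb_log_pmf(x, mu, theta)
    if x == 0:
        # A zero is either a dropout or a genuine NB zero.
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    # A non-zero count can only come from the NB component.
    return math.log(1.0 - pi) + nb


def gumbel_softmax(logits, tau, rng):
    """Draw one relaxed (soft) one-hot sample over the given logits.
    Lower tau pushes the sample closer to a hard one-hot vector."""
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)      # avoid log(0)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        noisy.append((logit + gumbel) / tau)
    top = max(noisy)                      # subtract max for stability
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative values only: logits for [dropout, no dropout].
rng = random.Random(0)
pi = 0.3
sample = gumbel_softmax([math.log(pi), math.log(1.0 - pi)], tau=0.5, rng=rng)
nll_zero = -zinb_log_pmf(0, mu=2.0, theta=1.5, pi=pi)
```

In an autoencoder setting, `mu`, `theta`, and `pi` would typically be per-gene, per-cell outputs of the decoder, and the negative log-likelihood would be summed over the count matrix; the relaxed sample keeps the dropout indicator differentiable during training.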

Identifier

85092251117 (Scopus)

Publication Title

Methods

External Full Text Location

https://doi.org/10.1016/j.ymeth.2020.09.010

e-ISSN

10959130

ISSN

10462023

PubMed ID

32971193

First Page

112

Last Page

119

Volume

192

Grant

CIE160021

Fund Ref

National Science Foundation
