Date of Award
Doctor of Philosophy in Computing Sciences - (Ph.D.)
James M. Calvin
Senjuti Basu Roy
In the era of big data, the rapidly growing flood of data represents an immense opportunity. New computational methods are needed to fully leverage the potential of massive structured and unstructured data. However, decision-makers are often confronted with multiple, diverse, heterogeneous data sources. The heterogeneity includes different data types, different granularities, and different dimensions, posing a fundamental challenge in many applications. This dissertation focuses on designing hybrid deep neural networks for modeling various kinds of data heterogeneity.
The first part of this dissertation concerns modeling diverse data types, the first kind of data heterogeneity. Specifically, image data and heterogeneous metadata are modeled. Detecting Copy Number Variations (CNVs) in genetic studies is used as a motivating example. A CNN-DNN blended neural network is proposed to authenticate CNV calls made by current state-of-the-art CNV detection algorithms. It leverages both the scatter plot image signal and heterogeneous numerical metadata to improve CNV calling and review efficiency.
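As a rough illustration of the blended idea described above, the following minimal sketch runs an image branch (convolution, ReLU, global max pooling) and a metadata branch (one dense layer), concatenates the two feature vectors, and maps them to a probability. It is not the dissertation's architecture: the layer sizes, the random weights, and the names `conv2d_valid`, `blended_forward`, and `random_params` are all assumptions made for this example.

```python
import math
import random

def conv2d_valid(image, kernel):
    """Valid (no padding) 2-D cross-correlation of nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

def relu(x):
    return max(0.0, x)

def dense(inputs, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def blended_forward(image, meta, params):
    # Image branch: one feature per kernel via conv -> ReLU -> global max pool.
    image_features = []
    for kernel in params["kernels"]:
        fmap = conv2d_valid(image, kernel)
        image_features.append(max(relu(v) for row in fmap for v in row))
    # Metadata branch: a single dense layer with ReLU.
    meta_features = [relu(v)
                     for v in dense(meta, params["meta_w"], params["meta_b"])]
    # Fusion: concatenate both branches, then one logistic output unit.
    fused = image_features + meta_features
    logit = dense(fused, params["out_w"], params["out_b"])[0]
    return 1.0 / (1.0 + math.exp(-logit))

def random_params(n_kernels, k, n_meta, n_meta_hidden, seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-0.5, 0.5)
    return {
        "kernels": [[[u() for _ in range(k)] for _ in range(k)]
                    for _ in range(n_kernels)],
        "meta_w": [[u() for _ in range(n_meta)] for _ in range(n_meta_hidden)],
        "meta_b": [0.0] * n_meta_hidden,
        "out_w": [[u() for _ in range(n_kernels + n_meta_hidden)]],
        "out_b": [0.0],
    }
```

Calling `blended_forward(image, meta, random_params(2, 3, len(meta), 3))` with a 2-D list `image` and a numeric list `meta` returns a value strictly between 0 and 1, which could be read as the probability that a call is genuine once the weights are trained.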
The second part of this dissertation deals with data of various frequencies or scales in time series analysis, the second kind of data heterogeneity. The stock return forecasting problem in finance is used as a motivating example. A hybrid framework of Long Short-Term Memory and Deep Neural Network (LSTM-DNN) is developed to enrich the time-series forecasting task with static fundamental information. The proposed framework is not limited to stock return forecasting; it applies to any time-series-based prediction task.
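One way to picture the combination of a high-frequency series with static information is the sketch below: a standard LSTM cell summarizes the sequence into its final hidden state, a dense layer embeds the static features, and the two are concatenated before a linear output. This is an assumed, simplified layout rather than the dissertation's model; the sizes, the untrained random weights, and the names `lstm_step`, `lstm_dnn_forward`, and `init_params` are hypothetical.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(weights, vector, bias):
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def lstm_step(x, h, c, p):
    """One standard LSTM step; every gate sees the concatenation [x, h]."""
    z = x + h
    i = [sigmoid(v) for v in affine(p["Wi"], z, p["bi"])]    # input gate
    f = [sigmoid(v) for v in affine(p["Wf"], z, p["bf"])]    # forget gate
    o = [sigmoid(v) for v in affine(p["Wo"], z, p["bo"])]    # output gate
    g = [math.tanh(v) for v in affine(p["Wg"], z, p["bg"])]  # candidate cell
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new

def lstm_dnn_forward(sequence, static, p):
    hidden = len(p["bi"])
    h, c = [0.0] * hidden, [0.0] * hidden
    # Temporal branch: run the LSTM over the series, keep the last hidden state.
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
    # Static branch: dense layer with ReLU over the slow-moving features.
    static_hidden = [max(0.0, v) for v in affine(p["Ws"], static, p["bs"])]
    # Fusion: concatenate both representations, then a linear output.
    return affine(p["Wout"], h + static_hidden, p["bout"])[0]

def init_params(n_in, n_hidden, n_static, n_static_hidden, seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.3, 0.3) for _ in range(cols)]
                for _ in range(rows)]
    p = {}
    for gate in "ifog":
        p["W" + gate] = mat(n_hidden, n_in + n_hidden)
        p["b" + gate] = [0.0] * n_hidden
    p["Ws"], p["bs"] = mat(n_static_hidden, n_static), [0.0] * n_static_hidden
    p["Wout"], p["bout"] = mat(1, n_hidden + n_static_hidden), [0.0]
    return p
```

In the stock example, `sequence` could hold a window of daily return features and `static` a vector of firm fundamentals; both inputs here are placeholders, and the output is a single unscaled forecast value.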
The third part of this dissertation extends the LSTM-DNN framework to account for both temporal and spatial dependency among variables, which is common in many applications. For example, it is known that stock prices of related firms tend to fluctuate together. Such coherent price changes among related stocks are referred to as spatial dependency. In this part, a Variational Autoencoder (VAE) is first used to recover the latent graphical dependency structure among variables. Then a hybrid deep neural network of Graph Convolutional Network and Long Short-Term Memory network (GCN-LSTM) is developed to model both the graph-structured spatial dependency and the temporal dependency of variables at different scales.
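The spatial half of such a model can be sketched with a single graph-convolution layer applied at every time step. The example below covers only that step, using the commonly used symmetric normalization of the adjacency matrix with self-loops. It assumes the dependency graph is already given as `adj` (the VAE-based structure recovery is not modeled here), and it leaves out the recurrent network that would consume the per-step outputs. The function names are hypothetical.

```python
import math

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcn_layer(norm_adj, features, weights):
    """One graph convolution: ReLU(norm_adj @ features @ weights)."""
    mixed = matmul(norm_adj, features)  # average over each node's neighbourhood
    return [[max(0.0, v) for v in row] for row in matmul(mixed, weights)]

def gcn_over_time(adj, feature_sequence, weights):
    """Apply the same graph convolution to the node features of each time step."""
    norm_adj = normalize_adjacency(adj)
    return [gcn_layer(norm_adj, x_t, weights) for x_t in feature_sequence]
```

Here `feature_sequence` is a list over time of node-by-feature matrices, for instance one row per stock. The resulting sequence of node embeddings is what a temporal model such as an LSTM would then process.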
Extensive experiments on three representative real-world problems demonstrate the effectiveness of the proposed neural networks. The proposed frameworks can also be applied to other areas with similarly heterogeneous inputs.
Hou, Xiurui, "Hybrid deep neural networks for mining heterogeneous data" (2020). Dissertations. 1475.