Author ORCID Identifier
0009-0002-2109-9436
Document Type
Dissertation
Date of Award
8-31-2023
Degree Name
Doctor of Philosophy in Computer Engineering - (Ph.D.)
Department
Electrical and Computer Engineering
First Advisor
Ali N. Akansu
Second Advisor
Nirwan Ansari
Third Advisor
Ali Abdi
Fourth Advisor
Abdallah Khreishah
Fifth Advisor
Hai Nhat Phan
Abstract
It is widely reported that deep neural networks outperform most competitors across a range of applications. State-of-the-art neural networks carry built-in inductive bias through architectural choices, regularization, optimizer type, and initialization method. Using inductive bias to enhance model approximation is intuitive. Deep neural networks are mostly dense and heavily overparameterized. They tend to be biased towards low-rank solutions that reduce complexity and improve generalization performance, a phenomenon known as implicit regularization. Implicit regularization, as observed in specific architectures on various real-world data sets, suggests overparameterizing neural networks judiciously and learning compressed representations (lower-rank approximations) with improved performance. Thus, a common strategy is to train overparameterized neural networks with some form of inductive bias and to learn more compact representations (better approximations) with increased generalization performance.
Overparameterizing neural networks to exploit benign overfitting and implicit regularization for improved performance is a computationally inefficient strategy. Sparse neural networks achieve higher efficiency in terms of speed, storage, and energy consumption with better (or marginally lower) performance than their dense counterparts. Moreover, state-of-the-art overparameterized networks are mostly created as black-box models. Their system-level performance is not explained in terms of the behavior of their building blocks, i.e., weights, nodes, layers, input (training) signal characteristics, and the reshaping of those statistics through the network. Procedurally designing a network architecture that incorporates input signal statistics is an active research topic. The goal of this dissertation is to take a fresh look at the relationships among input signal statistics, node and layer behavior, network dimension, depth, and sparsity, and to quantify their impact on Multilayer Perceptron (MLP) performance.
In this dissertation, numerical performance studies are carried out for various MLP architectures and input signal statistics. First, the impact of input signal statistics on network performance is demonstrated. The results show that signal statistics should be taken into consideration when a network architecture is designed. Second, two metrics, the node compression ratio and the layer compression ratio, are introduced to explain the inner workings of MLP optimization at the node level. Their values are related to the implicit regularization imposed by the optimizer and reveal the built-in sparsity of the MLP caused by that regularization. Valuable insights into weight and node sparsity are gained from the simulations. Lastly, a signal-dependent (data-driven), correlation-based pruning algorithm is proposed to progressively sparsify the inter-layer weight matrices of an MLP. The use of sparsity as an explicit regularizer in model optimization brings significantly higher gains in performance and efficiency than the implicit regularization performed by the optimizer itself. The results consistently favor a signal-dependent weight sparsity method over a signal-independent (data-free) one in model optimization for improved accuracy and higher computational efficiency. The numerical performance results for various MLP architectures offer insights into the relationships among network dimension, depth, sparsity, data statistics, node and layer behavior, and performance. Convincing evidence is presented that network design should consider input statistics and track their transformations through the building blocks of the network in order to adaptively regularize the empirical optimization for improved performance and higher computational efficiency. This dissertation can be extended toward Neural Architecture Search (NAS), where a self-reconfiguring, adaptive network architecture with node and weight sparsity is realized.
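The abstract does not specify the exact pruning criterion. The following minimal NumPy sketch illustrates one plausible form of signal-dependent, correlation-informed magnitude pruning of an inter-layer weight matrix; the function name, saliency score, and pruning schedule are illustrative assumptions, not the dissertation's actual algorithm.

import numpy as np

def correlation_based_prune(W, layer_inputs, sparsity):
    # W: inter-layer weight matrix, shape (n_out, n_in).
    # layer_inputs: batch of activations feeding this layer, shape (n_samples, n_in).
    # sparsity: fraction of weight entries to zero out (0 < sparsity < 1).
    # Signal-dependent part: estimate per-input activation spread from training data.
    input_std = layer_inputs.std(axis=0)                 # shape (n_in,)
    # Assumed saliency: weight magnitude scaled by the variability of the signal it carries,
    # so weights attached to low-variance (uninformative) inputs are pruned first.
    saliency = np.abs(W) * input_std[np.newaxis, :]
    # Zero out the lowest-saliency fraction of entries.
    threshold = np.quantile(saliency, sparsity)
    mask = saliency > threshold
    return W * mask, mask

A progressive schedule, as described in the abstract, would call such a routine with increasing sparsity targets (e.g., 20%, 40%, 60%) and fine-tune the masked network between steps.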
Recommended Citation
Benar, Cem, "On explainability of neural networks" (2023). Dissertations. 1845.
https://digitalcommons.njit.edu/dissertations/1845
Included in
Artificial Intelligence and Robotics Commons, Data Science Commons, Signal Processing Commons
