Author ORCID Identifier

0009-0002-2109-9436

Document Type

Dissertation

Date of Award

8-31-2023

Degree Name

Doctor of Philosophy in Computer Engineering - (Ph.D.)

Department

Electrical and Computer Engineering

First Advisor

Ali N. Akansu

Second Advisor

Nirwan Ansari

Third Advisor

Ali Abdi

Fourth Advisor

Abdallah Khreishah

Fifth Advisor

Hai Nhat Phan

Abstract

It is widely reported that deep neural networks outperform most competitors across a range of applications. State-of-the-art neural networks carry built-in inductive bias through architectural choices, regularization schemes, optimizer types, and initialization methods, and exploiting such bias is an intuitive way to improve model approximation. Deep neural networks are mostly dense and heavily overparameterized, yet their optimization tends to be biased toward low-rank solutions that reduce complexity and improve generalization performance, a phenomenon known as implicit regularization. Implicit regularization, as observed in specific architectures and on various real-world data sets, suggests overparameterizing neural networks judiciously and learning compressed representations (lower-rank approximations) with improved performance. Thus, a common strategy is to train overparameterized neural networks with some form of inductive bias and to learn more compact representations (better approximations) with increased generalization performance.

Overparameterizing neural networks to exploit benign overfitting and implicit regularization for improved performance is a computationally inefficient strategy. Sparse neural networks achieve higher efficiency in terms of speed, storage, and energy consumption with better (or marginally lower) performance than their dense counterparts. Moreover, state-of-the-art overparameterized networks are mostly created as black-box models: their system-level performance is not explained in terms of the behavior of their building blocks, i.e., weights, nodes, layers, input (training) signal characteristics, and how those statistics are reshaped through the network. Procedurally designing a network architecture that incorporates input signal statistics is an active research topic. The goal of this dissertation is to take a fresh look at the relationships among input signal statistics, node and layer behavior, network dimension, depth, and sparsity, and to quantify their impact on Multilayer Perceptron (MLP) performance.

In this dissertation, numerical performance studies are conducted for various MLP architectures and input signal statistics. First, the impact of input signal statistics on network performance is demonstrated; the results show that signal statistics should be taken into consideration when a network architecture is designed. Second, two metrics, the node compression ratio and the layer compression ratio, are introduced to explain the inner workings of MLP optimization at the node level. Their values are related to the implicit regularization imposed by the optimizer and reveal the built-in sparsity it induces in an MLP. The simulations provide valuable insights into weight and node sparsity. Last, a signal-dependent (data-driven), correlation-based pruning algorithm is proposed to progressively sparsify the inter-layer weight matrices of an MLP. The use of sparsity as an explicit regularizer in model optimization brings significantly higher gains in performance and efficiency than the implicit regularization performed by the optimizer itself. The results consistently favor a signal-dependent (data-driven) weight sparsity method over a signal-independent (data-free) one for improved accuracy and higher computational efficiency. The numerical performance results for various MLP architectures offer insights into the relationships among network dimension, depth, sparsity, data statistics, node and layer behavior, and performance. Convincing evidence is presented that network design should consider input statistics and track their transformations through the building blocks of the network in order to adaptively regularize the empirical optimization for improved performance and higher computational efficiency. This dissertation can be extended toward Neural Architecture Search (NAS), where a self-reconfiguring, adaptive network architecture with node and weight sparsity is realized.
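To illustrate the flavor of a data-driven, correlation-based pruning pass on one MLP layer, a minimal NumPy sketch is given below. The saliency score (weight magnitude scaled by the correlation between each input feature and the pre-activation it feeds), the threshold rule, and the progressive sparsity schedule are illustrative assumptions, not the dissertation's exact algorithm.

```python
# Illustrative sketch only (not the dissertation's exact method): a data-driven,
# correlation-based pruning pass that progressively zeroes inter-layer weights
# of a single MLP layer, using the layer's input activations as the "signal".
import numpy as np

def correlation_prune(W, X, sparsity):
    """Zero the lowest-scoring fraction `sparsity` of the weights in W.

    W : (d_in, d_out) weight matrix of one MLP layer
    X : (n_samples, d_in) input activations to that layer
    """
    Z = X @ W                                    # pre-activations, (n, d_out)
    Xc = X - X.mean(axis=0)
    Zc = Z - Z.mean(axis=0)
    # Correlation magnitude between input feature i and output unit j.
    num = Xc.T @ Zc                              # (d_in, d_out)
    den = np.outer(np.linalg.norm(Xc, axis=0),
                   np.linalg.norm(Zc, axis=0)) + 1e-12
    corr = np.abs(num / den)
    score = np.abs(W) * corr                     # assumed data-dependent saliency
    k = int(sparsity * W.size)
    if k == 0:
        return W
    thresh = np.partition(score.ravel(), k - 1)[k - 1]
    return np.where(score <= thresh, 0.0, W)

# Progressive sparsification: raise the sparsity target over rounds,
# with retraining (not shown) assumed between pruning steps.
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 64))
W = rng.standard_normal((64, 32)) * 0.1
for s in (0.2, 0.4, 0.6):
    W = correlation_prune(W, X, s)
    print(f"target sparsity {s:.0%}: actual {np.mean(W == 0):.1%}")
```

Because the score is computed from the layer's actual input statistics, weights that carry little correlated signal are removed first, which is the sense in which such a scheme is signal-dependent rather than data-free.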
