Author ORCID Identifier
0000-0001-8969-7825
Document Type
Dissertation
Date of Award
5-31-2025
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
Frank Y. Shih
Second Advisor
Zhi Wei
Third Advisor
Hai Nhat Phan
Fourth Advisor
Yao Ma
Fifth Advisor
Hao Chen
Abstract
An increasing number of computer vision tasks are now tackled by deep learning approaches. However, growing model complexity imposes significant computational and storage costs. To address this challenge, this dissertation explores efficient deep learning techniques, proposing the morphological layer, an efficient feature extraction layer that achieves competitive image classification accuracy with significantly fewer model parameters. A second contribution toward efficient deep learning is a channel pruning approach that compresses deep neural networks by identifying and removing redundant channels using optimal transport theory. This approach achieves significant reductions in model size and computational cost while maintaining or even improving performance across various tasks.
Furthermore, self-supervised learning has drawn much attention from researchers. A well-studied self-supervised learning approach is contrastive learning, which employs a pretext task to guide the model to learn representations of a dataset without labels. Among various pretext tasks, instance discrimination is the most commonly used. However, instance discrimination relies heavily on data augmentation to distinguish different instances in a dataset that is not well annotated. Therefore, a powerful and well-designed data augmentation approach can benefit contrastive learning. To this end, this dissertation explores searching for an optimal data augmentation strategy, conditioned on the model and dataset, via adversarial training. The resulting strategy can be integrated into most contrastive learning frameworks, such as MoCo, SimCLR, and SimSiam. This dissertation also investigates improving prompt tuning with optimal transport to enhance the generalization of the CLIP pre-trained model, yielding better zero-shot classification accuracy in cross-dataset generalization.
Recommended Citation
Shen, Yucong, "Enriching vision representation by deep neural networks and self-supervised learning" (2025). Dissertations. 1839.
https://digitalcommons.njit.edu/dissertations/1839
