Author ORCID Identifier
0000-0001-8969-7825
Document Type
Dissertation
Date of Award
5-31-2025
Degree Name
Doctor of Philosophy in Computing Sciences - (Ph.D.)
Department
Computer Science
First Advisor
Frank Y. Shih
Second Advisor
Zhi Wei
Third Advisor
Hai Nhat Phan
Fourth Advisor
Yao Ma
Fifth Advisor
Hao Chen
Abstract
An increasing number of computer vision tasks are now tackled by deep learning approaches. However, growing model complexity imposes significant computational and storage costs. To address this challenge, this dissertation explores efficient deep learning techniques, proposing the morphological layer, an efficient feature extraction layer that achieves competitive image classification accuracy with significantly fewer model parameters. A second contribution toward efficient deep learning is a channel pruning approach that compresses deep neural networks by identifying and removing redundant channels using optimal transport theory. This approach achieves significant reductions in model size and computational cost while maintaining or even improving performance across various tasks.
Furthermore, self-supervised learning has drawn much attention from researchers. A well-studied self-supervised learning approach is contrastive learning, which employs a pretext task to guide the model to learn representations of a dataset without labels. Among various pretext tasks, instance discrimination is the most commonly used. However, instance discrimination relies heavily on data augmentation to distinguish different instances in a dataset that is not well annotated. Therefore, a powerful and well-designed data augmentation approach can benefit contrastive learning. To this end, this dissertation explores searching for an optimal data augmentation strategy, conditioned on the model and dataset, via adversarial training. The resulting strategy can be integrated into most contrastive learning frameworks, such as MoCo, SimCLR, and SimSiam. This dissertation also investigates improving prompt tuning with optimal transport to enhance the generalization of the CLIP pre-trained model, yielding better zero-shot classification accuracy in cross-dataset generalization.
Recommended Citation
Shen, Yucong, "Enriching vision representation by deep neural networks and self-supervised learning" (2025). Dissertations. 1839.
https://digitalcommons.njit.edu/dissertations/1839
