Author ORCID Identifier

0000-0001-8969-7825

Document Type

Dissertation

Date of Award

5-31-2025

Degree Name

Doctor of Philosophy in Computing Sciences - (Ph.D.)

Department

Computer Science

First Advisor

Frank Y. Shih

Second Advisor

Zhi Wei

Third Advisor

Hai Nhat Phan

Fourth Advisor

Yao Ma

Fifth Advisor

Hao Chen

Abstract

An increasing number of computer vision tasks are now tackled by deep learning approaches; however, growing model complexity imposes significant computational and storage costs. To address this challenge, this dissertation explores efficient deep learning techniques. It first proposes the morphological layer, an efficient feature extraction layer that achieves competitive image classification accuracy with significantly fewer model parameters. It then proposes a channel pruning approach that compresses deep neural networks by identifying and removing redundant channels using optimal transport theory, achieving significant reductions in model size and computational cost while maintaining or even improving performance across various tasks.
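
To illustrate the idea of a morphological feature extraction layer, below is a minimal PyTorch sketch of a dilation layer, which replaces convolution's multiply-accumulate with an add-max (max-plus) operation. The class name `Dilation2d` and its exact parameterization are illustrative assumptions, not the dissertation's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Dilation2d(nn.Module):
    """Morphological dilation layer: replaces convolution's
    multiply-accumulate with add-max (max-plus algebra)."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        # One learnable structuring element per (output, input) channel pair.
        self.weight = nn.Parameter(
            torch.zeros(out_channels, in_channels, kernel_size, kernel_size))

    def forward(self, x):
        b, c, h, w = x.shape
        k = self.kernel_size
        # Sliding windows: (B, C*k*k, L), where L is the number of positions.
        patches = F.unfold(x, k, stride=self.stride)
        patches = patches.view(b, 1, c, k * k, -1)
        weight = self.weight.view(1, -1, c, k * k, 1)
        # Dilation: max of (input + structuring element) over the window
        # and the input channels.
        out = (patches + weight).amax(dim=(2, 3))
        out_h = (h - k) // self.stride + 1
        out_w = (w - k) // self.stride + 1
        return out.view(b, -1, out_h, out_w)

# Example: an 8-filter morphological layer on a CIFAR-sized batch.
layer = Dilation2d(3, 8, kernel_size=3)
features = layer(torch.randn(4, 3, 32, 32))  # -> (4, 8, 30, 30)
```

Because the layer only adds and takes maxima, it can serve as a cheap nonlinear feature extractor in place of, or alongside, early convolutional layers.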

Furthermore, self-supervised learning has drawn much attention from researchers. A well-studied self-supervised approach is contrastive learning, which employs a pretext task to supervise the model in learning representations of a dataset. Among various pretext tasks, instance discrimination is the most commonly used; however, it relies heavily on data augmentation to discriminate different instances when the dataset is not well annotated. Creating a powerful and well-founded data augmentation approach can therefore benefit contrastive learning. This dissertation explores searching for an optimal set of data augmentations, tailored to the model and dataset, via adversarial training; the resulting strategy can be integrated into most contrastive learning frameworks, such as MoCo, SimCLR, and SimSiam. Finally, this dissertation investigates improving prompt tuning with optimal transport to enhance the generalization of the CLIP pre-trained model, yielding better zero-shot classification accuracy in cross-dataset generalization.
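
The following PyTorch sketch conveys the adversarial idea in miniature: augmentation magnitudes are learned to maximize the contrastive (InfoNCE) loss while the encoder minimizes it. The `LearnableAugment` module, its two jitter operations, and the gradient-flip update are illustrative assumptions, not the dissertation's exact search strategy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.2):
    """Standard InfoNCE loss over two augmented views of the same batch."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

class LearnableAugment(nn.Module):
    """Differentiable augmentation with learnable magnitudes
    (hypothetical: only brightness and contrast jitter here)."""

    def __init__(self):
        super().__init__()
        self.mag = nn.Parameter(torch.zeros(2))  # [brightness, contrast]

    def forward(self, x):
        b_mag, c_mag = torch.tanh(self.mag) * 0.5  # keep magnitudes bounded
        # Per-image random jitter, scaled by the learnable magnitudes.
        b = b_mag * torch.rand(x.size(0), 1, 1, 1, device=x.device)
        c = c_mag * torch.rand(x.size(0), 1, 1, 1, device=x.device)
        x = x + b
        mean = x.mean(dim=(2, 3), keepdim=True)
        return (x - mean) * (1 + c) + mean

def adversarial_step(encoder, aug, x, opt_enc, opt_aug):
    """One min-max step: the encoder minimizes the contrastive loss,
    while the augmentation maximizes it (gradient sign flip)."""
    z1, z2 = encoder(aug(x)), encoder(aug(x))  # two stochastic views
    loss = info_nce(z1, z2)
    opt_enc.zero_grad(); opt_aug.zero_grad()
    loss.backward()
    for p in aug.parameters():  # ascend for the augmentation player
        if p.grad is not None:
            p.grad.neg_()
    opt_enc.step(); opt_aug.step()
    return loss.item()
```

In a full system the augmentation player would span a richer operation set (cropping, color jitter, blur) and be constrained, as with the bounded magnitudes above, so that it cannot erase all semantic content from the views.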
