Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis
Document Type
Conference Proceeding
Publication Date
1-1-2024
Abstract
Efficient training and inference algorithms, such as low-rank adaptation and model pruning, have shown impressive performance for learning Transformer-based large foundation models. However, due to the technical challenges of the non-convex optimization caused by the complicated architecture of Transformers, the theoretical study of why these methods can be applied to learn Transformers remains mostly elusive. To the best of our knowledge, this paper presents the first theoretical analysis of the low-rank and sparsity properties of one-layer Transformers by characterizing the trained model after convergence of stochastic gradient descent. By focusing on a data model based on label-relevant and label-irrelevant patterns, we show that the gradient updates of the trainable parameters are low-rank, with a rank that depends on the number of label-relevant patterns. We also analyze how model pruning affects generalization while improving computational efficiency and conclude that proper magnitude-based pruning has only a slight effect on the testing performance. We conduct numerical experiments to support our findings.
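The abstract refers to two concepts: the low rank of gradient updates and magnitude-based pruning. Below is a minimal, self-contained Python sketch (not the paper's method) illustrating both: it builds a synthetic update from a few rank-one components, checks its numerical rank via the SVD, and applies magnitude-based pruning to a weight matrix. All shapes, thresholds, and the `keep_ratio` parameter are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# (i) A synthetic "gradient update" built from a few rank-1 components,
# mimicking an update driven by a small number of label-relevant patterns.
d, num_patterns = 64, 3
update = sum(np.outer(rng.standard_normal(d), rng.standard_normal(d))
             for _ in range(num_patterns))
singular_values = np.linalg.svd(update, compute_uv=False)
numerical_rank = int(np.sum(singular_values > 1e-8 * singular_values[0]))
print(f"numerical rank of the update: {numerical_rank}")  # equals num_patterns

# (ii) Magnitude-based pruning: zero out the smallest-magnitude entries,
# keeping a given fraction of the weights.
def magnitude_prune(weights: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    flat = np.abs(weights).ravel()
    k = int(np.ceil(keep_ratio * flat.size))            # number of entries to keep
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

W = rng.standard_normal((d, d))
W_pruned = magnitude_prune(W, keep_ratio=0.3)
print(f"sparsity after pruning: {np.mean(W_pruned == 0):.2f}")
```

This only demonstrates the two operations named in the abstract; the paper's contribution is the theoretical characterization of why such low-rank and sparse structure emerges after training a one-layer Transformer.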
Identifier
85203387061 (Scopus)
ISBN
9798350344813
Publication Title
Proceedings of the IEEE Sensor Array and Multichannel Signal Processing Workshop
External Full Text Location
https://doi.org/10.1109/SAM60225.2024.10636559
e-ISSN
2151-870X
Recommended Citation
Li, Hongkang; Wang, Meng; Zhang, Shuai; Liu, Sijia; and Chen, Pin-Yu, "Learning on Transformers is Provable Low-Rank and Sparse: A One-layer Analysis" (2024). Faculty Publications. 891.
https://digitalcommons.njit.edu/fac_pubs/891