ViTSen: Bridging Vision Transformers and Edge Computing with Advanced In/Near-Sensor Processing
Document Type
Article
Publication Date
1-1-2024
Abstract
This letter introduces ViTSen, which optimizes vision transformers (ViTs) for resource-constrained edge devices. It features an in-sensor image compression technique that reduces the power cost of data conversion and transmission. ViTSen also incorporates a ReRAM array that performs efficient near-sensor analog convolution. This integration, together with a novel pixel-reading scheme and peripheral circuitry, decreases the reliance on analog buffers and converters, significantly lowering power consumption. To make them compatible with ViTSen, several established ViT algorithms have undergone quantization and channel reduction. Circuit-to-application co-simulation results show that ViTSen maintains accuracy comparable to a full-precision baseline across various data precisions while achieving an efficiency of ∼3.1 TOp/s/W.
Identifier
85211620559 (Scopus)
Publication Title
IEEE Embedded Systems Letters
External Full Text Location
https://doi.org/10.1109/LES.2024.3449240
e-ISSN
1943-0671
ISSN
1943-0663
First Page
341
Last Page
344
Issue
4
Volume
16
Grant
2247156
Fund Ref
National Science Foundation
Recommended Citation
Tabrizchi, Sepehr; Reidy, Brendan C.; Najafi, Deniz; Angizi, Shaahin; Zand, Ramtin; and Roohi, Arman, "ViTSen: Bridging Vision Transformers and Edge Computing with Advanced In/Near-Sensor Processing" (2024). Faculty Publications. 782.
https://digitalcommons.njit.edu/fac_pubs/782