ViTSen: Bridging Vision Transformers and Edge Computing with Advanced In/Near-Sensor Processing

Document Type

Article

Publication Date

1-1-2024

Abstract

This letter introduces ViTSen, which optimizes vision transformers (ViTs) for resource-constrained edge devices. It features an in-sensor image compression technique that reduces the power cost of data conversion and transmission. ViTSen also incorporates a ReRAM array that enables efficient near-sensor analog convolution. This integration, together with a novel pixel-reading scheme and peripheral circuitry, decreases the reliance on analog buffers and converters, significantly lowering power consumption. To make them compatible with ViTSen, several established ViT algorithms have undergone quantization and channel reduction. Circuit-to-application co-simulation results show that ViTSen maintains accuracy comparable to a full-precision baseline across various data precisions, achieving an efficiency of ∼3.1 TOp/s/W.

Identifier

85211620559 (Scopus)

Publication Title

IEEE Embedded Systems Letters

External Full Text Location

https://doi.org/10.1109/LES.2024.3449240

e-ISSN

1943-0671

ISSN

1943-0663

First Page

341

Last Page

344

Issue

4

Volume

16

Grant

2247156

Fund Ref

National Science Foundation
