Modular vector processor architecture targeting at data-level parallelism
Document Type
Article
Publication Date
5-22-2015
Abstract
Taking advantage of DLP (Data-Level Parallelism) is indispensable in most data streaming and multimedia applications. Several architectures have been proposed to improve both performance and energy consumption for such applications. Superscalar and VLIW (Very Long Instruction Word) processors, along with SIMD (Single-Instruction Multiple-Data) and vector processor (VP) accelerators, are among the options available to designers for meeting their requirements. We present an innovative VP architecture that separates the path for data shuffle and memory-indexed accesses from the data path for executing other vector instructions that access memory. This separation speeds up the most common memory access operations by avoiding extra delays and unnecessary stalls. In our lane-based VP design, each vector lane uses its own private memory to avoid stalls during memory access instructions. The proposed VP, which is developed in VHDL and prototyped on an FPGA, serves as a coprocessor for one or more scalar cores. Benchmarking shows that our VP can achieve very high performance. For example, it achieves a speedup of more than 1500-fold on the color space conversion benchmark compared to running the code on a scalar core. The inclusion of distributed data shuffle engines across vector lanes has a pronounced impact on execution time, primarily for applications such as the FFT (Fast Fourier Transform) that require large amounts of data shuffling. Compared to running the benchmark on a VP without the shuffle engines, the speedup for the 64-point FFT is 5.92 without compiler optimization and 7.33 with it. Compared to runs on the scalar core, the speedups for this benchmark are 52.07 without compiler optimization and 110.45 with it.
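To illustrate why an FFT involves so much data shuffling, the following minimal sketch computes the bit-reversal reordering used by a standard radix-2 FFT and counts how many elements would have to move between lanes. It is not taken from the article: the abstract does not say which shuffle patterns the engines implement, and the lane count of 8, the `i % lanes` element-to-lane layout, and all function names here are assumptions made purely for illustration.

```python
def bit_reverse(index: int, bits: int) -> int:
    """Return `index` with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversal_permutation(n: int) -> list[int]:
    """For each output position, give the source index of the element
    placed there by the radix-2 FFT bit-reversal reordering."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


def cross_lane_moves(perm: list[int], lanes: int) -> int:
    """Count elements whose source and destination sit in different lanes.

    Assumes (hypothetically) that element i is stored in lane i % lanes;
    the article's actual data layout may differ.
    """
    return sum(1 for dst, src in enumerate(perm) if dst % lanes != src % lanes)


if __name__ == "__main__":
    perm = bit_reversal_permutation(64)       # 64-point FFT, as in the abstract
    print(perm[:8])                           # first few source indices
    print(cross_lane_moves(perm, lanes=8))    # 8 lanes is an assumed value
```

Under this assumed layout, most elements of the 64-point reordering land in a different lane from the one they started in, which is the kind of inter-lane traffic a dedicated shuffle path is meant to handle without stalling ordinary memory accesses.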
Identifier
84929581773 (Scopus)
Publication Title
Microprocessors and Microsystems
External Full Text Location
https://doi.org/10.1016/j.micpro.2015.04.007
ISSN
0141-9331
First Page
237
Last Page
249
Issue
4-5
Volume
39
Fund Ref
National Technical University of Athens
Recommended Citation
Rooholamin, Seyed A. and Ziavras, Sotirios G., "Modular vector processor architecture targeting at data-level parallelism" (2015). Faculty Publications. 6999.
https://digitalcommons.njit.edu/fac_pubs/6999
