On a Small File Merger for Fast Access and Modifiability of Small Files in HDFS
Document Type
Conference Proceeding
Publication Date
1-1-2021
Abstract
Hadoop Distributed File System (HDFS) was originally designed to store large files and is widely used in the big-data ecosystem. However, it can suffer from serious performance problems when handling a large number of small files. In this paper, we propose a novel archive system, referred to as Small File Merger (SFM), to address the small-file problem in HDFS. The key idea is to combine small files into large ones and build an index for accessing the original files. Unlike traditional archive systems such as Hadoop Archives (Har), SFM allows archived files to be modified directly without re-archiving. Because most reads in HDFS are sequential, we design an adaptive readahead strategy based on the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm to maximize read performance. Furthermore, our system provides an HDFS-compatible interface that can be used directly without recompiling or redeploying the existing HDFS cluster, which simplifies deployment in practice. Preliminary experimental results show that our system achieves better performance than existing methods.
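The merge-and-index idea described in the abstract can be illustrated with a small in-memory sketch. This is not the SFM implementation: the names `MergedArchive`, `merge_small_files`, `read_file`, and `update_file` are hypothetical, the "large file" is just a byte buffer rather than an HDFS file, and the append-and-repoint update strategy is an assumption made for illustration, since the abstract does not say how SFM modifies archived files.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class MergedArchive:
    """Toy stand-in for one large merged file plus its index (hypothetical)."""
    blob: bytearray                     # concatenated contents of all small files
    index: Dict[str, Tuple[int, int]]   # file name -> (offset, length) in blob


def merge_small_files(files: Dict[str, bytes]) -> MergedArchive:
    """Concatenate small files into one buffer and record where each one lives."""
    blob = bytearray()
    index: Dict[str, Tuple[int, int]] = {}
    for name, data in files.items():
        index[name] = (len(blob), len(data))
        blob.extend(data)
    return MergedArchive(blob, index)


def read_file(archive: MergedArchive, name: str) -> bytes:
    """Look up the index entry and slice the original file out of the buffer."""
    offset, length = archive.index[name]
    return bytes(archive.blob[offset:offset + length])


def update_file(archive: MergedArchive, name: str, data: bytes) -> None:
    """Assumed update strategy: append the new version and repoint the index.

    The old bytes stay in the buffer as garbage; nothing is re-archived.
    """
    archive.index[name] = (len(archive.blob), len(data))
    archive.blob.extend(data)


if __name__ == "__main__":
    archive = merge_small_files({"a.txt": b"hello", "b.txt": b"world!"})
    print(read_file(archive, "b.txt"))
    update_file(archive, "a.txt", b"HELLO!!")
    print(read_file(archive, "a.txt"))
```

Reads cost one index lookup plus one slice, and an update touches only the index entry and the tail of the buffer, which is the property that distinguishes this style of archive from one that must be rebuilt on every change.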
Identifier
85125626212 (Scopus)
ISBN
9781665409698
Publication Title
Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA
External Full Text Location
https://doi.org/10.1109/AICCSA53542.2021.9686873
e-ISSN
2161-5330
ISSN
2161-5322
Volume
2021-December
Recommended Citation
Chen, Dingchao; Wu, Chase Q.; Shen, Wei; and Zhang, Yu, "On a Small File Merger for Fast Access and Modifiability of Small Files in HDFS" (2021). Faculty Publications. 4561.
https://digitalcommons.njit.edu/fac_pubs/4561