On a Small File Merger for Fast Access and Modifiability of Small Files in HDFS

Document Type

Conference Proceeding

Publication Date

1-1-2021

Abstract

Hadoop Distributed File System (HDFS) was originally designed to store big files and has been widely used in the big-data ecosystem. However, it may suffer from serious performance issues when handling a large number of small files. In this paper, we propose a novel archive system, referred to as Small File Merger (SFM), to address the small-file problem in HDFS. The key idea is to combine small files into large ones and build an index for accessing the original files. Unlike traditional archive systems such as Hadoop Archives (HAR), SFM allows archived files to be modified directly, without re-archiving. Considering that most reads in HDFS are sequential, we design an adaptive readahead strategy based on the Simultaneous Perturbation Stochastic Approximation (SPSA) algorithm to maximize read performance. Furthermore, our system provides an HDFS-compatible interface that can be used directly, without recompiling or redeploying the existing HDFS cluster, which makes it convenient to deploy in practice. Preliminary experimental results show that our system achieves better performance than existing methods.
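
The merge-and-index idea described in the abstract can be illustrated with a minimal in-memory sketch. This is not the SFM implementation: it does not use HDFS, and it does not cover in-place modification or SPSA-based readahead. The names `merge_small_files` and `read_file`, and the index layout of (offset, length) pairs, are assumptions made only for illustration.

```python
import io


def merge_small_files(files):
    """Concatenate small files into one blob and index each file's position.

    `files` maps a file name to its bytes. The returned index maps each
    name to an (offset, length) pair inside the merged blob.
    (Hypothetical layout, not the format used by SFM.)
    """
    blob = io.BytesIO()
    index = {}
    for name, data in files.items():
        index[name] = (blob.tell(), len(data))  # position before writing
        blob.write(data)
    return blob.getvalue(), index


def read_file(blob, index, name):
    """Recover one original file from the merged blob via the index."""
    offset, length = index[name]
    return blob[offset:offset + length]


# Illustrative data only.
small_files = {"a.txt": b"alpha", "b.txt": b"bravo!", "c.txt": b""}
merged, index = merge_small_files(small_files)
```

With this layout, reading one original file costs a single index lookup plus one contiguous read from the large file, which is the property a merged archive relies on to avoid per-file metadata overhead.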

Identifier

85125626212 (Scopus)

ISBN

9781665409698

Publication Title

Proceedings of IEEE/ACS International Conference on Computer Systems and Applications, AICCSA

External Full Text Location

https://doi.org/10.1109/AICCSA53542.2021.9686873

e-ISSN

2161-5330

ISSN

2161-5322

Volume

2021-December
