On an Approximation Algorithm Combined with D3QN for HDFS Data Block Recovery in Heterogeneous Hadoop Clusters
Document Type
Conference Proceeding
Publication Date
1-1-2024
Abstract
Hadoop stands as a cornerstone in the realm of big data processing, with its Hadoop Distributed File System (HDFS) serving as a pivotal layer ensuring fault tolerance and high-throughput data storage. Through mechanisms such as block replication and cluster-wide distribution, HDFS facilitates parallel computing in higher layers. However, the inherent heterogeneity within Hadoop clusters introduces complexities, particularly concerning the reliability of stored data. The failure of DataNodes within heterogeneous clusters poses a significant risk, potentially leading to data loss and compromising data reliability. Notably, the default block recovery strategy within HDFS overlooks the varying capacities of DataNodes and the diverse patterns of data access, rendering it inadequate for heterogeneous environments. To address this gap, we first propose a novel approach for block recovery selection based on dueling double deep Q-networks (D3QN), augmented with Gaussian Process Regression. We further formulate block recovery placement as an optimization problem in heterogeneous clusters, show its NP-completeness, and design an approximation algorithm that leverages linear programming-based iterative rounding (LPIR-BR), which offers a robust performance guarantee. Extensive experimental results validate the efficacy of LPIR-BR, showcasing its superiority over existing algorithms and affirming the soundness of our theoretical framework.
Identifier
85200982059 (Scopus)
ISBN
9783031663284
Publication Title
Lecture Notes in Networks and Systems
External Full Text Location
https://doi.org/10.1007/978-3-031-66329-1_25
e-ISSN
23673389
ISSN
23673370
First Page
381
Last Page
401
Volume
1065 LNNS
Recommended Citation
Zhang, Yijie; Wu, Chase Q.; and Hou, Aiqin, "On an Approximation Algorithm Combined with D3QN for HDFS Data Block Recovery in Heterogeneous Hadoop Clusters" (2024). Faculty Publications. 922.
https://digitalcommons.njit.edu/fac_pubs/922