On an Approximation Algorithm Combined with D3QN for HDFS Data Block Recovery in Heterogeneous Hadoop Clusters

Document Type

Conference Proceeding

Publication Date

1-1-2024

Abstract

Hadoop stands as a cornerstone in the realm of big data processing, with its Hadoop Distributed File System (HDFS) serving as a pivotal layer ensuring fault tolerance and high-throughput data storage. Through mechanisms such as block replication and cluster-wide distribution, HDFS facilitates parallel computing in higher layers. However, the inherent heterogeneity within Hadoop clusters introduces complexities, particularly concerning the reliability of stored data. The failure of DataNodes within heterogeneous clusters poses a significant risk, potentially leading to data loss and compromising data reliability. Notably, the default block recovery strategy within HDFS overlooks the varying capacities of DataNodes and the diverse patterns of data access, rendering it inadequate for heterogeneous environments. To address this gap, we first propose a novel approach for block recovery selection based on dueling double deep Q-networks (D3QN), augmented with Gaussian Process Regression. We further formulate block recovery placement as an optimization problem in heterogeneous clusters, show its NP-completeness, and design an approximation algorithm that leverages linear programming-based iterative rounding (LPIR-BR), which offers a robust performance guarantee. Extensive experimental results validate the efficacy of LPIR-BR, showcasing its superiority over existing algorithms and affirming the soundness of our theoretical framework.

Identifier

85200982059 (Scopus)

ISBN

9783031663284

Publication Title

Lecture Notes in Networks and Systems

External Full Text Location

https://doi.org/10.1007/978-3-031-66329-1_25

e-ISSN

23673389

ISSN

23673370

First Page

381

Last Page

401

Volume

1065 LNNS
