Provably Good Randomized Strategies for Data Placement in Distributed Key-Value Stores

Document Type

Conference Proceeding

Publication Date

2-25-2023

Abstract

Distributed storage systems are used widely in clouds, databases, and file systems. These systems store a large amount of data across multiple servers. When a request to access data comes in, it is routed to the appropriate server, queued, and eventually processed. If the server's queue is full, then requests may be rejected. Thus, one important challenge when designing the algorithm for allocating data to servers is the fact that the request pattern may be unbalanced, unpredictable, and may change over time. If some servers get a large fraction of the requests, they are overloaded, leading to many rejects. In this paper, we analyze this problem theoretically under adversarial assumptions. In particular, we assume that the request sequence is generated by an adversarial process to maximize the number of rejects and analyze the performance of various algorithmic strategies in terms of the fraction of the requests rejected. We show that no deterministic strategy can perform well. On the other hand, a simple randomized strategy guarantees that at most a constant fraction of requests are rejected in expectation. We also show that moving data to load balance is essential if we want to reject a very small fraction (1/m where m is the number of servers) of requests. We design a strategy with randomization and data transfer to achieve this performance with speed augmentation. Finally, we conduct experiments and show that our algorithms perform well in practice.

Identifier

85149311918 (Scopus)

ISBN

[9798400700156]

Publication Title

Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming PPOPP

External Full Text Location

https://doi.org/10.1145/3572848.3577501

First Page

1

Last Page

12

Grant

CCF–1725647

Fund Ref

National Science Foundation

This document is currently not available here.

Share

COinS