A distributed load balancing algorithm for deduplicated storage
While deduplication yields significant space savings in storage, it incurs the overhead of maintaining huge metadata. Updating this metadata during the data migration caused by load balancing results in significant overhead. To reduce this metadata update overhead, this paper proposes an alternative index that tracks data blocks as they migrate across nodes without explicitly storing their location information. In addition, a virtual server-based load balancing (VSLB) algorithm is proposed to reduce the migration overhead. Experimental results indicate that the proposed index reduces the metadata update overhead by 74% compared with the existing index. Furthermore, VSLB reduces the migration overhead by 33% compared with the existing ID reassignment-based load balancing approach.
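To make the idea of tracking blocks "without explicitly storing the location information" concrete, here is a minimal sketch under stated assumptions. It is not the paper's index. It assumes a Chord-style identifier ring in which each block fingerprint is hashed to a virtual server, and only a small table maps virtual servers to physical nodes. The names `VirtualServerIndex`, `ring_position`, `locate`, and `migrate` are hypothetical. Under this assumption, moving a whole virtual server updates a single table entry, whereas an index that stores each block's node must update one entry per migrated block. The policy VSLB uses to choose which virtual server to move is not described here, so the sketch leaves it out.

```python
import hashlib
from bisect import bisect_left


def ring_position(key: str, ring_bits: int = 32) -> int:
    """Map a string to a position on a 2**ring_bits identifier ring using SHA-1."""
    digest = hashlib.sha1(key.encode()).digest()
    return int.from_bytes(digest, "big") % (1 << ring_bits)


class VirtualServerIndex:
    """Hypothetical index: fingerprint -> virtual server (by hashing) -> physical node (small table).

    No per-block location is stored, so migrating a virtual server between
    physical nodes changes one table entry rather than every block's metadata.
    """

    def __init__(self, vs_ids, placement):
        # vs_ids: virtual server names; placement: virtual server name -> physical node name
        self.ring = sorted((ring_position(v), v) for v in vs_ids)
        self.positions = [p for p, _ in self.ring]
        self.placement = dict(placement)

    def virtual_server_for(self, fingerprint: str) -> str:
        # Successor rule: the first virtual server at or after the fingerprint's
        # ring position owns it; wrap around to the start of the ring if needed.
        i = bisect_left(self.positions, ring_position(fingerprint)) % len(self.ring)
        return self.ring[i][1]

    def locate(self, fingerprint: str) -> str:
        # The location is derived on demand, not stored per block.
        return self.placement[self.virtual_server_for(fingerprint)]

    def migrate(self, vs: str, new_node: str) -> int:
        """Move a whole virtual server; return the number of index entries updated."""
        self.placement[vs] = new_node
        return 1


if __name__ == "__main__":
    names = [f"vs{i}" for i in range(8)]
    index = VirtualServerIndex(names, {v: f"node{i % 3}" for i, v in enumerate(names)})
    fps = [hashlib.sha1(f"block-{i}".encode()).hexdigest() for i in range(1000)]

    # Baseline for comparison: an index that stores each block's node explicitly.
    explicit = {fp: index.locate(fp) for fp in fps}

    victim = index.virtual_server_for(fps[0])
    moved = [fp for fp in fps if index.virtual_server_for(fp) == victim]
    updates = index.migrate(victim, "node3")
    for fp in moved:
        explicit[fp] = "node3"  # the explicit index needs one update per migrated block

    assert all(index.locate(fp) == explicit[fp] for fp in fps)
    print(f"virtual-server index: {updates} update; explicit index: {len(moved)} updates")
```

Both indexes return the same locations after the move. The difference lies only in how many entries had to be rewritten. The printed counts come from this toy setup and are not the paper's experimental results.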