Pradeep RENGASWAMY, Chitra BABU, Prabavathy BALASUNDARAM

A distributed load balancing algorithm for deduplicated storage

While deduplication brings the advantage of significant space savings in storage, it nevertheless incurs the overhead of maintaining huge metadata. Updating such huge metadata during the data migration that arises due to load balancing activity results in significant overhead. In order to reduce this metadata update overhead, this paper proposes a suitable alternate index that tracks the data blocks even when they migrate across the nodes without explicitly storing the location information. In addition, a virtual server-based load balancing (VSLB) algorithm has been proposed in order to reduce the migration overhead. The experimental results indicate that the proposed index reduces the metadata update overhead by 74% when compared to the existing index. Furthermore, VSLB reduces the migration overhead by 33% when compared to the existing ID reassignment-based load balancing approach.
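As an illustration only, and not the index or the VSLB algorithm proposed in the paper, the general idea of locating deduplicated blocks without storing their location can be sketched with a consistent-hashing ring of virtual servers. The sketch assumes that a block's fingerprint alone determines its virtual server, and that load balancing moves whole virtual servers between physical nodes. The class `Ring`, its methods, the node names, and the number of virtual servers per node are all hypothetical.

```python
import hashlib
from bisect import bisect_left


def ring_id(key: bytes) -> int:
    """Map a key onto the identifier ring using its SHA-1 digest."""
    return int.from_bytes(hashlib.sha1(key).digest(), "big")


class Ring:
    """Hypothetical consistent-hashing ring with virtual servers."""

    def __init__(self, nodes, vservers_per_node=4):
        # Virtual-server ID -> physical node. In this sketch, this small map
        # is the only state that changes when load is migrated.
        self.owner = {}
        for node in nodes:
            for i in range(vservers_per_node):
                self.owner[ring_id(f"{node}#{i}".encode())] = node
        self.ids = sorted(self.owner)

    def virtual_server(self, fingerprint: bytes) -> int:
        """Return the successor virtual server of a fingerprint on the ring."""
        pos = bisect_left(self.ids, ring_id(fingerprint))
        return self.ids[pos % len(self.ids)]  # wrap around past the last ID

    def locate(self, fingerprint: bytes) -> str:
        """Derive the physical node of a block; no per-block location is stored."""
        return self.owner[self.virtual_server(fingerprint)]

    def migrate(self, vs_id: int, target: str) -> None:
        """Move one virtual server (and all blocks it holds) to another node."""
        self.owner[vs_id] = target


ring = Ring(["node-a", "node-b", "node-c"])
fp = hashlib.sha1(b"example block contents").digest()  # block fingerprint
vs = ring.virtual_server(fp)
before = ring.locate(fp)
ring.migrate(vs, "node-c")  # one map entry changes, no per-block update
after = ring.locate(fp)
```

Under these assumptions, migrating a virtual server rewrites a single entry in the ownership map, while every fingerprint still resolves to the same virtual server. How the paper's index and VSLB achieve their reported savings may differ from this sketch.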
