Data deduplication is widely deployed in cloud backup storage systems to reduce storage space and to minimize the transmission of redundant data, so that network bandwidth is used efficiently. In cloud backup services, the redundancy in typical backup data is dominated by duplicate chunks. The intrinsic drawback of such systems is the difficulty of detecting similar chunks. The storage server holds a large volume of chunks, which complicates duplicate detection, decreases deduplication efficiency, and increases deduplication overhead. In this paper we propose a Bayesian method for source local deduplication to find duplicate chunks. To measure chunk similarity, learning-based similarity metrics are developed, and the data features are used to train the Bayesian system. Our experimental results show that the precision, recall, and F-measure values are higher than those obtained with SVM and GP. Because of these higher values, the proposed Bayesian method increases deduplication efficiency and reduces deduplication overhead. The proposed Bayesian method therefore yields better performance than the Support Vector Machine model and the genetic approach.
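The abstract compares methods by precision, recall, and F-measure. As a minimal sketch of how these values could be computed for a duplicate-chunk detector, assuming the detector's output and the ground truth are both available as sets of chunk identifiers (the function name `duplicate_detection_scores` and the chunk IDs are hypothetical and not taken from the paper):

```python
def duplicate_detection_scores(predicted, actual):
    """Return (precision, recall, F-measure) for duplicate detection.

    predicted: set of chunk IDs the detector flagged as duplicates (assumed input)
    actual:    set of chunk IDs that truly are duplicates (assumed ground truth)
    """
    true_positives = len(predicted & actual)
    # Precision: fraction of flagged chunks that are real duplicates.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of real duplicates that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    # F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative data only, not results from the paper.
flagged = {"c1", "c2", "c3", "c4"}
truth = {"c1", "c2", "c5"}
precision, recall, f_measure = duplicate_detection_scores(flagged, truth)
print(precision, recall, f_measure)
```

Higher precision means fewer unique chunks are wrongly treated as duplicates, and higher recall means fewer duplicates are missed and stored or transmitted again.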
Digital Object Identifier (DOI)
Neelaveni, P. and Vijayalakshmi, M.
"Bayesian Method for Source Local Deduplication in Cloud Backup Services,"
Applied Mathematics & Information Sciences: Vol. 10, Article 36.
Available at: https://dc.naturalspublishing.com/amis/vol10/iss6/36