Data Deduplication; What Is Data Deduplication; Data Deduplication And The HP StoreOnce Backup System - HP D2D Manual

HP D2D Backup System Concepts Guide (EH985-90915, March 2011)

4 Data deduplication

In this chapter:

What is data deduplication?
Data deduplication and the HP StoreOnce Backup System
Tape rotation example with data deduplication

What is data deduplication?
Data deduplication is a process that compares blocks of data being written to the backup device
with data blocks previously stored on the device. If duplicate data is found, a pointer is established
to the original data, rather than storing the duplicate data again. This removes, or "deduplicates,"
the redundant blocks. The key point is that deduplication is performed at the block level rather
than at the file level, which significantly reduces the volume of data stored.
Figure 3 Data stored after deduplication
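As a rough illustration of why block-level comparison saves more than file-level comparison, the
sketch below compares the two on a pair of files that differ in a single block. The 4-byte block
size, the SHA-1 hash, the sample data and the function names are all assumptions made for this
example; they are not taken from the product.

```python
import hashlib

BLOCK_SIZE = 4  # deliberately tiny, for illustration only


def file_level_stored(files):
    """Bytes stored when only whole-file duplicates are eliminated."""
    unique = {hashlib.sha1(f).hexdigest(): len(f) for f in files}
    return sum(unique.values())


def block_level_stored(files):
    """Bytes stored when duplicate blocks are eliminated across all files."""
    unique = {}
    for f in files:
        for i in range(0, len(f), BLOCK_SIZE):
            block = f[i:i + BLOCK_SIZE]
            unique[hashlib.sha1(block).hexdigest()] = len(block)
    return sum(unique.values())


monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"  # one block changed since Monday

file_bytes = file_level_stored([monday, tuesday])
block_bytes = block_level_stored([monday, tuesday])
```

With this sample data, file-level comparison stores 32 bytes because the two files are not
identical, while block-level comparison stores 20 bytes because only the changed block is new.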
The importance of the Index files
As a backup stream arrives at the HP StoreOnce Backup System, the stream of data is "chunked"
into nominal 4K chunks. A hashing algorithm is run on each of these 4K chunks, producing
a unique digital fingerprint that is written to an index file.
This process is repeated in real time for every chunk of data in the first backup stream. When
subsequent backups run, it is highly likely that they will create identical hash codes, in which case
the hash count in the index is increased; the data associated with the hash code is not stored again
because it already resides in the Deduplication Store. The data is therefore stored only once for
any given hash code, hence the name StoreOnce.
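The chunk, hash and count steps described above can be sketched as follows. This is a minimal
sketch under stated assumptions, not the StoreOnce implementation: the guide does not name the
hashing algorithm or say how chunk boundaries are chosen, so SHA-1 and fixed 4096-byte chunks are
used here, and the DedupStore class and its attribute names are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4096  # "nominal 4K"; fixed-size chunking is an assumption


class DedupStore:
    """Toy deduplication store: each unique chunk is kept once."""

    def __init__(self):
        self.index = {}   # fingerprint -> hash count (number of references)
        self.chunks = {}  # fingerprint -> chunk data, stored only once

    def write_stream(self, data):
        """Chunk a backup stream and return how many chunks were newly stored."""
        new_chunks = 0
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha1(chunk).hexdigest()  # assumed algorithm
            if fingerprint in self.index:
                # Known chunk: increase the hash count, do not store the data again.
                self.index[fingerprint] += 1
            else:
                # New chunk: record the fingerprint and store the data once.
                self.index[fingerprint] = 1
                self.chunks[fingerprint] = chunk
                new_chunks += 1
        return new_chunks


store = DedupStore()
first_backup = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
second_backup = b"A" * CHUNK_SIZE + b"C" * CHUNK_SIZE

first_new = store.write_stream(first_backup)    # both chunks are new
second_new = store.write_stream(second_backup)  # only the "C" chunk is new
```

After both streams are written, the store holds three chunks rather than four, and the count for
the shared "A" chunk is 2.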
The Index files contain the mapping for the hashed data chunks created by deduplication and are
the main point of reference accessed and updated by both replication and housekeeping. Without
them, data cannot be restored successfully.
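To show why a restore depends on the index, the sketch below rebuilds a backup from an ordered
list of fingerprints by looking each one up in an index. The layout (an index mapping fingerprints
to positions in a chunk list), the SHA-1 hash, the 4-byte chunk size and the function names are
assumptions for illustration only.

```python
import hashlib


def backup(data, chunk_size, index, store):
    """Store unique chunks and return the ordered fingerprints of the stream."""
    recipe = []
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fingerprint = hashlib.sha1(chunk).hexdigest()
        if fingerprint not in index:
            index[fingerprint] = len(store)  # position of the chunk in the store
            store.append(chunk)
        recipe.append(fingerprint)
    return recipe


def restore(recipe, index, store):
    """Rebuild the stream; raises KeyError if a fingerprint is missing from the index."""
    return b"".join(store[index[fingerprint]] for fingerprint in recipe)


index, store = {}, []
original = b"AAAABBBBAAAA"
recipe = backup(original, 4, index, store)
restored = restore(recipe, index, store)
```

Here the restored stream matches the original even though only two chunks are held in the store.
Calling restore with an empty index fails with a KeyError, because the stored chunks can no longer
be matched to the fingerprints.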

Data deduplication and the HP StoreOnce Backup System

Data deduplication is applied per library device or share. When you configure the library or share,
deduplication is enabled by default. It cannot be disabled on any G2 product. (On HP D2D2500
G1 and D2D4000/4009 machines, it may be disabled on virtual tape libraries, but not on NAS
shares.)
A device is associated with a host server, and deduplication allows a greater amount of backup
history to be stored for that host. A larger number of full backups can be retained, which makes
possible a rotation strategy with a longer retention history. It does not increase the number of host
servers that may be connected. The deduplication factor that has been applied to a device is