Data Deduplication - Oracle ZFS Storage Appliance Administration Manual

Project and Share Properties
By default, filesystems implement file behavior according to POSIX standards. These standards
are fundamentally incompatible with the behavior required by the SMB protocol. For shares
where the primary protocol is SMB, this option should always be enabled. Changing this
property requires all clients to disconnect and reconnect.

Data deduplication

The data deduplication property controls whether duplicate copies of data are eliminated.
Deduplication is synchronous, pool-wide, block-based, and can be enabled on a per project or
share basis.
To enable deduplication, select the Data Deduplication checkbox on the general properties
screen for projects or shares. The deduplication ratio will appear in the usage area of the Status
Dashboard. Data written with deduplication enabled is entered into the deduplication table
indexed by the data checksum. Deduplication forces the use of the cryptographically strong
SHA-256 checksum. Subsequent writes will identify duplicate data and retain only the existing
copy on disk. Deduplication can only occur between blocks of the same size, that is, data
written with the same record size. For best results, set the record size to that used by the
application writing the data; for streaming workloads, use a large record size.
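The behavior described above can be illustrated with a minimal sketch. This is a toy model, not the appliance's implementation: the DedupPool class, its methods, and the tiny record sizes are hypothetical names and values chosen for illustration. It assumes only what the text states, namely that blocks are indexed by their SHA-256 checksum and that only blocks of the same size can match.

```python
import hashlib


class DedupPool:
    """Toy pool-wide deduplication table keyed by SHA-256 checksum (hypothetical)."""

    def __init__(self):
        # checksum -> [block bytes, reference count]
        self.table = {}

    def write(self, data, record_size):
        """Split data into records and store each unique record only once."""
        refs = []
        for offset in range(0, len(data), record_size):
            block = data[offset:offset + record_size]
            checksum = hashlib.sha256(block).digest()
            entry = self.table.get(checksum)
            if entry is None:
                self.table[checksum] = [block, 1]  # first copy: keep it
            else:
                entry[1] += 1                      # duplicate: add a reference only
            refs.append(checksum)
        return refs

    def physical_bytes(self):
        """Bytes actually stored, counting each unique block once."""
        return sum(len(block) for block, _ in self.table.values())


pool = DedupPool()
pool.write(b"AAAABBBB", record_size=4)
pool.write(b"AAAABBBB", record_size=4)  # same record size: nothing new is stored
same_size = pool.physical_bytes()
pool.write(b"AAAABBBB", record_size=8)  # different record size: no block matches
mixed_size = pool.physical_bytes()
```

In this sketch the second write stores nothing new, while the third write stores the same bytes again because its single 8-byte block has a different checksum than either 4-byte block.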
If your data does not contain any duplicates, enabling Data Deduplication will add overhead
(a more CPU-intensive checksum and on-disk deduplication table entries) without providing
any benefit. If your data does contain duplicates, enabling Data Deduplication will save
space by storing only one copy of a given block regardless of how many times it occurs.
Deduplication necessarily impacts performance because the checksum is more expensive to
compute and the metadata of the deduplication table must be accessed and maintained.
Note that deduplication has no effect on the calculated size of a share, but does affect the
amount of space used for the pool. For example, if two shares contain the same 1GB file, each
will appear to be 1GB in size, but the total for the pool will be just 1GB and the deduplication
ratio will be reported as 2x.
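The arithmetic in this example can be written out as a short sketch. The function and variable names are hypothetical, and the ratio is assumed here to be the total size referenced by all shares divided by the space actually used in the pool, which is consistent with the 2x figure given above.

```python
GB = 10**9


def dedup_ratio(share_sizes, pool_used_bytes):
    """Total bytes referenced by all shares divided by bytes stored in the pool."""
    return sum(share_sizes.values()) / pool_used_bytes


# Two shares, each holding the same 1 GB file.
share_sizes = {"share_a": 1 * GB, "share_b": 1 * GB}

# Only one copy of each block is kept, so the pool stores 1 GB in total.
pool_used = 1 * GB

ratio = dedup_ratio(share_sizes, pool_used)
```

Each share still reports 1 GB, while the computed ratio is 2.0.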
By its nature, deduplication requires modifying the deduplication table when a block is
written to or freed. If the deduplication table cannot fit in DRAM, writes and frees may induce
significant random read activity where there was previously none. As a result, the performance
impact of enabling deduplication can be severe. Moreover, for some cases -- in particular, share
or snapshot deletion -- the performance degradation from enabling deduplication may be felt
pool-wide. In general, it is not advised to enable deduplication unless it is known that a share
has a very high rate of duplicated data, and that the duplicated data plus the table to reference it
can comfortably reside in DRAM.
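As a rough way to reason about whether the table can reside in DRAM, the following sketch estimates its size from the amount of unique data and the record size. The function name and all input values are hypothetical; in particular, the per-entry size is a placeholder assumption and not a figure from this guide, so substitute a value appropriate to your system.

```python
def estimate_table_bytes(unique_data_bytes, record_size, bytes_per_entry):
    """Estimate deduplication table size: one entry per unique block."""
    unique_blocks = -(-unique_data_bytes // record_size)  # ceiling division
    return unique_blocks * bytes_per_entry


TiB = 2**40
KiB = 2**10

# Assumed inputs: 1 TiB of unique data, 128 KiB records, and a
# placeholder of 320 bytes per table entry (an assumption, not a documented value).
table_bytes = estimate_table_bytes(1 * TiB, 128 * KiB, 320)
table_gib = table_bytes / 2**30
```

Because the estimate scales inversely with record size, the same data written with small records needs a proportionally larger table.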
To determine if performance has been adversely affected by deduplication, enable advanced
analytics and then use analytics to measure "ZFS DMU operations broken down by DMU
Oracle ZFS Storage Appliance Administration Guide, Release OS8.6.x • September 2016
