Dell PowerVault Storage Area Network Administrator's Manual page 14

Dell dr series system administrator's guide



the best results for routine and repeated data backups of structured data. Block-level deduplication works efficiently where there are multiple duplicate versions of the same file, because it looks at the actual sequence of 0s and 1s that make up the data.

Whenever a document is repeatedly backed up, the 0s and 1s stay the same because the file is simply being duplicated. Block deduplication can easily identify the similarities between two files because the sequence of their 0s and 1s remains exactly the same. Online data is different: it has few exact duplicates. Instead, online data consists of files that may share many similarities with one another. For example, a majority of the files that contribute to increased data storage requirements come pre-compressed by their native applications, such as:

• Images and video (such as the JPEG, MPEG, TIFF, GIF, PNG formats)

• Compound documents (such as .zip files, email, HTML, web pages, and PDFs)

• Microsoft Office application documents (including PowerPoint, MS-Word, Excel, and SharePoint)

NOTE: The DR Series system experiences a reduced savings rate when the data it ingests has already been compressed by the native data source. It is highly recommended that you disable data compression at the data source, especially for first-time backups. For optimal savings, the native data sources need to send data to the DR Series system in a raw state for ingestion.

Block deduplication is less effective on files that are already compressed, because compression changes their 0s and 1s from the original format. Data deduplication is a specialized form of data compression that eliminates redundant data. The technique improves storage utilization, and it can be used in network data transfers to reduce the number of bytes that must be sent across a link. Using deduplication, unique chunks of data, or byte patterns, are identified and stored during analysis. As the analysis continues, other chunks are compared to the stored copies, and when a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. This reduces the amount of data that must be stored or transferred, which contributes to network savings. Network savings are achieved by replicating data that has already undergone deduplication.
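The chunk-and-reference process described above can be sketched in a few lines of Python. This is a minimal illustration, not the DR Series implementation: the fixed 4096-byte chunk size, the use of SHA-256 digests as references, and the `ChunkStore` class are all assumptions made for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; a real system may chunk differently


class ChunkStore:
    """Hypothetical in-memory store that keeps one copy of each unique chunk."""

    def __init__(self):
        self.chunks = {}  # digest -> chunk bytes

    def write(self, data):
        """Store data and return its recipe: the ordered list of chunk digests."""
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk is stored; later
            # occurrences are represented by the digest (the reference).
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def read(self, recipe):
        """Rebuild the original data by following the references."""
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


store = ChunkStore()
backup = b"".join(bytes([i]) * CHUNK_SIZE for i in range(8))  # 8 distinct chunks

first = store.write(backup)
second = store.write(backup)  # a repeated backup of the same data

print("logical bytes written:", 2 * len(backup))
print("physical bytes stored:", store.stored_bytes())
```

The second backup adds no new chunks to the store, only a second list of references, which is why repeated backups of unchanged data deduplicate so well.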

By contrast, standard file compression tools identify short repeated substrings inside individual files. The intent of storage-based data deduplication is to inspect large volumes of data and identify large amounts of identical data, such as entire files or large sections of files. Once this has been done, the system can store only one copy of that data. This copy is additionally compressed using single-file compression techniques. For example, an email system may contain 100 or more emails in which the same 1 megabyte (MB) file is sent as an attachment. The following shows how this is handled:

• Without data deduplication, each time that email system is backed up, all 100 instances of the same attachment are saved, which requires 100 MB of storage space.

• With data deduplication, only one instance of the attachment is actually stored (all subsequent instances are referenced back to the one saved copy), for a deduplication ratio of approximately 100 to 1. The unique chunks of data that represent the attachment are deduplicated at the block chunking level.
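The arithmetic behind the email example can be written out as a short calculation. The variable names are illustrative, and the small amount of space taken by the references themselves is ignored, which is why the ratio is only approximately 100 to 1 in practice.

```python
ATTACHMENT_MB = 1  # size of the attachment, from the example above
COPIES = 100       # number of emails carrying the same attachment

logical_mb = ATTACHMENT_MB * COPIES  # stored without deduplication
physical_mb = ATTACHMENT_MB          # one stored instance with deduplication

ratio = logical_mb / physical_mb
savings_pct = 100 * (logical_mb - physical_mb) / logical_mb

print(f"without deduplication: {logical_mb} MB")
print(f"with deduplication:    {physical_mb} MB (plus references)")
print(f"ratio {ratio:.0f}:1, savings {savings_pct:.0f}%")
```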

NOTE: The DR Series system does not support deduplication of encrypted data, so no deduplication savings are derived from ingesting encrypted data. The DR Series system considers data that has already been encrypted to be unique and, as a result, cannot deduplicate it.

In cases where self-encrypting drives (SEDs) are used, data is decrypted by the SED or the encryption layer when the backup application reads it. This works the same way as opening an MS-Word document that was saved on a SED. This means that any data stored on a SED can be read and deduplicated. If you enable encryption in the backup software, however, you will lose deduplication savings because each time the data is encrypted, the DR Series system considers it to be unique.
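The effect of software encryption on chunk matching can be illustrated with a small sketch. It assumes that the backup software encrypts each run with a fresh nonce, which is common practice but is not stated in this guide. The `toy_encrypt` function is a hypothetical stand-in for a real cipher (an XOR against a SHA-256 counter keystream); it is for illustration only and is not secure.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed chunk size for illustration


def chunk_digests(data):
    """Return the set of SHA-256 digests of the fixed-size chunks of data."""
    return {
        hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
        for i in range(0, len(data), CHUNK_SIZE)
    }


def toy_encrypt(data, key, nonce):
    """XOR data with a SHA-256 counter keystream. Illustrative only, NOT secure."""
    out = bytearray()
    for block_no in range((len(data) + 31) // 32):
        counter = block_no.to_bytes(8, "big")
        keystream = hashlib.sha256(key + nonce + counter).digest()
        block = data[block_no * 32:(block_no + 1) * 32]
        out.extend(b ^ k for b, k in zip(block, keystream))
    return bytes(out)


plain = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # 4 distinct chunks
key = b"hypothetical-backup-key"

# Two backup runs of the same data, each encrypted with its own nonce.
run1 = toy_encrypt(plain, key, b"nonce-for-run-1")
run2 = toy_encrypt(plain, key, b"nonce-for-run-2")

shared_plain = chunk_digests(plain) & chunk_digests(plain)
shared_encrypted = chunk_digests(run1) & chunk_digests(run2)

print("chunks shared between unencrypted runs:", len(shared_plain))
print("chunks shared between encrypted runs:  ", len(shared_encrypted))
```

Both runs carry the same underlying data, but their encrypted chunks no longer match, so a deduplicating store would have to keep every chunk of every run.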

Replication: Replication is the process by which the same key data is saved from multiple storage devices, with the goal of maintaining consistency between redundant resources in data storage environments. Data replication improves the level of fault tolerance, which improves the reliability of maintaining saved data and permits accessibility to the same stored data. The DR Series system uses an active form of replication that lets you configure a primary-backup scheme.


This manual is also suitable for:

Dr4100 Powervault dx6104 Powervault dx6112 Powervault lto3-080 Powervault lto4-120hh Dr4000
