Storage Area Networks For Dummies®


Chapter 13: Using Data De-Duplication to Lighten the Load


The basic benefits of de-duplication can be summarized as follows:

✓ Reduced hardware costs

✓ Reduced backup costs

✓ Reduced disaster-recovery costs

✓ Increased storage efficiency

You can apply de-duplication in multiple places. Wherever you apply it, de-duplication can affect costs not only for your SAN, but also for your entire IT infrastructure.

Based on a typical enterprise environment running typical applications, you could probably squeeze out between 20 and 90 percent more storage space just by getting rid of duplicate and unnecessary data.

How de-duplication works

Most de-duplication solutions work by

1. Dividing the input data into individual chunks

2. Calculating a hash value for each chunk (see the following section for more on what a hash is) and storing the hash in an index

3. Comparing the hash value of each new chunk of data with the hashes already in the index to determine whether to store the new data or ignore (de-dupe) it
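The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's actual implementation: the fixed 4 KB chunk size and the choice of SHA-256 as the hash function are assumptions made just for the example.

```python
import hashlib

def dedupe(data, chunk_size=4096):
    """Return (chunk_store, recipe) for the given bytes.

    chunk_store holds each unique chunk exactly once, keyed by its hash;
    recipe is the ordered list of hashes needed to rebuild the input.
    """
    chunk_store = {}
    recipe = []
    for i in range(0, len(data), chunk_size):       # step 1: divide into chunks
        chunk = data[i:i + chunk_size]
        h = hashlib.sha256(chunk).hexdigest()       # step 2: hash the chunk
        if h not in chunk_store:                    # step 3: compare with the index
            chunk_store[h] = chunk                  # new data: store it
        recipe.append(h)                            # duplicate: keep only the hash
    return chunk_store, recipe
```

Feeding this three identical 4 KB chunks followed by one different chunk stores only two chunks on disk, while the four-entry recipe still lets you reassemble the original data byte for byte.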

The process of data de-duplication can be implemented in several ways. You can manually compare two files and delete the one that's older or no longer needed, or you can use a commercial de-duplication product. Commercial solutions use sophisticated methods (the actual math involved can make your head spin) to find duplicate data. Once you become an expert in how it works, if your current line of work doesn't pan out for you, hey, maybe you could get a job at the Central Intelligence Agency.

Most of the data de-duplication solutions on the market today use standard cryptographic hashing techniques (see the nearby sidebar) to create a unique mathematical representation of the data in question, called a hash, which can then be compared with the hash of any new data to determine whether that data is unique. The hash also serves as the metadata (data about data) for the chunk of data in question. The hash is used as an index in a lookup table, allowing you to determine quickly whether any new data being stored is already present and can be eliminated.
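Here is a small self-contained sketch of that hash-as-index idea: the digest of each chunk is the key into a lookup table, so deciding whether incoming data is already stored costs a single dictionary lookup, and the ordered hash list is all the metadata needed to find the data again. The 4 KB chunk size and SHA-256 are illustrative assumptions, as before.

```python
import hashlib

CHUNK = 4096
index = {}            # hash -> offset of the stored unique chunk
stored = bytearray()  # the "disk": unique chunks only

def write(data):
    """Store data, returning its ordered hash list (the metadata)."""
    hashes = []
    for i in range(0, len(data), CHUNK):
        piece = data[i:i + CHUNK]
        h = hashlib.sha256(piece).hexdigest()
        if h not in index:            # quick lookup: is this chunk new?
            index[h] = len(stored)    # yes: record where it lives...
            stored.extend(piece)      # ...and store it once
        hashes.append(h)              # no: keep only the reference
    return hashes

# Ten copies of the same 4 KB block occupy just one chunk on "disk".
meta = write(b"x" * CHUNK * 10)
ratio = (len(meta) * CHUNK) / len(stored)  # logical size vs. physical size
```

With the ten identical blocks above, the physical store holds a single 4 KB chunk while the metadata records ten references, a 10:1 reduction of exactly the kind the cost savings earlier in this chapter come from.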
