
Data De-duplication and Disk-to-Disk Backup Systems

Technical and Business Considerations

By Tony Asaro and Heidi Biggar

July 2007

Copyright ©2007. The Enterprise Strategy Group, Inc. All Rights Reserved.


Table of Contents

ESG Report

Data De-Duplication and D2D Backup Systems

Introduction
Data De-Duplication Defined
The Business Value of Data De-Duplication
ESG’s View

All trademark names are property of their respective companies. Information contained in this publication has been obtained from sources the Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of the Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482-0188.

Enterprise Strategy Group Page 1


Introduction


Disk-to-disk (D2D) backup, combined with data de-duplication, is an emerging category within the data protection ecosystem that ESG believes has the potential to change the entire landscape. D2D backup solutions with data de-duplication minimize the disk and/or bandwidth capacity required to store and move data used for protection purposes.

Data de-duplication solutions optimize physical storage and bandwidth by using less of each to protect your data. Why use less? Perhaps the first and most obvious answer is to reduce cost. Reducing capacity requirements means fewer disks are needed to store the same amount of effective data, and less bandwidth is required to move and copy that data across the WAN. Beyond these cost reductions, there is perhaps an even more important reason to employ data de-duplication. By reducing the amount of storage and bandwidth required to protect data locally and remotely, organizations can significantly improve their levels of data protection and their ability to recover data quickly, reliably and cost-effectively. Reducing the cost of the storage required for backup data in turn enables greater data protection and recoverability.

For years, there has been a considerable disparity between the prices of tape and disk-based storage systems. As such, it was an economic “no-brainer” to store backups on tape. In fact, the cost delta between tape and disk was so dramatic that, despite the inherent weaknesses of tape (complexity, unreliability and slow performance), tape is still the preferred medium for storing backup data today.

Figure One: US-based Disk-to-Disk Adoption Rates

Source: ESG Research, September 2006

A major market shift occurred when storage system vendors began supporting low-cost, high-density ATA drives, and the cost delta between disk and tape started to shrink significantly. Capital costs still favored tape, but the gap narrowed to the point where the operational impact of tape (management cost, unreliability and poor performance) finally moved the value dial from tape to disk for many end-users. The market responded, and the disk-to-disk (D2D) backup market was born. At first, end-users performed backups to lower-cost drives within their existing primary storage systems, and this is still a popular practice. Additionally, the development of new purpose-built solutions, such as D2D appliances and virtual tape libraries (VTL), created an entirely new market category. As shown in Figure One, a recent survey conducted by ESG found that 64% of all respondents either have implemented or intend to implement a purpose-built D2D backup solution.

This is strong validation that these solutions are either replacing or complementing tape libraries. The reasons end-users are embracing D2D backup solutions include improved backup performance, elimination of tape media management issues, scalability, ease of management and cost. Another, and possibly the most important, advancement in D2D backup is data de-duplication. Our research found that 33% of all respondents consider data de-duplication an important capability in their D2D backup solution. ESG believes that this is an especially large percentage given that data de-duplication is an emerging technology still requiring a great deal of education and awareness.

Data de-duplication adds compelling value above and beyond the use of high-density SATA drives within disk-based storage systems. End-users employing D2D backup solutions with data de-duplication are experiencing backup data capacity reductions of 10, 20 and 30 times, possibly even more.[1] Consider the economic value of this level of reduction: it not only eliminates any delta between the capital costs of tape versus disk, but arguably swings the pendulum in disk’s favor. Add to this the operational efficiency, the rapid and reliable recoveries, and the elimination or reduction in tape management enabled by D2D backup solutions, and you’ve got a compelling and evident value proposition.

Data de-duplication is a game-changing technology. It enables D2D backup by lowering the overall cost of these types of solutions. De-duplication reduces the amount of redundant data that is backed up, which results in less capacity being required to store that data. Additionally, companies can retain more backup data on disk for longer periods of time, which reduces and potentially eliminates the need to recover data from tape. Where replication is supported, data can be moved between sites for disaster recovery more efficiently and cost-effectively. Data de-duplication offers landscape-changing value that is easy to quantify; it also improves reliability, simplifies management and provides rapid recovery of data.

[1] Data de-duplication ratios will vary based on the backup data (amount of redundancy and data change rates), the backup policy (frequency of incremental and full backups) and the data de-duplication technology (size of data files/chunks/segments used).



Data De-Duplication Defined


Though the technology behind it can be quite sophisticated, the concept of data de-duplication is simple. Data de-duplication is the process of examining data to identify any redundancy so that duplicate data is stored only once. In the context of backup data, we can safely assume that there is a great deal of duplicate data. The same data keeps getting backed up over and over again, consuming more storage space and driving up cost, thereby creating a chain of inefficiency.
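To make the concept concrete, here is a minimal Python sketch of whole-file de-duplication. It assumes files are identified by a SHA-256 hash of their contents. The class and method names are hypothetical and not taken from any product. A real system would also have to handle metadata, deletions and hash collisions, and this sketch does none of that.

```python
import hashlib


class FileDedupStore:
    """Hypothetical whole-file de-duplicating store (illustration only)."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}   # content hash -> the single stored copy
        self.catalog: dict[str, str] = {}   # backed-up path -> content hash

    def backup(self, path: str, data: bytes) -> bool:
        """Catalog a file; return True only if its content was not already stored."""
        digest = hashlib.sha256(data).hexdigest()
        self.catalog[path] = digest
        if digest in self.blobs:
            return False  # redundant copy: keep a reference, store nothing new
        self.blobs[digest] = data
        return True

    def restore(self, path: str) -> bytes:
        return self.blobs[self.catalog[path]]

    def logical_bytes(self) -> int:
        """Capacity the backups would consume without de-duplication."""
        return sum(len(self.blobs[d]) for d in self.catalog.values())

    def physical_bytes(self) -> int:
        """Capacity actually consumed by unique content."""
        return sum(len(b) for b in self.blobs.values())


store = FileDedupStore()
deck = b"quarterly results deck" * 1000  # 22,000 bytes
for user in ("alice", "bob", "carol"):
    store.backup(f"/mail/{user}/deck.ppt", deck)
print(store.logical_bytes(), store.physical_bytes())  # 66000 22000
```

Three backed-up copies consume the logical capacity of three files. Only one physical copy is kept, and every path can still be restored from it.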

The following example, though simple, illustrates the potential power of de-duplication:

Let’s say that a 2 MB image has been embedded in a Word document and e-mailed to dozens of people. Ten of the people who receive that document take the image and embed it in other documents. In fact, the image proliferates throughout the organization to the point where it has been embedded in 200 other documents. This consumes 400 MB of additional capacity. With data de-duplication, only one copy of the image is stored, saving the 400 MB that would otherwise be consumed.

Now consider a 400 MB file that has been sent to multiple users. If ten weekly full backups containing that 400 MB file are retained, they consume 4,000 MB (4 GB) of storage. Reducing this to just the one unique copy is significant, since otherwise 4 GB would be required to back up that same file multiple times.
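A few lines of Python can check the arithmetic in the two examples above. The sketch assumes, as the image example implies, that the 400 MB figure counts only the 200 additional embeddings. It also assumes de-duplication keeps exactly one copy of each item.

```python
def consumed_mb(copy_size_mb: float, copies: int) -> float:
    """Capacity used when every copy is stored in full."""
    return copy_size_mb * copies


# Embedded image: the original plus 200 additional embeddings of a 2 MB image.
image_without = consumed_mb(2, 1 + 200)   # 402 MB
image_with = consumed_mb(2, 1)            # one unique copy: 2 MB
image_saved = image_without - image_with  # 400 MB

# Ten retained weekly full backups that each contain the same 400 MB file.
file_without = consumed_mb(400, 10)       # 4,000 MB
file_with = consumed_mb(400, 1)           # 400 MB
file_saved = file_without - file_with     # 3,600 MB

print(image_saved, file_without, file_saved)  # 400 4000 3600
```

In the second example the single retained copy still occupies 400 MB. The net saving is therefore 3,600 MB of the 4 GB that would otherwise be consumed.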

Another example of data de-duplication’s value, this time at the file level, involves a PowerPoint presentation attached to an e-mail. If the e-mail is sent to multiple recipients and then forwarded to yet another set of recipients, data de-duplication technology can be used to store the presentation only once. Next, consider what happens when one of the e-mail recipients modifies a slide in the presentation and again forwards it to a group of colleagues. Advanced data de-duplication algorithms work at the sub-file level and can be used to store only the data associated with the changed slide.

The embedded-image example and the modified-slide example involve block or sub-block data de-duplication. This method works much like file-level de-duplication, but it identifies common data in “chunks” or “blocks” that are smaller than a file. It is typically implemented in purpose-built solutions dedicated to finding and eliminating duplicate data below the file level.
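The modified-slide case can be sketched as block-level de-duplication with fixed-size blocks. This is one simple way to do sub-file de-duplication, assumed here for illustration. It is not necessarily how any particular product works. The 4 KB block size and all names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical fixed block size; products use many different sizes


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class BlockDedupStore:
    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}      # block hash -> stored block
        self.files: dict[str, list[str]] = {}   # file name -> ordered block hashes

    def backup(self, name: str, data: bytes) -> int:
        """Store a file as a list of block references; return how many blocks were new."""
        recipe, new_blocks = [], 0
        for block in split_blocks(data):
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            recipe.append(digest)
        self.files[name] = recipe
        return new_blocks

    def restore(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])


# A ten-"slide" presentation where each slide happens to fill exactly one block.
slides = [bytes([i]) * BLOCK_SIZE for i in range(10)]
store = BlockDedupStore()
first = store.backup("deck_v1.ppt", b"".join(slides))    # 10 new blocks

edited = list(slides)
edited[3] = b"\xff" * BLOCK_SIZE                          # one slide modified
second = store.backup("deck_v2.ppt", b"".join(edited))   # 1 new block
print(first, second, len(store.blocks))                   # 10 1 11
```

Fixed-size blocks work in this case because the edit does not shift any later data. An insertion that moves byte offsets would change every following block. That is one reason many products use variable-size, content-defined chunking instead.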

What does all this mean in real-life terms? Through hands-on testing, ESG has found that data de-duplication technologies can reduce the capacity needed for backup by 10 times, 20 times, 30 times and even more.[2][3] This means that companies can store 10 TB to 30 TB of backup data on 1 TB of physical disk capacity, which has potentially tremendous economic benefits. For one thing, it potentially eliminates any delta between the capital costs of tape versus disk, making disk storage a more viable option. Factor in the operational efficiencies of not having to move, store and manage redundant data, plus the elimination or reduction of tape management provided by D2D backup solutions, and users can extract real value from de-duplicated D2D backup.

[2] ESG has seen data de-duplication ratios range from 4:1 to 89:1, so mileage will vary.

[3] See Appendix for a list of ESG related reports.
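The “10 TB to 30 TB on 1 TB” claim above is simply logical backup capacity divided by the reduction ratio. Here is a minimal sketch that ignores metadata and RAID overhead. That simplification is an assumption and not part of the original analysis.

```python
def physical_tb(logical_tb: float, ratio: float) -> float:
    """Physical disk needed to hold logical_tb of backup data at a ratio:1 reduction."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return logical_tb / ratio


for logical, ratio in [(10, 10), (30, 30), (30, 10)]:
    print(f"{logical} TB of backups at {ratio}:1 -> {physical_tb(logical, ratio):.1f} TB of disk")
```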



Figure Two: Data De-Duplication (redundant data passes through a data de-duplication engine, which keeps only unique data)


Data de-duplication ratios will vary based on the types of data involved, the frequency of full backups and retention periods. As a rule of thumb, ESG believes a 20:1 ratio, when combined with data compression, to be broadly achievable. ESG has seen data de-duplication ratios of 89:1, and there is potential for even greater reductions. Even so, do not be disappointed if you do not achieve 20:1 or greater, since reductions of 5:1 or more are still extremely valuable.
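The reason 5:1 is still valuable shows up when ratios are converted into the share of raw capacity eliminated, which is 1 − 1/ratio. A minimal sketch:

```python
def space_saved_pct(ratio: float) -> float:
    """Percentage of raw backup capacity eliminated by a ratio:1 reduction."""
    return 100.0 * (1.0 - 1.0 / ratio)


for r in (4, 5, 20, 89):
    print(f"{r}:1 saves {space_saved_pct(r):.1f}% of capacity")
```

A 5:1 ratio already removes 80% of the capacity. Moving to 20:1 raises that to 95%, and the range ESG has observed tops out near 98.9% at 89:1. Each further increase in the ratio buys progressively less additional space.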

Technology Considerations<br />

There is a great deal of buzz in the market around data de-duplication, and it is only going to get louder. As a result, there will be lots of confusion and convoluted messaging around this topic. Keep in mind that in the high-tech industry, we often use the same term to mean different things and different terms to mean the same thing. In the interest of clarity, ESG has provided a set of questions aimed at guiding end-users interested in evaluating and implementing D2D backup solutions that support data de-duplication.

Technology Questions to Ask:


1. What type of data de-duplication ratio can I expect?

A number of the vendors that support de-duplication provide high-level numbers. Dig deeper. The actual amount of data reduction an organization can expect can vary significantly depending on the type of data being backed up, retention periods, the frequency of full backups and the data de-duplication technology. Provide potential vendors with information about your environment, backup process, applications, retention SLAs and data types to better determine what to expect.
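To see why retention and change rates matter so much, consider a deliberately simple model. It is an assumption for illustration and not an ESG formula. Each retained weekly full is the same size, the first full is entirely unique, and each later full adds only the fraction of data that changed that week.

```python
def estimated_ratio(full_tb: float, retained_fulls: int, weekly_change: float) -> float:
    """Rough de-duplication ratio for retained weekly full backups.

    Simplifying assumptions: the first full is entirely unique, each later full adds
    only the fraction `weekly_change` of new data, and compression, incrementals and
    metadata overhead are ignored.
    """
    logical = full_tb * retained_fulls
    physical = full_tb * (1 + weekly_change * (retained_fulls - 1))
    return logical / physical


for fulls, change in [(10, 0.02), (30, 0.02), (10, 0.10)]:
    print(f"{fulls} fulls, {change:.0%} weekly change -> {estimated_ratio(5, fulls, change):.1f}:1")
```

Under these assumptions, longer retention pushes the ratio up and higher change rates pull it down. This is the kind of environment detail worth sharing with vendors before accepting a headline number.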

2. How will data de-duplication affect my backup and restore performance?

Data de-duplication is a resource-intensive process. It needs to determine whether each new small sequence of data has been stored before, often across hundreds of terabytes of prior data. Except in very small deployments, a simple index of this information is too big to fit in RAM. The system therefore has to seek on disk, and disk seeks are notoriously slow (and not getting faster). The following questions allow you to dig deeper with regard to performance:

What is the single-stream backup and restore throughput? This is how fast a given file or database can be backed up, restored or copied to tape for archiving. The numbers may differ, since read speed and write speed may have separate issues. Because of backup windows for critical data, backup throughput is what most people ask about, though restore time is more significant for most SLAs. LTO-4 tape drives need to receive data at more than 60 MB/sec. or they will operate well below their rated streaming speed, so restore stream speed matters significantly if tape will remain in your plans.

What is the aggregate backup/restore throughput per system? With many streams, how fast can a given controller perform? This will help gauge the number of controllers/systems needed for your deployment. It is mostly a measure of system management (number of systems) and cost; single-stream speed is more important for getting the job done.

Is the 30th backup different from the 1st? If you back up images and delete them over time, does the performance of the system change? De-duplication stores new documents largely as references scattered across the store. Do the recovery characteristics for a recent backup (what you’ll mostly be recovering) change a month or two into deployment, compared with the first pilot? Talk to the vendor’s existing users to find out what others have seen.

In many cases, performance in your deployment will depend on many factors, including the backup software and the systems and networks supporting it. Understand your current performance and bottlenecks before challenging the vendor of a particular component to fix them all.
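A back-of-the-envelope estimate shows why the fingerprint index described under question 2 outgrows RAM. The 8 KB average chunk size and the 28-byte index entry (a 20-byte SHA-1 fingerprint plus an 8-byte on-disk location) are assumptions chosen for illustration. They are not figures from any vendor.

```python
TB = 10**12
GB = 10**9


def index_bytes(unique_bytes: int, avg_chunk_bytes: int = 8 * 1024,
                entry_bytes: int = 20 + 8) -> int:
    """Size of a flat fingerprint index: one entry per stored chunk.

    Assumed entry layout: a 20-byte SHA-1 fingerprint plus an 8-byte on-disk location.
    """
    chunks = unique_bytes // avg_chunk_bytes
    return chunks * entry_bytes


size = index_bytes(100 * TB)
print(f"{size / GB:.1f} GB of index for 100 TB of unique data")  # 341.8 GB ...
```

Halving the chunk size doubles the index. That is one concrete side of the granularity trade-off mentioned under question 3.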

3. Is the data de-duplication inline or post-process?

As with any new technology, there is a lot of confusion in the industry about the differences between inline and post-processing approaches, as well as abuse and misuse of the terms used to differentiate the two. In ESG’s view, it comes down to one simple yes-or-no question: when the backup data is written to disk, has it already been de-duplicated? If the answer is yes, it has been de-duplicated inline. If the answer is no, the de-duplication is done post-process.

What is the significance of one approach versus the other? The two areas you need to research are the performance impact of the inline approach and the capacity issues of the post-process approach. Understand the trade-offs of each approach based on the vendor’s specific solution.

Since inline de-duplication is an intelligent process performed during the backup itself, there can be some performance degradation during data ingest. However, the performance impact depends on a number of variables, including the de-duplication technology itself, the size of the backup volume, the granularity of the de-duplication process, the aggregate throughput of the architecture and the scalability of the solution.

Post-process data de-duplication does require more disk capacity to be allocated up front. But the size of the “capacity reserve” needed depends on a number of variables, including the amount of backup data and how long the de-duplication technology “holds” onto the capacity before releasing it. Some post-process de-duplication technologies wait for the entire backup job to be completed, while others start de-duplication as backup data is stored.

Solutions that wait for the backup process to complete before de-duplicating the data have a greater initial “capacity overhead” than solutions that start the de-duplication process earlier. These solutions have to allocate enough capacity to store the entire backup volume. The capacity is released once the completed backup job has been de-duplicated, and it is re-allocated before the next backup job begins. Beginning the de-duplication process immediately after data is written to the backup target may shorten the overall de-duplication process, but it will also affect the data ingest rate.
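The capacity-reserve trade-off can be modeled roughly in a few lines. The sketch estimates the peak amount of not-yet-de-duplicated data sitting on disk during one job, given an ingest rate and a post-process de-duplication rate. The mode labels are hypothetical. The concurrent case optimistically assumes ingest is not slowed, although the text above notes that in practice it can be.

```python
def staging_peak_tb(backup_tb: float, ingest_tb_per_h: float,
                    dedup_tb_per_h: float, mode: str) -> float:
    """Peak amount of not-yet-de-duplicated data held on disk during one backup job.

    Modes (hypothetical labels):
      "inline"          - data is de-duplicated before it is written
      "post_wait"       - de-duplication starts only after the job completes
      "post_concurrent" - de-duplication starts as soon as data lands,
                          assuming ingest speed is unaffected (optimistic)
    """
    if mode == "inline":
        return 0.0
    if mode == "post_wait":
        return backup_tb
    if mode == "post_concurrent":
        ingest_hours = backup_tb / ingest_tb_per_h
        processed_tb = min(backup_tb, dedup_tb_per_h * ingest_hours)
        return backup_tb - processed_tb
    raise ValueError(f"unknown mode: {mode}")


for mode in ("inline", "post_wait", "post_concurrent"):
    print(mode, staging_peak_tb(10, ingest_tb_per_h=2, dedup_tb_per_h=1, mode=mode))
```

Take a 10 TB job ingested at 2 TB/h with de-duplication running at 1 TB/h. Waiting for completion needs the full 10 TB reserve, while starting immediately halves it to 5 TB.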

4. Does the D2D backup solution support de-duplicated remote replication?

End-users may wish to implement remote replication for disaster recovery, remote backup and remote vaulting. The combination of data de-duplication and remote replication offers a great deal of value by reducing bandwidth as dramatically as it reduces storage capacity. In many cases, data de-duplication is what makes disaster recovery and remote backup feasible within backup windows and budget constraints.

Some data de-duplication solutions support multi-site remote replication. If this is a requirement, find out whether the vendors you are considering support multi-site replication. The next question to ask is whether they support data de-duplication across the entire environment. For example, if there is duplicate data at five different sites, will the solution store only one copy at the target site? And will it make sure that no duplicate data is sent over the WAN? Supporting multi-site data de-duplication raises capacity reduction efficiency.
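Multi-site de-duplication can be sketched as each site shipping only the chunks whose fingerprints the target site does not already hold. This is a simplified model under stated assumptions. The target’s fingerprint set is treated as visible to every site, and the WAN traffic needed to exchange fingerprints, which real products must pay for, is ignored. All names are hypothetical.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def replicate(site_chunks: dict[str, list[bytes]], target: set[str]) -> dict[str, int]:
    """Replicate each site's chunks to one target; return WAN bytes sent per site.

    `target` holds the fingerprints already stored at the DR site and is updated
    in place, so data already sent by one site is never sent again by another.
    """
    sent: dict[str, int] = {}
    for site, chunks in site_chunks.items():
        wan_bytes = 0
        for chunk in chunks:
            fp = fingerprint(chunk)
            if fp not in target:  # ship only data the target does not have yet
                target.add(fp)
                wan_bytes += len(chunk)
        sent[site] = wan_bytes
    return sent


shared = [bytes([i]) * 1000 for i in range(10)]  # 10,000 bytes common to all sites
sites = {f"site{n}": shared + [f"site{n}".encode() * 200] for n in range(5)}
target: set[str] = set()
sent = replicate(sites, target)
print(sent)  # site0 sends 11000 bytes, the other four send 1000 each
print(sum(sent.values()), "bytes sent vs", 5 * 11000, "without global de-duplication")
```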

5. Will de-duplication processes affect my disaster recovery windows?

While much of the industry’s focus around data de-duplication is on reduction ratios and performance/capacity trade-offs, it is also important to consider the effect data de-duplication may have on disaster recovery windows. This refers to the time it takes from the start of the backup process to the point where DR copies are made and moved off-site. The length of this process, from start to finish, depends on a number of variables. These include the data de-duplication approach, the speed of the de-duplication architecture and the DR process itself: is the data written or exported to tape, or is it de-duplicated and then replicated over a WAN to a remote facility? It is important to consider the lag time from initiation of backup to when the image is complete at the DR site. If this timeframe is greater than 24 hours, that image may miss too much new data to meet the DR objectives of your deployment. Make sure you are meeting the Recovery Point Objectives (RPO) you have in mind for DR.
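The DR lag can be estimated by adding up the stages: backup ingest, post-process de-duplication (if any) and WAN replication of the reduced data. This sketch assumes the stages run strictly one after another, although many products overlap them. It also assumes inline de-duplication does not slow ingest. All parameter values in the example are hypothetical.

```python
def dr_lag_hours(backup_tb: float, ingest_tb_per_h: float, dedup_tb_per_h: float,
                 ratio: float, wan_mbit_per_s: float, inline: bool) -> float:
    """Hours from backup start until the de-duplicated image is complete at the DR site.

    Assumes the three stages run strictly one after another (many products overlap
    them) and that inline de-duplication does not slow ingest.
    """
    backup_h = backup_tb / ingest_tb_per_h
    dedup_h = 0.0 if inline else backup_tb / dedup_tb_per_h
    wan_bits = backup_tb / ratio * 8e12            # decimal TB -> bits
    replicate_h = wan_bits / (wan_mbit_per_s * 1e6) / 3600
    return backup_h + dedup_h + replicate_h


RPO_HOURS = 24
for inline in (True, False):
    lag = dr_lag_hours(10, ingest_tb_per_h=1, dedup_tb_per_h=1, ratio=20,
                       wan_mbit_per_s=100, inline=inline)
    status = "meets" if lag <= RPO_HOURS else "misses"
    print(f"inline={inline}: {lag:.1f} h, {status} a {RPO_HOURS} h target")
```

With these made-up numbers, the extra post-process stage alone pushes the lag past 24 hours (about 31 hours versus 21). That is the kind of effect this question is meant to surface.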

6. Is the D2D solution easy to implement?

One of the compelling things about data de-duplication is that it is easy, or at least it should be. Users shouldn’t have to perform cumbersome, complex and time-consuming tasks to get up and running and derive value from their solutions. Data de-duplication should be invisible to the backup and recovery process.

Purpose-built D2D backup appliances also offer a level of transparency, acting as a total solution that doesn’t require disk storage management functions. VTL solutions are typically associated with ease of implementation, since they emulate tape libraries. Certainly, they provide a great deal of value compared to just using a general-purpose storage system, which is often complex to manage and requires storage experts. The backup administrator doesn’t want to manage RAID groups, LUNs and volumes.

7. What is the system impact of scaling performance?

Will it take more controllers? More disk? In what ratio? This matters for cost and system management reasons.

8. How does the D2D backup solution protect itself from data loss and corruption?

It is important to understand how “bulletproof” the D2D backup solution is. Find out what technologies it has to ensure data integrity and protection against system failures. While this is always important, it is an even bigger consideration in data protection with de-duplication. With D2D backup solutions that support data de-duplication, there may be 1,000 backup images that rely on one copy of source data. It is therefore even more important that this source data be kept accessible and maintained with a high level of data integrity.
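The “1,000 images relying on one copy” point can be made concrete with a sketch of a de-duplicated store that keeps reference counts and can re-verify stored chunks against their fingerprints. The class and its methods are hypothetical. They illustrate why a single damaged chunk matters and are not a description of any vendor’s protection scheme.

```python
import hashlib
from collections import Counter


class ChunkStore:
    """Hypothetical de-duplicated store that tracks which images depend on each chunk."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}       # fingerprint -> single stored copy
        self.refs: Counter[str] = Counter()      # fingerprint -> reference count
        self.images: dict[str, list[str]] = {}   # backup image -> its fingerprints

    def add_image(self, name: str, chunks: list[bytes]) -> None:
        fps = []
        for chunk in chunks:
            fp = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(fp, chunk)
            self.refs[fp] += 1
            fps.append(fp)
        self.images[name] = fps

    def images_at_risk(self, fp: str) -> list[str]:
        """Every backup image that would be damaged if this chunk were lost."""
        return [name for name, fps in self.images.items() if fp in fps]

    def verify(self) -> list[str]:
        """Fingerprints whose stored bytes no longer match (e.g. silent corruption)."""
        return [fp for fp, chunk in self.chunks.items()
                if hashlib.sha256(chunk).hexdigest() != fp]


common = b"shared OS block" * 100
store = ChunkStore()
for i in range(1000):
    store.add_image(f"nightly-{i}", [common, f"image-{i}".encode()])

common_fp = hashlib.sha256(common).hexdigest()
print(store.refs[common_fp], len(store.images_at_risk(common_fp)))  # 1000 1000

store.chunks[common_fp] = b"bit rot"   # simulate silent corruption of the one copy
print(store.verify() == [common_fp])   # True
```

Corrupting one stored chunk damages all 1,000 images that reference it. Routine verification, combined with redundancy in the underlying storage, is what keeps such a failure recoverable.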

9. How scalable is the D2D backup solution?

You need to size your environment in terms of capacity and performance for today, with the future in mind.

10. Does the solution provide flexible application support?

How many backup applications are supported? Can non-backup applications work with it, too? More flexibility means more consolidation is possible using less physical infrastructure.

11. What are the other features and capabilities of your solution?

Data de-duplication is a valuable technology, but it is not the only consideration. You must also evaluate the solution’s other important features and capabilities to see whether it meets your needs.


The Business Value of Data De-Duplication

One of the greatest qualities of data de-duplication is that its value is easy to quantify. If you can reduce the capacity needed to store backup data by 10:1, 20:1 or more, you can get your calculator out and put a dollar amount to the cost savings. While the cost savings may be significant enough for many to move forward, there are other cost benefits that make the case for de-duplication even more compelling.

Hard Dollars<br />

The hard-dollar costs are easy to determine. The first metric is the difference in capital cost between a D2D backup solution with and without data de-duplication. Without de-duplication, a D2D backup solution will require more actual capacity to store backups, in some cases 20 or more times what the de-duplication-enabled solution needs. There are other capital cost savings as well. D2D backup solutions can reduce the amount of tape infrastructure you acquire. Some end-users have totally eliminated tape, while others have reduced the number of tape libraries they maintain.

If you want to perform remote replication between your primary and remote sites, data de-duplication can significantly reduce your WAN bandwidth costs, because far less data has to cross the link. You can effectively replicate data over long distances with much less bandwidth. Since WAN bandwidth is still expensive and a recurring cost, data de-duplication can significantly improve the economics of implementing remote backup and disaster recovery.

Figure Three: Remote Replication and Data De-Duplication
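To make the recurring-cost point concrete, here is a rough sketch assuming a 500 GB nightly backup, an 8-hour replication window and a flat $100 per Mbps per month. All three figures are hypothetical, as are the helper names. It also assumes the nightly data shrinks by the full de-duplication ratio before transmission; in practice the reduction depends on how much of each night’s data is new.

```python
def required_mbps(backup_gb: float, window_hours: float, dedup_ratio: float = 1.0) -> float:
    """Sustained WAN bandwidth (megabits/s) needed to replicate one backup of
    backup_gb (decimal GB) within window_hours, assuming only 1/dedup_ratio
    of the data crosses the link and ignoring protocol overhead."""
    bits_sent = backup_gb * 1e9 * 8 / dedup_ratio
    return bits_sent / (window_hours * 3600) / 1e6


def annual_wan_cost(mbps: float, cost_per_mbps_month: float) -> float:
    """Recurring yearly cost of provisioning mbps at a flat monthly rate."""
    return mbps * cost_per_mbps_month * 12


# Hypothetical: 500 GB nightly backup, 8-hour window, $100 per Mbps per month.
for ratio in (1, 20):
    mbps = required_mbps(500, 8, dedup_ratio=ratio)
    print(f"{ratio}:1 -> {mbps:6.1f} Mbps, ${annual_wan_cost(mbps, 100):,.0f} per year")
```

Under these assumptions, the link drops from roughly 139 Mbps to about 7 Mbps. Because the cost recurs every month, that difference compounds year after year.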

It is also important to consider facility costs, which include power and cooling as well as floor space. Since you are using fewer disks, you generate less heat and draw less power. Again, at a 20:1 capacity reduction ratio, the savings can be significant. Some data centers simply cannot draw any more power because they are at or near their maximum limits. These companies should certainly evaluate data de-duplication-enabled solutions.
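A rough way to size the power effect is to count drives and multiply by their draw. The sketch below assumes 200 TB of logical backup data, 0.5 TB drives at 12 W each, and a cooling factor of 2.0 (one watt of cooling per watt of load). All of these figures and the helper names are hypothetical, and RAID, spare and controller overhead are ignored.

```python
import math


def drives_needed(physical_tb: float, tb_per_drive: float) -> int:
    """Drives required to provide physical_tb (ignores RAID/spare overhead)."""
    # Round up: a partially filled drive still spins and draws full power.
    return math.ceil(physical_tb / tb_per_drive)


def annual_kwh(drive_count: int, watts_per_drive: float, cooling_factor: float = 1.0) -> float:
    """Energy per year in kWh. cooling_factor scales drive power to include
    cooling (1.0 = drives only, 2.0 = one watt of cooling per watt of load)."""
    hours_per_year = 24 * 365
    return drive_count * watts_per_drive * cooling_factor * hours_per_year / 1000


# Hypothetical: 200 TB logical, 0.5 TB drives at 12 W, cooling doubles the load.
for ratio in (1, 20):
    drives = drives_needed(200 / ratio, 0.5)
    print(f"{ratio}:1 -> {drives} drives, {annual_kwh(drives, 12, 2.0):,.0f} kWh/year")
```

With these inputs, 20:1 reduction takes the configuration from 400 drives to 20. Both power and the floor space those drives occupy shrink in proportion.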

Additionally, floor space is at a premium. Data de-duplication-enabled solutions can limit vertical (rack) growth by minimizing the shelf space needed to store backup data. As discussed previously, you can also eliminate some or all of your tape libraries by moving to disk, which frees up floor space (and reduces power and cooling costs).

Data de-duplication enables you to use less capacity to store backup data, and it also reduces the processing power, bandwidth and memory required per GB of protected data. This affects all of the factors above, which is what makes data de-duplication-enabled solutions easy to cost-justify.

Value of Increased Retention

ESG has found that the majority of end-users still use tape backup as their main method of disaster recovery. However, we’ve also found that end-users consider the process of recovering data from tape to be slow, complex and unreliable. These two realities are clearly at odds with one another.

Recovering a single file from tape can take several minutes, whereas recovering data from disk is nearly instantaneous. Multiply this by dozens, hundreds or even thousands of files, and the difference can grow to several hours, days or even weeks. Consider database tables that span multiple tapes and the process of trying to recover that information quickly. Consider also tape interleaving, which improves tape backup performance but hurts restore performance, because data from multiple servers is intermixed across the tapes. Additionally, recovery performance depends heavily on tape availability: whether the tape is in the library or in a box at some distant offsite location.
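The "multiply by thousands of files" arithmetic can be sketched as follows. It assumes files are restored one after another, each with a fixed per-file delay: 5 minutes from tape (locate, mount, position) versus 3 seconds from disk. Both delays and the helper name `total_restore_hours` are hypothetical.

```python
def total_restore_hours(file_count: int, minutes_per_file: float) -> float:
    """Hours to restore file_count files one after another when each restore
    carries a fixed per-file delay (e.g. locate, mount and position a tape)."""
    return file_count * minutes_per_file / 60


# Hypothetical per-file delays: 5 minutes from tape vs. 3 seconds (0.05 min) from disk.
for files in (10, 100, 1_000):
    tape = total_restore_hours(files, 5)
    disk = total_restore_hours(files, 0.05)
    print(f"{files:>5} files: tape {tape:6.1f} h, disk {disk:5.2f} h")
```

At 1,000 files, the tape figure reaches roughly 83 hours, about three and a half days, before accounting for interleaving or retrieving tapes from offsite.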

The fact that end-users are unsure whether they can actually recover 100% of their data from tape is another harsh reality. The very purpose of backing up your data is to be able to recover it when needed. Backing up data onto disk resolves these recovery and reliability issues. Data de-duplication increases the amount of backup data you can retain and extends the retention period. In effect, by using data de-duplication-enabled D2D backup solutions, you can eliminate the need to ever restore from tape again.
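The retention effect can be estimated the same way. This sketch assumes 50 TB of usable disk and 5 TB written per nightly backup (both hypothetical). As a simplification, it treats the de-duplication ratio as the average reduction across the whole retained set. The helper name `retention_days` is also hypothetical.

```python
import math


def retention_days(disk_tb: float, nightly_backup_tb: float, dedup_ratio: float = 1.0) -> int:
    """Whole days of nightly backups that fit on disk_tb of physical capacity.

    Simplification: dedup_ratio is treated as the average reduction across
    the whole retained set, so each night consumes nightly_backup_tb / dedup_ratio.
    """
    per_night_physical = nightly_backup_tb / dedup_ratio
    return math.floor(disk_tb / per_night_physical)


# Hypothetical: 50 TB of usable disk, 5 TB written per nightly backup.
print(retention_days(50, 5))      # 10 (no de-duplication)
print(retention_days(50, 5, 20))  # 200 (20:1 de-duplication)
```

On the same hardware, retention grows from about ten days to more than six months. That longer window is what lets recoveries come from disk instead of tape.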

That should be the objective of every IT organization: removing slow, error-prone, high-touch tape processes and replacing them with modern solutions that provide fast, reliable and automated protection.

In effect, data de-duplication-enabled D2D backup solutions eliminate the risks and inefficiencies of recovering from tape. Further, data de-duplication-enabled D2D backup meets your true data recovery requirements without the compromises you’ve come to accept with tape.

The cost impact of being able to recover data rapidly and reliably is harder to quantify than capital cost savings, but the implications range from inconvenience to complete data loss. Perhaps one of your employees had to wait a few hours to recover a lost file they were working on. If that file is unrecoverable, the cost goes up. What if that file contained valuable intellectual property that will be extremely difficult to recreate? What if specific litigation or an audit required information within that file? Perhaps the document contained important information that affected a major business transaction or valuable research. These are the considerations that must be weighed against continuing to use outdated forms of data protection, especially when best-in-class D2D backup solutions exist that address these issues without breaking your budget.

Value of Operational Efficiencies

Simply minimizing, and potentially removing, the manual tasks of managing tape has an immediate positive impact on productivity. This may show up as time saved on the day-to-day work of managing the tape rotation process, as well as on frantic scrambles to recover data in an emergency.

D2D backup solutions are often used as a complement to tape. Companies often reduce the frequency of backups to tape to once a week or even once a month, while daily backups are sent to disk.


In some cases, D2D backup solutions may replace tape systems, and ESG has found that a growing number of companies are actually considering this. Much of this will be contingent on the company’s or organization’s best practices and governance. In some cases, regulations rule out removing tape systems. However, for companies that are not encumbered by these issues, tape removal is very attractive. The question then arises: how do I protect my data from a major site disaster? If you are currently shipping tapes offsite as your main DR process, you may need to consider implementing remote replication.

Many D2D backup solutions support remote replication to another system at a remote site. As discussed previously, data de-duplication-enabled solutions can do this quickly and cost-effectively. Recovering data from disk is nearly instant compared with recovering from tape, and time to recovery is what really matters. Data recovery is typically an urgent issue, yet it could take hours, days or even weeks to fully recover data from tape. This is becoming increasingly unacceptable, and thanks to current technology developments, it is no longer necessary to tolerate it. Data de-duplication-enabled D2D backup solutions allow end-users to retain data for longer periods of time, reducing and potentially eliminating the need to ever recover data from tape again. The end result is faster and more reliable data recovery.

Data de-duplication-enabled solutions are easier to manage than D2D backup solutions that don’t provide capacity optimization. Since traditional D2D backup solutions require more capacity, managing those systems is inherently more complex: capacity utilization has to be monitored more often, backup data must be removed or new capacity added more frequently, and you will still need to rely heavily on tape for recoveries.

In many cases, operational costs outweigh capital costs. More importantly, there are always more projects that need IT personnel’s attention. By removing the mundane and time-consuming process of managing tape, your team can focus on more important work that helps the business.

Time to Protection

Time to Protection matters because it determines how quickly your data actually becomes protected. A key value of data de-duplication is that it is easy: end-users don’t have to perform Herculean tasks to get value out of data de-duplication-enabled solutions. Data de-duplication should be invisible to the backup and recovery process. If it isn’t, you need to re-evaluate that data de-duplication-enabled solution and consider another avenue.

One of the big advantages of data de-duplication-enabled solutions is the ability to replicate data over less bandwidth. This not only reduces cost but also lets you transfer and protect data much more quickly. If you had to send all of your backup data over the WAN, it could take several hours or even days. With data de-duplication-enabled solutions, the process should be several times faster. Thus, Time to Protection is shorter, and replicated data is safe sooner than with approaches that do not use data de-duplication.

It is important to recognize that this isn’t just a question of how quickly you can protect data: data de-duplication-enabled solutions can actually enable a level of data protection that isn’t otherwise practical. ESG spoke with an end-user who implemented remote backup from Boston to Los Angeles using a data de-duplication-enabled solution. He said that without data de-duplication, these long-distance remote backups would be too costly and take too long to perform.

Figure: Data De-Duplication ROI Analysis (Disk and Tape Cost Reduction; Reduced Bandwidth Requirements; Lower Power and Cooling Consumption; Smaller Floor Space Footprint; Reliable Data Recoveries; Fast Recovery of Data; Lower Operational Cost – Less Media Handling; Time to Protection; Lower Total Cost of Recovery)
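To put rough numbers on the difference in Time to Protection, the sketch below estimates how long a replication run takes over a given link. It assumes 1 TB (1,000 GB) of backup data, a 45 Mbps link (roughly a T3) and 80% achievable link efficiency. These inputs and the helper name `transfer_hours` are hypothetical, and the model assumes only 1/R of the data crosses the wire.

```python
def transfer_hours(data_gb: float, link_mbps: float, dedup_ratio: float = 1.0,
                   efficiency: float = 0.8) -> float:
    """Hours to replicate data_gb (decimal GB) over a WAN link of link_mbps.

    Only 1/dedup_ratio of the data is assumed to cross the wire, and
    efficiency is the assumed fraction of nominal bandwidth actually achieved.
    """
    bits_sent = data_gb * 1e9 * 8 / dedup_ratio
    return bits_sent / (link_mbps * 1e6 * efficiency) / 3600


# Hypothetical: 1 TB (1,000 GB) of backup data over a 45 Mbps link.
print(f"No de-duplication: {transfer_hours(1_000, 45):.1f} hours")
print(f"20:1 de-duplication: {transfer_hours(1_000, 45, dedup_ratio=20):.1f} hours")
```

Under these assumptions, a transfer of about 62 hours (two and a half days) drops to roughly 3 hours. That is the kind of gap that makes a Boston-to-Los Angeles remote backup practical.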


Total Cost of Recovery

When you add all of these elements together, choosing data de-duplication-enabled D2D solutions over tape or traditional D2D backup solutions for recovery is about as close to a “no-brainer” as you can get in the data center. In summary, these are the cost-saving elements of data de-duplication-enabled solutions:

• Reduced disk capacity for data protection
• Potentially fewer D2D backup storage systems over time
• Fewer tapes or potential elimination of tapes
• Fewer tape libraries or potential elimination of tape libraries
• Reduced power and cooling costs
• More available floor space based on fewer D2D backup and tape systems
• Reduced WAN bandwidth costs
• More reliable data recoveries
• Faster data recoveries
• Fewer person-hours spent on tape and disk administration
• All of the above for each site

The Total Cost of Recovery (TCR) for data de-duplication-enabled D2D backup solutions is clearly far less than for tape or for D2D backup solutions that do not support capacity optimization.
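One way to organize a TCR comparison is to tally the list above per site and then sum across sites. This is a minimal sketch: `SiteCosts`, its cost buckets and `total_cost_of_recovery` are hypothetical names, and the dollar values in the example are placeholders rather than real data. Recovery speed and reliability are left out because they do not reduce cleanly to one dollar figure.

```python
from dataclasses import dataclass, fields
from typing import Sequence


@dataclass
class SiteCosts:
    """Hypothetical annual cost buckets for one site, in dollars."""
    disk: float = 0.0
    tape_media_and_libraries: float = 0.0
    power_and_cooling: float = 0.0
    floor_space: float = 0.0
    wan_bandwidth: float = 0.0
    admin_labor: float = 0.0

    def total(self) -> float:
        # Sum every cost field so that new buckets are picked up automatically.
        return sum(getattr(self, f.name) for f in fields(self))


def total_cost_of_recovery(sites: Sequence[SiteCosts]) -> float:
    """Sum each site's total ("all of the above for each site")."""
    return sum(site.total() for site in sites)


# Placeholder figures for a single site; substitute your own.
tape_centric = SiteCosts(disk=20_000, tape_media_and_libraries=60_000,
                         power_and_cooling=15_000, admin_labor=80_000)
dedup_d2d = SiteCosts(disk=30_000, tape_media_and_libraries=5_000,
                      power_and_cooling=4_000, wan_bandwidth=10_000, admin_labor=20_000)
print(total_cost_of_recovery([tape_centric]), total_cost_of_recovery([dedup_d2d]))
```

Keeping the buckets per site makes it easy to add a remote replication target as a second `SiteCosts` entry.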

Business Questions to Ask:

1. How many customers do you have using your product in production environments today?

The number of customers shows whether the market is adopting the product. From a product value perspective, quantity is quality. If a vendor has only 10 customers after five years in the market, that is a red flag. If it has hundreds or thousands of customers, then you have market validation. Again, newer solutions will have fewer implementations, which is why it’s important to get customer references.

2. Can you provide us with a cost savings analysis from companies similar to ours? Please include capital, operational and facilities cost savings.

Vendors often talk about value but hardly ever show you real numbers. Data de-duplication is easy to quantify, so ask vendors to provide real data. This will help you better understand the cost savings you might obtain by using their products. Having more than one data point is important as well, since there are multiple variables to consider.

3. Can you provide us with some existing customers we can talk to about working with you and your products?

Talking to other users is always valuable. They can give you insight into what to expect when you deploy a vendor’s solution. Of course, references will be happy customers, but they will still share their real-life perspective with you.

4. How disruptive will your product be to our environment?

Implementing a new solution that provides real value to your company is always desirable, but at what cost? You need to understand whether this new, innovative solution will be overly disruptive to your environment.



5. How many hours a week does it take to support your solution?


If the solution is complex and requires a great deal of manual management, you need to consider whether you have the resources to support it. On the other hand, the solution may require little management, but it’s important to find out. Ask for figures based on what the vendor’s current customers are experiencing. Also ask whether training is required or recommended. If it is, that is a red flag: if the product is so easy, why is training needed?

6. What else does the vendor have to offer?

Vendor selection should play a role in the decision-making process. It is important to understand the vendor’s business success and long-term viability, its support capability, how well it communicates with you, and what other services or products it could offer you today and over time. You should also consider positive existing relationships with the vendor and/or system integrator.



ESG’s View


Disasters will happen. They can range from a lost file, to a flooded data center, to an entire building being destroyed. Some of these incidents are common: file loss or data corruption, systems and infrastructure going down and becoming unavailable, disk drives failing, and users making mistakes (remember, humans invented human error). Then there are somewhat common incidents, including facility disasters involving flooding or fire. Though less likely, major natural disasters such as earthquakes, tornadoes and hurricanes must also be contended with. And there have been a few recent incidents of large geographic blackouts that took hours or even days to correct.

The fact that most companies and organizations still use tape as their primary defense against these events is troubling. There was once an economic rationale for using tape, but data de-duplication-enabled solutions invalidate it. Some end-users are mandated to use tape for governance and regulatory reasons, but they can use data de-duplication-enabled solutions to augment their environments. Companies not so encumbered should certainly consider data de-duplication to complement and even replace their tape systems.

Data de-duplication is a powerful form of virtualization: the ability to logically view and manage physical assets for greater utilization and automation of otherwise manual tasks. Data de-duplication achieves both of these goals by significantly reducing the amount of capacity required to store backup data (5:1, 10:1, 20:1 and beyond). Additionally, data de-duplication reduces or even eliminates the need to manage tapes. Dealing with tape media management is archaic in this digital age. It is analogous to someone stubbornly washing the dishes by hand even though there is a dishwasher right next to the sink.

Tape will be around for some time to come. Governance and regulatory mandates still ensure its survival. Additionally, incumbency often trumps innovation, and the de facto change management policy is often to change nothing. A great deal of education also still needs to occur: not enough people know about data de-duplication, and others are skeptical of its abilities.

Data de-duplication is very real and provides excellent value. ESG believes that it will become prevalent over time within D2D backup and across all storage and application tiers. However, it is important to evaluate not only data de-duplication capabilities but also the entire product, customer references, and market and company success. ESG encourages you to ask the questions outlined in this report in order to leverage the benefits that can certainly be derived from data de-duplication. Data de-duplication changes the data protection landscape and is one of the few categories that offers such a clear “no-brainer” value proposition.
