Data De-duplication and Disk-to-Disk Backup Systems
Technical and Business Considerations
By Tony Asaro and Heidi Biggar
July 2007
Copyright ©2007. The Enterprise Strategy Group, Inc. All Rights Reserved.
ESG Report
Data De-Duplication and D2D Backup Systems
Table of Contents
Introduction
Data De-Duplication Defined
The Business Value of Data De-Duplication
ESG’s View
All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources the
Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are
subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this
publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express
consent of the Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable,
criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482-0188.
Enterprise Strategy Group Page 1
Introduction
Disk-to-disk (D2D) backup, combined with data de-duplication, is an emerging category within the data
protection ecosystem that ESG believes has the potential to change the entire landscape. D2D backup
solutions with data de-duplication minimize the disk and/or bandwidth capacity required to store and
move data used for protection purposes.
Data de-duplication solutions optimize physical storage and bandwidth by using less of each to protect your
data. Why use less? Perhaps the first and most obvious answer is to reduce cost. With lower capacity
requirements, fewer disks are needed to store the same amount of effective data, and less bandwidth is
required to move and copy that data across the WAN. Beyond these cost reductions,
there is perhaps an even more important reason to employ data de-duplication. By reducing the amount of
storage and bandwidth required to protect data locally and remotely, organizations can significantly
improve their levels of data protection and their ability to recover data quickly, reliably and cost effectively.
Reducing the cost of the storage required for backup data in turn enables greater data protection and
recoverability.
For years, there has been a considerable disparity between the prices of tape and disk-based storage
systems. As such, it was an economic “no-brainer” to store backups on tape. In fact, the cost delta
between tape and disk was so dramatic that despite the inherent weaknesses of tape (which include
complexity, unreliability and slow performance), it is still the preferred medium for storing backup data today.
Figure One: Disk-to-Disk Adoption (US-based Adoption Rates)
Source: ESG Research September 2006
A major market shift occurred when storage system vendors began supporting low cost, high-density ATA
drives and the cost delta between disk and tape started to shrink significantly. Although the capital cost
savings still favored tape, the gap narrowed to a point where the operational impact of tape, including cost
of management, unreliability issues and performance, finally moved the value dial from tape to disk for
many end-users. The market responded, and the disk-to-disk (D2D) backup market was born. At first,
end-users performed backups to lower cost drives within their existing primary storage systems, and this is
still a popular process. Additionally, the development of new purpose-built solutions, such as D2D appliances
and virtual tape libraries (VTL), created an entirely new market category. As shown in Figure One, a recent
survey conducted by ESG found that 64% of all respondents either have or are intending to implement a
purpose-built D2D backup solution.
This is a strong validation that these solutions are either replacing or complementing tape libraries. The
reasons that end-users are embracing D2D backup solutions include improved backup performance,
elimination of tape media management issues, scalability, ease of management and cost. Another (and
possibly the most important) advancement in D2D backup is data de-duplication. Our research found that
33% of all respondents consider data de-duplication an important capability in their D2D backup solution.
ESG believes that this is an especially large percentage, given that data de-duplication is an
emerging technology still requiring a great deal of education and awareness.
Data de-duplication’s value goes above and beyond that of high-density SATA drives within disk-based
storage systems. End-users employing D2D backup solutions with data de-duplication are experiencing
backup data capacity reductions of 10, 20 and 30 times, and possibly even more [1].
Consider the economic value of this level of reduction: it not only eliminates any delta
between the capital costs of tape versus disk, but arguably swings the pendulum to the other
side in disk’s favor. Add to this the operational efficiency, rapid and reliable recoveries and the
elimination or reduction in tape management enabled by D2D backup solutions and you’ve got a
compelling and evident value proposition.
Data de-duplication is a game-changing technology. It enables D2D backup by lowering the overall cost of
these types of solutions. De-duplication reduces the amount of redundant data that is backed up, which
results in less capacity required to store that data. Additionally, companies can retain more backup data on
disk for longer periods of time, which reduces and potentially eliminates the need to recover data from
tape. Where replication is supported, data can be moved more efficiently and cost-effectively between
data sites for disaster recovery. Data de-duplication offers landscape-changing value that is easy to
quantify, improves reliability, simplifies management and provides rapid recovery of data.
[1] Data de-duplication ratios will vary based on the backup data (amount of redundancy and data change rates), the backup policy (frequency
of incremental and full backups) and the data de-duplication technology (size of data files/chunks/segments used).
Data De-Duplication Defined
Though the technology behind it can be quite sophisticated, the concept of data de-duplication is simple.
Data de-duplication is the process of examining data to identify any redundancy. In the context of backup
data, we can make a strong supposition that there is a great deal of duplicate data. The same data keeps
getting backed up over and over again, consuming more storage space and impacting cost, thereby
creating a chain of inefficiency.
The following example, though simple, illustrates the potential power of de-duplication:
Let’s say that a 2 MB image has been embedded in a Word document and e-mailed to dozens of people.
Ten of the people who receive that document take that image and embed it in other documents. In fact, the
image is proliferated throughout the organization to the point where it has been embedded in 200
other documents. This consumes 400 MB of additional capacity. With data de-duplication, only one
copy of the image is stored, saving the 400 MB that would otherwise be consumed.
Now consider a 400 MB file that has been sent to multiple users. Ten weekly full backups of that
400 MB file would consume 4,000 MB (4 GB) of storage. Reducing this to just the one
unique copy is significant, since otherwise 4 GB would be required to back up that same file multiple times.
Another example of data de-duplication’s value, this time at the file level, involves a PowerPoint presentation
attached to an e-mail. If the e-mail is sent to multiple recipients and then forwarded to yet another set of
recipients, data de-duplication technology can be used to store the presentation only once. Next, consider what
happens when one of the e-mail recipients modifies a slide in the presentation and again forwards it to a group
of colleagues. Advanced data de-duplication algorithms work at the sub-file level and can be used to store only
the data associated with the changed slide.
The latter two examples include block or sub-block data de-duplication. This method works much like file
level de-duplication, but identifies common data in “chunks” or “blocks” that are smaller than a file. This
method is typically implemented in purpose-built solutions that are dedicated to finding and eliminating duplicate
data within a file.
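The sub-file approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any vendor’s implementation: it assumes fixed-size 4 KB chunks and SHA-256 fingerprints, and the names `dedup_store` and `restore` are hypothetical. Real products often use variable-size chunks and far more elaborate indexes.

```python
import hashlib

def dedup_store(data, chunk_size=4096, store=None):
    """Split data into fixed-size chunks and keep one copy of each unique chunk.

    Returns (recipe, store): the recipe is the ordered list of fingerprints
    needed to rebuild the data; the store maps fingerprint -> chunk bytes.
    """
    store = {} if store is None else store
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        fingerprint = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fingerprint, chunk)  # written only if not seen before
        recipe.append(fingerprint)
    return recipe, store

def restore(recipe, store):
    """Rebuild the original bytes from a recipe and the shared chunk store."""
    return b"".join(store[fingerprint] for fingerprint in recipe)

# A ten-"slide" presentation, then a copy with one slide changed.
slides = [bytes([i]) * 4096 for i in range(10)]
original = b"".join(slides)
modified = b"".join(slides[:3] + [b"\xff" * 4096] + slides[4:])

recipe_a, store = dedup_store(original)
recipe_b, store = dedup_store(modified, store=store)
print(len(store))  # unique chunks held for both versions
```

Storing the second version adds only the one changed chunk to the store, while both versions remain fully restorable from their recipes.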
What does all this mean in real-life terms? Through hands-on testing, ESG has found that data
de-duplication technologies can provide 10 times, 20 times, 30 times, and even greater reduction in capacity
needed for backup [2][3]. This means that companies can store 10 TB to 30 TB of backup data on 1 TB of
physical disk capacity, which has potentially tremendous economic benefits. For one thing, it potentially
eliminates any delta between the capital costs of tape versus disk, making disk storage a more viable
option. Factor in the operational efficiencies of not having to move, store and manage redundant data
thanks to de-duplication and the elimination or reduction of tape management provided by D2D backup
solutions, and users can extract real value from de-duplicated D2D backup.
[2] ESG has seen data de-duplication ratios range from 4:1 to 89:1, so mileage will vary.
[3] See Appendix for a list of ESG related reports.
[Figure Two: Data De-Duplication. Diagram labels: Redundant Data, Data De-Duplication Engine, Unique Data.]
Data de-duplication ratios will vary based on the types of data involved and the frequency of full backups
and retention. As a rule of thumb, ESG believes a 20:1 ratio, when combined with data compression, to be
broadly achievable. Though ESG has seen data de-duplication ratios of 89:1 and there is potential for
even greater reductions, do not feel disappointed if you do not achieve 20:1 or greater, since reductions of
5:1 or more are still extremely valuable.
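A quick calculation shows why even a modest ratio is worthwhile. The sketch below is plain arithmetic on the ratios quoted in this report; the function names `physical_tb` and `percent_saved` are illustrative, and the 30 TB logical backup size is an assumed example value.

```python
def physical_tb(logical_tb, ratio):
    """Physical capacity needed to hold logical_tb of backups at ratio:1."""
    return logical_tb / ratio

def percent_saved(ratio):
    """Share of capacity avoided at a given de-duplication ratio."""
    return 100.0 * (1 - 1 / ratio)

logical = 30  # TB of backup data before de-duplication (assumed example)
for ratio in (5, 10, 20, 30, 89):
    print(f"{ratio}:1 -> {physical_tb(logical, ratio):.2f} TB on disk, "
          f"{percent_saved(ratio):.1f}% saved")
```

Most of the saving arrives early: moving from no de-duplication to 5:1 removes 80% of the capacity, while moving from 20:1 to 89:1 changes the saving by only a few percentage points.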
Technology Considerations
There is a great deal of buzz in the market around data de-duplication and it is only going to get louder. As
a result, there will be lots of confusion and convoluted messaging slung around regarding this topic. Keep
in mind that in the high tech industry, we often use the same term to mean different things and different
terms to mean the same thing. In the interest of clarity, ESG has provided a set of questions aimed at
providing some guidance to end-users interested in evaluating and implementing D2D backup solutions
that support data de-duplication.
Technology Questions to Ask:
1. What type of data de-duplication ratio can I expect?
A number of the vendors that support de-duplication provide high-level numbers. Dig deeper. The
actual amount of data reduction an organization can expect to see can vary significantly depending on
the type of data being backed up, retention periods, the frequency of full backups and the data
de-duplication technology. Provide potential vendors with information about your environment, backup
process, applications, retention SLAs and data types to better determine what to expect.
2. How will data de-duplication affect my backup and restore performance?
Data de-duplication is a resource-intensive process. It needs to determine whether some new small
sequence has been stored before, often across hundreds of prior terabytes of data. A simple index of
this information is too big to fit in RAM, unless it is a very small deployment. So it needs to seek on
disk, and disk seeks are notoriously slow (and not getting better). The following questions allow you to
dig deeper with regard to performance:
What is the single stream backup and restore throughput? This is how fast a given file/DB can be
backed up, restored or copied to tape for archiving. The numbers may be different: read speed and
write speed may have separate issues. Because of backup windows for critical data, backup
throughput is what most people ask about, though restore time is more significant for most SLAs.
LTO4 tapes need to receive data at >60 MB/sec. or they will operate well below their rated speed for
streaming, so restore stream speed matters significantly if tape will stay in your plans.
What is the aggregate backup/restore throughput per system? With many streams, how fast can a
given controller perform? This will help gauge the number of controllers/systems needed for your
deployment. It is mostly a measure of system management (number of systems) and cost; single
stream speed is more important for getting the job done.
Is the 30th backup different from the 1st? If you back up images and delete them over time, does the
performance of the system change? Since de-duplication uses so many references around the store
for new documents, do the recovery characteristics for a recent backup (what you’ll mostly be
recovering) a month or two into deployment change vs. the first pilot? Talk to existing users of the
vendor to find what others have seen.
In many cases, performance in your deployment will depend on many factors, including the backup
software and the systems and networks supporting it. Understand your current performance and
bottlenecks before challenging the vendor of a particular component to fix it all.
3. Is the data de-duplication in-line or post process?
As with any new technology, there is a lot of confusion in the industry about the differences between
in-line and post-process approaches, as well as abuse/misuse of the terms being used to differentiate
the two. In ESG’s view, it comes down to one simple “yes or no” question: When the backup data is
written to disk, is the data de-duplicated or not? If the answer is yes, then it has been de-duplicated
in-line. If at that point the answer is no, then the de-duplication is done post-process.
What is the significance of one approach versus the other? The two areas that you need to research
are the impact to performance for the in-line approach and capacity issues for the post-process
approach. Understand the trade-offs for each approach based on the vendor’s specific solutions.
Since in-line de-duplication is an intelligent process performed during the backup process, there can
be some performance degradation during data ingest. However, performance impact depends on a
number of variables, including the de-duplication technology itself, the size of the backup volume, the
granularity of the de-duplication process, the aggregate throughput of the architecture and the
scalability of the solution.
Post-process data de-duplication does require more disk capacity to be allocated upfront. But the size
of the “capacity reserve” needed depends on a number of variables, including the amount of backup
data and how long the de-duplication technology “holds” onto the capacity before releasing it.
Some post-process de-duplication technologies wait for the entire backup job to be completed, while
others start de-duplication as backup data is stored.
Solutions that wait for the backup process to complete before de-duplication of the data have a greater
initial “capacity overhead” than solutions that start the de-duplication process earlier. These solutions
have to allocate enough capacity to store the entire backup volume. The capacity is released when the
backup job is complete and re-allocated before the next backup job begins. Beginning the
de-duplication process immediately after data is written to the backup target may shorten the length of the
de-duplication process, but will also affect the data ingest rate.
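The capacity trade-off between the two approaches can be made concrete with a simplified model. The sketch below is an assumption-laden illustration, not a sizing tool: it assumes the post-process system waits for the whole backup job to land before reducing it, that the in-line system never stores redundant data, and that one ratio applies to the whole job. The function name `peak_capacity_tb` and all figures are hypothetical.

```python
def peak_capacity_tb(stored_tb, backup_tb, ratio, mode):
    """Peak disk capacity needed while one backup job is processed.

    stored_tb: de-duplicated data already on the system
    backup_tb: logical size of the incoming backup job
    ratio:     expected de-duplication ratio for that job (ratio:1)
    mode:      "in-line" or "post-process"
    """
    if mode == "in-line":
        # Data is reduced before it is written, so only unique data lands.
        return stored_tb + backup_tb / ratio
    if mode == "post-process":
        # The full job is written first (the "capacity reserve"),
        # then reduced afterwards.
        return stored_tb + backup_tb
    raise ValueError(f"unknown mode: {mode}")

# Example: 2 TB already stored, a 10 TB backup job, 20:1 expected reduction.
for mode in ("in-line", "post-process"):
    print(mode, peak_capacity_tb(2, 10, 20, mode), "TB at peak")
```

A post-process design that starts de-duplicating while data is still arriving would sit somewhere between these two figures, which is why the size of the reserve is worth asking each vendor about.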
4. Does the D2D backup solution support de-duplicated remote replication?
End-users may wish to implement remote replication for disaster recovery, remote backup and remote
vaulting. The combination of data de-duplication and remote replication offers a great deal of value by
reducing bandwidth as dramatically as it reduces storage capacity. In many cases, data de-duplication
can enable disaster recovery and remote backup in terms of meeting backup windows and budget
constraints.
Some data de-duplication solutions support multi-site remote replication. If this is a requirement, find
out if the vendors you are considering support multi-site. The next question to ask is whether they
support data de-duplication across the entire environment. For example, if there is duplicate data in
five different sites, will it store only one copy at the target site? And will it make sure that no duplicate
data will be sent over the WAN? Supporting multi-site data de-duplication raises the level of capacity
reduction efficiency.
5. Will de-duplication processes affect my disaster recovery windows?
While much of the industry’s attention around data de-duplication is focused on reduction ratios and
performance/capacity trade-offs, it is also important to consider the effect data de-duplication may have
on disaster recovery windows. This refers to the time it takes from the start of the backup process to the
point where DR copies are made and moved off-site. The length of this process, from start to finish,
depends on a number of variables, including the data de-duplication approach, the speed of the
de-duplication architecture, the DR process (is the data being written/exported to tape or de-duplicated
and then replicated over a WAN to a remote facility), etc. It is important to consider the lag time from
initiation of backup to when the image is complete at the DR site. If this timeframe is greater than 24
hours, that image may miss too much new data to meet the DR objectives of your deployment. Make
sure you are meeting the Recovery Point Objectives (RPO) you have in mind for DR.
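The lag described above can be estimated with simple arithmetic before talking to a vendor. The following sketch assumes the three stages (backup, de-duplication, WAN replication) run one after another with no overlap, uses decimal units (1 GB = 8,000 megabits), and ignores protocol overhead; the function names and all input figures are hypothetical.

```python
def dr_lag_hours(backup_hours, dedup_hours, replicate_gb, wan_mbps):
    """Hours from the start of backup until the image is complete at the DR site."""
    replicate_seconds = replicate_gb * 8000 / wan_mbps  # GB -> megabits -> seconds
    return backup_hours + dedup_hours + replicate_seconds / 3600

def meets_rpo(lag_hours, rpo_hours):
    """True if the DR copy is complete within the recovery point objective."""
    return lag_hours <= rpo_hours

# Example: 6 h backup, 4 h post-process de-duplication,
# 500 GB of unique data replicated over a 100 Mbps link.
lag = dr_lag_hours(6, 4, 500, 100)
print(f"lag = {lag:.1f} h, meets 24 h RPO: {meets_rpo(lag, 24)}")
```

An in-line system would set the de-duplication stage to zero in this model, at the possible cost of a longer backup stage, so it is worth running the numbers both ways.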
6. Is the D2D solution easy to implement?
One of the compelling things about data de-duplication is that it is easy. Or at least it should be. Users
shouldn’t have to perform cumbersome, complex and time consuming tasks to get up and running in
order to derive value from their solutions. Data de-duplication should be invisible to the backup and
recovery process.
Purpose-built D2D backup appliances also offer a level of transparency, acting as a total solution that
doesn’t require disk storage management functions. VTL solutions are typically associated with ease
of implementation, since they emulate tape libraries. Certainly, they provide a great deal of value
compared to just using a general purpose storage system, which often is complex to manage and
requires storage experts. The backup administrator doesn’t want to manage RAID groups, LUNs and
volumes.
7. What is the system impact of performance?
More controllers? More disk? What ratio of each? This matters for cost and system management
reasons.
8. How does the D2D backup solution protect itself from data loss and corruption?
It is important to understand how “bullet proof” the D2D backup solution is. Find out what technologies
it has to ensure data integrity and protection against system failures. While this is always important, it
is an even bigger consideration in data protection with de-duplication. With D2D backup solutions that
support data de-duplication, there may be 1,000 backup images that rely on one copy of source data.
Therefore, it becomes even more important that this source data be kept accessible and with
a high level of data integrity.
9. How scalable is the D2D backup solution?
You need to size your environment in terms of capacity and performance for today, with considerations
of the future.
10. Does the solution provide flexible application support?
How many backup applications are supported? Can non-backup applications work, too? More
flexibility means more consolidation is possible using less physical infrastructure.
11. What are the other features and capabilities of your solution?
Data de-duplication is a valuable technology, but it is not the only consideration. You must also
evaluate the other important features and capabilities of the solution to see if it meets your needs.
The Business Value of Data De-Duplication
One of the greatest qualities of data de-duplication is that its value is easy to quantify. If you can reduce
the amount of capacity to store backup data by 10:1, 20:1 or greater, you can get your calculator out and
put a dollar amount to the cost savings. While the cost savings may be significant enough for many to
move forward, there are other cost benefits that make the case for de-duplication even more compelling.
Hard Dollars
The hard dollar costs are easy to determine. The first metric is the difference in capital cost between a D2D
backup solution with and without data de-duplication. The D2D backup solution without de-duplication will
require more actual capacity to store backups: in some cases, 20 or more times what the data
de-duplication-enabled solution needs.
There are other capital cost savings as well. D2D backup solutions can reduce the amount of tape
infrastructure you acquire. Some end-users have totally eliminated tape, while others have reduced the
number of tape libraries they maintain.
If you want <strong>to</strong> perform remote replication between your primary <strong>and</strong> remote sites, then data de-<strong>duplication</strong><br />
can significantly reduce your WAN b<strong>and</strong>width costs. You can effectively replicate data over long distances<br />
with far less b<strong>and</strong>width. Since WAN b<strong>and</strong>width is still expensive <strong>and</strong> a recurring cost, data de-<strong>duplication</strong><br />
can significantly improve the economics of implementing remote backup and disaster recovery.<br />
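As a rough illustration of the bandwidth effect, the sketch below estimates the WAN link speed needed to replicate a backup set within a fixed window. The data size, window length and ratios are hypothetical, and the model is deliberately simplified: it assumes the de-duplication ratio applies directly to the bytes sent over the wire and ignores protocol overhead, latency and compression.<br />

```python
def required_mbps(data_gb: float, window_hours: float, dedup_ratio: float = 1.0) -> float:
    """Link speed (megabits per second) needed to move data_gb within window_hours.

    Uses decimal units: 1 GB = 8e9 bits, 1 Mbps = 1e6 bits per second.
    """
    if window_hours <= 0 or dedup_ratio <= 0:
        raise ValueError("window_hours and dedup_ratio must be positive")
    bits_to_send = data_gb * 8e9 / dedup_ratio
    return bits_to_send / (window_hours * 3600) / 1e6


if __name__ == "__main__":
    # Hypothetical inputs: 1,000 GB of backup data, 8-hour replication window.
    for ratio in (1, 10, 20):
        print(f"{ratio}:1 -> {required_mbps(1000, 8, ratio):.1f} Mbps")
```

Under these assumptions, replicating 1,000 GB in 8 hours needs roughly 278 Mbps without de-duplication and roughly 14 Mbps at 20:1, which is the kind of difference that changes what class of WAN link has to be leased.<br />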
Figure Three: Remote Replication <strong>and</strong> <strong>Data</strong> <strong>De</strong>-Duplication<br />
It is also important <strong>to</strong> consider facility costs, which include power <strong>and</strong> cooling as well as floor space. Since<br />
you are using fewer disks, you are creating less heat <strong>and</strong> drawing less power. Again, at a 20:1 capacity<br />
reduction ratio, this can be a significant savings. In some cases, there are data centers that just can’t use<br />
up any more power—they are at or near their maximum limits. These companies should certainly evaluate<br />
data de-duplication-enabled solutions.<br />
Additionally, floor space is at a premium. <strong>Data</strong> de-<strong>duplication</strong>-enabled solutions can reduce vertical growth<br />
by minimizing the amount of shelf space needed <strong>to</strong> s<strong>to</strong>re backup data. As discussed previously, you can<br />
eliminate some or all of your tape libraries by moving <strong>to</strong> disk, which will free up floor space (<strong>and</strong> reduce<br />
power <strong>and</strong> cooling costs).<br />
<strong>Data</strong> de-<strong>duplication</strong> enables you <strong>to</strong> use less capacity <strong>to</strong> s<strong>to</strong>re backup data, but it also reduces the amount<br />
of processing power, bandwidth and memory needed per GB of data protected. This affects all of the aforementioned factors that<br />
make data de-duplication-enabled solutions easy to cost justify.<br />
Value of Increased Retention<br />
ESG has found that the majority of end-users still use tape backup as their main method of disaster<br />
recovery. However, we’ve also found that end-users consider the process of recovering data from tape <strong>to</strong><br />
be slow, complex <strong>and</strong> unreliable. These two realities are clearly at odds with one another.<br />
Recovering a single file from tape can take several minutes, whereas recovering it from disk is nearly<br />
instantaneous. Multiply this by dozens, hundreds or even thousands of files and the performance<br />
difference can be several hours, days or even weeks. Consider database tables that span multiple tapes<br />
<strong>and</strong> the process of trying <strong>to</strong> recover this information quickly. Consider the process of tape interleaving,<br />
which improves tape backup performance, but impacts res<strong>to</strong>re performance because server data is spread<br />
r<strong>and</strong>omly across the tapes. Additionally, recovery performance is greatly impacted by tape availability—<br />
whether it is within the library or offsite in a box somewhere far away.<br />
The fact that end-users are unsure whether they can actually recover 100% of their data from tape is<br />
another harsh reality. The very purpose of backing up your data is so you can recover if needed. Backing<br />
up data on<strong>to</strong> disk resolves recovery <strong>and</strong> reliability issues. <strong>Data</strong> de-<strong>duplication</strong> increases the amount of<br />
backup data that you can retain and extends the retention period. In effect, by using data de-duplication-enabled<br />
D2D backup solutions, you can eliminate the need for ever having to restore from tape again.<br />
That should be the objective of every IT organization: removing the slow, error-prone, high-touch tape<br />
processes and replacing them with modern solutions that provide you with fast, reliable and automated<br />
protection.<br />
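The link between de-duplication ratio and retention can be illustrated with a back-of-the-envelope calculation. This is a simplified sketch with hypothetical numbers: it treats every backup as the same logical size and applies one aggregate ratio across all retained backups, whereas real ratios typically improve as more similar backups accumulate.<br />

```python
import math


def retained_backups(physical_tb: float, backup_tb: float, dedup_ratio: float = 1.0) -> int:
    """Number of equally sized backups that fit on disk at an aggregate ratio."""
    if backup_tb <= 0 or dedup_ratio <= 0:
        raise ValueError("backup_tb and dedup_ratio must be positive")
    effective_capacity_tb = physical_tb * dedup_ratio
    return math.floor(effective_capacity_tb / backup_tb)


if __name__ == "__main__":
    # Hypothetical inputs: 50 TB of disk, 10 TB per backup.
    print(retained_backups(50, 10))        # without de-duplication
    print(retained_backups(50, 10, 20.0))  # with an assumed 20:1 ratio
```

With these assumed inputs, the same 50 TB of disk holds 5 backups without de-duplication and 100 at 20:1, which is why longer on-disk retention can reduce the need to go back to tape.<br />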
In effect, data de-<strong>duplication</strong>-enabled D2D backup solutions eliminate the risks <strong>and</strong> inefficiencies of<br />
recovering from tape. Further, data de-<strong>duplication</strong>-enabled D2D backup provides a solution that meets the<br />
true needs of your data recovery requirements without the compromises you’ve come <strong>to</strong> accept with tape.<br />
The cost impact of being able to rapidly and reliably recover data is harder to quantify than capital cost<br />
savings, but the implications range from inconvenience <strong>to</strong> complete data loss. Perhaps one of your<br />
employees had <strong>to</strong> wait a few hours <strong>to</strong> recover a lost file they were working on. If that file is unrecoverable,<br />
the cost goes up. In addition, what if that file contained valuable intellectual property that will be extremely<br />
difficult <strong>to</strong> recreate? What if there was specific litigation or an audit that required information within that<br />
file? Perhaps there was important information contained within that document that impacted a major<br />
business transaction or valuable research. These are the considerations that must be weighed against<br />
using outdated <strong>and</strong> archaic forms of data protection, especially when there are best-in-class D2D backup<br />
solutions in existence that address these issues without breaking your budget.<br />
Value of Operational Efficiencies<br />
If you just minimize and potentially remove the manual tasks of managing tape, there is an immediate<br />
positive impact on productivity. This may show up as time saved that was previously dedicated to<br />
the day-to-day issues of managing the tape rotation process, as well as to any frantic scrambles to recover<br />
data in an emergency.<br />
D2D backup solutions are often used as a complement <strong>to</strong> tape. Companies often reduce the<br />
number/frequency of backups they perform <strong>to</strong> tape <strong>to</strong> once a week or even once a month, while daily<br />
backups are sent <strong>to</strong> disk.<br />
In some cases, D2D backup solutions may replace tape systems. ESG has found that a growing number<br />
of companies are actually considering this. Much of this will be contingent on best practices <strong>and</strong><br />
governance of the company or organization. In some cases, removing tape systems is not an option based<br />
on regulations. However, for those companies that are not encumbered by these issues, tape removal is<br />
very attractive. The question arises—how do I protect my data from a major site disaster? Right now, if you<br />
are shipping tapes offsite as your main DR process, then you may need <strong>to</strong> consider implementing remote<br />
replication.<br />
Many D2D backup solutions support remote replication <strong>to</strong> another system at a remote site. As discussed<br />
previously, data de-duplication-enabled solutions can do this quickly and cost effectively. Recovering data<br />
from disk is far faster than recovering from tape, and the time to recovery is what is really important.<br />
Typically, data recovery is an urgent issue. It could take hours, days or even weeks <strong>to</strong> recover data fully<br />
from tape. This is becoming increasingly unacceptable <strong>and</strong> thanks <strong>to</strong> current technology developments, it<br />
is actually unnecessary <strong>to</strong> <strong>to</strong>lerate it. <strong>Data</strong> de-<strong>duplication</strong>-enabled D2D backup solutions allow end-users<br />
<strong>to</strong> retain data for longer periods of time, reducing <strong>and</strong> potentially eliminating the need <strong>to</strong> ever recover data<br />
from tape again. The end result is faster <strong>and</strong> more reliable recoveries of data.<br />
<strong>Data</strong> de-<strong>duplication</strong>-enabled solutions provide easier management than D2D backup solutions that don’t<br />
provide capacity optimization. Since traditional D2D backup solutions require more capacity, the process<br />
of managing those systems is inherently more complex. Capacity utilization will have <strong>to</strong> be moni<strong>to</strong>red more<br />
often, backup data will need <strong>to</strong> be removed or new capacity added more frequently <strong>and</strong> you will still need <strong>to</strong><br />
rely heavily on tape for recoveries.<br />
In many cases, operational costs outweigh capital costs. More importantly, there are always more projects<br />
that need IT personnel’s attention. By removing the mundane <strong>and</strong> time consuming process of managing<br />
tape, your team can focus on more important pursuits that help the business.<br />
Time <strong>to</strong> Protection<br />
Time <strong>to</strong> Protection is important since it impacts how quickly you can get your data protected. A key value<br />
of data de-<strong>duplication</strong> is that it is easy. End-users don’t have <strong>to</strong> perform Herculean tasks <strong>to</strong> get the value<br />
out of data de-<strong>duplication</strong>-enabled solutions. <strong>Data</strong> de-<strong>duplication</strong> should be invisible <strong>to</strong> the backup <strong>and</strong><br />
recovery process. If it isn’t, then you need <strong>to</strong> re-evaluate the data de-<strong>duplication</strong>-enabled solution <strong>and</strong><br />
consider another avenue.<br />
One of the big advantages of data de-<strong>duplication</strong>-enabled solutions is the ability <strong>to</strong> replicate data over less<br />
b<strong>and</strong>width. This not only reduces cost, but also allows you <strong>to</strong> transfer <strong>and</strong> protect data much more quickly.<br />
If you had to send all of your backup data over the WAN, it could take several hours or even days. However,<br />
with data de-duplication-enabled solutions, the process should be several times faster. Thus, Time to<br />
Protection is more rapid and the safety of replicated data is guaranteed more quickly than non-data<br />
de-duplication approaches.<br />
It is important to consider that it isn’t just an issue of time and how quickly you can protect data, but data<br />
de-duplication-enabled solutions can actually enable a level of data protection that isn’t otherwise practical.<br />
ESG spoke with an end-user that implemented remote backup from Boston to Los Angeles using a data<br />
de-duplication-enabled solution. He said that without data de-duplication performing remote backups, these<br />
long distance backups would be too costly and require too much time to perform.<br />
Data De-Duplication ROI Analysis<br />
• Disk and Tape Cost Reduction<br />
• Reduced Bandwidth Requirements<br />
• Lower Power and Cooling Consumption<br />
• Smaller Floor Space Footprint<br />
• Reliable Data Recoveries<br />
• Fast Recovery of Data<br />
• Lower Operational Cost – Less Media Handling<br />
• Time to Protection<br />
• Lower Total Cost of Recovery<br />
Total Cost of Recovery<br />
When you add all of these elements together, choosing data de-duplication-enabled D2D solutions over tape or<br />
traditional D2D backup solutions on cost of recovery is about as much of a “no-brainer” as<br />
you can get in the data center. In summary, the following are the cost-saving elements of data de-duplication-enabled<br />
solutions:<br />
• Reduced disk capacity for data protection<br />
• Potentially fewer D2D backup s<strong>to</strong>rage systems over time<br />
• Fewer tapes or potential elimination of tapes<br />
• Fewer tape libraries or potential elimination of tape libraries<br />
• Reduced power <strong>and</strong> cooling costs<br />
• More available floor space based on fewer D2D backup <strong>and</strong> tape systems<br />
• Reduced WAN b<strong>and</strong>width costs<br />
• More reliable data recoveries<br />
• Faster data recoveries<br />
• Fewer people-hours spent on tape and disk administration<br />
• All of the above for each site<br />
The Total Cost of Recovery (TCR) for data de-<strong>duplication</strong>-enabled D2D backup solutions is clearly far less<br />
than tape or D2D backup solutions that do not support capacity optimization.<br />
Business Questions <strong>to</strong> Ask:<br />
1. How many cus<strong>to</strong>mers do you have using your product in production environments <strong>to</strong>day?<br />
The number of cus<strong>to</strong>mers is important <strong>to</strong> underst<strong>and</strong> whether the market is adopting it or not. From a<br />
product value perspective, quantity is quality. If there are only 10 cus<strong>to</strong>mers <strong>and</strong> they have been in the<br />
market for 5 years, this is a red flag. If they have hundreds or thous<strong>and</strong>s of cus<strong>to</strong>mers, then you have<br />
market validation. Again, for newer solutions there will be fewer implementations. That is why it’s<br />
important <strong>to</strong> get cus<strong>to</strong>mer references.<br />
2. Can you provide us a cost saving analysis from companies similar <strong>to</strong> ours? Please include capital,<br />
operational <strong>and</strong> facilities cost savings.<br />
Vendors often talk about value, but hardly ever show you real numbers. <strong>Data</strong> de-<strong>duplication</strong> is easy <strong>to</strong><br />
quantify, so ask the vendors <strong>to</strong> provide you with real data. This will help you better underst<strong>and</strong> what<br />
cost savings you might obtain by using their products. Having more than one data point is important as<br />
well, since there are multiple variables <strong>to</strong> consider.<br />
3. Can you provide us with some existing cus<strong>to</strong>mers that we can talk <strong>to</strong> about working with you <strong>and</strong> your<br />
products?<br />
Talking <strong>to</strong> other users is always valuable. They can give you insight as <strong>to</strong> what <strong>to</strong> expect when you<br />
deploy a vendor’s solution. Of course they will be a happy cus<strong>to</strong>mer, but they will still share their real<br />
life perspective with you.<br />
4. How disruptive will your product be <strong>to</strong> our environment?<br />
Implementing a new solution that provides real value <strong>to</strong> your company is always desirable, but at what<br />
cost? You need <strong>to</strong> underst<strong>and</strong> if this new innovative solution will be overly disruptive <strong>to</strong> your<br />
environment.<br />
5. How many hours a week does it take <strong>to</strong> support your solution?<br />
If the solution is complex <strong>and</strong> requires a great deal of manual management, then you need <strong>to</strong> consider<br />
whether you have the resources <strong>to</strong> support it. On the other h<strong>and</strong>, the solution may require little<br />
management, but it’s important <strong>to</strong> find out. Additionally, ask for this data based on what their current<br />
cus<strong>to</strong>mers are experiencing. Also ask about training—is it required or recommended? If the answer is<br />
yes, then that is a red flag. If the product is so easy, why do we need training?<br />
6. What else does the vendor have <strong>to</strong> offer?<br />
Vendor selection should play a role in the decision-making process. It is important <strong>to</strong> underst<strong>and</strong> the<br />
vendor’s business success <strong>and</strong> long term viability, their support capability, how well they communicate<br />
with you <strong>and</strong> what other services or products they could offer <strong>to</strong> you <strong>to</strong>day <strong>and</strong> over time. You should<br />
also consider positive existing relationships with the vendor <strong>and</strong>/or system integra<strong>to</strong>r.<br />
ESG’s View<br />
Disasters will happen. They can range from a file being lost, <strong>to</strong> a data center being flooded, <strong>to</strong> an entire<br />
building being destroyed. Some of these incidents are common ones, such as file loss or data corruption,<br />
systems and infrastructure going down and becoming unavailable, disk drives failing, and user error<br />
(remember, humans invented human error). Then there are the somewhat common incidents, including<br />
facility disasters involving flooding or fire. Even though the odds are less likely, there may be major natural<br />
disasters <strong>to</strong> contend with including earthquakes, <strong>to</strong>rnadoes, <strong>and</strong> hurricanes. And there have been a few<br />
recent incidents of large geographic blackouts that take hours or even days <strong>to</strong> correct.<br />
The fact that most companies and organizations still use tape as their primary defense against these events<br />
is troubling. Once, there was an economic rationale for using it, but data de-duplication-enabled<br />
solutions invalidate this. There are end-users that are m<strong>and</strong>ated <strong>to</strong> use tape for governance <strong>and</strong> regula<strong>to</strong>ry<br />
reasons, but they can use data de-<strong>duplication</strong>-enabled solutions <strong>to</strong> augment their environments. For those<br />
companies not so encumbered, they should certainly consider data de-<strong>duplication</strong> <strong>to</strong> complement <strong>and</strong> even<br />
replace their tape systems.<br />
<strong>Data</strong> de-<strong>duplication</strong> is a powerful form of virtualization—the ability <strong>to</strong> logically view <strong>and</strong> manage physical<br />
assets for greater utilization <strong>and</strong> au<strong>to</strong>mation of otherwise manual tasks. <strong>Data</strong> de-<strong>duplication</strong> achieves both<br />
of these goals by significantly reducing the amount of capacity required <strong>to</strong> s<strong>to</strong>re backup data—5:1, 10:1,<br />
20:1 <strong>and</strong> beyond. Additionally, data de-<strong>duplication</strong> reduces or even eliminates the need <strong>to</strong> manage tapes.<br />
<strong>De</strong>aling with tape media management is archaic in this digital age. It is analogous <strong>to</strong> someone still<br />
stubbornly hand washing the dishes even though he or she has a dishwasher right next to the sink.<br />
Tape will be around for some time <strong>to</strong> come. There are still governance <strong>and</strong> regula<strong>to</strong>ry m<strong>and</strong>ates that<br />
ensure its survival. Additionally, incumbency often trumps innovation <strong>and</strong> the change management policy<br />
is to not change anything. A great deal of education also still needs to occur: many<br />
people either do not know about data de-duplication or are skeptical about its abilities.<br />
<strong>Data</strong> de-<strong>duplication</strong> is very real <strong>and</strong> provides excellent value. ESG believes that it will become prevalent<br />
over time within D2D backup <strong>and</strong> all s<strong>to</strong>rage <strong>and</strong> application tiers. However, it is important <strong>to</strong> not only<br />
evaluate data de-<strong>duplication</strong> capabilities, but the entire product, cus<strong>to</strong>mer references, market <strong>and</strong> company<br />
success. ESG encourages you to ask the questions outlined in this report in order to leverage the benefits<br />
that can certainly be derived from data de-duplication. Data de-duplication changes the data protection<br />
l<strong>and</strong>scape <strong>and</strong> is one of the few categories that offers such a clear “no-brainer” value proposition.<br />