Data De-duplication and Disk-to-Disk Backup Systems


Technical and Business Considerations

By Tony Asaro and Heidi Biggar

July 2007

Copyright ©2007. The Enterprise Strategy Group, Inc. All Rights Reserved.

Table of Contents

ESG Report

Data De-Duplication and D2D Backup Systems


Data De-Duplication Defined

The Business Value of Data De-Duplication

ESG’s View

All trademark names are property of their respective companies. Information contained in this publication has been obtained from sources the

Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are

subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this

publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express

consent of the Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable,

criminal prosecution. Should you have any questions, please contact ESG Client Relations at (508) 482-0188.

Enterprise Strategy Group Page 1



Disk-to-disk (D2D) backup, combined with data de-duplication, is an emerging category within the data

protection ecosystem that ESG believes has the potential to change the entire landscape. D2D backup with

data de-duplication solutions minimizes the disk and/or the bandwidth capacities required to store and

move data used for protection purposes.

Data de-duplication solutions optimize physical storage and bandwidth by using less of each to protect your

data. Why use less? Perhaps the first and most obvious answer is to reduce cost. By reducing capacity

requirements, fewer disks are needed to store the same amount of effective data. This translates to less

bandwidth being required to move and copy that data across the WAN. Beyond these cost reductions,

there is perhaps an even more important reason to employ data de-duplication. By reducing the amount of

storage and bandwidth required to protect data locally and remotely, organizations can significantly

improve their levels of data protection and their ability to recover data quickly, reliably and cost-effectively.

Reducing the cost of the storage required for backup data in turn enables greater data protection and recovery.


For years, there has been a considerable disparity between the prices of tape and disk-based storage

systems. As such, it was an economic “no-brainer” to store backups on tape. In fact, the cost delta

between tape and disk was so dramatic that, despite the inherent weaknesses of tape (complexity, unreliability and slow performance), it is still the preferred medium for storing backup data today.

Figure One: Disk-to-Disk Backup Adoption Rates (US-Based)

Source: ESG Research, September 2006

A major market shift occurred when storage system vendors began supporting low cost, high-density ATA

drives and the cost delta between disk and tape started to shrink significantly. Although the capital cost savings

still favored tape, the gap narrowed to a point where the operational impact of tape (cost of management, unreliability issues and performance) had finally moved the value dial from tape to disk for

many end-users. The market responded, and the disk-to-disk (D2D) backup market was born. At first, end-users performed backups to lower-cost drives within their existing primary storage systems, and this is still a

popular process. Additionally, the development of new purpose-built solutions, such as D2D appliances


and virtual tape libraries (VTL), created an entirely new market category. As shown in Figure One, a recent

survey conducted by ESG found that 64% of all respondents either have implemented or are intending to implement a

purpose-built D2D backup solution.

This is a strong validation that these solutions are either replacing or complementing tape libraries. The

reasons that end-users are embracing D2D backup solutions include improved backup performance,

eliminating tape media management issues, scalability, ease of management and cost. Another—and

possibly the most important—advancement in D2D backup is data de-duplication. Our research found that

33% of all respondents consider data de-duplication an important capability in their D2D backup solution.

ESG believes that this is an especially large percentage based on the fact that data de-duplication is an

emerging technology still requiring a great deal of education and awareness.

Data de-duplication’s value is even more compelling above and beyond the use of high-density SATA drives within disk-based storage systems. End-users employing D2D backup solutions with data de-duplication are experiencing backup data capacity reductions of 10, 20 and 30 times, possibly even more.¹ Consider the economic value of this level of reduction: it not only eliminates any delta between the capital costs of tape versus disk, but arguably swings the pendulum to the other side, in disk’s favor. Add to this the operational efficiency, rapid and reliable recoveries and the elimination or reduction of tape management enabled by D2D backup solutions, and you’ve got a compelling and evident value proposition.

Data de-duplication is a game-changing technology. It enables D2D backup by lowering the overall cost of

these types of solutions. De-duplication reduces the amount of redundant data that is backed up, which

results in less capacity required to store that data. Additionally, companies can retain more backup data on

disk for longer periods of time, which reduces and potentially eliminates the need to recover data from

tape. Where replication is supported, data can be more efficiently—and cost-effectively—moved between

data sites for disaster recovery. Data de-duplication offers landscape-changing value that is easy to

quantify, improves reliability, simplifies management and provides rapid recovery of data.

1 Data de-duplication ratios will vary based on the backup data (amount of redundancy and data change rates), the backup policy (frequency

of incremental and full backups) and the data de-duplication technology (size of data files/chunks/segments used).


Data De-Duplication Defined


Though the technology behind it can be quite sophisticated, the concept of data de-duplication is simple.

Data de-duplication is the process of examining data to identify any redundancy. In the context of backup

data, we can make a strong supposition that there is a great deal of duplicate data. The same data keeps

getting backed up over and over again, consuming more storage space and impacting cost, thereby

creating a chain of inefficiency.

The following example, though simple, illustrates the potential power of de-duplication:

Let’s say that a 2 MB image has been embedded in a Word document and e-mailed to dozens of people.

Ten of the people who receive that document take that image and embed it in other documents. In fact, the

image is proliferated throughout the organization to the point where it has been embedded in 200 different documents. This consumes 400 MB of additional capacity. With data de-duplication, only one

copy of the image is stored, saving 400 MB that would otherwise be consumed.

Now consider a 400 MB file that has been sent to multiple users. There might be ten full copies of weekly

backups of that 400 MB file, resulting in 4,000 MB (4 GB) of consumed storage. Reducing this to just the one

unique copy is significant since otherwise 4 GB would be required to back up that same file multiple times.
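The arithmetic in this example can be sketched in a few lines (a hypothetical illustration using the figures above, not a vendor measurement):

```python
# Ten weekly full backups of the same 400 MB file, with and without
# de-duplication. Figures match the worked example in the text.

FILE_MB = 400          # size of the file being backed up
WEEKLY_FULLS = 10      # number of retained full backups

# Without de-duplication, every full backup stores its own copy.
consumed_without = FILE_MB * WEEKLY_FULLS

# With de-duplication, only one unique copy is stored; the other
# nine backups are just references to it.
consumed_with = FILE_MB

saved = consumed_without - consumed_with
print(f"without de-dup: {consumed_without} MB")   # 4000 MB (4 GB)
print(f"with de-dup:    {consumed_with} MB")      # 400 MB
print(f"capacity saved: {saved} MB")              # 3600 MB
```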

Another example of data de-duplication’s value—this time at the file level—involves a PowerPoint presentation

attached to an e-mail. If the e-mail is sent to multiple recipients and then forwarded to yet another set of

recipients, data de-duplication technology can be used to store the presentation only once. Next, consider what

happens when one of the e-mail recipients modifies a slide in the presentation and again forwards it to a group

of colleagues. Advanced data de-duplication algorithms work at the sub-file level and can be used to store only

the data associated with the changed slide.

The latter two examples include block or sub-block data de-duplication. This method works much like file

level de-duplication, but identifies common data in “chunks” or “blocks” that are less than a file in size. This

method is typically implemented in purpose-built solutions that are dedicated to finding and eliminating duplicate

data within a file.
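The chunk-and-hash mechanics described above can be sketched as follows. This is a minimal illustration assuming fixed-size blocks keyed by SHA-256; the `DedupStore` class, chunk size and sample data are invented for the example, and real products typically use more sophisticated variable-size chunking:

```python
import hashlib

CHUNK_SIZE = 4096  # bytes; real systems tune this trade-off

class DedupStore:
    """Toy block-level de-duplicating store (illustration only)."""

    def __init__(self):
        self.blocks = {}   # digest -> block data, each stored once

    def write(self, data: bytes) -> list[str]:
        """Store a stream, returning its 'recipe' of block digests."""
        recipe = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.blocks.setdefault(digest, chunk)  # duplicates are skipped
            recipe.append(digest)
        return recipe

    def read(self, recipe: list[str]) -> bytes:
        """Reassemble the original stream from its recipe."""
        return b"".join(self.blocks[d] for d in recipe)

store = DedupStore()
backup_1 = b"A" * 8192 + b"B" * 4096          # first full backup
backup_2 = b"A" * 8192 + b"C" * 4096          # mostly the same data
r1, r2 = store.write(backup_1), store.write(backup_2)
assert store.read(r1) == backup_1 and store.read(r2) == backup_2
print(len(store.blocks))  # 3 unique blocks stored instead of 6 written
```

Six chunks pass through the write path, but only three unique blocks consume capacity; the second backup is mostly references to blocks already stored.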

What does all this mean in real-life terms? Through hands-on testing, ESG has found that data de-duplication technologies can provide 10 times, 20 times, 30 times and even greater reductions in the capacity needed for backup.²,³ This means that companies can store 10 TB to 30 TB of backup data on 1 TB of

physical disk capacity, which has potentially tremendous economic benefits. For one thing, it potentially

eliminates any delta between the capital costs of tape versus disk, making disk storage a more viable

option. Factor in the operational efficiencies of not having to move, store and manage redundant data

thanks to de-duplication and the elimination or reduction of tape management provided by D2D backup

solutions, and users can extract real value from de-duplicated D2D backup.

2 ESG has seen data de-duplication ratios range from 4:1 to 89:1—so mileage will vary.

3 See Appendix for a list of related ESG reports.


Figure Two: Data De-Duplication


Data de-duplication ratios will vary based on the types of data involved and the frequency of full backups

and retention periods. As a rule of thumb, ESG believes a 20:1 ratio, when combined with data compression, to be broadly achievable. Though ESG has seen data de-duplication ratios of 89:1, and there is potential for even greater reductions, do not be disappointed if you do not achieve 20:1 or greater, since reductions of

5:1 or more are still extremely valuable.
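As a rough guide, the rule of thumb above translates into physical capacity like this (a back-of-the-envelope sketch; the 20 TB figure and the ratios are illustrative, not measurements):

```python
# How much physical disk a given amount of retained backup data needs
# at various de-duplication ratios. All figures are hypothetical.

def physical_tb(backup_tb: float, ratio: float) -> float:
    """Physical capacity required to hold backup_tb at a ratio:1 reduction."""
    return backup_tb / ratio

BACKUP_TB = 20.0  # hypothetical retained backup data

for ratio in (5, 10, 20, 30):
    need = physical_tb(BACKUP_TB, ratio)
    saved_pct = 100 * (1 - need / BACKUP_TB)
    print(f"{ratio:>2}:1 -> {need:5.2f} TB physical ({saved_pct:.0f}% saved)")
```

Even the "disappointing" 5:1 case eliminates 80% of the physical capacity, which is why lower-than-expected ratios are still extremely valuable.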

Technology Considerations

There is a great deal of buzz in the market around data de-duplication and it is only going to get louder. As

a result, there will be lots of confusion and convoluted messaging slung around regarding this topic. Keep

in mind that in the high tech industry, we often use the same term to mean different things and different

terms to mean the same thing. In the interest of clarity, ESG has provided a set of questions aimed at

providing some guidance to end-users interested in evaluating and implementing D2D backup solutions

that support data de-duplication.

Technology Questions to Ask:



1. What type of data de-duplication ratio can I expect?

A number of the vendors that support de-duplication provide high-level numbers. Dig deeper. The

actual amount of data reduction an organization can expect to see can vary significantly depending on

the type of data being backed up, retention periods, the frequency of full backups and the data de-duplication technology. Provide potential vendors with information about your environment, backup

process, applications, retention SLAs and data types to better determine what to expect.

2. How will data de-duplication affect my backup and restore performance?

Data de-duplication is a resource-intensive process. It needs to determine whether some new small

sequence has been stored before, often across hundreds of prior terabytes of data. A simple index of

this information is too big to fit in RAM, unless it is a very small deployment. So it needs to seek on


disk, and disk seeks are notoriously slow (and not getting better). The following questions allow you to

dig deeper with regard to performance:

What is the single stream backup and restore throughput? This is how fast a given file/DB can be

backed up, restored or copied to tape for archiving. The numbers may be different—read speed and

write speed may have separate issues. Because of backup windows for critical data, backup

throughput is what most people ask about, though restore time is more significant for most SLAs.

LTO4 tapes need to receive data at >60 MB/sec. or they will operate well below their rated speed for

streaming, so restore stream speed matters significantly if tape will stay in your plans.

What is the aggregate backup/restore throughput per system? With many streams, how fast can a

given controller perform? This will help gauge the number of controllers/systems needed for your

deployment. It is mostly a measure of system management (number of systems) and cost—single

stream speed is more important for getting the job done.

Is the 30th backup different from the 1st? If you back up images and delete them over time, does the

performance of the system change? Since de-duplication scatters so many references around the store for new documents, do the recovery characteristics for a recent backup (what you’ll mostly be recovering) change a month or two into deployment versus the first pilot? Talk to existing users of the

vendor to find what others have seen.

In many cases, performance in your deployment will depend on many factors, including the backup

software and the systems and networks supporting it. Understand your current performance and

bottlenecks before challenging the vendor of a particular component to fix it all.
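The point above about the index being too big to fit in RAM can be sanity-checked with arithmetic. The chunk size and per-entry overhead below are assumptions chosen for illustration, not figures from any particular product:

```python
# Rough estimate of the in-memory index a de-duplication system would
# need. Assumptions (hypothetical): 8 KB average chunk size and about
# 40 bytes of index per chunk (hash plus on-disk location).

CHUNK_BYTES = 8 * 1024
ENTRY_BYTES = 40

def index_gb(stored_tb: float) -> float:
    """Approximate index size, in GB, for stored_tb of unique data."""
    chunks = stored_tb * 1024**4 / CHUNK_BYTES
    return chunks * ENTRY_BYTES / 1024**3

for tb in (1, 10, 100):
    print(f"{tb:>3} TB stored -> ~{index_gb(tb):,.0f} GB of index")
```

Under these assumptions, even 10 TB of unique data implies tens of gigabytes of index, which is why lookups spill to disk and why seek performance matters so much.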

3. Is the data de-duplication in-line or post-process?

As with any new technology, there is a lot of confusion in the industry about the differences between in-line and post-process approaches, as well as abuse/misuse of the terms being used to differentiate

the two. In ESG’s view, it comes down to one simple “yes or no” question: When the backup data is

written to disk, is the data de-duplicated or not? If the answer is yes, then it has been de-duplicated in-line. If at that point the answer is no, then the de-duplication is done post-process.

What is the significance of one approach versus the other? The two areas that you need to research

are the impact to performance for the in-line approach and capacity issues for the post-process

approach. Understand the trade-offs for each approach based on the vendor’s specific solutions.

Since in-line de-duplication is an intelligent process performed during the backup process, there can

be some performance degradation during data ingest. However, performance impact depends on a

number of variables, including the de-duplication technology itself, the size of the backup volume, the

granularity of the de-duplication process, the aggregate throughput of the architecture and the

scalability of the solution.

Post-process data de-duplication does require more disk capacity to be allocated upfront. But the size

of the “capacity reserve” needed depends on a number of variables, including the amount of backup

data and how long the de-duplication technology “holds” onto the capacity before releasing it.

Some post-process de-duplication technologies wait for the entire backup job to be completed, while

others start de-duplication as backup data is stored.

Solutions that wait for the backup process to complete before de-duplication of the data have a greater

initial “capacity overhead” than solutions that start the de-duplication process earlier. These solutions

have to allocate enough capacity to store the entire backup volume. The capacity is released when the

backup job is complete and re-allocated before the next backup job begins. Beginning the de-duplication process immediately after data is written to the backup target may shorten the length of the

de-duplication process, but will also affect the data ingest rate.
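The in-line versus post-process distinction can be sketched as follows. The functions and sample stream are hypothetical, and a real system obviously operates on disk rather than Python dictionaries, but the "capacity reserve" of the post-process path is visible:

```python
import hashlib

# In-line: data is de-duplicated before it lands on disk.
# Post-process: raw data lands on disk first and is de-duplicated later.

def digest(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()

stream = (b"alpha", b"beta", b"alpha")    # a backup stream with a duplicate

# --- In-line path: only unique chunks ever reach the store. ---
inline_store = {}
for chunk in stream:
    inline_store.setdefault(digest(chunk), chunk)

# --- Post-process path: everything lands in a capacity reserve first. ---
landing = list(stream)                    # full backup volume on disk
peak_reserve = len(landing)               # capacity overhead before the sweep
post_store = {}
while landing:                            # the later de-duplication sweep
    chunk = landing.pop()
    post_store.setdefault(digest(chunk), chunk)

print(len(inline_store), peak_reserve, len(post_store))  # 2 3 2
```

Both paths end with the same two unique chunks, but the post-process path briefly held all three, which is the trade-off the questions above probe: ingest speed versus upfront capacity.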

4. Does the D2D backup solution support de-duplicated remote replication?

End-users may wish to implement remote replication for disaster recovery, remote backup and remote

vaulting. The combination of data de-duplication and remote replication offers a great deal of value by

reducing bandwidth as dramatically as it reduces storage capacity. In many cases, data de-duplication

can enable disaster recovery and remote backup in terms of meeting backup windows and budget constraints.


Some data de-duplication solutions support multi-site remote replication. If this is a requirement, find

out if the vendors you are considering support multi-site. The next question to ask is whether they

support data de-duplication across the entire environment. For example, if there is duplicate data in

five different sites, will it store only one copy at the target site? And will it make sure that no duplicate

data will be sent over the WAN? Supporting multi-site data de-duplication raises the level of capacity

reduction efficiency.

5. Will de-duplication processes affect my disaster recovery windows?

While much of the industry discussion around data de-duplication centers on reduction ratios and performance/capacity trade-offs, it is also important to consider the effect data de-duplication may have

on disaster recovery windows. This refers to the time it takes from the start of the backup process to the

point where DR copies are made and moved off-site. The length of this process, from start to finish,

depends on a number of variables, including the data de-duplication approach, the speed of the de-duplication architecture, the DR process (is the data being written/exported to tape or de-duplicated

and then replicated over a WAN to a remote facility), etc. It is important to consider the lag time from

initiation of backup to when the image is complete at the DR site. If this timeframe is greater than 24

hours, that image may miss too much new data to meet the DR objectives of your deployment. Make

sure you are meeting the Recovery Point Objectives (RPO) you have in mind for DR.

6. Is the D2D solution easy to implement?

One of the compelling things about data de-duplication is that it is easy. Or at least it should be. Users

shouldn’t have to perform cumbersome, complex and time consuming tasks to get up and running in

order to derive value from their solutions. Data de-duplication should be invisible to the backup and

recovery process.

Purpose-built D2D backup appliances also offer a level of transparency, acting as a total solution that

doesn’t require disk storage management functions. VTL solutions are typically associated with ease

of implementation, since they emulate tape libraries. Certainly, they provide a great deal of value

compared to just using a general purpose storage system, which often is complex to manage and

requires storage experts. The backup administrator doesn’t want to manage RAID groups, LUNs and volumes.


7. What is the system impact of meeting performance requirements?

More controllers? More disk? What ratio of each? This matters for both cost and system management.


8. How does the D2D backup solution protect itself from data loss and corruption?

It is important to understand how “bullet proof” the D2D backup solution is. Find out what technologies

it has to ensure data integrity and protection against system failures. While this is always important, it

is an even bigger consideration in data protection with de-duplication. With D2D backup solutions that


support data de-duplication, there may be 1,000 backup images that rely on one copy of source data.

Therefore, it becomes even more important that this source data be kept accessible and maintained with a high level of data integrity.

9. How scalable is the D2D backup solution?

You need to size your environment in terms of capacity and performance for today, with consideration for future growth.

10. Does the solution provide flexible application support?

How many backup applications are supported? Can non-backup applications work, too? More

flexibility means more consolidation is possible using less physical infrastructure.

11. What are the other features and capabilities of your solution?

Data de-duplication is a valuable technology, but it is not the only consideration. You must also

evaluate the other important features and capabilities of the solution to see if it meets your needs.


The Business Value of Data De-Duplication

One of the greatest qualities of data de-duplication is that its value is easy to quantify. If you can reduce

the amount of capacity to store backup data by 10:1, 20:1 or greater, you can get your calculator out and

put a dollar amount to the cost savings. While the cost savings may be significant enough for many to

move forward, there are other cost benefits that make the case for de-duplication even more compelling.

Hard Dollars

The hard dollar costs are easy to determine. The first metric is the reduced capital cost of the D2D backup

solution with data de-duplication versus without it. The solution without de-duplication will require more actual capacity to store backups, in some cases 20 or more times what the de-duplication-enabled solution needs.

There are other capital cost savings as well. D2D backup solutions can reduce the amount of tape

infrastructure you acquire. Some end-users have totally eliminated tape, while others have reduced the

number of tape libraries they maintain.

If you want to perform remote replication between your primary and remote sites, then data de-duplication

can significantly reduce your WAN bandwidth costs. You can effectively replicate data over long distances

with far less bandwidth. Since WAN bandwidth is still expensive and a recurring cost, data de-duplication

can significantly improve the economics of implementing remote backup and disaster recovery.
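The bandwidth argument can be made concrete with a rough calculation. The link speed, backup-set size and de-duplication ratio below are assumptions for illustration only:

```python
# Rough WAN transfer-time comparison for replication with and without
# de-duplication. All inputs are hypothetical.

LINK_MBPS = 45.0        # e.g. a T3-class WAN link, in megabits per second
NIGHTLY_GB = 500.0      # hypothetical nightly backup set
DEDUP_RATIO = 20.0      # only ~1/20th of the bytes cross the wire

def hours_to_send(gb: float) -> float:
    """Approximate transfer time in hours at the assumed link speed."""
    return gb * 8 * 1024 / LINK_MBPS / 3600   # GB -> megabits -> seconds -> hours

raw = hours_to_send(NIGHTLY_GB)
deduped = hours_to_send(NIGHTLY_GB / DEDUP_RATIO)
print(f"raw replication:   {raw:5.1f} h")
print(f"de-duplicated:     {deduped:5.1f} h")
```

Under these assumptions, a replication job that would take more than a day shrinks to just over an hour, before even considering the recurring cost of provisioning a fatter link.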

Figure Three: Remote Replication and Data De-Duplication

It is also important to consider facility costs, which include power and cooling as well as floor space. Since

you are using fewer disks, you are creating less heat and drawing less power. Again, at a 20:1 capacity

reduction ratio, this can be a significant savings. In some cases, there are data centers that just can’t use

up any more power—they are at or near their maximum limits. These companies should certainly evaluate

data de-duplication-enabled solutions.

Additionally, floor space is at a premium. Data de-duplication-enabled solutions can reduce vertical growth

by minimizing the amount of shelf space needed to store backup data. As discussed previously, you can


eliminate some or all of your tape libraries by moving to disk, which will free up floor space (and reduce

power and cooling costs).

Data de-duplication enables you to use less capacity to store backup data, but it also reduces the amount

of processing power, bandwidth and memory required per GB. This compounds all of the aforementioned factors that make data de-duplication-enabled solutions easy to cost-justify.

Value of Increased Retention

ESG has found that the majority of end-users still use tape backup as their main method of disaster

recovery. However, we’ve also found that end-users consider the process of recovering data from tape to

be slow, complex and unreliable. These two realities are clearly at odds with one another.

Recovering a single file from tape can take several minutes, whereas recovering data from disk is

instantaneous. Multiply this by dozens, hundreds and even thousands of files and the performance

difference can be several hours, days and even weeks. Consider database tables that span multiple tapes

and the process of trying to recover this information quickly. Consider the process of tape interleaving,

which improves tape backup performance, but impacts restore performance because server data is spread

randomly across the tapes. Additionally, recovery performance is greatly impacted by tape availability—

whether it is within the library or offsite in a box somewhere far away.

The fact that end-users are unsure whether they can actually recover 100% of their data from tape is

another harsh reality. The very purpose of backing up your data is so you can recover if needed. Backing

up data onto disk resolves recovery and reliability issues. Data de-duplication increases the amount of

backup data that you can retain and extends the retention period. In effect, by using data de-duplication-enabled

D2D backup solutions, you can eliminate the need for ever having to restore from tape again.

That should be the objective of every IT organization: removing the slow, error-prone, high-touch tape processes and replacing them with modern solutions that provide fast, reliable and automated recovery.


In effect, data de-duplication-enabled D2D backup solutions eliminate the risks and inefficiencies of

recovering from tape. Further, data de-duplication-enabled D2D backup provides a solution that meets the

true needs of your data recovery requirements without the compromises you’ve come to accept with tape.

The cost impacts of being able to rapidly and reliably recover data are harder to quantify than capital cost

savings, but the implications range from inconvenience to complete data loss. Perhaps one of your

employees had to wait a few hours to recover a lost file they were working on. If that file is unrecoverable,

the cost goes up. In addition, what if that file contained valuable intellectual property that will be extremely

difficult to recreate? What if there was specific litigation or an audit that required information within that

file? Perhaps there was important information contained within that document that impacted a major

business transaction or valuable research. These are the considerations that must be weighed against

using outdated and archaic forms of data protection, especially when there are best-in-class D2D backup

solutions in existence that address these issues without breaking your budget.

Value of Operational Efficiencies

If you just minimize and potentially remove the manual tasks of managing tape, there is an immediate

positive impact on productivity. This may be manifested in time savings that were previously dedicated to

the day-to-day issues of managing the tape rotation process as well as any frantic scrambles to recover

data in an emergency.

D2D backup solutions are often used as a complement to tape. Companies often reduce the

number/frequency of backups they perform to tape to once a week or even once a month, while daily

backups are sent to disk.


In some cases, D2D backup solutions may replace tape systems. ESG has found that a growing number

of companies are actually considering this. Much of this will be contingent on best practices and

governance of the company or organization. In some cases, removing tape systems is not an option based

on regulations. However, for those companies that are not encumbered by these issues, tape removal is

very attractive. The question arises—how do I protect my data from a major site disaster? Right now, if you

are shipping tapes offsite as your main DR process, then you may need to consider implementing remote replication.


Many D2D backup solutions support remote replication to another system at a remote site. As discussed

previously, data de-duplication-enabled solutions can do this quickly and cost effectively. Recovering data

from disk is nearly instant compared with recovering from tape. The time to recovery is what really matters.

Typically, data recovery is an urgent issue. It could take hours, days or even weeks to recover data fully

from tape. This is becoming increasingly unacceptable and thanks to current technology developments, it

is actually unnecessary to tolerate it. Data de-duplication-enabled D2D backup solutions allow end-users

to retain data for longer periods of time, reducing and potentially eliminating the need to ever recover data

from tape again. The end result is faster and more reliable recoveries of data.

Data de-duplication-enabled solutions provide easier management than D2D backup solutions that don’t

provide capacity optimization. Since traditional D2D backup solutions require more capacity, the process

of managing those systems is inherently more complex. Capacity utilization will have to be monitored more

often, backup data will need to be removed or new capacity added more frequently and you will still need to

rely heavily on tape for recoveries.

In many cases, operational costs outweigh capital costs. More importantly, there are always more projects

that need IT personnel’s attention. By removing the mundane and time consuming process of managing

tape, your team can focus on more important pursuits that help the business.

Time to Protection

Time to Protection is important since it impacts how quickly you can get your data protected. A key value

of data de-duplication is that it is easy. End-users don’t have to perform Herculean tasks to get the value

out of data de-duplication-enabled solutions. Data de-duplication should be invisible to the backup and

recovery process. If it isn’t, then you need to re-evaluate the data de-duplication-enabled solution and

consider another avenue.

One of the big advantages of data de-duplication-enabled solutions is the ability to replicate data over less

bandwidth. This not only reduces cost, but also allows you to transfer and protect data much more quickly.

If you had to send all of your backup data over the WAN, it could take several hours or even days. However, with data de-duplication-enabled solutions, the process should be several times faster. Thus, Time to Protection is more rapid and the safety of replicated data is assured more quickly than with non-de-duplication approaches.

It is important to consider that it isn’t just an

issue of time and how quickly you can

protect data, but data de-duplicationenabled

solutions can actually enable a

level of data protection that isn’t otherwise

practical. ESG spoke with an end-user that

Data De-Duplication ROI Analysis

Disk and Tape Cost Reduction

Reduced Bandwidth Requirements

Lower Power and Cooling Consumption

Smaller Floor Space Footprint

Reliable Data Recoveries

Fast Recovery of Data

Lower Operational Cost – Less Media Handling

Time to Protection

Lower Total Cost of Recovery

implemented remote backup from Boston to Los Angeles using a data de-duplication-enabled solution. He

said that without data de-duplication performing remote backups, these long distance backups would be

too costly and require too much time to perform.
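The time savings from replicating only unique data are straightforward to estimate. The sketch below is illustrative only: the backup size, link speed, and 20:1 reduction ratio are assumed figures for the example, not data from this report.

```python
def transfer_hours(data_gb, link_mbps, dedup_ratio=1.0):
    """Estimate hours to replicate a backup set over a WAN link.

    With de-duplication, only 1/dedup_ratio of the logical data
    (the unique blocks) needs to cross the wire.
    """
    unique_gb = data_gb / dedup_ratio
    seconds = (unique_gb * 8 * 1024) / link_mbps  # GB -> megabits, then / Mbps
    return seconds / 3600

# Illustrative figures: a 2 TB nightly backup over a 100 Mbps link.
baseline = transfer_hours(2048, 100)                  # full copy every night
deduped = transfer_hours(2048, 100, dedup_ratio=20)   # only unique blocks sent

print(f"Without de-duplication: {baseline:.1f} hours")
print(f"With 20:1 de-duplication: {deduped:.1f} hours")
```

With these assumed figures, the full nightly copy would occupy the link for roughly 47 hours, while the de-duplicated transfer finishes in a little over 2 hours, consistent with the several-times-faster replication described above.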


Total Cost of Recovery

When you add all of these elements together, the cost of recovery using tape or traditional D2D backup
solutions, compared to data de-duplication-enabled D2D solutions, is about as close to a “no-brainer” as
you can get in the data center. In summary, data de-duplication-enabled solutions deliver the following
cost savings:
• Reduced disk capacity for data protection

• Potentially fewer D2D backup storage systems over time

• Fewer tapes or potential elimination of tapes

• Fewer tape libraries or potential elimination of tape libraries

• Reduced power and cooling costs

• More available floor space based on fewer D2D backup and tape systems

• Reduced WAN bandwidth costs

• More reliable data recoveries

• Faster data recoveries

• Fewer person-hours spent on tape and disk administration

• All of the above for each site

The Total Cost of Recovery (TCR) for data de-duplication-enabled D2D backup solutions is clearly far
lower than that of tape or D2D backup solutions that do not support capacity optimization.
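One way to frame the TCR comparison is as a simple annual cost roll-up per site. Every figure below is a hypothetical placeholder chosen only to show the arithmetic; substitute your own numbers from vendor quotes and cost-savings analyses.

```python
# Hypothetical annual cost components (USD) per site -- placeholders only.
tape_costs = {
    "media": 30_000,
    "library_maintenance": 20_000,
    "offsite_handling": 15_000,
    "admin_hours": 40_000,
    "power_cooling_floor": 10_000,
}

dedup_d2d_costs = {
    "disk_capacity": 25_000,     # shrunk by the de-duplication ratio
    "admin_hours": 10_000,       # far less media handling
    "power_cooling_floor": 8_000,
    "wan_bandwidth": 5_000,
}

def total_cost_of_recovery(costs, sites=1):
    """Sum the annual cost components and scale by the number of sites."""
    return sites * sum(costs.values())

print("Tape TCR, 3 sites:", total_cost_of_recovery(tape_costs, sites=3))
print("De-dup D2D TCR, 3 sites:", total_cost_of_recovery(dedup_d2d_costs, sites=3))
```

Scaling by the number of sites reflects the final bullet in the list above: the savings repeat at each location.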

Business Questions to Ask:

1. How many customers do you have using your product in production environments today?

The number of customers helps you understand whether the market is adopting the product. From a
product value perspective, quantity is quality: if a vendor has only 10 customers after 5 years in the
market, that is a red flag; hundreds or thousands of customers constitute market validation. Newer
solutions will naturally have fewer implementations, which is why customer references are important.

2. Can you provide us with a cost-savings analysis from companies similar to ours? Please include capital,
operational, and facilities cost savings.

Vendors often talk about value, but hardly ever show you real numbers. Data de-duplication is easy to

quantify, so ask the vendors to provide you with real data. This will help you better understand what

cost savings you might obtain by using their products. Having more than one data point is important as

well, since there are multiple variables to consider.

3. Can you provide us with some existing customers that we can talk to about working with you and your
product?
Talking to other users is always valuable. They can give you insight into what to expect when you deploy
a vendor’s solution. Of course, any reference will be a happy customer, but they will still share their
real-life perspective with you.

4. How disruptive will your product be to our environment?

Implementing a new solution that provides real value to your company is always desirable, but at what
cost? You need to understand whether this new, innovative solution will be overly disruptive to your
environment.

5. How many hours a week does it take to support your solution?


If the solution is complex and requires a great deal of manual management, you need to consider whether
you have the resources to support it. The solution may, on the other hand, require little management, but
it is important to find out. Ask for this data based on what current customers are experiencing. Also ask
about training: is it required or recommended? If the answer is yes, that is a red flag. If the product is so
easy, why is training needed?

6. What else does the vendor have to offer?

Vendor selection should play a role in the decision-making process. It is important to understand the
vendor’s business success and long-term viability, its support capability, how well it communicates with
you, and what other services or products it could offer you today and over time. You should also weigh
positive existing relationships with the vendor and/or system integrator.


ESG’s View


Disasters will happen. They can range from a file being lost, to a data center being flooded, to an entire
building being destroyed. Some of these incidents are common, such as file loss or data corruption,
systems and infrastructure going down and becoming unavailable, disk drives failing, and users erring
(remember, humans invented human error). Then there are less common incidents, including facility
disasters involving flooding or fire. Though the odds are lower still, there are major natural disasters to
contend with, including earthquakes, tornadoes, and hurricanes. And there have been a few recent
incidents of large geographic blackouts that took hours or even days to correct.

The fact that most companies and organizations still use tape as their primary defense against these
events is troubling. There was once an economic rationale for doing so, but data de-duplication-enabled
solutions have invalidated it. Some end-users are mandated to use tape for governance and regulatory
reasons, but they can use data de-duplication-enabled solutions to augment their environments.
Companies not so encumbered should certainly consider data de-duplication to complement and even
replace their tape systems.

Data de-duplication is a powerful form of virtualization—the ability to logically view and manage physical
assets for greater utilization and to automate otherwise manual tasks. Data de-duplication achieves both
of these goals: it significantly reduces the amount of capacity required to store backup data—5:1, 10:1,
20:1 and beyond—and it reduces or even eliminates the need to manage tapes. Dealing with tape media
management is archaic in this digital age. It is analogous to someone stubbornly hand-washing the dishes
even though he or she has a dishwasher right next to the sink.
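The capacity reductions cited above come down to simple division of logical backup data by the de-duplication ratio. In this sketch, the 10 TB weekly full and 12-week retention are illustrative assumptions, not figures from this report.

```python
def stored_capacity_tb(logical_tb, dedup_ratio):
    """Physical disk needed to hold logical_tb of backup data."""
    return logical_tb / dedup_ratio

# A 10 TB weekly full retained for 12 weeks is 120 TB of logical data.
logical = 10 * 12
for ratio in (5, 10, 20):
    physical = stored_capacity_tb(logical, ratio)
    print(f"{ratio}:1 de-duplication -> {physical:.0f} TB of disk")
```

At a 20:1 ratio, 120 TB of logical backup data fits on 6 TB of physical disk, which is why de-duplication makes D2D retention periods economical that would otherwise require tape.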

Tape will be around for some time to come. Governance and regulatory mandates still ensure its survival.
Additionally, incumbency often trumps innovation, and for many organizations the change management
policy is to not change anything. A great deal of education also needs to occur: many people either do not
know about data de-duplication or are skeptical of its abilities.

Data de-duplication is very real and provides excellent value. ESG believes it will become prevalent over
time within D2D backup and across all storage and application tiers. However, it is important to evaluate
not only data de-duplication capabilities, but also the entire product, customer references, and the
vendor’s market and company success. ESG encourages you to ask the questions outlined in this report in
order to leverage the benefits that data de-duplication can certainly deliver. Data de-duplication changes
the data protection landscape and is one of the few categories that offers such a clear “no-brainer” value
proposition.
