Redshift Vs Teradata - An In-Depth Comparison
Table of Contents

Redshift Vs Teradata
Redshift Architecture & Its Features
Teradata Architecture & Its Features
Redshift Data Model
Teradata Data Model
Redshift Pros and Cons
Teradata Pros and Cons
Features supported only by Redshift, not Teradata
Features supported only by Teradata, not Redshift
Redshift Vs Teradata In A Nutshell
Pricing and Effort Comparison
When and How to Migrate Data from Teradata to Redshift
Summary
ETL Challenges While Working With Amazon Redshift
Redshift Vs Teradata

Redshift versus Teradata is one of the most debated data warehouse comparisons. In this ebook, we cover a detailed comparison between Redshift and Teradata.

Redshift Architecture & Its Features

Redshift is a fully managed, petabyte-scale data warehouse on the cloud. You can start with just a few gigabytes or terabytes of data and scale up to petabytes as your business requires. The Redshift engine is called a cluster, and it is built from one or more nodes. There are two types of nodes: Compute and Leader. A compute node contains two or more slices, depending on the node type. The leader node plays multiple roles, which include communicating with JDBC/ODBC clients and creating the query execution plan that it distributes to the compute node(s). A cluster is incomplete without a leader node.

You can check out our blog for a detailed article on Redshift Architecture.
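As a quick illustration, the slice layout of a running cluster can be inspected from the STV_SLICES system view (a sketch; the actual output depends on your node type and count):

```sql
-- List how slices are laid out across the compute nodes of the cluster.
SELECT node, slice
FROM stv_slices
ORDER BY node, slice;
```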
Teradata Architecture & Its Features

Teradata is an RDBMS meant for a data warehouse with an on-premise setup, so it requires installation. Although Teradata is not a managed cloud service, you can spin up a Teradata instance on a cloud VM. Teradata is designed on an MPP shared-nothing architecture.

Here is a diagrammatic representation of Teradata Architecture.

The four major components of Teradata are as follows:

1. Node: The primary component of Teradata is the Node, its basic building block. Each node has its own OS, CPU, RAM, disk space, etc.
2. Parsing Engine: The Parsing Engine (PE) is responsible for preparing the query execution plan.
3. BYNET: BYNET receives the query execution plan from the PE and transfers it to the AMPs (also known as virtual processors), and vice versa. It is also called the Message Passing layer.
4. Access Module Processor (AMP): The AMP is a key component of Teradata. AMPs manage the processing of data by storing it in vDisks; a row can be stored on any AMP depending on the hash algorithm. BYNET is responsible for communication between the AMPs. In multi-node systems, Teradata has at least two BYNETs to make the system fault tolerant: if the first BYNET fails, the second takes over.
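The hash-based row placement described above can be sketched in Python. This is a simplified illustration, not Teradata's actual hashing algorithm: the MD5-based hash function and the AMP count are assumptions for demonstration only.

```python
import hashlib

def amp_for_row(primary_index_value: str, num_amps: int) -> int:
    """Map a primary-index value to an AMP number.

    Teradata hashes the primary index value and consults a hash map to
    pick an AMP; here a generic MD5-based hash stands in for that.
    """
    digest = hashlib.md5(primary_index_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_amps

# Rows with the same primary index value always land on the same AMP,
# while distinct values spread across the available AMPs.
rows = ["cust_1001", "cust_1002", "cust_1003", "cust_1001"]
placement = [amp_for_row(r, num_amps=4) for r in rows]
print(placement)
```

The key property this demonstrates is determinism: because placement depends only on the hashed value, the system can locate a row's AMP without any central lookup.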
Redshift Data Model

The Redshift data model is designed for data warehousing purposes. The following unique features make Redshift a smart data warehouse choice.

1. Redshift is a fully managed data warehouse. You don't have to worry about setting up and installing the database; you just spin up your cluster and the database is ready.
2. Redshift's backup and restore are fully automatic. Through automatic snapshots, data in Redshift is backed up internally to S3 at regular intervals.
3. Data is secured through inbound security rules and SSL connections. Clusters in VPC mode are protected by the VPC, while classic-mode clusters rely on inbound security rules.
4. Redshift stores data in a columnar format, unlike row-oriented data warehouse storage. For example, if your query touches a specific column, Redshift reads only that column instead of entire rows, which saves an enormous amount of query processing time.
5. Data is stored in 1 MB blocks instead of the typical 8 KB or 64 KB blocks, which lets Redshift store more data in a single block.
6. Redshift does not have the concept of indexes. Instead, it has zone maps. A zone map records the lowest and highest value of a column in each block, so the cluster can easily identify which blocks need to be read.
7. Redshift has column compression (encoding). The ANALYZE COMPRESSION command reports what compression strategy to apply to a table. Redshift provides various encoding techniques; refer to the AWS documentation for more details on encoding.
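For example, compression recommendations can be requested for a table like this (the table name is hypothetical):

```sql
-- Ask Redshift to sample the table and suggest an encoding per column.
-- "orders" is a hypothetical table name.
ANALYZE COMPRESSION orders;
```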
8. Redshift caches the results of repeated queries for faster performance. To check whether your query used the cache, look at the source_query column in SVL_QLOG: if a query used the cache, that column stores the query ID of the original query whose result was reused.

Example:

SELECT userid, query, elapsed, source_query
FROM svl_qlog
WHERE userid IN (600, 601);

In the output below, query 853219 run by userid 601 reused the cached result of query 123456 (originally run by userid 600), and its elapsed time in microseconds dropped drastically.

 USERID | QUERY  | ELAPSED | SOURCE_QUERY
--------+--------+---------+--------------
    600 | 123456 |   90000 | NULL
    600 | 567890 |   80000 | NULL
    601 | 853219 |      30 | 123456

9. The Redshift data model is similar to a typical data warehouse when it comes to analytical queries. You can create fact tables, dimension tables, and views. It supports all major query constructs: inner joins, outer joins, subqueries, and common table expressions (the WITH clause).
10. From a storage perspective, a Redshift cluster maintains multiple copies of your data as part of fault tolerance.
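A common table expression over fact and dimension tables (point 9 above) looks like this; the table and column names are illustrative:

```sql
-- Revenue per store: a CTE over a fact table, joined to a dimension table.
WITH store_sales AS (
    SELECT store_id, SUM(amount) AS revenue
    FROM fact_sales              -- hypothetical fact table
    GROUP BY store_id
)
SELECT d.store_name, s.revenue
FROM store_sales s
JOIN dim_store d ON d.store_id = s.store_id   -- hypothetical dimension table
ORDER BY s.revenue DESC;
```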
Teradata Data Model

1. Teradata is a massively parallel data warehouse with a shared-nothing architecture. However, unlike Redshift, data is stored in a row-based format.
2. Teradata uses different kinds of indexes for fast data retrieval, including Primary, Secondary, Join, and Hash indexes. Note that a Secondary Index does not affect the distribution of rows across AMPs, though it does add extra processing overhead.
3. Teradata supports and enforces Primary and Secondary indexes.
4. Teradata has a hybrid storage concept where frequently used data is stored on SSD while less-accessed data is stored on HDD.
5. Teradata supports table partitioning, unlike Redshift.
6. Teradata uses a hash algorithm to distribute data across its disk storage units.
7. Teradata can scale up to 2,048 nodes, with storage capacity ranging from 10 TB to 94 PB, thus providing higher storage capacity than Redshift.
8. Teradata supports all the major SQL features that a data warehouse RDBMS needs: Primary Index, Secondary Index, sequences, stored procedures, user-defined functions, macros, etc.
9. Teradata's data model is designed to be fault tolerant, and to be scalable with redundant network connectivity that ensures continuous data connectivity and availability.
Redshift Pros and Cons

Pros

1. Loading and unloading of data is exceptionally fast, and data can be loaded in parallel. Redshift supports loading from zipped files, even for high data volumes, and recommends the COPY command for the fastest load performance.
2. You can load data from AWS DynamoDB, the NoSQL database service. Refer to the AWS documentation for more detailed information about DynamoDB.
3. You can choose the node type (Dense Storage or Dense Compute) of your cluster depending on your data needs and business requirements.
4. You can scale your cluster's storage and CPU for better performance at any time without impacting the cluster.
5. You can migrate your data from various data warehouses into Redshift without much hassle. AWS provides a service for this called Database Migration Service (DMS); refer to the AWS documentation for more details.
6. You do not have to worry about security, as you can build your cluster inside a VPC and also use SSL encryption for further protection.
7. Redshift's backup and restore features are simple. Through automatic snapshots, your data is backed up regularly. Snapshots are incremental, so you do not have to worry about any misses. You can also copy snapshots to another region if the business requires it; refer to the AWS documentation for more details on working with snapshots.
8. Redshift has an advanced feature called Redshift Spectrum, which lets you query huge amounts of data directly in S3, skipping the load step (COPY or any other method). You can refer to the detailed guide on Redshift Spectrum for more information.
9. Using sort keys, data can be pre-sorted on specific columns, which automatically improves query performance.
10. Using distribution keys, data can be distributed evenly across nodes to increase query performance.
11. Redshift provides various pre-built system tables and views to help developers and designers during ETL and other processes.
12. Setup commands can be run through various interfaces: the AWS console, the Command Line Interface (CLI), the API, etc.
13. Redshift applies patches and upgrades to the cluster automatically during a configurable maintenance window, so you do not have to worry about applying patches.
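Sort keys, distribution keys, and the COPY command (points 1, 9, and 10 above) come together like this; the table, S3 path, and IAM role below are hypothetical:

```sql
-- Distribute rows on store_id and pre-sort them on sale_date.
CREATE TABLE fact_sales (
    sale_id   BIGINT,
    store_id  INT,
    sale_date DATE,
    amount    DECIMAL(12,2)
)
DISTKEY (store_id)
SORTKEY (sale_date);

-- Parallel load from gzipped CSV files in S3; all slices participate.
COPY fact_sales
FROM 's3://my-bucket/sales/'                                 -- hypothetical prefix
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'   -- hypothetical role
CSV GZIP;
```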
Cons

1. In Redshift, there is no concept of functions, triggers, or procedures.
2. There is no sequence column in Redshift. If you need to generate sequence numbers for a column, you must handle it in your ETL logic.
3. Unlike other common data warehouses, Redshift does not enforce primary keys or foreign keys, which can create data integrity issues.
4. Only S3, DynamoDB, and EMR support parallel loads into Redshift. To load data from other services, you need to write ETL scripts or use an ETL solution such as Hevo.
5. Redshift requires a good understanding of sort and distribution keys. There are basic ground rules for setting them, and setting them improperly hampers performance.
6. Distribution keys cannot be changed once a table is created, so you need to be extremely careful while designing your tables. Wrong distribution keys can hamper overall performance.
7. Redshift has no DBLink concept; you cannot directly connect to another database or data warehouse's tables in your queries.
8. In Redshift, VACUUM and ANALYZE are mandatory on key tables. They can hurt performance badly if run during business hours, so they need to be scheduled carefully.
9. A Redshift cluster has limits on the number of nodes, databases, tables, etc. The maximum storage is still lower than data warehouses like Teradata. Here is the node limitation list:

Node Type     vCPU   Storage per Node    Node Range
dc1.large     2      160 GB SSD          1-32
dc1.8xlarge   32     2.56 TB SSD         2-128
dc2.large     2      160 GB NVMe-SSD     1-32
dc2.8xlarge   32     2.56 TB NVMe-SSD    2-128
ds2.xlarge    4      2 TB HDD            1-32
ds2.8xlarge   36     16 TB HDD           2-128

You can refer to the AWS documentation to learn more about the limits in Amazon Redshift.
10. Although Redshift in classic mode is still in use, its cluster performance is relatively modest.
11. Redshift supports only a single-AZ environment; multi-AZ is not supported.
12. Redshift has a query concurrency limit of 15, and you can have a maximum of 8 queues in a cluster. Unmanaged queues hinder performance.
13. Your design should ensure the cluster is not in use during the maintenance window, or jobs running at that time will fail.
14. There is no concept of table partitioning in Redshift.
15. Redshift has no concept of SET and MULTISET tables (SET tables are tables that do not allow duplicate rows). Deduplication must be handled programmatically; handled inappropriately, it can lead to reporting errors.
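The missing sequence column (con 2) and the missing SET-table semantics (con 15) are commonly worked around with standard SQL; the table and column names below are illustrative:

```sql
-- Generate a surrogate sequence number per row with a window function.
SELECT ROW_NUMBER() OVER (ORDER BY sale_date, sale_id) AS seq_no,
       sale_id, store_id, amount
FROM fact_sales;                      -- hypothetical table

-- Emulate SET-table behaviour: keep one copy of each duplicate row.
SELECT DISTINCT sale_id, store_id, sale_date, amount
FROM staging_sales;                   -- hypothetical staging table
```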
Teradata Pros and Cons

Pros

1. Teradata is a massively parallel data warehouse with a shared-nothing architecture.
2. Teradata provides pre-built utilities, i.e., FastLoad, MultiLoad, TPT, BTEQ, etc.
3. Teradata is linearly scalable: if data volume rises, AMPs or nodes can be added.
4. Teradata has a fallback feature. If one AMP is down, another AMP takes over for data retrieval.
5. Teradata provides an impressive tool called Teradata Visual Explain, which shows the execution plan of queries graphically. This helps developers and designers fine-tune their queries.
6. Teradata provides the Ferret utility to set and display storage space utilization.
Cons

1. One of the biggest cons of Teradata is that it is not cloud-based out of the box. It requires initial setup, or you need to deploy it with a cloud provider such as AWS or Azure.
2. It is not a columnar data warehouse.
3. Since Teradata is not columnar, it reads entire rows even when you query a single column. You may end up with performance issues unless your data warehouse is properly designed.
4. A query over a set of different columns on a large dataset can lead to performance issues, unless the query runs on indexed columns.
5. Teradata supports a maximum of 128 joins in a single query. If you need more joins, you must break the query into chunks and handle it accordingly.
6. Redshift outperforms Teradata in analytics performance and in visualization of storage and CPU utilization: everything can be viewed in a single AWS console or through CloudWatch. Teradata, on the other hand, provides separate visual tools for some checks, while others require commands run from a Teradata client.
7. Teradata has no default column compression mechanism. Compression must be configured manually, and you can compress up to 256 unique values per column.
8. Teradata has many limits on the number of columns, table values, and table-name length. You can refer to the Teradata documentation for more detailed information.
Features supported only by Redshift, not Teradata

1. The most valuable feature of Redshift is that it is cloud-based and fully managed. (Teradata does offer Teradata Database Developer, a full-featured single-node data warehouse software.)
2. No need to worry about backup and restore; manual snapshots and restores are also available.
3. Backed-up data (snapshots) is automatically stored in S3. There is no need to store data on tape or any outside system.
4. Redshift has an excellent data loading feature in the COPY command, which runs in parallel mode so that all nodes and slices participate together for faster performance.
5. Redshift performs automatic column-level compression, and it can suggest compression mechanisms for all table columns (via the ANALYZE COMPRESSION command).
6. Thanks to the VPC feature in AWS, Redshift security is tight and well controlled.
Features supported only by Teradata, not Redshift

1. Teradata supports various features including procedures, triggers, etc.
2. Teradata has a column sequencing feature, while Redshift doesn't.
3. Teradata provides various load and unload utilities, i.e., TPT, FastLoad, FastExport, MultiLoad, TPump, and BTEQ. You can choose among them depending on data volume and business logic, and leverage them in your ETL logic.
4. Teradata has visual utilities that Redshift lacks, such as Teradata Visual Explain. In Redshift, you need to run a query with EXPLAIN to view the plan.
5. Teradata supports MULTISET and SET tables, while Redshift doesn't.
6. Teradata supports macros, but Redshift doesn't. Macros are a set of predefined SQL statements logically stored in the database. Macros also reduce LAN traffic.

Example:

CREATE MACRO Get_Sales AS (
    SELECT SalesId, StoreId, StoreName, StoreAddress
    FROM Stores
    ORDER BY StoreId;
);

EXEC Get_Sales;

This EXEC command retrieves all rows from the Stores table.
Redshift Vs Teradata In A Nutshell

Cloud perspective
- Redshift: Fully managed data warehouse over the cloud.
- Teradata: The core data warehouse is not over the cloud; initial setup is required by DBAs/experts. Teradata can be scaled to run over the cloud (AWS/Azure) with a pay-as-you-go model.

Backup and restore strategy
- Redshift: Backups are taken care of automatically through the snapshot feature. Snapshots are stored internally in S3, which is highly durable.
- Teradata: Backup and restore can be manual or automated (using BAR), but data is stored in an outside system.

Data load and unload
- Redshift: Data is loaded through the COPY command and unloaded through the UNLOAD command. With COPY, all nodes participate equally in the load for faster performance.
- Teradata: Separate utilities handle load and unload: TPT, FastExport, FastLoad, etc. They can be leveraged accordingly in your ETL/ELT.

Table storage
- Redshift: Columnar storage format. Queries that touch a specific column or set of columns perform impressively; aggregates are very fast because Redshift reads only the columns involved.
- Teradata: Row-level storage. Teradata requires proper indexing on columns so that data is stored properly in the AMPs. If indexes are not proper, or a table is queried on a non-indexed column, performance can suffer.

Internal storage
- Redshift: Data is stored in 1 MB blocks per column. Each block has a zone map that stores the minimum and maximum value of that column in the block.
- Teradata: Data storage is managed by AMPs in vDisks; data is distributed by the hash algorithm (based on the defined index, etc.) and retrieved accordingly.

Referential integrity model
- Redshift: Tables have primary keys and foreign keys, but they are not enforced. You need to apply your own logic to maintain referential integrity.
- Teradata: Tables have primary keys and foreign keys, and they are enforced. Hence there is additional overhead for reference checks during processing.

Sequence support
- Redshift: No concept of column sequencing; if you want a sequence on a column, you must handle it programmatically.
- Teradata: You can define a sequence on a column.

Triggers, stored procedures
- Redshift: No concept of triggers or stored procedures.
- Teradata: You can create triggers and stored procedures.

Visual features
- Redshift: Part of AWS, an integrated service. The entire cluster's performance can be monitored through the AWS console, CloudWatch, and automatic alerts.
- Teradata: Has a few visual tools, such as Teradata Visual Explain, but they are cluttered.

Max concurrency
- Redshift: Maximum of 15 concurrent queries; the default concurrency is 5.
- Teradata: Runs more than 15 concurrent queries.

Macros support
- Redshift: No concept of macros.
- Teradata: Supports macros.

NoSQL load feature
- Redshift: Cannot load NoSQL data from other vendors, but can load data from DynamoDB.
- Teradata: No such feature supported yet.

Maximum storage capacity
- Redshift: 2 PB (128 ds2.8xlarge nodes x 16 TB ≈ 2 PB).
- Teradata: Much more than 2 PB.

Column compression
- Redshift: When a table is created, default compression is applied to all columns automatically. The ANALYZE COMPRESSION command also helps with column compression.
- Teradata: You need to specify compression on individual columns; you can compress up to 128 unique values per column in a table.

Maximum columns per table
- Redshift: Maximum 1600 columns per table.
- Teradata: Maximum 258 columns per row.

Maximum joins
- Redshift: No limit as such.
- Teradata: 64 joins per query block.

Data warehouse maintenance/updates
- Redshift: Applies regular patches and performs automatic maintenance inside the maintenance window.
- Teradata: DBAs need to take care of these activities manually or through a tool.

Table indexes
- Redshift: No table index concept, but performance is unaffected thanks to zone maps and sort key features.
- Teradata: Provides various types of indexes, i.e., Primary Index, Secondary Index, etc.

Table partitioning
- Redshift: Redshift Spectrum supports it, but Redshift itself doesn't.
- Teradata: Tables can be partitioned.

Fault tolerance
- Redshift: Fault tolerant. If a node fails, Redshift automatically replaces it with a replacement node. Multi-AZ, however, is not supported.
- Teradata: Also fault tolerant. If an AMP fails, the fallback AMP takes over automatically.
Pricing and Effort Comparison

Redshift leads Teradata in both effort and pricing: it is cheaper and easier to run than Teradata. For Redshift, you only need to spin up the cluster, configure security settings and a few other options (maintenance window, snapshot schedule, etc.), and you are ready to go, which reduces DBA effort.

In terms of storage, however, Teradata has the upper hand, because a Redshift cluster has size limitations. In Redshift, this can still be handled through S3, which has no practical space limit.

Remember, the Teradata and Redshift data warehouses are designed to solve different purposes.

You can refer to the Redshift and Teradata pricing pages to learn about pricing.
When and How to Migrate Data from Teradata to Redshift

Several considerations determine whether to migrate from Teradata to AWS/cloud:

1. How stable is your Teradata warehouse?
2. How much data does your Teradata warehouse hold?
3. How complex is your Teradata data model?
4. What is your current Teradata data latency?
5. How good is your Teradata RDBMS performance?
6. How many BI tools are you using on your Teradata tables/views/cubes?
7. Are you using many Teradata features that Redshift does not support?
8. Will migrating your data warehouse from Teradata to Redshift break your system?
9. What is your budget for maintaining Redshift and other key AWS services post-migration?

If these conditions are satisfied, you can easily migrate your data from Teradata to Redshift. AWS provides two useful services for this: the Database Migration Service (DMS) and the Schema Conversion Tool (SCT). Although handy, these services are not fully automated; some minor manual effort is required. Please refer to the AWS documentation on migrating data from Teradata to Redshift.
Summary

Choosing between Redshift and Teradata is a tough question to answer, as the two solve different purposes. Redshift performs analytics and reporting extremely well. Since Redshift is a columnar data warehouse, its performance is very good for column-focused queries on tables and views and for aggregate functions (SUM, AVG, COUNT(*), etc.). As part of AWS, it is integrated with all the vital AWS services, so you don't need to keep all your data in Redshift alone: you can archive old data in S3 and, if required, leverage Redshift Spectrum to build analytics and reports on top of it. Stored procedures can be handled through the AWS Lambda service. Redshift is a comparatively young data warehouse and is still developing features that other key data warehouses offer.

On the other hand, Teradata is mature and well established. As an RDBMS, Teradata may not match Redshift's performance unless it has a properly designed data model, fully leveraged utilities (FastLoad, MultiLoad, TPT, BTEQ, etc.), and properly tuned tables and views. Some established customers may be reluctant to migrate from Teradata to Redshift; they can also consider a hybrid model.

In conclusion, the debate is ongoing: both Redshift and Teradata have their pros and cons.
ETL Challenges While Working With Amazon Redshift

Data loading is one of the biggest challenges with Redshift. To perform ETL into Redshift, you need to invest precious engineering resources to extract, clean, enrich, and build data pipelines. Writing complex scripts to automate all of this is not easy, and it gets harder if you want to stream your data in real time. Data loss becomes an everyday phenomenon due to issues with changing sources, unstructured and unclean data, incorrect data mapping at the warehouse, and more.

Using a data integration platform like Hevo can solve your Redshift ETL problems. With Hevo, you can move any data into Redshift in minutes, hassle-free. Hevo integrates with a variety of data sources, from SQL and NoSQL databases to SaaS applications, file storage, webhooks, and more, with the click of a button.

Sign up for a free trial here, or view a quick video on how Hevo can help.
About the Author:

Ankur Shrivastava is an AWS Solution Designer with hands-on experience in data warehousing, ETL, and data analytics. He is an AWS Certified Solutions Architect Associate. In his free time, he enjoys outdoor sports.
Looking for a simple and reliable way to bring data from any source to AWS Redshift?

TRY HEVO
SIGN UP FOR FREE TRIAL