25.09.2015 Views

Teradata Parallel Data Pump

Teradata Parallel Data Pump Reference - Teradata Developer ...


Chapter 1: Overview

Teradata TPump Utility

Economy of Scale and Performance

MultiLoad performance improves as change volume increases because, in phase two of MultiLoad, changes are applied to target tables in a single pass. All changes for any physical data block are applied using one read and one write of the block. However, the temporary table and the sorting process used by MultiLoad are additional overheads that must be amortized over the volume of changes.

Teradata TPump, on the other hand, performs better for relatively low change volume because there is no temporary table overhead. Teradata TPump becomes expensive for large volumes of data because multiple updates to a physical data block will most likely result in multiple reads and writes of the block.
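The trade-off described above can be sketched as a toy cost model. This is only an illustration under stated assumptions: the function names, the `fixed_overhead` parameter (a stand-in for MultiLoad's temporary table and sort cost), and all numbers are invented for this example and are not measurements or part of either utility.

```python
def multiload_block_io(changes: int, total_blocks: int, fixed_overhead: int) -> int:
    """Toy cost: fixed setup cost plus one read and one write per touched block."""
    # Assumption: changes land on distinct blocks until every block is touched.
    touched_blocks = min(changes, total_blocks)
    return fixed_overhead + 2 * touched_blocks


def tpump_block_io(changes: int) -> int:
    """Toy worst case: every change reads and writes its block separately."""
    return 2 * changes


if __name__ == "__main__":
    # Hypothetical table of 50,000 blocks and a made-up fixed overhead.
    for changes in (100, 10_000, 1_000_000):
        ml = multiload_block_io(changes, total_blocks=50_000, fixed_overhead=20_000)
        tp = tpump_block_io(changes)
        print(f"{changes:>9} changes: MultiLoad ~{ml} I/Os, TPump ~{tp} I/Os")
```

In this model the TPump-style cost grows with every change, while the MultiLoad-style cost is capped by the number of blocks once the fixed overhead is paid, which mirrors the qualitative argument in the text.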

Loading No Primary Index (NoPI) Tables

A NoPI table has no primary index. These tables can be used as staging tables where data is always appended to the table, which generally makes populating the table faster than populating a traditional table that has a primary index.

NoPI tables may improve performance for Teradata TPump Array INSERT operations.

Multiple Statement Requests

The most important technique used by Teradata TPump to improve performance over MultiLoad is the multiple statement request. Placing more statements in a single request is beneficial for two reasons. First, it reduces network overhead because large messages are more efficient than small ones. Second, in ROBUST mode, it reduces Teradata TPump recovery overhead, which amounts to one extra database row written for each request. Teradata TPump automatically packs multiple statements into a request based upon the PACK specification in the BEGIN LOAD command.
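The effect of packing can be sketched in a few lines of Python. This is a simplified illustration, not Teradata TPump's implementation: `pack_requests` and `robust_log_rows` are hypothetical helper names, and the only facts taken from the text are that statements are grouped by a pack factor and that ROBUST mode writes one extra row per request.

```python
from typing import Iterator, List


def pack_requests(statements: List[str], pack: int) -> Iterator[List[str]]:
    """Group statements into requests of at most `pack` statements each."""
    if pack < 1:
        raise ValueError("pack must be at least 1")
    for start in range(0, len(statements), pack):
        yield statements[start:start + pack]


def robust_log_rows(statement_count: int, pack: int) -> int:
    """Recovery rows written in ROBUST mode: one per request (ceiling division)."""
    if pack < 1:
        raise ValueError("pack must be at least 1")
    return -(-statement_count // pack)


if __name__ == "__main__":
    statements = [f"statement {i}" for i in range(10)]
    for pack in (1, 4):
        requests = list(pack_requests(statements, pack))
        print(f"pack={pack}: {len(requests)} requests, "
              f"{robust_log_rows(len(statements), pack)} recovery rows")
```

A larger pack factor means fewer, larger messages and fewer recovery rows for the same number of statements.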

Macro Creation

Teradata TPump modifies tables by using macros rather than the actual DML commands. The technique of changing statements into equivalent macros before beginning the job greatly improves performance.

Specifically, the benefits of using macros are:

• The size of network (and channel) messages sent to the database by Teradata TPump is reduced.

• Teradata Database parsing engine overhead is reduced because the execution plans (or steps) for macros are cached and re-used. This eliminates normal parser handling, where each request sent by Teradata TPump is planned and optimized.
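The plan-caching benefit in the second bullet can be illustrated with a toy cache keyed on request text. This is a conceptual sketch only: `ToyParser`, the table `t`, and `hypothetical_macro` are invented for the example, and the real parsing engine is far more sophisticated than a dictionary lookup.

```python
class ToyParser:
    """Counts how often a request text has to be planned from scratch."""

    def __init__(self) -> None:
        self.plan_cache = {}
        self.plans_built = 0

    def execute(self, request_text: str) -> None:
        # Build a plan only when this exact request text has not been seen before.
        if request_text not in self.plan_cache:
            self.plans_built += 1
            self.plan_cache[request_text] = f"plan-{self.plans_built}"


rows = [(1, "a"), (2, "b"), (3, "c")]

# Literal DML: the text differs for every row, so every request is planned.
literal = ToyParser()
for k, v in rows:
    literal.execute(f"INSERT INTO t (k, v) VALUES ({k}, '{v}');")

# Macro call: the text is identical for every row, so one plan is reused.
macro = ToyParser()
for _ in rows:
    macro.execute("EXEC hypothetical_macro (:k, :v);")

print(literal.plans_built, macro.plans_built)
```

In this sketch the data values travel separately from the request text, which is why the macro call stays constant from row to row.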

Because the space required by macros is negligible, the only issue regarding macros is where they are placed in the database. Macros are put into the database that contains the restart log table or the database specified using the MACRODB keyword in the BEGIN LOAD command.

Locking and Transactional Logic

In contrast to MultiLoad, Teradata TPump uses conventional row hash locking, which allows for some amount of concurrent read and write access to its target tables. At any point,

