Teradata Parallel Data Pump
Chapter 1: Overview<br />
<strong>Teradata</strong> T<strong>Pump</strong> Utility<br />
Economy of Scale and Performance<br />
MultiLoad performance improves as change volume increases because, in phase two of<br />
MultiLoad, changes are applied to target tables in a single pass. All changes for any physical<br />
data block are effected using one read and one write of the block. Furthermore, the temporary<br />
table and the sorting process used by MultiLoad are additional overheads that must be<br />
“amortized” through the volume of changes.<br />
<strong>Teradata</strong> T<strong>Pump</strong>, on the other hand, performs better for relatively low change volume because<br />
there is no temporary table overhead. <strong>Teradata</strong> T<strong>Pump</strong> becomes expensive for large volumes<br />
of data because multiple updates to a physical data block will most likely result in multiple<br />
reads and writes of the block.<br />
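The trade-off above can be sketched with a toy I/O model. This is only an illustration under stated assumptions: the function names, the `fixed_overhead` value (a stand-in for the temporary table and sort costs), and the pessimistic assumption that every TPump change re-reads and re-writes its block are all hypothetical, not measured Teradata behavior.

```python
# Toy I/O cost model; all constants are illustrative assumptions.

def multiload_block_io(touched_blocks, fixed_overhead=1000):
    """Single-pass apply: each distinct block is read once and written once.

    fixed_overhead is a hypothetical stand-in for the temporary table and
    sort costs, expressed in the same block I/O units.
    """
    return fixed_overhead + 2 * len(set(touched_blocks))


def tpump_block_io(touched_blocks):
    """Pessimistic case: every change reads and writes its block again."""
    return 2 * len(touched_blocks)


# touched_blocks[i] is the id of the data block hit by change i.
small_job = [7, 7, 42]
large_job = [i % 100 for i in range(10_000)]

for name, job in (("small", small_job), ("large", large_job)):
    print(name, multiload_block_io(job), tpump_block_io(job))
```

In this model the small job is cheaper without the fixed overhead, while the large job is dominated by repeated block reads and writes, which matches the qualitative argument in the text.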
Loading No Primary Index (NoPI) Tables<br />
A NoPI table has no primary index. These tables can be used as staging tables where data is<br />
always appended to the table, making population of the table generally faster than that of a<br />
traditional table containing a primary index.<br />
NoPI tables can improve performance for <strong>Teradata</strong> T<strong>Pump</strong> Array INSERT operations.<br />
Multiple Statement Requests<br />
The most important technique used by <strong>Teradata</strong> T<strong>Pump</strong> to improve performance over<br />
MultiLoad is the multiple statement request. Placing more statements in a single request is<br />
beneficial for two reasons. First, it reduces network overhead because large messages are more<br />
efficient than small ones. Second, in ROBUST mode, it reduces <strong>Teradata</strong> T<strong>Pump</strong> recovery<br />
overhead, which amounts to one extra database row written for each request. <strong>Teradata</strong> T<strong>Pump</strong><br />
automatically packs multiple statements into a request based upon the PACK specification in<br />
the BEGIN LOAD command.<br />
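The effect of packing can be sketched in a few lines. This is a simplified model, not TPump's actual implementation: `pack_requests` and `robust_recovery_rows` are hypothetical helpers, and the only facts taken from the text are that statements are grouped according to a pack factor and that ROBUST mode writes one extra row per request.

```python
def pack_requests(statements, pack):
    """Group statements into requests of at most `pack` statements each."""
    if pack < 1:
        raise ValueError("pack must be at least 1")
    return [statements[i:i + pack] for i in range(0, len(statements), pack)]


def robust_recovery_rows(requests):
    """ROBUST mode writes one extra database row per request (per the text)."""
    return len(requests)


statements = [f"statement {n}" for n in range(10)]  # placeholder statements
unpacked = pack_requests(statements, pack=1)
packed = pack_requests(statements, pack=4)

print(len(unpacked), robust_recovery_rows(unpacked))
print(len(packed), robust_recovery_rows(packed))
```

With a pack factor of 4, the same ten statements travel in fewer, larger requests, and the recovery overhead shrinks in proportion to the number of requests.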
Macro Creation<br />
<strong>Teradata</strong> T<strong>Pump</strong> modifies tables through macros rather than by submitting the actual DML commands.<br />
Converting the statements into equivalent macros before the job begins greatly<br />
improves performance.<br />
Specifically, the benefits of using macros are:<br />
• The size of the network (and channel) messages that <strong>Teradata</strong> T<strong>Pump</strong> sends to the database is<br />
reduced.<br />
• <strong>Teradata</strong> <strong>Data</strong>base parsing engine overhead is reduced because the execution plans (or<br />
steps) for macros are cached and re-used. This eliminates normal parser handling, where<br />
each request sent by <strong>Teradata</strong> T<strong>Pump</strong> is planned and optimized.<br />
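Both benefits can be illustrated with a toy model. Everything here is an assumption made for illustration: `ToyParser` is not a description of Teradata internals, and the table name, macro name, and request texts are hypothetical. The model only encodes the two claims in the list: a macro call is a shorter message than the full statement, and a macro's plan is built once and then re-used.

```python
class ToyParser:
    """Counts how many execution plans are built (illustrative only)."""

    def __init__(self):
        self.macro_plans = {}
        self.plans_built = 0

    def run_dml(self, request_text):
        # Modeled assumption: each plain request is planned from scratch.
        self.plans_built += 1
        return f"plan#{self.plans_built}"

    def run_macro(self, macro_name):
        # Modeled assumption: a macro's plan is cached after first use.
        if macro_name not in self.macro_plans:
            self.plans_built += 1
            self.macro_plans[macro_name] = f"plan#{self.plans_built}"
        return self.macro_plans[macro_name]


# Hypothetical request texts, used only to compare message sizes.
DML_TEXT = "INSERT INTO sales.orders (order_id, amount) VALUES (:order_id, :amount);"
MACRO_CALL = "EXEC logdb.m_orders_001;"

dml_parser = ToyParser()
for _ in range(100):
    dml_parser.run_dml(DML_TEXT)

macro_parser = ToyParser()
for _ in range(100):
    macro_parser.run_macro("logdb.m_orders_001")

print(len(DML_TEXT), len(MACRO_CALL))
print(dml_parser.plans_built, macro_parser.plans_built)
```

The gap between the two plan counts grows with the number of requests, which is why the conversion to macros happens once, before the job starts.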
Because the space required by macros is negligible, the only issue regarding macros is where<br />
they are placed in the database. Macros are put into the database that contains the restart log<br />
table or the database specified using the MACRODB keyword in the BEGIN LOAD command.<br />
Locking and Transactional Logic<br />
In contrast to MultiLoad, <strong>Teradata</strong> T<strong>Pump</strong> uses conventional row hash locking, which allows<br />
for some amount of concurrent read and write access to its target tables. At any point,<br />