Teradata Parallel Data Pump
Teradata Parallel Data Pump Reference - Teradata Developer ...
Chapter 1: Overview
Teradata TPump Utility
fields with the KEY modifier when the fields are part of the primary index of the table. If the DML statements in the Teradata TPump script specify more than one target table, it is up to the script author to make sure that the primary indexes of all the tables match when using the serialization feature.
The serialization feature works by hashing each data record on its key to determine which session transmits the record to the database. The extra overhead in the application therefore comes from the hashing operation itself and from the additional buffering needed to hold data rows when a request is already pending on the session chosen for transmission.
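The session-selection idea can be sketched as follows. This is a minimal illustration, not Teradata's actual row-hash algorithm; CRC32 simply stands in for the hash function, and the function name is hypothetical:

```python
import zlib

def pick_session(key_bytes: bytes, num_sessions: int) -> int:
    """Hash a record's primary-index key to choose a transmit session.

    Records with the same key always hash to the same session, so
    changes to the same row are sent, and applied, in order.
    CRC32 is a stand-in for the real hash function.
    """
    return zlib.crc32(key_bytes) % num_sessions

# Every update for the same key travels down the same session:
first = pick_session(b"1042", 8)
second = pick_session(b"1042", 8)
assert first == second
```

Because two requests touching the same key can never be in flight on different sessions at once, they cannot collide on the same row hash inside the database, which is what makes the deadlock reduction described below possible.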
The serialization feature greatly reduces the potential frequency of database deadlock. Deadlocks can occur when concurrent requests from the application happen to affect rows that share the same hash code within the database. Although the database and Teradata TPump handle deadlocks correctly, the resolution process is time-consuming and adds overhead to the application, because requests that roll back due to deadlock must be re-executed.
In addition to using SERIALIZEON in the BEGIN LOAD command, the SERIALIZEON keyword can also be specified in the DML command. This lets serialization be turned on for just the fields specified. For more information on the DML-based serialization feature, refer to “DML” on page 115.
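A minimal sketch of how DML-based serialization might look in a TPump script. The layout, table, field, and label names here are hypothetical, and the exact option syntax should be checked against the DML command description on page 115:

```
.LAYOUT custrec;
.FIELD cust_id  *  INTEGER  KEY;       /* KEY marks the serialization field */
.FIELD balance  *  DECIMAL(12,2);

.DML LABEL upd_cust SERIALIZEON (cust_id);
UPDATE accounts SET balance = :balance WHERE cust_id = :cust_id;
```

With this arrangement, all statements carrying the same cust_id value are hashed to the same session, as described above.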
Dual Database Strategy
The serialization feature is intended to support a variety of other potential customer applications that go under the general heading of dual database. These are applications that take a live feed of inserts, updates, and deletes from another database and apply them, without any preprocessing, to Teradata Database.
Both Teradata TPump and MultiLoad are potential parts of a dual database strategy. A dual database application generates a DML stream that is routed to Teradata TPump or MultiLoad through a paramod/inmod specific to the application. The choice between Teradata TPump and MultiLoad depends on factors such as the volume of data (higher volumes favor MultiLoad) and the concurrent-access requirements (greater access requirements favor Teradata TPump).
Resource Usage and Limitations<br />
A feature unique to Teradata TPump is the ability to constrain run-time resource usage through the statement rate feature. Teradata TPump provides control over the rate per minute at which statements are sent to the database, and the statement rate correlates directly with resource usage both on the client and in the database. The statement rate can be controlled in two ways: dynamically while the job is running, or scripted into the job with the RATE keyword on the BEGIN LOAD command. Dynamic control over the statement rate is provided by updates to a table on the database.
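A sketch of scripting the statement rate with the RATE keyword; the session count and rate value below are illustrative only, and the full set of BEGIN LOAD options should be taken from the BEGIN LOAD command description:

```
.BEGIN LOAD
    SESSIONS 8
    RATE 600;      /* send at most 600 statements per minute */
```

Choosing a modest RATE caps client and database resource consumption at the cost of a longer-running job; the dynamic table-based control mentioned above allows that trade-off to be adjusted while the job runs.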
In contrast with Teradata TPump, MultiLoad always uses CPU and memory very efficiently. During phase one (assuming that the database is not the bottleneck), MultiLoad will probably bottleneck on the client, consuming significant network or channel resources. During phase two, MultiLoad uses very significant database disk, CPU, and memory resources. In fact, the database limits the number of concurrent MultiLoad, FastLoad, and FastExport jobs for the