
Chapter 1: Overview

Teradata TPump Utility

fields with the KEY modifier when the fields are part of the primary index of the table. If the DML statements in the Teradata TPump script specify more than one target table, it is up to the script author to ensure that the primary indexes of all the tables match when using the serialization feature.

The serialization feature works by hashing each data record on its key to determine which session transmits the record to the database. The extra overhead in the application therefore comes from the hashing operation itself and from the additional buffering needed to hold data rows when a request is already pending on the session chosen for transmission.

The serialization feature greatly reduces the potential frequency of database deadlock. Deadlocks can occur when requests from the application happen to affect rows that share the same hash code within the database. Although deadlocks are handled correctly by both the database and Teradata TPump, the resolution process is time-consuming and adds overhead to the application, which must re-execute any requests that are rolled back due to deadlock.

In addition to appearing in the BEGIN LOAD command, the SERIALIZEON keyword can also be specified in the DML command, which turns serialization on for the specified fields. For more information on the DML-based serialization feature, refer to “DML” on page 115.
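As a minimal sketch of the DML-based form (the label, table, and column names here are hypothetical, and the exact syntax is given in the DML chapter), serialization might be turned on for a single key field like this:

```
.DML LABEL UPDACCT SERIALIZEON (CustNo);
UPDATE Accounts SET Balance = :Balance WHERE CustNo = :CustNo;
```

Hashing on CustNo would then route every statement for a given customer, in order, over the same session.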

Dual Database Strategy

The serialization feature is intended to support a variety of other potential customer applications that fall under the general heading of dual database. These are applications that take a live feed of inserts, updates, and deletes from another database and apply them, without any preprocessing, to Teradata Database.

Both Teradata TPump and MultiLoad are potential parts of the dual database strategy. A dual database application generates a DML stream that is routed to Teradata TPump or MultiLoad through a paramod/inmod specific to the application. The choice between Teradata TPump and MultiLoad depends on factors such as the volume of data (with higher volumes favoring MultiLoad) and the concurrent access requirements (with greater access requirements favoring Teradata TPump).

Resource Usage and Limitations

A feature unique to Teradata TPump is the ability to constrain run-time resource usage through the statement rate feature. Teradata TPump provides control over the rate per minute at which statements are sent to the database, and the statement rate correlates directly with resource usage on both the client and the database. The statement rate can be controlled in two ways: dynamically while the job is running, or by scripting it into the job with the RATE keyword on the BEGIN LOAD command. Dynamic control over the statement rate is provided by updates to a table on the database.
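As an illustrative sketch (the session count, pack factor, rate, and error table name are all hypothetical; the BEGIN LOAD chapter gives the full option list and exact syntax), a scripted statement rate might look like:

```
.BEGIN LOAD
     ERRORTABLE MyDB.TPumpErr
     SESSIONS 8
     PACK 20
     RATE 600;
```

With a RATE of 600, Teradata TPump would send at most 600 statements per minute to the database, regardless of how quickly the input data can be read.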

In contrast with Teradata TPump, MultiLoad always uses CPU and memory very efficiently. During phase one (assuming that the database is not a bottleneck), MultiLoad will probably bottleneck on the client, consuming significant network or channel resources. During phase two, MultiLoad uses very significant database disk, CPU, and memory resources. In fact, the database limits the number of concurrent MultiLoad, FastLoad, and FastExport jobs for the

Teradata Parallel Data Pump Reference 21
