Teradata Parallel Data Pump
Teradata Parallel Data Pump Reference - Teradata Developer ...
Chapter 1: Overview
Teradata TPump Utility
fields with the KEY modifier when the fields are part of the primary index of the table. If the DML statements in the Teradata TPump script specify more than one target table, it is up to the script author to make sure that the primary indexes of all the tables match when using the serialization feature.
The serialization feature works by hashing each data record on its key to determine which session transmits the record to the database. The extra overhead in the application therefore comes from the hashing operation itself and from the additional buffering needed to hold data rows when a request is already pending on the session chosen for transmission.
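The session-selection idea can be sketched as follows. This is a minimal illustration, not Teradata's actual row-hash algorithm; CRC32 simply stands in for the hash function, and the function name is hypothetical:

```python
import zlib

def pick_session(key_bytes: bytes, num_sessions: int) -> int:
    """Hash a record's primary-index key to choose a transmit session.

    Records with the same key always hash to the same session, so
    changes to the same row are sent, and applied, in order.
    CRC32 is a stand-in for the real hash function.
    """
    return zlib.crc32(key_bytes) % num_sessions

# Every update for the same key travels down the same session:
first = pick_session(b"1042", 8)
second = pick_session(b"1042", 8)
assert first == second
```

Because two requests touching the same key can never be in flight on different sessions at once, they cannot collide on the same row hash inside the database, which is what makes the deadlock reduction described below possible.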
The serialization feature greatly reduces the potential frequency of database deadlock. Deadlocks can occur when concurrent requests from the application happen to affect rows that share the same hash code within the database. Although the database and Teradata TPump handle deadlocks correctly, the resolution process is time-consuming and adds overhead to the application, because requests that roll back due to deadlock must be re-executed.
In addition to using SERIALIZEON in the BEGIN LOAD command, the SERIALIZEON keyword can also be specified in the DML command. This lets serialization be turned on for just the fields specified. For more information on the DML-based serialization feature, refer to “DML” on page 115.
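A minimal sketch of how DML-based serialization might look in a TPump script. The layout, table, field, and label names here are hypothetical, and the exact option syntax should be checked against the DML command description on page 115:

```
.LAYOUT custrec;
.FIELD cust_id  *  INTEGER  KEY;       /* KEY marks the serialization field */
.FIELD balance  *  DECIMAL(12,2);

.DML LABEL upd_cust SERIALIZEON (cust_id);
UPDATE accounts SET balance = :balance WHERE cust_id = :cust_id;
```

With this arrangement, all statements carrying the same cust_id value are hashed to the same session, as described above.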
Dual Database Strategy
The serialization feature is intended to support a variety of other potential customer applications that go under the general heading of dual database. These are applications that take a live feed of inserts, updates, and deletes from another database and apply them, without any preprocessing, to Teradata Database.
Both Teradata TPump and MultiLoad are potential parts of a dual database strategy. A dual database application generates a DML stream that is routed to Teradata TPump or MultiLoad through a paramod/inmod specific to the application. The choice between Teradata TPump and MultiLoad depends on factors such as the volume of data (higher volumes favor MultiLoad) and the concurrent-access requirements (greater access requirements favor Teradata TPump).
Resource Usage and Limitations<br />
A feature unique to Teradata TPump is the ability to constrain run-time resource usage through the statement rate feature. Teradata TPump provides control over the rate per minute at which statements are sent to the database, and the statement rate correlates directly with resource usage both on the client and in the database. The statement rate can be controlled in two ways: dynamically while the job is running, or scripted into the job with the RATE keyword on the BEGIN LOAD command. Dynamic control over the statement rate is provided by updates to a table on the database.
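A sketch of scripting the statement rate with the RATE keyword; the session count and rate value below are illustrative only, and the full set of BEGIN LOAD options should be taken from the BEGIN LOAD command description:

```
.BEGIN LOAD
    SESSIONS 8
    RATE 600;      /* send at most 600 statements per minute */
```

Choosing a modest RATE caps client and database resource consumption at the cost of a longer-running job; the dynamic table-based control mentioned above allows that trade-off to be adjusted while the job runs.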
In contrast with Teradata TPump, MultiLoad always uses CPU and memory very efficiently. During phase one (assuming that the database is not the bottleneck), MultiLoad will probably bottleneck on the client, consuming significant network or channel resources. During phase two, MultiLoad uses very significant database disk, CPU, and memory resources. In fact, the database limits the number of concurrent MultiLoad, FastLoad, and FastExport jobs for the