09.02.2013 Views

Rdb Continuous LogMiner and the JCC LogMiner Loader - Oracle


Rdb Continuous LogMiner
and the JCC LogMiner Loader
A presentation of MnSCU’s use of this technology


Topics
• Overview of MnSCU
• Logmining Uses
• Loading Data
• Bonus: Global Buffer implementation and resulting performance improvements


Overview of MnSCU


Overview of MnSCU<br />

• Minnesota State College and University System
– Comprised of 32 Institutions
• State Universities, Community and Technical Colleges
• 53 Campuses in 46 Communities
• Over 16,000 faculty and staff
• More than 3,600 degree programs
– Serves over 240,000 Students per year
• Additional 130,000 Students in non-credit courses
– About 30,000 graduates each year


ISRS
• MnSCU’s Primary Application
– ISRS: Integrated State-wide Record System
• Written in Uniface (4GL), COBOL, C, Java
– 2,000+ 3GL programs; 2,200+ 4GL forms
– Over 2,900,000 lines of code


MnSCU’s Rdb Topology
• North Region: 9 Institution Dbs, 1 Regional Db
• South Region: 8 Institution Dbs, 1 Regional Db
• Central Region: 7 Institution Dbs, 1 Regional Db, 1 Central Db
• Metro Region: 13 Institution Dbs, 1 Regional Db
• Development: 20+ Dvlp/QC/Train Dbs
• Each Institution Db has over 1200 tables
• Over 900,000,000 rows
• Over 1 terabyte total disk space
• Over 20% annual data growth


Production Users
• Each regional center supports:
– Between 500 and 1,000 on-line users (during the day)
– Numerous batch reporting and update jobs daily and over-night
– 25,000+ Web transactions each day 24x7
• Registrations, Grades, Charges (fees), On-line Payments, etc…


LogMining


LogMining Modes
• Static
– The Rdb LogMiner runs, by default, as a stand-alone process against backup copies of the source database AIJ files
• Continuous
– The Rdb LogMiner can run against live database AIJs to produce a continuous output stream that captures transactions when they are committed


LogMining Uses<br />

• Hot Standby (replication) replacement
• Mining a single Db to multiple targets
• Mining to Non-Rdb Target(s)
– XML, File, API, Tuxedo, Oracle
• Mining multiple Dbs to a single target
• Minimizing production Db maintenance downtime


LogMining at MnSCU<br />

The combination of the Rdb Continuous LogMiner and the JCC LogMiner Loader allows us to:
• Distribute centrally controlled data to multiple local databases
• Replicate production databases into multiple partitioned query databases
• Roll up multiple production databases into a single data warehouse
• Replicate production data into non-Rdb development databases to support development of database-independent applications


The Big Picture
[Diagram. Labels recoverable from the slide:]
• CENTRALDB: CLM w/PK, 4 sessions, to the REGIONALDBs
• NWR db: CLM w/DBK, 4 sessions; reporting Dbs NTC, FFC, TRF, BTC
• HS: 36 Dbs; 36 Standby Rdb Reporting Dbs
• 37 Production Rdb DBs: CLM w/PK (37 sessions), CLM w/DBK (37 sessions), CLM w/FilterMaps DBK+RC_ID (37 sessions)
• Targets: CC_SUMMARY, combined (all 37); MNSCUALL schema, 5 tables, combined (all 37); ISRS, 37 schemas each with 238 tables; CORE tables, combined (all 37); VAL tables, 1 central copy
• WAREHOUSE and REPL: MVs (MV/DJ), MVs to combined tables, MVs to individual schemas, other data warehouse structures
• Consumers: MnOnline Application tables (20+); MnOnline Application, read-only 24 x 7; ad-hoc users, read-only; future


Replication Replacement
• Historically we’ve used Rdb’s Hot Standby feature to maintain reporting Dbs for users (ODBC and some batch reports)
• However, of the 1200+ tables in production ISRS Dbs, we found users only use 260 tables for reporting from the standby Dbs
• Batch reports were found to use another 222 tables
• With LogMiner, we can maintain just this subset of tables (482) for reporting purposes
[Diagram: 37 Production Rdb DBs; NWR db; reporting Dbs NTC, FFC, TRF, BTC]


Mining Single Db to Multiple Targets
• We’ve begun combining institutions in some of our production databases
• Introduced a new column called RC_ID to over 700 tables for row-level security
• Row security provided by views
• However, the reporting databases still needed to be separate so users wouldn’t have to change hundreds of existing queries to include RC_ID
[Diagram: 37 Production Rdb DBs; NWR db; reporting Dbs NTC, FFC, TRF, BTC]


‘CORE’ Tables
• ‘CORE’ tables contain common data for MnSCU
• Tables like PERSON, ADDRESS, PHONE, etc. that can be combined across all institutions
• These tables incorporate a new surrogate key called MNSCU_ID which is unique across all institutions
• Users do not access ‘CORE’ tables directly
[Diagram: 37 Production Rdb DBs; NWR db]


‘CORE’ Table Views
• ‘CORE’ data is presented to users via views that join RC_ID (an institution-specific value) and TECH_ID (the original surrogate key) to determine MNSCU_ID, via a third ‘REPORTING’ table
• All ‘CORE’ tables (PERSON, ADDRESS, PHONE, …) are accessed via the views
[Diagram: 37 Production Rdb DBs; NWR db]
View PERSON columns: TECH_ID, RC_ID, rest of attributes
CR_REPORTING columns: MNSCU_ID, TECH_ID, RC_ID
CR_PERSON columns: MNSCU_ID, rest of attributes
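The view arrangement described on this slide can be sketched in a few lines. This is only an illustration, assuming SQLite (through Python's sqlite3 module) as a stand-in for Rdb; the LAST_NAME column and the sample values are hypothetical placeholders for the "rest of attributes".

```python
import sqlite3


def build_core_view_demo():
    """Create stand-ins for CR_PERSON, CR_REPORTING and the PERSON view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CR_PERSON (
            MNSCU_ID  INTEGER PRIMARY KEY,
            LAST_NAME TEXT            -- hypothetical "rest of attributes"
        );
        CREATE TABLE CR_REPORTING (
            MNSCU_ID INTEGER,
            TECH_ID  TEXT,
            RC_ID    TEXT
        );
        -- Users see TECH_ID and RC_ID; MNSCU_ID is resolved inside the view.
        CREATE VIEW PERSON AS
            SELECT r.TECH_ID, r.RC_ID, p.LAST_NAME
            FROM CR_REPORTING r
            JOIN CR_PERSON p ON p.MNSCU_ID = r.MNSCU_ID;
    """)
    return conn


conn = build_core_view_demo()
# One person known to two institutions under different TECH_ID values.
conn.execute("INSERT INTO CR_PERSON VALUES (1, 'Olson')")
conn.executemany(
    "INSERT INTO CR_REPORTING VALUES (?, ?, ?)",
    [(1, "T100", "0142"), (1, "T900", "0263")],
)
person_rows = conn.execute(
    "SELECT TECH_ID, RC_ID, LAST_NAME FROM PERSON ORDER BY RC_ID"
).fetchall()
```

In this sketch a single CR_PERSON row surfaces once per institution through the view, which is the effect the slide attributes to the ‘REPORTING’ table.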


Query Performance<br />

• Row-level security in our production dbs (‘CORE’ table views) has increased the query tuning challenges
• Implemented better reporting performance in the target db by using a filtering trigger to populate a table the users query, rather than using the same views as Production
• Reporting Db has different indexes and uses row caching to maximize query performance


‘CORE’ Triggers
[Diagram: the Production Db holds CR_REPORTING (MNSCU_ID, RC_ID, TECH_ID) and CR_PERSON (MNSCU_ID). The Reporting Db holds CR_REPORTING, CR_PERSON and CR_RC_ID (RC_ID). AFTER INSERT trigger: if exists by MNSCU_ID and exists by RC_ID, insert into PERSON.]
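One reading of the trigger diagram is sketched below. It assumes SQLite as a stand-in for Rdb, assumes that CR_RC_ID lists the institutions a given reporting Db serves, and uses a hypothetical LAST_NAME column and trigger name; the actual MnSCU trigger definitions are not shown in the slides.

```python
import sqlite3


def build_reporting_db(local_rc_ids):
    """Reporting-Db stand-in with a filtering AFTER INSERT trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CR_RC_ID (RC_ID TEXT PRIMARY KEY);
        CREATE TABLE CR_REPORTING (MNSCU_ID INTEGER, TECH_ID TEXT, RC_ID TEXT);
        CREATE TABLE CR_PERSON (MNSCU_ID INTEGER PRIMARY KEY, LAST_NAME TEXT);
        CREATE TABLE PERSON (TECH_ID TEXT, RC_ID TEXT, LAST_NAME TEXT);

        -- Copy the row into PERSON only when a CR_REPORTING row exists for
        -- the MNSCU_ID and its RC_ID is one this reporting Db serves.
        CREATE TRIGGER CR_PERSON_AFTER_INSERT AFTER INSERT ON CR_PERSON
        BEGIN
            INSERT INTO PERSON (TECH_ID, RC_ID, LAST_NAME)
            SELECT r.TECH_ID, r.RC_ID, NEW.LAST_NAME
            FROM CR_REPORTING r
            WHERE r.MNSCU_ID = NEW.MNSCU_ID
              AND r.RC_ID IN (SELECT RC_ID FROM CR_RC_ID);
        END;
    """)
    conn.executemany(
        "INSERT INTO CR_RC_ID VALUES (?)", [(rc,) for rc in local_rc_ids]
    )
    return conn


db = build_reporting_db(["0142"])
db.executemany(
    "INSERT INTO CR_REPORTING VALUES (?, ?, ?)",
    [(1, "T100", "0142"), (2, "T200", "0263")],
)
db.executemany(
    "INSERT INTO CR_PERSON VALUES (?, ?)", [(1, "Olson"), (2, "Berg")]
)
filtered = db.execute("SELECT TECH_ID, RC_ID, LAST_NAME FROM PERSON").fetchall()
```

Only the row belonging to institution 0142 reaches PERSON, so users of that reporting Db can query a plain table rather than the production views.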


Non-‘CORE’ Tables to Multiple Targets
[Diagram: Production Db NWR holds 0142, 0263, 0215 and 0303. Four LogMiner sessions, filtering for RC_ID=‘0142’, RC_ID=‘0263’, RC_ID=‘0215’ and RC_ID=‘0303’, feed Reporting Dbs 0142, 0263, 0215 and 0303.]
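The per-institution filtering in this diagram amounts to routing each mined change by its RC_ID. The sketch below shows the concept only; it is not the Loader's configuration syntax, and the function name and the dictionary shape of a change record are hypothetical.

```python
def route_by_rc_id(changes, target_rc_ids):
    """Group mined changes by RC_ID, dropping rows no target asks for."""
    routed = {rc_id: [] for rc_id in target_rc_ids}
    for change in changes:
        rc_id = change.get("RC_ID")
        if rc_id in routed:
            routed[rc_id].append(change)
    return routed


# Hypothetical change records mined from the combined NWR database.
mined = [
    {"RC_ID": "0142", "TECH_ID": "T100"},
    {"RC_ID": "0263", "TECH_ID": "T200"},
    {"RC_ID": "0142", "TECH_ID": "T300"},
    {"RC_ID": "9999", "TECH_ID": "T400"},  # no reporting Db for this one
]
routed = route_by_rc_id(mined, ["0142", "0263", "0215", "0303"])
```

Each key of `routed` corresponds to one reporting Db; in the slide, one LogMiner session per RC_ID plays that role.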


Eliminating Remote Attaches
• We also have some centralized data, like ‘codes tables’
• Some jobs require reading this centralized data while processing against each Production ISRS database
– Forces a remote attachment to the central db
• With LogMiner, we can maintain regional copies of this data so these jobs do not have to use remote attaches to get data
[Diagram: CENTRALDB; CLM w/PK, 4 sessions; four REGIONALDBs]


Non-Rdb Target
• Since we have 37 separate institutional databases (and reporting databases), getting combined data for system-wide reporting was difficult
• With LogMiner we can combine data from each of the production ISRS databases into one target, in this case our Oracle warehouse
[Diagram: 37 Production Rdb DBs and NWR db; CLM w/FilterMaps, DBK+RC_ID, 37 sessions; ISRS; MNSCUALL schema, 5 tables, combined (all 37)]


Non-Rdb Target
• With LogMiner we can mine tables from each of our Production ISRS databases into individual Oracle schemas
• This allows us to test our application against a different DBMS
– Structure and data are identical
• Keeps data separate for institutional reporting
[Diagram: 37 Production Rdb DBs and NWR db; CLM w/DBK, 37 sessions; ISRS, 37 schemas each with 238 tables]


MnSCU’s Oracle Topology
ISRS Instance
• 37 Institution Schemas (238 tables each)
• 1 Combined Schema (5 tables)
• 1 VAL Schema (Codes Tables)
Other Reporting Structures
• WAREHOUS Schema
• Several Warehouse Schemas
• Development Schema
• Over 800,000,000 rows (and growing rapidly)
• Over 500 Gb total disk space
• Various ETL


LogMining Scope at MnSCU
• Sessions with Rdb Targets:
– NWR: 482 tables from 1 source to 4 Rdb targets (4 sessions; example of mining 1 to many)
• In these sessions we are separating data from a combined institutional database into separate reporting databases for each institution
• Triggers used to populate base tables from production ‘Core’ tables
– CENTRLDB: 3 tables from 1 source db to 4 target Rdb dbs
• In this session we are taking centralized data and placing copies of it on our regional servers (allows us to maintain these 3 tables centrally without changing our application, which reads the data locally)


LogMining Scope at MnSCU
• Sessions with Oracle Targets:
– ISRS: 238 tables from each of 37 source dbs to 37 Oracle schemas (37 sessions; non-Rdb target; example of mining many to many)
• These sessions allow us to create exact copies of our databases in Oracle
• From this Oracle data, materialized views are built to provide value-added reporting ‘data-marts’
• Allows us to shift our reporting focus to Oracle while continuing to base production on Rdb
• Also allows us to work on converting our application to use Oracle without impacting production or having to do a cold-switch
– MNSCUALL: 5 tables from each of 37 source dbs to 1 Oracle schema (37 sessions; example of mining many sources to 1 target)
• These sessions allow us to build a combined version of data
• These are some of our larger tables, with over 250,000,000 rows
• Total of 9,513 tables being mined by 119 separate continuous LogMiner sessions!
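The stated totals can be reproduced with a little arithmetic, shown here as a hedged reconstruction rather than the author's own tally. It assumes the NWR and CENTRLDB tables are counted once each, the Oracle-bound tables are counted once per source db, and the CC_SUMMARY target that appears only in the Big Picture diagram (CLM w/PK, 37 sessions) contributes one table per source db. That last figure is an assumption chosen because it makes the sums agree.

```python
# (group, tables counted, sessions)
SESSION_GROUPS = [
    ("NWR", 482, 4),
    ("CENTRLDB", 3, 4),
    ("ISRS", 238 * 37, 37),
    ("MNSCUALL", 5 * 37, 37),
    ("CC_SUMMARY (assumed 1 table per source)", 1 * 37, 37),
]

total_tables = sum(tables for _, tables, _ in SESSION_GROUPS)
total_sessions = sum(sessions for _, _, sessions in SESSION_GROUPS)

print(total_tables, total_sessions)
```

Under these assumptions the sums come to 9,513 tables and 119 sessions, matching the slide.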


Session Support<br />

• To support so many sessions we’ve developed a naming convention for sessions
• Includes a specific directory structure
• Built several tools to simplify the task of creating/recreating/reloading tables
• Some of the tools are based on the naming convention
• Currently our tools are all DCL, but better implementations could be made with 3GLs


Source Db Preparation<br />

• The LogMiner input to the Loader is created when the source database is enabled for LogMining
• This is accomplished with an RMU command. The optional parameter ‘continuous’ is used to specify continuous operation:
$ rmu/set logminer/enable[/continuous]
$ rmu/backup/after/quiet ...
• Many of the procedures included with the Loader kit rely on the procedure vms_functions.sql having been applied to the source database:
SQL> attach ‘filename ’;
SQL> @jcc_tool_sql:vms_functions.sql
SQL> commit;


Target Db Preparation<br />

• Besides the task of creating the target db and its tables, there are many ‘details’ to attend to, depending upon target db type
• For Oracle targets, Rdb field and table name lengths (and names) can be an issue (and target tablespaces too)
• One thing in common: the HighWater table
– Used by the Loader to keep track of what has been processed


Minimizing Production Downtime
• Basic Steps:
– Do an AIJ backup
– Create copy of production Db
– Perform restructuring / maintenance / etc. on Db copy
• This could take many hours
– Remove users from Production Db
– Apply AIJ transactions to Db copy using LogMiner
• This step requires minimal time
– Switch applications to use Db copy. This is now the new production Db!


Loading Data


Loading Data
• 3 Methods to accomplish this:
– Direct Load (RMU, SQLLOADER)
• Could impose data restrictions by using views
• Can configure commit-interval
– LogMiner Pump
• Use a no-change update transaction on source
• Allows for data restrictions
• Commit-interval matches source transaction
• Consumes AIJ space
– JCC Data Pump
• Configurable to do parent/child tables, data restrictions, commit-interval and delay-interval to minimize performance impact
• Consumes AIJ space


Loading Data Example
• Loaded a table with 19,986 rows with SQLLOADER
– This took about 13 minutes, no AIJ blocks, some disk space for .UNL file
• Used LogMiner to ‘pump’ the rows via a no-change update
– This took about 10 minutes; 20,000 AIJ blocks
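For comparison, the two timings can be turned into rough throughput figures. This is a back-of-the-envelope sketch that treats "about 13 minutes" and "about 10 minutes" as exact, which the slide does not claim.

```python
ROWS = 19_986


def rows_per_second(rows, minutes):
    """Average load rate for a run lasting the given number of minutes."""
    return rows / (minutes * 60)


sqlloader_rate = rows_per_second(ROWS, 13)  # direct load with SQLLOADER
pump_rate = rows_per_second(ROWS, 10)       # LogMiner pump, no-change update
aij_blocks_per_row = 20_000 / ROWS          # AIJ cost of the pump approach

print(f"{sqlloader_rate:.1f} rows/s, {pump_rate:.1f} rows/s, "
      f"{aij_blocks_per_row:.2f} AIJ blocks per row")
```

On these figures the pump is somewhat faster per row, at a cost of roughly one AIJ block per row pumped.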


Pumping Data Example
• Using the JCC Data Pump
– About 22 seconds to update source data
– Only 3 minutes to update target data!!


Loader Performance
• SCSU_REPISRS session
– From CLM log:
17-NOV-2004 16:15:42.26 20234166 CLM SCSU_REPISR Total : 36381 records written (36048 modify, 333 delete)
– The 3 processes themselves:
• CTL: 14.6 CPU secs / 1667 Direct IO
• CLM: 1 min 11.4 CPU secs / 67,112 Direct IO
• LML: 7 min 55.7 CPU secs / 67 Direct IO
– After 11:15 hours of connect time, this is about 1.6 Direct IO per second average
– Processed about .9 records per second average
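The per-second averages follow from the log figures above. The sketch below assumes "11:15 hours" means 11 hours 15 minutes of connect time; it does not try to decide whether the slide's Direct IO figure covers all three processes or the CLM process alone, so both are computed.

```python
CONNECT_SECONDS = 11 * 3600 + 15 * 60  # assumed reading of "11:15 hours"

RECORDS_WRITTEN = 36_048 + 333         # modify + delete, from the CLM log
DIRECT_IO = {"CTL": 1_667, "CLM": 67_112, "LML": 67}

records_per_second = RECORDS_WRITTEN / CONNECT_SECONDS
clm_io_per_second = DIRECT_IO["CLM"] / CONNECT_SECONDS
total_io_per_second = sum(DIRECT_IO.values()) / CONNECT_SECONDS

print(f"{records_per_second:.2f} records/s, "
      f"{clm_io_per_second:.2f} CLM IO/s, {total_io_per_second:.2f} total IO/s")
```

Both Direct IO rates land between 1.6 and 1.7 per second, consistent with the slide's "about 1.6", and the record rate rounds to the quoted .9 per second.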


Heartbeat<br />

• Without Heartbeat a session can become ‘stale’
• AIJ backups can be blocked
• With Heartbeat enabled this does not occur
– Side effect is that ‘trailing’ messages are not displayed with Heartbeat enabled


Bonus: Global Buffers
• Despite great performance gains from Row Cache over the past couple of years, we still were anticipating issues for our fall busy period
• We turned on Global Buffers (GB) on many Dbs
– On our busiest server, we enabled it on all dbs
– On other servers, we have about 50% implementation
• Used to run with RDM$BIND_BUFFERS of 220
• Estimated GB at max number of users at 200 each


Global Buffers
• Prior to implementing GB, our busiest server was running at a constant 6,000-7,000 IO/sec
• Other servers were running around 3,000 but had spikes to 7,000 or more
• Global buffers both lowered overall IO and eliminated spikes
• Cost is in total Locks (Resources)
– Increase LOCKIDTBL and RESHASHTBL


[Chart: Before GB, METE. Vertical axis 0 to 7,000; series 10-11 IO, 2-3 IO, 10-11 PRC, 2-3 PRC; dates 28-Jul, 2-Aug and 3-Aug, each marked 1 w/Gb.]


[Chart: After GB, METE. Vertical axis 0 to 7,000; series 10-11 IO, 2-3 IO, 10-11 PRC, 2-3 PRC; dates 28-Jul through 7-Oct, marked 1 w/Gb (28-Jul, 3-Aug), 7 w/Gb (5-Aug), 9 w/gb (9-Aug, 11-Aug) and 11 w/gb (13-Aug through 21-Sep).]


[Chart: After Gb, MNSCU1. Vertical axis 0 to 6,000; series 10-11 IO, 2-3 IO, 10-11 PRC, 2-3 PRC; dates 28-Jul through 7-Oct, marked 0 w/Gb (28-Jul, 3-Aug), 1 w/Gb (5-Aug, 9-Aug), 2 w/gb (11-Aug) and 4 w/gb (13-Aug through 21-Sep).]


Resources


CPU
• Since we are now using less IO, more CPU is available
• Before:
SDA> lck show lck /rep=5/int=10
23-AUG-2004 14:28:48.80 Delta sec: 10.0 Ave Spin: 24005
Ave Req: 39875 Req/sec: 15656.9 Busy: 62.4%
23-AUG-2004 14:28:58.80 Delta sec: 10.0 Ave Spin: 19121
Ave Req: 30166 Req/sec: 20289.7 Busy: 61.2%
• After:
24-AUG-2004 11:44:01.90 Delta sec: 10.0 Ave Spin: 12846
Ave Req: 13439 Req/sec: 38043.3 Busy: 51.1%
24-AUG-2004 11:44:11.90 Delta sec: 10.0 Ave Spin: 16629
Ave Req: 15514 Req/sec: 31109.1 Busy: 48.3%


Lock Rates
• Reducing these numbers to Lock Operations per 1% of CPU time yields:
– Before: 299
– After: 573


For More Information<br />

• Miles.Oustad@CSU.MNSCU.EDU<br />

• (218) 755-4614


Questions & Answers
