08.01.2013 Views

Back Room Front Room 2

Back Room Front Room 2

Back Room Front Room 2

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

86<br />

ENTERPRISE INFORMATION SYSTEMS VI<br />

The same criteria can be used for dependent<br />

variables, namely those involved in a referential<br />

integrity dependencies. In these cases, it is necessary<br />

to look at the join operation as a unique independent<br />

set prior to dealing with the selection operation<br />

itself.<br />

4 TESTS AND RESULTS<br />

The theory presented in this paper has been<br />

implemented and tested in industrial environments,<br />

during the auditing and migration stages of decision<br />

support solutions, under the edge of national funded<br />

project Karma (ADI P060-P31B-09/97).<br />

Table 1 refers to the auditing results in a<br />

Enterprise Resource Planning System of a small<br />

company, regarding its sales information.<br />

In this database, although the relational model<br />

was hard-coded in the application, the engine didn’t<br />

implement the concept of referential integrity.<br />

The referential integrity between the Orders<br />

table and the OrderDetails table, as well as the<br />

referential integrity between Orders and Customers<br />

tables and between OrderDetails and Products<br />

tables, have been tested using sampling auditing and<br />

full auditing processes. To evaluate the samples’<br />

behaviour when dealing with independent variables,<br />

the mean value of a purchasing order as well as the<br />

number of items in regular order were calculated for<br />

the samples. These values were also compared with<br />

real observations in the entire tables. From the final<br />

results some conclusions were taken:<br />

a) The validation of referential integrity over<br />

samples using a classification of the population<br />

presents poor results when dealing with small<br />

samples, with estimations above the real values.<br />

b) The validation of existential integrity (for<br />

example the uniqueness), under the same<br />

circumstances, presents poor results when<br />

dealing with small samples, with estimations<br />

bellow the real values.<br />

c) Mean values and distribution results are also<br />

influenced by the scope of the sample, and must<br />

be transformed by the sample size ratio.<br />

For the referential integrity cases, this is an<br />

expected result since the set of records in the<br />

referring table (say T1) is much larger than the strict<br />

domain of the referred table (say T2). The error of<br />

the estimator must be affected by the percentage of<br />

records involved in the sample. Let:<br />

�� t1 be the sample of the referring table T1;<br />

�� t2 be the sample of the referred table T2;<br />

�� �2 be the percentage of records of T2 selected for<br />

the sample (#t2/#T2);<br />

�� �(T1,T2) be the percentage of records in T1 that<br />

validates the R.I. in T2.<br />

It would be expected that E[�(t1,t2)] = �(T1,T2),<br />

but when dealing with samples in the referred table<br />

(T2) the expected value will match E[�(t1,t2)] =<br />

�(T1,T2) * �2. The estimated error is given by � = 1-<br />

� and therefore E[�(t1,t2)] = 1-[1-�(T1,T2)] * �2.<br />

Table 1 and figures 1, 2 and 3 show the referential<br />

integrity problems detected �(T 1,T 2), the sampling<br />

error �(t1,t2) and the expected error value for each<br />

case E[�(t 1,t 2)].<br />

It is possible to establish the same corrective<br />

parameter when dealing with existential integrity,<br />

frequencies or distributions.<br />

Several other tests were made in medium size<br />

and large size databases, corroborating the results<br />

presented above (Cortes, 2002).<br />

Table 1: Referential integrity tests on a ERP database<br />

(OD) OrderDetails, (O) Orders<br />

(P) Products and(C) Customers tables<br />

R.I. �(T 1,T2) �2 �(t1,t2) E[�(t1,t2)]<br />

Sample I (90% confidence, 5% accuracy)<br />

OD�O 5.8% 25.3% 72.1% 76.1%<br />

OD�P 12.9% 77.3% 30.4% 32.6%<br />

O�C 4.7% 74.4% 27.2% 29.1%<br />

Sample II (95% confidence, 5% accuracy)<br />

OD�O 5.8% 32.5% 66.6% 69.3%<br />

OD�P 12.9% 82.6% 23.0% 28.0%<br />

O�C 4.7% 81.1% 22.0% 22.8%<br />

Sample III (98% confidence, 5% accuracy)<br />

OD�O 5.8% 40.4% 58.3% 61.9%<br />

OD�P 12.9% 86.6% 22.0% 24.5%<br />

O�C 4.7% 85.5% 19.0% 18.5%<br />

Sample IV (99% confidence, 2% accuracy)<br />

OD�O 5.8% 83.7% 19.5% 21.1%<br />

OD�P 12.9% 97.3% 14.5% 15.2%<br />

O�C 4.7% 97.7% 7.5% 6.9%<br />

Sample V (99.5% confidence, 1% accuracy)<br />

OD�O 5.8% 96.0% 9.4% 9.5%<br />

OD�P 12.9% 98.8% 14.4% 14.1%<br />

O�C 4.7% 98.8% 5.5% 5.8%

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!