Back Room Front Room 2

More documents

Recommendations

Info

86 ENTERPRISE INFORMATION SYSTEMS VI The same criteria can be used for dependent variables, namely those involved in a referential integrity dependencies. In these cases, it is necessary to look at the join operation as a unique independent set prior to dealing with the selection operation itself. 4 TESTS AND RESULTS The theory presented in this paper has been implemented and tested in industrial environments, during the auditing and migration stages of decision support solutions, under the edge of national funded project Karma (ADI P060-P31B-09/97). Table 1 refers to the auditing results in a Enterprise Resource Planning System of a small company, regarding its sales information. In this database, although the relational model was hard-coded in the application, the engine didn’t implement the concept of referential integrity. The referential integrity between the Orders table and the OrderDetails table, as well as the referential integrity between Orders and Customers tables and between OrderDetails and Products tables, have been tested using sampling auditing and full auditing processes. To evaluate the samples’ behaviour when dealing with independent variables, the mean value of a purchasing order as well as the number of items in regular order were calculated for the samples. These values were also compared with real observations in the entire tables. From the final results some conclusions were taken: a) The validation of referential integrity over samples using a classification of the population presents poor results when dealing with small samples, with estimations above the real values. b) The validation of existential integrity (for example the uniqueness), under the same circumstances, presents poor results when dealing with small samples, with estimations bellow the real values. c) Mean values and distribution results are also influenced by the scope of the sample, and must be transformed by the sample size ratio. For the referential integrity cases, this is an expected result since the set of records in the referring table (say T1) is much larger than the strict domain of the referred table (say T2). The error of the estimator must be affected by the percentage of records involved in the sample. Let: �� t1 be the sample of the referring table T1; �� t2 be the sample of the referred table T2; �� 2 be the percentage of records of T2 selected for the sample (#t2/#T2); �� (T1,T2) be the percentage of records in T1 that validates the R.I. in T2. It would be expected that E[�(t1,t2)] = �(T1,T2), but when dealing with samples in the referred table (T2) the expected value will match E[�(t1,t2)] = �(T1,T2) * �2. The estimated error is given by � = 1- � and therefore E[�(t1,t2)] = 1-[1-�(T1,T2)] * �2. Table 1 and figures 1, 2 and 3 show the referential integrity problems detected �(T 1,T 2), the sampling error �(t1,t2) and the expected error value for each case E[�(t 1,t 2)]. It is possible to establish the same corrective parameter when dealing with existential integrity, frequencies or distributions. Several other tests were made in medium size and large size databases, corroborating the results presented above (Cortes, 2002). Table 1: Referential integrity tests on a ERP database (OD) OrderDetails, (O) Orders (P) Products and(C) Customers tables R.I. �(T 1,T2) �2 �(t1,t2) E[�(t1,t2)] Sample I (90% confidence, 5% accuracy) OD�O 5.8% 25.3% 72.1% 76.1% OD�P 12.9% 77.3% 30.4% 32.6% O�C 4.7% 74.4% 27.2% 29.1% Sample II (95% confidence, 5% accuracy) OD�O 5.8% 32.5% 66.6% 69.3% OD�P 12.9% 82.6% 23.0% 28.0% O�C 4.7% 81.1% 22.0% 22.8% Sample III (98% confidence, 5% accuracy) OD�O 5.8% 40.4% 58.3% 61.9% OD�P 12.9% 86.6% 22.0% 24.5% O�C 4.7% 85.5% 19.0% 18.5% Sample IV (99% confidence, 2% accuracy) OD�O 5.8% 83.7% 19.5% 21.1% OD�P 12.9% 97.3% 14.5% 15.2% O�C 4.7% 97.7% 7.5% 6.9% Sample V (99.5% confidence, 1% accuracy) OD�O 5.8% 96.0% 9.4% 9.5% OD�P 12.9% 98.8% 14.4% 14.1% O�C 4.7% 98.8% 5.5% 5.8%
RELATIONAL SAMPLING FOR DATA QUALITY AUDITING AND DECISION SUPPORT 87 Figure 1: OrderDetails � Orders dependency Figure 2: OrderDetails � Products dependency Figure 3: Orders � Customers dependency The use of estimators to determine data quality based on join and selection operations produces good results, in particular when dealing with large volumes of data. Table 2 indicates the results of referential integrity estimations on an industrial environment. The testing environment was a major university database with data quality problems after a partial database migration. Since several records of students, courses and grades were not completely migrated to the new database, the RDBMS presented a poor behaviour in terms of referential integrity, among other things. To a more significant number of records (between 700.000 to 1.000.000 or more), estimation must be taken into consideration as a preliminary auditing tool, saving time and resources. The equivalent tests in the entire database universe took several hours of CPU time in a parallel virtual machine with 8 nodes. In this particular case, the choice of the best estimator was previously decided with an uniformity test as describe in the previous chapter. Comparing the number of students and courses in the university with the number of grades, we might say that data is contained within an almost uniform interval, which makes it appropriate for the use of Jackknife estimator. Several other tests were made and reported in (Cortes, 2002). Table 2: Integrity estimation on university IS (G) Grades, (C) Courses and (S) Students tables R.U. #T1 �(T 1,T 2) �JK �(t 1,t 2) G-�S 711043 12.4% 563019 20.5% G�C 711043 59.4% 327185 55.0% 5 CONCLUSIONS AND FURTHER WORK The research described in this paper presents a different strategy for data quality auditing processes based on data sampling. Our strategy is particularly useful if adopted at the earlier stages of an auditing project. It saves time and resources in the identification of critical inconsistencies and guides the detailed auditing process itself. The representative samples can further be used in determining association rules or evaluating times series, two areas more related with decision support itself. But even though the results achieved are encouraging to proceed with this methodology, it is important to be aware that: �� There is no perfect recipe to produce an universal sample. Each case must be approached according to the data’s profile – size, distribution, dependencies among other issues – and auditing purposes. �� Sampling will not produce accurate results, only good estimators. It will give us a general picture of the state of the art of a database, but more accurate processes – such as data cleansing – must involve an entire data set treatment. The clustered analysis of data (“divide and conquer”) maintaining data dependencies is an efficient and accurate method and can be optimised
Page 1 and 2:
www.dbeBooks.com - An Ebook Library
Page 3 and 4:
A C.I.P. Catalogue record for this
Page 5 and 6:
vi TABLE OF CONTENTS PART 1 - DATAB
Page 7 and 8:
viii TABLE OF CONTENTS A WIRELESS A
Page 9 and 10:
CONFERENCE COMMITTEE Honorary Presi
Page 11 and 12:
Hung, P. (AUSTRALIA) Jahankhani, H.
Page 13 and 14:
Invited Papers
Page 15 and 16:
2 ENTERPRISE INFORMATION SYSTEMS VI
Page 17 and 18:
Page 19 and 20:
Page 21 and 22:
Page 23 and 24:
10 ENTERPRISE INFORMATION SYSTEMS V
Page 25 and 26:
Page 27 and 28:
Page 29 and 30:
Page 31 and 32:
Page 33 and 34:
Page 35 and 36:
Page 37 and 38:
EVOLUTIONARY PROJECT MANAGEMENT: MU
Page 39 and 40:
Page 41 and 42:
Page 43 and 44:
MANAGING COMPLEXITY OF ENTERPRISE I
Page 45 and 46:
Page 47 and 48: 34 ENTERPRISE INFORMATION SYSTEMS V
Page 67 and 68: ASSESSING EFFORT PREDICTION MODELS
Page 75 and 76: ORGANIZATIONAL AND TECHNOLOGICAL CR
Page 79 and 80: ASAP Processes W Project Kickoff A
Page 83 and 84: since it means changing the relevan
Page 85 and 86: ACME-DB: AN ADAPTIVE CACHING MECHAN
Page 87 and 88: ACME-DB: AN ADAPTIVE CACHING MECHAN
Page 89 and 90: Hit Rate (%) ACME-DB: AN ADAPTIVE C
Page 91 and 92: 5.3.6 Effect on all Policies by Int
Page 93 and 94: ACKNOWLEDGEMENTS We would like to t
Page 95 and 96: This paper describes the data sampl
Page 97: The major limit A of the operation
Page 101 and 102: TOWARDS DESIGN RATIONALES OF SOFTWA
Page 103 and 104: ((Brown et al., 1998), pp. 159-190,
Page 105 and 106: sive data filtering, presentation (
Page 107 and 108: • implementation changes in servi
Page 109 and 110: MEMORY MANAGEMENT FOR LARGE SCALE D
Page 111 and 112: Data Rate [bytes/sec] 1600000 14000
Page 113 and 114: E.g., DV Camcorder Camera Packets (
Page 115 and 116: Procedure CheckOptimal (n rs 1 ...n
Page 117 and 118: MEMORY MANAGEMENT FOR LARGE SCALE D
Page 119 and 120: PART 2 Artificial Intelligence and
Page 121 and 122: 110 ENTERPRISE INFORMATION SYSTEMS
Page 127 and 128: DYNAMIC MULTI-AGENT BASED VARIETY F
Page 149 and 150:
138 ENTERPRISE INFORMATION SYSTEMS
Page 151 and 152:
Page 153 and 154:
Page 155 and 156:
Page 157 and 158:
Page 159 and 160:
Page 161 and 162:
Page 163 and 164:
Page 165 and 166:
Page 167 and 168:
Page 169 and 170:
AN EXPERIENCE IN MANAGEMENT OF IMPR
Page 171 and 172:
Page 173 and 174:
Page 175 and 176:
Page 177 and 178:
Page 179 and 180:
ANALYSIS AND RE-ENGINEERING OF WEB
Page 181 and 182:
Module p0 Customer Itinerary Send I
Page 183 and 184:
departments: the marketing dept. se
Page 185 and 186:
• An acyclic module M is usable,
Page 187 and 188:
BALANCING STAKEHOLDER’S PREFERENC
Page 189 and 190:
The scenarios will provide an abstr
Page 191 and 192:
BALANCING STAKEHOLDER'S PREFERENCES
Page 193 and 194:
BALANCING STAKEHOLDER'S PREFERENCES
Page 195 and 196:
A POLYMORPHIC CONTEXT FRAME TO SUPP
Page 197 and 198:
A POLYMORPHIC CONTE XT FRAME TO SUP
Page 199 and 200:
Page 201 and 202:
Page 203 and 204:
FEATURE MATCHING IN MODEL-BASED SOF
Page 205 and 206:
combination of solution domains - a
Page 207 and 208:
• features for all the concepts:
Page 209 and 210:
Existence In both steps it is possi
Page 211 and 212:
matching strategies are maximal, mi
Page 213 and 214:
TOWARDS A META MODEL FOR DESCRIBING
Page 215 and 216:
elementary message (ae-message) can
Page 217 and 218:
problems on a pragmatic level, whic
Page 219 and 220:
TOWARDS A META MODEL FOR DESCRIBING
Page 221 and 222:
INTRUSION DETECTION SYSTEMS USING A
Page 223 and 224:
INTRUSION DETECTION SYSTEMS USING A
Page 225 and 226:
(k� 1) (k) (k�1) (k) p � w
Page 227 and 228:
Table 5: Performance Comparison of
Page 229 and 230:
A USER-CENTERED METHODOLOGY TO GENE
Page 231 and 232:
carries out the second phase of the
Page 233 and 234:
1 1 1 1 2 PROCESS STORE ENTITY EDGE
Page 235 and 236:
constraints rules. KOGGE (Ebert et
Page 237 and 238:
PART 4 Software Agents and Internet
Page 239 and 240:
Page 241 and 242:
Page 243 and 244:
Page 245 and 246:
Page 247 and 248:
Page 249 and 250:
Page 251 and 252:
Page 253 and 254:
Page 255 and 256:
Page 257 and 258:
Page 259 and 260:
Page 261 and 262:
Page 263 and 264:
Page 265 and 266:
Page 267 and 268:
Page 269 and 270:
Page 271 and 272:
Page 273 and 274:
Page 275 and 276:
Page 277 and 278:
Page 279 and 280:
Page 281 and 282:
Page 283 and 284:
Page 285 and 286:
CABA 2 L A BLISS PREDICTIVE COMPOSI
Page 287 and 288:
y disables evidencing severe mental
Page 289 and 290:
Figure 4: Symbols emission in DAR-H
Page 291 and 292:
they evidence a generalization erro
Page 293 and 294:
A METHODOLOGY FOR INTERFACE DESIGN
Page 295 and 296:
and mobility loss coupled with redu
Page 297 and 298:
Tradeoff: The default input is poss
Page 299 and 300:
Sessions on the VABS which are held
Page 301 and 302:
A CONTACT RECOMMENDER SYSTEM FOR A
Page 303 and 304:
A CONTACT RECOMMENDER SYSTEM FOR A
Page 305 and 306:
- "Going to the sources". The user
Page 307 and 308:
Here are the matrix W and P for our
Page 309 and 310:
EMOTION SYNTHESIS IN VIRTUAL ENVIRO
Page 311 and 312:
theme turns out to be surprisingly
Page 313 and 314:
6 FROM FEATURES TO SYMBOLS 6.1 Face
Page 315 and 316:
sions are described by different em
Page 317 and 318:
Electronic Imaging 2000 Conference
Page 319 and 320:
to the accessibility problem for th
Page 321 and 322:
Figure 3: Hierarchical View of the
Page 323 and 324:
4 CONCLUSIONS AND FUTURE WORK On th
Page 325 and 326:
PERSONALISED RESOURCE DISCOVERY SEA
Page 327 and 328:
Page 329 and 330:
profile, the user defines his curre
Page 331 and 332:
Page 333 and 334:
Abdelkafi, N. .....................
show all

Back Room Front Room 2

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?