
AN EMPIRICAL STUDY OF RUN-TIME COUPLING AND COHESION SOFTWARE METRICS

Áine Mitchell

Supervisor: Dr. James Power

A Thesis presented for the degree of Doctor of Philosophy in Computer Science

Department of Computer Science
National University of Ireland, Maynooth
Co. Kildare, Ireland

October 2005


Dedicated to
my parents, Patrick and Ann Mitchell


An empirical study of run-time coupling and cohesion software metrics

Áine Mitchell

Submitted for the degree of Doctor of Philosophy

October 2005

Abstract

The extent of coupling and cohesion in an object-oriented system has implications for its external quality. Various static coupling and cohesion metrics have been proposed and used in past empirical investigations; however, none of these have taken the run-time properties of a program into account. As program behaviour is a function of its operational environment as well as the complexity of the source code, static metrics may fail to quantify all the underlying dimensions of coupling and cohesion. By considering both of these influences, one acquires a more comprehensive understanding of the quality of critical components of a software system. We believe that any measurement of these attributes should include changes that take place at run-time. For this reason, in this work we address the utility of run-time coupling and cohesion complexity through the empirical evaluation of a selection of run-time measures for these properties. This study is carried out using a comprehensive selection of Java benchmark and real-world programs.

Our first case study investigates the influence of instruction coverage on the relationship between static and run-time coupling metrics. Our second case study defines a new run-time coupling metric that can be used to study object behaviour and investigates the ability of measures of run-time cohesion to predict such behaviour. Finally, we investigate whether run-time coupling metrics are good predictors of software fault-proneness in comparison to standard coverage measures. To the best of our knowledge this is the largest empirical study that has been performed to date on the run-time analysis of Java programs.


Declaration

The work in this thesis is based on research carried out at the Department of Computer Science in the National University of Ireland Maynooth, Co. Kildare, Ireland. No part of this thesis has been submitted elsewhere for any other degree or qualification, and it is all my own work unless referenced to the contrary in the text.

Signature: ........................  Date: ........................

Copyright © 2005 Áine Mitchell.

"The copyright of this thesis rests with the author. No quotations from it should be published without the author's prior written consent and information derived from it should be acknowledged."



Acknowledgements

I would like to thank my PhD adviser, Dr. James Power, for his advice, guidance, support, and encouragement throughout my PhD effort.

A special thanks to my parents, without whose continual support this work would not have been possible.

I would also like to thank all my friends who were there for me throughout it all.

This work has been funded by the Embark initiative, operated by the Irish Research Council for Science, Engineering and Technology (IRCSET).



Contents

Abstract iii
Declaration iv
Acknowledgements v

1 Introduction 1
1.1 Software Metrics and Complexity . . . . . 1
1.2 Traditional Measures of Complexity . . . . . 3
1.3 Object-Oriented Metrics . . . . . 4
1.4 Definitions of Coupling . . . . . 5
1.5 Definitions of Cohesion . . . . . 6
1.6 Static and Run-time Metrics . . . . . 7
1.7 Factors Influencing Software Metrics . . . . . 8
1.7.1 Coverage . . . . . 8
1.7.2 Metrics and Object Behaviour . . . . . 9
1.7.3 Metrics and Software Testing . . . . . 9
1.8 Aims of Thesis . . . . . 9
1.9 Structure of Thesis . . . . . 10

2 Literature Review 12
2.1 Static Coupling Metrics . . . . . 12
2.1.1 Chidamber and Kemerer . . . . . 13
2.1.2 Other Coupling Metrics . . . . . 14
2.2 Frameworks for Static Coupling Measurement . . . . . 15


2.2.1 Eder et al. . . . . . 16
2.2.2 Hitz and Montazeri . . . . . 16
2.2.3 Briand et al. . . . . . 17
2.2.4 Revised Framework by Briand et al. . . . . . 18
2.3 Static Cohesion Metrics . . . . . 18
2.3.1 Chidamber and Kemerer . . . . . 19
2.3.2 Other Cohesion Metrics . . . . . 20
2.4 Frameworks for Static Cohesion Measurement . . . . . 21
2.4.1 Eder et al. . . . . . 21
2.4.2 Briand et al. . . . . . 22
2.5 Run-time/Dynamic Coupling Metrics . . . . . 23
2.5.1 Yacoub et al. . . . . . 23
2.5.2 Arisholm et al. . . . . . 23
2.6 Run-time/Dynamic Cohesion Metrics . . . . . 25
2.6.1 Gupta and Rao . . . . . 25
2.7 Other Studies of Dynamic Behaviour . . . . . 25
2.7.1 Dynamic Behaviour Studies . . . . . 25
2.8 Coverage Metrics and Software Testing . . . . . 26
2.8.1 Instruction Coverage . . . . . 27
2.8.2 Alexander and Offutt . . . . . 27
2.9 Previous Work by the Author . . . . . 28
2.10 Definition of Run-time Metrics . . . . . 29
2.10.1 Coupling Metrics . . . . . 29
2.10.2 Cohesion Metrics . . . . . 31
2.11 Conclusion . . . . . 33

3 Experimental Design 34
3.1 Methods for Collecting Run-time Information . . . . . 34
3.1.1 Instrumenting a Virtual Machine . . . . . 34
3.1.2 Sun's Java Platform Debug Architecture (JPDA) . . . . . 35
3.1.3 Bytecode Instrumentation . . . . . 35
3.2 Metrics Data Collection Tools (Design Objectives) . . . . . 35



3.2.1 Class-Level Metrics Collection Tool (ClMet) . . . . . 36
3.2.2 Object-Level Metrics Collection Tool (ObMet) . . . . . 37
3.2.3 Static Data Collection Tool (StatMet) . . . . . 38
3.2.4 Coverage Data Collection Tool (InCov) . . . . . 39
3.2.5 Fault Detection Study . . . . . 40
3.3 Test Case Programs . . . . . 41
3.3.1 Benchmark Programs . . . . . 41
3.3.2 Real-World Programs . . . . . 43
3.3.3 Execution of Programs . . . . . 45
3.4 Statistical Techniques . . . . . 45
3.4.1 Descriptive Statistics . . . . . 45
3.4.2 Normality Tests . . . . . 47
3.4.3 Normalising Transformations . . . . . 48
3.4.4 Pearson Correlation Test . . . . . 49
3.4.5 T-Test . . . . . 49
3.4.6 Principal Component Analysis . . . . . 50
3.4.7 Cluster Analysis . . . . . 52
3.4.8 Regression Analysis . . . . . 53
3.4.9 Analysis of Variance (ANOVA) . . . . . 55
3.5 Conclusion . . . . . 56

4 Case Study 1: The Influence of Instruction Coverage on the Relationship Between Static and Run-time Coupling Metrics 57
4.1 Goals and Hypotheses . . . . . 58
4.2 Experimental Design . . . . . 59
4.3 Results . . . . . 60
4.3.1 Experiment 1: To investigate the relationship between static and run-time coupling metrics . . . . . 60
4.3.2 Experiment 2: The influence of instruction coverage . . . . . 62
4.4 Conclusion . . . . . 68



5 Case Study 2: The Impact of Run-time Cohesion on Object Behaviour 69
5.1 Goals and Hypotheses . . . . . 70
5.2 Experimental Design . . . . . 71
5.3 Results . . . . . 73
5.3.1 Experiment 1: To determine if objects from the same class behave differently at run-time from the point of view of coupling . . . . . 74
5.3.2 Experiment 2: The influence of cohesion on the N_OC . . . . . 77
5.4 Conclusion . . . . . 81

6 Case Study 3: A Study of Run-time Coupling Metrics and Fault Detection 82
6.1 Goals and Hypotheses . . . . . 83
6.2 Experimental Design . . . . . 84
6.3 Results . . . . . 85
6.3.1 Experiment 1: To examine the relationship between instruction coverage and fault detection . . . . . 85
6.3.2 Experiment 2: To examine the relationship between run-time coupling metrics and fault detection . . . . . 87
6.4 Conclusion . . . . . 89

7 Conclusions 90
7.1 Contributions . . . . . 94
7.2 Applications of this Work . . . . . 96
7.3 Threats to Validity . . . . . 96
7.3.1 Internal Threats . . . . . 96
7.3.2 External Threats . . . . . 97
7.4 Future Work . . . . . 98

Appendix 100

A Case Study 1: To Investigate the Influence of Instruction Coverage on the Relationship Between Static and Run-time Coupling Metrics 100



A.1 PCA Test Results for all programs . . . . . 101
A.1.1 SPECjvm98 Benchmark Suite . . . . . 101
A.1.2 JOlden Benchmark Suite . . . . . 101
A.1.3 Real-World Programs, Velocity, Xalan and Ant . . . . . 102
A.2 Multiple linear regression results for all programs . . . . . 103
A.2.1 SPECjvm98 Benchmark Suite . . . . . 103
A.2.2 JOlden Benchmark Suite . . . . . 104
A.2.3 Real-World Programs, Velocity, Xalan and Ant . . . . . 105

B Case Study 2: The Impact of Run-time Cohesion on Object Behaviour 106
B.1 PCA Test Results for all programs . . . . . 106
B.1.1 JOlden Benchmark Suite . . . . . 106
B.1.2 Real-World Programs, Velocity, Xalan and Ant . . . . . 107
B.2 Multiple linear regression results for all programs . . . . . 107
B.2.1 JOlden Benchmark Suite . . . . . 107
B.2.2 Real-World Programs, Velocity, Xalan and Ant . . . . . 108

C Case Study 3: A Study of Run-time Coupling Metrics and Fault Detection 109
C.1 Regression analysis results for real-world programs, Velocity, Xalan and Ant . . . . . 110
C.1.1 For Class Mutants . . . . . 110
C.1.2 For Traditional Mutants . . . . . 111
C.2 Regression analysis results for real-world programs, Velocity, Xalan and Ant . . . . . 111
C.2.1 For Class Mutants . . . . . 111
C.2.2 For Traditional Mutants . . . . . 112

D Mutation operators in µJava 113


List of Figures

1.1 The software quality model shows how different measures of internal quality can characterise the overall quality of a software product . . . . . 3

3.1 Components of run-time class-level metrics collection tool, ClMet . . . . . 37
3.2 Components of run-time object-level metrics collection tool, ObMet . . . . . 38
3.3 Components of static metrics collection tool, StatMet . . . . . 39
3.4 Dendrogram: at the cutting line there are two clusters . . . . . 54

4.1 PCA test results for all programs for metrics in PC1, PC2 and PC3. In all graphs the bars represent the PCA value obtained for the corresponding metric. PC1 contains the import-level run-time metrics, PC2 contains the export-level run-time metrics and PC3 contains the static CBO metric. . . . . 63
4.2 Multiple linear regression results for class-level metrics (IC_CC and EC_CC). In both graphs the lighter bars represent the R^2 value for CBO, and the darker bars represent the R^2 value for CBO and I_c combined. . . . . 65
4.3 Multiple linear regression results for method-level metrics (IC_CM and EC_CM). In both graphs the lighter bars represent the R^2 value for CBO, and the darker bars represent the R^2 value for CBO and I_c combined. . . . . 66

5.1 C_V of IC_OC for classes from the programs studied. The bars represent the number of classes in each program that have C_V in the corresponding range. . . . . 75


5.2 N_OC results of cluster analysis. The bars represent the number of classes in each program that have the corresponding N_OC value. . . . . 76
5.3 PCA test results for all programs for metrics in PC1 and PC2. In both graphs the bars represent the PCA value obtained for the corresponding metric. PC1 contains R_LCOM and RW_LCOM; PC2 contains S_LCOM. . . . . 78
5.4 Results from multiple linear regression where Y = N_OC. The lighter bars represent the R^2 value for S_LCOM, and the darker bars represent the R^2 value for S_LCOM and R_LCOM combined. . . . . 80

6.1 Mutation test results for real-world programs Velocity, Xalan and Ant. In all graphs the bars represent the number of classes that exhibit a percentage mutant kill rate in the corresponding range. . . . . 86
6.2 Regression analysis results for the effectiveness of I_c in predicting class and traditional-level mutations in real-world programs Velocity, Xalan and Ant. The bars represent the R^2 value for the run-time metric under consideration. . . . . 87
6.3 Regression analysis results for the effectiveness of run-time coupling metrics in predicting class-level mutations in real-world programs Velocity, Xalan and Ant. The bars represent the R^2 value for the run-time metric under consideration. . . . . 89

7.1 Findings from case study one, which show that our run-time coupling metrics are not simply surrogate measures for static CBO, and that coverage plus static metrics are better predictors of run-time measures than static measures alone. . . . . 91
7.2 Findings from case study two, which show that run-time object-level coupling measures can be used to identify objects that exhibit different behaviours at run-time, and that run-time cohesion measures are good predictors of this type of behaviour. . . . . 93



7.3 Findings from case study three, which show that run-time coupling metrics are good predictors of class-type faults and that instruction coverage is a good predictor of traditional faults in programs. . . . . 94


List of Tables

2.1 Abbreviations for the dynamic coupling metrics of Arisholm et al. . . . . . 24

3.1 Description of the SPECjvm98 benchmarks . . . . . 42
3.2 Description of the JOlden benchmarks . . . . . 43
3.3 Programs used for each case study . . . . . 45

4.1 Descriptive statistic results for all programs . . . . . 61

5.1 Matrix of unique accesses per object, for objects BlackNode_1, ..., BlackNode_4 to classes GreyNode, QuadTreeNode and WhiteNode . . . . . 72
5.2 Descriptive statistic results for all programs . . . . . 73

D.1 Traditional-level mutation operators in µJava . . . . . 113
D.2 Class-level mutation operators in µJava . . . . . 114



Chapter 1

Introduction

Software metrics have become essential in some disciplines of software engineering. In forward engineering they are used to measure software quality and to estimate the cost and effort of software projects [40]. In the field of software evolution, metrics can be used for identifying stable or unstable parts of software systems, as well as identifying where refactorings can be applied or have been applied [32], and detecting increases or decreases of quality in the structure of evolving software systems. In the field of software re-engineering and reverse engineering, metrics are used for assessing the quality and complexity of software systems, and also to gain a basic understanding of, and provide clues about, sensitive parts of software systems [27].

1.1 Software Metrics and Complexity

Software metrics evaluate different aspects of the complexity of a software product. Software complexity was originally defined as "a measurement of the resources that must be expended in developing, testing, debugging, maintenance, user training, operation, and correction of software products" [94]. Complexity has been characterised in terms of seven different levels, the correlation and interdependence of which will determine the overall level of complexity in a software product [44]. The levels are as follows:

• Control Structure


• Module Coupling
• Algorithm
• Code
• Nesting
• Module Cohesion
• Data Structure

However, most metrics measure only one software complexity factor. These foundations of complexity determine the internal quality of a product.

Internal quality measures are those which are performed in terms of the software product itself and are measurable both during and after the creation of the software product. They have, however, no inherent practical meaning in themselves. To give them meaning they must be characterised in terms of the product's external quality.

External quality measures are evaluated with respect to how a product relates to its environment and are deemed to be inherently meaningful; examples include the maintainability or testability of a product.

It should be noted that good internal quality is a requirement for good external quality. Figure 1.1 illustrates the software quality model, which depicts the relationship between these measures. Much research has contributed models and measures of both internal software quality attributes and external attributes of a design. Although the relationships between these attributes are for the most part intuitive (e.g., more complex code will require greater effort to maintain), the precise functional form of those relationships can be less clear and is the subject of intense practical and research concern [31]. Empirical validation aims at demonstrating the usefulness of a measure in practice and is, therefore, a crucial activity in establishing the overall validity of a measure [6]. It is therefore the belief of the author that a well-designed empirical study serves to clarify and strengthen the observed relationships.



Figure 1.1: The software quality model shows how different measures of internal quality can characterise the overall quality of a software product. (The model relates internal quality attributes such as coupling and cohesion, via complexity metrics, to external quality attributes such as maintainability, testability and reusability, and in turn to quality in use.)

1.2 Traditional Measures of Complexity

The earliest software measure, proposed in the late 1960s, is the Source Lines of Code (SLOC) metric, which is still used today. It measures the amount of code in a software program and is typically used to estimate the amount of effort that will be required to develop a program, as well as to estimate productivity or effort once the software is produced. Two major types of SLOC measure exist: physical SLOC and logical SLOC. Exact definitions of these measures vary. The most common definition of physical SLOC is a count of "non-blank, non-comment lines" in the text of the program's source code. Logical SLOC measures attempt to count the number of "statements", but their specific definitions are tied to specific computer languages. It is therefore much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to such conventions.
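To make the physical SLOC definition concrete, the following is a minimal sketch of a counter for Java source files; the class name, the simplified handling of block comments and the file handling are assumptions made here for illustration, not part of any standard definition.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class PhysicalSloc {

    // Counts "non-blank, non-comment" lines in a Java source file.
    // Block comments (/* ... */) are tracked with a simple state flag.
    public static long count(Path sourceFile) throws IOException {
        List<String> lines = Files.readAllLines(sourceFile);
        boolean inBlockComment = false;
        long sloc = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (inBlockComment) {
                if (line.contains("*/")) {
                    inBlockComment = false;
                }
                continue;                      // still inside a block comment
            }
            if (line.isEmpty() || line.startsWith("//")) {
                continue;                      // blank line or line comment
            }
            if (line.startsWith("/*")) {
                inBlockComment = !line.contains("*/");
                continue;                      // comment-only line
            }
            sloc++;                            // a physical source line
        }
        return sloc;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(count(Path.of(args[0])));
    }
}

A logical SLOC counter, by contrast, would have to recognise the statement grammar of each language it supports, which is one reason logical SLOC definitions are tied to specific languages.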

There are a number of drawbacks to using a crude measure such as LOC as a surrogate for different notions of program size such as effort, functionality and complexity. The need for more discriminating measures became especially urgent with the increasing diversity of programming languages, as a LOC in an assembly language is not comparable in effort, functionality, or complexity to a LOC in a high-level language [39].

Thus, from the mid-1970s there was an increase in the number of different complexity metrics defined. Among the more prevalent were Halstead's software science metrics [47], which attempted to capture notions of size and complexity beyond simply counting lines of code. Although this work has had a lasting impact, the metrics are principally regarded as an example of confused and inadequate measurement [40].

McCabe defined a measure known as Cyclomatic Complexity [71]. It may be considered a broad measure of soundness and confidence for a program. It measures the number of linearly independent paths through a program module and is intended to be independent of language and language format.
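For reference, the standard formulation of McCabe's measure over a module's control-flow graph G (stated here from the general literature rather than reproduced from this thesis) is

V(G) = E - N + 2P

where E is the number of edges, N the number of nodes and P the number of connected components of the graph (P = 1 for a single module). A straight-line module therefore has V(G) = 1, and each additional binary decision adds one linearly independent path.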

Function points, pioneered by Albrecht [2] in 1977, are a measure of the size of computer applications and the projects that build them. The size is measured from a functional, or user, point of view and is independent of the computer language, development methodology, technology or capability of the project team used to develop the application. The original metric has been augmented and refined to cover more than the original emphasis on business-related data processing. However, as object-oriented techniques became more prevalent there was an increasing need for metrics that could correctly evaluate their properties.

1.3 Object-Oriented Metrics

Object-oriented design and development is becoming very popular in today's software development environment. Object-oriented development requires not only a different approach to design and implementation, but also a different approach to software metrics. Since object-oriented technology uses objects and not algorithms as its fundamental building blocks, the approach to software metrics for object-oriented programs must differ from the standard metrics set. Metrics such as lines of code and cyclomatic complexity have become accepted as standard for traditional functional/procedural programs and were used to evaluate object-oriented environments at the beginning of the object-oriented design revolution. However, traditional metrics for procedural approaches are not adequate for evaluating object-oriented software, primarily because they are not designed to measure basic elements like classes, objects, polymorphism, and message passing. Even when adjusted to syntactically analyse object-oriented software, they can only capture a small part of such software and thus provide a weak quality indication [50, 65]. Since then, many object-oriented metrics have been proposed in the literature. The question now is: which object-oriented metrics should a project use? As the quality of object-oriented software, like other software, is a complex concept, there can be no single, simple measure of software quality acceptable to everyone. To assess or improve software quality, you must define the aspects of quality in which you are interested, then decide how you are going to measure them. By defining quality in a measurable way, you make it easier for other people to understand your viewpoint and relate your notions to their own [60]. As illustrated in Chapter 2, some of the seminal methods of evaluating an object-oriented design are through the use of measures for coupling and cohesion.

1.4 Definitions of Coupling

Stevens et al. [95] first introduced coupling in the context of structured development techniques. They defined coupling as "the measure of the strength of association established by a connection from one module to another". They stated that the stronger the coupling between modules, that is, the more inter-related they are, the more difficult these modules are to understand, change and correct, and thus the more complex the resulting software system.

Myers [82] refined the concept of coupling by defining six distinct levels of coupling.


However, coupling could only be determined by hand, as the definitions were neither precise nor prescriptive, leaving room for subjective interpretations of the levels.

Constantine and Yourdon [29] also stated that the modularity of a software design can be measured by coupling and cohesion. They stated that coupling between two units reflects the interconnections between those units and that faults in one unit may affect the coupled unit.

Page and Jones [89] ordered coupling into eight different levels according to their effects on the understandability, maintainability, modifiability and reusability of the coupled modules.

Troy and Zweben [98] showed that coupling between units is a good indicator of the number of faults in software. However, their study was based on subjective interpretation of design documents instead of real code.

Offutt et al. [85] extended the eight levels of coupling to twelve, thus providing a finer-grained measure of coupling. They also described algorithms to automatically measure the coupling level between each pair of units in a program. The coupling levels are defined between pairs of units A and B. For each coupling level the parameters are classified by the way they are used. Uses are classified into computation uses (C-uses) [42], predicate uses (P-uses) and indirect uses (I-uses) [85]. A C-use occurs when a variable is used on the right-hand side of an assignment statement, in an output statement, or in a procedure call. A P-use occurs when a variable is used in a predicate statement. An I-use occurs when a variable is used in an assignment to another variable and the defined variable is later used in a predicate; the I-use is considered to be in the predicate rather than in the assignment.
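The following small, hypothetical Java fragment (the class, method and variable names are illustrative and are not taken from the thesis) shows one instance of each kind of use as defined above.

class UseExamples {
    int discount(int price, int threshold) {
        int total = price * 2;         // C-use of price: right-hand side of an assignment
        System.out.println(total);     // C-use of total: used in an output statement
        int flag = threshold + 1;      // threshold is assigned into flag ...
        if (flag > 10) {               // ... P-use of flag; the earlier use of threshold
            return total - 5;          //     is an I-use, attributed to this predicate
        }
        return total;
    }
}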

1.5 Definitions of Cohesion

The cohesion of a module is the extent to which its individual components are needed to perform the same task [40]. Cohesion was first introduced within the context of module design by Stevens et al. [95]. In their definition, the cohesion of a module is measured by inspecting the association between all pairs of its processing elements. The term processing element was defined as an action performed by a module, such as a statement, a procedure call, or something which must be done in a module but which has not yet been reduced to code [29]. Their definition was informal, thereby leaving it open to interpretation. They developed a scale of cohesion that provides an ordinal scale of measurement describing the degree to which the actions performed by a module contribute to a unified function. There are seven categories of cohesion, which range from the most desirable (functional) to the least desirable (coincidental). They stated that it is possible for a module to exhibit more than one type of cohesion; in this case the module is categorised by its least desirable type of cohesion. In line with the principles of good software design, it is desirable to have highly cohesive modules, preferably functional ones.

Emerson [36, 37] based his cohesion measure on a control-flow graph representation of a module. The range of this complexity measure varies from 0 to 1. Emerson indicates that his method for computing cohesion is related to program slicing. He reclassifies the seven levels of cohesion into three.

Ott and Thuss [88] used program slicing to evaluate their cohesion measurements. They reclassified the original seven levels of cohesion into four categories.

Lakhotia [61] codified the natural language definitions of the seven levels of cohesion. He developed a method for computing cohesion based on an analysis of the variable dependence graphs of a module. Pairs of outputs were examined to identify any data or control dependences that exist between the two outputs. Rules were provided for determining the cohesion of the pairs.

1.6 Static and Run-time Metrics

A large number of metrics have been proposed to measure object-oriented design quality. Design metrics can be classified into two categories: static and run-time/dynamic. Static metrics measure what may happen when a program is executed and are said to quantify different aspects of the complexity of the source code. Run-time metrics measure what actually happens when a program is executed. They evaluate the source code's run-time characteristics and behaviour as well as its complexity.

Despite the rich body of research and practice in developing design quality metrics, there has been less emphasis on run-time metrics for object-oriented designs, mainly because run-time code analysis is more expensive and complex to perform [99]. However, due to polymorphism, dynamic binding, and the common presence of unused (dead) code in software, static coupling and cohesion measures do not perfectly reflect the actual situation taking place amongst classes at run-time.
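The following hypothetical Java fragment (illustrative only; it is not one of the programs studied in this thesis) shows why the two kinds of measure can diverge: the caller is statically coupled only to the declared interface, while the class it is actually coupled to at run-time depends on which concrete objects reach it, and a subclass that is never instantiated contributes nothing at run-time.

// Hypothetical example: static vs. run-time coupling.
interface Shape {
    double area();
}

class Circle implements Shape {
    public double area() { return Math.PI; }   // unit circle, for brevity
}

class Square implements Shape {
    public double area() { return 1.0; }       // unit square
}

class Report {
    // Statically, Report is coupled only to the Shape interface.
    // At run-time each call dispatches to Circle.area() or Square.area(),
    // so the run-time coupling depends on the objects actually created.
    double total(Shape[] shapes) {
        double sum = 0.0;
        for (Shape s : shapes) {
            sum += s.area();                   // dynamic binding decides the target
        }
        return sum;
    }
}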

The complex dynamic behaviour of many real-time applications motivates a shift in interest from traditional static metrics to run-time metrics. In this work, we investigate whether useful information on design quality can be provided by run-time measures of coupling and cohesion over and above that which is given by simple static measures. This will determine whether it is worthwhile to continue the investigation into run-time coupling and cohesion metrics and their relationship with external quality.

1.7 Factors Influencing Software Metrics

This section discusses factors which affect software metrics, including coverage and object-level behaviour. The relationship with software testing is also discussed.

1.7.1 Coverage

When relating static and run-time measures, it is important to have a thorough understanding of the degree to which the analysed source code corresponds to the code that is actually executed. In this thesis, this relationship is studied using instruction coverage measures, with regard to the influence of coverage on the relationship between static and dynamic metrics. It is proposed that coverage results have a significant influence on this relationship and thus should always be a measured, recorded factor in any such comparison.
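As a working definition (an assumption about the conventional formulation, not a quotation from this thesis), the instruction coverage achieved by a set of test cases can be expressed as

instruction coverage = (number of instructions executed at least once) / (total number of instructions in the analysed code)

A value of 1 means the executed code corresponds exactly to the analysed code, while lower values indicate that the run-time metrics were computed from only part of the source that the static metrics describe.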



1.7.2 Metrics and Object Behaviour

To date, little work has been done on the analysis of code at the object level, that is, the use of metrics to identify specific object behaviours. We identify this behaviour through the use of run-time object-level coupling metrics. Run-time object-level coupling quantifies the level of dependencies between objects in a system, whereas run-time class-level coupling quantifies the level of dependencies between the classes that implement the methods or variables of the caller object and the receiver object [5]. The class of the object sending or receiving a message may be different from the class implementing the corresponding method, due to the impact of inheritance. We also investigate the ability of run-time cohesion measures to predict such behaviour.
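A hypothetical Java fragment (the names are illustrative and not drawn from the thesis) shows how inheritance separates the two levels: the receiver is a SavingsAccount object, but the method it executes is implemented in Account, so the object-level and class-level dependencies differ.

// Hypothetical illustration of object-level vs. class-level coupling.
class Account {
    void audit() { /* implemented here */ }
}

class SavingsAccount extends Account {
    // inherits audit() without overriding it
}

class Bank {
    void close(SavingsAccount acc) {
        acc.audit();   // object-level: the Bank object is coupled to this SavingsAccount instance;
                       // class-level: the executed method is implemented in Account,
                       // so the class-level dependency is on Account, not SavingsAccount
    }
}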

1.7.3 Metrics and Software Testing

Testing is one of the most effort-intensive activities during software development [7]. Much research is directed toward developing new and improved fault detection mechanisms. A number of papers have investigated the relationships between static design metrics and the detection of faults in object-oriented software [6, 15]. However, to date no work has been conducted on the correlation between run-time coupling metrics and fault detection. In this thesis, we investigate whether measures of run-time coupling are good predictors of fault-proneness, an important software quality attribute.

1.8 Aims of Thesis

In summary, the central aim of this thesis is to outline operational definitions for run-time class- and object-level coupling and cohesion metrics suitable for evaluating the quality of an object-oriented application. The motivation for these measures is to complement existing measures that are based on static analysis by actually measuring coupling and cohesion at run-time.

It is necessary to provide tools for collecting such measures for Java systems accurately and effectively. Java was chosen as the target language for this analysis because Java is executed on a virtual machine, which makes it relatively simple to collect run-time trace information in comparison to languages like C or C++. Java also combines a wide range of language features found in different programming languages, for example an object-oriented model, exception handling and garbage collection. Its portability, robustness, simplicity and security have made it increasingly popular within the software engineering community, underpinning its importance and providing a good selection of sample applications for study.

Finally, a thorough empirical investigation using both Java benchmark and real-world programs needs to be performed. The objectives of this are:

1. To assess the fundamental properties of the run-time measures and to investigate whether they are redundant with respect to the most commonly used coupling and cohesion measures, as defined by Chidamber and Kemerer [26].

2. To examine the influence of test case coverage on the relationship between static and run-time coupling metrics. Intuitively, one would expect that the better the coverage of the test cases used, the better the static and run-time metrics should correlate.

3. To investigate run-time object behaviour, that is, to determine if objects from the same class behave differently at run-time, through the use of object-level coupling metrics.

4. To investigate run-time object behaviour using run-time measures for cohesion.

5. To conduct a study investigating the correlation between run-time coupling measures and fault detection in object-oriented software.

1.9 Structure of Thesis

This thesis describes how coupling and cohesion can be defined and precisely measured based on the run-time analysis of systems. An empirical evaluation of the proposed run-time measures is reported using a selection of benchmark and real-world Java applications. An investigation is conducted to determine whether these measures are redundant with respect to their static counterparts. We also determine whether coverage has a significant impact on the correlation between static and run-time metrics. We examine object behaviour using a run-time object-level coupling metric, and we investigate the relationship of run-time cohesion metrics to this behaviour. Finally, we study the fault detection capabilities of run-time coupling measures.

Chapter 2 presents a literature survey of coupling and cohesion metrics and associated studies. Chapter 3 defines the run-time metrics used in this study and outlines the experimental tools and techniques. Chapter 4 presents a case study on the correlation between static and run-time coupling measures and the influence of coverage on this correlation. Chapter 5 discusses a case study on object behaviour and the impact of cohesion on it. Chapter 6 presents a case study on run-time coupling metrics and fault detection. Chapter 7 presents the final conclusions and discusses future work.


Chapter 2

Literature Review

In this chapter, a comprehensive survey and literature review of existing static and run-time/dynamic measures and frameworks for coupling and cohesion in object-oriented systems is presented. Previous work describing a coupling-based testing approach for object-oriented software is also presented. Finally, the role coverage measures play in software testing is discussed. In Sections 2.1 and 2.3, we present and discuss existing coupling and cohesion measures. Sections 2.2 and 2.4 present alternative frameworks for coupling and cohesion. Measures for the run-time evaluation of coupling and cohesion are presented in Sections 2.5 and 2.6 respectively. Other work in studies of dynamic behaviour is described in Section 2.7. A discussion of coverage metrics and the role they play in software testing is presented in Section 2.8. Previous work by the author is discussed in Section 2.9. Finally, a description of the run-time measures used in the subsequent case studies is provided in Section 2.10.

2.1 Static Coupling Metrics

There exists a large variety of measurements for coupling. A comprehensive review of existing measures performed by Briand et al. [13] found that more than thirty different measures of object-oriented coupling exist. The most prevalent ones are explained in the following subsections.




2.1.1 Chidamber and Kemerer

In their papers [25, 26] Chidamber and Kemerer propose and validate a set of six software metrics for object-oriented systems, including two measures for coupling. As these are the most accepted and widely used coupling metrics, we use them as the basis for our run-time coupling measures.

Coupling Between Objects (CBO)

They first define the CBO measure for a class as "a count of the number of non-inheritance related couples with other classes" [25]. An object of a class is coupled to another if the methods of one class use the methods or attributes of the other. They later revise this definition to state that "CBO for a class is a count of the number of other classes to which it is coupled" [26]. A footnote adds that "this includes coupling due to inheritance."

They state that coupling has an adverse effect on the maintenance, reuse and testing of a design, and that excessive coupling between object classes is detrimental to modular design and prevents reuse, since the more independent a class is, the easier it is to reuse in another application. They state that inter-object class couples should be kept to a minimum in order to improve modularity and promote encapsulation. The larger the number of couples, the higher the sensitivity to changes in other parts of the design, making maintenance more difficult. A measure of coupling is also useful for determining how complex the testing of various parts of a design is likely to be: the higher the inter-object class coupling, the more rigorous the testing needs to be.
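As a simple illustration, consider the following hypothetical Java fragment (the class names are illustrative and are not drawn from the systems studied in this thesis). The methods of Order use the methods of two other classes, so CBO(Order) = 2.

```java
// Hypothetical sketch: Order is coupled to Customer and Inventory because its
// method uses methods of both, giving two non-inheritance related couples.
class Customer {
    boolean hasCredit() { return true; }
}

class Inventory {
    int stockLevel(String item) { return 5; }
}

class Order {
    // Uses Customer.hasCredit and Inventory.stockLevel: CBO(Order) = 2.
    boolean canFulfil(Customer c, Inventory inv, String item) {
        return c.hasCredit() && inv.stockLevel(item) > 0;
    }
}
```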

Response For a Class (RFC)

The response set (RS) of a class is the set of methods that can potentially be executed in response to a message received by an object of that class. RFC is simply the number of methods in the set, that is, RFC = |RS|. A given method is counted only once. Since RFC specifically includes methods called from outside the class, it is also a measure of the potential communication between the class and other classes.

\[
RS = M \cup \bigcup_{i \in M} R_i
\tag{2.1}
\]

Equation 2.1 gives the response set for a class, where $R_i$ is the set of methods called by method $i$ and $M$ is the set of all methods in the class.

If a large number of methods can be invoked in response to a message, the testing and debugging of the class becomes more complicated, since it requires a greater level of understanding on the part of the tester. The complexity of a class increases with the number of methods that can be invoked from it.
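The following hypothetical Java fragment illustrates the response set: Account defines two methods of its own, and both call Logger.log, so the response set contains three distinct methods and RFC(Account) = 3.

```java
// Hypothetical sketch of a response set.
class Logger {
    void log(String msg) { System.out.println(msg); }
}

class Account {
    private double balance;
    private final Logger logger = new Logger();

    void deposit(double amount) {              // in M
        balance += amount;
        logger.log("deposit of " + amount);    // contributes Logger.log to RS
    }

    void withdraw(double amount) {             // in M
        balance -= amount;
        logger.log("withdrawal of " + amount); // Logger.log counted only once
    }
}
// RS(Account) = {deposit, withdraw, Logger.log}, so RFC(Account) = 3.
```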

2.1.2 Other Coupling Metrics

In their paper [63] Li and Henry identify a number of metrics that can predict the maintainability of a design. They define two measures, message passing coupling (MPC) and data abstraction coupling (DAC). MPC is defined as the number of send statements defined in a class. The number of send statements sent out from a class may indicate how dependent the implementation of the local methods is on the methods in other classes. MPC only counts invocations of methods of other classes, not its own. DAC is defined as "the number of abstract data types (ADT) defined in a class". An ADT is defined in a class c if it is the type of an attribute of class c. It is also specified that "the number of variables having an ADT type may indicate the number of data structures dependent on the definitions of other classes".
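As an illustration, consider the hypothetical class below. On the reading that any class type used as an attribute type counts as an ADT, Person has DAC = 2, and its single method makes two invocations of methods defined in other classes, giving MPC = 2.

```java
// Hypothetical sketch of MPC and DAC (the ADT interpretation is an assumption).
class Address {
    String city = "Maynooth";
    String format() { return city; }
}

class Printer {
    void print(String s) { System.out.println(s); }
}

class Person {
    private Address home = new Address();     // attribute of class type Address
    private Printer printer = new Printer();  // attribute of class type Printer

    void show() {
        // Two send statements to other classes: Address.format and Printer.print.
        printer.print(home.format());
    }
}
// DAC(Person) = 2, MPC(Person) = 2.
```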

Martin describes two coupling metrics that can be used to measure the quality of an object-oriented design in terms of the interdependence between the subsystems of that design [70]. Afferent Coupling (Ca) is the number of classes outside a category that depend upon classes within that category. Efferent Coupling (Ce) is the number of classes inside a category that depend upon classes outside that category. A category is a set of classes that belong together in the sense that they achieve some common goal. Martin does not specify exactly what constitutes a dependency between classes.

Abreu et al. present a coupling metric known as the Coupling Factor (COF) for the design quality evaluation of object-oriented software systems [1]. COF is the actual number of client-server relationships between classes that are not related via inheritance, divided by the maximum possible number of such client-server relationships. It is normalised to range between 0 and 1 to allow for comparisons between systems of different sizes. It was not specified how to account for factors such as polymorphism and method overriding.

Lee et al. measure the coupling and cohesion of an object-oriented program based on information flow through programs [62]. They define a measure, Information-flow-based coupling (ICP), that counts, for a method m of a class c, the number of methods that are invoked polymorphically from other classes, weighted by the number of parameters of the invoked method. This count can be scaled up to classes and subsystems. They go on to derive two further sets of measures which capture inheritance-based coupling (coupling to ancestor classes, IH-ICP) and non-inheritance-based coupling (coupling to unrelated classes, NIH-ICP), and deduce that ICP is simply the sum of IH-ICP and NIH-ICP.

Briand et al. perform a comprehensive empirical validation of product measures, such as coupling and cohesion, in object-oriented systems and explore the probability of fault detection in system classes during testing [11]. They define a number of measures which count the number of class-attribute (CA), class-method (CM) and method-method (MM) interactions for each class. They take into account which class the interactions originate from or are directed at and the number of ancestor or other classes. A CA-interaction occurs from class c to class d if an attribute of class c is of type class d. A CM-interaction occurs from class c to class d if a newly defined method of class c has a parameter of type class d. An MM-interaction occurs from class c to class d if a method implemented at class c statically invokes a newly defined or overriding method of class d, or receives a pointer to such a method. This set contains sixteen metrics in total.
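The three interaction types can be illustrated with a small hypothetical Java fragment (the classes are illustrative only):

```java
// Hypothetical sketch of CA-, CM- and MM-interactions.
class Engine {
    void start() { }
}

class Dashboard {
    void show(Engine e) { }                   // CM-interaction: parameter of type Engine
}

class Car {
    private Engine engine = new Engine();     // CA-interaction: attribute of type Engine

    void drive(Dashboard d) {
        engine.start();                       // MM-interaction: Car.drive invokes Engine.start
        d.show(engine);                       // MM-interaction: Car.drive invokes Dashboard.show
    }
}
```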

2.2 Frameworks for Static Coupling Measurement

Several authors describe frameworks that characterise different approaches to coupling and assign relative strengths to different types of coupling. A framework defines what constitutes coupling. This is done in an attempt to determine the potential use of coupling metrics and how different metrics complement each other.



There are three existing frameworks:

2.2.1 Eder et al.

Eder et al. identify three different types of relationships [34]: interaction relationships between methods, component relationships between classes, and inheritance relationships between classes. These relationships are then used to derive three different dimensions of coupling, which are classified according to different strengths (a small illustrative Java sketch follows the list):

1. Interaction coupling: Two methods are said to be interaction coupled if (i) one method invokes the other, or (ii) they communicate via the sharing of data. There are seven types of interaction coupling.

2. Component coupling: Two classes c and d are component coupled if d is the type of either (i) an attribute of c, (ii) an input or output parameter of a method of c, (iii) a local variable of a method of c, or (iv) an input or output parameter of a method invoked within a method of c. There are four different degrees of component coupling.

3. Inheritance coupling: Two classes c and d are inheritance coupled if one class is an ancestor of the other. There are four degrees of inheritance coupling.
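The following hypothetical Java fragment sketches the three dimensions; the comments mark which relationship gives rise to which kind of coupling.

```java
// Hypothetical sketch of Eder et al.'s three coupling dimensions.
class Wheel {
    void spin() { }
}

class Vehicle {
    protected Wheel wheel = new Wheel();   // component coupling: Wheel is the type of an attribute

    void move() {
        wheel.spin();                      // interaction coupling: one method invokes another
    }
}

class Car extends Vehicle { }              // inheritance coupling: Car is a descendant of Vehicle
```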

2.2.2 Hitz and Montazeri

Hitz and Montazeri derive two different types of coupling, object-level and class-level coupling [52]. These are determined by the state of an object (the value of its attributes at a given moment at run-time) and the state of an object's implementation (the class interface and body at a given time in the development cycle).

Class-level coupling (CLC) results from state dependencies between two classes in a system during the development cycle. This can only be determined from a static analysis of the design documents or source code. It is important when considering maintenance and change dependencies, as changes in one class may lead to changes in other classes which use it.



Object-level coupling (OLC) results from state dependencies between two objects during the run-time of a system. This depends on the concrete object structure at run-time, which in turn is determined by the actual input data. Therefore, it is a function of the design or source code and of the input data at run-time. This is relevant for run-time oriented activities such as testing and debugging.

2.2.3 Briand et al.

In the framework by Briand et al., coupling is constituted by interactions between classes [14]. The strength is determined by the type of the interaction (Class-Attribute, Class-Method, Method-Method), the relationship between the classes (Inheritance, Other) and the interaction's locus of impact (Import/Client, Export/Server). They assign no strengths to the different kinds of interactions. There are three basic criteria in the framework, which are as follows:

1. Type of interaction: This determines the mechanism by which two classes are coupled. A class-attribute interaction is present if aggregation occurs, that is, if a class c is the type of an attribute of class d. A class-method interaction occurs if a class c is the type of a parameter of a method $m_d$ of a class d, or if class c is the return type of method $m_d$. A method-method interaction occurs if a method $m_d$ of a class d directly invokes a method $m_c$, or if a method $m_d$ receives via a parameter a pointer to $m_c$, thereby invoking $m_c$ indirectly.

2. Relationship: An inheritance relationship occurs if a class c is an ancestor of class d or vice versa. Friendship is present if a class c declares class d as its friend, which grants class d access to the non-public elements of c. The "other" relationship holds when no inheritance or friendship relationship is present between classes c and d.

3. Locus: If a class c is involved in an interaction with another class, a distinction is made between export and import coupling. Export coupling is when class c is the used class or server in the interaction. Import coupling is when class c is the using class or client in the interaction.



2.2.4 Revised Framework by Briand et al.

Briand et al. outline a new unified framework for coupling in object-oriented systems [13]. It is characterised based on the issues identified by comparing existing coupling frameworks. There are six criteria in the framework, and each criterion determines one basic aspect of the resulting measure. The criteria are as follows:

1. The type of connection: This determines what constitutes coupling. It is the type of link between a client and a server item, which could be an attribute, method, or class.

2. The locus of impact: This is import or export coupling. Import coupling analyses attributes, methods, or classes in their role as clients of other attributes, methods, or classes. Export coupling analyses attributes, methods, and classes in their role as servers to other attributes, methods or classes.

3. The granularity of the measure: This is the domain of the measure, that is, what components are to be measured and how to count coupling connections.

4. The stability of the server: Should both stable and unstable classes be included? Classes can be (a) stable, that is, not subject to change in the project at hand, for example classes imported from libraries, or (b) unstable, that is, subject to development or modification in the project at hand.

5. Direct or indirect coupling: Should only direct connections be counted, or should indirect connections also be taken into account?

6. Inheritance: Inheritance-based versus non-inheritance-based coupling. This also covers how to account for polymorphism and how to assign attributes and methods to classes.

2.3 Static Cohesion Metrics

A large number of alternative measures have been proposed for measuring cohesion. Briand et al. [12] carry out a broad survey of the current state of cohesion measurement in object-oriented systems and find fifteen separate measurements of cohesion.



A review of these measures is presented in the following subsections.

2.3.1 Chidamber and Kemerer

The Lack of Cohesion in Methods (LCOM1) measure was first suggested by Chidamber and Kemerer [25]. It is the most prevalently used cohesion measure today and is therefore used as the basis for the definition of our run-time cohesion measures. It is defined as "the degree of similarity of methods" and is theoretically based on the ontology of objects by Bunge [21]. Within this ontology, the similarity of things is defined as the set of properties that the things have in common.

For a given class C with methods $M_1, M_2, \ldots, M_n$, let $\{I_i\}$ be the set of instance variables accessed by method $M_i$. As there are n methods there will be n such sets, one per method. The LCOM metric is then determined by counting the number of disjoint sets formed by the intersection of the n sets.

However, this was found to be quite ambiguous and the pair later redefined their metric (LCOM2) [26]. For a class $C_1$ with n methods $M_1, \ldots, M_n$, let $\{I_i\}$ be the set of instance variables referenced by method $M_i$. There are n such sets $I_1, \ldots, I_n$. We can define two disjoint sets:

\[
P = \{(I_i, I_j) \mid I_i \cap I_j = \emptyset\}, \qquad
Q = \{(I_i, I_j) \mid I_i \cap I_j \neq \emptyset\}
\tag{2.2}
\]

The lack of cohesion in methods is then defined from the cardinality of these sets by:

\[
LCOM = \begin{cases} |P| - |Q| & \text{if } |P| > |Q| \\ 0 & \text{otherwise} \end{cases}
\tag{2.3}
\]

LCOM is an inverse cohesion measure: an LCOM value of zero indicates a cohesive class. Cohesiveness of methods within a class is desirable, as it promotes encapsulation, and any measure of the disparateness of methods helps identify flaws in the design of classes. A value greater than zero indicates that the class could be split into two or more classes, since its variables belong in disjoint sets. Low cohesion is said to increase complexity, thereby increasing the likelihood of errors during the development process.
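The calculation can be sketched as follows. This is a minimal illustrative implementation, not the measurement tooling used in this thesis, and it assumes that the set of instance variables referenced by each method has already been extracted.

```java
import java.util.*;

// Minimal sketch of LCOM2: count the method pairs that share no instance
// variable (P) and the pairs that share at least one (Q), and return
// |P| - |Q| if positive, otherwise 0.
class Lcom2 {
    static int lcom2(List<Set<String>> varsPerMethod) {
        int p = 0, q = 0;
        for (int i = 0; i < varsPerMethod.size(); i++) {
            for (int j = i + 1; j < varsPerMethod.size(); j++) {
                Set<String> shared = new HashSet<>(varsPerMethod.get(i));
                shared.retainAll(varsPerMethod.get(j));
                if (shared.isEmpty()) p++; else q++;
            }
        }
        return Math.max(p - q, 0);
    }

    public static void main(String[] args) {
        // Three methods: m1 uses {a, b}, m2 uses {b}, m3 uses {c}.
        List<Set<String>> usage = List.of(Set.of("a", "b"), Set.of("b"), Set.of("c"));
        // Pairs: (m1,m2) share b -> Q; (m1,m3) and (m2,m3) share nothing -> P.
        System.out.println(lcom2(usage));   // |P| - |Q| = 2 - 1 = 1
    }
}
```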

2.3.2 Other Cohesion Metrics

Briand et al. define a set of cohesion measures for object-based systems [16, 17] which are adapted in [12] to object-oriented systems. For this adaptation a class is viewed as a collection of data declarations and methods. A data declaration is a local, public type declaration, the class itself or a public attribute. There can be data declaration interactions between classes, attributes, types of different classes and methods. They define the following measures: Ratio of Cohesive Interactions (RCI), Neutral Ratio of Cohesive Interactions (NRCI), Pessimistic Ratio of Cohesive Interactions (PRCI) and Optimistic Ratio of Cohesive Interactions (ORCI).

Hitz and Montazeri base their cohesion measurements LCOM3, LCOM4 and C (Connectivity) on the work of Chidamber and Kemerer [51].

The cohesion measurements by Bieman and Kang are also based on the work of Chidamber and Kemerer [9]. They define measurements known as Tight Class Cohesion (TCC) and Loose Class Cohesion (LCC). These metrics also consider pairs of methods which use common attributes; however, a distinction is made between methods which access attributes directly or indirectly. They also take inheritance into account, making suggestions on how to deal with inherited methods and inherited attributes.

Lee et al. propose a set of cohesion measures based on the information flow through method invocations within a class [62]. For a method m implemented in a given class c, the cohesion of m is the number of invocations of other methods implemented in class c, weighted by the number of parameters of the invoked methods. The greater the number of parameters an invoked method has, the more information is passed and the stronger the link between the invoking and invoked methods. The cohesion of a class is the sum of the cohesion of its methods. The cohesion of a set of classes is given by the sum of the cohesion of the classes in the set.

Henderson-Sellers proposes a cohesion measure (LCOM5) [49], stating that a value of zero is obtained if each method of the class references every attribute of the class; this is called "perfect cohesion". It is also stated that if each method of the class references only a single attribute, the measure yields one, and that values between zero and one are to be interpreted as percentages of the perfect value. How to deal with inherited methods and attributes is not stated.

2.4 Frameworks for Static Cohesion Measurement

Two frameworks have been defined in an attempt to outline what constitutes cohesion. Eder et al. define a framework which aims to provide qualitative criteria for cohesion and also assigns relative strengths to the different levels of cohesion identified within the framework.

A comprehensive framework based on a standard terminology and formalism is outlined by Briand et al. which can be used (i) to facilitate comparison of existing cohesion measures, (ii) to facilitate the evaluation and empirical validation of existing cohesion measures, and (iii) to support the definition of new cohesion measures and the selection of existing ones based on a particular measurement goal.

2.4.1 Eder et al.

Eder et al. propose a framework aimed at providing comprehensive, qualitative criteria for cohesion in object-oriented systems [34]. They modify existing frameworks for cohesion in the procedural and object-based paradigms to suit the specifics of the object-oriented paradigm. They distinguish between three types of cohesion in an object-oriented system, namely method, class and inheritance cohesion, and state that various degrees of cohesion exist for each type.

Myers' classical definition of cohesion [83] is applied to methods for their definition of method cohesion. The elements of a method are its statements, its local variables and the attributes of the method's class. They define seven degrees of method cohesion, based on the definition by Myers. From weakest to strongest, these are coincidental, logical, temporal, communicational, sequential, procedural and functional.

Class cohesion addresses the relationships between the elements of a class.



The elements of a class are its non-inherited methods and non-inherited attributes. Eder et al. use a categorisation of cohesion for abstract data types by Embley and Woodfield [35] and adapt it to object-oriented systems. They define five degrees of class cohesion which are, from weakest to strongest, separable, multifaceted, non-delegated, concealed and model.

Inheritance cohesion is similar to class cohesion in that it addresses the relationships between the elements of a class. However, inheritance cohesion takes all the methods and attributes of a class into account, that is, both the inherited and the non-inherited ones. Inheritance cohesion is strong if inheritance has been used for the purpose of defining specialised child classes, and weak if it has been used for the purpose of reusing code. The degrees of inheritance cohesion are the same as those for class cohesion.

2.4.2 Briand et al.

Briand et al. outline a new framework for cohesion in object-oriented systems [12], based on the issues identified by comparing the various approaches to measuring cohesion and the discussion of existing measures outlined in Section 2.3. The framework consists of five criteria, each criterion determining one basic aspect of the resulting measure.

The five criteria of the framework are:

1. The type of connection, that is, what makes a class cohesive. A connection within a class is a link between elements of the class, which can be attributes, methods, or data declarations.

2. The domain of the measure; this specifies the objects to be measured, which can be methods, classes, and so on.

3. Whether only direct connections, or also indirect connections, should be counted.

4. How to deal with inheritance, that is, how to assign attributes and methods to classes and how to account for polymorphism.

5. How to account for access methods and constructors.



2.5 Run-time/Dynamic Coupling Metrics

While there has been considerable work on static metrics, there has been little research to date on run-time/dynamic coupling metrics. This section presents the two most relevant works.

2.5.1 Yacoub et al.

Yacoub et al. propose a set of dynamic coupling metrics designed to evaluate the change-proneness of a design [99]. These metrics are applied at the early development phase to determine design quality. The measures are calculated from executable object-oriented design models, which are used to model the application to be tested. They are based on execution scenarios, that is, "the measurements are calculated for parts of the design model that are activated during the execution of a specific scenario triggered by an input stimulus." A scenario is the context in which the metric is applicable. The scenarios are then extended to have an application scope.

They define two metrics designed to measure the quality of designs at an early development phase. Export Object Coupling, $EOC_x(o_i, o_j)$, for an object $o_i$ with respect to an object $o_j$, is defined as the percentage of the number of messages sent from $o_i$ to $o_j$ with respect to the total number of messages exchanged during the execution of a scenario x. Import Object Coupling, $IOC_x(o_i, o_j)$, for an object $o_i$ with respect to an object $o_j$, is the percentage of the number of messages received by object $o_i$ that were sent by object $o_j$ with respect to the total number of messages exchanged during the execution of a scenario x.
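For example, in a hypothetical scenario x in which twenty messages are exchanged in total, five of them sent from $o_1$ to $o_2$ and three sent from $o_2$ to $o_1$, the measures evaluate as follows:

\[
EOC_x(o_1, o_2) = \frac{5}{20} = 25\%, \qquad
IOC_x(o_1, o_2) = \frac{3}{20} = 15\%
\]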

2.5.2 Arisholm et al.

Arisholm et al. define and validate a number of dynamic coupling metrics, which are listed in Table 2.1 [5]. Each dynamic coupling metric name starts with either I or E, to distinguish between import coupling and export coupling based on the direction of the method calls. The third letter, C or O, indicates whether the entity of measurement is the class or the object. The remaining letter distinguishes three types of coupling.



Variable   Description
IC_CC      Import, Class Level, Number of Distinct Classes
IC_CM      Import, Class Level, Number of Distinct Methods
IC_CD      Import, Class Level, Number of Dynamic Messages
EC_CC      Export, Class Level, Number of Distinct Classes
EC_CM      Export, Class Level, Number of Distinct Methods
EC_CD      Export, Class Level, Number of Dynamic Messages
IC_OC      Import, Object Level, Number of Distinct Classes
IC_OM      Import, Object Level, Number of Distinct Methods
IC_OD      Import, Object Level, Number of Dynamic Messages
EC_OC      Export, Object Level, Number of Distinct Classes
EC_OM      Export, Object Level, Number of Distinct Methods
EC_OD      Export, Object Level, Number of Dynamic Messages

Table 2.1: Abbreviations for the dynamic coupling metrics of Arisholm et al.

The first metric, C, counts the number of distinct classes that a method in a given class/object uses or is used by. The second metric, M, counts the number of distinct methods invoked by each method in each class/object, while the third metric, D, counts the total number of dynamic messages sent or received from one class/object to or from other classes/objects.

Arisholm et al. study the relationship of these measures with the change-proneness of classes. They find that the dynamic coupling metrics capture additional properties compared to the static coupling metrics and are good predictors of the change-proneness of a class. Their study uses a single software system, called Velocity, executed with its associated test suite to evaluate the dynamic coupling metrics. These test cases are found to originally have 70% method coverage, which is increased to 90% for the methods that "might contribute to coupling" through the removal of dead code. However, they did not study the impact of code coverage on their results, nor were results given for programs other than versions of Velocity.



2.6 Run-time/Dynamic Cohesion Metrics

As is the case with run-time coupling metrics, there has not been much research into run-time measures for cohesion. This section presents the only available study to date.

2.6.1 Gupta and Rao

Gupta and Rao conduct a study which measures module cohesion in legacy software [46]. They compare statically calculated metrics against a program-execution-based approach to measuring the levels of module cohesion. The results from this study show that the static approach significantly overestimates the levels of cohesion present in the software tested. However, Gupta and Rao consider programs written in C, to which many features of object-oriented programs are not directly applicable.

2.7 Other Studies of Dynamic Behaviour

In this section we present a review of other work on the dynamic behaviour of Java programs. While such research is not directly related to coupling and cohesion metrics, many of the issues and approaches to measurement are similar. Indeed, any research that performs both static and dynamic analyses of programs benefits from being viewed in the context of some overall perspective of the relationship between the static and dynamic data.

2.7.1 Dynamic Behaviour Studies

A number of studies of the dynamic behaviour of Java programs have been carried out, mostly for optimisation purposes. Issues such as bytecode usage [45] and memory utilisation [28] have been studied, along with a comprehensive set of dynamic measures relating to polymorphism, object creation and hot-spots [33]. However, none of this work directly addresses the calculation of standard software metrics at run-time.



The Sable group [33] seek to quantify the behaviour of programs with a concise and precisely defined set of metrics. They define a set of unambiguous, dynamic, robust and architecture-independent measures that can be used to categorise programs according to their dynamic behaviour in five areas: size, data structure, memory use, concurrency, and polymorphism. Many of the measurements they record are of interest to the Java performance community, as understanding the dynamic behaviour of programs is one important aspect of developing effective new strategies for optimising compilers and runtime systems. It is important to note that these are not typical software engineering metrics.

2.8 Coverage Metrics and Software Testing

Dynamic coverage measures are typically used in the field of software testing as an estimate of the effectiveness of a test suite [10, 72]. Measuring the structural coverage of code is a means of assessing the thoroughness of testing. The basis of software testing is that software functionality is characterised by its execution behaviour. In general, improved test coverage leads to improved fault coverage and improved software reliability [69]. There are a number of metrics available for measuring coverage, with increasing support from software tools. Such metrics do not constitute testing techniques, but can be used as a measure of the effectiveness of testing techniques. There are many different strategies for testing software, and there is no consensus among software engineers about which approach is preferable in a given situation. Test strategies fall into two categories [40]:

• Black-box (closed-box) testing: The test cases are derived from the specification or requirements without reference to the code itself or its structure.

• White-box (open-box) testing: The test cases are selected based on knowledge of the internal program structure.

A number of coverage metrics are based on the traversal of paths through the control dataflow graph (CDFG) representing the system behaviour. Applying these metrics to the CDFG representing a single process is a well-understood task.



The following coverage metrics are examples of white-box testing techniques and are based on the CDFG.

2.8.1 Instruction Coverage

Instruction coverage is the simplest structural coverage metric. It is achieved if every source language statement in the program is executed at least once. With this technique, test cases are selected so that every program statement is executed at least once. It is also known as statement coverage, segment coverage [84], C1 [7] and basic block coverage.

The main advantage of this measure is that it can be applied directly to object code and does not require processing of the source code; performance profilers commonly implement it. The main disadvantage of statement coverage is that it is insensitive to some control structures: the measure is affected more by computational statements than by decisions. Due to its ubiquity, it was chosen as the coverage measure used in the case studies in this thesis. There are, however, a number of other methods for evaluating the coverage of a program, for example branch coverage, condition coverage, condition/decision coverage, modified condition/decision coverage and path coverage.
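The following hypothetical Java fragment illustrates this weakness: a single test case with a positive argument executes every statement of classify, achieving full statement coverage, even though the false outcome of the branch is never exercised.

```java
// Hypothetical illustration of the insensitivity of statement coverage to
// control structures.
class CoverageExample {
    static String classify(int n) {
        String label = "non-positive";
        if (n > 0) {                 // the n <= 0 outcome is never tested below
            label = "positive";
        }
        return label;
    }

    public static void main(String[] args) {
        // One test case: every statement above runs (100% statement coverage),
        // yet branch coverage is only 50%.
        System.out.println(classify(1));
    }
}
```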

2.8.2 Alexander and Offutt

In their paper [3], Alexander and Offutt describe a coupling-based testing approach for analysing and testing the polymorphic relationships that occur in object-oriented software. The traditional notion of software coupling is updated to apply to object-oriented software, handling the relationships of aggregation, inheritance and polymorphism. This allows the introduction of a new integration analysis and testing technique for data flow interactions within object-oriented software. The foundation of this technique is the coupling sequence, a new abstraction for representing state space interactions between pairs of method invocations. The coupling sequence provides the analytical focal point for methods under test and is the foundation for identifying and representing polymorphic relationships for both static and dynamic analysis.



With this abstraction, both testers and developers of object-oriented programs can analyse and better understand the interactions within their software. The application of these techniques can result in an increased ability to find faults and overall higher quality software.

2.9 Previous Work by the Author

A preliminary study was previously conducted on the issues involved in performing a run-time analysis of Java programs [74]. This study outlined the general principles involved in performing such an analysis. However, the results did not offer a justifiable basis for generalisation, as the programs analysed were a set of Java microbenchmarks from the Java Grande Forum Benchmark Suite (JGFBS) and therefore not representative of real applications. The metrics used were also of a more primitive nature than the ones used in this study, and no investigation was made into the perspective of the measures, that is, the influence of coverage or the ability to predict external design quality. It did, however, provide an indication that the evaluation of software metrics at run-time can provide an interesting quantitative analysis of a program and that further research in this area is needed.

The following papers have also been published:

• In [77, 78], studies on the quantification of a variety of run-time class-level coupling metrics for object-oriented programs are described.

• In [77, 79], an empirical investigation into run-time metrics for cohesion is presented.

• A study of the coverage analysis of Java benchmark suites is described in [20].

• An investigation into how object-level run-time metrics can be used to study coupling between objects is presented in [81].

• A study of the influence of coverage on the relationship between static and dynamic coupling metrics is described in [80].



2.10 Definition of Run-time Metrics

This section outlines the run-time metrics used in the remainder of this thesis. Originally, it was decided to develop a number of run-time metrics for coupling and cohesion that parallel the standard static object-oriented measures defined by Chidamber and Kemerer [26]. Arisholm et al. later defined a set of dynamic coupling metrics in their paper [5] which closely parallel ours, so for ease of comparison it was decided to adopt their terminology and definitions for the coupling measures.

The cohesion measures are all novel and are based on our own definitions.

2.10.1 Coupling Metrics

Three decision criteria are used to define and classify the run-time coupling measures. Firstly, a distinction is made as to whether the entity of measurement is the object or the class. Run-time object-level coupling quantifies the level of dependencies between objects in a system. Run-time class-level coupling quantifies the level of dependencies between the classes that implement the methods or variables of the caller object and the receiver object. The class of the object sending or receiving a message may differ from the class implementing the corresponding method due to the impact of inheritance.

Secondly, the direction of coupling for a class or object is taken into account, as outlined in previous static coupling frameworks [13]. This allows for the fact that in a coupling relationship a class may act as a client or a server, that is, it may access methods or instance variables of another class (import coupling) or it may have its own methods or instance variables used (export coupling).

Finally, the strength of the coupling relationship is assessed, that is, the amount of association between the classes. To do this it is possible to count either:

1. The number of distinct classes that a method in a given class uses or is used by.

2. The number of distinct methods invoked by each method in each class.



3. The total number of dynamic messages sent or received from one class to or from other classes.

Class-Level Metrics

The following are metrics for evaluating class-level coupling (a sketch of how the import measures can be derived from a run-time call trace follows the list):

• IC_CC: the number of distinct classes accessed by a class at run-time.

• IC_CM: the number of distinct methods accessed by a class at run-time.

• IC_CD: the number of dynamic messages sent by a class at run-time.

• EC_CC: the number of distinct classes that access a given class at run-time.

• EC_CM: the number of distinct methods of a class that are accessed by other classes at run-time.

• EC_CD: the number of dynamic messages received by a class from other classes at run-time.
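The sketch below illustrates how the import measures could, in principle, be derived from a trace of run-time messages. It is an illustrative simplification with an assumed trace representation, and it does not reflect the instrumentation described later in this thesis.

```java
import java.util.*;

// Minimal sketch: given a run-time trace of messages (caller class, callee
// class, callee method), count IC_CC (distinct callee classes), IC_CM
// (distinct callee methods) and IC_CD (total messages sent) for one class.
class ImportCouplingCounter {
    record Message(String callerClass, String calleeClass, String calleeMethod) { }

    static void report(String cls, List<Message> trace) {
        Set<String> classes = new HashSet<>();
        Set<String> methods = new HashSet<>();
        long messages = 0;
        for (Message m : trace) {
            if (m.callerClass().equals(cls) && !m.calleeClass().equals(cls)) {
                classes.add(m.calleeClass());
                methods.add(m.calleeClass() + "." + m.calleeMethod());
                messages++;
            }
        }
        System.out.printf("IC_CC=%d IC_CM=%d IC_CD=%d%n",
                classes.size(), methods.size(), messages);
    }

    public static void main(String[] args) {
        List<Message> trace = List.of(
                new Message("A", "B", "foo"),
                new Message("A", "B", "foo"),
                new Message("A", "C", "bar"),
                new Message("B", "C", "bar"));
        report("A", trace);   // IC_CC=2, IC_CM=2, IC_CD=3
    }
}
```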

Object-Level Metric

To evaluate object-level coupling it was deemed necessary to define just one metric. Since we want to examine the behaviour of objects at run-time, we require a measure that is based on a class rather than a method level. Further, it was deemed necessary to evaluate coupling at the import level only, as we are interested in examining how classes use other classes at the object-level rather than how they are used by other classes; therefore export coupling was not evaluated for this measure.

The following is a measure for evaluating object-level coupling:

• IC_OC: Import, Object-Level, Number of Distinct Classes. This measure will be some function of the static CBO measure, as that measure determines the classes that can theoretically be accessed at run-time. This is a coarse-grained measure which assesses class-class coupling at the object-level.

2.10.2 Cohesion Metrics

The following run-time measures are based on the Chidamber and Kemerer static LCOM measure for cohesion, as described in Section 2.3.1. However, a problem with the original definition of LCOM is its lack of discriminating power. Much of this arises from the criterion which states that if |P| < |Q|, LCOM is automatically set to zero. The result is a large number of classes with an LCOM of zero, so the metric has little discriminating power between these classes. In an attempt to correct this, for the purpose of this analysis, we modify the original definition to be:

\[ S_{LCOM} = \frac{|P|}{|P| + |Q|} \tag{2.4} \]

S_LCOM can range in value from zero to one. This new definition allows for comparison across classes, therefore we use this new version as a basis for the definition of the run-time metrics. As these are cohesion measures they are evaluated at the class-level only.

Run-time Simple LCOM (R_LCOM)

R_LCOM is a direct extension of the static case, except that now we only count instance variables that are actually accessed at run-time. Thus, for a set of methods m_1, ..., m_n, as before, let {I_i^R} represent the set of instance variables referenced by method m_i at run-time. Two disjoint sets are defined from this:

\[ P^R = \{ (I_i^R, I_j^R) \mid I_i^R \cap I_j^R = \emptyset \} \]
\[ Q^R = \{ (I_i^R, I_j^R) \mid I_i^R \cap I_j^R \neq \emptyset \} \tag{2.5} \]

We can then define R_LCOM as:

\[ R_{LCOM} = \frac{|P^R|}{|P^R| + |Q^R|} \tag{2.6} \]



We note that for any method m_i, (I_i − I_i^R) ≥ 0, and represents the number of instance variables mentioned in a method's code, but not actually accessed at run-time.
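As an illustration, the following sketch computes R_LCOM for one class from the per-method sets of instance variables observed at run-time. It is a minimal, hypothetical implementation which assumes the dynamic trace has already been reduced to one set of variable names per method; it is not the code of the ClMet tool.

```java
import java.util.*;

final class RuntimeCohesion {
    // methodVars.get(i) is the set of instance variables actually accessed
    // by method m_i at run-time (I_i^R in equation 2.5).
    static double rLcom(List<Set<String>> methodVars) {
        int p = 0, q = 0;                       // |P^R| and |Q^R|
        for (int i = 0; i < methodVars.size(); i++) {
            for (int j = i + 1; j < methodVars.size(); j++) {
                if (Collections.disjoint(methodVars.get(i), methodVars.get(j))) {
                    p++;                        // the pair shares no instance variable
                } else {
                    q++;                        // the pair shares at least one variable
                }
            }
        }
        return (p + q) == 0 ? 0.0 : (double) p / (p + q);   // equation 2.6
    }
}
```

Each pair of methods is counted once (i < j), and a class with fewer than two methods is assigned the value zero by convention; the thesis text does not spell out these boundary cases.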

Run-time Call-Weighted LCOM (RW_LCOM)

It is reasonable to suggest that a heavily accessed variable should make a greater contribution to class cohesion than one which is rarely accessed. However, the R_LCOM metric does not distinguish between the degrees of access to instance variables. Thus a second run-time measure, RW_LCOM, is defined by weighting each instance variable by the number of times it is accessed at run-time. This metric assesses the strength of cohesion by taking the number of accesses into account.

As before, consider a class with n methods, m_1, ..., m_n, and let {I_i} be the set of instance variables referenced by method m_i. Define N_i as the number of times method m_i dynamically accesses instance variables from the set {I_i}.

Now define a call-weighted version of equation 2.2 by summing over the number of accesses:

\[ P^W = \sum_{1 \le i,j \le n} \{ (N_i + N_j) \mid I_i \cap I_j = \emptyset \} \]
\[ Q^W = \sum_{1 \le i,j \le n} \{ (N_i + N_j) \mid I_i \cap I_j \neq \emptyset \} \tag{2.7} \]

where P^W = 0 if {I_1}, ..., {I_n} = ∅.

Following equation 2.6 we define:

\[ RW_{LCOM} = \frac{|P^W|}{|P^W| + |Q^W|} \tag{2.8} \]

RW_LCOM can range in value from zero to one. There is no direct relationship with S_LCOM or R_LCOM, as it is based on the "hotness" of a particular program.
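A call-weighted variant of the previous sketch is given below. Again this is only a sketch under the same assumptions, with the per-method access counts N_i supplied alongside the variable sets; it simply replaces the pair count by the weight N_i + N_j.

```java
import java.util.*;

final class CallWeightedCohesion {
    // methodVars.get(i) = I_i and accessCounts.get(i) = N_i for method m_i.
    static double rwLcom(List<Set<String>> methodVars, List<Integer> accessCounts) {
        long pW = 0, qW = 0;                    // P^W and Q^W of equation 2.7
        for (int i = 0; i < methodVars.size(); i++) {
            for (int j = i + 1; j < methodVars.size(); j++) {
                long weight = accessCounts.get(i) + accessCounts.get(j);
                if (Collections.disjoint(methodVars.get(i), methodVars.get(j))) {
                    pW += weight;
                } else {
                    qW += weight;
                }
            }
        }
        return (pW + qW) == 0 ? 0.0 : (double) pW / (pW + qW);   // equation 2.8
    }
}
```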



2.11 Conclusion

This chapter outlined the most prevalent metrics for coupling and cohesion and discussed other work on studies into the dynamic behaviour of Java programs. Measures for dynamic coverage that are commonly used in the field of software testing were described. Work and publications by the author were outlined. Finally, a description of the run-time metrics used in this thesis was provided.


Chapter 3

Experimental Design

This chapter presents an overview of the tools and techniques used to carry out the run-time empirical evaluation of a set of Java programs, together with a detailed description of the set of programs analysed. A review of the statistical techniques used to interpret the data is also given.

3.1 Methods for Collecting Run-time Information

There are a number of alternative techniques available for extracting run-time information from Java programs, each with their own advantages and disadvantages.

3.1.1 Instrumenting a Virtual Machine

There are several open-source implementations of the JVM available, for example Kaffe [58], Jikes [57] or the Sable VM [59]. As their source code is freely available, all aspects of a running Java program can be observed. However, due to the logging of bytecode instructions, instrumenting a JVM can result in a huge amount of data being generated for even the simplest of programs. The organisation of the source code must be understood and the instrumentation has to be redone for each new version of the VM. There can also be compatibility issues with the Java class libraries released by Sun. It has also been found that these VMs are not very robust. This was the method used for a preliminary study [74]; however, it was later discarded due to its many disadvantages.




3.1.2 Sun's Java Platform Debug Architecture (JPDA)

Version 1.4 and later of the Java SDK supports a debugging architecture, the JPDA [96], that provides event notification for low-level JVM operations. A trace program that handles these events can thus record information about the execution of a Java program. This method is faster than instrumenting a VM and is more robust. The same agent works with all VMs supporting the JPDA, which is currently supported by both Sun and IBM (although there are some differences). This technique has proved useful in class-level metrics analysis. However, it is still very time-consuming to generate a profile for a large application and it is difficult to conduct an object-level analysis using this approach.

3.1.3 Bytecode Instrumentation

This involves statically manipulating the bytecode to insert probes, or other tracking mechanisms, that record information at run-time. This provides the simplest approach to dynamic analysis since it does not require implementation-specific knowledge of JVM internals, and it imposes little overhead on the running program. Bytecode instrumentation can be performed using the publicly available Apache Byte Code Engineering Library (BCEL) [30]. This technique provides object-level accuracy and was therefore used in the object-level metrics analysis.
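For illustration, the sketch below uses BCEL only to parse a class file and list the methods that a probe-insertion pass would visit. It is not the instrumentation code itself: the actual rewriting (inserting probe bytecode and dumping the modified class) is omitted, and the file name is a placeholder.

```java
import org.apache.bcel.classfile.ClassParser;
import org.apache.bcel.classfile.JavaClass;
import org.apache.bcel.classfile.Method;

public class ListInstrumentationTargets {
    public static void main(String[] args) throws Exception {
        // Parse a compiled class file into BCEL's object representation.
        JavaClass clazz = new ClassParser("A.class").parse();
        System.out.println("Class: " + clazz.getClassName());
        // Each of these methods would receive a probe in a full instrumentation pass.
        for (Method m : clazz.getMethods()) {
            System.out.println("  would instrument: " + m.getName() + m.getSignature());
        }
    }
}
```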

3.2 Metrics Data Collection Tools (Design Objectives)

The dynamic analysis of any program involves a huge amount of data processing. However, the level of performance of the collection mechanism was not considered to be a critical issue; it was only desirable that the analysis could be carried out in reasonable and practical time. The flexibility of the collection mechanism was a key issue, as it was necessary to be able to collect a wide variety of dynamic information.



3.2.1 Class-Level Metrics Collection Tool (ClMet)

We have developed a tool for the collection of class-level metrics called ClMet, as illustrated by Figure 3.1, which utilises the JPDA. This is a multi-tiered debugging architecture contained within Sun Microsystems' Java 2 SDK version 1.4. It consists of two interfaces, the Java Virtual Machine Debug Interface (JVMDI) and the Java Debug Interface (JDI), and a protocol, the Java Debug Wire Protocol (JDWP).

Figure 3.1: Components of run-time class-level metrics collection tool, ClMet

The first layer of the JPDA, the JVMDI, is a programming interface implemented by the virtual machine. It provides a way to both inspect the state and control the execution of applications running in the JVM. The second layer, the JDWP, defines the format of information and requests transferred between the process being debugged and the debugger front-end, which implements the JDI. The JDI, which comprises the third layer, defines information and requests at the user code level. It provides introspective access to a running virtual machine's state, to the class, array, interface and primitive types, and to instances of those types. While a tracer implementor could use the JDWP or the JVMDI directly, the JDI greatly facilitates the integration of tracing capabilities into development environments. This method was selected because of the ease with which it is possible to obtain specific information about the run-time behaviour of a program.

In order to match objects against method calls it is necessary to model the execution stack of the JVM, as this information is not provided directly by the JPDA. We have implemented an EventTrace analyser class in Java, which carries out a stack-based simulation of the entire execution in order to obtain information about the state of the execution stack. This class also implements a filter which allows the user to specify which events, and which of their corresponding fields, are to be captured for processing. This allows a high degree of flexibility in the collection of the dynamic trace data.

The final component of our collection system is a Metrics class, which is responsible for calculating the desired metrics on the fly. It is also responsible for outputting the results in text format. The metrics to be calculated can be specified from the command line. The addition of the Metrics class allows new metrics to be easily defined, as the user need only interact with this class.
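As a rough indication of the kind of JDI code involved, the fragment below registers for method-entry events on a target VM and prints the declaring class and method name of each call; from such a stream of events the class-level coupling counts can be accumulated. It is a simplified sketch, not the ClMet source: obtaining the VirtualMachine via a launching connector, the exclusion filter pattern and all error handling are omitted or assumed.

```java
import com.sun.jdi.Method;
import com.sun.jdi.VirtualMachine;
import com.sun.jdi.event.Event;
import com.sun.jdi.event.EventSet;
import com.sun.jdi.event.MethodEntryEvent;
import com.sun.jdi.request.EventRequest;
import com.sun.jdi.request.MethodEntryRequest;

public class MethodEntryTracer {
    // 'vm' is assumed to have been obtained from a LaunchingConnector (not shown).
    static void trace(VirtualMachine vm) throws InterruptedException {
        MethodEntryRequest request = vm.eventRequestManager().createMethodEntryRequest();
        request.addClassExclusionFilter("java.*");      // ignore the standard library
        request.setSuspendPolicy(EventRequest.SUSPEND_NONE);
        request.enable();

        while (true) {
            EventSet events = vm.eventQueue().remove(); // blocks until events arrive
            for (Event event : events) {
                if (event instanceof MethodEntryEvent) {
                    Method m = ((MethodEntryEvent) event).method();
                    // One dynamic message: record caller/callee here to update the IC/EC counts.
                    System.out.println(m.declaringType().name() + "." + m.name());
                }
            }
            events.resume();
        }
    }
}
```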

3.2.2 Object-Level Metrics Collection Tool (ObMet)

We have developed an object-level metrics collection tool called ObMet, which uses the BCEL and is based on the Gretel [53] coverage monitoring tool.

The BCEL is an API which can be used to analyse, create and manipulate (binary) Java class files. Classes are represented by BCEL objects which contain all the symbolic information of the given class, such as methods, fields and bytecode instructions. Such objects can be read from an existing file, transformed by a program and dumped to a file again.

Figure 3.2 illustrates the components of ObMet. In the first stage the Instrumenter program takes a list of class files and instruments them. During this phase the BCEL inserts probes into these files to flag events such as method calls or instance variable accesses. During instrumentation, the class files are changed in-place, and a file containing information on method and field accesses is created. Each method and field is given a unique index in this file. When the application is run, each probe records a "hit" in another file. The Metrics program then calculates the run-time measures utilising the information in these files.

Figure 3.2: Components of run-time object-level metrics collection tool, ObMet

3.2.3 Static Data Collection Tool (StatMet)

In order to calculate the static metrics it is necessary to convert the binary class files into a human-readable format. The StatMet tool is based on the Gnoloo disassembler [38], which converts the class files into an Oolong source file. The Oolong language is an assembly language for the Java Virtual Machine; the resulting file is nearly equivalent to the class file format but is suitable for human interpretation. The StatMet tool extends the disassembler with an additional metrics component which calculates the static metrics from the Oolong code. Figure 3.3 illustrates the components of the StatMet tool.

Figure 3.3: Components of static metrics collection tool, StatMet

3.2.4 Coverage Data Collection Tool (InCov)

In order to calculate instruction coverage, it is necessary to record, for each instruction, whether or not it was executed. In fact, well-known techniques exist for identifying sequences of consecutive instructions, known as basic blocks, that somewhat reduce the instrumentation overhead. Nonetheless, since static code analysis is required to determine basic block entry points, it seemed most efficient to also instrument the bytecode during this analysis.

The instrumentation framework uses the Apache Byte Code Engineering Library (BCEL) [30] along with the Gretel Residual Test Coverage Tool [53]. The Gretel tool statically works out the basic blocks in a Java class file and inserts a probe, consisting of a small sequence of bytecode instructions, at each basic block. Whenever the basic block is executed, the probe code records a "hit" as a simple boolean value. The number of bytecode instructions in the basic block can then be used to calculate instruction coverage.

3.2.5 Fault Detection Study

Mutation testing [48, 64] is a fault-based testing technique that measures the effectiveness of test cases. It was first introduced as a way of measuring the accuracy of test suites. It is based on the assumption that a program will be well tested if a majority of simple faults are detected and removed. Mutation testing measures how good a test is by inserting faults into the program under test. Each fault generates a new program, a mutant, that is slightly different from the original. These mutant versions of the program are created from the original program by applying mutation operators, which describe syntactic changes to the programming language. Test cases are used to execute these mutants with the goal of causing each mutant to produce incorrect output. The idea is that the tests are adequate if they distinguish the program from one or more mutants. The cost of mutation testing has always been a serious issue and many techniques proposed for implementing it have proved to be too slow for practical adoption. µJava is a tool created for performing mutation testing on Java programs.

µJava

µJava [66, 67] is a mutation system for Java programs. It automatically generates mutants for both traditional mutation testing and class-level mutation testing. It can test individual classes and packages of multiple classes.

The method-level or traditional mutants are based on the selective operator set by Offutt et al. [87]. These (non-OO) mutants are all behavioural in nature. There are five traditional mutants in total. A description of these mutants can be found in Appendix D.1.

The class-level mutation operators were designed for Java classes by Ma, Kwon and Offutt [68], and were in turn derived from a categorisation of object-oriented faults by Offutt, Alexander et al. [86]. The object-oriented mutants are created according to 23 operators that are specialised to object-oriented faults. Each of these can be categorised according to one of the five language feature groups they relate to. The class-level mutants can also be divided into one of two types: behavioural mutants are those that change the behaviour of the program, while structural mutants are those that change the structure of the program. A detailed description of these mutants can be found in Appendix D.2.

After creating mutants, µJava allows the tester to enter and run tests, and evaluates the mutation coverage of the tests. Test cases are then added in an attempt to "kill" the mutants by differentiating the output of the original program from the mutant programs. Tests are supplied by the users as sequences of method calls to the classes under test, encapsulated in methods in separate classes.

3.3 Test Case Programs

An important technique used in the evaluation of object systems is benchmarking. A benchmark is a black-box test, even if the source code is available [73]. A benchmark should consist of two elements:

• The structure of the persistent data.

• The behaviour of an application accessing and manipulating the data.

The process of using a benchmark to assess a particular object system involves executing or simulating the behaviour of the application while collecting data reflecting its performance [54]. A number of different Java benchmarks are available and those used in the course of this study are discussed in the following subsection.

3.3.1 Benchmark Programs

Benchmark suites are commonly used to measure performance and fulfill many of the required properties of a test suite. The following were used in this analysis.

SPECjvm98 Benchmark Suite

The SPECjvm98 benchmark suite [8] is typically used to study the architectural implications of a Java runtime environment. The benchmark suite consists of eight Java programs which represent different classes of Java applications, as illustrated by Table 3.1.

Application      Description
201 compress     A popular modified Lempel-Ziv method (LZW) compression program.
202 jess         JESS is the Java Expert Shell System and is based on NASA's popular CLIPS rule-based expert shell system.
205 raytrace     A raytracer that works on a scene depicting a dinosaur.
209 db           Data management software written by IBM.
213 javac        The Sun Microsystems Java compiler from the JDK 1.0.2.
222 mpegaudio    An application that decompresses audio files that conform to the ISO MPEG Layer-3 audio specification.
227 mtrt         A variant of 205 raytrace; a dual-threaded program that ray traces an image.
228 jack         A Java parser generator from Sun Microsystems that is based on the Purdue Compiler Construction Tool Set (PCCTS). This is an early version of what is now called JavaCC.

Table 3.1: Description of the SPECjvm98 benchmarks

These programs were run at the command line prompt and do not include graphics, AWT (graphical interfaces), or networking. The programs were run with a 100% size execution by specifying a problem size of s100 at the command line.

JOlden Benchmark Suite

The original Olden benchmarks are a suite of pointer-intensive C programs which have been translated into Java. They are small, synthetic programs, but they were used as part of this study as each program exhibits a large volume of object creation. Table 3.2 gives a description of the programs [23].

Application    Description
bh             Solves the N-body problem using hierarchical methods.
bisort         Sorts by creating two disjoint bitonic sequences and then merging them.
em3d           Simulates the propagation of electromagnetic waves in a 3D object.
health         Simulates the Columbian health care system.
mst            Computes the minimum spanning tree of a graph.
perimeter      Computes the perimeter of a set of quad-tree encoded raster images.
power          Solves the Power System Optimization problem.
treeadd        Adds the values in a tree.
tsp            Computes an estimate of the best Hamiltonian circuit for the travelling salesman problem.
voronoi        Computes the Voronoi diagram of a set of points.

Table 3.2: Description of the JOlden benchmarks

There are a number of other benchmark suites available that could be used in this type of study but which were excluded for various reasons. The DaCapo benchmark suite was excluded as it is still in its beta stage of development. The Java Grande Forum Benchmark Suite (JGFBS), which was used in a previous study [74], was excluded as its programs did not exhibit very high levels of coupling and cohesion at run-time. Other suites, such as CaffeineMark, were excluded as these are microbenchmark programs and are therefore not typical of real Java applications.

3.3.2 Real-World Programs

It was deemed desirable to include a number of real-world programs in the analysis to see if the results scale to actual programs. The following were chosen as they, and their source code, are all publicly available. They all come with a set of pre-defined test cases that are also publicly available, thus defining both the static and dynamic context of our work. This contrasts with some other approaches which, at worst, can use arbitrary software packages, often proprietary, with an ad-hoc set of test inputs.

Velocity

Velocity (version 1.4.1) is an open-source software system that is part of the Apache Jakarta Project [55]. It is a Java-based template engine that permits anyone to use a simple yet powerful template language to reference objects defined in Java code. It can be used to generate web pages, SQL, PostScript, and other output from template documents, either as a standalone utility or as an integrated component of other systems. The set of JUnit test cases supplied with the program was used to execute the program.

Xalan-Java

Xalan-Java (version 2.6.0) is an open-source software system that is part of the Apache XML Project [92]. It is an XSLT processor for transforming XML documents into HTML, text, or other XML document types. It implements XSL Transformations (XSLT) Version 1.0 and XML Path Language (XPath) Version 1.0. It can be used from the command line, in an applet or a servlet, or as a module in other programs. A set of JUnit test cases supplied with the program was used for its execution.

Ant

Ant (version 1.6.1) is a Java-based build tool that is part of the Apache Ant Project [4]. It is similar to GNU Make but has the full portability of pure Java code. Instead of writing shell commands, as with Make, the configuration files are XML-based, calling out a target tree where various tasks are executed.



SPECjvm98   JOlden   Velocity   Xalan   Ant
Case Study 1: X X X X X
Case Study 2: X X X X
Case Study 3: X X X

Table 3.3: Programs used for each case study

3.3.3 Execution of Programs

All the programs except those in the SPEC benchmark suite were compiled using the javac compiler from Sun's SDK version 1.5.0_01, and all benchmarks were run using the client virtual machine from this SDK. The programs in the SPEC suite are distributed in class file format, and were not recompiled or otherwise modified. We note (in accordance with the license) that the SPEC programs were run individually, and thus none of these results are comparable with the standard SPECjvm98 metric. All benchmark suites include not just the programs themselves, but a test harness to ensure that results from different executions are comparable. Table 3.3 outlines the programs used for each case study. Not all programs were suitable for use in every case study; we defer the explanation of this to the relevant chapters.

3.4 Statistical Techniques

The following section presents a detailed review of the statistical techniques used in this study.

3.4.1 Descriptive Statistics

Descriptive statistics describe patterns and general trends in a data set. They also aid in explaining the results of more complex statistical techniques. For each case study a number of descriptive statistics were evaluated from the following.

The Distribution Mean (X̄)

\[ \bar{X} = \frac{\sum X}{N} \tag{3.1} \]

The mean is the sum of all values (X) divided by the total number of values (N).

The Standard Deviation (s)

\[ s = \sqrt{var} = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}} \tag{3.2} \]

The standard deviation is a measure of the range of values in a set of numbers. It is used as a measure of the dispersion or variation in a distribution. Simply put, it tells us how far a typical member of a sample or population is from the mean value of that sample or population. A large standard deviation suggests that a typical member is far away from the mean; a small standard deviation suggests that members are clustered closely around the mean. It is computed as the square root of the variance.

Many statistical techniques assume that data is normally distributed. If that assumption can be justified, then 68% of the values are at most one standard deviation away from the mean, 95% of the values are at most two standard deviations away from the mean, and 99.7% of the values lie within three standard deviations of the mean.

The Coefficient of Variation (C_V)

\[ C_V = \frac{\sigma}{\mu} \times 100 \tag{3.3} \]

C_V measures the relative scatter in the data with respect to the mean and is calculated by dividing the standard deviation by the mean. It has no units and can be expressed as a simple decimal value or reported as a percentage value. When the C_V is small, the scatter in the data relative to the mean is small; when the C_V is large, the amount of variation relative to the mean is large. Equation 3.3 defines the coefficient of variation as a percentage, where µ is the mean and σ is the standard deviation.

Skewness

\[ skewness = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^3}{(N - 1)\, s^3} \tag{3.4} \]

Skewness is the tilt (or lack of it) in a distribution. It characterises the degree of asymmetry of a distribution around its mean. A distribution is symmetric if it looks the same to the left and right of the centre point. Equation 3.4 gives the formula for skewness for X_1, X_2, ..., X_N, where X̄ is the mean, s is the standard deviation and N is the number of data points.

Kurtosis

\[ kurtosis = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^4}{(N - 1)\, s^4} \tag{3.5} \]

Kurtosis is the peakedness of a distribution. Equation 3.5 gives the formula for kurtosis for X_1, X_2, ..., X_N.
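The following small routine shows how these descriptive statistics can be computed for a sample of metric values. It is an illustrative sketch (using the sample standard deviation with N − 1 in the denominator, matching equations 3.2, 3.4 and 3.5, and assuming at least two values with a non-zero mean) rather than the code actually used in the study.

```java
final class DescriptiveStats {
    final double mean, stdDev, coeffVar, skewness, kurtosis;

    DescriptiveStats(double[] x) {
        int n = x.length;                                  // assumes n >= 2
        double sum = 0.0;
        for (double v : x) sum += v;
        mean = sum / n;                                    // equation 3.1

        double m2 = 0.0, m3 = 0.0, m4 = 0.0;               // unnormalised central moments
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
            m4 += d * d * d * d;
        }
        stdDev   = Math.sqrt(m2 / (n - 1));                // equation 3.2
        coeffVar = stdDev / mean * 100.0;                  // equation 3.3 (mean must be non-zero)
        skewness = m3 / ((n - 1) * Math.pow(stdDev, 3));   // equation 3.4
        kurtosis = m4 / ((n - 1) * Math.pow(stdDev, 4));   // equation 3.5
    }
}
```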

3.4.2 Normality Tests

Many statistical procedures require that the data being analysed follow a normal distribution. If this is not the case, then the computed statistics may be extremely misleading. Normal distributions take the form of a symmetric bell-shaped curve. Normality can be visually assessed by looking at a histogram of frequencies, or by looking at a normal probability plot.

A common rule-of-thumb test for normality is to compute the skewness and kurtosis and then divide them by their standard errors. Skewness and kurtosis should be within the +2 to -2 range when the data are normally distributed. Negative skew is left-leaning, positive skew right-leaning. Negative kurtosis indicates too many cases in the tails of the distribution; positive kurtosis indicates too few cases in the tails.

Shapiro-Wilk's W Test

\[ W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{3.6} \]

Formal tests such as the Shapiro-Wilk test may also be applied to assess whether the data is normally distributed. It calculates a W statistic that tests whether a random sample, x_1, x_2, ..., x_n, comes from a normal distribution. W may be thought of as the correlation between the given data and their corresponding normal scores, with W = 1 when the given data are perfectly normal in distribution. When W is significantly smaller than 1, the assumption of normality is not met. The Shapiro-Wilk W test is recommended for small and medium samples up to n = 2000. Equation 3.6 calculates the W statistic, where x_(i) are the ordered sample values and a_i are constants generated from the means, variances and covariances of the order statistics of a sample of size n from a normal distribution [90, 93].

Kolmogorov-Smirnov D Test or K-S Lilliefors Test

\[ D = \max_{1 \le i \le N} \left| F(y_i) - \frac{i}{N} \right| \tag{3.7} \]

For larger samples, the Kolmogorov-Smirnov test is recommended. For a single sample of data, this test is used to test whether or not the sample is consistent with a specified distribution function. When there are two samples of data, it is used to test whether or not these two samples may reasonably be assumed to come from the same distribution. Equation 3.7 defines the test statistic, where F is the theoretical cumulative distribution of the distribution being tested, which must be a continuous distribution. The hypothesis regarding the distributional form is rejected if the test statistic, D, is greater than the critical value obtained from a table. There are several variations of these tables in the literature [24].

3.4.3 Normalising Transformations

There are a number of transformations that can be applied to make data approximately normally distributed. To normalise right or positive skew, square root, logarithmic, and inverse (1/x) transforms "pull in" outliers; inverse transforms are stronger than logarithmic transforms, which are stronger than roots. To correct left or negative skew, first subtract all values from the highest value plus 1, then apply square root, inverse, or logarithmic transforms. Power transforms can be used to correct both types of skew, and finer adjustments can be made by adding a constant, C, in the transform of X: (X + C)^P. Values of P less than one (roots) correct right skew, which is the common situation (using a power of 2/3 is common when attempting to normalise); values of P greater than one (powers) correct left skew. For right skew, decreasing P decreases right skew, but too great a reduction in P will overcorrect and cause left skew. When the best P is found, further refinements can be made by adjusting C; for right skew, for instance, subtracting C will decrease skew. Logarithmic transformations are appropriate to achieve symmetry in the central distribution when symmetry of the tails is not important. Square root transformations are used when symmetry in the tails is important. When both are important, a fourth root transform may work.
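As a simple illustration of these corrections, the helper below applies the power transform (X + C)^P to a data set; choosing P < 1 (for example 2/3) reduces right skew, while P > 1 reduces left skew. The method name is hypothetical and the caller is responsible for choosing suitable C and P values.

```java
final class SkewTransforms {
    // Apply (x + c)^p element-wise; p < 1 corrects right skew, p > 1 corrects left skew.
    static double[] powerTransform(double[] data, double c, double p) {
        double[] out = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            out[i] = Math.pow(data[i] + c, p);   // assumes data[i] + c >= 0
        }
        return out;
    }
}
```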

3.4.4 Pearson Correlation Test

\[ R = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}} \tag{3.8} \]

The Pearson or product-moment correlation test is used to assess whether there is a relationship between two or more variables; in other words, it is a measure of the strength of the relationship between the variables. Given n pairs of data (x_i, y_i), equation 3.8 computes the correlation coefficient (R). R is a number that summarises the direction and degree (closeness) of the linear relation between two variables and is also known as the Pearson Product-Moment Correlation Coefficient. R can take values between -1 through 0 to +1. The sign (+ or -) of the correlation affects its interpretation. When the correlation is positive (R > 0), as the value of one variable increases, so does the other; the closer R is to zero, the weaker the relationship. If a correlation is negative, when one variable increases, the other variable decreases. The following general categories indicate a quick way of interpreting a calculated R value [97]:

• 0.0 to 0.2: very weak to negligible correlation

• 0.2 to 0.4: weak, low correlation (not very significant)

• 0.4 to 0.6: moderate correlation

• 0.7 to 0.9: strong, high correlation

• 0.9 to 1.0: very strong correlation

The results of such an analysis are displayed in a correlation matrix table.

3.4.5 T-Test

\[ t = \frac{r}{\sqrt{(1 - r^2)/(N - 2)}} \tag{3.9} \]

Any relationship between two variables should be assessed for its significance as well as its strength. A standard two-tailed t-test is used to test for statistical significance, as illustrated by equation 3.9. Coefficients are considered significant if the t-test p-value is below 0.05. This indicates how unlikely a given correlation coefficient, r, would be if there were no relationship in the population. Therefore, the smaller the p-value, the more significant the relationship, taking account of type I and type II errors.
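The fragment below computes R (equation 3.8) and the corresponding t statistic (equation 3.9) for a pair of metric vectors. It is a minimal sketch assuming equal-length arrays with at least three points; converting t to a p-value would additionally require the Student t distribution, which is not shown.

```java
final class Correlation {
    // Pearson product-moment correlation coefficient, equation 3.8.
    static double pearsonR(double[] x, double[] y) {
        int n = x.length;
        double sx = 0, sy = 0, sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sx += x[i]; sy += y[i];
            sxy += x[i] * y[i];
            sxx += x[i] * x[i];
            syy += y[i] * y[i];
        }
        return (n * sxy - sx * sy)
                / Math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
    }

    // t statistic for testing the significance of r with N = n data pairs, equation 3.9.
    static double tStatistic(double r, int n) {
        return r / Math.sqrt((1 - r * r) / (n - 2));
    }
}
```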

3.4.6 Principal Component Analysis

Principal Component Analysis (PCA) is used to analyse the covariate structure of the metrics and to determine the underlying structural dimensions they capture. In other words, PCA can tell us whether all the metrics are likely to be measuring the same class property. PCA usually generates a large number of principal components; how many are retained is decided based on the amount of variance explained by each component. A typical threshold is to retain principal components with eigenvalues (variances) larger than 1.0; this is the Kaiser criterion. There are a number of stages involved in performing a PCA on a set of data:

1. Select a data set, for example one with two dimensions x and y.

2. Subtract the mean from each of the data dimensions. The mean subtracted is the average across each dimension, so all the x values have the mean x̄ subtracted and all the y values have ȳ subtracted. This produces a data set whose mean is zero.

3. Calculate the covariance matrix. Formula 3.10 gives the definition of a covariance matrix for a set of data with n dimensions, where C_{n×n} is a matrix with n rows and n columns, and Dim_x is the x-th dimension.

\[ C_{n \times n} = (c_{i,j}), \quad c_{i,j} = cov(Dim_i, Dim_j) \tag{3.10} \]

An n-dimensional data set will have n!/((n-2)! · 2) different covariance values. As the data we propose to use is two-dimensional, the covariance matrix will be 2 × 2:

\[ C = \begin{pmatrix} cov(x,x) & cov(x,y) \\ cov(y,x) & cov(y,y) \end{pmatrix} \]

4. Calculate the eigenvectors and eigenvalues of the covariance matrix. The eigenvectors are unit vectors, that is, their lengths are 1, and each is paired with an eigenvalue. These are important as they provide information about patterns in the data.

5. Choose components and form a feature vector. In general, once the eigenvectors have been found from the covariance matrix, the next step is to order them by eigenvalue, highest to lowest. This gives the components in order of significance. Some of the components of lesser significance can be ignored. If some components are left out, the final data set will have fewer dimensions than the original. To be precise, if there are originally n dimensions in the data, and n eigenvectors and eigenvalues are calculated, and the first p eigenvectors are chosen, then the final data set has only p dimensions.

A feature vector, which is just another name for a matrix of vectors, is constructed by taking the eigenvectors that are to be kept from the list of eigenvectors, and forming a matrix with these eigenvectors in the columns:

\[ FeatureVector = (eig_1 \; eig_2 \; eig_3 \; \ldots \; eig_n) \tag{3.11} \]

6. Derive a new data set. For this we simply take the transpose of the feature vector and multiply it on the left of the original data set, transposed:

\[ FinalData = RowFeatureVector \times RowDataAdjust \tag{3.12} \]

where RowFeatureVector is the matrix with the eigenvectors in the columns, transposed so that the eigenvectors are now in the rows, with the most significant eigenvector at the top, and RowDataAdjust is the mean-adjusted data, transposed, that is, the data items are in each column, with each row holding a separate dimension.

See [56] for further details on PCA.

3.4.7 Cluster Analysis

Cluster analysis is an exploratory statistical procedure that helps reveal associations and structures of data in a domain set [91]. A measure of proximity or similarity/dissimilarity is needed in order to determine groups from a complex data set. A wide variety of such measures exist but no consensus prevails over which is superior. For this study, two widely used dissimilarity measures, the Pearson dissimilarity and the Euclidean distance, were chosen. The analysis was repeated using these two different measures in order to verify the results.

Equation 3.13 defines the Pearson dissimilarity, where µ_x and µ_y are the means of the first and second sets of data, and σ_x and σ_y are the standard deviations of the first and second sets of data.

\[ d(x, y) = \frac{\frac{1}{n} \sum_i x_i y_i - \mu_x \mu_y}{\sigma_x \sigma_y} \tag{3.13} \]

Equation 3.14 defines the Euclidean distance between two sets of data.

\[ d(x, y) = \sqrt{\sum_i^n (x_i - y_i)^2} \tag{3.14} \]

The next step is to select the most suitable type of clustering algorithm for the analysis. The agglomerative hierarchical clustering (AHC) algorithm was chosen as being the most suitable for the specifications of the analysis; it also does not require the number of clusters into which the data should be grouped to be specified in advance. AHC algorithms start with singleton clusters, one for each entity. The most similar pair of clusters are merged, one pair at a time, until a single cluster remains.

Throughout the cluster analysis, a symmetric matrix of dissimilarities between the clusters is maintained. Once two clusters have been merged, it is necessary to generate the dissimilarity between the new cluster and every other cluster.



The unweighted pair-group average linkage algorithm was employed here as it is theoretically the best method to use. This algorithm clusters objects based on the average distance between all pairs.

Suppose we have three clusters A, B and C, with i being the distance between A and C, and j being the distance between B and C. If A and B are the most similar pair of entities and are joined together into a new cluster D, the new distance k between C and D is calculated as given by equation 3.15.

\[ k = \frac{i \cdot size(A) + j \cdot size(B)}{size(A) + size(B)} \tag{3.15} \]
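A direct translation of equation 3.15, used when updating the dissimilarity matrix after a merge, might look as follows; the parameter names mirror the clusters A, B and C above and are purely illustrative.

```java
final class AverageLinkage {
    // Distance between the merged cluster D (formed from A and B) and an existing cluster C,
    // given distAC = d(A, C), distBC = d(B, C) and the cluster sizes (equation 3.15).
    static double mergedDistance(double distAC, double distBC, int sizeA, int sizeB) {
        return (distAC * sizeA + distBC * sizeB) / (double) (sizeA + sizeB);
    }
}
```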

The analysis was repeated using Ward's method to verify the results. With this method cluster membership is assessed by calculating the total sum of squared deviations from the mean of a cluster; the criterion for fusion is that it should produce the smallest possible increase in the error sum of squares.

The output of AHC is usually represented in a special type of tree structure called a dendrogram, as illustrated by Figure 3.4. Each branch of the tree represents a cluster and is drawn vertically to the height at which the cluster merges with neighbouring clusters. The cutting line is a line drawn horizontally across the dendrogram at a given dissimilarity level to determine the number of clusters. The cutting line is determined by constructing a histogram of node levels to find where the increase in dissimilarity is strongest, as at that point we have reached a level where we are grouping groups that are already homogeneous. The cutting line is selected before this level is reached.

Figure 3.4: Dendrogram: At the cutting line there are two clusters

3.4.8 Regression Analysis

The general computational problem that needs to be solved in linear regression analysis is to fit a straight line to a set of points [43]. When there is more than one independent variable, the regression procedure estimates a linear equation of the form shown in equation 3.16, where Y is the dependent variable, the X_i stand for a set of independent variables, a is a constant and each b_i is the slope of the regression line. The constant a is also known as the intercept, and the slope as the regression coefficient.






Y = a + b 1 X 1 + b 2 X 2 + . . . + b p X p (3.16)<br />

The regression line expresses the best prediction of the dependent variable Y given the independent variables X_i. However, there is usually substantial variation of the observed points around the fitted regression line. The deviation of a particular point from the line is known as the residual value. The smaller the variability of the residual values around the regression line relative to the overall variability, the better the prediction. This ratio of residual variability to overall variability will in most cases fall somewhere between 0.0 and 1.0: if there is no relationship between the X and Y variables the ratio will be 1.0, while if X and Y are perfectly related the ratio will be 0.0. The least squares method is employed to perform the regression.

R², the coefficient of determination, is 1.0 minus this ratio. The R² value is an indicator of how well the model fits the data: an R² close to 1.0 indicates that almost all of the variability has been accounted for by the variables specified in the model.

The correlation coefficient R expresses the degree to which the independent variables are related to the dependent variable, and it is the square root of R². For a single independent variable R can assume values between -1 and +1, and the sign (plus or minus) of the correlation coefficient indicates the direction of the relationship: if it is positive, the relationship of that variable with the dependent variable is positive; if it is negative, the relationship is negative; and if it is zero there is no relationship between the variables.
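As a sketch of these calculations, the following Python fragment fits Equation 3.16 by least squares and derives the residual ratio, R² and R; the data is randomly generated purely for illustration and does not correspond to any measurement reported in this thesis.

    import numpy as np

    # Two independent variables (the columns of X) and one dependent variable Y.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(30, 2))
    Y = 1.5 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=30)

    # Least-squares fit of Y = a + b_1*X_1 + b_2*X_2 (Equation 3.16).
    design = np.column_stack([np.ones(len(Y)), X])
    coefficients, *_ = np.linalg.lstsq(design, Y, rcond=None)
    a, b = coefficients[0], coefficients[1:]

    # Residuals, and the ratio of residual variability to overall variability.
    residuals = Y - design @ coefficients
    ratio = np.sum(residuals ** 2) / np.sum((Y - Y.mean()) ** 2)

    r_squared = 1.0 - ratio            # coefficient of determination
    multiple_r = np.sqrt(r_squared)    # correlation coefficient R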

3.4.9 Analysis <strong>of</strong> Vari<strong>an</strong>ce (ANOVA)<br />

ANOVA is used to test the significance of the variation in the dependent variable that can be attributed to the regression on one or more independent variables. The results enable us to determine whether or not the explanatory variables bring significant information to the model. ANOVA gives a statistical test of the null hypothesis H 0 , that there is no linear relationship between the variables, against the alternative hypothesis H 1 , that there is a relationship between the variables.



There are four parts to the ANOVA results: the sum of squares, the degrees of freedom, the mean squares and the F test. Fisher's F test, as given by Equation 3.17, is used to test whether the R² values are statistically significant. Values are deemed to be significant at p ≤ 0.05.

F = (R² × (N − K − 1)) / ((1 − R²) × K)    (3.17)

Here, K is the number <strong>of</strong> independent variables (two in our case) <strong>an</strong>d N is the<br />

number <strong>of</strong> observed values.<br />
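Equation 3.17 translates directly into code. The sketch below uses SciPy only to obtain the corresponding p-value; the input values are illustrative and it is not part of the thesis tool chain.

    from scipy.stats import f as f_distribution

    def fisher_f_test(r_squared, n_observations, k_variables=2):
        # F statistic of Equation 3.17; K is the number of independent variables
        # (two in this study) and N is the number of observed values.
        dof = n_observations - k_variables - 1
        f_statistic = (r_squared * dof) / ((1.0 - r_squared) * k_variables)
        p_value = f_distribution.sf(f_statistic, k_variables, dof)
        return f_statistic, p_value

    # A result is deemed significant when p <= 0.05.
    f_statistic, p = fisher_f_test(r_squared=0.6, n_observations=25)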

3.5 Conclusion<br />

This chapter has outlined the tools and techniques needed to conduct the case studies described in the following chapters. The programs evaluated in this work were discussed and an outline of the statistical techniques used to analyse the results was provided.


Chapter 4<br />

Case Study 1: The Influence <strong>of</strong><br />

Instruction Coverage on the<br />

Relationship Between Static <strong>an</strong>d<br />

Run-<strong>time</strong> Coupling Metrics<br />

When comparing static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> measures it is import<strong>an</strong>t to have a thorough<br />

underst<strong>an</strong>ding <strong>of</strong> the degree to which the <strong>an</strong>alysed source code corresponds to the<br />

code that is actually executed. In this chapter this correspondence is studied using instruction coverage measures, focusing on the influence of coverage on the relationship between static and run-time metrics. It is proposed that coverage has a significant influence on this relationship and thus should always be a measured, recorded factor in any such comparison.

An <strong>empirical</strong> investigation is conducted using a set <strong>of</strong> six <strong>run</strong>-<strong>time</strong> <strong>metrics</strong> on<br />

seventeen Java benchmark <strong>an</strong>d real-world programs. First, the differences in the<br />

underlying dimensions <strong>of</strong> <strong>coupling</strong> captured by the static versus the <strong>run</strong>-<strong>time</strong> <strong>metrics</strong><br />

are assessed using principal component <strong>an</strong>alysis. Subsequently, multiple regression<br />

<strong>an</strong>alysis is used to <strong>study</strong> the predictive ability <strong>of</strong> the static CBO <strong>an</strong>d instruction<br />

coverage data to extrapolate the <strong>run</strong>-<strong>time</strong> measures.<br />




4.1 Goals <strong>an</strong>d Hypotheses<br />

The Goal Question Metric/MEtric DEfinition Approach (GQM/MEDEA) framework<br />

proposed by Bri<strong>an</strong>d et al. [18] was used to set up the experiments for this<br />

<strong>study</strong>.<br />

Goal: To investigate the relationship between static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong>.<br />

Experiment 1:<br />

Perspective: We would expect some degree <strong>of</strong> correlation between the <strong>run</strong>-<strong>time</strong><br />

measures for coupling and the static CBO metric. We use a number of statistical techniques, including principal component analysis, to analyse the covariate structure of the metrics and determine whether they are measuring the same class properties.

Environment: We chose to evaluate a number of Java programs from well-defined, publicly-available benchmark suites as well as a number of open-source real-world programs.

Hypothesis:<br />

H 0 : Run-<strong>time</strong> measures for <strong>coupling</strong> are simply surrogate measures for the static<br />

CBO metric.<br />

H 1 : Run-<strong>time</strong> measures for <strong>coupling</strong> are not simply surrogate measures for the<br />

static CBO metric.<br />

Experiment 2:<br />

Goal: To examine the relationship between static CBO and the run-time coupling metrics, particularly in the context of the influence of instruction coverage.

Perspective: Intuitively, one would expect that the better the coverage of the test cases used, the greater the correlation between the static and run-time metrics. We

use multiple regression <strong>an</strong>alysis to determine if there is a signific<strong>an</strong>t correlation.



Environment: We chose to evaluate a number of Java programs from well-defined, publicly-available benchmark suites as well as a number of open-source real-world programs.

Hypothesis:<br />

H 0 : The coverage <strong>of</strong> the test cases used to evaluate a program has no influence<br />

on the relationship between static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong>.<br />

H 1 : The coverage <strong>of</strong> the test cases used to evaluate a program has <strong>an</strong> influence<br />

on the relationship between static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong>.<br />

4.2 Experimental Design<br />

In order to conduct the practical experiments underlying this <strong>study</strong>, it was necessary<br />

to select a suite <strong>of</strong> Java programs <strong>an</strong>d measure:<br />

• the static CBO metric<br />

• the instruction coverage percentages: I C<br />

• the run-time coupling metrics: IC CC, EC CC, IC CM, EC CM, IC CD, EC CD

The static <strong>metrics</strong> data collection tool StatMet, described in Section 3.2.3, was<br />

used to calculate CBO, while the InCov tool, outlined in Section 3.2.4, was used to<br />

determine the instruction coverage. The <strong>run</strong>-<strong>time</strong> <strong>metrics</strong> were evaluated using the<br />

ClMet tool, which is described in Section 3.2.1.

The set of programs used in this study consists of programs from the JOlden and SPECjvm98 benchmark suites, as well as the real-world programs Velocity, Xalan and Ant. The

SPECjvm98 suite was chosen as it is directly comparable to other studies that use<br />

Java s<strong>of</strong>tware. The program mtrt was excluded from the investigation as it is multithreaded<br />

<strong>an</strong>d therefore is not suitable for this type <strong>of</strong> <strong>an</strong>alysis. The more synthetic<br />

JOlden programs were included to ensure that the study considers programs that create significantly large populations of objects. Three of the programs from the JOlden suite, BiSort, TreeAdd and TSP, were omitted from the analysis as they contained only two classes, so their results could not be analysed further. A selection of real-world programs was included to ensure that the results scale to all types of programs.

4.3 Results<br />

4.3.1 Experiment 1: To investigate the relationship between<br />

static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong><br />

For each program the distribution (mean) and variance (standard deviation) of each measure across its classes are calculated. These statistics are used to select metrics that exhibit enough variance to merit further analysis, as a low-variance metric would not differentiate classes very well and therefore would not be a useful predictor of external quality. Descriptive statistics also aid in explaining the results of the subsequent analysis.

The descriptive statistic results for each program are summarised in Table 4.1.<br />

The metric values exhibit large vari<strong>an</strong>ces which makes them suitable c<strong>an</strong>didates for<br />

further <strong>an</strong>alysis.<br />

Principal Component Analysis<br />

Principal Component Analysis (PCA) is used to investigate whether the <strong>run</strong><strong>time</strong><br />

<strong>coupling</strong> <strong>metrics</strong> are not simply surrogate measures for static CBO.<br />

A similar <strong>study</strong> was carried out by Arisholm et al. using only the Velocity<br />

program [5]. The work in this chapter extends their work to include fourteen benchmark<br />

programs as well as three real-world programs in order to demonstrate the<br />

robustness <strong>of</strong> these results over a larger r<strong>an</strong>ge <strong>an</strong>d variety <strong>of</strong> programs.<br />

Appendix A.1 shows the results <strong>of</strong> the principal component <strong>an</strong>alysis used to investigate<br />

the covariate structure <strong>of</strong> the static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>metrics</strong>. Using the Kaiser<br />

criterion to select the number <strong>of</strong> factors to retain shows that the <strong>metrics</strong> mostly capture<br />

three orthogonal dimensions in the sample space formed by all measures. In<br />

other words, the <strong>coupling</strong> is divided along three dimensions for each <strong>of</strong> the programs



SPECjvm98 Benchmark Suite
Program        CBO           IC CC         IC CM          IC CD           EC CC         EC CM         EC CD
201 compress   6.24 (6.2)    1.72 (2.11)   4.34 (3.54)    7.56 (5.46)     1.80 (1.16)   4.35 (4.76)   6.56 (4.56)
202 jess       6.99 (4.78)   2.97 (7.21)   4.34 (3.43)    5.45 (4.54)     2.97 (9.01)   4.34 (4.35)   7.56 (6.56)
205 raytrace   7.25 (7.51)   2.14 (4.25)   4.45 (3.54)    7.56 (6.56)     2.06 (1.89)   4.54 (4.53)   6.56 (4.56)
209 db         9.12 (6.60)   1.81 (1.98)   6.56 (4.46)    9.67 (8.68)     1.88 (1.54)   6.45 (5.67)   9.57 (7.65)
213 javac      8.54 (7.15)   3.21 (3.01)   5.45 (4.56)    7.56 (7.56)     3.01 (2.87)   3.45 (4.56)   5.45 (5.65)
222 mpegaudio  5.75 (4.90)   2.60 (2.36)   4.54 (3.56)    7.56 (6.56)     2.60 (2.70)   5.45 (4.56)   5.87 (5.46)
228 jack       6.05 (7.51)   2.68 (5.37)   3.45 (3.43)    5.45 (4.45)     2.68 (2.39)   5.45 (4.56)   7.56 (6.56)

JOlden Benchmark Suite
Program        CBO           IC CC         IC CM          IC CD           EC CC         EC CM         EC CD
BH             5.22 (3.40)   2.62 (2.50)   7.44 (8.86)    8.67 (10.84)    2.33 (1.33)   5.77 (4.44)   6.25 (4.74)
Em3d           4.20 (2.86)   3.22 (0.71)   3.87 (1.01)    4.76 (3.96)     3.75 (1.33)   3.35 (3.49)   4.65 (3.46)
Health         3.43 (3.46)   2.43 (2.46)   3.35 (4.24)    4.25 (5.46)     3.35 (3.46)   3.55 (2.43)   4.46 (4.43)
MST            4.34 (3.45)   3.54 (2.45)   4.23 (3.45)    7.54 (4.54)     3.45 (3.34)   3.45 (2.45)   4.56 (4.32)
Perimeter      5.34 (4.34)   3.34 (3.45)   4.34 (2.45)    8.56 (6.45)     3.54 (3.45)   4.54 (3.43)   6.54 (3.54)
Power          4.50 (2.54)   1.32 (0.45)   5.23 (2.23)    5.64 (2.56)     1.54 (1.45)   4.12 (4.56)   4.67 (5.35)
Voronoi        5.43 (3.46)   2.43 (1.45)   4.54 (0.45)    7.45 (3.46)     3.45 (3.46)   4.45 (2.45)   5.36 (2.46)

Real-World Programs
Program        CBO           IC CC         IC CM          IC CD           EC CC         EC CM         EC CD
Velocity       7.59 (7.57)   4.27 (7.11)   8.45 (10.87)   20.45 (32.14)   3.85 (4.30)   7.54 (9.45)   25.45 (28.45)
Xalan          8.98 (9.92)   4.03 (4.61)   8.54 (8.99)    35.45 (38.14)   2.85 (3.60)   6.54 (7.56)   42.15 (45.12)
Ant            8.49 (7.74)   3.92 (7.91)   7.46 (8.78)    16.75 (17.25)   2.43 (3.51)   7.04 (7.54)   21.23 (20.56)

Table 4.1: Descriptive statistic results for all programs (each cell gives the mean, with the standard deviation in parentheses)



<strong>an</strong>alysed.<br />

Analysing the definitions <strong>of</strong> the measures that exhibit high loadings in PC1, PC2<br />

<strong>an</strong>d PC3 yields the following interpretation <strong>of</strong> the <strong>coupling</strong> dimensions:<br />

• PC1 = {IC CC, IC CD, IC CM}: the run-time import coupling metrics, as illustrated by Figure 4.1(a).

• PC2 = {EC CC, EC CD, EC CM}: the run-time export coupling metrics, as illustrated by Figure 4.1(b).

• PC3 = {CBO}: the static coupling metric, as illustrated by Figure 4.1(c).
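A sketch of this kind of analysis, using scikit-learn on a randomly generated metric matrix, is given below; the actual loadings and retained components are those reported in Appendix A.1, not the values this toy example would produce.

    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Rows are classes, columns are metric values (CBO plus the six run-time
    # coupling metrics); random data is used here purely for illustration.
    metric_matrix = np.random.default_rng(1).normal(size=(40, 7))

    # Standardise the metrics so the components reflect the correlation
    # structure rather than the raw metric scales.
    scaled = StandardScaler().fit_transform(metric_matrix)
    pca = PCA().fit(scaled)

    # Kaiser criterion: retain the components whose eigenvalue exceeds 1.
    eigenvalues = pca.explained_variance_
    retained = int(np.sum(eigenvalues > 1.0))

    # Loadings of each metric on the retained components; the metrics with high
    # loadings determine the interpretation of each coupling dimension.
    loadings = pca.components_[:retained].T * np.sqrt(eigenvalues[:retained])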

Figure 4.1 summarises these results graphically. Overall the PCA results demonstrate<br />

that the <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong> are not redund<strong>an</strong>t with the static CBO<br />

metric <strong>an</strong>d that they capture additional dimensions <strong>of</strong> <strong>coupling</strong>. This leads us to<br />

reject our null hypothesis H 0 and to conclude that run-time measures for coupling are not simply surrogate measures for the static CBO metric, suggesting that additional information, over and above that obtainable from the static CBO metric, can be extracted using run-time metrics. This confirms that the findings of Arisholm et al. for the single Velocity program are applicable across a variety of programs.

The results also indicate that the direction <strong>of</strong> <strong>coupling</strong> is a greater determining<br />

factor than the type of coupling, with PC1 containing the three import-based metrics and PC2 containing the three export-based metrics.

4.3.2 Experiment 2: The influence <strong>of</strong> instruction coverage<br />

Multiple Regression Analysis<br />

Multiple regression <strong>an</strong>alysis is used to test the hypothesis that instruction coverage<br />

<strong>of</strong> test cases used to evaluate a program has no influence on the relationship between<br />

static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>metrics</strong>. The two independent variables are thus the static CBO<br />

metric <strong>an</strong>d the instruction coverage measure I c ; each <strong>of</strong> the six <strong>run</strong>-<strong>time</strong> <strong>coupling</strong><br />

<strong>metrics</strong> in turn is then used as the dependent variable. A full list <strong>of</strong> these results<br />

c<strong>an</strong> be found in Appendix A.2.



(a) Results from PCA for IC CC, IC CM <strong>an</strong>d IC CD<br />

(b) Results from PCA for EC CC, EC CM <strong>an</strong>d EC CD<br />

(c) Results from PCA for CBO<br />

Figure 4.1: PCA test results for all programs for metrics in PC1, PC2 and PC3. In all graphs the bars represent the PCA value obtained for the corresponding metric. PC1 contains the import-level run-time metrics, PC2 contains the export-level run-time metrics and PC3 contains the static CBO metric.



First, all R values turned out to be positive for each <strong>of</strong> the programs used in<br />

this <strong>study</strong>. This me<strong>an</strong>s that there is a positive correlation between the dependent<br />

(<strong>run</strong>-<strong>time</strong> metric) <strong>an</strong>d independent variables CBO <strong>an</strong>d I c . Therefore as the values<br />

for CBO <strong>an</strong>d I c increase or decrease so will the observed value for the <strong>run</strong>-<strong>time</strong><br />

metric under consideration.<br />

Figures 4.2(a) <strong>an</strong>d 4.2(b) give a pictorial view <strong>of</strong> the results from the multiple<br />

regression <strong>an</strong>alysis for all programs for class-level <strong>run</strong>-<strong>time</strong> <strong>coupling</strong>, <strong>an</strong>d Figures<br />

4.3(a) and 4.3(b) for method-level run-time coupling. The lighter bars represent the influence of CBO, while the darker bars represent the influence of both CBO and I c combined. The difference between the two therefore gives the additional amount of the variation in the run-time metric that can be attributed to the influence of instruction coverage.
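In other words, the quantity of interest is the difference between the R² of a model using CBO alone and that of a model using CBO and I c together. A hedged sketch of that calculation is given below; the per-class arrays are invented placeholders, not the measured data behind Figures 4.2 and 4.3.

    import numpy as np

    def r_squared(predictors, y):
        # R^2 of an ordinary least-squares fit of y on the given predictors.
        design = np.column_stack([np.ones(len(y)), predictors])
        coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coefficients
        return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

    # One value per class; these arrays are illustrative placeholders only.
    rng = np.random.default_rng(2)
    cbo = rng.poisson(6, size=50).astype(float)
    coverage = rng.uniform(0.3, 1.0, size=50)
    runtime_metric = 0.4 * cbo + 3.0 * coverage + rng.normal(scale=0.5, size=50)

    r2_cbo_only = r_squared(cbo[:, None], runtime_metric)
    r2_cbo_and_ic = r_squared(np.column_stack([cbo, coverage]), runtime_metric)
    extra_variation = r2_cbo_and_ic - r2_cbo_only   # share attributable to I_c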

Distinct Classes: IC CC <strong>an</strong>d EC CC<br />

It is immediately apparent from Figures 4.2(a) <strong>an</strong>d 4.2(b) that the instruction coverage<br />

is a significant influencing factor. For example, from Figure 4.2(a) it can be seen that in ten of the programs I c accounts for an additional 20% of the variation. The two programs in Figure 4.2(a) that show little increase, MST and Voronoi, already exhibit a high correlation with CBO alone that would have been difficult

to improve on. While the increase is not uniform throughout the programs in Figure<br />

4.2(a), the overall data demonstrates that instruction coverage is <strong>an</strong> import<strong>an</strong>t<br />

contributory factor.<br />

Figure 4.2(b), representing the contribution <strong>of</strong> CBO <strong>an</strong>d I c to export <strong>coupling</strong><br />

measured at the class level, presents a sharper contrast. Here, the influence <strong>of</strong> I c<br />

is clearly a vital contributing factor, accounting for at least <strong>an</strong> extra 20% <strong>of</strong> the<br />

variation in eleven <strong>of</strong> the seventeen programs. The import<strong>an</strong>t factor here is that<br />

the overall contribution <strong>of</strong> CBO to export <strong>coupling</strong> is much lower th<strong>an</strong> to import<br />

<strong>coupling</strong>, as c<strong>an</strong> be seen from contrasting the lighter-shaded bars in Figure 4.2(a)<br />

with those in Figure 4.2(b). Thus classes with a high level <strong>of</strong> static <strong>coupling</strong> exhibit<br />

a higher level <strong>of</strong> import <strong>coupling</strong> at <strong>run</strong>-<strong>time</strong>. This indicates that the <strong>coupling</strong> being<br />

exercised at <strong>run</strong>-<strong>time</strong> is from classes behaving as clients, making use <strong>of</strong> other class



(a) Results from the multiple linear regression where Y = IC CC.<br />

(b) Results from the multiple linear regression where Y = EC CC.<br />

Figure 4.2: Multiple linear regression results for class-level metrics (IC CC and EC CC). In both graphs the lighter bars represent the R² value for CBO alone, and the darker bars represent the R² value for CBO and I c combined.



(a) Results from the multiple linear regression where Y = IC CM<br />

(b) Results from the multiple linear regression where Y = EC CM<br />

Figure 4.3: Multiple linear regression results for method-level metrics (IC CM and EC CM). In both graphs the lighter bars represent the R² value for CBO alone, and the darker bars represent the R² value for CBO and I c combined.



methods, rather th<strong>an</strong> those behaving as servers, <strong>of</strong>fering their methods for use by<br />

others. The greater influence <strong>of</strong> I c in export <strong>coupling</strong> results from there being less<br />

<strong>of</strong> a drop in its influence between IC CC <strong>an</strong>d EC CC, suggesting that instruction<br />

coverage, as a predictor <strong>of</strong> <strong>coupling</strong>, is not as sensitive to the direction <strong>of</strong> that<br />

<strong>coupling</strong>.<br />

Distinct Methods: IC CM <strong>an</strong>d EC CM<br />

The results for the IC CM <strong>an</strong>d EC CM, illustrated by Figures 4.3(a) <strong>an</strong>d 4.3(b),<br />

present a similar picture. Both <strong>of</strong> these <strong>run</strong>-<strong>time</strong> <strong>metrics</strong> are scaled by the number<br />

<strong>of</strong> methods involved in the <strong>coupling</strong> relationship. Given that CBO is defined on a<br />

class level, it does surprisingly well in influencing the IC CM metric. Instruction<br />

coverage is also defined at a class level, but nonetheless accounts for roughly <strong>an</strong><br />

extra 20% <strong>of</strong> the vari<strong>an</strong>ce for five programs, <strong>an</strong>d roughly <strong>an</strong> extra 10% for five other<br />

programs. The drop between import <strong>an</strong>d export <strong>coupling</strong> is accentuated here, but<br />

while Figure 4.3(b) shows CBO proving a bad predictor for EC CM, instruction<br />

coverage dramatically improves this for over half the programs studied.<br />

Overall, these results show that coverage has a signific<strong>an</strong>t impact on the correlation<br />

between static CBO <strong>an</strong>d the four <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong> defined for distinct<br />

classes <strong>an</strong>d distinct methods.<br />

Run-<strong>time</strong> Messages: IC CD <strong>an</strong>d EC CD<br />

The <strong>run</strong>-<strong>time</strong> <strong>metrics</strong> IC CD <strong>an</strong>d EC CD did not exhibit a signific<strong>an</strong>t relationship<br />

for <strong>an</strong>y <strong>of</strong> the programs under consideration <strong>an</strong>d thus are not depicted graphically<br />

here. As these <strong>metrics</strong> are defined in terms <strong>of</strong> a count <strong>of</strong> the number <strong>of</strong> distinct <strong>time</strong>s<br />

a method was executed, this result was not surprising. It is reasonable to postulate<br />

that such <strong>metrics</strong> might be more influenced by the “hotness” <strong>of</strong> a particular method,<br />

<strong>an</strong>d the distribution <strong>of</strong> execution focus through the program, rather th<strong>an</strong> instruction<br />

coverage data. This was the result we expected for the measures based on the number<br />

<strong>of</strong> dynamic method calls.



4.4 Conclusion<br />

From our experimental data, using principal component <strong>an</strong>alysis, we showed that<br />

<strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong> captured different properties th<strong>an</strong> static CBO <strong>an</strong>d therefore<br />

are not simply surrogate measures for CBO. This indicated that useful information<br />

beyond that which is provided by CBO may be obtained through the use <strong>of</strong><br />

these <strong>run</strong>-<strong>time</strong> measures.<br />

Second, we found that the coverage <strong>of</strong> test cases used to evaluate a program had<br />

a signific<strong>an</strong>t impact on the correlation between CBO <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> <strong>metrics</strong><br />

<strong>an</strong>d thus should be a measured, recorded factor in <strong>an</strong>y comparison made. We found<br />

that instruction coverage and CBO together were a better predictor of the run-time metrics based on distinct class counts (IC CC, EC CC) and distinct method counts (IC CM, EC CM) than CBO alone. Appendix A.2 also gives the results of Fisher's F test, which show that all results were statistically significant at the 5% level of significance.


Chapter 5<br />

Case Study 2: The Impact <strong>of</strong><br />

Run-<strong>time</strong> Cohesion on Object<br />

Behaviour<br />

In this <strong>study</strong> we present <strong>an</strong> investigation into the <strong>run</strong>-<strong>time</strong> behaviour <strong>of</strong> objects<br />

in Java programs <strong>an</strong>d whether <strong>cohesion</strong> <strong>metrics</strong> are a good predictor <strong>of</strong> object behaviour.<br />

Based on the definition <strong>of</strong> static CBO it would be expected that objects<br />

derived from the same class would exhibit similar <strong>coupling</strong> behaviour, that is, that<br />

they would be coupled to the same classes <strong>an</strong>d make the same accesses. It is unknown<br />

whether static CBO provides a true measure <strong>of</strong> <strong>coupling</strong> between objects, or<br />

whether it is restricted to being a measure <strong>of</strong> the level <strong>of</strong> <strong>coupling</strong> between classes.<br />

To this end, a measure, the Number <strong>of</strong> Object-Class Clusters (N OC ), is proposed<br />

in <strong>an</strong> attempt to <strong>an</strong>alyse <strong>run</strong>-<strong>time</strong> object behaviour. This measure is derived from<br />

a statistical <strong>an</strong>alysis <strong>of</strong> <strong>run</strong>-<strong>time</strong> object-level <strong>coupling</strong> <strong>metrics</strong>. Cluster <strong>an</strong>alysis is<br />

used to group objects together based on the similarity <strong>of</strong> the accesses they make to<br />

other classes. Therefore one would expect objects from the same class to occupy the<br />

same cluster. If more th<strong>an</strong> one cluster is found for a class then it is reasonable to<br />

postulate that the class has objects that are behaving differently at <strong>run</strong>-<strong>time</strong> from<br />

the point of view of coupling. A selection of programs is analysed to determine whether this is the case.

The second part of this study involves determining the predictive ability of cohesion metrics (both static and run-time) to forecast object behaviour, in other words,

how well they indicate the N OC for a class. First, the differences in the underlying<br />

dimensions <strong>of</strong> <strong>cohesion</strong> captured by the static versus the <strong>run</strong>-<strong>time</strong> measures<br />

are assessed using principal component <strong>an</strong>alysis. Subsequently, multiple regression<br />

<strong>an</strong>alysis is used to <strong>study</strong> the predictive ability <strong>of</strong> <strong>cohesion</strong> <strong>metrics</strong> to extrapolate<br />

N OC for a class. We also wish to determine if a <strong>run</strong>-<strong>time</strong> definition <strong>of</strong> <strong>cohesion</strong> is a<br />

better predictor <strong>of</strong> N OC th<strong>an</strong> the static S LCOM version alone.<br />

5.1 Goals <strong>an</strong>d Hypotheses<br />

The GQM/MEDEA framework was used to set up the experiments for this <strong>study</strong>.<br />

Experiment 1:<br />

Goal: To determine if objects from the same class behave differently at <strong>run</strong><strong>time</strong><br />

from the point <strong>of</strong> view <strong>of</strong> <strong>coupling</strong>.<br />

Perspective: We investigate the behaviour <strong>of</strong> objects at <strong>run</strong>-<strong>time</strong> with respect<br />

to <strong>coupling</strong> using a number <strong>of</strong> <strong>metrics</strong> which measure the level <strong>of</strong> <strong>coupling</strong> at different<br />

layers <strong>of</strong> gr<strong>an</strong>ularity. We use a number <strong>of</strong> statistical techniques capable <strong>of</strong><br />

separating objects from a class into groups based on their similarity.<br />

Environment: Since we are <strong>study</strong>ing object behaviour, a set <strong>of</strong> Java programs<br />

which create a large number <strong>of</strong> objects at <strong>run</strong>-<strong>time</strong> are used. These are supplemented<br />

with a number <strong>of</strong> real-world programs to ensure the results are scalable to<br />

genuine programs.<br />

Hypothesis:<br />

H 0 : Objects from a class behave similarly at <strong>run</strong>-<strong>time</strong> from the point <strong>of</strong> view <strong>of</strong><br />

<strong>coupling</strong><br />

H 1 : Objects from a class behave differently at <strong>run</strong>-<strong>time</strong> from the point <strong>of</strong> view<br />

<strong>of</strong> <strong>coupling</strong>



Experiment 2:<br />

Goal: To determine if a <strong>run</strong>-<strong>time</strong> definition for <strong>cohesion</strong> gives <strong>an</strong>y additional<br />

information about class behaviour over <strong>an</strong>d above the st<strong>an</strong>dard static definition.<br />

Perspective: Within a highly cohesive class the components of the class are functionally related, whereas in a class that exhibits low cohesion they are not. Intuitively, one would expect that the more cohesive the class, the lower its N OC .

We use a number <strong>of</strong> statistical techniques, including PCA <strong>an</strong>d regression <strong>an</strong>alysis to<br />

determine if there is a signific<strong>an</strong>t correlation between static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong><br />

<strong>an</strong>d N OC . We also wish to determine if <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> is a better predictor <strong>of</strong><br />

N OC th<strong>an</strong> the static version alone.<br />

Environment: Since we are <strong>study</strong>ing object behaviour a set <strong>of</strong> Java programs<br />

which create a large number <strong>of</strong> objects at <strong>run</strong>-<strong>time</strong> are used. These are supplemented<br />

with a number <strong>of</strong> real-world programs to ensure the results are scalable to<br />

genuine programs.<br />

Hypothesis:<br />

H 0 : Run-time cohesion metrics do not provide additional information about class behaviour over and above that provided by static S LCOM .

H 1 : Run-time cohesion metrics provide additional information about class behaviour over and above that provided by static S LCOM .

5.2 Experimental Design<br />

For this <strong>study</strong> it was necessary to calculate:<br />

• the <strong>run</strong>-<strong>time</strong> object-level <strong>coupling</strong> metric: IC OC<br />

• the Number <strong>of</strong> Object-Class Clusters: N OC<br />

• the static S LCOM



              GreyNode   QuadTreeNode   WhiteNode
BlackNode 1      0            2             0
BlackNode 2      0            2             0
BlackNode 3      0            2             0
BlackNode 4      0            2             0

Table 5.1: Matrix of unique accesses per object, for objects BlackNode 1 , . . . , BlackNode 4 to classes GreyNode, QuadTreeNode and WhiteNode

• the <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong>: R LCOM , RW LCOM<br />

IC OC was calculated using the object-level run-time metric analysis tool ObMet, which is described in Section 3.2.2. In order to test the first hypothesis the coefficient of variation, C V , was calculated for the IC OC results to determine how the IC OC values varied across the objects of a class. If the C V for every class under consideration were zero, this would lead us to accept the null hypothesis, H 0 , as all objects of each class would be accessing the same variables. However, variation in the IC OC values, C V > 0, would lead us to reject H 0 and accept H 1 , as the objects would be behaving differently at run-time from the point of view of coupling.

To determine the N OC for a class, one class is fixed <strong>an</strong>d the distribution <strong>of</strong><br />

unique accesses per object is determined. A matrix <strong>of</strong> such values for each class<br />

in the program under consideration is constructed. Table 5.1 gives <strong>an</strong> example <strong>of</strong><br />

such a matrix, where we record the <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> values for individual objects<br />

<strong>of</strong> class BlackNode, BlackNode 1 , . . . , BlackNode 4 , against the classes GreyNode,<br />

QuadTreeNode <strong>an</strong>d WhiteNode. This data is statistically <strong>an</strong>alysed using cluster<br />

<strong>an</strong>alysis to evaluate the behaviour <strong>of</strong> the objects. This technique groups objects<br />

together based on their similarity. The number <strong>of</strong> clusters are determined <strong>an</strong>d this<br />

becomes the N OC for that class. In order to accept H 0 we would expect objects from<br />

the same class to group together <strong>an</strong>d occupy the same cluster, therefore expecting<br />

values <strong>of</strong> N OC to be 1. The formation <strong>of</strong> a number <strong>of</strong> different clusters, where N OC<br />

> 1, would lead us to reject H 0 <strong>an</strong>d accept H 1 .
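Both steps of this procedure, the per-class coefficient of variation of IC OC and the cluster count N OC , can be sketched as follows. The access matrix mirrors the style of Table 5.1, but its values, and the cutting level of 1.5, are invented for illustration.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def coefficient_of_variation(values):
        # C_V of a set of per-object coupling values, as a percentage of the
        # mean; zero means every object of the class behaved identically.
        values = np.asarray(values, dtype=float)
        return 100.0 * values.std(ddof=1) / values.mean()

    # Rows are the objects of a single class, columns are the classes they
    # access at run-time (a matrix in the style of Table 5.1).
    accesses = np.array([
        [0, 2, 0],   # two objects with identical access profiles ...
        [0, 2, 0],
        [3, 0, 1],   # ... and two objects accessing a different set of classes
        [3, 0, 2],
    ], dtype=float)

    # Per-object totals from the access matrix, used here only to illustrate C_V.
    cv = coefficient_of_variation(accesses.sum(axis=1))

    # Agglomerative clustering of the objects; the number of clusters found at
    # the cutting line becomes N_OC for this class.
    Z = linkage(accesses, method='average')
    labels = fcluster(Z, t=1.5, criterion='distance')
    n_oc = len(set(labels))   # n_oc > 1 indicates differing run-time behaviour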



JOlden Benchmark Suite
Program      IC OC          N OC          S LCOM          R LCOM           RW LCOM
BH           1.83 (2.74)    2 (0.52)      0.317 (0.30)    0.144 (0.287)    0.248 (0.226)
Em3d         1 (0.5)        6 (−)         0.317 (0.223)   0.190 (0.381)    0.472 (0.572)
Health       2.5 (1.84)     2.5 (1.29)    0.318 (0.223)   0.171 (0.189)    0.335 (0.356)
MST          2 (1.54)       2.5 (2.12)    0.163 (0.283)   0.111 (0.172)    0.252 (0.154)
Perimeter    2.25 (2.6)     2.5 (1.73)    0.136 (0.275)   0.104 (0.285)    0.132 (0.254)
Power        1.66 (1.88)    2 (1.73)      0.151 (0.199)   0.083 (0.204)    0.155 (0.134)
Voronoi      2 (2.12)       4.5 (0.71)    0.373 (0.238)   0.265 (0.363)    0.448 (0.438)

Real-World Programs
Program      IC OC          N OC          S LCOM          R LCOM           RW LCOM
Velocity     6.14 (7.21)    5.1 (2.45)    0.314 (0.385)   0.154 (0.254)    0.398 (0.454)
Xalan        7.45 (8.21)    6.7 (3.45)    0.251 (0.305)   0.198 (0.241)    0.354 (0.484)
Ant          8.11 (8.65)    7.2 (2.56)    0.333 (0.31)    0.247 (0.208)    0.387 (0.355)

Table 5.2: Descriptive statistic results for all programs (each cell gives the mean, with the standard deviation in parentheses)

The static <strong>metrics</strong> data collection tool StatMet, described in Section 3.2.3, was<br />

used to calculate S LCOM . The ClMet tool, described in Section 3.2.1, was used to

calculate the <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong>.<br />

The <strong>an</strong>alysis was conducted on the programs from the JOlden benchmark suite<br />

as well as the real-world programs Velocity, Xalan and Ant. Three of the programs, BiSort, TSP and TreeAdd, contain too few classes to perform PCA and regression analysis and are therefore excluded from further analysis. The SPECjvm98 benchmark

programs that were used in the previous <strong>study</strong> were excluded from this <strong>an</strong>alysis as<br />

they did not exhibit signific<strong>an</strong>t volumes <strong>of</strong> object creation.<br />

5.3 Results<br />

Table 5.2 summarises the descriptive statistic results for each program. The measures<br />

all exhibited large vari<strong>an</strong>ces which makes them suitable c<strong>an</strong>didates for further<br />

<strong>an</strong>alysis.



5.3.1 Experiment 1: To determine if objects from the same<br />

class behave differently at <strong>run</strong>-<strong>time</strong> from the point <strong>of</strong><br />

view <strong>of</strong> <strong>coupling</strong><br />

IC OC Results<br />

The IC OC metric is used to investigate whether objects <strong>of</strong> the same class type are<br />

coupled to the same classes at <strong>run</strong>-<strong>time</strong>. The first thing to look at is the C V results<br />

for the IC OC metric, as depicted by Figure 5.1. If all objects from the same class<br />

are behaving in a similar fashion we would expect them to make accesses to the<br />

same classes at <strong>run</strong>-<strong>time</strong>. Consequently, there should be little or no variability in<br />

the IC OC values for objects from the same class, for example, two classes from<br />

BH had C V <strong>of</strong> 0. However, for the classes from the set <strong>of</strong> programs studied, the<br />

C V varied from 0% to 54.2%. In the cases where the C V > 0, we have classes with<br />

objects that are coupled to different classes at <strong>run</strong>-<strong>time</strong>. A class might create one<br />

group <strong>of</strong> objects that access one set <strong>of</strong> classes <strong>an</strong>d <strong>an</strong>other that access a different set.<br />

So we have a number of objects from the same class that are behaving differently at run-time at the class-class level. On the basis of these results we can reject H 0 and accept H 1 at the class-class level. One cannot observe such behaviour simply by calculating

the static CBO value for that class.<br />

N OC Results<br />

Figure 5.2 illustrates the N OC values for the programs under consideration. The N OC<br />

values r<strong>an</strong>ge from one to seven <strong>an</strong>d the bars represent the number <strong>of</strong> classes from<br />

each program that exhibit that value. Since cluster <strong>an</strong>alysis groups objects together<br />

based on the similarity <strong>of</strong> the accesses they make to other classes one would expect<br />

objects from the same class to occupy the same cluster (N OC = 1). This was the<br />

case for a large proportion <strong>of</strong> the classes under consideration, for example 50% <strong>of</strong> the<br />

classes from the program BH from the JOlden suite exhibited <strong>an</strong> N OC <strong>of</strong> 1. Similar<br />

results were obtained with the real-world programs with N OC = 1 for 51% <strong>of</strong> classes<br />

from Velocity, 49% from Xal<strong>an</strong> <strong>an</strong>d 48% from Ant. However, there were inst<strong>an</strong>ces<br />

where more th<strong>an</strong> one cluster was found for a class, for example 50% <strong>of</strong> the classes



Figure 5.1: C V of IC OC for classes from the programs studied. The bars represent the number of classes in each program that have C V in the corresponding range.



Figure 5.2: N OC results <strong>of</strong> cluster <strong>an</strong>alysis. The bars represent the number <strong>of</strong> classes<br />

in each program that have the corresponding N OC value.<br />

from Perimeter from the JOlden suite had N OC = 4. When more th<strong>an</strong> one cluster<br />

is found we have the situation where a single class is creating groups of objects

that are exhibiting different behaviours at <strong>run</strong>-<strong>time</strong>. This leads us to reject H 0 <strong>an</strong>d<br />

accept H 1 to state that objects from a class c<strong>an</strong> behave differently at <strong>run</strong>-<strong>time</strong> from<br />

the point <strong>of</strong> view <strong>of</strong> <strong>coupling</strong>.<br />

Looking at Figures 5.1 <strong>an</strong>d 5.2 there seems to be a relationship between the C V<br />

<strong>an</strong>d the number <strong>of</strong> clusters with both graphs being markedly similar. In m<strong>an</strong>y cases<br />

a high C V leads to more than one cluster. Intuitively this makes sense, as it is easy to see how variation in the classes used by an object would lead to variation in the variables it accesses, and consequently to a number of groups of objects behaving differently.

From these findings, it is suggested that the static CBO metric would be better<br />

defined as <strong>coupling</strong> between classes as it does not necessarily give a true measure <strong>of</strong><br />

<strong>run</strong>-<strong>time</strong> <strong>coupling</strong> between objects.



5.3.2 Experiment 2: The influence <strong>of</strong> <strong>cohesion</strong> on the N OC<br />

The following statistical <strong>an</strong>alysis is applied to determine first, if <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong><br />

<strong>metrics</strong> are redund<strong>an</strong>t with respect to S LCOM <strong>an</strong>d second, if <strong>cohesion</strong> <strong>metrics</strong> are<br />

good predictors <strong>of</strong> N OC .<br />

Principal Component Analysis<br />

Initially, we investigate the relationship between static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong>.<br />

We use PCA to determine if the static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong> are<br />

likely to be measuring the same class property, in other words it is used to examine<br />

whether the <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong> are not simply surrogate measures for static<br />

S LCOM .<br />

Appendix B.1 shows the results <strong>of</strong> the principal component <strong>an</strong>alysis when all<br />

<strong>of</strong> the <strong>cohesion</strong> <strong>metrics</strong> are taken into consideration. Using the Kaiser criterion to<br />

select the number <strong>of</strong> factors to retain it is found that the <strong>metrics</strong> mostly capture<br />

two orthogonal dimensions in the sample space formed by all measures. In other<br />

words, <strong>cohesion</strong> is divided along two dimensions for each <strong>of</strong> the programs <strong>an</strong>alysed.<br />

Analysing the definitions <strong>of</strong> the measures that exhibit high loadings in PC1 <strong>an</strong>d<br />

PC2 yields the following interpretation <strong>of</strong> the <strong>cohesion</strong> dimensions:<br />

• P C1 = {S LCOM }, the static <strong>cohesion</strong> metric.<br />

• P C2 = {R LCOM , RW LCOM }, the <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong>.<br />

Figure 5.3 summarises these results graphically. The PCA findings from this study indicate that no significant information about the cohesiveness of a class can be gained by evaluating RW LCOM instead of the simpler R LCOM , as both metrics belonged to the same principal component. This means that RW LCOM captures little variance that is not already accounted for by R LCOM .

However, the PCA results indicate that R LCOM is not redund<strong>an</strong>t with respect to<br />

S LCOM <strong>an</strong>d that it captures additional information about <strong>cohesion</strong>. The values show<br />

that R LCOM is not simply <strong>an</strong> alternative static measure. Clearly, the simple static<br />

calculation <strong>of</strong> S LCOM masks a considerable amount <strong>of</strong> detail available at <strong>run</strong>-<strong>time</strong>.



Figure 5.3: PCA test results for all programs for metrics in PC1 and PC2. In both graphs the bars represent the PCA value obtained for the corresponding metric. PC1 contains R LCOM and RW LCOM . PC2 contains S LCOM .



Multiple Regression Analysis<br />

Next we wish to discover if <strong>cohesion</strong> <strong>metrics</strong> are good predictors <strong>of</strong> object behaviour,<br />

that is, whether they can be used to deduce the N OC for a class. Multiple regression analysis

is used for this purpose. In this case the dependent variable is the N OC , while the<br />

independent variables are the static S LCOM <strong>an</strong>d the <strong>run</strong>-<strong>time</strong> R LCOM <strong>an</strong>d RW LCOM<br />

<strong>cohesion</strong> <strong>metrics</strong>. Appendix B.2 gives the results from this <strong>an</strong>alysis.<br />

First, the results show that there is a positive correlation between the N OC<br />

(dependent variable) <strong>an</strong>d the static <strong>an</strong>d <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> measures (independent<br />

variables), as all R values were positive. This me<strong>an</strong>s that as the value for S LCOM ,<br />

R LCOM <strong>an</strong>d RW LCOM increases/decreases so will the observed value for N OC . Intuitively<br />

this makes sense, as one would expect that the more cohesive the class, that is, the lower its LCOM value, the more the class is geared toward performing a single function. Therefore one would also expect its number of clusters to be low.

Figure 5.4 summarises the results <strong>of</strong> the regression <strong>an</strong>alysis for each <strong>of</strong> the programs<br />

<strong>an</strong>alysed. The lighter bars represent the influence <strong>of</strong> S LCOM , while the darker<br />

bars depict the influence <strong>of</strong> both S LCOM <strong>an</strong>d R LCOM . The difference between the two<br />

indicates the additional amount <strong>of</strong> variation that c<strong>an</strong> be allocated to the <strong>run</strong>-<strong>time</strong><br />

<strong>cohesion</strong> metric.<br />

It is apparent from this graph that the R LCOM is a signific<strong>an</strong>t factor influencing<br />

N OC , for example for the three real-world programs R LCOM accounts for approximately<br />

<strong>an</strong> additional 30% variation, while five <strong>of</strong> the benchmarks exhibit a similar<br />

result. For eight out <strong>of</strong> the ten programs studied R LCOM was a better predictor <strong>of</strong><br />

N OC th<strong>an</strong> S LCOM .<br />

Overall, these results show that <strong>cohesion</strong> <strong>metrics</strong> are a good predictor <strong>of</strong> N OC ,<br />

with <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> being the superior metric. This leads us to reject our null<br />

hypothesis <strong>an</strong>d state that <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong> provide additional information<br />

about class behaviour over <strong>an</strong>d above that provided by static S LCOM .<br />

Only one program exhibited a signific<strong>an</strong>t result when using the RW LCOM measure,<br />

therefore the results have not been summarised graphically. This could be due<br />

to the fact that the metric is defined on a call-weighted basis, which may skew the<br />

results.



Figure 5.4: Results from multiple linear regression where Y = N OC . The lighter bars represent the R² value for S LCOM alone, and the darker bars represent the R² value for S LCOM and R LCOM combined.



5.4 Conclusion<br />

From this case <strong>study</strong>, we found that <strong>run</strong>-<strong>time</strong> object-level <strong>coupling</strong> <strong>metrics</strong> could be<br />

used to investigate object behaviour. Using the IC OC <strong>run</strong>-<strong>time</strong> <strong>coupling</strong> measure<br />

we discovered that objects from the same class exhibited different behaviours at<br />

<strong>run</strong>-<strong>time</strong> from the point <strong>of</strong> view <strong>of</strong> <strong>coupling</strong>. Object behaviour was identified by<br />

defining a new metric N OC which groups objects together based on their <strong>run</strong>-<strong>time</strong><br />

<strong>coupling</strong> properties.<br />

We defined a number <strong>of</strong> <strong>metrics</strong> for evaluating <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong>. First, we<br />

proved that these measures were not redund<strong>an</strong>t with respect to the static LCOM<br />

measure <strong>an</strong>d that they captured additional dimensions <strong>of</strong> <strong>cohesion</strong>. Next, we investigated<br />

the impact <strong>of</strong> <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong> on object behaviour using regression<br />

<strong>an</strong>alysis <strong>an</strong>d proved that these <strong>run</strong>-<strong>time</strong> <strong>cohesion</strong> <strong>metrics</strong> were good predictors <strong>of</strong><br />

object behaviour, as identified by the N OC measure. Appendix B.2 gives the results<br />

from this <strong>an</strong>alysis <strong>an</strong>d shows the Fishers F test results which state that all results<br />

were statistically signific<strong>an</strong>t the 5% level <strong>of</strong> signific<strong>an</strong>ce..


Chapter 6

Case Study 3: A Study of Run-time Coupling Metrics and Fault Detection

Fault-proneness detection is of interest in many areas of software engineering research; quality and maintenance effort control both depend on an understanding of it. In previous years, a large volume of work has been carried out to define suitable metrics and models for fault detection [6, 13, 19, 41]. Code coverage has been proposed as an estimator of fault-proneness, but it remains a controversial topic that lacks supporting empirical data [22]. In this case study we investigate whether instruction coverage is a significant predictor of fault-proneness, an important software quality indicator. This is done by taking a set of real-world programs, namely Velocity, Xalan and Ant, and introducing faults into them using the mutation system µJava. Two kinds of mutations are introduced separately into the programs: traditional and class-type mutations. We then determine the percentage of mutants killed (M_K) by the set of test cases provided with the programs; Equation 6.1 gives the formula for M_K. Regression analysis is applied to determine whether instruction coverage is a good predictor of fault-proneness, which is defined as the M_K of the class for each type of mutation. From previous work we expect instruction coverage to be a good predictor of non-object-oriented, or traditional-type, mutants [69].


M_K = (Number of mutants killed / Total number of mutants created) × 100        (6.1)
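As a small illustration of Equation 6.1 and of what "killed" means in this study (a mutant counts as killed when at least one of the program's own JUnit test cases fails on it), the sketch below computes M_K for one class. The interface and identifiers are ours for illustration only; none of this is µJava API.

```java
import java.util.List;

/**
 * Sketch of the mutation-scoring step with deliberately generic types: how the
 * mutant builds are produced and how the JUnit suite is executed against each
 * of them is tool-specific, so both are abstracted behind TestRunner.
 */
public final class MutationExperiment {

    /** Runs the program's test suite against one mutant and reports whether any test failed. */
    public interface TestRunner {
        boolean suiteFailsOn(String mutantId);
    }

    /** A mutant is killed when at least one test case fails on it; M_K is the killed percentage. */
    static double percentageKilled(List<String> mutantIds, TestRunner runner) {
        if (mutantIds.isEmpty()) {
            throw new IllegalArgumentException("no mutants to score");
        }
        long killed = mutantIds.stream().filter(runner::suiteFailsOn).count();
        return 100.0 * killed / mutantIds.size();            // Equation 6.1
    }

    public static void main(String[] args) {
        // Toy stand-in: pretend every mutant whose made-up id ends in an even digit is killed.
        TestRunner fake = id -> (id.charAt(id.length() - 1) - '0') % 2 == 0;
        List<String> mutants = List.of("AOR_1", "AOR_2", "IOD_3", "PNC_4");
        System.out.printf("M_K = %.1f%%%n", percentageKilled(mutants, fake));   // prints 50.0%
    }
}
```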

Next, we empirically validate a set of six run-time object-oriented metrics in terms of their usefulness in predicting fault-proneness. We again use regression analysis to investigate the ability of these run-time measures to predict M_K for both types of mutation. From these two experiments we wish to discover whether the run-time measures for coupling are better predictors of fault-proneness than the traditional coverage measure.

6.1 Goals and Hypotheses

The GQM/MEDEA framework was used to set up the experiments for this study.

Experiment 1:

Goal: To examine the relationship between coverage and fault detection, in the context of instruction coverage.

Perspective: Code coverage has been proposed as an estimator of testing effectiveness. Regression analysis is used to assess whether coverage is a better indicator of fault-proneness than the run-time coupling metrics. In particular, we investigate whether it is a better detector of traditional or class-type mutations in programs.

Environment: We chose to evaluate a selection of open source real-world programs. Each program comes with its own set of JUnit test cases, thus defining both the static and dynamic context of our work.

Hypothesis:

H_0: Coverage measures are poor detectors of faults in a program.
H_1: Coverage measures are good detectors of faults in a program.


Experiment 2:

Goal: To examine the relationship between run-time coupling metrics and fault detection.

Perspective: Previous work has shown that the static coupling measure CBO is a good detector of faults in programs [13]. Intuitively, one would expect run-time coupling measures to give a better indication, as they are based on an actual execution of the program. Regression analysis is used to determine whether there is a significant correlation.

Environment: We chose to evaluate a selection of open source real-world programs. Each program comes with its own set of JUnit test cases, thus defining both the static and dynamic context of our work.

Hypothesis:

H_0: Run-time coupling metrics are poor detectors of faults in a program.
H_1: Run-time coupling metrics are good detectors of faults in a program.

6.2 Experimental Design

In order to conduct the practical experiments underlying this study, it was necessary to select a suite of Java programs and measure:

• the instruction coverage percentage, I_C;
• the mutation coverage of the test cases, i.e. the percentage of mutants killed (M_K);
• the run-time coupling metrics: IC_CC, EC_CC, IC_CM, EC_CM, IC_CD and EC_CD.

The InCov tool, described in Section 3.2.4, was used to determine I_C. The run-time measures were evaluated using the ClMet tool. The mutation system µJava, described in Section 3.2.5, was used to insert both traditional and class-level mutants into the test case programs and to determine the M_K rates of the test cases supplied with the programs.

Three open source real-world programs, Velocity, Xalan and Ant, were evaluated in this study. The SPECjvm98 and JOlden benchmark programs used in the previous studies exhibited very poor mutant kill percentages when analysed (most classes exhibited a 0% mutant kill rate) and were therefore excluded from further analysis.

6.3 Results

Percentage Mutant Kill Rate Results

Figure 6.1 gives the percentages of mutants killed upon execution of the JUnit test cases supplied with the programs analysed. Looking at Figure 6.1(a) for the Velocity program, twenty-three classes exhibit a kill rate of zero for the class-level mutants, while thirteen classes exhibit the same rate for the traditional mutants. At the other end of the spectrum, six classes exhibited a kill rate of between 90% and 100% for the class-level mutants, while seven classes exhibited the same kill rate for the traditional mutants.
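To make the reading of Figure 6.1 concrete, the short sketch below counts how many classes fall into each kill-rate band, which is what each bar represents; the 10% band width is assumed from the ranges quoted above, and the kill rates here are invented rather than taken from the µJava runs.

```java
import java.util.Map;
import java.util.TreeMap;

/** Buckets per-class mutant kill rates into 10% bands, as in the Figure 6.1 histograms. */
public final class KillRateHistogram {

    static Map<Integer, Integer> bucketByTens(double[] killRates) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (double rate : killRates) {
            int bucket = Math.min(90, (int) (rate / 10) * 10);   // 100% folds into the 90-100 band
            counts.merge(bucket, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] classKillRates = {0, 0, 12.5, 45.0, 90.0, 100.0, 67.3, 0};   // invented values
        bucketByTens(classKillRates).forEach((lo, n) ->
                System.out.printf("%3d-%3d%%: %d classes%n", lo, lo + 10, n));
    }
}
```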

In their paper [66], Offutt et al. created test cases by hand for the set of programs they studied, so that 100% M_K was achieved. To date, no one has applied this mutation system to a set of real programs, so there is no consensus on what a desirable M_K rate would be.

6.3.1 Experiment 1: To examine the relationship between instruction coverage and fault detection.

Regression Analysis

We investigate the statistical relationship between instruction coverage and fault-proneness using regression analysis. The dependent variable is the percentage mutant kill rate, M_K, while the independent variable is the instruction coverage measure I_C for each class. Class and traditional mutants are evaluated separately. Appendix C.2 gives the results from this analysis.
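For a single predictor such as I_C, the R² reported for this experiment is numerically just the squared Pearson correlation between the per-class coverage and kill-rate values. A minimal sketch, with toy numbers rather than the measured data:

```java
/** Squared Pearson correlation between per-class coverage and kill rate (toy data only). */
public final class CoverageVsKillRate {

    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] ic = {20, 35, 50, 65, 80, 95};   // hypothetical per-class instruction coverage (%)
        double[] mk = {10, 30, 45, 55, 70, 85};   // hypothetical per-class mutant kill rate (%)
        double r = pearson(ic, mk);
        System.out.printf("r = %.3f, R^2 = %.3f%n", r, r * r);
    }
}
```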


Figure 6.1: Mutation test results for the real-world programs Velocity (a), Xalan (b) and Ant (c). In all graphs the bars represent the number of classes that exhibit a percentage mutant kill rate in the corresponding range.


Figure 6.2: Regression analysis results for the effectiveness of I_C in predicting class- and traditional-level mutations in the real-world programs Velocity, Xalan and Ant. The bars represent the R² value for the metric under consideration.

Figure 6.2 depicts the results on the effect of instruction coverage on fault-proneness for both types of mutation tested. For all of the programs tested, I_C proved to be a poor predictor of class-type mutants, with the highest value being 16.7% for Xalan. In contrast, I_C proved to be an effective indicator of traditional mutations, with values ranging from 64.5% to 78.9%. This was as expected: coverage is not particularly effective in evaluating object-oriented programs, so we would not expect it to be a good predictor of object-oriented faults.

6.3.2 Experiment 2: To examine the relationship between run-time coupling metrics and fault detection.

Regression Analysis

Regression analysis is used to determine the effectiveness of the run-time coupling metrics in detecting faults in programs. The dependent variable is the percentage mutant kill rate of the test cases used to execute the programs, while the independent variables are the six run-time coupling metrics. Class and traditional mutants are evaluated separately. Appendix C.1 gives the results from this analysis.

The traditional mutants did not show any relationship with the run-time coupling measures: only the IC_CC metric for the Velocity program exhibited a significant correlation. This is in contrast to the results from the previous experiment, where I_C proved to be a poor predictor of class-type mutants but a good predictor of traditional-type mutants.

Figure 6.3 illustrates the results for the effectiveness of the run-time coupling metrics IC_CC, IC_CM, EC_CC and EC_CM in predicting the M_K for class-level mutations for each of the programs analysed. For two of the programs, Velocity and Xalan, the IC_CC measure provided the greatest prediction of M_K, at 69% and 59% respectively. For the Ant program, the EC_CC metric had the highest value at 69%, although the IC_CC value was also high, at 60%. For all of the programs, the EC_CM measure was the poorest predictor. Five categories of mutation were introduced into the programs by µJava, as illustrated by Table D.2. We would expect the coupling measures to be good predictors of those mutations based on inheritance, polymorphism and overloading; however, we would not expect such a relationship for those based on Java-specific features and common programming mistakes. The inclusion of these types of mutation may have negatively skewed the results.

None of the run-time metrics based on distinct message counts, IC_CD and EC_CD, exhibited a significant result, and therefore they have not been summarised graphically. As in Section 4.3, this was expected and emphasises the significance of the predictive capabilities of the other metrics.

Overall, one would expect this kind of result, as the class-type mutants are object-oriented, while the traditional mutations are based on factors such as operator replacement and therefore would not be expected to correlate strongly with coupling. This leads us to reject our null hypothesis for both experiments and state that run-time coupling metrics are good detectors of class-level faults, while coverage measures are good detectors of traditional-type faults in a program. We therefore postulate a possible utility for run-time coupling metrics in fault-proneness detection, with regard to identifying faults in object-oriented programs.

Figure 6.3: Regression analysis results for the effectiveness of the run-time coupling metrics in predicting class-level mutations in the real-world programs Velocity, Xalan and Ant. The bars represent the R² value for the run-time metric under consideration.

6.4 Conclusion

In this case study we used regression analysis to show that run-time coupling metrics were good detectors of class-type faults in programs, while instruction coverage was a good detector of traditional-type mutants. Appendix C.1 illustrates these results and shows that all results were deemed statistically significant at the 5% level of significance. We therefore proposed the run-time coupling metrics as alternative measures for fault detection, useful for identifying object-oriented faults in programs.


Chapter 7

Conclusions

In this thesis we presented an empirical investigation into run-time coupling and cohesion metrics.

The first case study investigated the influence of instruction coverage on the relationship between static and run-time coupling metrics. An empirical investigation was conducted using the set of run-time metrics proposed by Arisholm et al. on a large set of Java programs. This set contained programs from the SPECjvm98 and JOlden benchmark suites and also included three real-world programs: Velocity, Xalan and Ant.

The differences in the underlying dimensions of coupling captured by the static versus the run-time metrics were assessed using principal component analysis. Three components were identified, containing the static CBO, the import-based run-time metrics, and the export-based run-time metrics respectively. This established that the run-time metrics were not simply surrogate static measures, which made them suitable candidates for further analysis.

A study into the predictive ability of the static CBO and instruction coverage data was then conducted using multiple regression analysis. The purpose of this was to show how well the static CBO metric and the instruction coverage measure I_C could predict the six run-time metrics under consideration. The PCA placed import- and export-based coupling in different components, and this difference was also seen in the regression analysis. Both CBO and instruction coverage had less influence overall on the export-based metrics, EC_CC and EC_CM, than on the import-based run-time metrics, IC_CC and IC_CM.

Figure 7.1: Findings from case study one, showing that our run-time coupling metrics are not simply surrogate measures for static CBO, and that coverage plus static metrics are better predictors of the run-time measures than the static measure alone.

The regression analysis showed that the combination of the static measure with instruction coverage gave a significantly better prediction of the run-time behaviour of programs than the use of static metrics alone, for the class-based and method-based metrics. This suggested that the correlation between static and run-time coupling was as much a factor of coverage as an intrinsic property of the metrics themselves.

The results for the two run-time metrics based on distinct message counts, IC_CD and EC_CD, were not within the chosen significance level, and thus no determination was made on the relationship for these metrics. Figure 7.1 summarises the findings from this study.


The second case study looked at run-time object behaviour and whether run-time cohesion metrics could be used to identify such behaviour.

First, we looked at object behaviour in the context of coupling. We used the IC_OC object-level metric, as defined by Arisholm et al., and defined a new measure, N_OC, in an attempt to identify objects that behave differently at run-time from the point of view of coupling. We concluded that objects from the same class could behave differently at run-time from the point of view of coupling, because there were classes that exhibited variable C_V values for IC_OC together with N_OC values greater than one.
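A brief sketch of the underlying check, assuming that C_V here denotes the coefficient of variation (standard deviation over mean) of the per-object IC_OC values of a class; the numbers are invented:

```java
/** Coefficient of variation of the per-object IC_OC values of one class (illustrative only). */
public final class CouplingVariability {

    static double coefficientOfVariation(double[] values) {
        double mean = 0.0;
        for (double v : values) mean += v;
        mean /= values.length;                               // assumes a non-zero mean
        double variance = 0.0;
        for (double v : values) variance += (v - mean) * (v - mean);
        variance /= values.length;
        return Math.sqrt(variance) / mean;
    }

    public static void main(String[] args) {
        // Hypothetical IC_OC values for the objects instantiated from a single class;
        // a clearly non-zero C_V signals that the objects are not all coupled alike.
        double[] icOcPerObject = {3, 3, 14, 2, 15, 3};
        System.out.printf("C_V of IC_OC = %.2f%n", coefficientOfVariation(icOcPerObject));
    }
}
```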

Subsequently, we looked at whether run-time cohesion metrics could be used to predict object behaviour, as defined by the N_OC measure. First, we had to show that the run-time cohesion metrics were not redundant with respect to the static S_LCOM. The relationship between the static and run-time cohesion metrics was investigated using PCA. Two components were identified, containing the static S_LCOM and the run-time cohesion measures R_LCOM and RW_LCOM respectively. This established that the run-time metrics were not simply surrogate static measures, making them suitable candidates for further analysis.

Multiple regression analysis was used to discover whether the cohesion metrics were good predictors of object behaviour. The purpose of this was to show how well the S_LCOM metric and the run-time cohesion measures R_LCOM and RW_LCOM could predict N_OC. Overall, the results showed that cohesion metrics were good predictors of N_OC, with run-time cohesion being the superior metric. This led us to conclude that run-time cohesion metrics provide additional information about class behaviour over and above that provided by S_LCOM. Figure 7.2 depicts the results of this study.

The third case study investigated whether instruction coverage was a good predictor of faults in a program. We used regression analysis to determine whether this measure was related to M_K, the mutant kill rate of the test cases used. It was found that I_C was a good predictor of traditional-type faults but a poor predictor of class-type faults, which verifies results from previous studies on coverage measures.

Next, we analysed the extent to which the run-time coupling metrics were good detectors of traditional and class-type faults in a program. Our results showed that the measures IC_CC, IC_CM, EC_CC and EC_CM were significantly related to M_K when considering class-type mutations. The results for IC_CD and EC_CD, the two run-time metrics based on distinct message counts, were not within the chosen significance level, and thus no determination was made on the relationship for these metrics.

Figure 7.2: Findings from case study two, showing that run-time object-level coupling measures can be used to identify objects that exhibit different behaviours at run-time, and that run-time cohesion measures are good predictors of this type of behaviour.

Figure 7.3: Findings from case study three, showing that run-time coupling metrics are good predictors of class-type faults and that instruction coverage is a good predictor of traditional faults in programs.

The purpose of this study was to determine whether instruction coverage is a better predictor of fault-proneness than the run-time coupling measures. As we found the run-time coupling measures to be superior to simple measures of coverage in detecting object-oriented faults in programs, we proposed the run-time coupling metrics as an alternative measure of fault-proneness, useful for detecting faults in object-oriented software. Figure 7.3 illustrates the findings from this study.

7.1 Contributions

We have implemented the tools ClMet and ObMet, which can be used to perform a class-level and object-level analysis of Java programs.


We use the definitions of Arisholm et al. for the set of run-time coupling metrics in this analysis. We had, however, defined our own set of run-time coupling metrics [75, 76] prior to the publication of that paper; due to the similarity of the metrics, we switched to their definitions for ease of comparison.

We define a number of object-oriented run-time metrics for cohesion and investigate their possible utility. To date, no one else has attempted to do this.

We define a new measure, N_OC, that can be used to study run-time object behaviour.

To the best of our knowledge, this is the largest empirical study performed to date on the run-time analysis of Java programs. Previously, a study was carried out by Arisholm et al. on "Dynamic Coupling Measurement for Object-Oriented Software"; however, it included only a single program, Velocity, in the analysis. Our study looks at not only Velocity but also the real-world programs Xalan and Ant, as well as seven benchmark programs from the SPECjvm98 suite and seven programs from the JOlden suite, making it much wider in scope.

The main findings from our study are as follows:

• We showed that run-time coupling metrics capture additional dimensions of coupling and are not simply surrogate measures for static CBO. Therefore, useful information above that provided by a simple static analysis may be acquired through the use of run-time metrics.

• Coverage has a significant impact on the correlation between static CBO and run-time coupling metrics and should be a measured, recorded factor in any comparison.

• Run-time object-level coupling metrics can be used to investigate object behaviour. Using such a measure we discovered that objects from the same class can behave differently at run-time from the point of view of coupling.

• Run-time cohesion metrics are not redundant with respect to the static S_LCOM measure and capture additional dimensions of cohesion.

• Run-time cohesion metrics are good predictors of run-time object behaviour.

• Run-time coupling metrics based on distinct class and distinct method counts are good predictors of class-type, or object-oriented, faults in programs but poor predictors of traditional-type mutations.

• Coverage is a good predictor of traditional-type faults but a poor predictor of class-type faults in programs.

7.2 Applications of this Work

Much of the work on the dynamic analysis of Java programs has come from the language design and compiler community. The work in this thesis forms part of an increasing link between this community and the software engineering community, with an emphasis on collecting, analysing and comparing quantitative static and dynamic data. Other possible examples of this synthesis include relating studies of polymorphicity to the testing of inheritance relationships, or relating measures of program "hot-spots" to metrics based on distinct messages, such as IC_CD and EC_CD. Run-time metrics may also have a role to play in areas of research such as reverse engineering and program comprehension, as they contribute to a better understanding of the behaviour of code in its operational environment.

7.3 Threats to Validity

7.3.1 Internal Threats

There are a number of factors which may potentially affect the validity of these run-time metrics. In this thesis we have chosen to look only at run-time definitions for coupling and cohesion which are based on the standard static definitions proposed by Chidamber and Kemerer. Their metric suite for analysing object-oriented software contains three additional measurements, evaluating the depth of inheritance tree (DIT), the number of children (NOC) and the weighted methods per class (WMC). Our set of run-time measures should be expanded to include run-time definitions for these as well, to ensure that the set is fully comprehensive.

The run-time metrics used in this study are rated based on how they perform in relation to static measurements of coupling and cohesion. However, no study has definitively shown that any measurement of coupling or cohesion provides extra information on design quality over and above that which can be gained simply by evaluating the much simpler lines-of-code measure.

7.3.2 External Threats

A general problem with any type of run-time analysis is that the results are based on dynamic measurement and are thus tied to the inputs or test cases used; different test cases may therefore produce different results. Static measurements, however, remain the same regardless of the set of test cases used to execute the program.

The set of programs used in this study may not be representative of all classes of Java programs; for example, no GUI-based programs were included in the analysis.

While the run-time analysis tools ClMet and ObMet made it easy to collect a wide variety of run-time information from a program, it was still quite time consuming to perform a full analysis. Although performance was not a primary concern in the design of these tools, if such a method of evaluating a program were to be marketed to industry, the performance of the tools would have to be given more serious consideration.

Only one external quality attribute, fault detection, was investigated in this thesis. Further research needs to be conducted to see how well measures of coupling and cohesion predict other important external quality attributes of a design, such as maintainability, reusability or comprehensibility.

The relationship between internal and external quality attributes is quite intuitive; for example, more complex code will require greater effort to maintain. However, the precise functional form of this relationship is less clear and is the subject of intense practical research concern. Using theories of cognition and problem-solving to help us understand the effects of complexity on software is the subject of much current research [31].

7.4 Future Work

Future work may involve extending the existing set of coupling and cohesion metrics to develop a comprehensive set of run-time object-oriented metrics that can intuitively quantify aspects of object-oriented applications such as inheritance, dynamic binding and polymorphism.

Currently there exists no set of benchmarks specifically designed for evaluating properties of object-oriented programming such as coupling and cohesion; it would be useful to design such a set of benchmarks for use in similar empirical studies.

Further research could involve designing a run-time profiling tool written in C++ rather than Java. Such a tool could utilise the JVMDI component of the JPDA directly and would therefore be dynamically linked with the JVM at run-time. This would probably result in less performance overhead, reducing the time taken to perform such an analysis.

Another important aspect would be to further investigate the correlation between run-time metrics and external quality aspects of a design, including the possibility of using hybrid models that combine static and run-time metrics to evaluate a design.

It would be interesting to conduct an industrial case study using real commercial software and data to further verify the results in this thesis.

Other applications of run-time metrics should be investigated; for example, they could be useful in determining where refactorings have been, or could be, applied, or they could be used to aid program comprehension.

This study has focused solely on the evaluation of Java software; it would be important to investigate whether the run-time metrics give similar results when used to evaluate other types of object-oriented software, for example C#.

Though the approach and results are of significance to the field, they can also be used as stepping stones to open up new ways of considering a wider set of internal quality attributes, their interrelationships, and their independent and interdependent effects on the external quality aspects of a design.


Appendix A

Case Study 1: To Investigate the Influence of Instruction Coverage on the Relationship Between Static and Run-time Coupling Metrics

Appendix A.1 contains the PCA test results for the SPECjvm98 and JOlden suites and for the real-world programs Velocity, Xalan and Ant. Values deemed to be significant at the level p ≤ 0.05 are highlighted.

Appendix A.2 contains the results from the multiple linear regression used to test the hypothesis H_0, that coverage has no effect on the relationship between static and run-time metrics, for the programs from the SPECjvm98 and JOlden suites and for the real-world programs Velocity, Xalan and Ant. All significant results are highlighted.


A.1 PCA Test Results for all programs

A.1.1 SPECjvm98 Benchmark Suite

201 compress
        PC1     PC2     PC3
CBO     0.113   0.014   0.712
IC_CC   0.865   0.065   0.186
IC_CM   0.766   0.154   0.097
IC_CD   0.866   0.073   0.100
EC_CC   0.023   0.873   0.176
EC_CM   0.143   0.799   0.035
EC_CD   0.098   0.834   0.096

209 db
        PC1     PC2     PC3
CBO     0.012   0.163   0.843
IC_CC   0.893   0.088   0.002
IC_CM   0.923   0.004   0.000
IC_CD   0.976   0.003   0.013
EC_CC   0.178   0.763   0.002
EC_CM   0.110   0.793   0.027
EC_CD   0.087   0.823   0.017

202 jess
        PC1     PC2     PC3
CBO     0.198   0.187   0.672
IC_CC   0.963   0.007   0.005
IC_CM   0.912   0.003   0.016
IC_CD   0.874   0.032   0.004
EC_CC   0.154   0.812   0.002
EC_CM   0.298   0.734   0.054
EC_CD   0.098   0.923   0.002

213 javac
        PC1     PC2     PC3
CBO     0.187   0.000   0.973
IC_CC   0.633   0.083   0.184
IC_CM   0.834   0.033   0.023
IC_CD   0.723   0.143   0.002
EC_CC   0.138   0.834   0.004
EC_CM   0.078   0.734   0.012
EC_CD   0.067   0.759   0.034

228 jack
        PC1     PC2     PC3
CBO     0.004   0.243   0.634
IC_CC   0.605   0.234   0.154
IC_CM   0.723   0.194   0.076
IC_CD   0.604   0.195   0.098
EC_CC   0.194   0.749   0.098
EC_CM   0.103   0.694   0.049
EC_CD   0.094   0.749   0.104

205 raytrace
        PC1     PC2     PC3
CBO     0.123   0.087   0.723
IC_CC   0.834   0.021   0.019
IC_CM   0.912   0.017   0.008
IC_CD   0.896   0.103   0.001
EC_CC   0.198   0.763   0.003
EC_CM   0.125   0.709   0.017
EC_CD   0.097   0.821   0.002

222 mpegaudio
        PC1     PC2     PC3
CBO     0.244   0.137   0.583
IC_CC   0.943   0.004   0.087
IC_CM   0.898   0.034   0.041
IC_CD   0.943   0.023   0.001
EC_CC   0.034   0.943   0.043
EC_CM   0.134   0.754   0.085
EC_CD   0.098   0.845   0.005

A.1.2 JOlden Benchmark Suite

BH
        PC1     PC2     PC3
CBO     0.403   0.002   0.520
IC_CC   0.728   0.224   0.012
IC_CM   0.536   0.391   0.001
IC_CD   0.555   0.376   0.000
EC_CC   0.358   0.522   0.109
EC_CM   0.203   0.763   0.025
EC_CD   0.203   0.763   0.025

MST
        PC1     PC2     PC3
CBO     0.000   0.013   0.972
IC_CC   0.900   0.063   0.032
IC_CM   0.956   0.010   0.026
IC_CD   0.941   0.012   0.027
EC_CC   0.356   0.609   0.033
EC_CM   0.121   0.877   0.001
EC_CD   0.118   0.881   0.000

Em3d
        PC1     PC2     PC3
CBO     0.134   0.034   0.712
IC_CC   0.933   0.013   0.016
IC_CM   0.772   0.168   0.039
IC_CD   0.772   0.168   0.039
EC_CC   0.139   0.702   0.082
EC_CM   0.223   0.716   0.039
EC_CD   0.223   0.716   0.039

Perimeter
        PC1     PC2     PC3
CBO     0.231   0.123   0.612
IC_CC   0.541   0.169   0.281
IC_CM   0.876   0.080   0.002
IC_CD   0.905   0.056   0.038
EC_CC   0.236   0.752   0.000
EC_CM   0.147   0.830   0.023
EC_CD   0.142   0.828   0.026

Health
        PC1     PC2     PC3
CBO     0.238   0.187   0.521
IC_CC   0.956   0.005   0.017
IC_CM   0.936   0.024   0.010
IC_CD   0.940   0.028   0.009
EC_CC   0.076   0.831   0.086
EC_CM   0.070   0.919   0.002
EC_CD   0.065   0.794   0.003

Power
        PC1     PC2     PC3
CBO     0.329   0.014   0.626
IC_CC   0.617   0.073   0.161
IC_CM   0.624   0.338   0.036
IC_CD   0.712   0.228   0.041
EC_CC   0.022   0.915   0.015
EC_CM   0.007   0.880   0.112
EC_CD   0.008   0.824   0.164


Voronoi
        PC1     PC2     PC3
CBO     0.198   0.213   0.526
IC_CC   0.718   0.123   0.069
IC_CM   0.812   0.088   0.134
IC_CD   0.773   0.176   0.141
EC_CC   0.043   0.911   0.005
EC_CM   0.067   0.934   0.004
EC_CD   0.148   0.834   0.054

A.1.3 Real-World Programs, Velocity, Xalan and Ant

Velocity
        PC1     PC2     PC3
CBO     0.384   0.184   0.734
IC_CC   0.623   0.034   0.174
IC_CM   0.725   0.087   0.231
IC_CD   0.684   0.196   0.192
EC_CC   0.284   0.684   0.097
EC_CM   0.023   0.793   0.005
EC_CD   0.174   0.590   0.015

Xalan
        PC1     PC2     PC3
CBO     0.316   0.174   0.586
IC_CC   0.824   0.184   0.183
IC_CM   0.890   0.284   0.284
IC_CD   0.795   0.003   0.194
EC_CC   0.013   0.834   0.164
EC_CM   0.284   0.793   0.023
EC_CD   0.384   0.823   0.154

Ant
        PC1     PC2     PC3
CBO     0.125   0.254   0.687
IC_CC   0.874   0.125   0.125
IC_CM   0.789   0.231   0.012
IC_CD   0.801   0.324   0.214
EC_CC   0.214   0.789   0.124
EC_CM   0.141   0.785   0.054
EC_CD   0.123   0.754   0.014


A.2 Multiple linear regression results for all programs

A.2.1 SPECjvm98 Benchmark Suite

201 compress
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.775   0.593   0.003
H_CBO,I_C    IC_CC   0.798   0.602   0.0001
H_CBO        EC_CC   0.634   0.402   0.01
H_CBO,I_C    EC_CC   0.870   0.759   0.007
H_CBO        IC_CD   0.512   0.262   0.421
H_CBO,I_C    IC_CD   0.599   0.359   0.201
H_CBO        EC_CD   0.239   0.057   0.054
H_CBO,I_C    EC_CD   0.422   0.178   0.134
H_CBO        IC_CM   0.762   0.58    0.003
H_CBO,I_C    IC_CM   0.885   0.784   0.006
H_CBO        EC_CM   0.235   0.056   0.04
H_CBO,I_C    EC_CM   0.58    0.336   0.035

209 db
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.419   0.178   0.0001
H_CBO,I_C    IC_CC   0.868   0.754   0.001
H_CBO        EC_CC   0.567   0.322   0.002
H_CBO,I_C    EC_CC   0.881   0.777   0.001
H_CBO        IC_CD   0.691   0.478   0.522
H_CBO,I_C    IC_CD   0.768   0.589   0.263
H_CBO        EC_CD   0.312   0.097   0.609
H_CBO,I_C    EC_CD   0.429   0.184   0.816
H_CBO        IC_CM   0.582   0.338   0.003
H_CBO,I_C    IC_CM   0.703   0.494   0.006
H_CBO        EC_CM   0.313   0.098   0.019
H_CBO,I_C    EC_CM   0.428   0.184   0.016

202 jess
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.553   0.306   0.002
H_CBO,I_C    IC_CC   0.703   0.494   0.001
H_CBO        EC_CC   0.428   0.184   0.031
H_CBO,I_C    EC_CC   0.567   0.322   0.023
H_CBO        IC_CD   0.765   0.586   0.145
H_CBO,I_C    IC_CD   0.868   0.754   0.321
H_CBO        EC_CD   0.691   0.748   0.246
H_CBO,I_C    EC_CD   0.723   0.523   0.135
H_CBO        IC_CM   0.762   0.581   0.023
H_CBO,I_C    IC_CM   0.922   0.852   0.012
H_CBO        EC_CM   0.618   0.382   0.001
H_CBO,I_C    EC_CM   0.645   0.416   0.002

213 javac
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.535   0.286   0.005
H_CBO,I_C    IC_CC   0.748   0.559   0.002
H_CBO        EC_CC   0.443   0.196   0.004
H_CBO,I_C    EC_CC   0.531   0.282   0.007
H_CBO        IC_CD   0.512   0.262   0.234
H_CBO,I_C    IC_CD   0.606   0.367   0.176
H_CBO        EC_CD   0.872   0.76    0.765
H_CBO,I_C    EC_CD   0.922   0.85    0.567
H_CBO        IC_CM   0.553   0.306   0.034
H_CBO,I_C    IC_CM   0.76    0.577   0.024
H_CBO        EC_CM   0.321   0.107   0.042
H_CBO,I_C    EC_CM   0.567   0.322   0.034

205 raytrace
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.444   0.197   0.021
H_CBO,I_C    IC_CC   0.659   0.434   0.002
H_CBO        EC_CC   0.59    0.349   0.043
H_CBO,I_C    EC_CC   0.669   0.447   0.032
H_CBO        IC_CD   0.256   0.065   0.342
H_CBO,I_C    IC_CD   0.36    0.13    0.365
H_CBO        EC_CD   0.239   0.057   0.123
H_CBO,I_C    EC_CD   0.363   0.132   0.432
H_CBO        IC_CM   0.443   0.196   0.034
H_CBO,I_C    IC_CM   0.599   0.359   0.032
H_CBO        EC_CM   0.422   0.178   0.012
H_CBO,I_C    EC_CM   0.632   0.399   0.032

222 mpegaudio
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.174   0.032   0.003
H_CBO,I_C    IC_CC   0.452   0.204   0.001
H_CBO        EC_CC   0.296   0.088   0.013
H_CBO,I_C    EC_CC   0.635   0.403   0.006
H_CBO        IC_CD   0.734   0.538   0.165
H_CBO,I_C    IC_CD   0.885   0.784   0.214
H_CBO        EC_CD   0.948   0.899   0.234
H_CBO,I_C    EC_CD   0.978   0.956   0.654
H_CBO        IC_CM   0.753   0.567   0.001
H_CBO,I_C    IC_CM   0.769   0.592   0.002
H_CBO        EC_CM   0.533   0.284   0.021
H_CBO,I_C    EC_CM   0.635   0.403   0.03

228 jack
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.606   0.367   0.003
H_CBO,I_C    IC_CC   0.966   0.933   0.012
H_CBO        EC_CC   0.512   0.262   0.002
H_CBO,I_C    EC_CC   0.872   0.76    0.003
H_CBO        IC_CD   0.239   0.057   0.465
H_CBO,I_C    IC_CD   0.618   0.382   0.450
H_CBO        EC_CD   0.363   0.132   0.123
H_CBO,I_C    EC_CD   0.419   0.178   0.576
H_CBO        IC_CM   0.585   0.343   0.013
H_CBO,I_C    IC_CM   0.599   0.359   0.002
H_CBO        EC_CM   0.363   0.132   0.045
H_CBO,I_C    EC_CM   0.417   0.174   0.032


A.2.2 JOlden Benchmark Suite

BH
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.531   0.282   0.038
H_CBO,I_C    IC_CC   0.767   0.588   0.044
H_CBO        EC_CC   0.092   0.008   0.0001
H_CBO,I_C    EC_CC   0.533   0.284   0.0001
H_CBO        IC_CD   0.431   0.185   0.247
H_CBO,I_C    IC_CD   0.617   0.381   0.237
H_CBO        EC_CD   0.443   0.196   0.232
H_CBO,I_C    EC_CD   0.514   0.264   0.398
H_CBO        IC_CM   0.45    0.203   0.024
H_CBO,I_C    IC_CM   0.635   0.403   0.013
H_CBO        EC_CM   0.443   0.196   0.032
H_CBO,I_C    EC_CM   0.514   0.264   0.024

MST
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.97    0.941   0.001
H_CBO,I_C    IC_CC   0.972   0.945   0.0001
H_CBO        EC_CC   0.606   0.367   0.002
H_CBO,I_C    EC_CC   0.76    0.577   0.001
H_CBO        IC_CD   0.966   0.933   0.200
H_CBO,I_C    IC_CD   0.987   0.974   0.401
H_CBO        EC_CD   0.239   0.057   0.649
H_CBO,I_C    EC_CD   0.618   0.382   0.486
H_CBO        IC_CM   0.966   0.933   0.002
H_CBO,I_C    IC_CM   0.987   0.974   0.004
H_CBO        EC_CM   0.239   0.057   0.049
H_CBO,I_C    EC_CM   0.618   0.382   0.086

Em3d
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.617   0.381   0.046
H_CBO,I_C    IC_CC   0.748   0.659   0.001
H_CBO        EC_CC   0.262   0.069   0.03
H_CBO,I_C    EC_CC   0.937   0.878   0.024
H_CBO        IC_CD   0.59    0.349   0.294
H_CBO,I_C    IC_CD   0.591   0.349   0.651
H_CBO        EC_CD   0.02    0.00    0.975
H_CBO,I_C    EC_CD   0.626   0.392   0.608
H_CBO        IC_CM   0.59    0.349   0.194
H_CBO,I_C    IC_CM   0.591   0.349   0.151
H_CBO        EC_CM   0.02    0.000   0.075
H_CBO,I_C    EC_CM   0.626   0.392   0.008

Perimeter
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.36    0.13    0.306
H_CBO,I_C    IC_CC   0.422   0.178   0.503
H_CBO        EC_CC   0.095   0.009   0.194
H_CBO,I_C    EC_CC   0.599   0.359   0.211
H_CBO        IC_CD   0.512   0.262   0.131
H_CBO,I_C    IC_CD   0.585   0.343   0.230
H_CBO        EC_CD   0.256   0.065   0.476
H_CBO,I_C    EC_CD   0.58    0.336   0.238
H_CBO        IC_CM   0.645   0.416   0.044
H_CBO,I_C    IC_CM   0.66    0.435   0.135
H_CBO        EC_CM   0.256   0.065   0.076
H_CBO,I_C    EC_CM   0.58    0.336   0.038

Health
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.601   0.372   0.04
H_CBO,I_C    IC_CC   0.643   0.414   0.003
H_CBO        EC_CC   0.22    0.048   0.06
H_CBO,I_C    EC_CC   0.254   0.064   0.13
H_CBO        IC_CD   0.659   0.434   0.075
H_CBO,I_C    IC_CD   0.753   0.566   0.124
H_CBO        EC_CD   0.444   0.197   0.27
H_CBO,I_C    EC_CD   0.535   0.286   0.431
H_CBO        IC_CM   0.669   0.447   0.07
H_CBO,I_C    IC_CM   0.76    0.578   0.116
H_CBO        EC_CM   0.444   0.197   0.207
H_CBO,I_C    EC_CM   0.535   0.286   0.431

Power
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.709   0.502   0.042
H_CBO,I_C    IC_CC   0.713   0.508   0.001
H_CBO        EC_CC   0.635   0.404   0.011
H_CBO,I_C    EC_CC   0.872   0.76    0.001
H_CBO        IC_CD   0.104   0.011   0.844
H_CBO,I_C    IC_CD   0.723   0.523   0.329
H_CBO        EC_CD   0.363   0.132   0.479
H_CBO,I_C    EC_CD   0.632   0.399   0.465
H_CBO        IC_CM   0.067   0.004   0.9
H_CBO,I_C    IC_CM   0.638   0.407   0.456
H_CBO        EC_CM   0.417   0.174   0.010
H_CBO,I_C    EC_CM   0.673   0.453   0.005

Voronoi
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.922   0.85    0.009
H_CBO,I_C    IC_CC   0.941   0.885   0.0001
H_CBO        EC_CC   0.553   0.306   0.255
H_CBO,I_C    EC_CC   0.561   0.314   0.568
H_CBO        IC_CD   0.762   0.58    0.078
H_CBO,I_C    IC_CD   0.768   0.589   0.263
H_CBO        EC_CD   0.627   0.393   0.183
H_CBO,I_C    EC_CD   0.636   0.405   0.459
H_CBO        IC_CM   0.765   0.586   0.076
H_CBO,I_C    IC_CM   0.77    0.594   0.059
H_CBO        EC_CM   0.627   0.393   0.083
H_CBO,I_C    EC_CM   0.636   0.405   0.029


A.2.3 Real-World Programs, Velocity, Xalan and Ant

Velocity
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.515   0.265   0.0001
H_CBO,I_C    IC_CC   0.722   0.521   0.001
H_CBO        EC_CC   0.381   0.145   0.014
H_CBO,I_C    EC_CC   0.617   0.381   0.025
H_CBO        IC_CD   0.595   0.354   0.254
H_CBO,I_C    IC_CD   0.741   0.547   0.354
H_CBO        EC_CD   0.677   0.458   0.144
H_CBO,I_C    EC_CD   0.861   0.741   0.214
H_CBO        IC_CM   0.675   0.455   0.005
H_CBO,I_C    IC_CM   0.752   0.565   0.004
H_CBO        EC_CM   0.409   0.167   0.007
H_CBO,I_C    EC_CM   0.506   0.256   0.01

Xalan
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.453   0.205   0.002
H_CBO,I_C    IC_CC   0.637   0.406   0.001
H_CBO        EC_CC   0.430   0.185   0.002
H_CBO,I_C    EC_CC   0.570   0.325   0.004
H_CBO        IC_CD   0.709   0.502   0.547
H_CBO,I_C    IC_CD   0.892   0.796   0.214
H_CBO        EC_CD   0.830   0.689   0.114
H_CBO,I_C    EC_CD   0.857   0.735   0.147
H_CBO        IC_CM   0.652   0.425   0.006
H_CBO,I_C    IC_CM   0.762   0.581   0.005
H_CBO        EC_CM   0.504   0.254   0.011
H_CBO,I_C    EC_CM   0.624   0.389   0.007

Ant
Hypothesis   Y       R       R²      P > F
H_CBO        IC_CC   0.604   0.365   0.005
H_CBO,I_C    IC_CC   0.765   0.585   0.006
H_CBO        EC_CC   0.453   0.205   0.014
H_CBO,I_C    EC_CC   0.636   0.405   0.018
H_CBO        IC_CD   0.597   0.356   0.154
H_CBO,I_C    IC_CD   0.698   0.487   0.198
H_CBO        EC_CD   0.518   0.268   0.287
H_CBO,I_C    EC_CD   0.667   0.445   0.098
H_CBO        IC_CM   0.725   0.525   0.017
H_CBO,I_C    IC_CM   0.784   0.615   0.025
H_CBO        EC_CM   0.451   0.204   0.042
H_CBO,I_C    EC_CM   0.560   0.314   0.034


Appendix B

Case Study 2: The Impact of Run-time Cohesion on Object Behaviour

Appendix B.1 contains the PCA test results for the JOlden benchmark suite and for the real-world programs Velocity, Xalan and Ant. Values deemed to be significant at the level p ≤ 0.05 are highlighted.
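To make the provenance of these loadings concrete, the sketch below applies a principal component analysis to the three cohesion measures. The per-class values, the variable names and the use of scikit-learn are assumptions made purely for illustration; they are not the data or the tooling of this study.

```python
# Minimal sketch, assuming the three cohesion measures have been collected
# per class into parallel arrays (all values here are invented).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

s_lcom  = np.array([12, 3, 7, 0, 5, 9], dtype=float)   # per-class S_LCOM (illustrative)
r_lcom  = np.array([ 8, 1, 6, 0, 4, 7], dtype=float)   # per-class R_LCOM (illustrative)
rw_lcom = np.array([ 9, 2, 6, 1, 4, 8], dtype=float)   # per-class RW_LCOM (illustrative)

X = np.column_stack([s_lcom, r_lcom, rw_lcom])
X_std = StandardScaler().fit_transform(X)       # PCA on standardised measures

pca = PCA(n_components=2).fit(X_std)
# Loadings of each measure on PC1 and PC2, analogous to the tables in B.1.
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
for name, row in zip(["S_LCOM", "R_LCOM", "RW_LCOM"], loadings):
    print(f"{name:8s} PC1 = {row[0]:+.3f}  PC2 = {row[1]:+.3f}")
```

The sign of a loading is arbitrary in a PCA; what the tables below convey is which component each measure loads most heavily on.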

Appendix B.2 contains the results from the multiple linear regression used to test the hypothesis H_0, that measures of run-time cohesion provide a better indication of NOC than a static measure alone, for the JOlden benchmark programs and the real-world programs Velocity, Xalan and Ant. All significant results are highlighted.
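Each row in the B.2 tables reports the multiple correlation R, the coefficient of determination R² and the overall P > F value for one regression model: NOC regressed on the static measure alone, and then on the static measure together with a run-time measure. The sketch below shows one way such a comparison could be computed; the helper name fit_ols, the sample data and the use of NumPy/SciPy are assumptions for illustration, not the statistical package used in the thesis.

```python
# Minimal sketch of the regression comparison, assuming per-class arrays of
# NOC, S_LCOM and R_LCOM (all values here are invented).
import numpy as np
from scipy import stats

def fit_ols(y, *predictors):
    """Ordinary least squares with an intercept; returns R, R^2 and P > F."""
    X = np.column_stack([np.ones_like(y)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    k = X.shape[1] - 1                      # number of predictors
    dfd = len(y) - k - 1                    # residual degrees of freedom
    f_stat = (r2 / k) / ((1.0 - r2) / dfd)
    return np.sqrt(r2), r2, stats.f.sf(f_stat, k, dfd)

noc    = np.array([3, 10, 4, 1, 7, 6, 2, 9], dtype=float)
s_lcom = np.array([5, 14, 6, 2, 9, 8, 3, 12], dtype=float)
r_lcom = np.array([2, 11, 3, 1, 8, 5, 1, 10], dtype=float)

print("H_S_LCOM         :", fit_ols(noc, s_lcom))          # static measure alone
print("H_S_LCOM,R_LCOM  :", fit_ols(noc, s_lcom, r_lcom))  # static + run-time
```

The comparison of interest is whether adding the run-time measure raises R² appreciably while the model remains significant.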

B.1 PCA Test Results for all programs.

B.1.1 JOlden Benchmark Suite

BH

          PC1    PC2
S_LCOM    0.214  0.754
R_LCOM    0.714  0.214
RW_LCOM   0.721  0.101

MST

          PC1    PC2
S_LCOM    0.251  0.712
R_LCOM    0.714  0.211
RW_LCOM   0.751  0.165

Em3d

          PC1    PC2
S_LCOM    0.135  0.812
R_LCOM    0.841  0.014
RW_LCOM   0.814  0.014

Perimeter

          PC1    PC2
S_LCOM    0.025  0.912
R_LCOM    0.874  0.145
RW_LCOM   0.768  0.121

Health

          PC1    PC2
S_LCOM    0.122  0.789
R_LCOM    0.674  0.145
RW_LCOM   0.714  0.212

Power

          PC1    PC2
S_LCOM    0.142  0.775
R_LCOM    0.654  0.154
RW_LCOM   0.698  0.177




Voronoi

          PC1    PC2
S_LCOM    0.045  0.901
R_LCOM    0.854  0.104
RW_LCOM   0.868  0.021

B.1.2 Real-World Programs, Velocity, Xalan and Ant

Velocity

          PC1    PC2
S_LCOM    0.215  0.614
R_LCOM    0.814  0.124
RW_LCOM   0.751  0.165

Xalan

          PC1    PC2
S_LCOM    0.315  0.554
R_LCOM    0.714  0.116
RW_LCOM   0.641  0.225

Ant

          PC1    PC2
S_LCOM    0.114  0.712
R_LCOM    0.814  0.124
RW_LCOM   0.801  0.101

B.2 Multiple linear regression results for all programs.

B.2.1 JOlden Benchmark Suite

BH

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.444  0.197  0.016
H_S_LCOM,R_LCOM     NOC  0.711  0.507  0.01
H_S_LCOM            NOC  0.105  0.012  0.452
H_S_LCOM,RW_LCOM    NOC  0.631  0.398  0.487

Em3d

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.365  0.134  0.006
H_S_LCOM,R_LCOM     NOC  0.744  0.655  0.005
H_S_LCOM            NOC  0.415  0.173  0.254
H_S_LCOM,RW_LCOM    NOC  0.67   0.451  0.354

Health

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.518  0.268  0.012
H_S_LCOM,R_LCOM     NOC  0.754  0.568  0.009
H_S_LCOM            NOC  0.445  0.198  0.124
H_S_LCOM,RW_LCOM    NOC  0.534  0.285  0.211

MST

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.235  0.056  0.025
H_S_LCOM,R_LCOM     NOC  0.704  0.495  0.012
H_S_LCOM            NOC  0.555  0.308  0.121
H_S_LCOM,RW_LCOM    NOC  0.594  0.355  0.241

Perimeter

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.514  0.263  0.002
H_S_LCOM,R_LCOM     NOC  0.631  0.398  0.001
H_S_LCOM            NOC  0.366  0.135  0.048
H_S_LCOM,RW_LCOM    NOC  0.451  0.203  0.037

Power

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.177  0.035  0.028
H_S_LCOM,R_LCOM     NOC  0.598  0.358  0.035
H_S_LCOM            NOC  0.445  0.198  0.214
H_S_LCOM,RW_LCOM    NOC  0.514  0.264  0.277

Voronoi

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.523  0.273  0.004
H_S_LCOM,R_LCOM     NOC  0.767  0.589  0.002
H_S_LCOM            NOC  0.255  0.064  0.381
H_S_LCOM,RW_LCOM    NOC  0.333  0.129  0.358



B.2.2 Real-World Programs, Velocity, Xalan and Ant

Velocity

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.445  0.198  0.002
H_S_LCOM,R_LCOM     NOC  0.756  0.572  0.001
H_S_LCOM            NOC  0.363  0.132  0.456
H_S_LCOM,RW_LCOM    NOC  0.598  0.358  0.345

Xalan

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.242  0.06   0.044
H_S_LCOM,R_LCOM     NOC  0.621  0.385  0.098
H_S_LCOM            NOC  0.722  0.523  0.287
H_S_LCOM,RW_LCOM    NOC  0.869  0.758  0.205

Ant

Hypothesis          Y    R      R²     P > F
H_S_LCOM            NOC  0.455  0.207  0.0001
H_S_LCOM,R_LCOM     NOC  0.747  0.558  0.001
H_S_LCOM            NOC  0.633  0.401  0.214
H_S_LCOM,RW_LCOM    NOC  0.69   0.747  0.564


Appendix C

Case Study 3: A Study of Run-time Coupling Metrics and Fault Detection

Appendix C.1 contains the results from the regression analysis used to test the hypothesis H_0, that run-time coupling metrics are poor detectors of faults in a program, for the set of real-world programs Velocity, Xalan and Ant.

Appendix C.2 presents the results used to test the hypothesis H_0, that coverage measures are poor detectors of faults in a program, for the real-world programs Velocity, Xalan and Ant. All significant results are highlighted.
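The dependent variable in these tables is the mutant kill ratio M_K. The sketch below shows how M_K might be derived from mutation-testing output and regressed on a single coverage measure; the per-class counts, the variable names and the use of SciPy are assumptions made only to illustrate the shape of the analysis.

```python
# Minimal sketch: compute a per-class mutation score and regress it on
# instruction coverage (all values here are invented).
import numpy as np
from scipy.stats import linregress

killed    = np.array([14, 3, 22, 9, 17], dtype=float)   # mutants killed per class
generated = np.array([20, 5, 30, 15, 20], dtype=float)  # mutants generated per class
ic        = np.array([0.81, 0.42, 0.93, 0.55, 0.88])    # instruction coverage per class

m_k = killed / generated          # mutant kill ratio M_K for each class

fit = linregress(ic, m_k)         # least-squares fit: M_K = slope * Ic + intercept
print(f"R = {fit.rvalue:.3f}, R² = {fit.rvalue ** 2:.3f}, p = {fit.pvalue:.3f}")
```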




C.1 Regression analysis results for real-world programs, Velocity, Xalan and Ant.

C.1.1 For Class Mutants

Velocity

Hypothesis   Y    R      R²     P > F
H_IC_CC      M_K  0.830  0.689  0.002
H_IC_CM      M_K  0.766  0.587  0.001
H_IC_CD      M_K  0.684  0.468  0.006
H_EC_CC      M_K  0.790  0.621  0.007
H_EC_CM      M_K  0.754  0.569  0.411
H_EC_CD      M_K  0.491  0.241  0.456

Xalan

Hypothesis   Y    R      R²     P > F
H_IC_CC      M_K  0.767  0.589  0.003
H_IC_CM      M_K  0.705  0.498  0.002
H_IC_CC      M_K  0.710  0.504  0.001
H_EC_CD      M_K  0.706  0.499  0.046
H_EC_CM      M_K  0.706  0.499  0.254
H_EC_CD      M_K  0.649  0.421  0.680

Ant

Hypothesis   Y    R      R²     P > F
H_IC_CC      M_K  0.773  0.598  0.003
H_IC_CM      M_K  0.708  0.501  0.005
H_IC_CC      M_K  0.829  0.687  0.001
H_EC_CD      M_K  0.749  0.561  0.075
H_EC_CM      M_K  0.749  0.561  0.342
H_EC_CD      M_K  0.463  0.214  0.127



C.1.2 For Traditional Mutants

Velocity

Hypothesis   Y    R      R²     P > F
H_IC_CC      M_K  0.570  0.325  0.048
H_IC_CM      M_K  0.644  0.415  0.054
H_IC_CD      M_K  0.375  0.141  0.065
H_EC_CC      M_K  0.642  0.412  0.115
H_EC_CM      M_K  0.463  0.214  0.256
H_EC_CD      M_K  0.392  0.154  0.658

Xalan

Hypothesis   Y    R      R²     P > F
H_IC_CC      M_K  0.598  0.358  0.091
H_IC_CM      M_K  0.567  0.321  0.078
H_IC_CC      M_K  0.463  0.214  0.154
H_EC_CD      M_K  0.676  0.457  0.254
H_EC_CM      M_K  0.459  0.211  0.254
H_EC_CD      M_K  0.381  0.145  0.351

Ant

Hypothesis   Y    R      R²     P > F
H_IC_CC      M_K  0.740  0.547  0.065
H_IC_CM      M_K  0.649  0.421  0.085
H_IC_CC      M_K  0.463  0.214  0.159
H_EC_CD      M_K  0.577  0.333  0.241
H_EC_CD      M_K  0.606  0.367  0.154
H_EC_CM      M_K  0.536  0.287  0.054
H_EC_CD      M_K  0.459  0.211  0.216

C.2 Regression analysis results for real-world programs, Velocity, Xalan and Ant.

C.2.1 For Class Mutants

Velocity

Hypothesis   Y    R      R²     P > F
H_Ic         M_K  0.326  0.106  0.032

Xalan

Hypothesis   Y    R      R²     P > F
H_Ic         M_K  0.409  0.167  0.004

Ant

Hypothesis   Y    R      R²     P > F
H_Ic         M_K  0.344  0.118  0.005



C.2.2 For Traditional Mutants

Velocity

Hypothesis   Y    R      R²     P > F
H_Ic         M_K  0.888  0.789  0.003

Xalan

Hypothesis   Y    R      R²     P > F
H_Ic         M_K  0.803  0.645  0.024

Ant

Hypothesis   Y    R      R²     P > F
H_Ic         M_K  0.836  0.699  0.019


Appendix D

Mutation operators in µJava

Table D.1 presents a description of the traditional-level mutation operators in µJava. Table D.2 presents a description of the class-level mutation operators in µJava.

Operator  Description
ABS       Absolute value insertion
AOR       Arithmetic operator replacement
LCR       Logical connector replacement
ROR       Relational operator replacement
UOI       Unary operator insertion

Table D.1: Traditional-level mutation operators in µJava




Language Feature              Operator  Description
Inheritance:                  IHD       Hiding variable deletion
                              IHI       Hiding variable insertion
                              IOD       Overriding method deletion
                              IOP       Overriding method calling position change
                              IOR       Overriding method rename
                              ISK       super keyword deletion
                              IPC       Explicit call of a parent's constructor deletion
Polymorphism:                 PNC       new method call with child class type
                              PMD       Instance variable declaration with parent class type
                              PPD       Parameter variable declaration with child class type
                              PRV       Reference assignment with other comparable type
Overloading:                  OMR       Overloading method contents change
                              OMD       Overloading method deletion
                              OAO       Argument order change
                              OAN       Argument number change
Java-specific features:       JTD       this keyword deletion
                              JSC       static modifier change
                              JID       Member variable initialization deletion
                              JDC       Java-supported default constructor creation
Common programming mistakes:  EOA       Reference assignment and content assignment replacement
                              EOC       Reference comparison and content comparison replacement
                              EAM       Accessor method change
                              EMM       Modifier method change

Table D.2: Class-level mutation operators in µJava


