
…branches is more applicable than the approaches with either statements or methods.

4. Related Work

Because the testing and analysis activities on released software systems differ from those performed in-house, researchers have proposed techniques to facilitate the testing and analysis of released software systems. Pavlopoulou and Young [11] proposed a residual test coverage monitoring tool that selectively instruments a Java program under test so that the performance overhead remains acceptable. When a software system is distributed across the sites of many remote users, a monitoring task can be divided into several subtasks; each subtask is assigned to a different instance of the software system and can be implemented with minimal instrumentation. This approach, called the Gamma system, is implemented by Orso and colleagues [1, 10]. The preceding research aims to propose different instrumentation strategies for monitoring a released software system. In contrast, our work uses the execution data collected by these instrumentation techniques to distinguish failing executions from passing ones.
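To make the residual-monitoring idea concrete, the following is a minimal Python sketch in the spirit of [11]; the names and structure are our own illustration, not the tool's actual design. Each probe records its branch once and then retires itself, so instrumentation overhead vanishes for code that has already been covered:

```python
# Illustrative sketch (not the actual tool from [11]): probes that retire
# themselves after firing once, so already-covered code runs probe-free.

class ResidualCoverageMonitor:
    def __init__(self, branch_ids):
        self.residual = set(branch_ids)  # branches still awaiting coverage
        self.covered = set()

    def probe(self, branch_id):
        """Called from an instrumented branch; a cheap no-op once retired."""
        if branch_id in self.residual:
            self.residual.discard(branch_id)
            self.covered.add(branch_id)

monitor = ResidualCoverageMonitor(range(4))
for b in (0, 2, 2, 0):          # simulated branch hits during execution
    monitor.probe(b)
print(sorted(monitor.covered), sorted(monitor.residual))  # [0, 2] [1, 3]
```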

The execution data collected from the field can be analyzed and used for software maintenance and evolution. Liblit and colleagues [6, 7] proposed gathering executions by using a Bernoulli process and then isolating bugs based on the sampled executions. Orso and colleagues [9] investigated the use of execution data from the field to support and improve in-house impact analysis and regression testing. Although both this research and ours use execution data, they have different aims: these works aim at fault localization, impact analysis, or testing, whereas our work aims at classifying executions.
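As a rough illustration of the Bernoulli sampling idea (our own sketch, not the instrumentation of [6, 7]; the sampling rate and names below are assumptions), each probe fires independently with a small probability p, so any single deployed instance pays little overhead while the aggregate over many users still exercises most probes:

```python
import random

SAMPLING_RATE = 0.01  # p: assumed per-probe firing probability (illustrative)

def sampled_probe(site_id, counts):
    """Bernoulli-sampled probe: record site_id with probability p."""
    if random.random() < SAMPLING_RATE:
        counts[site_id] = counts.get(site_id, 0) + 1

counts = {}
for _ in range(10_000):          # many executions across deployed instances
    sampled_probe("branch_42", counts)
print(counts)                    # roughly {'branch_42': 100}
```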

5. Conclusion

In this paper, we performed an empirical study on execution-data classification by varying three factors: the machine learning algorithm, the number of training instances, and the type of execution data. Among the three machine learning algorithms, the Random Forest algorithm makes the classification approach produce significantly better results than the Naive Bayes algorithm, and either of these algorithms makes the classification approach produce significantly better results than the Sequential Minimal Optimization algorithm. As we increased the number of training instances, the classification model usually produced better classification results; however, the execution-data classification approach still correctly classifies most execution data when fed with a small number of training instances (i.e., 1/5 of the number of executable statements contained in a program). Moreover, when the type of execution data is branch coverage, the corresponding classification approach produces better results than the approaches with the other types of execution data.
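To illustrate the shape of this experimental setup, the following is a hedged sketch under our own assumptions: the paper does not prescribe a library, the feature matrix below is synthetic stand-in data rather than real coverage profiles, and BernoulliNB and SVC are our choices of Naive Bayes and SMO-trained SVM implementations. Each execution is represented as a binary coverage vector labeled passing or failing, and the training set mirrors the 1/5 proportion noted above (applied here to the feature count):

```python
# Hypothetical sketch of the study's classification setup using scikit-learn;
# all data here is synthetic and the library choice is ours, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB
from sklearn.svm import SVC  # trained internally with an SMO-style algorithm

rng = np.random.default_rng(0)
n_features = 200                 # e.g. one bit per monitored branch
n_train = n_features // 5        # analogue of the "1/5" training-set size

X_train = rng.integers(0, 2, size=(n_train, n_features))
y_train = rng.integers(0, 2, size=n_train)    # 1 = failing, 0 = passing
X_test = rng.integers(0, 2, size=(30, n_features))

for clf in (RandomForestClassifier(random_state=0), BernoulliNB(), SVC()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, int(clf.predict(X_test).sum()), "predicted failing")
```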

Acknowledgements<br />

This work is supported by the National Basic Research Program of China under Grant No. 2009CB320703, the Science Fund for Creative Research Groups of China under Grant No. 60821003, the National Natural Science Foundation of China under Grant No. 91118004, and the National Natural Science Foundation of China under Grant No. 60803012.

References<br />

[1] J. Bowring, A. Orso, and M. J. Harrold. Monitoring deployed software using software tomography. In Proc. ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, pages 2–8, 2002.
[2] L. Breiman. Random forests. Machine Learning, 45(1):5–32, Oct. 2001.
[3] H. Do, S. Elbaum, and G. Rothermel. Supporting controlled experimentation with testing techniques: An infrastructure and its potential impact. Empirical Software Engineering, 10(4):405–435, 2005.
[4] T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27:861–874, 2006.
[5] M. Haran, A. Karr, A. Orso, A. Porter, and A. Sanil. Applying classification techniques to remotely-collected program execution data. In Proc. 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 146–155, 2005.
[6] B. Liblit, A. Aiken, A. X. Zheng, and M. I. Jordan. Bug isolation via remote program sampling. In Proc. ACM SIGPLAN 2003 Conference on Programming Language Design and Implementation, pages 141–154, 2003.
[7] B. Liblit, M. Naik, A. X. Zheng, A. Aiken, and M. I. Jordan. Scalable statistical bug isolation. In Proc. 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 15–26, June 2005.
[8] A. McCallum and K. Nigam. A comparison of event models for naive Bayes text classification. In Proc. AAAI-98 Workshop on Learning for Text Categorization, pages 41–48, 1998.
[9] A. Orso, T. Apiwattanapong, and M. J. Harrold. Leveraging field data for impact analysis and regression testing. In Proc. 10th ACM SIGSOFT Symposium on the Foundations of Software Engineering, pages 128–137, 2003.
[10] A. Orso, D. Liang, M. J. Harrold, and R. Lipton. Gamma system: Continuous evolution of software after deployment. In Proc. International Symposium on Software Testing and Analysis, pages 65–69, July 2002.
[11] C. Pavlopoulou and M. Young. Residual test coverage monitoring. In Proc. 21st International Conference on Software Engineering, pages 277–284, May 1999.
[12] J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods: Support Vector Learning, pages 185–208, 1999.

