SEKE 2012 Proceedings - Knowledge Systems Institute

of execution-data classification. Third, their study investigated execution data only in the form of counts for methods, statements, and so on, but did not investigate execution data in the form of coverage, which is widely used in practice.

In this paper, we report an empirical study that evaluates the impact of these three factors on the effectiveness of execution-data classification. The aim of our empirical study is to provide evidence for determining the three important factors in practice. For the first factor, we investigate three popular machine-learning algorithms: the Random Forest algorithm (RF), the Sequential Minimal Optimization algorithm (SMO), and the Naive Bayes algorithm (NB). For the second factor, we investigate training sets of different sizes. For the third factor, we investigate five types of execution data: the statement count, the statement coverage, the method count, the method coverage, and the branch coverage. The main contributions of this paper are as follows. First, to our knowledge, our study is the first empirical study on the impact of the three factors in execution-data classification based on machine learning. Second, our study provides guidelines for the application of execution-data classification based on machine learning regarding the choice of machine-learning algorithms, the number of training instances, and the type of execution data.
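To make the five types of execution data concrete, the sketch below derives the coverage forms from the corresponding count forms; the variable names and example vectors are illustrative and are not taken from the paper:

```python
# Sketch: the five execution-data types compared in the study.
# Raw per-statement, per-method, and per-branch execution counts are
# assumed to be available as lists of integers (names are illustrative).

def counts_to_coverage(counts):
    """Coverage is the binarized form of a count vector: 1 if executed at all."""
    return [1 if c > 0 else 0 for c in counts]

statement_count = [3, 0, 7, 1]   # times each statement ran under a test case
method_count    = [5, 0, 2]      # times each method was entered
branch_count    = [2, 0, 0, 4]   # times each branch was taken

statement_coverage = counts_to_coverage(statement_count)  # [1, 0, 1, 1]
method_coverage    = counts_to_coverage(method_count)     # [1, 0, 1]
branch_coverage    = counts_to_coverage(branch_count)     # [1, 0, 0, 1]
```

Note that coverage discards the magnitude information that counts retain, which is exactly the distinction the third factor examines.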

2. Execution-Data Classification Based on Machine Learning

Haran and colleagues [5] proposed to collect execution data by instrumenting instances of a released software system and then classify passing executions and failing executions based on these collected execution data using the Random Forest algorithm, which is an implementation of tree-based classification. This technique can be viewed as an application of a classification algorithm in machine learning to execution-data classification. In the following, we generalize Haran and colleagues' technique to execution-data classification based on machine learning.

For ease of presentation, let us consider the execution data based on the statement count. For a program consisting of n statements (denoted as ST = {st_1, st_2, ..., st_n}), the execution data of a test case (denoted as t) is represented as a series of numbers, each of which is the number of times a statement is executed when using t to test the program. All the test cases (whose execution data are already obtained) are divided into two sets: the training set and the testing set^1. Each test case in the training set is also associated with a label that indicates whether the test case is a passing one or a failing one. For test cases in the testing set, it is unknown whether each such test case is a passing one or a failing one. The problem of execution-data classification is to use the training set to predict whether each test case in the testing set is a passing one or a failing one.

^1 The training set and the testing set are two terms from machine learning. Note that the testing set is not related to software testing.

Table 1. Subjects

Program        Ver.  Test    TrainingSet   LOC(Exe)      Met.  Bra.
print_tokens   7     4,072   41∼203        565(203)      18    35
print_tokens2  10    4,057   41∼203        510(203)      18    70
replace        32    5,542   58∼289        563(289)      21    61
schedule       9     2,627   33∼162        412(162)      18    25
schedule2      10    2,683   29∼144        307(144)      16    31
tcas           41    1,592   14∼67         173(67)       9     16
tot_info       23    1,026   14∼136        406(136)      7     36
Space          38    13,585  1,244∼6,218   9,564(6,218)  136   530
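The classification setup described above can be sketched as follows. A 1-nearest-neighbour rule stands in for the machine-learning model here as a deliberate simplification; the paper itself uses Random Forest, SMO, and Naive Bayes, and all vectors and labels below are invented for illustration:

```python
# Sketch of execution-data classification: each test case is a vector of
# per-statement execution counts; training-set vectors carry a pass/fail
# label, testing-set vectors do not. A 1-nearest-neighbour rule stands in
# for the actual machine-learning model.

def predict(training_set, labels, instance):
    """Label an unlabelled execution-data vector with its nearest neighbour's label."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(training_set)), key=lambda i: dist(training_set[i], instance))
    return labels[best]

# Training set: execution-data vectors with known pass/fail outcomes.
training = [[3, 0, 7, 1], [0, 2, 0, 5], [3, 1, 6, 1]]
labels   = ["pass", "fail", "pass"]

# Testing set: outcome unknown; the model predicts it.
print(predict(training, labels, [3, 0, 6, 1]))  # -> pass
```

Replacing the nearest-neighbour rule with a trained classifier such as Random Forest gives exactly the generalized technique evaluated in this study.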

3. Experimental Study

In this experiment, we plan to evaluate the impact of three factors in execution-data classification so as to answer the following three research questions. RQ1: Which machine-learning algorithm produces the best results in execution-data classification? RQ2: What is the minimal number of training instances needed to produce good enough results in execution-data classification? RQ3: Which type of execution data produces the best results in execution-data classification?

Table 1 presents the details of the eight C subjects used in this experiment, whose source code and test collections are available from the Software-artifact Infrastructure Repository (http://sir.unl.edu/portal/index.php) [3]. Specifically, the table gives the name of each subject, the number of faulty versions, the number of test cases in the test collection, the range of the numbers of training instances used to construct a classification model in our empirical study, the number of lines of code, the number of methods, and the number of branch statements. Moreover, the "Exe" (abbreviation of executable statements) within parentheses represents the number of executable statements, which reflects the real size of a subject. As the faulty programs in Table 1 are all single-fault programs, we constructed a total of 80 multiple-fault programs for the eight subjects by extracting the faults from the single-fault programs of each subject and then seeding more than one fault into each subject.
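The construction of the multiple-fault programs can be sketched as sampling fault subsets per subject; the subset sizes and the sampling policy below are assumptions for illustration, since the paper states only that more than one fault is seeded into each version:

```python
import itertools
import random

# Sketch: building multiple-fault versions by combining faults extracted
# from a subject's single-fault versions. The minimum subset size of 2 and
# the random-sampling policy are illustrative assumptions.

def multi_fault_versions(faults, n_versions, seed=0):
    """Pick n_versions distinct fault subsets, each containing at least two faults."""
    rng = random.Random(seed)
    candidates = [c for k in range(2, len(faults) + 1)
                  for c in itertools.combinations(faults, k)]
    return rng.sample(candidates, min(n_versions, len(candidates)))

versions = multi_fault_versions(["f1", "f2", "f3", "f4"], n_versions=10)
assert all(len(v) >= 2 for v in versions)  # every version is multiple-fault
```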

3.1. Independent and Dependent Variables

The independent variables come from the three research questions and are defined as follows.

The first independent variable, Algorithm, refers to the machine-learning algorithm that is used to build a classification model. Many algorithms have been proposed in the machine-learning literature. Here we investigate the following three representative algorithms: Random Forest (RF) [2], Naive Bayes (NB) [8], and Sequential Minimal Optimization (SMO) [12], because these algorithms have been found to be effective in the machine-learning literature and have been widely used. Moreover, these algorithms
