
The Siemens set contains seven C programs, all of which can be downloaded from the SIR repository [17]. Each of the programs has a correct version, a number of faulty versions seeded with a single fault, and a corresponding test suite. We compare the output of the correct version with that of the corresponding seeded version to determine the failures. Failures determined in this manner are called output-based failures. According to the execution profile, if a faulty element is executed during a test case but no output-based failure is detected, we categorize the test case as coincidentally correct. Our study takes into account only 115 seeded versions and excludes the other versions because they contain code-missing errors or their faulty statements are not executable. Figure 1 summarizes the result. It illustrates that the exhibited level of coincidental correctness is significant. The horizontal axis represents the percentage of coincidentally correct tests (each bar corresponds to a range of size 10%). The vertical axis represents the percentage of seeded versions that exhibit a given range. As can be seen, coincidental correctness is common in software testing.
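As an illustration of this labeling step, the following is a minimal Python sketch, not the authors' actual tooling; the helper classify_test and the toy inputs are assumptions made purely for exposition.

```python
# Hypothetical sketch: classify one test case given the outputs of the correct
# and the seeded version plus the set of statements the test executed.

def classify_test(correct_output, seeded_output, executed_stmts, faulty_stmts):
    """Return 'failed', 'coincidentally correct', or 'passed'."""
    if seeded_output != correct_output:
        return "failed"                     # output-based failure
    if executed_stmts & faulty_stmts:
        return "coincidentally correct"     # fault executed, yet no failure observed
    return "passed"

# Example: the seeded fault (statement 42) runs, but the outputs still match.
print(classify_test("42\n", "42\n", {40, 41, 42}, {42}))  # coincidentally correct
```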

2.2 Safety Reducing Effect

Denmat et al. [15] pointed out the limitation of CBFL and argued that the effectiveness of this technique largely depends on the hypothesis that executing the faulty statements leads to a failure most of the time.

In the following, we use Ochiai as an example to show that coincidental correctness is a potential safety-reducing factor. As shown by Naish et al. [16], the suspiciousness metric of Ochiai is defined as:

M(e) = \frac{a_{ef}}{\sqrt{(a_{ef} + a_{nf})(a_{ef} + a_{ep})}}

where
e = the faulty program element
a_ef = number of failed runs that execute e
a_nf = number of failed runs that do not execute e
a_ep = number of passed runs that execute e
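For concreteness, the metric can be computed as in the short Python sketch below; the counts used in the example are hypothetical and chosen only for illustration.

```python
from math import sqrt

def ochiai(a_ef, a_nf, a_ep):
    """Ochiai suspiciousness of a program element e from its execution counts."""
    denom = sqrt((a_ef + a_nf) * (a_ef + a_ep))
    return a_ef / denom if denom > 0 else 0.0

# Hypothetical counts: 8 of 10 failed runs and 20 passed runs execute e.
print(round(ochiai(a_ef=8, a_nf=2, a_ep=20), 3))  # 0.478
```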

Assume that there are k tests which execute e but do not raise a failure. Two strategies can be applied to these tests to improve the accuracy of the CBFL technique.

Figure 1. Frequency of Coincidental Correctness (horizontal axis: percentage of coincidentally correct tests, |Tcc|/|T|, in bins of 10%; vertical axis: percentage of faulty versions).

The first strategy is to remove these tests from the test suite, that is, to subtract k from a_ep. Consequently, the suspiciousness metric becomes:

M'(e) = \frac{a_{ef}}{\sqrt{(a_{ef} + a_{nf})(a_{ef} + a_{ep} - k)}}

It is easy to see that M(e) ≤ M'(e). To verify: M'(e) ≥ 0, M(e) ≥ 0, and M'(e)/M(e) = \sqrt{(a_{ef} + a_{ep})/(a_{ef} + a_{ep} - k)} ≥ 1, hence M(e) ≤ M'(e).

The second strategy is to relabel those tests from “passed” to “failed”, i.e., to subtract k from a_ep and add it to a_ef. The suspiciousness metric then becomes:

M''(e) = \frac{a_{ef} + k}{\sqrt{(a_{ef} + a_{nf} + k)(a_{ef} + a_{ep})}}

It is easy to see that M(e) ≤ M''(e). To verify: M''(e) ≥ 0, M(e) ≥ 0, and M''^2(e) - M^2(e) ≥ 0, hence M(e) ≤ M''(e). It can be seen that ignoring coincidentally correct test cases leads to underestimating the suspiciousness of the faulty element.
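A small numeric check, using the same illustrative counts as before (not data from the paper), shows that both strategies can only raise the Ochiai score of the faulty element:

```python
from math import sqrt

def ochiai(a_ef, a_nf, a_ep):
    denom = sqrt((a_ef + a_nf) * (a_ef + a_ep))
    return a_ef / denom if denom > 0 else 0.0

a_ef, a_nf, a_ep, k = 8, 2, 20, 5          # k identified coincidentally correct tests
m  = ochiai(a_ef, a_nf, a_ep)              # original metric M(e)
m1 = ochiai(a_ef, a_nf, a_ep - k)          # strategy 1: remove the k tests
m2 = ochiai(a_ef + k, a_nf, a_ep - k)      # strategy 2: relabel them as failed
print(round(m, 3), round(m1, 3), round(m2, 3))  # 0.478 0.528 0.634
```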

3 METHODOLOGY

3.1 General Process

Some symbols we use throughout the rest of the paper are explained as follows:
T: the test suite used for a given program.
Tp: the set of passed test cases.
Tf: the set of failed test cases.
Tcc: the set of coincidentally correct test cases.
Ticc: the set of identified coincidentally correct tests.

Given a test suite T, which consists of Tp and Tf, the goal is to identify Tcc from Tp. The result is Ticc, and each element of Ticc is a potential member of Tcc.

In this paper, we propose a clustering-based strategy to obtain Ticc. The goal of cluster analysis is to partition objects into clusters such that objects with similar attributes are placed in the same cluster, while objects with dissimilar attributes are placed in different clusters [7]. Execution profiles are therefore used as the features fed to a clustering algorithm. Specifically, test cases which execute the faulty elements and have execution profiles similar to those of the failed test cases are likely to be clustered together. Therefore, if a cluster consists of both failed test cases and passed test cases, the passed test cases within this cluster are very likely to be coincidentally correct. Note that our approach is based on the single-fault assumption. Multi-fault programs are not within the scope of this paper, but will be explored in the near future.
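To make this concrete, here is a minimal sketch of the identification step, assuming binary statement-coverage profiles and using K-means purely as a stand-in clustering algorithm (this section does not prescribe a particular algorithm); the function name identify_ticc and the toy data are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

def identify_ticc(profiles, labels, n_clusters=2):
    """profiles: one 0/1 coverage vector per test; labels: 'passed'/'failed'.
    Returns indices of passed tests that share a cluster with a failed test."""
    X = np.asarray(profiles)
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    failing = {c for c, lab in zip(clusters, labels) if lab == "failed"}
    return [i for i, (c, lab) in enumerate(zip(clusters, labels))
            if lab == "passed" and c in failing]

# Toy example: tests 0-1 fail; test 2 passes but covers nearly the same statements.
profiles = [[1, 1, 0, 1], [1, 1, 0, 1], [1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 1, 1]]
labels   = ["failed", "failed", "passed", "passed", "passed"]
print(identify_ticc(profiles, labels))  # expected: [2]
```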

For a developer to find the fault with the help of automatic fault-localization techniques, he/she can use the following procedure to take advantage of our strategy and improve the effectiveness of the diagnosis. First, a set of test cases is executed on the given program. As a result, each test case is labeled "passed" or "failed" according to its output result. Execution profiles which reveal the coverage information are
