
Lecture Notes in Computer Science 3472



14 Tools for Test Case Generation

These theoretical results coincide with the results obtained with the benchmarking experiment discussed below.

Regarding the other theoretical aspects, we have tried to give as much information as possible in the tool descriptions. Not all facts (especially complexity issues) are known for every tool, and some aspects are still active research topics. Examples of the latter are compositionality, complex data and real-time issues.

14.3.2 Benchmarking

The benchmarking approach takes the view that, as the proof of the pudding is in the eating, the comparison (testing) of test tools lies in seeing how successful they are at finding errors. To make comparison easier, a controlled experiment can be set up. In such an experiment, a specification (formal or informal) is provided, together with a number of implementations. Some of the implementations are correct, others contain errors. Each of the tools is then used to try to identify the erroneous implementations. Ideally, the persons doing the testing do not know which implementations are erroneous, nor do they know details about the errors themselves. Also, the experience that they have with the tools should be comparable (ideally, they should all be expert users, to give each tool the best chance of succeeding).
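
As an illustration only, the outcome of such a controlled experiment could be tallied as sketched below. This is a minimal sketch, not the procedure used in any of the cited experiments: the names `ToolScore` and `score_tool`, the boolean verdict encoding, and the choice of metrics (detection rate plus a count of false alarms) are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ToolScore:
    """Hypothetical summary of one tool's results in a controlled experiment."""
    detected: int      # erroneous implementations the tool flagged
    missed: int        # erroneous implementations the tool did not flag
    false_alarms: int  # correct implementations the tool flagged anyway

    @property
    def detection_rate(self) -> float:
        """Fraction of erroneous implementations that were identified."""
        total = self.detected + self.missed
        return self.detected / total if total else 0.0


def score_tool(ground_truth: Mapping[str, bool],
               verdicts: Mapping[str, bool]) -> ToolScore:
    """Compare a tool's verdicts with the experimenters' ground truth.

    ground_truth[name] is True if the implementation contains an error.
    verdicts[name] is True if the tool reported a failure for it; an
    implementation without a verdict is treated as "not flagged"
    (an assumption of this sketch).
    """
    detected = missed = false_alarms = 0
    for name, is_erroneous in ground_truth.items():
        flagged = verdicts.get(name, False)
        if is_erroneous and flagged:
            detected += 1
        elif is_erroneous:
            missed += 1
        elif flagged:
            false_alarms += 1
    return ToolScore(detected, missed, false_alarms)
```

Under these assumptions, the ground truth stays with the organisers of the experiment, and the testers only hand in the verdicts; the same ground truth is then used to score every tool so that the resulting figures are comparable.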

In the literature we have found a few references to benchmarking or similar experiments.

Other disciplines, for example model checking, have collected over time a common body of cases or examples, out of which most tool authors pick their examples when they publish results of their new or updated tools, such that their results can be compared to those of others.

In (model-based) testing this is much less the case, in our experience. Papers about model-based testing tools often do refer to case studies done with the tools, but usually these are one-off case studies specific to a single tool. Moreover, many of the experiments done for those cases cannot be considered controlled in the sense that one knows in advance which SUTs are erroneous. This does make those experiments more realistic – which is no coincidence since often the experiments are done in collaboration with industry – but at the same time it makes it hard to compare the results, at least with respect to the error-detecting power of the tools.

Of course, there are exceptions, where controlled model-based testing experiments are conducted and the results are published. In some cases those experiments are linked with a particular application domain. For example, Lutess has participated in a Feature Interaction contest [dBZ99].

Independent benchmarking experiments have also been set up, like the “Conference Protocol Benchmarking Experiment” [BFdV+99, HFT00, dBRS+00] that we will discuss in more detail below. The implementations that are tested in such an experiment are usually much simpler than those that one has to deal with in day-to-day real-life testing – if only to limit the resources (e.g. time) needed to conduct or participate in the experiment. There is not much one can do about that.
