12.07.2015 Views

Statistical Testing Using Automated Search - Crest


POULDING AND CLARK: EFFICIENT SOFTWARE VERIFICATION: STATISTICAL TESTING USING AUTOMATED SEARCH 775

brought about by the diversity constraint is greatest at smaller test sizes, but the effect size is generally small.

The additional dotted line in Fig. 6c plots the mutation scores for uniform random testing taken from Experiment B. It can be seen that the improved efficiency enabled by the diversity constraint is insufficient to close the gap with uniform random testing at the largest test sizes.

These results provide partial evidence for Hypothesis 5 that using a diversity constraint during the search improves the efficiency of statistical testing. However, the nature of the improvement depends on both the SUT and the test size.

8 CONCLUSIONS AND FUTURE WORK

The experimental results support a number of the hypotheses proposed in Section 4. We have shown that using automated search to derive near-optimal probability distributions for statistical testing is not only viable, but practical for different types of SUT (Hypothesis 1). We have demonstrated that statistical testing using automated search continues to show superior fault-detecting ability compared to uniform random and deterministic structural testing (Hypotheses 2 and 3). There was also some evidence that searching for distributions with the highest probability lower bounds resulted in the most efficient test sets (Hypothesis 4).

However, uniform random testing was more efficient than statistical testing at large test sizes for one of the SUTs, and distributions with near-optimal lower bounds showed diminished fault-detecting ability. We hypothesize that both effects result from a lack of diversity in the generated test data.
Although the addition of a diversity constraint as a search objective did improve test set efficiency (Hypothesis 5), the results indicate that the metric used is probably too crude to retain all of the important forms of diversity in the generated data.

Further work is therefore indicated on different diversity metrics, with the goal of identifying a measure that correlates well with the ability of the test set to detect faults. In addition, we plan to expand the notion of diversity to nondiscrete input domains.

One of the challenges of researching software engineering techniques is the vast range of software to which a technique may be applied. We have experimented on three programs that we feel have contrasting characteristics. However, in the absence of a detailed understanding of the software characteristics that affect our proposed technique, it is inappropriate to extrapolate more widely. In particular, the cardinality of the input domain for the SUTs was relatively small as a result of using integer data types. It will be important to demonstrate that the technique remains effective and practical for larger input domains.

This suggests further work to investigate the software characteristics that affect the efficacy of our technique. As for other SBSE applications, a barrier to take up by software engineers is the lack of guidance on the types of problems for which use of search is effective, and how to configure the search algorithm based on the characteristics of the problem. For the experimental work in this paper, we took the conservative approach of using a simple search algorithm and used similar algorithm parameters across all three SUTs.
However, greater scalability would be possible if more effective and faster algorithms are identified (particularly those that make full use of available computing resources through parallel processing) and if the optimal algorithm parameters could be set a priori based on relevant characteristics of the SUT.

An alternative to tuning the algorithm in advance based on the characteristics of the SUT would be to adapt parameter values during the algorithm run itself. Eiben et al. refer to this approach as parameter control in their recent survey of both parameter tuning and parameter control [42]. Given the vast range of software that may be tested (noted above) and the difficulty in identifying SUT characteristics that affect algorithm performance, parameter control may be a viable technique if it avoids the need to identify such characteristics.

Currently, the fitness metric for the coverage constraint uses a count of the elements exercised by a sample of input vectors. However, if an element is exercised by no input vectors in the sample, little guidance is provided to the search. In this case, the incorporation of metrics used by other search-based test data generation techniques (such as the approach level and branch distance discussed in Section 3) into the fitness function might also enable the technique to scale to larger and more complicated SUTs.

Another path for improvement in algorithm performance is the use of optimization methods that efficiently accommodate, or even make use of, the noise in the fitness function.
(We speculate that some noise in the fitness may actually be beneficial to the hill climbing algorithm used for this paper by occasionally permitting moves to less fit distributions in order to escape nonglobal local optima.) A number of methods are suggested by existing work on noisy fitness functions for simulated annealing and evolutionary algorithms [43], [44], [45], [46].

Experiment D used two competing objectives, the coverage and diversity constraints, to find the most efficient probability distribution. It might therefore be constructive to apply optimization techniques that are explicitly multiobjective, rather than using a single fitness function that combines the constraints, as we did in this paper. This use of efficient multi-objective optimization algorithms is an approach taken by many recent SBSE applications [7], [10], [13], [47].

The wider applicability of the search technique proposed in this paper requires an extension of the representation to other input data types. The current use of real numbers in the internal representation of probability distributions, and of binning to control the size of the representation, promises a relatively straightforward extension to floating point arguments. However, the incorporation of nonscalar data types, such as strings, objects, and pointers, will be a significant challenge, and is a current topic of research for other search-based test data generation techniques (e.g., [48]).

Finally, there is scope to search for distributions satisfying broader adequacy criteria. The coverage elements could be specified by other testing objectives, such as coverage of the software’s functionality. The criterion on the coverage probability distribution could be expressed in terms of properties other than the lower bound, such as an increased probability of exercising parts of the software that have previously shown a propensity for faults, or that are particularly critical to the correct or safe operation of the program.
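
As a rough illustration of the coverage criterion discussed in this section (a count of the elements exercised by a sample of input vectors, summarized as a probability lower bound), the following Python sketch estimates per-element coverage probabilities for a binned input distribution. It is not the implementation used in the paper: the function `toy_sut`, the branch labels, the bin layout, and the weights are all hypothetical and chosen only to keep the example small.

```python
import random
from collections import Counter

# Hypothetical coverage elements (branch outcomes) of the toy SUT below.
ELEMENTS = ["b1_true", "b1_false", "b2_true", "b2_false"]

def toy_sut(x, y):
    """Hypothetical instrumented SUT: returns the branch labels exercised."""
    covered = set()
    covered.add("b1_true" if x > y else "b1_false")
    # x == y is rarely hit when inputs are drawn uniformly from a wide range.
    covered.add("b2_true" if x == y else "b2_false")
    return covered

def sample_binned(bins, weights, rng):
    """Choose a bin according to its weight, then an integer uniformly in it."""
    lo, hi = rng.choices(bins, weights=weights, k=1)[0]
    return rng.randint(lo, hi)

def estimate_coverage(bins, weights, n_samples, rng):
    """Estimate each element's coverage probability from a sample of inputs.

    Returns the per-element estimates and their minimum (the lower bound).
    The estimate is noisy because it depends on the random sample.
    """
    counts = Counter()
    for _ in range(n_samples):
        x = sample_binned(bins, weights, rng)
        y = sample_binned(bins, weights, rng)
        counts.update(toy_sut(x, y))
    probs = {e: counts[e] / n_samples for e in ELEMENTS}
    return probs, min(probs.values())

rng = random.Random(0)
bins = [(0, 9), (10, 99)]  # assumed binning of the integer domain 0..99
# Weights proportional to bin width give a uniform distribution over 0..99.
_, lb_uniform = estimate_coverage(bins, [10, 90], 20000, rng)
# Concentrating weight on the narrow bin makes x == y more likely.
_, lb_skewed = estimate_coverage(bins, [90, 10], 20000, rng)
print("uniform lower bound:", lb_uniform)
print("skewed lower bound: ", lb_skewed)
```

In this toy setting the rarest element is `b2_true`, so shifting weight toward the narrow bin raises the estimated lower bound. With a small sample, an element may not be exercised at all, in which case its estimate is zero however close the inputs came to reaching it; this is the lack of guidance that the approach level and branch distance metrics mentioned above are intended to address.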
