Full Theoretical Runtime Analysis of Alternating ... - IEEE Xplore

More documents

Recommendations

Info

necessarily true for high values of the size that have not beenempirically tested. Other branches might be more difficult.This is one reason why theoretical analyses are necessary.In this paper, we prove that the expected runtime of AVMon all the branches of TC is O((log n) 2 ). That is necessaryand sufficient to prove that AVM has a better runtime onTC compared to the other search algorithms we previouslyanalysed. We also carried out an empirical study to integratethe theoretical analysis.The theorems in this paper are also useful for future analyses.In fact, to state that a search algorithm has worse runtimecompared to AVM, it will be just sufficient to provethat its lower bound is higher than Ω((log n) 2 ) on the coverageof at least one branch of TC.The paper is organised as follows. Section 2 gives backgroundinformation about runtime analysis. Section 3 describesin detail the TC problem, whereas Section 4 describesthe AVM search algorithm. Theoretical analyses arepresented in Section 5, whereas the empirical study is discussedin Section 6. Finally, Section 7 concludes the paper.2 Runtime AnalysisTo make the notion of runtime precise, it is necessary todefine time and size. We defer the discussion on how todefine problem instance size for software testing to the nextsection, and define time first.Time can be measured as the number of basic operationsin the search heuristic. Usually, the most time-consumingoperation in an iteration of a search algorithm is the evaluationof the cost function. We therefore adopt the black-boxscenario [4], in which time is measured as the number oftimes the algorithm evaluates the cost function.Definition 1 (Runtime [3, 8]). Given a class F of cost functionsf i : S i → R, theruntime T A,F (n) of a search algorithmA is defined asT A,F (n) := max {T A,f | f ∈Fwith l(f) =n} ,where l(f) is the problem instance size, and T A,f is thenumber of times algorithm A evaluates the cost function funtil the optimal value of f is evaluated for the first time.A typical search algorithm A is randomised. Hence, thecorresponding runtime T A,F (n) will be a random variable.The runtime analysis will therefore seek to estimate propertiesof the distribution of random variable T A,F (n), in particularthe expected runtime E [T A,F (n)] and the successprobability Pr [T A,F (n) ≤ t(n)] for a given time boundt(n). More details can be found in [1].The last decades of research in the area show that it isimportant to apply appropriate mathematical techniques toget good results [22]. Initial studies of exact Markov chainmodels of search heuristics were not fruitful, except for thethe simplest cases.A more successful and particularly versatile techniquehas been so-called drift analysis [8, 19], where one introducesa potential function which measures the distance fromany search point to the global optimum. By estimating theexpected one-step drift towards the optimum with respectto the potential function, one can deduce expected runtimeand success probability.In addition to drift analysis, the wide range of techniquesused in the study of randomised algorithms [17], in particularChernoff bounds, have proved useful also for evolutionaryalgorithms.3 Triangle Classification ProblemTC is the most famous problem in software testing. Itopens the classic 1979 book of Myers [18], and has beenused and studied since early 70s. Nowadays, TC is stillwidely used in many publications (e.g., [14, 23, 15, 11, 16,24]).We use the implementation for the TC problem that waspublished in the survey by McMinn [15] (see Figure 1).Some slight modifications to the program have been introducedfor clarity.A solution to the testing problem is represented as a vectorI =(x, y, z) of three integer variables. We call (a, b, c)the permutation in ascending order of I. For example, ifI =(3, 5, 1), then (a, b, c) =(1, 3, 5).There is the problem to define what is the size of an instancefor TC. In fact, the goal of runtime analysis is notabout calculating the exact number of steps required forfinding a solution. On the other hand, the runtime complexityof an algorithm gives us insight of scalability of thesearch algorithm. The problem is that TC takes as input afixed number of variables, and the structure of its sourcecode does not change. Hence, what is the size in TC? Wechose to consider the range for the input variables for thesize of TC. In fact, it is a common practise in software testingto put constraints on the values of the input variables toreduce the search effort. For example, if a function takesas input 32 bit integers, instead of doing a search throughover four billion values, a range like {0,...,1000} mightbe considered for speeding up the search.Limits on the input variables are always present in theform of bit representation size. For example, the same pieceof code might be either run on machine that has 8 bit integersor on another that uses 32 bits. What will happen ifwe want to do a search for test data on the same code thatruns on a 64 bit machine? Therefore, using the range of theinput variables as the size of the problem seems an appropriatechoice.114
In our analyses, the size n of the problem defines therange R = {−n/2 +1,...,n/2} in which the variablesin I can be chosen (i.e., x, y, z ∈ R). Hence, the searchspace S is defined as S = {(x, y, z)|x, y, z ∈ R}, and it iscomposed of n 3 elements. Without loss of generality n iseven. To obtain full coverage, it is necessary that n ≥ 8,otherwise the branch regarding the classification as scalenewill never be covered. Note that one can consider differenttypes of R (e.g., R ′ = {0,...,n}), and each type may leadto different behaviours of the search algorithms. We basedour choice on what is commonly used in literature. For simplicityand without loss of generality, search algorithms areallowed to generate solutions outside S. In fact, R is mainlyused when random solutions need to be initialised.The employed fitness function f is the commonly usedapproximation level A plus the branch distance δ [15]. Fora target branch z, we have that the fitness function f z is:f z (I) =A z (I)+ω(δ w (I)) .Note that the branch distance δ is calculated on the nodeof diversion, i.e. the last node in which a critical decision(not taking the branch w) is made that makes the executionof z not possible. For example, branch z could be nestedto a node N (in the control flow graph) in which branch wrepresents the then branch. If the execution flow reachesN but then the else branch is taken, then N is the node ofdiversion for z. The search hence gets guided by δ w to enterin the nested branches.Let be {N 0 ,...,N k } the sequence of diversion nodesfor the target z, with N i nested to all N j>i . Let be D i theset of inputs for which the computation diverges at node N iand none of the nested nodes N j0, the fitness functions for the 12branches (i.e., f i is the fitness function for branch ID i )areshown in Figure 2. Note that the branch distance dependson the status of the computation (e.g., the values of the localvariables) when the predicates are evaluated. For simplicity,in an equivalent way we show the fitness functions basedonly on the inputs I.1: int tri_type(int x, int y, int z) {2: int type;3: int a=x, b=y, c=z;4: if (x > y) { /* ID_0 */5: int t = a; a = b; b = t;6: } else { /* ID_1 */}7: if (a > z) { /* ID_2 */8: int t = a; a = c; c = t;9: } else { /* ID_3 */}10: if (b > c) { /* ID_4 */11: int t = b; b = c; c = t;12: } else { /* ID_5 */}13: if (a + b
Page 1: 2009 1st International Symposium on
Page 5 and 6: 4 Alternating Variable MethodAVM is
Page 7 and 8: point the input variable representi
Page 9: anch ID 8 seems the most difficult

Full Theoretical Runtime Analysis of Alternating ... - IEEE Xplore

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?