PROGRAM STRUCTURE TREES - Software Systems Lab


3.1 Regions in structured programs

Obtaining a region tree of a structured, annotated program is trivial. Every loop and every condition is a region, and the nesting of the region tree is equivalent to the nesting of the loops and conditions in the original program. Several algorithms therefore take this approach without explicitly describing it as a region detection algorithm. However, as soon as more complicated constructs appear, such as loops with multiple exits (breaks), exceptions or even gotos, this approach no longer works. Unfortunately, many programming languages allow at least some of these constructs, so a more general approach is necessary.

3.2 The Program Structure Tree

The algorithm published in [4] can nowadays be seen as the classical approach to calculating region trees, or, as they are called in this paper, program structure trees (PST), for a general, possibly irreducible, CFG. One reason for this is that the algorithm is fast and straightforward. Based on a simple data structure called a bracket list, the CFG is analysed without requiring any previously computed information. The runtime is in O(V + E). The algorithm can detect simple regions, but not refined regions. A possible way to detect refined regions is to insert merge basic blocks beforehand. However, this has two drawbacks. First, deciding where to insert merge blocks is not trivial and probably requires some analysis. Second, modifying the program often invalidates existing analyses such as dominance information, which is undesirable in a production compiler.

3.3 The Refined Program Structure Tree

In [5] the PST approach was extended and the so-called Refined Program Structure Tree (RPST) was introduced. The RPST was used to model workflows in business processes, but it can also be applied to control flow graphs.
One of the main advantages of the RPST is its refined definition of a region, which can represent not only simple regions with just one entry and one exit edge, but also regions that have several entry and exit edges that could be joined into a single entry or exit edge. This refinement permits the detection of regions that cannot be handled in a plain PST. To calculate the RPST, a preliminary analysis is required that builds the triconnected components of the CFG, as described in [3] and corrected and improved in [2]. If this analysis is not yet available, the effort required to implement the RPST construction algorithm seems to be high, especially as the triconnected components algorithm is not trivial. Another drawback of this algorithm and of the refined region definition is that a region cannot be described in constant memory, but has to be defined by all of its incoming and outgoing edges. To know whether a basic block is part of a region, an auxiliary data structure has to store a mapping between basic blocks and regions; otherwise a graph walk is required.
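The memory and lookup cost described above can be made concrete with a small sketch. It is only an illustration under stated assumptions, not the data structure used in [5]: the names `EdgeRegion`, `in_region` and `innermost` are hypothetical, and the tiny CFG (blocks `a` to `e`) is invented for the example. A region is described by its sets of entry and exit edges, and an auxiliary mapping from each basic block to its innermost region answers membership queries by walking up the region tree.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional, Tuple

Edge = Tuple[str, str]  # (source block, target block)


@dataclass(eq=False)  # regions are compared by identity
class EdgeRegion:
    """Hypothetical region record described by its entry and exit edges."""
    name: str
    entry_edges: FrozenSet[Edge]
    exit_edges: FrozenSet[Edge]
    parent: Optional["EdgeRegion"] = None

    def description_size(self) -> int:
        # The description grows with the number of boundary edges,
        # so it is not constant in general.
        return len(self.entry_edges) + len(self.exit_edges)


def in_region(innermost: Dict[str, EdgeRegion], block: str,
              region: EdgeRegion) -> bool:
    """Walk from the block's innermost region up to the root."""
    current = innermost.get(block)
    while current is not None:
        if current is region:
            return True
        current = current.parent
    return False


# Invented example: blocks a and b both branch to c, then c -> d -> e.
top = EdgeRegion("top", frozenset(), frozenset())
inner = EdgeRegion(
    "inner",
    entry_edges=frozenset({("a", "c"), ("b", "c")}),  # two entry edges
    exit_edges=frozenset({("d", "e")}),
    parent=top,
)
# Auxiliary mapping: basic block -> innermost region containing it.
innermost = {"a": top, "b": top, "c": inner, "d": inner, "e": top}
```

Here `in_region(innermost, "c", inner)` is true and `in_region(innermost, "a", inner)` is false. Without the `innermost` mapping, the same question would have to be answered by walking the CFG from the entry edges to the exit edges.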
Analysis        PST                     RPST                     DRPST
Applicable      All CFGs                All CFGs                 All CFGs
Precision       Basic regions           Enhanced regions         Enhanced regions
Runtime         O(V + S)                O(V + S)                 O((V + S)^2), probably better
Prerequisites   None                    Triconnected components  Dominance and postdominance trees
Representation  2 edges                 all edges in region      2 basic blocks
BB in Region    extra mapping required  extra mapping required   based on dominance info

Table 1: Comparison of different region detection algorithms

3.4 Dominance Tree based RPST Calculation

In winter 2009 another approach was developed as a region detection algorithm for the LLVM compiler toolkit. The objective was to achieve the same precision as the previously described algorithm while taking advantage of already existing analyses. One of the most common analyses in restructuring compilers is (post)dominance information. Therefore a relatively simple algorithm was developed that calculates a region tree based on the (post)dominance information already available. The algorithm is able to detect all refined regions on any (even unstructured) CFG. A first analysis of the runtime complexity has shown an upper bound of O((V + S)^2); however, it seems possible to prove an even better bound in the order of magnitude of O((V + S) + log(V + S)). Another advantage of a dominance tree based approach is that checking whether a basic block is part of a region takes constant time. These operations are possible because the algorithm can take advantage of the existing (post)dominance information. This also makes it possible to store the description of a region in a constant amount of memory: two references to basic blocks.

4 Comparison

Table 1 sums up the attributes of all algorithms that can handle general CFGs. Because of their better precision, the RPST and DRPST algorithms seem to be the most powerful analyses.
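To illustrate the "Representation" and "BB in Region" rows of Table 1 for the dominance based approach, the following sketch stores a region as two basic blocks and answers membership queries from dominance information alone. It is a minimal sketch under assumptions that are not taken from the text: a region is the pair (entry, exit), where exit is the first block after the region and is not itself part of it, and a block belongs to the region if the entry dominates it, unless the exit also dominates it. The function names and the example CFG are hypothetical, and the simple fixed-point computation of dominator sets only stands in for the dominance tree that a compiler would already have available.

```python
from typing import Dict, List, Set, Tuple


def dominator_sets(cfg: Dict[str, List[str]], entry: str) -> Dict[str, Set[str]]:
    """Return, for every block, the set of blocks that dominate it.

    Plain iterative data-flow formulation; every block must be a key of cfg.
    """
    blocks = set(cfg)
    preds: Dict[str, Set[str]] = {b: set() for b in blocks}
    for block, succs in cfg.items():
        for succ in succs:
            preds[succ].add(block)

    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for block in blocks - {entry}:
            incoming = [dom[p] for p in preds[block]]
            new = {block} | (set.intersection(*incoming) if incoming else set())
            if new != dom[block]:
                dom[block] = new
                changed = True
    return dom


def contains(dom: Dict[str, Set[str]], region: Tuple[str, str], block: str) -> bool:
    """Membership test for a region stored as (entry block, exit block)."""
    entry, exit_ = region

    def dominates(a: str, b: str) -> bool:
        return a in dom[b]  # set lookup, no graph walk

    return dominates(entry, block) and not (
        dominates(exit_, block) and dominates(entry, exit_)
    )


# Invented example: a diamond b -> {c, d} -> e embedded in a -> ... -> f.
cfg = {"a": ["b"], "b": ["c", "d"], "c": ["e"], "d": ["e"], "e": ["f"], "f": []}
dom = dominator_sets(cfg, "a")
region = ("b", "e")  # two block references describe the whole region
members = [b for b in sorted(cfg) if contains(dom, region, b)]
```

For this CFG, `members` is `["b", "c", "d"]`. The region description stays at two block references regardless of how many blocks or edges the region contains.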
In theory the RPST algorithm is already perfect in terms of runtime complexity, coverage and precision; in practice, however, it requires a lot of implementation effort. This problem seems to be solved by the DRPST algorithm, which can be implemented easily if dominance information is already available. Its only drawback is that an optimal runtime complexity has not yet been proven. However, first tests did not reveal any limitation caused by this.

5 Region detection in static program analysis

In static program analysis, especially software model checking, extremely expensive analyses are often performed. To achieve reasonable runtimes it is therefore necessary to reduce the required effort as much as possible. Program region trees offer various possibilities to reduce complexity.
