25.01.2015 Views

Cost-Based Optimization of Integration Flows - Datenbanken ...


6 On-Demand Re-Optimization

Figure 6.5: General Structure of a PlanOptTree (stratum 1: RNode; stratum 2: ONodes with nid = 1, 2, 3, 7 for operators o_1, o_2, o_3, o_7; stratum 3: SNodes with monitored statistics; stratum 4: CSNodes with complex statistics cstat; stratum 5: OCNodes with optimality conditions ocond; plus a global memo structure)

1. RNode: The single root node refers to m′ operator nodes (ONode), where 1 ≤ m′ ≤ m.

2. ONode: An operator node is identified by a node identifier nid and refers to s′ statistic nodes (SNode), where 1 ≤ s′ ≤ s and s denotes the maximum number of atomic statistic types.

3. SNode: A statistic node exhibits one of the s atomic statistic types, where each type may occur at most once per operator o_i. Further, each SNode contains a list of statistic tuples monitored for o_i, a single aggregate, as well as a reference to a list of CSNodes and a list of OCNodes.

4. CSNode: A complex statistic node is a mathematical expression using all referenced parent SNodes or CSNodes as operands, where a CSNode can refer to SNodes of different operators. Further, it refers to a list of complex statistic nodes (CSNode) and a list of optimality condition nodes (OCNode). Hence, arbitrary hierarchies of complex statistics are possible. In addition, CSNodes can be used to represent constant values or externally loaded values.

5. OCNode: An optimality condition node is defined as a boolean expression op_1 θ op_2, where θ denotes an arbitrary binary comparison operator and the operands op_1 and op_2 each refer to a CSNode or an SNode. The optimality condition is defined as violated if the expression evaluates to false.
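The five node types above can be sketched as a small set of Python classes. This is only an illustrative sketch, not the original implementation: the class layout, the method names (`add`, `value`, `violated`, `stat`, `onode`), the statistic type name `"sel"`, and the use of the arithmetic mean as the single aggregate are all assumptions made for this example.

```python
import operator
from typing import Callable, Dict, List


class SNode:
    """Stratum 3: one atomic statistic type monitored for one operator."""

    def __init__(self, stat_type: str) -> None:
        self.stat_type = stat_type
        self.tuples: List[float] = []          # monitored statistic tuples
        self.aggregate: float = 0.0            # single aggregate over the tuples
        self.cs_children: List["CSNode"] = []  # references to CSNodes
        self.oc_children: List["OCNode"] = []  # references to OCNodes

    def add(self, x: float) -> None:
        self.tuples.append(x)
        # Assumption: the aggregate is the arithmetic mean of all tuples.
        self.aggregate = sum(self.tuples) / len(self.tuples)

    def value(self) -> float:
        return self.aggregate


class CSNode:
    """Stratum 4: expression over all referenced parent SNodes or CSNodes."""

    def __init__(self, expr: Callable[..., float], *parents) -> None:
        self.expr = expr
        self.parents = list(parents)           # empty for constant values
        self.cs_children: List["CSNode"] = []
        self.oc_children: List["OCNode"] = []
        for p in self.parents:                 # bidirectional reference
            p.cs_children.append(self)

    def value(self) -> float:
        return self.expr(*(p.value() for p in self.parents))


class OCNode:
    """Stratum 5: boolean expression op1 theta op2."""

    def __init__(self, op1, theta: Callable[[float, float], bool], op2) -> None:
        self.op1, self.theta, self.op2 = op1, theta, op2
        for p in (op1, op2):                   # bidirectional reference
            p.oc_children.append(self)

    def violated(self) -> bool:
        # Violated if the expression evaluates to false.
        return not self.theta(self.op1.value(), self.op2.value())


class ONode:
    """Stratum 2: operator node, identified by nid."""

    def __init__(self, nid: int) -> None:
        self.nid = nid
        self.stats: Dict[str, SNode] = {}      # at most one SNode per type

    def stat(self, stat_type: str) -> SNode:
        if stat_type not in self.stats:
            self.stats[stat_type] = SNode(stat_type)
        return self.stats[stat_type]


class RNode:
    """Stratum 1: single root node referring to the operator nodes."""

    def __init__(self) -> None:
        self.onodes: Dict[int, ONode] = {}

    def onode(self, nid: int) -> ONode:
        if nid not in self.onodes:
            self.onodes[nid] = ONode(nid)
        return self.onodes[nid]


# Hypothetical usage: condition "sel(o_2) <= sel(o_3)".
root = RNode()
s2 = root.onode(2).stat("sel")
s3 = root.onode(3).stat("sel")
cond = OCNode(s2, operator.le, s3)
s2.add(0.2)
s3.add(0.5)
print(cond.violated())  # False: 0.2 <= 0.5 holds
s2.add(1.0)
print(cond.violated())  # True: the mean of s2 is now 0.6 > 0.5
```

Keying the SNodes of an operator by statistic type is one simple way to enforce that a type occurs at most once per operator. A constant value can be modelled as a CSNode without parents, for example `CSNode(lambda: 3.0)`.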

The nodes of strata 1 and 2 are reachable over unidirectional references, while nodes of strata 3-5 are connected by bidirectional references (children and parents).

Although the PlanOptTree is a graph, we call it a tree because, from the viewpoint of statistic maintenance, only the tree from strata 1 to 3 is relevant, while from the viewpoint of directed optimization, each optimality condition is the root of a tree from strata 5 to 3. All references to children and parents are maintained as lists sorted by node identifier. Conceptually, known index structures can be used instead of lists. Furthermore, each node of stratum 4 and stratum 5 is reachable over multiple paths. For this reason, a PlanOptTree includes a MEMO structure in order to mark subgraphs that have already been evaluated. Finally, we are able to exploit the following four fundamental properties:

• Minimal Monitoring: The PlanOptTree includes only operators and statistics that occur in at least one optimality condition. Thus, we can easily determine the relevant statistics for minimal statistics monitoring (given by stratum 2 and stratum 3).
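To illustrate minimal monitoring together with the MEMO idea, the following sketch walks from each optimality condition down to stratum 3 and collects the (nid, statistic type) pairs that need monitoring. The classes `Stat`, `Complex`, and `Condition` are simplified, hypothetical stand-ins for SNode, CSNode, and OCNode, and using a set of visited nodes as the memo is an assumption of this example, not the original design.

```python
from typing import List, Set, Tuple


class Stat:
    """Simplified stand-in for an SNode of operator nid."""

    def __init__(self, nid: int, stat_type: str) -> None:
        self.nid = nid
        self.stat_type = stat_type


class Complex:
    """Simplified stand-in for a CSNode with its parent operands."""

    def __init__(self, *parents) -> None:
        self.parents = list(parents)


class Condition:
    """Simplified stand-in for an OCNode with operands op1 and op2."""

    def __init__(self, op1, op2) -> None:
        self.parents = [op1, op2]


def relevant_statistics(conditions: List[Condition]) -> List[Tuple[int, str]]:
    """Collect the statistics referenced by at least one optimality condition."""
    memo: Set[int] = set()                # marks subgraphs already traversed
    relevant: Set[Tuple[int, str]] = set()
    stack = list(conditions)
    while stack:
        node = stack.pop()
        if id(node) in memo:              # reached before over another path
            continue
        memo.add(id(node))
        if isinstance(node, Stat):
            relevant.add((node.nid, node.stat_type))
        else:
            stack.extend(node.parents)    # walk from stratum 5 towards stratum 3
    return sorted(relevant)


# Hypothetical example: the statistic of operator 7 is in no condition.
s1, s2, s3 = Stat(1, "stat1"), Stat(2, "stat1"), Stat(3, "stat2")
s7 = Stat(7, "stat5")
shared = Complex(s1, s2)                  # reachable from both conditions
conds = [Condition(shared, s3), Condition(shared, s1)]
print(relevant_statistics(conds))
```

In this example, the shared complex statistic is traversed only once even though two conditions refer to it, and operator 7 does not appear in the result because none of its statistics is part of a condition.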
