Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6
Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6
Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
2.1 Divergence<br />
We focus on the aggregator node n, that applies an aggregation query q on the source s,<br />
where s ∈ S(q). As introduced before, query q generates a new query feed f = q(S(q)),<br />
where the aggregation query q represents a union of filters over the source feeds in S(q).<br />
When discussing divergence, we examine how the changes that are of interest for the<br />
aggregation query q <strong>and</strong> that occur on the source feed s are perceived <strong>and</strong> captured by<br />
the aggregator n.<br />
We begin by formally defining the notion of divergence, as well as the measure proposed<br />
to evaluate it.<br />
Definition 2.1.1. Divergence<br />
Let Tr represent the last time moment before time t when source s has been refreshed by<br />
the aggregator. We define the divergence function Div(s, t, Tr) measured at time t > Tr<br />
as the total number of new items published by the source s in the time period (Tr, t] that<br />
were not yet fetched by the aggregator.<br />
Definition 2.1.2. Divergence relevant to query q<br />
We similarly define the divergence function Div(s, q, t, Tr) measured at time t as the total<br />
number of new items relevant to query q published by the source s in the time period<br />
(Tr, t].<br />
We suppose that the relevant items to query q are published uniformly distributed in time,<br />
such that the selectivity sel(q) is considered constant. If the selectivity sel(q) of the query<br />
q <strong>and</strong> the divergence Div(s, t, Tr) of the source s are known, then we can estimate the<br />
divergence relevant to query q as:<br />
Div(s, q, t, Tr) = sel(q) · Div(s, t, Tr)<br />
Observe that the values of the divergences relevant to two different queries q <strong>and</strong> q ′ defined<br />
on the same source s <strong>and</strong> measured at the same time instant t can be different, especially if<br />
the selectivities of the two queries are different (if sel(q) �= sel(q ′ )), then Div(s, q, t, Tr) �=<br />
Div(s, q ′ , t, Tr).<br />
Definition 2.1.3. Divergence relevant to a set of queries Q<br />
Let Q be the set of disjoint queries defined on source s by an aggregator node: ∩qi = ∅,<br />
where qi ∈ Q. We define the divergence relevant to the set of queries Q measured at time<br />
t as the sum of the divergence functions relevant to all queries qi ∈ Q.<br />
Div(s, Q, t, Tr) = �<br />
Div(s, qi, t, Tr)<br />
qi∈Q<br />
To better underst<strong>and</strong> how the divergence function behaves in time, we introduce an example<br />
in Figure 2.1. Let Tr−1, Tr <strong>and</strong> Tr+1 represent the time moments at which source s is<br />
22