18.12.2012 Views

Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6

Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6

Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

2.1 Divergence<br />

We focus on the aggregator node n, that applies an aggregation query q on the source s,<br />

where s ∈ S(q). As introduced before, query q generates a new query feed f = q(S(q)),<br />

where the aggregation query q represents a union of filters over the source feeds in S(q).<br />

When discussing divergence, we examine how the changes that are of interest for the<br />

aggregation query q <strong>and</strong> that occur on the source feed s are perceived <strong>and</strong> captured by<br />

the aggregator n.<br />

We begin by formally defining the notion of divergence, as well as the measure proposed<br />

to evaluate it.<br />

Definition 2.1.1. Divergence<br />

Let Tr represent the last time moment before time t when source s has been refreshed by<br />

the aggregator. We define the divergence function Div(s, t, Tr) measured at time t > Tr<br />

as the total number of new items published by the source s in the time period (Tr, t] that<br />

were not yet fetched by the aggregator.<br />

Definition 2.1.2. Divergence relevant to query q<br />

We similarly define the divergence function Div(s, q, t, Tr) measured at time t as the total<br />

number of new items relevant to query q published by the source s in the time period<br />

(Tr, t].<br />

We suppose that the relevant items to query q are published uniformly distributed in time,<br />

such that the selectivity sel(q) is considered constant. If the selectivity sel(q) of the query<br />

q <strong>and</strong> the divergence Div(s, t, Tr) of the source s are known, then we can estimate the<br />

divergence relevant to query q as:<br />

Div(s, q, t, Tr) = sel(q) · Div(s, t, Tr)<br />

Observe that the values of the divergences relevant to two different queries q <strong>and</strong> q ′ defined<br />

on the same source s <strong>and</strong> measured at the same time instant t can be different, especially if<br />

the selectivities of the two queries are different (if sel(q) �= sel(q ′ )), then Div(s, q, t, Tr) �=<br />

Div(s, q ′ , t, Tr).<br />

Definition 2.1.3. Divergence relevant to a set of queries Q<br />

Let Q be the set of disjoint queries defined on source s by an aggregator node: ∩qi = ∅,<br />

where qi ∈ Q. We define the divergence relevant to the set of queries Q measured at time<br />

t as the sum of the divergence functions relevant to all queries qi ∈ Q.<br />

Div(s, Q, t, Tr) = �<br />

Div(s, qi, t, Tr)<br />

qi∈Q<br />

To better underst<strong>and</strong> how the divergence function behaves in time, we introduce an example<br />

in Figure 2.1. Let Tr−1, Tr <strong>and</strong> Tr+1 represent the time moments at which source s is<br />

22

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!