18.12.2012

Roxana - Gabriela HORINCAR Refresh Strategies and Online ... - LIP6


1.4.2 Source and Query Feeds

In this section, we analyze and discuss feeds from the point of view of their provenance, as well as the way they are created and used. We distinguish between source feeds and query feeds. Source feeds are the input feeds consumed by the aggregator; for them, we ignore the way they are created. Query feeds are the output feeds generated by an aggregator as the result of aggregation queries.

Definition 1.4.4. Source feed<br />

A source feed s is the feed generated by a source that continuously publishes new items.

Definition 1.4.5. Query feed<br />

Let q be the aggregation query applied by an aggregator to a set of source feeds S = {s1, ..., sk}. The result of q is called a query feed and is denoted by f = q(S).

As already mentioned in Section 1.3, this dissertation focuses on the frequent case of aggregation queries computed as a union of filters over sets of source feeds. Such queries have the general form

$$q(S) = \bigcup_{j} \sigma_j(S_j), \qquad \text{where } S = \bigcup_{j} S_j.$$

A filter σj applied to a set of source feeds Sj, written σj(Sj), selects the items published by these source feeds that satisfy a given item predicate. Item predicates are Boolean expressions (built with conjunction, disjunction and negation) over atomic item predicates, each of which expresses a condition on an item attribute. Depending on the attribute type, atomic predicates may apply to the publication date of an item, to its content, or to the set of categories that describes it. A complex model of aggregation queries, specific to the RoSeS system, is presented in [TATV11].
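The filter semantics described above can be illustrated with a small Python sketch. It is a minimal sketch under stated assumptions, not the RoSeS query model of [TATV11]: the `Item` fields (`pub_date`, `content`, `categories`), the combinator names `and_`, `or_`, `not_`, and the representation of a query as a list of (source feed set, predicate) pairs are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Iterable

# Hypothetical item model: one attribute per kind of atomic predicate
# mentioned in the text (publication date, content, categories).
@dataclass(frozen=True)
class Item:
    source: str
    pub_date: datetime
    content: str
    categories: frozenset = field(default_factory=frozenset)

Predicate = Callable[[Item], bool]

# Atomic predicates: each tests a condition on a single item attribute.
def published_after(t: datetime) -> Predicate:
    return lambda item: item.pub_date > t

def content_contains(word: str) -> Predicate:
    return lambda item: word.lower() in item.content.lower()

def has_category(cat: str) -> Predicate:
    return lambda item: cat in item.categories

# Boolean combinators: conjunction, disjunction and negation.
def and_(*ps: Predicate) -> Predicate:
    return lambda item: all(p(item) for p in ps)

def or_(*ps: Predicate) -> Predicate:
    return lambda item: any(p(item) for p in ps)

def not_(p: Predicate) -> Predicate:
    return lambda item: not p(item)

# sigma_j(S_j): keep the items of the feeds in S_j that satisfy the predicate.
def apply_filter(feeds: Iterable[list], pred: Predicate) -> set:
    return {item for feed in feeds for item in feed if pred(item)}

# q(S) = union over j of sigma_j(S_j); each entry is a (S_j, predicate) pair.
def query(filters: list) -> set:
    result = set()
    for feeds, pred in filters:
        result |= apply_filter(feeds, pred)
    return result
```

Because predicates are plain functions, any Boolean expression over atomic predicates is obtained by nesting the combinators, for example `and_(content_contains("volcano"), not_(has_category("sport")))`.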

We return to the example presented in Section 1.3, which shows how the query feed f = IceVolEruFeed is created as a union of two filters applied to two different source feeds: f = q(s1, s2) = σ1(s1) ∪ σ2(s2). The two source feeds come from two different websites, s1 = "The Guardian" and s2 = "The Big Picture". The first filter, σ1(s1), selects the items that contain the keywords "iceland", "volcano" or "eruption" in their content. The second filter, σ2(s2), selects only the items that contain the same three keywords within the set of keywords associated with each item.
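The IceVolEruFeed example can be written out as a short Python sketch. The dictionary representation of items (with hypothetical `id`, `content` and `keywords` fields), the sample items, and the reading of the second filter as "at least one of the three keywords appears among the item's keywords" are illustrative assumptions; the items are made up and are not taken from The Guardian or The Big Picture.

```python
KEYWORDS = ("iceland", "volcano", "eruption")

def sigma1(item: dict) -> bool:
    # Filter on s1: at least one keyword occurs in the item content.
    text = item["content"].lower()
    return any(k in text for k in KEYWORDS)

def sigma2(item: dict) -> bool:
    # Filter on s2: at least one keyword is among the item's own keywords;
    # the content itself is not inspected.
    tags = {t.lower() for t in item["keywords"]}
    return any(k in tags for k in KEYWORDS)

def ice_vol_eru_feed(s1: list, s2: list) -> list:
    # f = q(s1, s2) = sigma1(s1) ∪ sigma2(s2). Items of s1 and s2 come from
    # different sources, so concatenating the two results is their union.
    return [i for i in s1 if sigma1(i)] + [i for i in s2 if sigma2(i)]

# Made-up sample items, for illustration only.
s1 = [
    {"id": "g1", "content": "Eruption under Eyjafjallajokull glacier", "keywords": []},
    {"id": "g2", "content": "Football results", "keywords": []},
]
s2 = [
    {"id": "b1", "content": "Ash plume photographs", "keywords": ["Iceland", "Volcano"]},
    {"id": "b2", "content": "Volcano photos", "keywords": ["Sport"]},
]
print([i["id"] for i in ice_vol_eru_feed(s1, s2)])  # ['g1', 'b1']
```

Item `b2` mentions a volcano in its content but is rejected, because σ2 looks only at the keywords attached to the item; this is exactly the difference between the two filters in the example.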

The items of the query feed f = q(S) published by the aggregator come from the source feeds in S and satisfy the conditions imposed by the filters σj; such items are said to be relevant to query q. We assume that items are sorted by publication date before being inserted into the query feed f. Other sorting criteria may also be considered, such as the source feed sj the item comes from or an item importance score.
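Ordering the query feed by publication date can be sketched with the standard library's `heapq.merge`, assuming (hypothetically) that each filtered source stream is already sorted newest first, as feeds commonly but not necessarily are. Items are represented here as hypothetical `(pub_date, title)` tuples; changing the `key` function would give the alternative orderings mentioned above, such as by source or by an importance score.

```python
import heapq
from datetime import datetime

def merge_by_date(*streams):
    # Each stream is a list of (pub_date, title) pairs already sorted newest
    # first. heapq.merge with reverse=True interleaves them into a single
    # sequence that is also sorted newest first, without re-sorting everything.
    return list(heapq.merge(*streams, key=lambda item: item[0], reverse=True))

s1 = [(datetime(2010, 4, 17), "g2"), (datetime(2010, 4, 14), "g1")]
s2 = [(datetime(2010, 4, 16), "b1")]
print([title for _, title in merge_by_date(s1, s2)])  # ['g2', 'b1', 'g1']
```

If the input streams are not guaranteed to be sorted, `sorted(items, key=..., reverse=True)` over the concatenated filter results is the simpler, safer choice.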

Any query q applied to a set of source feeds introduces a selectivity factor with values
