Reasoning over real time data streams - ENVISION ...

envision.project.eu


Deliverable D4.8: Reasoning over real time data streams
Date: Tuesday, 25 September 2012
Author(s): Norbert Lanzanasto (UIBK), Srdjan Komazec (UIBK), Ioan Toma (UIBK)
Dissemination level: PU
WP: 4
Version: 1.0
Keywords: Stream reasoning, prototype, Streaming IRIS, Sparkwave
Description: This deliverable reports on the methodology used to apply Stream Reasoning techniques to real-time data. The methodology is implemented in two prototypes, Streaming IRIS and Sparkwave.

ICT for Environmental Services and Climate Change Adaptation
Small or Medium-scale Focused Research Project
ENVISION (Environmental Services Infrastructure with Ontologies)
Project No: 249120
Project Runtime: 01/2010 – 12/2012


Document metadata

Quality assurors and contributors
Quality assuror(s): Vid Podpečan (JSI), Henry Michels (UoM)
Contributor(s): Norbert Lanzanasto (UIBK), Srdjan Komazec (UIBK), Ioan Toma (UIBK)

Version history
Version | Date | Description
0.1 | 12.09.2012 | TOC ready for peer review
0.2 | 14.09.2012 | TOC reviewed
0.3 | 25.09.2012 | Added chapters Introduction, Related Work, Streaming IRIS, WSML2Reasoner, Sparkwave
0.4 | 05.10.2012 | Added chapters Future Work, Integration in ENVISION
0.5 | 08.10.2012 | Added chapter Conclusion
0.6 | 10.10.2012 | Draft ready for peer review
0.7 | 17.10.2012 | Internal quality review performed
0.8 | 24.10.2012 | Draft ready for technical quality check
0.9 | 30.10.2012 | Final quality check performed
1.0 | 05.11.2012 | Final approval


Public Project Deliverable

3.5. Restrictions
4. WSML2Reasoner
4.1. Architecture and Implementation
4.2. Usage
5. Sparkwave
5.1. Architecture
5.2. Sparkwave Pattern Language
6. Integration in ENVISION
6.1. WSML2Reasoner
6.2. Sparkwave
6.3. Approach Differentiation
7. Conclusion
8. Future Work
A. Datalog Program for the Evaluation
Bibliography

Document metadata
Quality assurors and contributors
Quality assuror(s): George Athansopoulos (NKUA), Joël Langlois (BRGM)
Contributor(s): All partners
Version history
Version | Date | Description
0.1 | 20th of May 2010 | Outline
0.2 | 1st of June 2010 | Draft of Section 2
0.3 | 10th of June 2010 | Draft of Sections 3 and 4
0.4 | 14th of June 2010 | Draft version ready for internal review
1.0 | 30th of June 2010 | Updated based on internal reviews from George and Joël

Copyright © ENVISION Consortium 2009-2012, Page 4 / 42


Executive Summary

With the improvements in the area of integrated systems and the prevalence of these systems in many devices, more and more streams of live data are becoming available. This data can be location information from mobile devices, movement information about cars, or other sensor data such as temperature, precipitation, water level, and more. This has led to the research area of Stream Reasoning, which aims at providing reasoning capabilities over streams. We present the prototype Streaming IRIS, which uses a Datalog reasoner for this task. Further, we describe the extensions implemented in the WSML2Reasoner framework to support Streaming IRIS, and we explain how this framework is used in the ENVISION project to detect and create events.


Aspect | Disadvantages | Advantages
High completeness of data | high storage needs, low performance | good quality of results
Highly expressive language | low performance | wider area of application
High performance | less complete data, less expressive language | more up-to-date results
Table 1.1: This table summarizes the aspects of stream reasoning.

1.2. Motivation

Data streams are becoming more and more common on the Web. Many streams, e.g. of stock exchange movements, weather information, sensor readings, or social networking activity notifications, are already present, and platforms to collect and share these streams, such as Cosm¹, have appeared. Techniques to process data streams while responding in a timely fashion are following the trend. The most prominent areas in this respect are Complex Event Processing (CEP) [26] and Data Stream Management Systems (e.g., Aurora [1] and STREAM [6]).

Combining data stream processing techniques with data streams distributed across the Web is a natural fit; however, data stream processing on the Web needs to cope with the typical openness and heterogeneity of the Web environment. Semantic Web technologies are meant to facilitate data integration in open environments and can thus help to overcome these problems by using machine-processable descriptions to resolve heterogeneities across multiple streams.
For example, the Semantic Sensor Web [31] represents an attempt to collect and process avalanches of sensor data using Semantic Web technologies.

The application of Semantic Web technologies to data stream processing also opens an opportunity to perform reasoning tasks over continuously and rapidly changing information, which triggered the emergence of the stream reasoning research area [36]. Stream reasoning systems aim at preserving the core value of data stream processing, i.e. processing streaming data in a timely fashion, while providing a number of features: support for expressive queries/patterns and complex schemas, integration of static background knowledge with streaming data, support for logical, data and temporal operators, time and tuple windows, etc. In addition, stream reasoning systems need to take entailed knowledge into account, which results in higher complexity and performance penalties. The most prominent solutions in this category are C-SPARQL [13], ETALIS [3], CQELS [25] and Sparkwave [22].

¹ https://cosm.com/, 2012-10-19

1.3. Goals

Several use cases exist in ENVISION where stream reasoning is helpful and needed. These scenarios aim at predicting floods and landslides as well as the impact of an oil spill. The following requirements are linked to these scenarios.


Our focus is on providing a system capable of advanced reasoning. Therefore, an expressive rule-based language should be supported in which complex queries can be defined. This focus on reasoning capabilities limits performance; the main consequence is that only a limited amount of data coming from the streams can be handled. This is acceptable, because the input streams, containing e.g. data about weather or water levels, send only a relatively small amount of data (water levels and precipitation are probably measured once an hour). ENVISION requires these reasoning capabilities to combine several observed events (water levels of certain rivers, precipitation in adjacent areas, temperature on snow-covered mountains) which together can lead to a flood. For this reason, support for complex queries is mandatory.

Another requirement is support for background knowledge, as in ETALIS (Section 2.2) and Streaming Knowledge Bases (Section 2.3). Background knowledge is static knowledge that is stored permanently during execution, whereas data coming from the streams is stored only temporarily. In the flood and landslide scenarios of ENVISION, the background knowledge could be the landscape of the area affected by a flood or landslide.

These objectives can be summarized in the following question:
How can an expressive language that provides advanced reasoning capabilities be used in a streaming environment together with static background knowledge?

1.4. Structure
In the next chapter (Chapter 2), existing approaches to this topic are discussed. Chapter 3 describes Streaming IRIS, a stream reasoning engine with advanced reasoning capabilities. The integration of Streaming IRIS into WSML2Reasoner is explained in Chapter 4. Chapter 5 introduces Sparkwave, a lightweight reasoning engine. The integration of both systems in ENVISION is illustrated in Chapter 6, followed by the conclusion (Chapter 7) and possible future work (Chapter 8).


2. Related Work

Several papers concentrate on the topic of Stream Reasoning [35, 10]. Barbieri et al. [14] suggest using logic programming to maintain materializations incrementally. A way to express continuous queries with sound semantics and the possibility of query optimization is presented in [23].

The following section (Section 2.1) introduces SPARQL [29] and the extensions to it that have been proposed as a step towards Stream Reasoning. Section 2.2 describes ETALIS, an implementation for specifying and detecting complex events on streams in near real time. Section 2.3 introduces Streaming Knowledge Bases, an approach that combines a database with a stream processing engine, and Section 2.4 presents CQELS. Section 2.5 summarizes the key features of each approach.

2.1. SPARQL Extensions

SPARQL is a query language for RDF. Several extensions to SPARQL exist that provide additional functionality to define streams and query over them. Detailed information about SPARQL can be found on the W3C website¹.

2.1.1. Streaming SPARQL

Bolles et al. [16] present Streaming SPARQL. This extension to SPARQL allows the specification of continuous queries over RDF data streams.

The stream is split into windows, which then form the knowledge base used to answer the queries. Windows can be time-based or element-based. A time-based window is defined by a timespan, for example the last 30 minutes, and, optionally, a delta (δ), which determines when the window is moved. δ is given as a timespan, e.g. 10 min.
This means the window is moved every 10 minutes. If it is 10:00 PM and a time window of 30 minutes is defined, all data from the stream that arrived between 9:30 PM and 10:00 PM is in the window. At 10:10 PM the window is moved, and all data that arrived between 9:40 PM and 10:10 PM is in the window. If no δ is given, one unit of the timespan (in this case 1 minute) is taken as δ. In contrast, element-based windows are defined by the number of elements.

Therefore, a Streaming SPARQL query differs from a SPARQL query in that it defines one or more RDF data streams which are used as input. Furthermore, a window can be defined.

¹ http://www.w3.org/TR/rdf-sparql-query/, 2012-10-22

1 PREFIX wtur: <http://iec.org/61400-25/root/ln/classes/WTUR#>
2 SELECT ?x ?y ?z
3 FROM STREAM <http://iec.org/61400-25/root/td.rdf>
4 WINDOW RANGE 30 MINUTE SLIDE
5 WHERE { ?x wtur:StrCnt ?y .
6   OPTIONAL { ?x wtur:StopCnt ?z .
7     WINDOW ELEMS 1500 } }
Listing 2.1: Example of a Streaming SPARQL query ([16]).

Listing 2.1 illustrates an example Streaming SPARQL query. In lines 3 and 4 an input stream is declared, together with a time-based window over this stream. The timespan of this window is 30 minutes, and the window moves every minute (because no δ is given after the keyword SLIDE). Additionally, another window is used in the OPTIONAL part of the query. This window is element-based and contains the last 1500 elements of the stream. In summary, the query returns the number of starts of wind turbines and, if available, the number of stops of the wind turbines.

The authors implemented Streaming SPARQL in the Odysseus system [5].

2.1.2. EP-SPARQL

The goal of this extension to SPARQL is to combine Event Processing (EP) with stream reasoning. This approach is realized in EP-SPARQL [2]. The language allows specifying complex events and combining them with background knowledge. EP-SPARQL adds several binary operators to the SPARQL syntax (SEQ, EQUALS, OPTIONALSEQ, EQUALSOPTIONAL), together with some functions (getDURATION(), getSTARTTIME(), getENDTIME()). The detailed semantics of these operators and functions can be found in [2].

1 SELECT ?company WHERE
2 { ?company hasStockPrice ?price1 }
3 SEQ { ?company hasStockPrice ?price2 }
4 SEQ { ?company hasStockPrice ?price3 }
5 FILTER (?price2 < ?price1 * 0.7 && ?price3 > ?price1 * 1.05
6   && getDURATION() < "P30D"^^xsd:duration)
Listing 2.2: Example of an EP-SPARQL query ([2]).

The query presented in Listing 2.2 illustrates some aspects of EP-SPARQL. It returns all companies whose stock price dropped by more than 30% and subsequently rose to more than 5% above the initial price. This temporal dependency is expressed by the keyword SEQ. Additionally, these changes have to happen within a time frame of 30 days (defined in the FILTER part).

EP-SPARQL is used in ETALIS, which is described in detail in Section 2.2.

2.1.3. C-SPARQL

Barbieri et al. [11] introduce another extension to SPARQL named C-SPARQL. C-SPARQL supports the registration and continuous execution of queries over RDF streams. The extension is designed so that every SPARQL query is also a valid C-SPARQL query.

As in Streaming SPARQL, different types of windows are used. Windows defined by a number of triples are called physical, and those defined by a time interval are called logical.

Queries can be registered as a query, which means that the results are renewed each time the query is executed, or as a stream, where the output of the query is again a stream (a query can only be registered as a stream if it is a CONSTRUCT or DESCRIBE query).

One important feature of C-SPARQL is its support for aggregations. In C-SPARQL it is possible to use several aggregations in one query.
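The window semantics described above can be illustrated with a short Java (17) sketch. This covers Streaming SPARQL's time-based windows with a range and a slide δ and its element-based windows, which correspond to C-SPARQL's logical and physical windows. The sketch is not code from Streaming SPARQL, Odysseus, or C-SPARQL. The names Item, TimeWindow and CountWindow are hypothetical, and the choice that a time window covers the interval (end - range, end] is an assumption made only for this illustration.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** A timestamped stream element (hypothetical type, for illustration only). */
record Item(Instant time, String value) {}

/** Time-based ("logical") window: a range that moves forward in steps of the slide delta. */
final class TimeWindow {
    private final Duration range;
    private final Duration slide;

    TimeWindow(Duration range, Duration slide) {
        this.range = range;
        this.slide = slide;
    }

    /** The window only moves at multiples of the slide, so "now" is rounded down. */
    Instant windowEnd(Instant now) {
        long t = now.toEpochMilli();
        return Instant.ofEpochMilli(t - Math.floorMod(t, slide.toMillis()));
    }

    /** Items whose timestamp lies in (end - range, end]; these bounds are an assumption. */
    List<Item> contents(List<Item> stream, Instant now) {
        Instant end = windowEnd(now);
        Instant start = end.minus(range);
        List<Item> result = new ArrayList<>();
        for (Item item : stream) {
            if (item.time().isAfter(start) && !item.time().isAfter(end)) {
                result.add(item);
            }
        }
        return result;
    }
}

/** Element-based ("physical") window: keeps only the last n items. */
final class CountWindow {
    private final int size;
    private final Deque<Item> items = new ArrayDeque<>();

    CountWindow(int size) {
        this.size = size;
    }

    void add(Item item) {
        items.addLast(item);
        if (items.size() > size) {
            items.removeFirst(); // evict the oldest element
        }
    }

    List<Item> contents() {
        return new ArrayList<>(items);
    }
}

class WindowDemo {
    public static void main(String[] args) {
        List<Item> stream = List.of(
                new Item(Instant.parse("2012-10-22T21:35:00Z"), "a"),
                new Item(Instant.parse("2012-10-22T21:45:00Z"), "b"),
                new Item(Instant.parse("2012-10-22T22:05:00Z"), "c"));
        TimeWindow w = new TimeWindow(Duration.ofMinutes(30), Duration.ofMinutes(10));
        System.out.println(w.contents(stream, Instant.parse("2012-10-22T22:00:00Z"))); // items a and b
        System.out.println(w.contents(stream, Instant.parse("2012-10-22T22:05:00Z"))); // still a and b
        System.out.println(w.contents(stream, Instant.parse("2012-10-22T22:10:00Z"))); // items b and c

        CountWindow lastTwo = new CountWindow(2);
        stream.forEach(lastTwo::add);
        System.out.println(lastTwo.contents()); // items b and c
    }
}
```

With a 30-minute range and a 10-minute slide, the demo reproduces the Streaming SPARQL example above. At 10:00 PM the window holds the items from 9:35 PM and 9:45 PM. At 10:10 PM it holds those from 9:45 PM and 10:05 PM. Between two slide points (e.g., at 10:05 PM) the window does not move.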


1 REGISTER QUERY CarsEnteringCityCenterPerTollgate AS
2 PREFIX t: <http://linkedurbandata.org/traffic#>
3 SELECT DISTINCT ?tollgate ?passages
4 FROM STREAM [RANGE 10 MIN STEP 1 MIN]
5 WHERE { ?tollgate t:registers ?car . }
6 AGGREGATE { (?passages, COUNT, {?tollgate}) }
Listing 2.3: Example of a C-SPARQL query ([11]).

An example query is presented in Listing 2.3. In line 1 the query is registered as a QUERY. The input stream is defined in line 4, together with a logical window over this stream. The window size is 10 minutes, sliding every minute. An aggregation is created in the last line of the query (line 6). It starts with the keyword AGGREGATE followed by a triple containing the following elements:
• a new variable (?passages)
• an aggregation function (COUNT)
• a variable occurring in the WHERE clause that defines the grouping criterion (?tollgate)
The query returns the number of passing cars for each tollgate in the last 10 minutes.

An execution environment for C-SPARQL is described in [12].

2.2. ETALIS

Anicic et al. [4] introduce ETALIS, a prototype implementation that allows the specification of events and their detection at occurrence time. For the specification of events, ETALIS supports two languages: EP-SPARQL (see Section 2.1.2) and the ETALIS Language for Events. Both languages allow the definition of complex events. ETALIS follows a deductive rule-based paradigm.
One interesting aspect of the stream reasoning engine ETALIS is its support for background knowledge.

2.3. Streaming Knowledge Bases

Walavalkar et al. [37] describe a system that allows stream reasoning with background knowledge. They use a PostgreSQL database to store the static data, which is given to the system as RDFS or OWL ontologies. The authors suggest performing inference over these ontologies before execution starts. TelegraphCQ is used as the stream processing engine (for more information about TelegraphCQ see [17]). The system allows registering queries, which are then continuously executed over the stream of incoming tuples and the background knowledge. The reasoning capabilities are limited to several RDFS and OWL constructs, but the authors announce that they will be extended over time.

2.4. CQELS

CQELS [25] is a recent solution for RDF stream processing built on top of the notion of Linked Stream Data [30]. The solution offers a native way to interpret and implement common stream processing features (time window operator, relational database-like join and union operators, and stream generation operator) in an RDF data stream processing environment. In addition, CQELS is equipped with a flexible query execution framework capable of dynamically adapting to changes in input data (e.g., operator reordering to improve query execution). Apart from time windows, CQELS lacks any support for temporal operators.

2.5. Summary

Table 2.1 summarizes the key features of the three approaches presented above. ETALIS and Streaming Knowledge Bases focus on reasoning. Streaming Knowledge Bases support background knowledge and allow the continuous execution of queries over this background knowledge and part of the stream (defined by a time window). ETALIS, in contrast, does not specify queries but complex events; background knowledge is also supported. CQELS supports time windows, but no other temporal operators.

What is missing is a stream reasoning engine with the reasoning capabilities of a state-of-the-art reasoner. The main focus of existing systems is on performance, which makes them most suitable for streams with a high volume of data.
Our work focuses on advanced reasoning over streams with lower data volumes.

Approach | Characteristics
ETALIS | definition of complex events; detection of events at occurrence time; background knowledge
Streaming Knowledge Bases | supports RDFS and limited OWL reasoning; registration of continuous queries; background knowledge
CQELS | supports time windows; flexible query execution framework
Table 2.1: This table gives an overview of the characteristics of the different projects that support stream reasoning.


3. Streaming IRIS

We developed a system called Streaming IRIS that allows advanced reasoning in a streaming environment. In the following sections we describe its architecture (Section 3.1), its implementation (Section 3.2), its evaluation (Section 3.3), and its usage (Section 3.4). Section 3.5 addresses the restrictions of Streaming IRIS.

3.1. Architecture

As seen in Chapter 1, several issues have to be considered in order to implement a stream reasoning engine. The following sections describe these issues and justify the decisions we made. They also describe IRIS, the system on which Streaming IRIS is based.

We start with an overview of the system (Section 3.1.1). In Section 3.1.2 we describe the language that is used. Section 3.1.3 introduces IRIS and explains it in detail. The following sections describe data management (Section 3.1.4), querying (Section 3.1.5), and the publication mechanism (Section 3.1.6).

3.1.1. Overview

Streaming IRIS aims at performing reasoning tasks in a continuous way. To illustrate the functionality of Streaming IRIS we use a flood scenario as an example. The available data is the landscape of the specific area where floods should be predicted. Furthermore, several measurements such as precipitation, temperature, and water levels are provided.
Another prerequisite is a set of rules which define the circumstances under which a flood can happen (e.g. if the water level of a river is higher than a threshold and the precipitation in the drainage area is high, a flood is very likely in a specific area).

The landscape information is static knowledge, which does not change (or changes only very slowly). The rules are also static knowledge, because the circumstances that lead to a flood do not change. This static knowledge is given to the system at initialization time. While the system is running, it listens on a socket for incoming data, such as the measurements mentioned above. In contrast to the static knowledge, measurements of water levels or of the weather change regularly. The user can then ask questions by registering queries in the system via the Streaming IRIS interface. The answers to the queries are given continuously: every time new data is available to the system, answers are produced based on the static knowledge and the currently available dynamic knowledge.

Figure 3.1 gives an overview of the Streaming IRIS system. The following paragraph describes the architecture of Streaming IRIS by following the data through the system. After Streaming IRIS has been initialized with the static knowledge and the rules, it consumes data from the data streams. This data is processed by the input thread and included in the current knowledge base. As soon as the knowledge base changes, the garbage collector cleans the knowledge base (deletes outdated data) and the registered queries are executed. The results are then sent to all interested listeners.
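The data flow just described can be summarized in a minimal, single-threaded Java (17) sketch. It is not the Streaming IRIS implementation. The names MiniStreamReasoner and Measurement, the hand-coded flood rule that stands in for a Datalog program, and all thresholds and locations are hypothetical. The sketch only shows the order of operations. A new item enters the knowledge base, outdated items are removed, the registered query is re-evaluated against static and dynamic knowledge, and every listener receives the new answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.function.Consumer;

/** A timestamped measurement arriving on a stream (hypothetical format). */
record Measurement(String kind, String location, double value, long timeMillis) {}

/**
 * Single-threaded sketch of the data flow: static knowledge, a time window of
 * dynamic data, garbage collection, query re-evaluation, and listener notification.
 * Not the Streaming IRIS API; the flood rule is hand-coded instead of Datalog.
 */
final class MiniStreamReasoner {
    private final Map<String, String> drainsInto;      // static: river -> area it drains into
    private final Map<String, Double> levelThreshold;  // static: river -> critical water level
    private final double heavyRain;                    // static: rule parameter
    private final long windowMillis;                   // size of the time window
    private final List<Measurement> window = new ArrayList<>();
    private final List<Consumer<List<String>>> listeners = new ArrayList<>();

    MiniStreamReasoner(Map<String, String> drainsInto, Map<String, Double> levelThreshold,
                       double heavyRain, long windowMillis) {
        this.drainsInto = drainsInto;
        this.levelThreshold = levelThreshold;
        this.heavyRain = heavyRain;
        this.windowMillis = windowMillis;
    }

    void register(Consumer<List<String>> listener) {
        listeners.add(listener);
    }

    /** Called for every new measurement (assumes measurements arrive in timestamp order). */
    void onMeasurement(Measurement m) {
        window.add(m);
        // Garbage collection: drop data that has left the time window.
        window.removeIf(old -> old.timeMillis() <= m.timeMillis() - windowMillis);
        // Re-run the registered query over static knowledge plus the current window.
        List<String> result = areasAtRisk();
        // Publish the new answer to every interested listener.
        listeners.forEach(listener -> listener.accept(result));
    }

    /** atRisk(A) if waterLevel(R) > threshold(R), R drains into A, and precipitation(A) > heavyRain. */
    private List<String> areasAtRisk() {
        TreeSet<String> areas = new TreeSet<>();
        for (Measurement level : window) {
            if (!level.kind().equals("waterLevel")) {
                continue;
            }
            Double threshold = levelThreshold.get(level.location());
            String area = drainsInto.get(level.location());
            if (threshold == null || area == null || level.value() <= threshold) {
                continue;
            }
            for (Measurement rain : window) {
                if (rain.kind().equals("precipitation") && rain.location().equals(area)
                        && rain.value() > heavyRain) {
                    areas.add(area);
                }
            }
        }
        return new ArrayList<>(areas);
    }
}

class FloodDemo {
    public static void main(String[] args) {
        MiniStreamReasoner reasoner = new MiniStreamReasoner(
                Map.of("riverA", "valleyA"), Map.of("riverA", 5.0), 20.0, 3_600_000L);
        reasoner.register(areas -> System.out.println("at risk: " + areas));
        reasoner.onMeasurement(new Measurement("waterLevel", "riverA", 5.8, 0L));             // at risk: []
        reasoner.onMeasurement(new Measurement("precipitation", "valleyA", 25.0, 600_000L));  // at risk: [valleyA]
        reasoner.onMeasurement(new Measurement("precipitation", "valleyA", 2.0, 4_000_000L)); // at risk: []
    }
}
```

The third measurement arrives more than an hour after the water-level reading. The garbage collector therefore removes that reading and the warning disappears, even though no new water-level value has arrived. Whether this behaviour is desired depends on the chosen window size.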


Figure 3.1: Architecture of Streaming IRIS.

3.1.2. Language

The supported language is an important aspect of a stream reasoning engine. Since our focus is on supporting advanced reasoning capabilities, we have to use an expressive language.

We decided to use Datalog (see [34]) because of the following advantages. Datalog is a fragment of first-order logic, and its evaluation is sound and complete with respect to its semantics. In contrast to Prolog, in Datalog (which is syntactically a subset of Prolog) the order of statements does not matter. This is important when consuming input streams, where the order in which statements arrive at the system is not always deterministic. Furthermore, evaluating a Datalog program over a finite set of facts always terminates. Given these advantages, IRIS [15] is used as the basis for this project.

3.1.3. Integrated Rule Inference System (IRIS)

IRIS [15] is an open-source Datalog engine implemented in Java. Its logic-programming-based formalism, Datalog, is extended by XML Schema data types, built-in predicates, function symbols, and well-founded default negation. Different evaluation strategies can be chosen. By default, a bottom-up evaluation strategy is used. This has the disadvantage that performance can be worse than with a top-down evaluation strategy, because bottom-up evaluation derives all facts that follow from the program rather than only those relevant to a query. This disadvantage is mitigated by the use of Magic Sets [9]. IRIS offers two different storage mechanisms: the user can choose between storing the knowledge base in memory or in a relational database.
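As an illustration of bottom-up evaluation, the following Java (17) sketch computes the least fixpoint of a two-rule Datalog program by naive iteration. The rules are applied to the whole set of known facts until no new fact is derived. This is a textbook technique, not IRIS code, and it does not implement Magic Sets. The predicate names feeds and flowsInto and the river names are hypothetical. Because each round looks at the complete fact set, the result is the same in whatever order the input facts arrive. The loop terminates because only finitely many facts can be formed from the finitely many constants in the input.

```java
import java.util.HashSet;
import java.util.Set;

/** A binary ground fact such as feeds(creek, riverA) (hypothetical representation). */
record Pair(String from, String to) {}

/**
 * Naive bottom-up evaluation of the (hypothetical) Datalog program
 *   flowsInto(X, Y) :- feeds(X, Y).
 *   flowsInto(X, Z) :- flowsInto(X, Y), feeds(Y, Z).
 * Rules are applied to all known facts until a fixpoint is reached.
 */
final class NaiveBottomUp {
    static Set<Pair> flowsInto(Set<Pair> feeds) {
        Set<Pair> derived = new HashSet<>(feeds); // first rule: every feeds fact is a flowsInto fact
        boolean changed = true;
        while (changed) {
            Set<Pair> next = new HashSet<>(derived);
            for (Pair p : derived) {
                for (Pair f : feeds) {
                    if (p.to().equals(f.from())) {
                        next.add(new Pair(p.from(), f.to())); // second rule
                    }
                }
            }
            // next contains derived, so a larger size means new facts were found
            changed = next.size() > derived.size();
            derived = next;
        }
        return derived;
    }
}

class DatalogDemo {
    public static void main(String[] args) {
        Set<Pair> feeds = Set.of(
                new Pair("creek", "riverA"),
                new Pair("riverA", "riverB"),
                new Pair("riverB", "sea"));
        Set<Pair> result = NaiveBottomUp.flowsInto(feeds);
        System.out.println(result.size());                             // 6
        System.out.println(result.contains(new Pair("creek", "sea"))); // true
    }
}
```

A top-down strategy, or a Magic Sets rewriting of the program, would instead restrict derivation to the facts relevant to a given query, such as flowsInto(creek, X).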
To set these properties (evaluation strategy, storage mechanism, and so on), IRIS uses a configuration object that has to be provided when starting the reasoner.

The system has a strong focus on modularity. Additional components, such as further evaluation strategies, storage systems, or rule evaluators, can be added later.

To make this Datalog reasoner usable in a wider range of applications, it is used in two other projects. First, IRIS can be used as the reasoning engine in WSML2Reasoner [24]. As the name suggests, this is a framework for rule-based reasoning with WSML ontologies.


Further, IRIS is the underlying reasoning engine of RDFS Reasoner¹.

3.1.4. Data Management

An important issue is how to manage the amount of data in the system. Therefore, two different types of data are distinguished:
• static data
• dynamic data

Static data is permanently stored in the system. It serves as background knowledge used by the reasoner. This data never becomes outdated and is used whenever a query is executed. Reasoning over the static data can be done in advance (see [4, 37]). Dynamic data is continuously added to the system. In the flood scenario, the static data would be a model of the area, and the dynamic data would be the data about precipitation and temperature in the nearby regions.

If all dynamic data were stored in the system, the amount of data would grow constantly, and such a system would not scale with respect to storage and execution time. C-SPARQL (see Section 2.1.3) suggests two solutions to limit the amount of data:
• physical windows
• logical windows

We decided to use logical windows. Although physical (count-based) windows have the advantage that the size of the knowledge base is fixed, time-based (logical) windows seem to be the more applicable approach. In a flood prediction system, the amount of incoming precipitation data can vary, but what matters is the precipitation within a certain period.

Our system uses logical windows and supports background knowledge. In ENVISION, background knowledge is required e.g. for the landscape in the flood and landslide scenario. Logical windows are much more useful in a flood scenario than physical windows: the number of data items is not important, but all data for a certain period of time is relevant when trying to predict a flood. For detailed information about the implementation see Section 3.2.

3.1.5. Querying

Similar to the concept of continuous queries in stream processing, queries can be registered at Streaming IRIS. In a first step the rules are evaluated, and in a second step all queries are executed. Evaluating the rules first decreases the time needed to execute the queries (rule evaluation is therefore a part where performance optimization can be performed, see Section 7).

¹ http://www.iris-reasoner.org/rdfsreasoner, 2012-09-19
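Three ideas from the section above can be combined in one Java (17) sketch: permanent static data, time-windowed dynamic data, and the two-step evaluation (rules first, then queries). The class WindowedKnowledgeBase, the string encoding of facts, and the ground rule in the demo are hypothetical simplifications. Streaming IRIS delegates rule evaluation to IRIS instead of the simple fixpoint loop shown here. In the sketch, dynamic facts older than the window are removed before each evaluation, while static facts are never removed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;
import java.util.function.Predicate;

/**
 * Sketch of static vs. dynamic data and two-step querying (not the Streaming IRIS API).
 * Facts are plain strings; a rule maps the current fact set to the facts it derives.
 */
final class WindowedKnowledgeBase {
    private final Set<String> staticFacts;                           // kept for the whole run
    private final Map<String, Long> dynamicFacts = new HashMap<>();  // fact -> latest arrival time
    private final long windowMillis;                                 // logical (time-based) window
    private final List<Function<Set<String>, Set<String>>> rules;
    private final Map<String, Predicate<String>> queries = new LinkedHashMap<>();

    WindowedKnowledgeBase(Set<String> staticFacts, long windowMillis,
                          List<Function<Set<String>, Set<String>>> rules) {
        this.staticFacts = staticFacts;
        this.windowMillis = windowMillis;
        this.rules = rules;
    }

    void addDynamic(String fact, long timeMillis) {
        dynamicFacts.put(fact, timeMillis);
    }

    void registerQuery(String name, Predicate<String> query) {
        queries.put(name, query);
    }

    Map<String, Set<String>> evaluate(long nowMillis) {
        // Drop dynamic facts that fall outside the time window; static facts never expire.
        dynamicFacts.values().removeIf(t -> t <= nowMillis - windowMillis);
        Set<String> facts = new HashSet<>(staticFacts);
        facts.addAll(dynamicFacts.keySet());
        // Step 1: evaluate the rules until no new fact is derived.
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Function<Set<String>, Set<String>> rule : rules) {
                changed |= facts.addAll(rule.apply(facts));
            }
        }
        // Step 2: answer every registered query against the materialized facts.
        Map<String, Set<String>> answers = new LinkedHashMap<>();
        queries.forEach((name, query) -> {
            Set<String> matches = new TreeSet<>();
            for (String fact : facts) {
                if (query.test(fact)) {
                    matches.add(fact);
                }
            }
            answers.put(name, matches);
        });
        return answers;
    }
}

class KnowledgeBaseDemo {
    public static void main(String[] args) {
        // Hypothetical ground rule: atRisk(valleyA) :- highWater(riverA), drainsInto(riverA, valleyA).
        Function<Set<String>, Set<String>> rule = facts ->
                facts.contains("highWater(riverA)") && facts.contains("drainsInto(riverA,valleyA)")
                        ? Set.of("atRisk(valleyA)") : Set.of();
        WindowedKnowledgeBase kb = new WindowedKnowledgeBase(
                Set.of("drainsInto(riverA,valleyA)"), 3_600_000L, List.of(rule));
        kb.registerQuery("risk", fact -> fact.startsWith("atRisk("));
        kb.addDynamic("highWater(riverA)", 0L);
        System.out.println(kb.evaluate(60_000L));    // {risk=[atRisk(valleyA)]}
        System.out.println(kb.evaluate(7_200_000L)); // {risk=[]}
    }
}
```

Materializing the rules once per update means every registered query is answered by filtering an already complete fact set. This is the motivation given above for evaluating the rules before the queries.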


Public Project DeliverableDocument metadataQuality assurors and contributorsQuality assuror(s)Contributor(s)George Athansopoulos (NKUA), Joël Langlois (BRGM)All partnersVersion historyVersion Date Description0.1 20 th of May 2010 Outline0.2 1 st of June 2010 Draft of Section 20.3 10 th of June 2010 Draft of Sections 3 and 40.4 14 th of June 2010 Draft version ready for internal review1.0 30 th of June 2010 Updated based on internal reviews from George and Joël3.1.6. Publication MechanismFigure 3.2: Class diagram of Streaming IRIS.The publication system in Streaming IRIS is related to the Observer Pattern 2 . StreamingIRIS manages a list of observers for each query that is registered. For each query the results,as soon as they are available, are published to the observers. It is also possible to register thesame observer for different queries.3.2. ImplementationThe implementation of Streaming IRIS follows the decisions we described in Section 3.1.Summarized this includes the following points:• We use time-based windows to store and work with incoming data.• We generate streams as output containing the results of the queries.• We support background knowledge which is stored permanently during runtime.• We support Datalog.Section 3.2.1 gives an overview of the implementation of Streaming IRIS followed bya detailed explanations of the main interface (Section 3.2.2), the data management (Section3.2.3), the garbage collector (Section 3.2.4), the input thread (Section 3.2.5), and thepublication mechanism (Section 3.2.6).Copyright ENVISION Consortium 2009-20123.2.1. OverviewStreaming IRIS aims at providing advanced reasoning over streams combined with staticbackground knowledge. This is achieved by using timestamps for the streamed data to limitthe amount of data to reason on at every time. 
Continuous queries are used to generate results as a stream.

2 http://en.wikipedia.org/w/index.php?title=Observer_pattern&oldid=512430504, 2012-09-21
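The Observer-style publication described in Section 3.1.6 (one observer list per registered query, with the same observer allowed on several queries) can be sketched as below. The `ResultObserver` interface and `QueryPublisher` class are hypothetical names used only for illustration; in Streaming IRIS the observers are sockets identified by host and port, and queries are IRIS `IQuery` objects rather than strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical observer: in Streaming IRIS this role is played by a socket (host, port).
interface ResultObserver {
    void publish(String query, List<String> results);
}

// Sketch of a per-query observer registry with publication after each evaluation.
class QueryPublisher {
    private final Map<String, List<ResultObserver>> observers = new LinkedHashMap<>();

    void register(String query, ResultObserver observer) {
        List<ResultObserver> list = observers.computeIfAbsent(query, q -> new ArrayList<>());
        if (!list.contains(observer)) {
            list.add(observer);
        }
    }

    // Removing the last observer also drops the query, so it is no longer executed.
    void deregister(String query, ResultObserver observer) {
        List<ResultObserver> list = observers.get(query);
        if (list == null) {
            return;
        }
        list.remove(observer);
        if (list.isEmpty()) {
            observers.remove(query);
        }
    }

    boolean isRegistered(String query) {
        return observers.containsKey(query);
    }

    // Called once per evaluation: push each query's results to all of its observers.
    void publish(Map<String, List<String>> resultsPerQuery) {
        for (Map.Entry<String, List<ResultObserver>> e : observers.entrySet()) {
            List<String> results = resultsPerQuery.getOrDefault(e.getKey(), new ArrayList<>());
            for (ResultObserver o : e.getValue()) {
                o.publish(e.getKey(), results);
            }
        }
    }
}
```

Keeping the registry keyed by query means each query is evaluated once per knowledge-base change regardless of how many observers listen to it; only the cheap publication step is repeated per observer.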


Figure 3.3: Interface class IKnowledgeBase.

Figure 3.2 gives a conceptual overview of Streaming IRIS. The central component is the class KnowledgeBase, which implements the interface IKnowledgeBase (described in Section 3.2.2). The knowledge base includes an object called facts, which holds all data that forms the current knowledge base. When Streaming IRIS is started, the user has to provide a configuration object (shown on the right side of Figure 3.2). The configuration is further described in Section 3.4.1.

The left side of Figure 3.2 shows the two classes responsible for fetching new data from the input streams and adding it to the knowledge base. One thread, started together with Streaming IRIS, opens a Java socket and listens for incoming data (the port is defined in the configuration object). When a new connection to this socket is established, another thread is created that parses the new data and adds it to the knowledge base. Further information about this part can be found in Section 3.2.5.

The last component, the lower part of Figure 3.2, shows the classes needed for publishing the results of the queries. For every query registered at Streaming IRIS, a new instance of IrisOutputStreamer is created, which publishes the results of the query to all sockets registered for this query. This component is further discussed in Section 3.2.6.

3.2.2. Streaming IRIS Interface

The access point to Streaming IRIS is the interface class IKnowledgeBase (see Figure 3.3). All methods provided by this interface are described below.

void execute()
Executes all registered queries against the knowledge base and publishes the results to all registered listeners.

IRelation execute(IQuery query) throws ProgramNotStratifiedException, RuleUnsafeException, EvaluationException
Executes the given query over the current knowledge base. The results are returned, but not published. This execution is independent of the periodic execution of the registered queries, and the query is not registered at the knowledge base.


IRelation execute(IQuery query, List variableBindings) throws ProgramNotStratifiedException, RuleUnsafeException, EvaluationException
Executes the given query and returns the variable bindings in the list passed to the method. The results are returned, but not published. This execution is independent of the periodic execution of the registered queries, and the query is not registered at the knowledge base.

void registerQueryListener(IQuery query, String host, int port) throws ProgramNotStratifiedException, RuleUnsafeException, EvaluationException
Registers a query at the knowledge base. The query is then executed periodically, and its results are published to the socket defined by host and port. If the query is already registered, the observer is simply added to the listener list of this query.

void deregisterQueryListener(IQuery query, String host, int port)
Deregisters the listener for this query. If this was the only listener for this query, the query is deregistered from the knowledge base.

List getRules()
Returns the rules used by the knowledge base to perform the reasoning tasks. These rules were added to the knowledge base at initialization time.

void addFacts(Map newFacts) throws EvaluationException
Adds facts to the knowledge base while Streaming IRIS is running. A timestamp is attached to the facts, which represents the time when they become obsolete (the current time plus the time window specified in the configuration).
This method is called by the thread that listens on the input stream for incoming facts.

void shutdown()
Shuts down the reasoner and terminates all corresponding threads (all input and output threads running at that time).

void cleanKnowledgeBase() throws EvaluationException
Deletes obsolete facts from the knowledge base. When this method is called, the timestamp of each fact is compared to the current time. If the timestamp lies in the past, the fact is deleted. This method is called by the garbage collector.

3.2.3. Data Management

The most important aspect of data management is controlling the amount of data stored in the system at any time. In contrast to the unbounded amount of data in a stream, the storage capabilities are limited.

As explained before, Streaming IRIS uses two sources of input data:


• static background knowledge
• input streams

In the first case no special treatment is necessary, because the static background knowledge is already a finite set of facts and can be stored in the knowledge base. For the input streams we use physical windows (see Section 3.1.4). The configuration object of Streaming IRIS defines a time window, which indicates how long facts are stored (see Section 3.4.1). Every fact is stored together with a timestamp as soon as it is retrieved from an input stream. The timestamp is calculated by adding the specified time window to the exact point in time when the fact was retrieved. These timestamps make it possible to delete all obsolete facts before executing the queries. This is done by the garbage collector described in the next section.

3.2.4. Garbage Collector

The garbage collector is invoked every time new facts are added to the knowledge base; it calls the method IKnowledgeBase.cleanKnowledgeBase(). Every fact has a timestamp attached, which indicates when the fact becomes obsolete. The garbage collector compares this timestamp with the current time. If the timestamp lies in the past, the fact is deleted from the knowledge base; otherwise it stays. Afterwards the new facts are added to the knowledge base.

3.2.5. Input Thread

This thread listens on the input socket for new data, which is then parsed into Datalog facts and added to the knowledge base.

New data is received by the class KnowledgeBaseServer. When data arrives, a new thread, namely an instance of KnowledgeBaseServerThread, is created to fetch the data, parse it, and add it to the knowledge base (IKnowledgeBase.addFacts(Map newFacts)). As soon as new facts are added, a timestamp is attached to every fact. The timestamp is the current time plus the time window specified in the configuration of Streaming IRIS, and it indicates how long the fact is valid. When new facts are added to the knowledge base, the garbage collector deletes obsolete facts, all rules are evaluated against the new knowledge base, and the queries are executed. The results of the queries are then published as described in the next section. If the evaluation of rules and queries is still ongoing when the input thread parses new facts, the new data is buffered (together with the timestamp of its arrival) and passed to the knowledge base as soon as the evaluation is finished.

3.2.6. Publication Mechanism

Streaming IRIS enables the user to register several listeners. Each listener is defined by the query it listens to and the host and port of the concrete socket listening to this query. One socket can listen to multiple queries (the socket has to be registered separately for each query), and multiple sockets can listen to one query (the query has to be registered separately with every socket). After new facts are added to the knowledge base, every query is executed once. The results of a query are then distributed to all sockets listening to it. This is done by an output streamer. An output streamer is defined by the interface IOutputStreamer, which allows different concrete implementations to be added over time. They only have to implement this interface, which includes methods to stream data and to shut the streamer down.

The method IKnowledgeBase.registerQueryListener(IQuery query, String host, int port) is used to register a listener at Streaming IRIS, and IKnowledgeBase.deregisterQueryListener(IQuery query, String host, int port) is used to deregister one. When the latter is called, the socket is removed from the list of listeners for the specified query. If no more listeners are registered for this query, the query is deregistered and will not be executed anymore.

3.3. Evaluation

Evaluating Streaming IRIS is difficult for several reasons. No benchmark tests exist for Datalog. Existing benchmark tests could not be used as they are, although adapting them to the streaming environment would be feasible. Due to the lack of standardized Datalog programs, we had to create our own test data. For the tests we use different Datalog programs, ranging from simple to computationally expensive ones. These Datalog programs are listed in Appendix A.

The remainder of this section is structured as follows. Section 3.3.1 describes the environment used to evaluate Streaming IRIS, Section 3.3.2 presents the experiments, and Section 3.3.3 interprets the results.

3.3.1. Environment

For evaluating Streaming IRIS, we used a notebook running Windows 7 Enterprise Service Pack 1 (64 bit).
The experiments ran on an Intel(R) Core(TM) i7-2620M 2.70 GHz dual-core processor with 8 GB of memory and Java 6 Update 30. The maximum heap space of the JVM was set to 455 MB.

3.3.2. Experiments

3.3.2.1. Experiment 1

In the first experiment we use a very simple Datalog program, which was also used for evaluating IRIS (the program can be found in Listing A.2). The stream consists of facts of the form p(x), where x is an arbitrary string.

The configuration is as follows:
• Time window: 10 s
• Streaming delay: 1 s

The times shown in Figure 3.4 are averages over 10 runs. After 10 streamed elements the query answer time stabilizes. This is the desired behavior, because the defined time window holds exactly 10 streamed facts, after which the garbage collector starts to delete obsolete facts.
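The stabilisation after 10 streamed elements follows from the expiry timestamps described in Sections 3.2.3 and 3.2.4: with a 10 s window and a 1 s streaming delay, at most 10 streamed facts can be valid at once. The sketch below simulates this bookkeeping on a simulated clock. It is a hypothetical model, not Streaming IRIS code: the class name is invented, evaluation time is ignored, and it assumes a fact counts as obsolete once its expiry time has been reached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of Streaming IRIS-style fact expiry on a simulated clock.
class WindowedFactStore {
    private static final class TimedFact {
        final String fact;
        final long expiresAt; // arrival time + time window

        TimedFact(String fact, long expiresAt) {
            this.fact = fact;
            this.expiresAt = expiresAt;
        }
    }

    private final long timeWindowMillis;
    private final List<TimedFact> facts = new ArrayList<>();

    WindowedFactStore(long timeWindowMillis) {
        this.timeWindowMillis = timeWindowMillis;
    }

    // Garbage collection runs first, then the new fact is stored with its expiry timestamp.
    void addFact(String fact, long nowMillis) {
        cleanKnowledgeBase(nowMillis);
        facts.add(new TimedFact(fact, nowMillis + timeWindowMillis));
    }

    // Assumption: a fact is obsolete once its expiry time has been reached.
    void cleanKnowledgeBase(long nowMillis) {
        facts.removeIf(f -> f.expiresAt <= nowMillis);
    }

    int size() {
        return facts.size();
    }

    // Streams one fact p(x_i) every delayMillis and records the store size after each arrival.
    static List<Integer> simulate(long windowMillis, long delayMillis, int elements) {
        WindowedFactStore store = new WindowedFactStore(windowMillis);
        List<Integer> sizes = new ArrayList<>();
        for (int i = 0; i < elements; i++) {
            store.addFact("p(x" + i + ")", i * delayMillis);
            sizes.add(store.size());
        }
        return sizes;
    }
}
```

With the Experiment 1 configuration, `simulate(10_000, 1_000, 30)` yields sizes 1, 2, ..., 10 for the first ten arrivals and 10 for every arrival after that, mirroring the plateau reported in the text.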


Figure 3.4: Query answer times (x-axis: streamed element; y-axis: time [s]).

3.3.2.2. Experiment 2

In this experiment we use a Datalog program that computes the Cartesian product of the stored facts (the rule and the query are shown in Listing A.1). The input stream consists of facts of the form p(x), where x is an increasing number. The time window is set to 5 minutes and new facts are streamed every 100 ms. The test runs for 30 minutes with a total of 18,000 streamed facts. Given this configuration, at most 3,000 streamed facts can be valid at the same time, which leads to a total of 9,000,000 facts in the system (3,000 × 3,000 pairs). The presented times are averages over 5 executions with the same configuration.

The results of this experiment are shown in Figure 3.5. The time the system needs to produce an answer increases rapidly at the beginning until it becomes stable. This point marks the time when 3,000 facts had been streamed to Streaming IRIS. From then on, old facts became obsolete, so the knowledge base stayed at a constant size. In this example new facts arrived at the system much faster than the previous evaluation could finish. Therefore new facts were buffered and passed to the system as soon as the previous execution was finished. For this reason the number of received results is smaller than the number of streamed facts.

3.3.2.3. Experiment 3

This experiment tests the scalability of Streaming IRIS.
We use the simple Datalog program presented in Listing A.4. We run the experiment with 8 different configurations (see Table 3.1). These configurations ensure that exactly 10 facts are valid and stored in the knowledge base at any time (the time window is always ten times the streaming delay). The input stream produces facts for the given streaming time, and we measure the number of query answers we receive.

In the first 4 configurations shown in Figure 3.6 we receive the maximum number of results, which means that a separate result is returned for every fact streamed to the system. Starting with the 5th configuration, the number of results stays constant (at about 400). Since there are always 10 valid facts in the knowledge base, this shows that the evaluation time for these facts is about 1.5 s, and therefore no more than 400 evaluations are possible in 10 minutes (600 s / 1.5 s = 400). This does not affect the correctness of the results, because the streamed facts are buffered and each new evaluation of Streaming IRIS is done with the most recent facts.

Figure 3.5: Query answer times for the Cartesian product example (x-axis: received result [×50]; y-axis: time [s]).

Figure 3.6: Number of query answers received for different configurations, compared to the number of streamed facts (x-axis: configuration; y-axis: query answers; series: results, facts).

Configuration  Time window  Streaming delay  Streaming time
1              100 s        10 s             10 min
2              75 s         7.5 s            10 min
3              50 s         5 s              10 min
4              25 s         2.5 s            10 min
5              10 s         1 s              10 min
6              7.5 s        750 ms           10 min
7              5 s          500 ms           10 min
8              2.5 s        250 ms           10 min

Table 3.1: Configurations for Experiment 3.

3.3.2.4. Experiment 4

We run this fourth experiment to test the ability to register multiple continuous queries at Streaming IRIS. The experiment is executed with 1, 2, 3, 4 and 5 registered queries, 10 times for each of them. The average query answer times are presented in Figure 3.7. The Datalog program used for this test is shown in Listing A.3. A time window of 10 s and a streaming delay of 1 s are used.

Figure 3.7: Query answer times for different numbers of queries (x-axis: received result; y-axis: time [s]; series: 1 to 5 queries).

It can be seen that the number of queries does not significantly affect the query answer times. This can be explained by the fact that the evaluation of the rules, which is performed only once per change of the knowledge base regardless of the number of queries, consumes most of the time, whereas the evaluation of the queries is fast.

3.3.2.5. Experiment 5

The fifth and last experiment investigates the behaviour of Streaming IRIS when multiple listeners are registered for one query. The time window is again 10 s with a streaming delay of 1 s. As in Experiments 1 and 3, we initialize Streaming IRIS with the Datalog program presented in Listing A.2.

The query answer times for 1 to 5 registered listeners are presented in Figure 3.8 (again, the times are averages over 10 runs for each configuration). The time span is measured from the moment the fact arrives at the input socket of Streaming IRIS until the results are streamed to each listener. The figure shows that the number of registered listeners does not significantly influence the query answer times.

Figure 3.8: Query answer times for different numbers of listeners (x-axis: received result; y-axis: time [s]; series: 1 to 5 listeners).

3.3.3. Interpretation

The evaluation shows that Streaming IRIS is best suited for “slow” streams on which complex reasoning has to be performed. Nevertheless, thanks to the buffering of incoming data, it also scales to streams with higher data rates. In this case Streaming IRIS has to be configured carefully by defining an appropriate time window. “Fast” streams have to be handled with caution: if an evaluation takes longer than the time window, some facts will never appear in the results. The long-running tests demonstrate the stability of Streaming IRIS.
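The buffering behaviour behind this interpretation, and behind the plateau of about 400 answers in Experiment 3, can be reproduced with a small discrete-time model. It is a hypothetical sketch, not the Streaming IRIS implementation: the evaluation time is assumed constant, facts arriving during an evaluation are handed over in one batch when it finishes, and a buffered fact whose expiry time has already been reached is counted as never reaching the results.

```java
// Hypothetical discrete-time model of input buffering (not the Streaming IRIS implementation).
final class BufferingModel {
    final long evaluations; // number of rule/query evaluations, i.e. result batches
    final long missedFacts; // facts already obsolete when handed to the knowledge base

    private BufferingModel(long evaluations, long missedFacts) {
        this.evaluations = evaluations;
        this.missedFacts = missedFacts;
    }

    // delayMs: streaming delay, windowMs: time window,
    // evalMs: assumed constant evaluation time, streamMs: total streaming time.
    static BufferingModel simulate(long delayMs, long windowMs, long evalMs, long streamMs) {
        long arrivals = streamMs / delayMs; // facts arrive at 0, delay, 2 * delay, ...
        long next = 0;                      // index of the next fact not yet handed over
        long idleAt = 0;                    // time at which the reasoner becomes idle
        long evaluations = 0;
        long missed = 0;
        while (next < arrivals) {
            // Start as soon as the reasoner is idle and at least one fact is waiting.
            long start = Math.max(idleAt, next * delayMs);
            // Hand over every buffered fact that has arrived by the start time.
            while (next < arrivals && next * delayMs <= start) {
                if (next * delayMs + windowMs <= start) {
                    missed++; // expired before evaluation: removed by the garbage collector
                }
                next++;
            }
            evaluations++;
            idleAt = start + evalMs;
        }
        return new BufferingModel(evaluations, missed);
    }
}
```

With a 1 s delay, a 10 s window and an assumed 1.5 s evaluation time over 10 minutes, the model yields roughly 400 evaluations and no missed facts; with a 10 s delay it evaluates every one of the 60 facts. If the evaluation time exceeds the window (for example a 5 s evaluation with a 2 s window), buffered facts expire before they are ever evaluated.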


3.4. User Guide

This section explains how to use Streaming IRIS. Streaming IRIS is open source software. The source code can be downloaded³, or the Maven⁴ artifacts, which are deployed at http://maven.sti2.at/archiva/repository/external, can be used.

3.4.1. Configuration

Streaming IRIS uses a configuration object that defines all necessary parameters. The configuration of Streaming IRIS is the same as for IRIS (see the IRIS user guide⁵), but it needs two additional parameters:
• inputPort: defines the port on which Streaming IRIS opens a socket and listens for facts.
• timeWindowMilliseconds: defines the time window, i.e. how long the facts coming from the input stream are valid. The value is given in milliseconds; the default value is 30,000 (30 s).

3.4.2. Initialization

The initialization of Streaming IRIS is similar to the initialization of IRIS. A new IKnowledgeBase object is created, which requires three parameters:
• inputFacts: these facts represent the background knowledge. Their validity is unlimited.
• rules: the rules registered in the knowledge base. Rules can only be defined during initialization; afterwards no rules can be added or deleted.
• configuration: the configuration object containing all parameters needed by Streaming IRIS.

During the initialization of Streaming IRIS a thread for consuming the input is created (for more details about the input thread see Section 3.2.5).

3.4.3. Runtime

While Streaming IRIS is running, the user can perform several requests:
• Register a query: by calling IKnowledgeBase.registerQueryListener(IQuery query, String host, int port), the query is registered at the knowledge base and included in the next execution. The host and port define the socket to which the results of this query are sent.
• Deregister a query: a query can be deregistered by calling IKnowledgeBase.deregisterQueryListener(IQuery query, String host, int port). From the next execution on, the results of this query are no longer sent to that socket (if no more sockets are listening to this query, the query is not executed anymore).

3 https://iris-reasoner.svn.sourceforge.net/svnroot/iris-reasoner/streaming-iris
4 http://maven.apache.org/
5 http://www.iris-reasoner.org/documentation, 2012-08-12


Streaming IRIS listens on the input socket for new facts. As soon as new facts are available, they are added to the knowledge base. While adding them, a timestamp is created by taking the current time and adding the time window (defined in the configuration). The timestamp added to the facts therefore defines the point in time when the facts become obsolete. Afterwards the rules are evaluated against the new knowledge base. Provided that at least one query is registered and one listener is added, the queries are executed periodically and the results of each query are sent to all corresponding listeners.

3.5. Restrictions

The current implementation of Streaming IRIS has the following restrictions:
• Evaluation strategy: whereas IRIS supports multiple evaluation strategies, Streaming IRIS only supports stratified bottom-up evaluation [33] with a naive or semi-naive approach.
• Storage: Streaming IRIS only supports in-memory storage, whereas IRIS also allows using a relational database. Storing the data on the hard disk has the advantage that the reasoner is not limited by the size of main memory. We decided to support in-memory storage for performance reasons, since it leads to faster execution times when evaluating rules and queries.

The current restrictions are considered possible improvements for upcoming versions (see Section 8).
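For readers unfamiliar with the semi-naive strategy mentioned in the first restriction, the sketch below evaluates the textbook recursive program `path(X, Y) :- edge(X, Y).` and `path(X, Z) :- path(X, Y), edge(Y, Z).` bottom-up. In every round it joins only the facts derived in the previous round (the delta) with `edge`, instead of re-joining the whole `path` relation as naive evaluation does. It is a generic illustration of the technique with invented class and method names, not IRIS code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Generic semi-naive bottom-up evaluation of transitive closure (illustration only, not IRIS code).
class SemiNaiveClosure {
    // Each fact is a pair (from, to) encoded as a two-element list.
    static Set<List<String>> path(Set<List<String>> edges) {
        Set<List<String>> path = new HashSet<>(edges);  // path(X, Y) :- edge(X, Y).
        Set<List<String>> delta = new HashSet<>(edges); // facts that were new in the last round
        while (!delta.isEmpty()) {
            Set<List<String>> next = new HashSet<>();
            // path(X, Z) :- path(X, Y), edge(Y, Z), joining only the delta of path.
            for (List<String> p : delta) {
                for (List<String> e : edges) {
                    if (p.get(1).equals(e.get(0))) {
                        List<String> derived = Arrays.asList(p.get(0), e.get(1));
                        if (!path.contains(derived)) {
                            next.add(derived);
                        }
                    }
                }
            }
            path.addAll(next);
            delta = next; // fixpoint reached when a round derives nothing new
        }
        return path;
    }

    static List<String> edge(String from, String to) {
        return Arrays.asList(from, to);
    }
}
```

For the chain a → b → c → d the result contains the 3 edges plus the derived pairs (a, c), (b, d) and (a, d); cycles also terminate, because a round that only re-derives known facts produces an empty delta.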


4. WSML2Reasoner

Krummenacher et al. [24] describe the WSML2Reasoner framework. This framework aims at providing a highly modular system that allows different reasoners to be used by transforming the input, which is in WSML [18] format, into the format used by the reasoner. Several reasoning engines are supported, including IRIS and ELLY¹.

We extended the WSML2Reasoner framework and added support for Streaming IRIS. In this section we only describe the newly added functionality; general information on WSML2Reasoner can be found in the user guide section of the official website².

Section 4.1 shows the integration of Streaming IRIS in WSML2Reasoner, followed by Section 4.2, which describes how to use Streaming IRIS in the WSML2Reasoner framework.

4.1. Architecture and Implementation

A conceptual overview of the integration of Streaming IRIS in WSML2Reasoner is given in Figure 4.1. WSML2Reasoner offers an interface to work with Streaming IRIS as the underlying reasoning engine, namely the StreamingLPReasoner interface, shown in Figure 4.2. The main difference from the interfaces of the other reasoning engines lies in the methods for registering and deregistering query listeners.
In the following, we describe the flow of data through the system.

Similar to the concept in Streaming IRIS, as soon as WSML2Reasoner is initialized with Streaming IRIS as the reasoning engine, a socket is opened that waits for input data. Every time data is sent to this socket, a new thread is created that transforms the input data from WSML to Datalog and sends it to the internal socket of Streaming IRIS (described in Section 3.2.5). For each query registered at WSML2Reasoner (via the previously mentioned interface), one socket is created that listens for the results of this query. An output streamer (similar to the mechanism in Streaming IRIS described in Section 3.2.6) translates the results from Datalog to WSML and distributes them to each listener registered for this query.

The next section shows an example of how WSML2Reasoner has to be configured in order to use the new functionality.

4.2. Usage

The usage of WSML2Reasoner is shown in Listing 4.1. The most important part is the initialization of the reasoner (lines 4-9), where the type of the desired reasoner is defined (STREAMING_IRIS_STRATIFIED). If a specific port is desired on which the WSML2Reasoner framework listens for inputs, it has to be defined in the configuration object that is passed to the framework together with the ontology when starting the reasoner (lines 11-16). The rest of the code snippet shows how a query listener is registered at the system, which then retrieves the results of the specified query (lines 18-28).

Figure 4.1: Usage of Streaming IRIS in WSML2Reasoner.

Figure 4.2: StreamingLPReasoner interface.

1 http://elly.sourceforge.net/, 2012-09-17
2 http://tools.sti-innsbruck.at/wsml2reasoner/user_guide, 2012-09-15


1  // load the ontology
2  Ontology exampleOntology = loadOntology("exampleOntology.wsml");
3
4  // initialize a reasoner
5  Map<String, Object> params = new HashMap<String, Object>();
6  params.put(WSMLReasonerFactory.PARAM_BUILT_IN_REASONER,
7          WSMLReasonerFactory.BuiltInReasoner.STREAMING_IRIS_STRATIFIED);
8  StreamingLPReasoner reasoner = DefaultWSMLReasonerFactory.getFactory()
9          .createStreamingFlightReasoner(params);
10
11 // create the configuration
12 Map<String, Object> configuration = new HashMap<String, Object>();
13 configuration.put("inputPort", 45821);
14
15 // start the reasoner
16 reasoner.startReasoner(exampleOntology, configuration);
17
18 // create a query
19 String queryString = "?x memberOf ?y";
20 LogicalExpression query = new WsmlLogicalExpressionParser(
21         exampleOntology).parse(queryString);
22
23 // register query listener
24 ServerSocket server = new ServerSocket(0);
25 Wsml2ReasonerListener wsml2ReasonerListener =
26         new Wsml2ReasonerListener(server);
27 wsml2ReasonerListener.start();
28 reasoner.registerQueryListener(query, "localhost", server.getLocalPort());

Listing 4.1: Example of how to use WSML2Reasoner with Streaming IRIS.
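Listing 4.1 relies on a `Wsml2ReasonerListener` thread that owns the `ServerSocket` whose port is passed to `registerQueryListener`; its implementation is not shown here. The following hypothetical stand-in (`ResultListener`, an invented name) assumes that results arrive as newline-separated text over a single connection and simply collects them; the actual wire format used by the output streamer may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Wsml2ReasonerListener.
// Assumption: results arrive as newline-separated text over a single connection.
class ResultListener extends Thread {
    private final ServerSocket server;
    private final List<String> received = new ArrayList<>();

    ResultListener(ServerSocket server) {
        this.server = server;
    }

    @Override
    public void run() {
        try (Socket socket = server.accept();
             BufferedReader in = new BufferedReader(new InputStreamReader(
                     socket.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                synchronized (received) {
                    received.add(line); // a real listener would process each result here
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<String> results() {
        synchronized (received) {
            return new ArrayList<>(received);
        }
    }

    // Local round trip: a client socket stands in for the reasoner's output streamer.
    static List<String> roundTrip(List<String> lines) {
        try (ServerSocket server = new ServerSocket(0)) { // port 0: any free port, as in Listing 4.1
            ResultListener listener = new ResultListener(server);
            listener.start();
            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
                 PrintWriter out = new PrintWriter(new OutputStreamWriter(
                         client.getOutputStream(), StandardCharsets.UTF_8), true)) {
                for (String line : lines) {
                    out.println(line);
                }
            }
            listener.join(5_000); // closing the client ends the listener's read loop
            return listener.results();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Binding the server socket to port 0, as Listing 4.1 does, lets the operating system pick a free port, which is then reported to the reasoner through `getLocalPort()`.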


5. Sparkwave

5.1. Architecture

Sparkwave is a solution for continuous schema-enhanced pattern matching over RDF data streams. More precisely, the goal of Sparkwave is to provide efficient pattern matching on RDF streams in a truly continuous way, enabling the expression of temporal constraints in the form of time windows and taking RDF schema entailments into account. Unlike other existing solutions, Sparkwave meets this goal by building on an efficient pattern matching algorithm: Rete [20]. The Rete algorithm, originally designed as a solution for production rule systems, represents a general approach to many-pattern/many-object situations. The algorithm trades memory for performance by building comprehensive memory structures, called α- and β-networks, which check intra-pattern and inter-pattern conditions, respectively, over a set of objects.

The intrinsic dataflow nature of Rete also makes it well suited for data streams. However, in order to meet the goal, the basic Rete algorithm needs to be extended to properly address RDF schema entailments and the temporal requirements of data stream processing. Sparkwave resolves the first issue by extending Rete with an additional network called the ε-network. The ε-network is positioned in front of the normal Rete network and is responsible for generating triples following schema entailments. The ε-network nodes are connected to the appropriate α-network nodes in a dataflow style.
Regarding the second issue, Sparkwave provides support for time windows. This is realised by extending the functionalities of β-network nodes, which are responsible for checking whether partial or complete pattern matches fall within the scope of the designated time window. A high-level overview of the Sparkwave architecture is presented in Figure 5.1. More information on the ɛ-network functionality and on how the network is built, as well as a description of time window support in β-network nodes, is given in [22].

5.2. Sparkwave Pattern Language

Sparkwave patterns are specified in a language that closely follows the SPARQL syntax. As presented in Listing 5.1, the only significant deviation is the addition of a handlers section in the prologue of a query. Handlers define how to respond to a pattern match, i.e., a handler is triggered when Sparkwave identifies a match. Writing a handler is as easy as implementing a Java interface, and so far the following handlers have been implemented:
• Console handler prints out matches to a console
• File handler logs matches to a file
• REST service handler writes matches to a URL following the REST communication style
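To illustrate how a match could reach the handlers listed above, the Java sketch below instantiates a CONSTRUCT-style template with the variable bindings of a match and passes the resulting triples to every registered handler. The MatchHandler interface, the PrintingHandler class, and the MatchDispatcher helper are hypothetical; they mirror the idea of "implementing a Java interface" but do not reproduce Sparkwave's actual handler API.

```java
import java.util.*;

// Hypothetical handler contract; Sparkwave's real Java interface may differ.
interface MatchHandler {
    void onMatch(List<String[]> triples);
}

// Hypothetical handler in the spirit of the console handler: prints one triple per line.
class PrintingHandler implements MatchHandler {
    public void onMatch(List<String[]> triples) {
        for (String[] t : triples) System.out.println(String.join(" ", t) + " .");
    }
}

class MatchDispatcher {
    // Replaces every ?variable of a CONSTRUCT-style template with its bound value.
    static List<String[]> instantiate(List<String[]> template, Map<String, String> bindings) {
        List<String[]> result = new ArrayList<>();
        for (String[] pattern : template) {
            String[] triple = new String[3];
            for (int i = 0; i < 3; i++) {
                String term = pattern[i];
                if (term.startsWith("?")) {
                    String value = bindings.get(term);
                    if (value == null) throw new IllegalArgumentException("unbound variable " + term);
                    triple[i] = value;
                } else {
                    triple[i] = term;   // IRIs and literals are copied unchanged
                }
            }
            result.add(triple);
        }
        return result;
    }

    // Builds the output triples once and hands them to every registered handler.
    static void dispatch(List<MatchHandler> handlers, List<String[]> template,
                         Map<String, String> bindings) {
        List<String[]> triples = instantiate(template, bindings);
        for (MatchHandler h : handlers) h.onMatch(triples);
    }
}
```

Keeping instantiation separate from the handlers means a file or REST handler only has to decide where the triples go, not how they are built.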


Figure 5.1: Sparkwave architecture.

The Sparkwave pattern specification presented in Listing 5.1 is not complete in the sense that it lacks definitions of the following terms: PNAME_NS, IRI_REF, string, INTEGER, var, and rdfLiteral, whose definitions can be found at (7).

An example of a Sparkwave pattern is given in Listing 5.2. The pattern constructs a report composed of six triples based on a pattern match that searches for two sensor triggers in the same area.

query = prologue, ( selectQuery | constructQuery ) ;
prologue = prefixes, handlersClause ;
prefixes = { 'PREFIX', PNAME_NS, IRI_REF } ;
handlersClause = 'HANDLERS', '{', { handlerClause }, '}' ;
handlerClause = 'HANDLER', handlerGroup ;
handlerGroup = '{', { keyValuePair }, '}' ;
keyValuePair = string, '=', string ;
constructQuery = 'CONSTRUCT', constructTemplate, whereClause ;
constructTemplate = '{', { constructTriples }, '}' ;
constructTriples = triple, { '.', triple }, [ '.' ] ;
whereClause = 'WHERE', groupGraphPattern ;
groupGraphPattern = '{', groupGraphPatternSub, '}' ;
groupGraphPatternSub = triplesBlock, { groupGraphPatternSubDetail }
                     | { groupGraphPatternSubDetail } ;
groupGraphPatternSubDetail = graphPatternNotTriples, [ '.' ], { triplesBlock } ;
graphPatternNotTriples = timewindow | filter ;
timewindow = 'TIMEWINDOW', '(', INTEGER, ')' ;
filter = 'FILTER', constraint ;
triplesBlock = triple, { '.', triple }, [ '.' ] ;
triple = varOrTerm, varOrTerm, varOrTerm ;
varOrTerm = var | graphTerm ;
graphTerm = IRI_REF | rdfLiteral ;
Listing 5.1: Specification of the Sparkwave query language.
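As an illustration of two rules from Listing 5.1, keyValuePair and timewindow, the Java sketch below extracts the handler parameters and the TIMEWINDOW value from a pattern string using regular expressions. PatternScanner is a hypothetical helper, not Sparkwave's parser: it treats string as a double-quoted literal, ignores nesting and escaping, assumes a CONSTRUCT query follows the prologue, and reads only the first TIMEWINDOW clause, so it only serves to show what the two rules accept.

```java
import java.util.*;
import java.util.regex.*;

// Hypothetical helper extracting two constructs of the Sparkwave pattern language.
class PatternScanner {
    // keyValuePair = string, '=', string ;  (string assumed to be a double-quoted literal)
    private static final Pattern KEY_VALUE =
            Pattern.compile("\"([^\"]*)\"\\s*=\\s*\"([^\"]*)\"");
    // timewindow = 'TIMEWINDOW', '(', INTEGER, ')' ;
    private static final Pattern TIME_WINDOW =
            Pattern.compile("TIMEWINDOW\\s*\\(\\s*(\\d+)\\s*\\)");

    // Collects every key/value pair found between HANDLERS and CONSTRUCT.
    static Map<String, String> handlerParameters(String query) {
        int start = query.indexOf("HANDLERS");
        int end = query.indexOf("CONSTRUCT");
        if (start < 0 || end < start) return Map.of();
        Map<String, String> params = new LinkedHashMap<>();
        Matcher m = KEY_VALUE.matcher(query.substring(start, end));
        while (m.find()) params.put(m.group(1), m.group(2));
        return params;
    }

    // Returns the TIMEWINDOW length, if the pattern declares one.
    static OptionalLong timeWindow(String query) {
        Matcher m = TIME_WINDOW.matcher(query);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }
}
```

Applied to the handler declaration of Listing 5.2, this would yield the pairs class = at.sti2.spark.handler.ConsoleHandler and verbose = true, and a time window of 100.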


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX support: <http://www.foi.se/support/wp4demo#>

HANDLERS {
  HANDLER {
    "class" = "at.sti2.spark.handler.ConsoleHandler" "verbose" = "true"
  }
}

CONSTRUCT {
  support:ReportX rdf:type support:Report .
  support:ReportX support:report_based_on_service ?sensor1 .
  support:ReportX support:report_based_on_service ?sensor2 .
  support:ReportX support:reports_about_event ?detection1 .
  support:ReportX support:reports_about_event ?detection2 .
  support:ReportX support:report_generated_by_service support:StreamProcessor
} WHERE {
  ?detection1 support:has_status "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
  ?detection1 support:has_sensor ?sensor1 .
  ?sensor1 rdf:type support:PETSensor .
  ?sensor1 support:sensor_has_location ?location1 .
  ?location1 support:location_is_part_of_location support:DockX .
  ?detection2 support:has_status "true"^^<http://www.w3.org/2001/XMLSchema#boolean> .
  ?detection2 support:has_sensor ?sensor2 .
  ?sensor2 rdf:type support:PATSensor .
  ?sensor2 support:sensor_has_location ?location2 .
  ?location2 support:location_is_part_of_location support:DockX .
  TIMEWINDOW (100)
}
Listing 5.2: An example of a Sparkwave pattern.
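Listing 5.2 only fires when both detections fall within TIMEWINDOW (100). This check is performed by the extended β-network nodes described in Section 5.1, and the Java sketch below is a simplified illustration of it, not Sparkwave's code. It assumes that each partial match records the timestamps of its earliest and latest triple, that timestamps are long values in the same unit as the TIMEWINDOW argument, and that triples arrive in timestamp order; PartialMatch and WindowedBetaNode are hypothetical names. A join is accepted only if shared variables agree and the combined match spans at most the window, and left-hand matches that can no longer satisfy the window are pruned.

```java
import java.util.*;

// Hypothetical partial match: variable bindings plus the time span it covers.
record PartialMatch(Map<String, String> bindings, long start, long end) {

    // Combines two matches if their shared variables agree; returns null otherwise.
    PartialMatch join(PartialMatch other) {
        Map<String, String> merged = new HashMap<>(bindings);
        for (Map.Entry<String, String> e : other.bindings().entrySet()) {
            String prev = merged.putIfAbsent(e.getKey(), e.getValue());
            if (prev != null && !prev.equals(e.getValue())) return null;  // conflicting binding
        }
        return new PartialMatch(merged, Math.min(start, other.start()), Math.max(end, other.end()));
    }
}

// Hypothetical beta node extended with a time window check.
class WindowedBetaNode {
    final long window;                                    // TIMEWINDOW value
    final List<PartialMatch> leftMemory = new ArrayList<>();

    WindowedBetaNode(long window) { this.window = window; }

    void addLeft(PartialMatch m) { leftMemory.add(m); }

    // Joins an incoming right-hand match against the stored left-hand matches.
    List<PartialMatch> activateRight(PartialMatch right) {
        // With in-order arrival, left matches older than the window can never match again.
        leftMemory.removeIf(l -> right.end() - l.start() > window);
        List<PartialMatch> results = new ArrayList<>();
        for (PartialMatch left : leftMemory) {
            PartialMatch joined = left.join(right);
            if (joined != null && joined.end() - joined.start() <= window) {
                results.add(joined);
            }
        }
        return results;
    }
}
```

Pruning on arrival keeps the β-memory bounded by the window, which is the main reason to enforce the window inside the network rather than filtering complete matches afterwards.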


6. Integration in ENVISION

The ENVISION [28] project targets non-ICT-skilled users and aims at providing them with an infrastructure for the semantic discovery and adaptive chaining and composition of environmental services.

Figure 6.1: Overall architecture of the stream processing part of the ENVISION project.

Figure 6.1 shows the overall architecture of the stream processing part of the ENVISION platform. Heterogeneous sensor data is processed by the Event Processing Service (EPS), which looks for event patterns in the incoming stream. The derived events are then further processed by the Stream-based Reasoning Engine, which tries to find more complex events based on the data from the EPS. Event notifications are sent to the users if specific events are produced. Additionally, the event patterns that the EPS and the Stream-based Reasoning Engine look for are updated and extended by analysing historical data.

The next section (Section 6.1) describes how Streaming IRIS and the WSML2Reasoner framework are integrated in ENVISION, followed by Section 6.2, which sketches the possible integration of Sparkwave into the overall architecture. The last section (Section 6.3) compares the different uses of Streaming IRIS and WSML2Reasoner in conjunction with EPS and EnStreaM.
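The event flow of Figure 6.1 (simple events from the EPS, a reasoning step that derives complex events, and notifications to users) can be modelled in a few lines. The Java sketch below is a hypothetical simplification, not the ENVISION interfaces: the Event record, the NotificationBroker class, the needsReasoning predicate that decides whether the reasoner is involved, and the modelling of the reasoner as a plain function are all assumptions made for illustration. It also assumes that listeners receive simple events as well as complex ones, and that each distinct complex event is reported only once.

```java
import java.util.*;
import java.util.function.*;

// Hypothetical event: a type name and the sensor or component that produced it.
record Event(String type, String source) {}

// Hypothetical broker: receives EPS events, consults the reasoner when needed,
// and notifies subscribed listeners. Not the ENVISION Notification Broker API.
class NotificationBroker {
    private final Predicate<Event> needsReasoning;               // routing decision
    private final Function<List<Event>, List<Event>> reasoner;   // derives complex events
    private final List<Consumer<Event>> listeners = new ArrayList<>();
    private final List<Event> reasonerInput = new ArrayList<>(); // simple events sent so far
    private final Set<Event> emitted = new HashSet<>();          // complex events already reported

    NotificationBroker(Predicate<Event> needsReasoning,
                       Function<List<Event>, List<Event>> reasoner) {
        this.needsReasoning = needsReasoning;
        this.reasoner = reasoner;
    }

    void subscribe(Consumer<Event> listener) { listeners.add(listener); }

    // Called for every "simple" event produced by the EPS.
    void publish(Event simple) {
        deliver(simple);                            // assumption: simple events are also delivered
        if (!needsReasoning.test(simple)) return;   // reasoner bypassed for this event
        reasonerInput.add(simple);
        for (Event complex : reasoner.apply(List.copyOf(reasonerInput))) {
            if (emitted.add(complex)) deliver(complex);  // report each complex event once
        }
    }

    private void deliver(Event e) {
        for (Consumer<Event> l : listeners) l.accept(e);
    }
}
```

Because the reasoner is only reachable through the predicate, it can be swapped or bypassed without touching the listeners, which mirrors the modularity argument made for the Notification Broker in Section 6.1.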


6.1. WSML2Reasoner

WSML2Reasoner is used in ENVISION to generate more complex events based on the event instances produced by the EPS. WSML2Reasoner is used in the component Semantic Reasoner in Figure 6.1. The sensor data is retrieved by the EPS, which generates "simple" events. These simple events are processed by the Notification Broker and forwarded (if needed) to the Stream Reasoner. Internally, the Stream Reasoner runs WSML2Reasoner with Streaming IRIS as the reasoning engine. Based on the rules stored in WSML2Reasoner, "complex" events are generated and again processed by the Notification Broker, which pushes the events to the listeners.

The system is highly modular, so the reasoner is completely separated from the Notification Broker. This allows the Notification Broker to decide whether the reasoner is needed or not.

6.2. Sparkwave

Sparkwave can be located right after the sensors. Its purpose is to filter the sensor data and convert this heterogeneous data into a uniform syntax.

To create a uniform syntax, an overlay ontology would have to be created that can handle all types of sensor data, mapping the specific sensor readings to this uniform syntax. Sparkwave retrieves all sensor data from the different sources and, in a first step, keeps only the important data. As a result, the stream coming to the EPS is more homogeneous and easier to handle. This filtering is done by patterns describing the desired data, which Sparkwave searches for in the stream.
In a second step, these patterns are translated, using the overlay ontology, into a uniform syntax. The advantage is that the EPS only has to understand this uniform syntax and thus does not have to be adapted when new streams (with different vocabularies) are added to the system. In this case only a new pattern has to be added to Sparkwave.

6.3. Approach Differentiation

As described above, the overall stream processing functionality in ENVISION is delivered by three core components, namely:
1. EnStreaM [27], a system that enables the management of semantic data streams and stream ontologies;
2. Event Processing Service (EPS) [7], a system that allows basic event processing of observation data;
3. Stream-based reasoner for advanced reasoning with environmental data streams.

The Stream-based reasoner component, described in this deliverable, relies on Streaming IRIS and on an extension of WSML2Reasoner that accommodates Streaming IRIS. Additionally, Sparkwave can be used for fast processing of data streams. In the rest of this section we describe the different requirements and settings for using Streaming IRIS and WSML2Reasoner in conjunction with EPS and then with EnStreaM.


The integration of Streaming IRIS/WSML2Reasoner with EPS serves to generate more complex events based on the event instances produced by the EPS. In this case Streaming IRIS/WSML2Reasoner is used as a stream-based reasoner that generates new knowledge on the fly (i.e., "complex" environmental events) by evaluating the rules registered within it, given the static domain knowledge and the dynamic environmental stream data.

In the case of the integration with EnStreaM, Streaming IRIS/WSML2Reasoner is used to enrich the historical data that is produced and managed by EnStreaM. Streaming IRIS/WSML2Reasoner is provided as a service, which EnStreaM accesses and invokes remotely. Several transformations are needed between the internal knowledge and rule formats required by Streaming IRIS/WSML2Reasoner, namely Datalog and WSML, and the formats produced by EnStreaM, namely RuleML and RDF. Being based on the IRIS reasoner, Streaming IRIS/WSML2Reasoner can be used not only to process data streams but also to perform "standard" rule-based reasoning, which is what EnStreaM uses in this case to enrich historical data.
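One of the transformations mentioned above, from RDF as produced by EnStreaM to Datalog facts consumed by Streaming IRIS, can be pictured with the following Java sketch. The mapping chosen here (one binary fact per triple, named after the local part of the predicate IRI, with subject and object written as single-quoted string constants) is an illustrative assumption; the deliverable does not specify the mapping used in ENVISION, and RdfToDatalog is a hypothetical class name.

```java
import java.util.*;

// Hypothetical converter from RDF triples to Datalog facts (illustrative mapping only).
class RdfToDatalog {
    // Uses the local part of the predicate (after the last '#', '/' or ':') as the relation name.
    static String relationName(String predicateIri) {
        int cut = Math.max(predicateIri.lastIndexOf('#'),
                  Math.max(predicateIri.lastIndexOf('/'), predicateIri.lastIndexOf(':')));
        return predicateIri.substring(cut + 1);
    }

    // Writes a term as a single-quoted constant; assumes terms contain no single quotes.
    static String constant(String term) {
        return "'" + term + "'";
    }

    // (s, p, o)  ->  p(s, o).
    static String toFact(String s, String p, String o) {
        return relationName(p) + "(" + constant(s) + ", " + constant(o) + ").";
    }

    static List<String> toFacts(List<String[]> triples) {
        List<String> facts = new ArrayList<>();
        for (String[] t : triples) facts.add(toFact(t[0], t[1], t[2]));
        return facts;
    }
}
```

Predicate-per-relation facts keep the generated rules readable. The alternative of a single ternary triple(s, p, o) relation avoids name clashes between vocabularies, at the cost of larger joins.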


7. Conclusion

In this deliverable we have described the work done in ENVISION to support reasoning over streams of data coming from various environmental sources (e.g., sensors measuring water levels, weather, etc.). The requirements, design, and implementation of the proposed approach were derived from and driven by the ENVISION use cases, which focus on landslide hazard assessment, environmental pollution (oil spills), and flood monitoring. In these scenarios the amount of data is not extremely large and the update rates are not very high. Furthermore, streaming data has to be processed according to a set of non-trivial, domain-specific rules. An advanced rule-based reasoning capability over streams of data is therefore required.

Our approach, called Streaming IRIS, supports expressive rule-based reasoning and performs reasoning tasks continuously over the streams of environmental data. In our approach, the background knowledge (i.e., knowledge of the domain) and the rules used for reasoning do not change (or at least change very slowly) and are given at initialization time. The data that passes through Streaming IRIS changes continuously (i.e., new data comes from the sensors), and the queries registered with the reasoner are answered continuously. This means that every time new data is available to the system, answers are produced based on the static knowledge and the currently available dynamic knowledge.
This deliverable described the architectural decisions, implementation, usage, and restrictions of the Streaming IRIS reasoner. The system is scalable in terms of the number of queries to be answered and the increasing amount of data passing through. Streaming IRIS natively uses Datalog as the language for representing data and rules. However, by using the WSML2Reasoner implementation, also described in this deliverable, Streaming IRIS can be used as a WSML reasoner that accepts data and rules in WSML. Streaming IRIS is integrated with other stream processing components developed in WP4, namely EnStreaM and the Event Processing Service, which use it for the enrichment of historical data and the creation of more complex environmental events, respectively.

While Streaming IRIS supports advanced, rule-based reasoning over streams of environmental data, Sparkwave, another component developed by us and described in this deliverable, addresses the need for a system capable of pattern matching and light-weight reasoning (i.e., RDFS and a few OWL constructs) over streams of data. The goal of Sparkwave is to provide efficient pattern matching functionalities on RDF streams in a truly continuous way, enabling the expression of temporal constraints in the form of time windows and taking into account RDF schema entailments. From an integration perspective, Sparkwave can be used as a filtering system in front of Streaming IRIS for high-throughput streaming data.


8. Future Work

In Streaming IRIS, major improvements can be achieved by working on the evaluation of the rules. It should be possible to improve the performance of the evaluation by applying "intelligent" evaluation strategies. All rules can still be evaluated as soon as Streaming IRIS is started, but the re-evaluation when new facts are added can be changed to a partial evaluation, depending on which rules are affected by the new facts. In addition, query evaluation can be improved by finding common parts in the different queries.

As stated in Section 3.5, several restrictions exist in the current implementation. Streaming IRIS can be improved by implementing support for further evaluation strategies. Currently a bottom-up strategy is used with naive or semi-naive evaluation. As in IRIS, top-down evaluation strategies (SLDNF, OLDT) would be desirable, which would widen the range of applications for Streaming IRIS.

To be able to handle very large background knowledge (too big for main memory), support for relational databases as a storage medium can be implemented. To this end, the Maven component responsible for RDB support in IRIS can be adapted, which would improve the scalability of Streaming IRIS.

Besides the improvements in Streaming IRIS, the integration of Sparkwave is a future action.
This can be done as described in Section 6.2 and would lead to a more modular system, simplifying the integration of different scenarios (streams with different formats) into ENVISION.
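The semi-naive bottom-up evaluation mentioned above, and the proposed partial re-evaluation when new facts arrive, can be illustrated on a single recursive program. The Java sketch below uses a hypothetical example program, not one from ENVISION: reach(X, Y) :- edge(X, Y) and reach(X, Z) :- edge(X, Y), reach(Y, Z). In each round only the facts derived in the previous round (the delta) are joined with edge, and adding an edge later starts a fresh delta from that edge alone instead of recomputing everything. The sketch handles insertions only; deletions would need a different technique.

```java
import java.util.*;

// Hypothetical incremental, semi-naive evaluator for
//   reach(X, Y) :- edge(X, Y).
//   reach(X, Z) :- edge(X, Y), reach(Y, Z).
class SemiNaiveReach {
    record Pair(String from, String to) {}

    private final Set<Pair> edges = new HashSet<>();
    private final Set<Pair> reach = new HashSet<>();

    // Adds a new edge fact and propagates only its consequences (insert-only).
    void addEdge(String x, String y) {
        Pair e = new Pair(x, y);
        if (!edges.add(e)) return;                 // duplicate fact: nothing to re-evaluate
        Set<Pair> delta = new HashSet<>();
        if (reach.add(e)) delta.add(e);            // non-recursive rule on the new edge
        // Recursive rule with the new edge as edge(X, Y): join with existing reach(Y, Z).
        for (Pair r : List.copyOf(reach)) {
            if (r.from().equals(y)) {
                Pair n = new Pair(x, r.to());
                if (reach.add(n)) delta.add(n);
            }
        }
        propagate(delta);
    }

    // Semi-naive loop: only facts that were new in the last round are joined with edge.
    private void propagate(Set<Pair> delta) {
        while (!delta.isEmpty()) {
            Set<Pair> next = new HashSet<>();
            for (Pair d : delta) {
                for (Pair e : edges) {
                    if (e.to().equals(d.from())) {
                        Pair n = new Pair(e.from(), d.to());
                        if (reach.add(n)) next.add(n);
                    }
                }
            }
            delta = next;                          // stop when a round derives nothing new
        }
    }

    boolean reaches(String x, String y) { return reach.contains(new Pair(x, y)); }

    int size() { return reach.size(); }
}
```

For a chain a, b, c, d, the three edges yield six reach facts. Adding the edge (d, a) then closes a cycle, and the deltas grow the relation to all 16 pairs without re-deriving the original six.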


A. Datalog Program for the Evaluation

q(?x, ?y) :- p(?x), p(?y).
?- q(?x, ?y).
Listing A.1: Cartesian product as a Datalog program.

ra(?A, ?B, ?C, ?D, ?E) :- p(?A), p(?B), p(?C), p(?D), p(?E).
rb(?A, ?B, ?C, ?D, ?E) :- p(?A), p(?B), p(?C), p(?D), p(?E).

r(?A, ?B, ?C, ?D, ?E) :- ra(?A, ?B, ?C, ?D, ?E), rb(?A, ?B, ?C, ?D, ?E).

q(?A) :- r(?A, ?B, ?C, ?D, ?E).
q(?B) :- r(?A, ?B, ?C, ?D, ?E).
q(?C) :- r(?A, ?B, ?C, ?D, ?E).
q(?D) :- r(?A, ?B, ?C, ?D, ?E).
q(?E) :- r(?A, ?B, ?C, ?D, ?E).

?- q(?X).
Listing A.2: Datalog program used for the evaluation.

ra(?A, ?B, ?C, ?D, ?E) :- p(?A), p(?B), p(?C), p(?D), p(?E).
rb(?A, ?B, ?C, ?D, ?E) :- p(?A), p(?B), p(?C), p(?D), p(?E).

r(?A, ?B, ?C, ?D, ?E) :- ra(?A, ?B, ?C, ?D, ?E), rb(?A, ?B, ?C, ?D, ?E).

a(?A) :- r(?A, ?B, ?C, ?D, ?E).
b(?B) :- r(?A, ?B, ?C, ?D, ?E).
c(?C) :- r(?A, ?B, ?C, ?D, ?E).
d(?D) :- r(?A, ?B, ?C, ?D, ?E).
e(?E) :- r(?A, ?B, ?C, ?D, ?E).

?- a(?X).
?- b(?X).
?- c(?X).
?- d(?X).
?- e(?X).
Listing A.3: Datalog program with several queries.

r(?A, ?B, ?C, ?D, ?E) :- p(?A), p(?B), p(?C), p(?D), p(?E).
q(?A) :- r(?A, ?B, ?C, ?D, ?E).

?- q(?X).
Listing A.4: Simple Datalog program.
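To give a feel for the size of the intermediate results these programs produce, the Java sketch below evaluates Listing A.4 naively bottom-up for n facts p(1), ..., p(n): the rule for r enumerates the five-fold Cartesian product, so r holds n^5 tuples, while q projects them back to n answers. Likewise, Listing A.1 yields n^2 answers. The integer fact values and the choice of n are illustrative assumptions, not the data used in the evaluation, and the class name is hypothetical.

```java
import java.util.*;

// Hypothetical naive bottom-up evaluation of Listings A.1 and A.4 over facts p(...).
class CartesianEvaluation {
    // r(?A, ?B, ?C, ?D, ?E) :- p(?A), p(?B), p(?C), p(?D), p(?E).
    static List<int[]> evaluateR(List<Integer> p) {
        List<int[]> r = new ArrayList<>();
        for (int a : p) for (int b : p) for (int c : p) for (int d : p) for (int e : p)
            r.add(new int[] {a, b, c, d, e});
        return r;
    }

    // q(?A) :- r(?A, ?B, ?C, ?D, ?E).  Set semantics removes duplicate answers.
    static Set<Integer> evaluateQ(List<int[]> r) {
        Set<Integer> q = new TreeSet<>();
        for (int[] t : r) q.add(t[0]);
        return q;
    }

    // Listing A.1: q(?x, ?y) :- p(?x), p(?y).  Returns the number of answers.
    static int cartesianPairs(List<Integer> p) {
        Set<List<Integer>> q = new HashSet<>();
        for (int x : p) for (int y : p) q.add(List.of(x, y));
        return q.size();
    }
}
```

With n = 4, r holds 4^5 = 1024 tuples and q four answers; with n = 10, r already holds 10^5 = 100,000 tuples while q still has only 10 answers.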


Bibliography

[1] Daniel J. Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120–139, August 2003.
[2] Darko Anicic, Paul Fodor, Sebastian Rudolph, and Nenad Stojanovic. EP-SPARQL: A unified language for event processing and stream reasoning. In WWW 2011: Proceedings of the Twentieth International World Wide Web Conference, 2011.
[3] Darko Anicic, Paul Fodor, Sebastian Rudolph, Roland Stühmer, Nenad Stojanovic, and Rudi Studer. ETALIS: Rule-based reasoning in event processing. In Sven Helmer, Alexandra Poulovassilis, and Fatos Xhafa, editors, Reasoning in Event-Based Distributed Systems, volume 347 of Studies in Computational Intelligence, pages 99–124. Springer Berlin / Heidelberg, 2011. 10.1007/978-3-642-19724-6_5.
[4] Darko Anicic, Sebastian Rudolph, Paul Fodor, and Nenad Stojanovic. Stream reasoning and complex event processing in ETALIS. Semantic Web - Interoperability, Usability, Applicability, 2011.
[5] Hans-Jürgen Appelrath, Dennis Geesen, Marco Grawunder, Timo Michelsen, and Daniela Nicklas. Odysseus: a highly customizable framework for creating efficient event stream management systems. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems, DEBS '12. ACM, 2012.
[6] A. Arasu, B. Babcock, S. Babu, J. Cieslewicz, M. Datar, K. Ito, R. Motwani, U. Srivastava, and J. Widom. STREAM: The Stanford data stream management system. Technical Report 2004-20, Stanford InfoLab, 2004.
[7] Alejandro Llaves Arellano. D4.7 - Event processing of observation data, 2012.
[8] Brian Babcock, Shivnath Babu, Mayur Datar, Rajeev Motwani, and Jennifer Widom. Models and issues in data stream systems. Technical Report 2002-19, Stanford InfoLab, 2002.
[9] Francois Bancilhon, David Maier, Yehoshua Sagiv, and Jeffrey D. Ullman. Magic sets and other strange ways to implement logic programs (extended abstract). In Proceedings of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, PODS '86, pages 1–15, New York, NY, USA, 1986. ACM.
[10] Davide Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus. Stream Reasoning: Where We Got So Far. In Proceedings of the NeFoRS2010 Workshop, co-located with ESWC2010, 2010.


[11] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus. C-SPARQL: SPARQL for continuous querying. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 1061–1062, New York, NY, USA, 2009. ACM.
[12] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, and Michael Grossniklaus. An execution environment for C-SPARQL queries. In Proceedings of the 13th International Conference on Extending Database Technology, EDBT '10, pages 441–452, New York, NY, USA, 2010. ACM.
[13] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus. C-SPARQL: a continuous query language for RDF data streams. Int. J. Semantic Computing, 4(1):3–25, 2010.
[14] Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus. Incremental reasoning on streams and rich background knowledge. In Lora Aroyo, Grigoris Antoniou, Eero Hyvönen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral, and Tania Tudorache, editors, ESWC (1), volume 6088 of Lecture Notes in Computer Science, pages 1–15. Springer, 2010.
[15] Barry Bishop and Florian Fischer. IRIS - Integrated Rule Inference System. In Proceedings of the 1st Workshop on Advancing Reasoning on the Web: Scalability and Commonsense (ARea2008), 2008.
[16] Andre Bolles, Marco Grawunder, and Jonas Jacobi. Streaming SPARQL: extending SPARQL to process data streams. In Proceedings of the 5th European Semantic Web Conference on The Semantic Web: Research and Applications, ESWC'08, pages 448–462, Berlin, Heidelberg, 2008. Springer-Verlag.
[17] Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel Madden, Vijayshankar Raman, Frederick Reiss, and Mehul A. Shah. TelegraphCQ: Continuous dataflow processing for an uncertain world. In CIDR, 2003.
[18] Jos de Bruijn, Holger Lausen, Axel Polleres, and Dieter Fensel. The Web Service Modeling Language WSML: An overview, 2005.
[19] Encyclopædia Britannica. Streaming - Encyclopædia Britannica Online. http://www.britannica.com/EBchecked/topic/1386523/streaming, 2012. [Online; accessed 11-September-2012].
[20] Charles Forgy. Rete: A fast algorithm for the many pattern/many object pattern match problem. Artificial Intelligence, 19(1):17–37, 1982.
[21] Ramanathan V. Guha and Dan Brickley. RDF vocabulary description language 1.0: RDF Schema. W3C recommendation, W3C, February 2004. http://www.w3.org/TR/2004/REC-rdf-schema-20040210/.
[22] Srdjan Komazec, Davide Cerri, and Dieter Fensel. Sparkwave: Continuous Schema-Enhanced Pattern Matching over RDF Data Streams. In Proceedings of the 6th ACM International Conference on Distributed Event-Based Systems - ACM DEBS 2012, pages 273–283. ACM, July 2012.
[23] Jürgen Krämer and Bernhard Seeger. Semantics and implementation of continuous sliding window queries over data streams. ACM Trans. Database Syst., 34(1):4:1–4:49, April 2009.
[24] Reto Krummenacher, Daniel Winkler, and Adrian Marte. WSML2Reasoner - a comprehensive reasoning framework for the semantic web. In Axel Polleres and Huajun Chen, editors, ISWC Posters & Demos, volume 658 of CEUR Workshop Proceedings. CEUR-WS.org, 2010.
[25] Danh Le-Phuoc, Minh Dao-Tran, Josiane Xavier Parreira, and Manfred Hauswirth. A native and adaptive approach for unified processing of linked streams and linked data. In Proceedings of the 10th International Conference on The Semantic Web - Volume Part I, ISWC'11, pages 370–388, Berlin, Heidelberg, 2011. Springer-Verlag.
[26] David Luckham. The Power of Events: An Introduction to Complex Event Processing in Distributed Enterprise Systems. Addison-Wesley, Reading, MA, USA, 2002.
[27] Klemen Kenda, Maja Skrjanc, Alexandra Morar. D4.5 - Semantic data streams and stream ontologies software, 2012.
[28] Patrick Maué and Dumitru Roman. The ENVISION environmental portal and services infrastructure. In Jiří Hřebíček, Gerald Schimak, and Ralf Denzer, editors, ISESS, volume 359 of IFIP Advances in Information and Communication Technology, pages 280–294. Springer, 2011.
[29] Eric Prud'hommeaux and Andy Seaborne. SPARQL query language for RDF. W3C Recommendation, 4:1–106, 2008.
[30] J.F. Sequeda, O. Corcho, and A. Gomez-Perez. Linked stream data: a short paper. In CEUR Workshop Proceedings. CEUR, 2009.
[31] Amit Sheth, Cory Henson, and Satya S. Sahoo. Semantic sensor web. IEEE Internet Computing, 12(4):78–83, July 2008.
[32] H.A. Simon. Models of man: social and rational; mathematical essays on rational human behavior in society setting. Wiley, 1957.
[33] J.D. Ullman. Principles of database systems. Computer software engineering series. Computer Science Press, 1983.
[34] Jeffrey D. Ullman. Principles of database and knowledge-base systems, Vol. I. Computer Science Press, Inc., New York, NY, USA, 1988.
[35] Emanuele Della Valle, Stefano Ceri, Davide Francesco Barbieri, Daniele Braga, and Alessandro Campi. A first step towards stream reasoning. In John Domingue, Dieter Fensel, and Paolo Traverso, editors, FIS, volume 5468 of Lecture Notes in Computer Science, pages 72–81. Springer, 2008.


[36] Emanuele Della Valle, Stefano Ceri, Frank van Harmelen, and Dieter Fensel. It's a streaming world! Reasoning upon rapidly changing information. IEEE Intelligent Systems, 24(6):83–89, 2009.
[37] Onkar Walavalkar, Anupam Joshi, Tim Finin, and Yelena Yesha. Streaming Knowledge Bases. In Proceedings of the Fourth International Workshop on Scalable Semantic Web Knowledge Base Systems, October 2008.
