NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...

Curated Entities for Enterprise

Umair ul Hassan, Edward Curry, Seán O’Riain
Digital Enterprise Research Institute, National University of Ireland, Galway
umair.ul.hassan@deri.org, ed.curry@deri.org, sean.oriain@deri.org

Abstract

We propose an entity consolidation system with user collaboration. First, source data is converted into an entity-attribute-value data model. The system then finds equivalence relationships at the entity, attribute, and value levels. The confidence of these relationships and any data conflicts are kept as intermediate results. Users provide feedback on the results iteratively. Finally, the corrected data, together with lineage and provenance information, is updated in a curated database of entities.

1. Introduction

The amount of data generated and stored in organizations is increasing as more business processes are automated. Often this data relates to specific entities of business interest, such as people, products, and customers. Users collect, integrate, and standardize this data for analysis. Teams of analysts and skilled IT staff spend a significant amount of time and effort bringing it all together in one place.

2. Problem Statement

Integration of data from disparate sources generates uncertain results [1]. For example, if an analyst integrates data about the iPod from two sources, the following types of uncertainty can occur for its price:

Absence: no price
Conflicts: price is 150€ and also 160€
Vagueness: price is given as High
Non-specificity: price is between 150€ and 160€

3. Proposal

We propose to develop an entity consolidation system that supports iterative cleaning of uncertain data with user feedback [2]. Figure 1 illustrates the major process flow steps of our prototype.

Figure 1: Process flow of entity consolidation with management of uncertain data using iterative user feedback

3.1. Entity Consolidation

The process starts by converting source data into a common entity-attribute-value format [3].
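As an illustration of the conversion step and of two of the uncertainty types listed in Section 2, the following is a minimal Python sketch, not the prototype's implementation. The function names `to_eav` and `classify`, the source names `shop_a` and `shop_b`, and the record layout are hypothetical. The sketch assumes that records sharing an `id` already refer to the same entity. It covers only absence and conflicts; vagueness and non-specificity are not modelled.

```python
def to_eav(source, records, key="id"):
    """Flatten row-style records into (entity, attribute, value, source) tuples."""
    triples = []
    for record in records:
        entity = record[key]
        for attribute, value in record.items():
            if attribute != key:
                triples.append((entity, attribute, value, source))
    return triples


def classify(triples, entity, attribute):
    """Label one attribute of one entity as absence, conflict, or consistent."""
    values = {
        value
        for ent, attr, value, _source in triples
        if ent == entity and attr == attribute and value is not None
    }
    if not values:
        return "absence"      # no source supplies a value
    if len(values) > 1:
        return "conflict"     # sources disagree on the value
    return "consistent"       # exactly one distinct value was seen


# Hypothetical example data: two sources describing the same product.
shop_a = [{"id": "iPod", "price": 150}]
shop_b = [{"id": "iPod", "price": 160, "colour": "black"}]

triples = to_eav("shop_a", shop_a) + to_eav("shop_b", shop_b)

for attribute in ("price", "colour", "weight"):
    print(attribute, classify(triples, "iPod", attribute))
```

Keeping the source name in every tuple is a design choice made here so that each value can later be traced back to where it came from.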
This is followed by three associated tasks: mapping schema attributes between sources, comparing individual entities for equivalence, and merging attribute values for the same entities.

3.2. Uncertain Data

Automated entity consolidation generates results with confidence scores for equivalence between entities. Conflicts between data values also exist among matched entities. All this information is stored in a temporary database for resolution.

3.3. User Feedback

Users provide feedback on uncertain data in two forms: by validating possible choices, or by providing generic rules for repairs. Having people with domain expertise collaborate to improve the quality of the integration result adds value to the overall process. This is similar to the curation of reference works and dictionaries by domain experts [4].

3.4. Provenance

Provenance information about data sources, entities, and user feedback is stored to track the lineage of data. This information serves as an indicator of trust for consumers of the entity database, and it can be further used to support automatic data cleaning tasks.

4. References

[1] M. Magnani and D. Montesi, “A Survey on Uncertainty Management in Data Integration,” Journal of Data and Information Quality, vol. 2, Jul. 2010, p. 33.
[2] M.J. Franklin, “Dataspaces: Progress and Prospects,” Dataspace: The Final Frontier, A.P. Sexton, ed., Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 1-3.
[3] P.M. Nadkarni, L. Marenco, R. Chen, E. Skoufos, G. Shepherd, and P. Miller, “Organization of Heterogeneous Scientific Data Using the EAV / CR Representation,” Journal of the American Medical Informatics Association, vol. 6, 1999, pp. 478-493.
[4] P. Buneman, J. Cheney, W.-C. Tan, and S. Vansummeren, “Curated databases,” Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems - PODS ’08, New York, New York, USA: ACM Press, 2008.

Continuous Query Optimization and Evaluation over Unified Linked Stream Data and Linked Open Data

Danh Le Phuoc, Josiane Xavier Parreira, Manfred Hauswirth
Digital Enterprise Research Institute, NUIG
{firstname.lastname}@deri.org

Abstract

In this poster, we present our Continuous Query Evaluation over Linked Streams (CQELS) approach, which provides a scalable query processing model for unified Linked Stream Data and Linked Open Data. Scalability in CQELS is achieved by applying state-of-the-art techniques for efficient data storage and query pre-processing, combined with a new adaptive cost-based query optimization algorithm for dynamic data sources, such as sensor streams. In traditional Database Management Systems (DBMS), query optimizers use pre-computed selectivity values for the data to decide on the best execution plan. With continuous queries over stream data, however, the data, and consequently its selectivity values, vary over time. This means that the optimal execution plan itself can change during the execution of the query. To overcome this problem, the CQELS query optimizer retains a subset of the possible execution plans and, at query time, updates their respective costs and chooses the least expensive one for executing the query at that point in time. We have implemented CQELS, and our experimental results show that it can greatly reduce query response times while scaling to a realistically high number of parallel queries.
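The plan selection described in the abstract can be illustrated with a minimal Python sketch. It is not the CQELS implementation. The class `AdaptivePlanSelector`, the two plan functions, the constant `PROBE_COST`, and the toy cost model (cost grows with the number of bindings of the pattern evaluated first) are all assumptions made for illustration.

```python
PROBE_COST = 2  # hypothetical cost of probing the second pattern per binding


def a_then_b(stats):
    """Evaluate pattern a first, then probe pattern b once per binding of a."""
    return stats["a"] + stats["a"] * PROBE_COST


def b_then_a(stats):
    """Evaluate pattern b first, then probe pattern a once per binding of b."""
    return stats["b"] + stats["b"] * PROBE_COST


class AdaptivePlanSelector:
    """Keeps a fixed subset of candidate plans and re-costs them on demand."""

    def __init__(self, plans):
        # plans: mapping of plan name -> cost function over current statistics
        self.plans = dict(plans)

    def choose(self, stats):
        # Recompute every retained plan's cost from the latest statistics
        # and return the cheapest plan together with all computed costs.
        costs = {name: cost(stats) for name, cost in self.plans.items()}
        best = min(costs, key=costs.get)
        return best, costs


selector = AdaptivePlanSelector({"a_then_b": a_then_b, "b_then_a": b_then_a})

# Statistics observed at two different moments of a continuous query.
early = {"a": 10, "b": 1000}    # the stream pattern a is still sparse
later = {"a": 5000, "b": 1000}  # the stream pattern a has become dense

print(selector.choose(early)[0])
print(selector.choose(later)[0])
```

In this sketch the cheapest plan changes between the two calls because the statistics for pattern `a` change, which is the situation a pre-computed, static plan cannot handle.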

In recent years, sensors have become ubiquitous, for instance in mobile phones (accelerometer, compass, GPS, camera, etc.), in weather observation stations (temperature, humidity, etc.), in the health care domain (heart rate, blood pressure monitors, etc.), in devices for tracking people’s and objects’ locations (GPS, RFID, etc.), in buildings (energy measurement, environmental conditions, etc.), in cars (engine monitoring, driver monitoring, etc.), and in the Web at large, with online communities such as Twitter and Facebook delivering (typically unstructured) real-time data on various topics (RSS or Atom feeds, etc.).

The raw nature of the data produced by sensors, that is, the basic readings without any metadata attached, has limited the use of sensor networks to specific application domains. Applications are typically still custom-built for specific cases and can be classified as “information stovepipes”: integration of sensor data with other data sources is still a difficult and labour-intensive task that currently requires a lot of “hand-crafting”. Only recently have there been efforts to lift sensor data to a semantic level, for example by the W3C Semantic Sensor Network Incubator Group, Semantic Streams, Semantic System S, and Semantic Sensor Web. These projects make sensor data available following the Linked Data principles, a concept known as Linked Stream Data.

Linked Stream Data aims at the seamless integration of sensor data with other data sources, such as those found in the Linked Open Data (LOD) cloud, and at enabling a range of new “real-time” applications, while also making this data accessible to the wide range of existing software. Exposing stream data according to Linked Data standards seems to be a promising way to “understand” this information through semantic enrichment.
It can be easily integrated with existing data sets using established standards specifically designed for query processing, and it generally makes information accessible in a way that furthers its re-use and its integration with data sets not yet thought of.

Current attempts at combining sensor data and Linked Open Data suffer from scalability problems (for example, the Live Social Semantic experiment involved only 139 conference attendees), and such solutions are therefore not suitable for large-scale applications. In addition, the real-time nature of the data is often not covered, and the generated data sets are frequently not queryable online but only available offline as ordinary downloads, specifically because efficient query processing is lacking.

In this research, we address the problem of scalable query processing over Linked Stream Data integrated with Linked Open Data. We present our Continuous Query Evaluation over Linked Streams (CQELS) approach [1], a unifying processing model for scalable continuous query evaluation over combined Linked Stream Data and Linked Open Data. Scalability in CQELS is achieved by applying state-of-the-art techniques for efficient data storage and query pre-processing, as well as by deriving an adaptive cost-based query optimization algorithm for dynamic data sources, such as streams. In contrast to traditional query optimizers, which use pre-computed selectivity values for the data to decide on the best execution plan, our CQELS query optimizer keeps a subset of the possible execution plans and, at query time, updates their respective costs and chooses the least expensive one for executing the query. We have implemented CQELS and show through experimental evaluation that our model achieves low query response times while scaling to a realistically high number of parallel queries.

References

[1] D. Le-Phuoc, et al. Continuous query optimization and evaluation over unified linked stream data and linked open data. Technical Report DERI-TR-2010-09-27, DERI.
