10 S. Mazumdar, D. Petrelli and F. Ciravegna / A Visual Dashboard for Linked Data: An Exploration of User and System Requirements

easy to understand and use” indicate that the dashboard approach holds much potential. A rather interesting comment from an expert, “Clear colour presentation, gives pretty pictures very fast!”, indicates the proximity of our approach to the ideal goal of an interface developer: to efficiently provide aesthetically pleasing visualizations that communicate essential information to the user. Often, users are interested in gathering a high-level understanding of large datasets, rather than looking at individual data instances. A few comments such as “It was on occasion rather “hard to use” (though this may have just been inexperience)” show that there is probably room for improvement in the intuitiveness of some parts, like the filter selection. However, it should be noted that some of the difficulties could come from the data itself; for example, the long list of filters (1,090!) the user has to go through to select the interesting item is an essential part of the data structure. This evaluation clearly showed that these kinds of details have to be considered beforehand if a generic tool to visualize linked data is to be provided.

7. System Requirements: Validation via public linked data

As linked data become available from different sources, a visualization tool should be able to take different sets and visualize them without too much tuning, ideally without any tuning. Porting a system onto another domain might not be straightforward, and the technical implications of using a different infrastructure have to be understood in full, as certain aspects of the interaction, e.g.
system reaction time, must be kept within accepted thresholds to ensure user acceptability. For example, a system is perceived as interactive if the response time is under 2 seconds, but for direct manipulation of data the reaction time must be under 2 milliseconds. The portability of .views. was tested on different domains and data sources: SQL databases, RDF triple stores and SPARQL endpoints.

To better understand the limitations developers would face while building user interfaces for open linked data, we used a realistic and large dataset: DBpedia contains almost three and a half million resources, stored in over a billion RDF triples (version 3.5.1, released Apr 28, 2010). This provided an excellent use case for our research. Before we started implementing our system on SPARQL endpoints, we performed several tests on large SQL databases. These tests indicated that certain compromises are required in order to provide fluent interaction with the data via a user interface. For example, dragging a slider to continuously query the database would cause the system to slow down, as it has to continuously send queries and parse the results. Instead, sending queries only when the user finishes dragging the slider (indicated by a release of the slider handle) would be a significant improvement.

The system evaluation was composed of two parts, both logged and time-stamped:
1. Querying the endpoints and retrieving results from the filtering interface;
2. Visualizing the result sets in 5 widgets: textual results, geographical map, pie chart, bar chart, tag cloud.

The setup consisted of four cases based on the number of results returned: 100, 600, 1100 and 2200.
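The slider compromise described above (send a query only when the handle is released, not on every drag event) can be sketched as follows. This is a minimal illustration and not the .views. implementation: the class `ReleaseOnlySlider`, its method names and the `run_query` callback are all hypothetical, and a plain list stands in for the backend.

```python
from typing import Callable, List, Optional


class ReleaseOnlySlider:
    """Hypothetical slider controller that queries only on handle release."""

    def __init__(self, run_query: Callable[[int], None]) -> None:
        # run_query is an assumed callback that sends one query to the backend.
        self._run_query = run_query
        # Latest value seen while the user is still dragging.
        self._pending: Optional[int] = None

    def on_drag(self, value: int) -> None:
        # Dragging only records the current value; no query is sent.
        self._pending = value

    def on_release(self) -> None:
        # Releasing the handle sends exactly one query for the final value.
        if self._pending is not None:
            self._run_query(self._pending)
            self._pending = None


# Usage: 21 drag events result in a single query for the final value.
sent: List[int] = []  # stand-in for the backend: records queried values
slider = ReleaseOnlySlider(sent.append)
for year in range(1990, 2011):
    slider.on_drag(year)
slider.on_release()
```

With this design the number of queries depends on how often the user releases the handle, not on how many intermediate positions the slider passes through.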
Sample queries like “Select all the public Universities in the United Kingdom” or “Select all the places in United Kingdom” were passed from the interface to the backend. For each case, four individual tasks were measured: time to transfer queries to the backend, time to execute the query, time to parse results and convert them to JSON, and time to transfer JSON objects to the frontend. The frontend was evaluated by timing the performance of each visualization widget. Figure 9 shows the relative response times for the DBpedia endpoint (increase in the result size maps: 100=1, 600=2, 1100=3, 2200=4). The time taken for the backend to process the query and send the results to the frontend varied from 123.78ms to 6.9s, with the query execution time varying from 98ms to 6.85s. In most of the cases (60%), executing the query took more than 70% of the backend processing time. In order to see if there were computational bottlenecks or patterns that could be optimized, the data was normalized to highlight the proportion among the different phases. Figure 9 plots the distribution of the 4 tasks and shows several interesting points:
– The overall time taken by the backend is highly dependent on either the query execution time or the time taken to transfer the results to the frontend, as they take the maximum time to complete.
– The overall time taken by the backend is highly variable.
– The time taken for transferring the query to the backend and the time for converting the results to JSON objects are negligible compared to the other two.
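The per-phase timing and the normalization to proportions described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the instrumentation used in the evaluation: the helpers `timed` and `normalize` are hypothetical, and a list comprehension stands in for the query execution against an endpoint.

```python
import json
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def timed(phases: Dict[str, float], name: str, fn: Callable[[], T]) -> T:
    """Run fn, store its wall-clock duration in seconds under `name`."""
    start = time.perf_counter()
    result = fn()
    phases[name] = time.perf_counter() - start
    return result


def normalize(phases: Dict[str, float]) -> Dict[str, float]:
    """Convert absolute phase times into shares of the total time."""
    total = sum(phases.values())
    if total == 0:
        # Avoid division by zero if nothing measurable was recorded.
        return {name: 0.0 for name in phases}
    return {name: t / total for name, t in phases.items()}


# Stand-in phases (assumptions): a real backend would call the endpoint here.
phases: Dict[str, float] = {}
rows = timed(phases, "execute_query", lambda: [{"id": i} for i in range(100)])
payload = timed(phases, "convert_to_json", lambda: json.dumps(rows))

# Each phase as a fraction of the total measured time.
shares = normalize(phases)
```

Normalizing in this way makes runs with very different absolute durations comparable, since each phase is expressed as a fraction of its own run's total.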
Fig. 9. Plots showing the extremely high variation of the query execution phase (98ms to 6.85s) in the backend processing. The sizes of the result sets (1 - 2200 results returned, 2 - 1100 results returned, 3 - 600 results returned and 4 - 100 results returned) are shown as individual bars on the y-axis, and the differently shaded x-axis bars show the time taken to perform the individual functions needed to retrieve the respective results. The plots show the relative times in different parts of the system.

A major concern is the query execution time, the variation of which is alarming and cannot be controlled. While this could be attributed to high server load or the way queries are distributed, it is an important aspect that user interface developers need to take into consideration. The system tests show that though the query execution phase often takes a lot of time to complete, there are other phases in the backend processing that can be significantly improved. More investigation is needed to understand the causes of the delays in transferring the result objects to the frontend and to further optimize this step.

This high variability in the query processing stage is in contrast to the performance achieved by traditional databases. In a similar experiment with a MySQL database, we tested how the backend performs with similar query-result sets. The speed in general varied between 0.00026ms and 6.48ms. However, the relative time taken by the query processing stage has mostly been consistent.
It is, however, important to note that the tests conducted with a MySQL database are not intended to compare the absolute times taken by a linked data endpoint and a traditional database, but to highlight that, in comparison with a traditional database, interface developers may face challenges with high variability in the response times. Figure 10 shows the contrasting performance of a traditional database.