Installation and Operating Instructions 348-900 - Unical Deutschland


2nd November 2009

…other devices around it, in an uncertain environment and without human input. Major theoretical breakthroughs will be needed to give us the confidence to depend on these devices, and, with Veriware, Kwiatkowska hopes to make them.

Meanwhile Diadem – Domain-centric Intelligent Automated Data Extraction Methodology – sets out to solve the problem of extracting complex, structured information from large numbers of websites. ‘If we succeed, Diadem will be the next major step forward in web search technology,’ says Gottlob. ‘It will boost individual and corporate web users’ ability to get the information they need from the internet.’

Traditional web search engines rely on looking for keywords on web pages. They work well when looking for some kinds of information, but struggle with more complex queries – typing ‘restaurants near me serving pasta al pesto as today’s special’ isn’t likely to produce useful results.

With Diadem, Gottlob aims to create software that can trawl through every website in a particular field – the property market, for example, or restaurants, or air travel – and pull out the information they contain in structured form. Equipped with a basic knowledge of the general principles its domain works on, it will be able to analyse each web page’s low-level structure as it goes in order to extract the information it contains.

Humans find it easy to visit a new website and immediately grasp its structure and what the different elements on each page mean – which of the numbers visible is an item’s price, for example, or how to interpret a timetable.
But computers struggle with this kind of semi-structured content – they don’t understand how websites are structured.

It’s possible to teach computers to pull information out of particular websites, but this involves a human spending time showing the computer which parts of a page do what. This can work on a small scale but takes too much human time and effort when dealing with large numbers of websites; it is also confounded if the website being studied changes even slightly.

By the end of the Diadem project, Gottlob hopes to have built a system that can deal with a specified country’s property market, analysing tens of thousands of estate agents’ websites and presenting the properties discovered to users. The result won’t simply be a web page with links to other pages that may contain relevant information, as with traditional search engines; it will be a structured dataset drawn from the data objects found on sites within the domain, which can easily be searched or further processed by other software applications.

Companies like Google, Microsoft and Yahoo! have already expressed interest in Diadem’s results, which could lead to the next generation of search engines, going beyond the limitations of keyword searching.
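
The hand-taught approach described above is usually called a wrapper: a set of site-specific rules, written or demonstrated by a person, that says where each piece of data sits in one website's markup. The Python sketch that follows is a minimal illustration under stated assumptions, not Diadem's method: the page snippets, field names and delimiter strings are all hypothetical, and it uses the simple left/right-delimiter style of wrapper. It shows two points made in the article: the extracted records form a structured dataset that can be queried in ways keyword search cannot, and a small change to the site's markup silently breaks the rules.

```python
"""Minimal sketch of a hand-built, site-specific extractor (a left/right-delimiter
"wrapper"). The page snippets, field names and delimiters are hypothetical; they are
not taken from Diadem or from any real estate agent's website."""

from dataclasses import dataclass
from typing import Optional

# Hypothetical rules a person might write after inspecting ONE site's markup:
# each field is the text found between a left and a right delimiter string.
WRAPPER = {
    "address": ('<h2 class="addr">', "</h2>"),
    "price": ('<span class="price">£', "</span>"),
    "bedrooms": ('<span class="beds">', " bed</span>"),
}


@dataclass
class Listing:
    address: str
    price: int
    bedrooms: int


def extract_between(page: str, left: str, right: str, start: int) -> Optional[tuple[str, int]]:
    """Return (text between delimiters, index just past `right`), or None if absent."""
    i = page.find(left, start)
    if i == -1:
        return None
    i += len(left)
    j = page.find(right, i)
    if j == -1:
        return None
    return page[i:j], j + len(right)


def apply_wrapper(page: str) -> list[Listing]:
    """Scan the page left to right, building one Listing per complete set of fields."""
    listings: list[Listing] = []
    pos = 0
    while True:
        values: dict[str, str] = {}
        for field, (left, right) in WRAPPER.items():
            found = extract_between(page, left, right, pos)
            if found is None:
                # A missing delimiter ends extraction silently: there is no error,
                # just fewer (or zero) records.
                return listings
            values[field], pos = found
        listings.append(
            Listing(
                address=values["address"],
                price=int(values["price"].replace(",", "")),
                bedrooms=int(values["bedrooms"]),
            )
        )


# Hypothetical page from one estate agent's site.
PAGE_V1 = (
    '<div><h2 class="addr">12 High Street</h2>'
    '<span class="price">£250,000</span><span class="beds">2 bed</span></div>'
    '<div><h2 class="addr">3 Mill Lane</h2>'
    '<span class="price">£410,000</span><span class="beds">4 bed</span></div>'
)

# The same site after a small redesign: one CSS class has been renamed.
PAGE_V2 = PAGE_V1.replace('class="price"', 'class="cost"')

if __name__ == "__main__":
    records = apply_wrapper(PAGE_V1)
    print(len(records))  # 2
    # Structured records answer queries that keyword matching cannot express.
    print([r.address for r in records if r.price < 300_000 and r.bedrooms >= 2])
    # ['12 High Street']
    print(len(apply_wrapper(PAGE_V2)))  # 0
```

Renaming a single class attribute is enough to make these rules return nothing, which is the fragility the article points to. Diadem's stated aim is to avoid per-site rules of this kind by drawing on basic knowledge of the domain while analysing each page's structure.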
