03.04.2018 Views

SciDex Whitepaper


The ScieEngine supports most common data protocols and structures, including the majority of harvesting protocols (such as METS and OAI-PMH) and the ProvONE Data Structure Specification.
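As an illustration of what harvesting over OAI-PMH (the Open Archives Initiative Protocol for Metadata Harvesting) involves, the following minimal sketch builds `ListRecords` request URLs and reads the resumption token that links one response page to the next. The endpoint `https://repository.example.org/oai`, the function names and the sample response are assumptions made for this example; the whitepaper does not describe how the engine implements harvesting.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespace used by OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical endpoint, used only for illustration.
BASE_URL = "https://repository.example.org/oai"


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None,
                           until=None, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH endpoint."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it replaces all the others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date
        if until:
            params["until"] = until
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def extract_resumption_token(response_xml):
    """Return the token for the next page, or None if this is the last page."""
    token = ET.fromstring(response_xml).find(f".//{OAI_NS}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()


# Hand-written sample response page (not real repository output).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = build_list_records_url(BASE_URL, from_date="2018-01-01")
next_url = build_list_records_url(
    BASE_URL, resumption_token=extract_resumption_token(SAMPLE_PAGE))
```

A harvester would repeat the second request until a response page carries no resumption token.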

This tool makes intensive use of AI training to optimize search and to build new smart (compliant) meta-datasets. It first identifies the features that classifiers can use, both content-based and behavioral, with a clustering algorithm (OPTICS). A classifier (SVM) is then created for data classification and regression analysis. To improve search and discovery, a registry stores arbitrary XML metadata schemes, and CRUD services are generated automatically for each registered scheme.
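The registry idea can be sketched as follows: registering a metadata scheme creates a matching create/read/update/delete service for it. This is a minimal in-memory illustration, not the SciDex implementation. The class names `SchemeRegistry` and `CrudService`, the `dataset` scheme and the rule that a scheme is described by a list of required child elements are all assumptions made for the example.

```python
import itertools
import xml.etree.ElementTree as ET


class CrudService:
    """In-memory create/read/update/delete service for one registered scheme."""

    def __init__(self, scheme_name, required_elements):
        self.scheme_name = scheme_name
        self.required_elements = tuple(required_elements)
        self._records = {}
        self._ids = itertools.count(1)

    def _validate(self, xml_text):
        # ET.fromstring raises ParseError if the record is not well-formed XML.
        root = ET.fromstring(xml_text)
        missing = [tag for tag in self.required_elements if root.find(tag) is None]
        if missing:
            raise ValueError(f"missing required elements: {missing}")

    def create(self, xml_text):
        self._validate(xml_text)
        record_id = next(self._ids)
        self._records[record_id] = xml_text
        return record_id

    def read(self, record_id):
        return self._records[record_id]

    def update(self, record_id, xml_text):
        if record_id not in self._records:
            raise KeyError(record_id)
        self._validate(xml_text)
        self._records[record_id] = xml_text

    def delete(self, record_id):
        del self._records[record_id]


class SchemeRegistry:
    """Registry that generates a CRUD service for every scheme it registers."""

    def __init__(self):
        self._services = {}

    def register(self, scheme_name, required_elements):
        if scheme_name in self._services:
            raise ValueError(f"scheme already registered: {scheme_name}")
        service = CrudService(scheme_name, required_elements)
        self._services[scheme_name] = service
        return service

    def service_for(self, scheme_name):
        return self._services[scheme_name]


# Usage: register a hypothetical "dataset" scheme and store one record.
registry = SchemeRegistry()
datasets = registry.register("dataset", ["title", "creator"])
record_id = datasets.create(
    "<dataset><title>Soil samples</title><creator>Lab A</creator></dataset>")
```

A production registry would more likely validate records against a full XML Schema and persist them; the required-elements check here only keeps the sketch self-contained.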

[Figure: diagram with the labels User input, User, Metadatas, Identifying features, Dataset, Integrator]

The SciDex framework provides users with sets of smart manual and automated indexing tools that help data providers index and maintain their metadata. The indexing tools support the majority of databases, including those supporting big data and file formats, across proprietary (Oracle, IBM, Teradata, etc.), open-source (MongoDB, Hadoop, etc.) and cloud services (Google App Engine, Amazon EMR, etc.).

4.2 Data Contribution<br />

Data Providers Registry<br />

The SciDex Marketspace maintains a registry of data providers. The data providers are ranked based on the quality of the data they provide. The rank of each provider is visible to the SciDex community.
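A provider registry with a visible ranking could look like the minimal sketch below. The whitepaper does not say how data quality is measured, so the scoring rule (the mean of per-dataset quality scores between 0 and 1), the tie-break by name and the names `ProviderRegistry`, `record_score` and `ranking` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from statistics import fmean


@dataclass
class Provider:
    name: str
    quality_scores: list = field(default_factory=list)

    @property
    def quality(self):
        # Assumed rule: a provider's quality is the mean of its dataset scores.
        return fmean(self.quality_scores) if self.quality_scores else 0.0


class ProviderRegistry:
    """Keeps data providers and exposes a ranking by data quality."""

    def __init__(self):
        self._providers = {}

    def register(self, name):
        if name in self._providers:
            raise ValueError(f"provider already registered: {name}")
        self._providers[name] = Provider(name)

    def record_score(self, name, score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("score must be between 0 and 1")
        self._providers[name].quality_scores.append(score)

    def ranking(self):
        """Return (rank, name, quality) tuples, best provider first."""
        ordered = sorted(self._providers.values(),
                         key=lambda p: (-p.quality, p.name))
        return [(rank, p.name, p.quality)
                for rank, p in enumerate(ordered, start=1)]


# Usage with hypothetical providers and scores.
registry = ProviderRegistry()
for provider_name in ("Lab A", "Lab B", "Lab C"):
    registry.register(provider_name)
registry.record_score("Lab A", 0.5)
registry.record_score("Lab B", 1.0)
registry.record_score("Lab C", 0.25)

for rank, name, quality in registry.ranking():
    print(rank, name, quality)
```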
