12.07.2015

The Computational Materials Repository


schemas, but we concentrate on the default schema that is used by the PHPUI.

The first challenge is to find a database schema that allows storing heterogeneous data in a relational database without knowing exactly what kind of analysis will be performed. (If we knew the analysis requirements, we could derive an optimal table layout with standard database approaches, for example with the entity-relationship model [31].)

We will first show why the straightforward approach fails and then present the CMR solution. We use a relational MySQL database (see section 3.4.3) that stores data in tables consisting of named columns and rows. The straightforward approach of storing all data in a single indexed table will not work because (A) the row size is limited to 65535 bytes and the column count to 4096 columns or fewer, depending on the data stored in it [32]; (B) adding a new column to the table means adding it to every row already in the database, which is expensive; and (C) users can add arbitrary fields of arbitrary types under the same name, which will eventually result in type conflicts for the same column name. Fig. 3.4 illustrates these problems.

    id   Ekin   Epot   valid   ...
    1    .1     .01    0       ...
    2    .2     .02    False   ...
    n    ...    ...    ...     ...

Figure 3.4: A table with n rows and m columns. The upload of the second piece of data results in a conflict because there can be only one variable type per column.

(A) The row-size limit is quickly reached, especially if strings are stored: if 128 bytes were reserved per string, there would be space for only about 512 columns (65535 / 128 ≈ 512), a limit that is easily reached considering that data from multiple simulators and an arbitrary number of custom fields can be added. (B) The set of columns cannot be determined beforehand because users can create new custom field names at any time.
Every new column results in a table reorganization that modifies all n existing rows, which can easily cause a delay of several minutes depending on the size of the table. (C) The table above shows a type conflict: an earlier version defined valid as an integer value, while the following one uses a boolean. In MySQL every column has a single fixed type shared by all rows, so the upload with the boolean value would fail.

Since we don't know how the data will be analyzed, and we cannot create a huge sparse table because the fields cannot be identified beforehand, a pragmatic approach was chosen: the variables are divided by type, and all variables of one type are written into a single table. An example is shown in Fig. 3.5. This approach results in five tables: one each for strings, doubles, dates, booleans, and arrays.

This schema allows fast querying, but retrieving a whole db-file with j fields would require j database join operations, which are expensive and
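The per-type layout described above can be sketched with Python's built-in `sqlite3` module standing in for MySQL. This is a minimal illustration under stated assumptions, not the CMR implementation: the table names (`str_fields`, `double_fields`, `date_fields`, `bool_fields`, `array_fields`), the `(file_id, name, value)` column layout, folding integers into the double table, and encoding dates as ISO 8601 strings and arrays as JSON text are all hypothetical choices. It reuses the two uploads from Fig. 3.4 to show that the integer `valid` and the boolean `valid` end up in different tables, so no column-type conflict arises.

```python
import json
import sqlite3
from datetime import datetime

# Hypothetical mapping: Python type -> (table name, SQLite column type).
# Integers are folded into the double table (assumption: the text lists no integer table).
TYPE_TABLES = {
    str: ("str_fields", "TEXT"),
    float: ("double_fields", "REAL"),
    int: ("double_fields", "REAL"),
    datetime: ("date_fields", "TEXT"),  # stored as an ISO 8601 string
    bool: ("bool_fields", "INTEGER"),   # stored as 0/1
    list: ("array_fields", "TEXT"),     # stored as JSON text
}


def create_schema(conn):
    """Create one (file_id, name, value) table per value type."""
    for table, sql_type in sorted(set(TYPE_TABLES.values())):
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            f"file_id INTEGER NOT NULL, name TEXT NOT NULL, value {sql_type}, "
            f"PRIMARY KEY (file_id, name))"
        )
        # Index on (name, value) so a query on one field avoids a full scan.
        conn.execute(f"CREATE INDEX IF NOT EXISTS {table}_nv ON {table} (name, value)")


def encode(value):
    """Convert a Python value into something SQLite can store."""
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, list):
        return json.dumps(value)
    if isinstance(value, bool):
        return int(value)
    return value


def store(conn, file_id, fields):
    """Route every field of one db-file to the table for its type."""
    for name, value in fields.items():
        # type() rather than isinstance(): bool is a subclass of int in Python.
        table, _ = TYPE_TABLES[type(value)]  # KeyError for unsupported types
        conn.execute(
            f"INSERT INTO {table} (file_id, name, value) VALUES (?, ?, ?)",
            (file_id, name, encode(value)),
        )


def find_below(conn, name, limit):
    """Ids of db-files whose numeric field `name` is below `limit`."""
    rows = conn.execute(
        "SELECT file_id FROM double_fields WHERE name = ? AND value < ? ORDER BY file_id",
        (name, limit),
    )
    return [file_id for (file_id,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # The two uploads from Fig. 3.4: `valid` is an integer first, then a boolean.
    store(conn, 1, {"Ekin": 0.1, "Epot": 0.01, "valid": 0})
    store(conn, 2, {"Ekin": 0.2, "Epot": 0.02, "valid": False})
    print(find_below(conn, "Epot", 0.015))  # [1]
```

Because each value type has its own table, the two `valid` fields never share a column, and a new custom field name becomes a new row rather than an `ALTER TABLE` on every existing row. The flip side is retrieval: reassembling one complete db-file means gathering its fields back from all of the per-type tables.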
