72 Computational Materials Repository

    class GPAWSchema(Schema):
        def __init__(self):
            ...
            calculator = {
                ...
                'PoissonStencil': {
                    "optional": False,
                    "type": "string",
                    "python_type": "int_or_str"},
                'XCFunctional': {
                    "optional": False,
                    "type": "string",
                    "python_type": "str"},
                'Epot': {
                    "optional": False,
                    "unit": Units.HARTREE,
                    "type": "double",
                    "python_type": "float"},
                'AtomicNumbers': {
                    "optional": False,
                    "desc": "A list of atomic numbers.",
                    "type": "long_array",
                    "python_type": "numpy.array",
                    "most_inner_python_type": "int"},
                'RubidiumFingerprint': {
                    "optional": True,
                    "type": "string",
                    "python_type": "str"},
                ...

Figure 3.8: Extract of GPAW's cmr-schema. PoissonStencil, XCFunctional, AtomicNumbers and RubidiumFingerprint are variables. type defines the internal type of the variable and python_type the original Python type, e.g. numpy.array for the atomic numbers. The field unit defines that Epot has the unit Hartree, and desc is a descriptive string of the value that the variable holds.
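The schema entries in Fig. 3.8 can be read as per-field validation rules. The following is a minimal sketch of how such an extract could be checked against a record. It is not CMR's implementation: the helper `check_record`, the dictionaries `SCHEMA_EXTRACT` and `PYTHON_TYPES`, and the sample values are hypothetical, and only the `optional` and `python_type` keys are interpreted.

```python
# Hypothetical extract mirroring three entries of Fig. 3.8 (not CMR's schema object).
SCHEMA_EXTRACT = {
    "XCFunctional": {"optional": False, "type": "string", "python_type": "str"},
    "Epot": {"optional": False, "type": "double", "python_type": "float"},
    "RubidiumFingerprint": {"optional": True, "type": "string", "python_type": "str"},
}

# Assumed mapping from the schema's python_type strings to Python classes.
PYTHON_TYPES = {"str": str, "float": float, "int": int}


def check_record(record, schema):
    """Return a list of problems; the list is empty if the record conforms."""
    problems = []
    for name, rule in schema.items():
        if name not in record:
            # Only mandatory fields are reported when absent.
            if not rule["optional"]:
                problems.append("missing mandatory field: " + name)
            continue
        expected = PYTHON_TYPES[rule["python_type"]]
        if not isinstance(record[name], expected):
            problems.append(
                "wrong type for " + name + ": expected " + rule["python_type"]
            )
    return problems


# The optional RubidiumFingerprint may be absent without causing a problem.
record = {"XCFunctional": "PBE", "Epot": -1.5}
print(check_record(record, SCHEMA_EXTRACT))
```

A record that lacks a mandatory field or carries a value of the wrong Python type yields one message per problem, which leaves the decision about rejecting the record to the caller.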
3.3 System Components and Processes 73

original output file, all types are mapped to types that CMR supports internally (see Fig. 3.9 for a list). If an exception occurs during the conversion, the partial db-file is written to disk and an error message is shown to the user. The partial file is written because DFT calculations are expensive and data should not be lost without being inspected.

    Python type   Int. type      Python               Int. Repr.
    int           long type      1                    1
    float         decimal type   1.2                  1.2
    complex       array          1+2j                 1+2j
    str           string         "john"               "john"
    list          array          [[1,2,3], [4,5,6]]   [[1,2,3], [4,5,6]]
    tuple         array          ((1,2,3), (4,5,6))   [[1,2,3], [4,5,6]]
    numpy.array   array          numpy.array([])      []

Figure 3.9: In order to enable CMR to run anywhere without third-party software, CMR uses only standard Python types internally. During the conversion process or when writing fields to a db-file, the types are mapped to the internal ones. The following data types are supported: boolean, int, float, complex, string, datetime.datetime, list and arrays of the previously mentioned data types. All other known data types are mapped to the internal representation.

Technical Details: The converters were designed as plug-ins (2.9.3). The reason is to ensure that license incompatibilities can be circumvented. CMR uses GPL. If a converter for a new file format would cause a license conflict (because of some dependencies or restrictions), it is possible to distribute the converter under a license other than the GPL. The drawback is that CMR cannot distribute this converter itself, so users have to install it separately.

Conversion examples: Fig. 2.4, 2.2, Fig. 2.4 and Fig. 2.3

3.3.4.2 Upload

The upload process consists of three independently executed tasks: (1) moving files from the db-file repository (3.3.2.1) to an inbox directory, (2) validating the db-file and (3) uploading the data to the database (3.3.2.3).

Purpose: The upload process copies files from the db-file repository to the database.

Usage: The upload tasks are scheduled to run periodically (every 10 seconds) to check whether new files were written to the db-file repository. If so, task (1) renames the files and moves them to an internal "inbox" folder. Task (2) checks the validity of the new arrivals and moves invalid files to a dedicated folder and the valid ones to the "valid" folder. Task (3) gets notified and uploads the new data to the database.

Implementation Details: Task (3) is responsible for checking whether the newly arrived data is new, a duplicate, an update or outdated. This is determined with the unique identifier and the last-modified time stamps, as explained in section 2.6.
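The decision taken by task (3) can be illustrated with a small sketch. The exact rules are those of section 2.6 and are not reproduced here; the version below simply assumes that an unknown identifier means new data, an equal time stamp means a duplicate, a later time stamp an update and an earlier one outdated data. The function `classify_upload`, the dictionary `stored_times` and the identifiers are hypothetical and not part of CMR.

```python
from datetime import datetime


def classify_upload(stored_times, uid, modified):
    """Classify an incoming db-file against what the database already holds.

    stored_times maps a unique identifier to the last-modified time stamp
    of the entry already stored (an assumed, simplified database index).
    """
    stored = stored_times.get(uid)
    if stored is None:
        return "new"
    if modified == stored:
        return "duplicate"
    # Assumption: a later time stamp supersedes the stored entry.
    return "updated" if modified > stored else "outdated"


stored_times = {"abc123": datetime(2011, 5, 1, 12, 0, 0)}

print(classify_upload(stored_times, "xyz789", datetime(2011, 5, 2)))
print(classify_upload(stored_times, "abc123", datetime(2011, 5, 1, 12, 0, 0)))
print(classify_upload(stored_times, "abc123", datetime(2011, 5, 3)))
print(classify_upload(stored_times, "abc123", datetime(2011, 4, 30)))
```

Keeping the comparison in one pure function makes the four outcomes easy to test independently of the periodic scheduling and the file moves performed by tasks (1) and (2).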