The Computational Materials Repository

Introduction and usage of CMR

All the different components of CMR are depicted in Fig. 2.1. CMR simplifies the handling of heterogeneous data from electronic structure codes. It provides the same interface for handling the data, independent of the original file format (and file names). The general interface makes scripting easy and reusable. Users of electronic structure codes are typically interested in obtaining and saving quantities like atomic positions, energies, and forces. Often, though, code-specific input parameters and results are also relevant. For this reason CMR extracts most of the variables from the original output file and makes use of the ASE [16] (Atomic Simulation Environment) interface (if applicable) to get the relevant data in a unified representation in terms of variable names and units.

The extracted data is stored in a file format that we call db-file. db-files can be read on any Linux machine that has Python [17] and CMR installed. The command line interface enables simple everyday tasks on these files, like viewing content and performing basic editing. Normally all db-files are moved to a single directory that we call the db-file repository (3.3.2.1). With the CMR Python interface, queries can be run on the database as shown in section 2.2.

The data can be stored not only in a collection of files. As depicted in Fig. 2.1, the data from the db-files can be uploaded to a database and queried with the Python interface, which results in faster searches because the data is indexed. More elaborate installations can make use of the PHP/HTML interface, which provides a graphical user interface for searching and viewing data, while Silo¹ features a workbench to analyze data.
Last but not least, so-called agents may run in the background, either invisible to the users or under user control, and perform certain tasks such as grouping db-files or preparing data needed by the user interfaces.

Currently CMR supports the import of GPAW, Dacapo, VASP and ASE trajectory files. Additionally, CSV [18] files are supported, which can be read and written by OpenOffice or Microsoft Excel.

¹ Silo was initiated and is maintained by Jens Strabo Hummelshøj, at the time of writing a post-doc at SUNCAT Center for Interface Science and Catalysis at Stanford

2.1 Working with CMR

Working with a database is quite different from the common approach of storing data in a file system: files and directories do not exist, and therefore the data has to be identified in another way. CMR implements an abstraction layer that simplifies finding results by keywords or fields. A field is simply a name/value pair, for example energy/12.0 or program/gpaw.

In this section we discuss how to organize the data so that we can do the same as with the "old" approach and additionally profit from storing it with CMR in a database. Please note that the Python interface supports querying the db-file repository as well as the database (MySQL database), while agents work only with the database. However, this does not change the way the data is organized.
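As a rough illustration of the idea, keywords and fields can be pictured with plain Python structures. The sketch below is not the CMR interface: the `Record` class, the `find` helper and the keyword strings `demo` and `relaxed` are hypothetical names chosen for illustration. Only the field examples energy/12.0 and program/gpaw are taken from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical stored result: unordered keywords plus name/value fields."""
    keywords: set = field(default_factory=set)
    fields: dict = field(default_factory=dict)


def find(records, keywords=(), **field_values):
    """Return the records that carry all given keywords and match all field values."""
    wanted = set(keywords)
    return [
        r for r in records
        if wanted <= r.keywords  # subset test, so keyword order is irrelevant
        and all(r.fields.get(name) == value for name, value in field_values.items())
    ]


# Made-up example data (not real calculation results).
records = [
    Record({"demo", "relaxed"}, {"program": "gpaw", "energy": 12.0}),
    Record({"demo"}, {"program": "vasp", "energy": 7.5}),
]

gpaw_hits = find(records, program="gpaw")      # restrict by a field
demo_hits = find(records, keywords=["demo"])   # restrict by a keyword
print(len(gpaw_hits), len(demo_hits))
```

Because keywords are kept in a set, a search for `["relaxed", "demo"]` and one for `["demo", "relaxed"]` return the same records, which is the property that directory paths lack.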
There are a few requirements that need to be met in order to work efficiently with the database. First, the results need to be found after putting them into the database. Keywords replace directory and file names. For instance, the file with the path 211surfaces/Ag211/edge/H.gpw would get the keywords 211surface, Ag211, edge. There is no need to add H as a keyword for finding purposes, because we can search/restrict by atom type. However, if the count of the atoms matters, as for N2, then it would make sense to include it. (The reason is that when looking for N2 we would also get the results with the single atom N.) The advantage of keywords over directory names is that the order does not matter and that they make it possible to look at arbitrary subsets. When the two keywords 211surface and edge and the requirement that the atom H is present are combined in a query, we find all kinds of 211 surfaces with an H atom at the edge position. Looking for these files in subdirectories would be considerably more cumbersome. Second, we would like to read the data in a similar way as the native output file. This is achieved with the Python user interface as described later in section 2.4, or with the interface that views the data in a web browser, as shown already in the introduction in Fig. 1.2. Third, we profit from the database capability to search efficiently for the defined keywords and fields. Fourth, the results should provide more information than just the numbers from the output file. db-files are capable of storing scripts, input or output files, calculation parameters and custom fields such as surface=211. Fifth, the results should be traceable. Creating groups (described later in section 2.7) conserves the references to the data that was used.

One might wonder what the difference is between the keyword 211surface and the field surface=211, because both define the same data set.
The different characteristics can be seen when grouping calculations according to specific criteria. The chemisorption energy $E_{\mathrm{chem}}$ is calculated as $E_{\mathrm{chem}} = E_Z - E_X - E_Y$, where $E_X$ is the total energy of the adsorbate X in the gas phase, $E_Y$ that of the clean surface with atoms Y, and $E_Z$ that of the combined system Z, i.e. the surface with atoms Y carrying the adsorbate X. If we had only the keywords, then we would have to know every single keyword and loop manually over all possible combinations. In pseudo code it would look as follows:

    for surface in [111, 211, ...]:
        for adsorbate in [H, O, ...]:
            find result with keyword surface+adsorbate
            E_chem = ...

This is not efficient, because every possible combination has to be checked, even if there is no data in the database, and because the actually available surfaces and adsorbates have to be known. They cannot be determined automatically from the database.

A better way is to use the keywords to identify a certain type of calculation and the fields to combine them. This can be written as the following list of rules:

• X.keywords contains adsorbate
• Y.keywords contains surface
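The field-based approach can be sketched in plain Python. This is an illustration under stated assumptions, not the CMR interface: the record layout, the keyword `adsorbed` for the combined system Z, the field names `surface` and `adsorbate`, the way Z is matched to X and Y through equal field values, and all energies are invented for the example.

```python
# Hypothetical records: a keyword marks the type of calculation,
# fields carry the values used to combine calculations.
records = [
    {"keywords": {"adsorbate"}, "adsorbate": "H", "energy": -1.0},
    {"keywords": {"adsorbate"}, "adsorbate": "O", "energy": -2.0},
    {"keywords": {"surface"}, "surface": 211, "energy": -50.0},
    {"keywords": {"surface"}, "surface": 111, "energy": -48.0},
    {"keywords": {"adsorbed"}, "surface": 211, "adsorbate": "H", "energy": -51.5},
    {"keywords": {"adsorbed"}, "surface": 111, "adsorbate": "O", "energy": -51.0},
]


def with_keyword(items, keyword):
    """Select all records of one calculation type."""
    return [r for r in items if keyword in r["keywords"]]


# Index the reference energies by the field that identifies them.
e_x = {r["adsorbate"]: r["energy"] for r in with_keyword(records, "adsorbate")}
e_y = {r["surface"]: r["energy"] for r in with_keyword(records, "surface")}

# Walk only over the combined systems that exist in the data and
# look up the matching references through the shared field values.
e_chem = {}
for z in with_keyword(records, "adsorbed"):
    if z["adsorbate"] in e_x and z["surface"] in e_y:
        key = (z["surface"], z["adsorbate"])
        e_chem[key] = z["energy"] - e_x[z["adsorbate"]] - e_y[z["surface"]]

print(e_chem)
```

Only the combinations actually present in the data are evaluated, and the available surfaces and adsorbates are read from the records instead of being listed by hand.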
