08.02.2013

New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


CHAPTER 6. PROTEOMICS.NET - PRODUCT-ORIENTED CASE STUDIES

```java
// open task found
change_state("Processing Task ID: " + cur_taskID);

// process task

String source_file = params[1];
String target_file = source_file + ".filtered";
String tophat_param = params[2];
String target_dir = params[4];

String cmd = "/home/cocktail/conrad/work/phd/PROTEOMICS_WORKER/"
        + "PROTEOMICSWORKER_apply_tophat/UseTopHat "
        + source_file + " " + tophat_param;

exec_cmd(cmd);

// ...

// Move file to new directory
res = f_source.renameTo(new File(target_dir_handle, /* ... */));

// ...
```
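
The listing relies on helpers that are not shown (`change_state`, `exec_cmd`) and builds the external command as a single string. A self-contained Java sketch of the same step follows. It is an illustration under assumptions, not the worker's actual code: it swaps in `ProcessBuilder` with an argument list instead of one command string, and `Files.move` instead of `File.renameTo`. The class name, method names and the `filterExecutable` parameter are hypothetical, and because the listing elides which file is moved afterwards, moving the source file here is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

/** Hypothetical sketch of one worker step: run an external filter, then move a file. */
public class TophatTaskSketch {

    /** Runs an external command and returns its exit code; output goes to this JVM's stdout/stderr. */
    static int execCmd(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).inheritIO().start();
        return p.waitFor();
    }

    /** Moves a file into targetDir (created if missing), keeping its file name. */
    static Path moveToDir(Path source, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        return Files.move(source, targetDir.resolve(source.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
    }

    /**
     * params layout follows the listing: [1] source file, [2] top-hat parameter, [4] target directory.
     * filterExecutable is an assumption; the real path is installation-specific.
     */
    static Path processTask(String[] params, String filterExecutable)
            throws IOException, InterruptedException {
        Path sourceFile = Path.of(params[1]);
        String tophatParam = params[2];
        Path targetDir = Path.of(params[4]);

        // Each argument is a separate list element, so file names containing spaces stay intact.
        int exit = execCmd(List.of(filterExecutable, sourceFile.toString(), tophatParam));
        if (exit != 0) {
            throw new IOException("filter exited with code " + exit);
        }
        // Assumption: the source file is the one moved (the listing elides this detail).
        return moveToDir(sourceFile, targetDir);
    }
}
```

Passing arguments as a list avoids word-splitting problems with unusual file names, and `Files.move` reports failures as an `IOException` with a reason, whereas `renameTo` only returns `false`.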

6.2.6 Allowing for Easy Creation and Integration of Grid Applications on the Example of Analyzing Molecular Dynamics Simulations

Background<br />

The goal of this project was to build a pipeline for dimensionality reduction and analysis of molecular dynamics simulation data. This is a three-step process:

Import data into the system: To make the data accessible, it must first be copied to a temporary directory of this project that the server can access. The datasets can then be selected for import. Import includes extracting metadata (such as the structure of the simulated molecule) and converting the data into the “trr” format (the Gromacs trajectory format), which is used internally. After a successful conversion, the transformed data is moved to the project directory, and a description, the metadata and an ID are inserted into the database. The ID is needed to access the data in further analyses.

Transform data: To prepare the data for analysis, it must be converted to an internal binary format that the analysis algorithms can read. Because an analysis needs only parts of the full dataset, this step is not included in the import process.

Analyze data: The transformed data is analyzed and the results are written to the database.
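
The three steps can be sketched as a small Java pipeline. This is a minimal sketch under stated assumptions, not the project's implementation: an in-memory map stands in for the database, the conversion to “trr” is assumed to have happened already (nothing here reads Gromacs files), a trajectory is modeled as a `float[frames][coordinates]` array, and the binary layout (frame count, coordinates per frame, then big-endian floats) is hypothetical. The analysis computes a per-coordinate mean purely as a placeholder for the actual dimensionality reduction. All class and method names are illustrative.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical three-stage pipeline: import (register), transform (binary subset), analyze. */
public class MdPipelineSketch {

    // Stand-ins for the database: ID -> metadata, and ID -> analysis result.
    static final Map<Long, Map<String, String>> METADATA = new HashMap<>();
    static final Map<Long, double[]> RESULTS = new HashMap<>();
    static final AtomicLong NEXT_ID = new AtomicLong(1);

    /** Import: move an already converted file into the project directory and register it under a new ID. */
    static long importDataset(Path convertedFile, Path projectDir, String description) throws IOException {
        Files.createDirectories(projectDir);
        Path stored = Files.move(convertedFile, projectDir.resolve(convertedFile.getFileName()));
        long id = NEXT_ID.getAndIncrement();
        Map<String, String> meta = new HashMap<>();
        meta.put("description", description);
        meta.put("path", stored.toString());
        METADATA.put(id, meta);
        return id;
    }

    /** Transform: write only frames [from, to) in the hypothetical binary layout. */
    static Path transform(float[][] frames, int from, int to, Path out) throws IOException {
        if (from < 0 || to > frames.length || from >= to) {
            throw new IllegalArgumentException("invalid frame range");
        }
        int coords = frames[0].length;
        try (DataOutputStream dos = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(out)))) {
            dos.writeInt(to - from);
            dos.writeInt(coords);
            for (int f = from; f < to; f++) {
                for (float v : frames[f]) {
                    dos.writeFloat(v);
                }
            }
        }
        return out;
    }

    /** Analyze: placeholder analysis (per-coordinate mean), stored under the dataset ID. */
    static double[] analyze(long id, Path binary) throws IOException {
        try (DataInputStream dis = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(binary)))) {
            int n = dis.readInt();
            int coords = dis.readInt();
            double[] mean = new double[coords];
            for (int f = 0; f < n; f++) {
                for (int c = 0; c < coords; c++) {
                    mean[c] += dis.readFloat();
                }
            }
            for (int c = 0; c < coords; c++) {
                mean[c] /= n;
            }
            RESULTS.put(id, mean);
            return mean;
        }
    }
}
```

Keeping `transform` separate from `importDataset` mirrors the text: each analysis writes only the frame range it needs, so its binary file is smaller than the full trajectory.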

Since the data is usually quite large (several gigabytes), it makes sense to process many datasets in parallel.
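
Because each dataset is processed independently, the parallelism can be illustrated with a standard `ExecutorService`. The sketch below is hypothetical: `ParallelDatasetsSketch`, `processAll` and its `job` parameter are illustrative names, and the local thread pool only stands in for whatever job distribution the grid system actually uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch: run an independent per-dataset job for many datasets on a fixed-size pool. */
public class ParallelDatasetsSketch {

    static <R> List<R> processAll(List<String> datasetIds, Function<String, R> job, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (String id : datasetIds) {
                tasks.add(() -> job.apply(id));
            }
            // invokeAll blocks until every task is done; the futures keep the input order.
            List<R> results = new ArrayList<>();
            for (Future<R> f : pool.invokeAll(tasks)) {
                results.add(f.get()); // a failed task surfaces here as an ExecutionException
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

When each job loads its dataset into memory, the fixed pool size also bounds how many multi-gigabyte datasets are held at the same time.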
