New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...
5.3. QAD GRID PLATFORM SERVER
Data elements (e.g. a single 1D spectrum or a peak picking result). This is
the most basic element; it reflects the actual information physically stored
in the Grid. There are two classes of data:
1. RAW Data: This kind of data usually exists as text or binary files
on the platform server. These files can get quite large and easily
exceed several gigabytes each. Depending on the network path,
transferring them can take anywhere from several minutes to several
hours, and network throughput can drop drastically when large
quantities of data are requested from a single node.
2. Analysis (intermediate) Result Data: This type of data is usually
stored in database tables and fetched by database queries. The amount
of data fetched per query is usually comparatively small. However,
if many workers query the same database, overall performance drops,
just as in the file-based case.
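To make the orders of magnitude concrete, the following minimal sketch estimates transfer times under a deliberately simple model: the serving node has a fixed link bandwidth that is split equally among all concurrent requests, and protocol overhead and latency are ignored. The function name, the bandwidth figures and the equal-sharing model are illustrative assumptions, not measurements from the platform.

```python
def transfer_time_seconds(size_gb: float, link_mbit_s: float, concurrent: int = 1) -> float:
    """Estimate the time needed to transfer one file of size_gb gigabytes.

    Hypothetical model: the serving node's link bandwidth (link_mbit_s, in
    megabits per second) is split equally among `concurrent` simultaneous
    requests; protocol overhead and latency are ignored.
    """
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    size_mbit = size_gb * 8 * 1000           # 1 GB = 8000 Mbit (decimal units)
    per_request_mbit_s = link_mbit_s / concurrent
    return size_mbit / per_request_mbit_s


if __name__ == "__main__":
    # A single 5 GB raw file over a 100 Mbit/s path: 400 s (about 7 minutes).
    print(transfer_time_seconds(5, 100))
    # The same file while 10 workers pull from the same node: 4000 s (over an hour).
    print(transfer_time_seconds(5, 100, concurrent=10))
```

The same equal-sharing argument carries over to a database server answering many small queries: the payload per query is small, but each client's share of the server's capacity still shrinks as the number of concurrent workers grows.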
To model this, we have chosen a rooted directed acyclic graph (DAG).
This structure naturally introduces hierarchies and links between elements
and allows for fast searches. Studies are connected directly to the root,
groups and datasets are part of a study, and data elements are modeled as
the leaves. Each leaf of the graph corresponds to an actual physical file or
to the database entry of a result. A useful property of this structure is
that it allows any kind of information to be linked to each element (node)
without losing the DAG's properties. For example, metadata (e.g. patient
metadata, the timestamp when a datum was added to the system, or information
about the physical location of a file) can be linked directly to a leaf. The
latter in particular will be important for data distribution and subsequent
data location in the Grid (see section 5.3.2).
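The DAG described above can be sketched as follows. This is a minimal illustration, not the platform's implementation: the `Node` class, its fields, the `kind` labels and the `location` metadata key are all hypothetical. Each node keeps a list of children and a free-form metadata dictionary, and `add_child` rejects any link that would make the new parent reachable from the child, which is exactly the condition under which a cycle would appear.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass(eq=False)  # identity-based equality and hashing, so nodes can go in sets
class Node:
    """Hypothetical DAG node: root, study, group, dataset or data element."""
    name: str
    kind: str                                    # e.g. "root", "study", "group", "dataset", "data"
    metadata: dict[str, Any] = field(default_factory=dict)
    children: list[Node] = field(default_factory=list)

    def descendants(self) -> Iterator[Node]:
        """Yield every node reachable from this one, each exactly once."""
        seen: set[Node] = set()
        stack = list(self.children)
        while stack:
            node = stack.pop()
            if node in seen:                     # shared subgraph already visited
                continue
            seen.add(node)
            yield node
            stack.extend(node.children)

    def add_child(self, child: Node) -> None:
        """Link child below self, refusing any link that would create a cycle."""
        if child is self or any(d is self for d in child.descendants()):
            raise ValueError(f"linking {child.name!r} under {self.name!r} would create a cycle")
        self.children.append(child)

    def leaves(self) -> list[Node]:
        """Return the data elements (nodes without children) below self."""
        return [n for n in self.descendants() if not n.children]


if __name__ == "__main__":
    root = Node("root", "root")
    study = Node("study-A", "study")
    group = Node("group-X", "group")
    dataset = Node("run-1", "dataset")
    # Hypothetical location metadata attached directly to a leaf.
    spectrum = Node("spectrum-0001", "data",
                    metadata={"location": "node-3:/data/run1/s0001.raw"})
    root.add_child(study)
    study.add_child(dataset)
    study.add_child(group)
    group.add_child(dataset)                     # the dataset now has two parents
    dataset.add_child(spectrum)
    print([leaf.name for leaf in root.leaves()])  # ['spectrum-0001']
    print(spectrum.metadata["location"])
    try:
        spectrum.add_child(study)                # would close a cycle
    except ValueError as err:
        print(err)
```

Because a node may have several parents (here the dataset belongs both to the study and to a group), the traversal tracks visited nodes so that shared parts of the graph are reported only once.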
Task Management<br />
To split up a large problem, a problem-specific method must break it into
smaller tasks. These tasks contain parameters describing:
• system data, such as task priority, task owner, draft flag and linked
task (see section 5.3.4)
• target specification: sometimes a particular worker has to handle a task
• the type of the task (e.g. peak picking or file copy)
• (optional) dependencies on other tasks
• (optional) the target worker that has to handle this particular task
• what input data is needed
• where results should be written (e.g. to a database or a file)
• further task-dependent parameters, such as fitting variables or window
sizes
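A task record with the parameters listed above, together with a problem-specific splitter and a dependency-aware task selection, might look like the following minimal sketch. All names (`Task`, `split_into_tasks`, `next_task_for`), the fixed-size chunking strategy and the rule "pick the highest-priority runnable task" are assumptions made for illustration; the linked-task field is left out because its semantics are described in section 5.3.4.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Task:
    """Hypothetical task record covering the parameters listed above (no linked task)."""
    task_id: int
    task_type: str                                  # e.g. "peak_picking" or "file_copy"
    inputs: list[str]                               # identifiers of the required input data
    output: str                                     # where results go, e.g. "db" or "file"
    priority: int = 0                               # system data: higher runs first
    owner: str = ""                                 # system data
    draft: bool = False                             # system data: drafts are never scheduled
    depends_on: list[int] = field(default_factory=list)   # optional dependencies
    target_worker: Optional[str] = None             # optional: only this worker may run it
    params: dict[str, Any] = field(default_factory=dict)  # task-dependent parameters


def split_into_tasks(spectra: list[str], chunk_size: int, params: dict[str, Any]) -> list[Task]:
    """Hypothetical problem-specific splitter: one peak-picking task per chunk of spectra."""
    return [
        Task(task_id=i, task_type="peak_picking",
             inputs=spectra[start:start + chunk_size], output="db", params=dict(params))
        for i, start in enumerate(range(0, len(spectra), chunk_size))
    ]


def next_task_for(worker: str, pending: list[Task], done: set[int]) -> Optional[Task]:
    """Return the highest-priority runnable task for `worker`, or None.

    A task is runnable if it is not a draft, all of its dependencies are in
    `done`, and it either has no target worker or targets this worker.
    """
    runnable = [
        t for t in pending
        if not t.draft
        and all(dep in done for dep in t.depends_on)
        and t.target_worker in (None, worker)
    ]
    return max(runnable, key=lambda t: t.priority, default=None)


if __name__ == "__main__":
    tasks = split_into_tasks([f"s{i}" for i in range(5)], chunk_size=2,
                             params={"window_size": 11})
    # A follow-up task that may only start once the first two chunks are finished.
    tasks.append(Task(task_id=99, task_type="merge", inputs=[], output="file",
                      priority=5, depends_on=[0, 1]))
    done: set[int] = set()
    first = next_task_for("worker-1", tasks, done)
    print(first.task_id, first.inputs)              # 0 ['s0', 's1']
    done |= {0, 1}
    pending = [t for t in tasks if t.task_id not in done]
    print(next_task_for("worker-1", pending, done).task_id)  # 99
```

Keeping dependencies and the optional target worker as plain fields lets the selection step stay a simple filter followed by a priority choice; a production scheduler would additionally have to track running tasks and handle failures.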