08.02.2013 Views

New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...

5.3. QAD GRID PLATFORM SERVER

Data elements (e.g. a single 1D spectrum or a peak picking result). This is the most basic element, reflecting the actual information physically stored in the Grid. There are two classes of data:

1. RAW Data: This kind of data usually exists as text or binary files on the platform server. These files can become quite large and easily exceed several gigabytes each. Depending on the network path, transferring them can take anywhere from several minutes to several hours. Network throughput can degrade drastically, especially when large quantities of data are requested from a single node.

2. Analysis (intermediate) Result Data: This type of data is usually stored in database tables and fetched by database queries. The amount of data fetched per query is usually comparatively small. However, if many workers query a database concurrently, overall performance drops, just as in the file-based case.
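The slowdown described above can be estimated with simple arithmetic. The sketch below is a deliberately idealized model, not a measurement from the QAD platform: it assumes that one node's link bandwidth is split evenly among all concurrent requests, with no protocol overhead. The function name `transfer_seconds`, the file size and the link speeds are hypothetical values chosen for illustration.

```python
# Idealized model (an assumption, not platform data): a node's link bandwidth
# is shared equally by all concurrent requests, with no protocol overhead.

GB = 1e9  # bytes
MB = 1e6  # bytes


def transfer_seconds(size_bytes, link_bytes_per_s, concurrent_requests=1):
    """Estimated time to move one file when several transfers share one link."""
    if link_bytes_per_s <= 0 or concurrent_requests < 1:
        raise ValueError("bandwidth must be positive and at least one request is required")
    effective_rate = link_bytes_per_s / concurrent_requests  # fair share per request
    return size_bytes / effective_rate


if __name__ == "__main__":
    size = 4 * GB  # hypothetical raw spectrum file
    links = [("1 Gbit/s", 125 * MB), ("100 Mbit/s", 12.5 * MB)]
    for label, rate in links:
        for workers in (1, 10):
            minutes = transfer_seconds(size, rate, workers) / 60
            print(f"{label}, {workers:2d} concurrent request(s): {minutes:6.1f} min")
```

Under this model a 4 GB file takes about half a minute on an otherwise idle 1 Gbit/s link, but roughly 53 minutes when ten workers pull from the same node over 100 Mbit/s. The same fair-share reasoning applies, loosely, to a database server answering many concurrent queries.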

To model this, we have chosen a rooted directed acyclic graph (DAG). This naturally introduces hierarchies and links between elements and allows for fast searches. Studies are directly connected to the root, groups and datasets are part of a study, and data elements are modeled as the leaves. Each leaf in the graph corresponds to an actual physical file or to the database entry of a result.
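A minimal Python sketch of such a rooted DAG follows. It illustrates the structure under stated assumptions and is not the platform's implementation: the class names `Node` and `DataDAG`, the node kinds and the example study are hypothetical. It also assumes, for illustration, that one data element may be reachable from both a dataset and a group, which a DAG permits and a plain tree would not. Acyclicity is enforced when an edge is added, by checking whether the parent is already reachable from the child.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare by identity: one node may have several parents
class Node:
    name: str
    kind: str  # hypothetical kinds: "root", "study", "group", "dataset", "data"
    children: list = field(default_factory=list)


class DataDAG:
    """Rooted DAG with edges parent -> child; edges that would form a cycle are rejected."""

    def __init__(self):
        self.nodes = {"root": Node("root", "root")}

    def add(self, name, kind, parents):
        """Create (or reuse) a node and link it below each given parent."""
        node = self.nodes.setdefault(name, Node(name, kind))
        for parent in parents:
            self.link(parent, name)
        return node

    def link(self, parent, child):
        # parent -> child closes a cycle iff parent is already reachable from child.
        if parent == child or self._reachable(child, parent):
            raise ValueError(f"edge {parent} -> {child} would create a cycle")
        p, c = self.nodes[parent], self.nodes[child]
        if c not in p.children:
            p.children.append(c)

    def _reachable(self, src, dst):
        """Breadth-first search from src; True if dst can be reached."""
        seen, queue = {src}, deque([self.nodes[src]])
        while queue:
            node = queue.popleft()
            if node.name == dst:
                return True
            for c in node.children:
                if c.name not in seen:
                    seen.add(c.name)
                    queue.append(c)
        return False

    def leaves(self, start="root"):
        """Names of all data elements (leaf nodes of kind "data") below start."""
        found, seen, stack = [], set(), [self.nodes[start]]
        while stack:
            node = stack.pop()
            if node.name in seen:
                continue
            seen.add(node.name)
            if node.kind == "data" and not node.children:
                found.append(node.name)
            stack.extend(node.children)
        return sorted(found)


dag = DataDAG()
dag.add("study_A", "study", ["root"])
dag.add("dataset_1", "dataset", ["study_A"])
dag.add("group_control", "group", ["study_A"])
dag.add("spectrum_001", "data", ["dataset_1", "group_control"])  # shared leaf
dag.add("peaks_001", "data", ["dataset_1"])
print(dag.leaves("study_A"))  # ['peaks_001', 'spectrum_001']
```

`spectrum_001` is reported once even though two paths lead to it, because the traversal tracks visited nodes. An attempt such as `dag.link("spectrum_001", "study_A")` raises `ValueError`, since it would make the graph cyclic.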

A useful feature of this structure is that it allows any kind of information to be linked to each element (node) without losing the DAG's properties. For example, metadata (e.g. patient metadata, the timestamp when a datum was added to the system, or information about the physical location of a file) can be linked directly to a leaf. The latter in particular will be important for data distribution and the subsequent location of data in the Grid (see section 5.3.2).
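The sketch below shows one way metadata can hang off a node without adding edges, so the graph structure, and therefore its acyclicity, is untouched. The `meta` dictionary layout, the `location` key with its `host` and `path` fields, the helper `leaves_at`, and all example values are hypothetical and chosen for illustration. The document itself treats data distribution and location in section 5.3.2.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(eq=False)
class Node:
    name: str
    children: list = field(default_factory=list)
    meta: dict = field(default_factory=dict)  # free-form metadata; not graph edges


def leaves_at(root, host):
    """Names of leaf nodes whose (hypothetical) location metadata names `host`."""
    found, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        location = node.meta.get("location", {})
        if not node.children and location.get("host") == host:
            found.append(node.name)
        stack.extend(node.children)
    return sorted(found)


# Hypothetical example: two raw spectra stored on different worker nodes.
spec1 = Node("spectrum_001", meta={
    "patient_id": "P-0042",
    "added": datetime(2012, 5, 3, 14, 30, tzinfo=timezone.utc),
    "location": {"host": "node07", "path": "/data/raw/spectrum_001.txt"},
})
spec2 = Node("spectrum_002", meta={
    "added": datetime(2012, 5, 4, 9, 0, tzinfo=timezone.utc),
    "location": {"host": "node12", "path": "/data/raw/spectrum_002.txt"},
})
dataset = Node("dataset_1", children=[spec1, spec2])
root = Node("root", children=[Node("study_A", children=[dataset])])

print(leaves_at(root, "node07"))  # ['spectrum_001']
```

A scheduler could use a lookup of this kind to prefer workers that already hold a task's input locally, which is the data-location concern raised in the text.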

Task Management

To split up a large problem, a problem-specific method must break it into smaller tasks. Each task contains parameters describing:

• system data, such as task priority, task owner, draft flag and linked task (see section 5.3.4)
• target specification: sometimes a particular worker has to handle a task
• the type of the task (e.g. peak picking or file copy)
• (optional) dependencies on other tasks
• (optional) the target worker that has to handle this particular task
• what input data is needed
• where results should be written (e.g. database or file)
• further task-dependent parameters, such as fitting variables or window sizes
