08.02.2013

New Statistical Algorithms for the Analysis of Mass - FU Berlin, FB MI ...


5.3. QAD GRID PLATFORM SERVER

the current version of the worker is transferred as a zip archive and unzipped into a new sub-directory.

5. Further commands, as stored in the database for this kind of worker, are executed.

6. The worker is started via the worker-specific command line stored in the QAD Grid's database during registration (see section 5.5.2).
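The last three deployment steps (unpack, run the stored commands, start the worker) can be illustrated with a minimal Python sketch. It is not the QAD Grid's implementation: the function name `deploy_worker`, the `worker-<version>` directory naming and the use of `subprocess` are hypothetical, and the command strings merely stand in for the values the text says are stored in the database.

```python
from __future__ import annotations

import shlex
import subprocess
import zipfile
from pathlib import Path


def deploy_worker(archive: Path, base_dir: Path, version: str,
                  setup_commands: list[str], start_command: str) -> subprocess.Popen:
    """Hypothetical sketch of steps 4-6: unzip, run stored commands, start worker."""
    # Step 4: unzip the transferred archive into a new sub-directory.
    # exist_ok=False so an existing version directory is never overwritten.
    target = base_dir / f"worker-{version}"
    target.mkdir(parents=True, exist_ok=False)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

    # Step 5: execute the further commands stored for this kind of worker;
    # check=True aborts the deployment if any command fails.
    for cmd in setup_commands:
        subprocess.run(shlex.split(cmd), cwd=target, check=True)

    # Step 6: start the worker via its registered command line without
    # waiting for it to finish.
    return subprocess.Popen(shlex.split(start_command), cwd=target)
```

Running each step relative to the fresh sub-directory keeps older worker versions untouched, which matches the text's choice of unpacking into a new directory.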

Task Scheduling and Provision<br />

One of the central tasks of the Grid platform server is to provide jobs and their details to workers. This requires two basic functions on the platform server:

• Receive jobs from a submitting instance and insert them into the central job queue. Jobs can be sent by an (authorized) worker or by a Grid platform itself, where a user has started an analysis that results in a set of jobs.

• Provide jobs to workers: each authorized worker can request jobs of a particular kind from the server (see below). A job contains all the information the worker needs, such as algorithm parameters, the location of the data to be analyzed, etc.
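The receiving side of the job queue can be sketched as follows. This is a hypothetical in-memory stand-in for the database-backed queue: the names `Job`, `JobQueue`, `receive_jobs` and `analysis_to_jobs`, the set of authorized sender ids, and the assumption that an analysis expands into one job per dataset are all illustrative, not taken from the QAD Grid.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from itertools import count

_job_ids = count(1)  # hypothetical source of unique job ids


@dataclass
class Job:
    """Hypothetical job record carrying everything a worker needs."""
    jti: str                                         # job type id tag
    dataset_id: str                                  # data to be analyzed
    parameters: dict = field(default_factory=dict)   # e.g. algorithm parameters
    job_id: int = field(default_factory=lambda: next(_job_ids))
    status: str = "queued"


class JobQueue:
    """Minimal sketch of the central job queue (receiving side only)."""

    def __init__(self, authorized: set[str]):
        self.authorized = authorized   # ids of workers/platforms allowed to submit
        self.jobs: deque[Job] = deque()

    def receive_jobs(self, sender: str, jobs: list[Job]) -> None:
        # Reject submissions from unauthorized instances before queueing anything.
        if sender not in self.authorized:
            raise PermissionError(f"sender {sender!r} is not authorized")
        self.jobs.extend(jobs)


def analysis_to_jobs(jti: str, dataset_ids: list[str], params: dict) -> list[Job]:
    """Assumption: a user-started analysis becomes one job per dataset."""
    return [Job(jti, ds, dict(params)) for ds in dataset_ids]
```

For example, a platform with id `"platform-1"` could submit `analysis_to_jobs("classify", ["ds1", "ds2"], {"k": 3})`, which places two independent jobs in the queue.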

Job/Worker Matching<br />

Each worker can handle exactly one particular kind of job, such as copying a file, performing an analysis, or classifying an item (see section 5.4). Hence, each job and each worker is assigned a so-called job type id (JTI) tag. When a worker requests a job, it sends its JTI tag, and the platform server checks whether unprocessed jobs tagged with this JTI exist. If so, the first such job in the queue is marked “in progress” and its parameters are transferred to the requesting worker.
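The matching rule can be sketched with one first-in-first-out queue of unprocessed jobs per JTI. The class `JobBoard`, its method `request_job` and the returned parameter dictionary are hypothetical; in the database-backed server the same check would presumably be a query over the job table, for which the in-memory queues here are only a stand-in.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    jti: str                                         # job type id tag
    parameters: dict = field(default_factory=dict)
    status: str = "queued"


class JobBoard:
    """Hypothetical sketch: one FIFO queue of unprocessed jobs per JTI."""

    def __init__(self) -> None:
        self.queues: dict[str, deque[Job]] = defaultdict(deque)
        self.in_progress: dict[int, Job] = {}

    def add(self, job: Job) -> None:
        self.queues[job.jti].append(job)

    def request_job(self, worker_jti: str) -> dict | None:
        """Called with the worker's JTI; returns job parameters or None."""
        queue = self.queues.get(worker_jti)   # .get() does not create empty queues
        if not queue:
            return None                       # no unprocessed job of this kind
        job = queue.popleft()                 # first job in the queue
        job.status = "in progress"
        self.in_progress[job.job_id] = job
        return {"job_id": job.job_id, **job.parameters}
```

Keying the queues by JTI means a worker's request never scans jobs it cannot handle, which reflects the one-kind-of-job-per-worker rule.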

Requesting Data<br />

To handle a task, a worker usually needs a dataset, e.g. to perform an analysis on. This data is stored on the central Grid server and usually also on some workers within the Grid. To obtain it, the worker queries the platform server for a list of all nodes that currently host that particular dataset. The request includes the worker's geographical location and the id of the needed dataset. The resulting list contains the machines' IP addresses ordered by their (geographical) distance to the requesting worker. We could also have used the upload speed of the target as the ordering criterion, but since hosting nodes must have a large upload bandwidth anyway, the geographical location is considered more important for saving total network (Internet) bandwidth.
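The server-side ordering can be sketched as follows. The text does not specify the distance metric or how locations are represented; this sketch assumes latitude/longitude in degrees and uses the great-circle (haversine) distance as a plausible choice. The names `Host`, `haversine_km` and `hosts_for_dataset` are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Host:
    ip: str
    lat: float                # degrees (assumed location format)
    lon: float                # degrees
    datasets: frozenset[str]  # ids of datasets this node currently hosts


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def hosts_for_dataset(hosts: list[Host], dataset_id: str,
                      lat: float, lon: float) -> list[str]:
    """IPs of all hosts holding dataset_id, nearest to (lat, lon) first."""
    matching = [h for h in hosts if dataset_id in h.datasets]
    matching.sort(key=lambda h: haversine_km(lat, lon, h.lat, h.lon))
    return [h.ip for h in matching]
```

For a worker in Berlin, a host in Munich (about 500 km away) would be listed before one in Paris (about 880 km away).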

Using the resulting list, the worker tries to connect to the closest node and request the needed data from it (see section 5.3.1). If a connection fails, it tries the next machine. If all connections fail, it requests the data from the central server.
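The worker-side fallback amounts to a simple loop over the ordered list. In this sketch `download(address, dataset_id)` is a hypothetical transport function, assumed to raise `OSError` (which covers `ConnectionError` and `TimeoutError`) when a node cannot be reached; `fetch_dataset` is likewise an illustrative name.

```python
from collections.abc import Callable, Sequence


def fetch_dataset(dataset_id: str,
                  hosts: Sequence[str],
                  central_server: str,
                  download: Callable[[str, str], bytes]) -> bytes:
    """Try hosting nodes nearest-first, then fall back to the central server."""
    for address in hosts:
        try:
            return download(address, dataset_id)
        except OSError:
            continue  # connection failed: try the next machine
    # All worker nodes failed (or none hosts the data): use the central server.
    return download(central_server, dataset_id)
```

Errors from the central server are deliberately not caught, so a failure there propagates to the caller instead of being silently ignored.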
