Design, Implementation, and Performance Evaluation of Flash ...



S. J. AHN, J. M. CHOI, D. H. LEE, S. H. NOH, S. L. MIN AND Y. K. CHO

1) issue non-blocking read request for initial 4KB of data
2) check if read request is completed
   a. if not completed, wait for completion
3) issue non-blocking read request for the next 4KB of data
4) sum all values in the read data
5) count from the specified initial value to 0 (dummy loop)
6) check if the total amount of read data is 1MB
   a. if it is, terminate program
   b. otherwise go to step 2)

Fig. 10. Pseudo code for the synthetic workload using non-blocking read.
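The loop in Fig. 10 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' benchmark code: the non-blocking read is emulated with a single worker thread, the names `read_chunk` and `run_workload` are hypothetical, and the sketch skips the prefetch in step 3 once the final 4KB has been requested so that it never reads past 1MB.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4 * 1024       # 4KB per read request, as in Fig. 10
TOTAL = 1024 * 1024    # terminate once 1MB has been read


def read_chunk(path, offset, size=CHUNK):
    """Blocking read of one chunk; submitted to a worker thread so the
    caller can continue (an assumed stand-in for a non-blocking read)."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(size)


def run_workload(path, dummy_iterations, total=TOTAL, chunk=CHUNK):
    """Follow steps 1-6 of Fig. 10 and return (checksum, bytes_read)."""
    checksum = 0
    bytes_read = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Step 1: issue the read for the initial 4KB.
        pending = pool.submit(read_chunk, path, 0, chunk)
        while True:
            # Step 2: wait for the outstanding read if it is not done yet.
            data = pending.result()
            bytes_read += len(data)
            # Step 3: issue the read for the next 4KB before computing,
            # unless the target size or the end of the file was reached.
            if data and bytes_read < total:
                pending = pool.submit(read_chunk, path, bytes_read, chunk)
            # Step 4: sum all byte values in the data just read.
            checksum += sum(data)
            # Step 5: dummy loop counting down to 0 (computation overhead).
            counter = dummy_iterations
            while counter > 0:
                counter -= 1
            # Step 6: terminate after 1MB (or at end of file).
            if bytes_read >= total or not data:
                return checksum, bytes_read
```

Calling `run_workload(path, 15000)` on a file of at least 1MB runs 256 iterations; raising `dummy_iterations` increases the computation performed while the next read is outstanding.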

[Plot (a): application execution time (ms) of T_conv and T_FSOC versus computation overhead (iteration count of dummy loop, 5000 to 25000).]

(a) Measured application execution time of synthetic workload.

[Plots (b): application run time (ms) versus Tcomp (ms, 800 to 2000). One plot compares TFSOC_parallel derived from Eq. (4) with T_FSOC measured from the experiment; the other compares T_conv_parallel derived from Eq. (3) with T_conv measured from the experiment.]

(b) Comparison of measured application execution time and derived value from Eqs. (3) and (4) (with parameters N = 256, Tmedia + Ttrans = 5.6, Tdriver = 0.72, TFS_host = 0.24, TFS_FSOC = 0.42, Tstub = 0.75, and Tcomp_serial = 38).

Fig. 11. Result of the synthetic workload experiment.

variable is less than 15000, the application run times of both the conventional storage device and the FSOC are bounded by the I/O time, which consists of the file/storage device access time and the data transfer time. In this range, since most of the computation time is hidden by the I/O time, the application run time of both the conventional storage de-
