TABLE III
Restart times (in s)

NPB     ALC Base   LiveVar     Zero     Incr.
BT          4.44      5.22     4.55     11.15
CG         11.71      6.02     4.56      4.54
EP          0.10      0.09     0.09      0.24
FT         58.36     48.80    49.92    156.46
IS         22.03     22.05    15.01     20.83
LU          2.08      2.04     2.08      4.90
MG         33.12     33.12    23.81     66.14
SP          4.31      4.32     4.08      9.42

At least 10 executions were performed for each application and the minimum time obtained is reported. The main goal of the experiment is to measure the overhead introduced by the computation of the hash functions and the inspections needed to create the incremental checkpoint. The hash function selected in CPPC for these experiments was MD5. From the results, it can be observed that the overhead introduced by the incremental checkpointing technique is hidden by the gain obtained from the reduction in checkpoint size. The results for the creation of the full checkpoint in the incremental technique also allow assessing the gain obtained when only the zero-blocks exclusion is applied.

In general, both the live variable analysis, whose overhead is moved to compile time, and the incremental checkpointing technique with zero-blocks exclusion perform better than the base approach. In some cases the reduction in checkpoint overhead is as high as 30-40% (CG or IS).

CPPC can be configured so that the checkpoint is built in parallel with the execution of the application by creating new threads. Thus, the application execution does not have to be stalled until the checkpoints are created, and the above latencies may be hidden.

C. Restart overhead

Restart times are shown in Table III. The restart time includes reading the checkpoint files and restarting the application up to the point where the checkpoint was dumped. Again, at least 10 executions were performed for each application and the minimum time obtained is reported. The memory was freed before each execution to avoid the effect of the page cache and to guarantee that checkpoint files are read from disk.

The column labeled "Zero" shows the restart overhead when there is no incremental checkpoint file, only the full one. These results allow evaluating the overhead when only the zero-blocks exclusion is applied, which is always lower than the overhead of the base approach.

The incremental checkpointing technique presents a high restart overhead compared to the others. This is due to the larger volume of data to be moved and read in the case of incremental checkpointing (which can be calculated as the sum of the last three columns of Table I). A possible approach to reduce this overhead would be to merge the full checkpoint file and the incremental ones into a single file at the checkpoint server before a restart is required. The number of incremental checkpoints has a great influence on the restart overhead. There exist studies [12] that provide a model to determine the optimal number of incremental checkpoints between two consecutive full checkpoints.

V. Related Work

The live variable analysis, presented in Section II-A, can be seen as a complementary approach to the memory exclusion techniques proposed by Plank et al. [13].

Regarding incremental checkpointing, as mentioned in Section II-B, the literature offers a number of techniques to implement it in SLC [5], [6], [7], [14], [15], [16]. The implementation proposed in this paper is inspired by the hash-based approaches [6], [16], but it is intended for ALC. Using an application-level approach drastically reduces the number of memory blocks to be checked at runtime and, thus, the overhead of the approach. The reduction in the number of analyzed blocks also implies a reduction in the size of the hash tables to be stored, which allows keeping these tables in main memory instead of on disk, further reducing the overhead of the technique. Additionally, the size of the generated checkpoint files is reduced through the detection and elimination of zero-blocks.
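To make the mechanics described in the previous paragraph concrete, the following C sketch performs per-block change detection with zero-block exclusion. It is a minimal illustration under assumptions, not CPPC code: FNV-1a stands in for the MD5 hash used in the experiments; the 4 KB block size, the on-disk layout and all function names are hypothetical; and the initial full checkpoint and metadata handling are omitted.

/* Minimal sketch of hash-based incremental checkpointing with
 * zero-block exclusion. FNV-1a is a stand-in for MD5; block size,
 * file layout and names are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* FNV-1a hash over one block (placeholder for MD5). */
static uint64_t block_hash(const unsigned char *p, size_t n)
{
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    return h;
}

/* Returns nonzero if the block contains only zero bytes. */
static int is_zero_block(const unsigned char *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}

/* Stores into 'ckpt' only the blocks of 'data' that are non-zero and
 * whose hash changed since the previous checkpoint. 'prev_hashes'
 * holds one hash per block and is kept in main memory between calls. */
static void checkpoint_incremental(const unsigned char *data, size_t size,
                                   uint64_t *prev_hashes, FILE *ckpt)
{
    size_t nblocks = (size + BLOCK_SIZE - 1) / BLOCK_SIZE;
    for (size_t b = 0; b < nblocks; b++) {
        size_t off = b * BLOCK_SIZE;
        size_t len = (size - off < BLOCK_SIZE) ? size - off : BLOCK_SIZE;
        if (is_zero_block(data + off, len))
            continue;                      /* zero-blocks are never stored */
        uint64_t h = block_hash(data + off, len);
        if (h == prev_hashes[b])
            continue;                      /* unmodified since last checkpoint */
        prev_hashes[b] = h;
        fwrite(&off, sizeof off, 1, ckpt); /* block offset ... */
        fwrite(&len, sizeof len, 1, ckpt); /* ... length ... */
        fwrite(data + off, 1, len, ckpt);  /* ... and payload */
    }
}

int main(void)
{
    size_t size = 8 * BLOCK_SIZE;
    unsigned char *data = calloc(size, 1);
    uint64_t *hashes = calloc(size / BLOCK_SIZE, sizeof *hashes);
    if (data == NULL || hashes == NULL)
        return 1;
    memcpy(data + 3 * BLOCK_SIZE, "modified", 8); /* dirty a single block */
    FILE *ckpt = fopen("ckpt.inc", "wb");
    if (ckpt == NULL)
        return 1;
    checkpoint_incremental(data, size, hashes, ckpt); /* writes one block */
    fclose(ckpt);
    free(data);
    free(hashes);
    return 0;
}

Keeping one hash per registered block in main memory is what keeps the runtime inspection cheap; only modified, non-zero blocks reach the incremental checkpoint file.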
The idea of not storing zero-blocks bears a certain similarity to the technique used in the SLC tool Berkeley Lab's Checkpoint/Restart (BLCR) Library [17] to exclude zero pages, that is, pages that have never been touched and logically contain all zeros.

Another alternative present in the literature to reduce the checkpoint size is data compression, implemented, for instance, in the CATCH compiler [18] and the ickp checkpointer [19]. Experimental results show that compression significantly reduces checkpoint sizes. However, the potential benefit of compression for reducing the overhead of checkpointing depends on the time required to compress the data, the compression ratio and the ratio of processor speed to disk speed; a rough break-even sketch is given at the end of this section.

CATCH also implements adaptive checkpointing, that is, it uses a heuristic algorithm to determine the optimal places, in terms of checkpoint size, at which to insert checkpoints. This technique could be useful for programs with large variations in memory usage.

All the techniques mentioned so far focus on reducing checkpoint file sizes. Another way to reduce the computational and I/O cost of checkpointing is to avoid storing checkpoint files on the parallel file system. In [20], Plank et al. proposed replacing stable storage with memory and processor redundancy. Some recent works [21], [22], [23] have adapted this technique, known as diskless checkpointing, to contemporary architectures. The main draw-
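As a rough illustration of the compression trade-off discussed above, the sketch below estimates whether compressing a checkpoint before writing it is expected to pay off. It is only an order-of-magnitude aid under assumptions: the function name, the parameters and the example values are hypothetical and are not taken from the paper's experiments.

/* Rough break-even estimate for checkpoint compression: it pays off when
 * the compression time is smaller than the I/O time saved by writing
 * less data, i.e. compress_s < size * (1 - ratio) / disk_bw. */
#include <stdio.h>

/* size_mb: checkpoint size in MB; ratio: compressed/original size;
 * disk_bw_mbs: sustained write bandwidth in MB/s; compress_s: time
 * needed to compress the data, in seconds. */
static int compression_pays_off(double size_mb, double ratio,
                                double disk_bw_mbs, double compress_s)
{
    double saved_io_time = size_mb * (1.0 - ratio) / disk_bw_mbs;
    return compress_s < saved_io_time;
}

int main(void)
{
    /* Hypothetical example: a 500 MB checkpoint compressed to 60% of its
     * size in 1.5 s, written to a 100 MB/s disk, saves 2 s of I/O. */
    printf("compression %s\n",
           compression_pays_off(500.0, 0.6, 100.0, 1.5) ? "helps" : "hurts");
    return 0;
}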
