1. Copying the PCMDI data onto the terabyte disks
2. Shipping the terabyte disks from NCAR, in Boulder, Colorado, USA, to BMRC in Melbourne, Victoria, Australia
3. Copying the required data to 'gale'
4. Checking the total file size for consistency
5. Processing the 'daily' data files to extract a sub-set (i.e., geographical region) of the data
6. Copying the data files to long-term archive storage on 'sam'

has been slightly better than 1 TB per month, or approximately equivalent to a continuous (24/7) data rate of 3.4 Mbps, which is 0.425 MB/s (i.e., about half the speed of having a virtual USB-1 port connected all the way across the thousands of kilometres between the two sites: NCAR and BMRC). Looking at it another way, this is about 100 times faster than trying to download the data on a 56 kbps dial-up modem, which, for the ten (10) terabytes we have downloaded so far in only 9 months, would have meant a download time of nearly 80 years.

No matter how you look at it, a terabyte is a lot of data.

Acronyms and Abbreviations

BMRC     Bureau of Meteorology Research Centre
CD       Compact Disk
DVD      Digital Video Disk
FTP      File Transfer Protocol
GB       gigabyte
IEEE     Institute of Electrical and Electronic Engineers
kbps     kilobits per second
LAN      Local Area Network
MB       megabyte
Mbps     megabits per second
NCAR     National Center for Atmospheric Research (USA)
netCDF   Network Common Data Form (or Format)
NCO      netCDF Operators
PC       Personal Computer
PCMDI    Program for Climate Model Diagnosis and Intercomparison
SSH      Secure Shell
TB       terabyte
UCAR     University Corporation for Atmospheric Research (USA)
URL      Universal Resource Locator
USB      Universal Serial Bus
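As a quick sanity check on the transfer-rate figures quoted earlier (10 TB moved in roughly 9 months), the arithmetic can be reproduced with a one-liner. The assumptions here are mine, not from the text: decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), 30-day months, and an effective dial-up throughput of about 4.2 KB/s for a 56 kbps modem.

```shell
#!/bin/sh
# Sanity check on the quoted transfer-rate figures.
# Assumptions (mine, not from the text): decimal units, 30-day months,
# and ~4.2 KB/s effective throughput for a 56 kbps dial-up modem.
awk 'BEGIN {
    bytes   = 10 * 10^12            # ten terabytes moved so far
    seconds = 9 * 30 * 24 * 3600    # in roughly nine months
    Bps     = bytes / seconds       # continuous (24/7) byte rate

    printf "sustained rate: %.2f MB/s = %.2f Mbps\n", Bps / 10^6, Bps * 8 / 10^6
    printf "vs dial-up....: about %.0f times faster\n", Bps / 4200
}'
```

Both results land close to the 0.425 MB/s, 3.4 Mbps, and "about 100 times" figures quoted above; the small differences come from the rounded month length.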
A   Data Transfer Activity Log

The following listing shows the contents of a manual log file kept by the author (LH) during the data transfer work performed to extract files from the eighth terabyte disk. It contains many things not mentioned elsewhere in this document, and is included because it provides details of some of the more mundane tasks associated with moving such large slabs of data:

bLog.disk8
----------

20050720,1100
--------
1. On the "turtles" Linux machine, connect the LaCie disk-8, re-boot,
   and obtain a list of files:

   # mount /mnt/lacie?
   # cd /mnt/lacie?
   # ls -lR > /tmp/disk8files.list

2. On "gale", retrieve the "ls -lR" file list and reduce it to "ls -R" form:

   % cd /bm/gkeep/lih/bmrc/afm/p03
   % ftp turtles
   ...

   ftp> cd /tmp
   ftp> get disk8files.list
   ftp> quit

   % strip-ls-lR disk8files.list | cat -r > disk8files.txt

3. Used email to send the file "disk8files.txt" to Aurel Moise to enable him
   to select the list of files he wants transferred from TB disk-8 to "sam".


20050801,1300
--------
1. Aurel sent me email with an attached file called "disk8-bmrc.txt"
   containing the list of files to be extracted from LaCie TB disk #8.
   Saved the email to a file and edited it to leave the list of files
   with some comments to explain the file contents.

2. Split the list into its separated "data{N}" part files:

   % vi disk8-bmrc.txt
   { ... look for "data{N}" boundaries ... }
   { ... and save separate "p81" files ... }

   % foreach f (data*-p81.txt)
   foreach? echo $f
   foreach? list-size.pl $f
   foreach? end

   data17-p81.txt

   list-size: List number of netCDF files: 131
   list-size: List total netCDF file size: 94924369408 => ( 88 GB)
   list-size: File sizes were obtained by: Reference: 131

   data18-p81.txt

   list-size: List number of netCDF files: 135
   list-size: List total netCDF file size: 83972739968 => ( 78 GB)
   list-size: File sizes were obtained by: Reference: 135

   data3-p81.txt

   list-size: List number of netCDF files: 50
   list-size: List total netCDF file size: 67138637448 => ( 63 GB)
   list-size: File sizes were obtained by: Reference: 50

   data4-p81.txt

   list-size: List number of netCDF files: 345
   list-size: List total netCDF file size: 297743829948 => (277 GB)
   list-size: File sizes were obtained by: Reference: 345

   data6-p81.txt

   list-size: List number of netCDF files: 442
   list-size: List total netCDF file size: 300798324764 => (280 GB)
   list-size: File sizes were obtained by: Reference: 442

   data9-p81.txt

   list-size: List number of netCDF files: 297
   list-size: List total netCDF file size: 78253874644 => ( 73 GB)
   list-size: File sizes were obtained by: Reference: 297

   File lists "data4-p81.txt" and "data6-p81.txt" are probably too large
   to be processed all at once, so split these into two parts each, named
   "data4-p81.txt", "data4-p82.txt", "data6-p81.txt", and "data6-p82.txt".


20050802,1030
--------
1. Check the sizes of the list files "data4-p81.txt", "data4-p82.txt",
   "data6-p81.txt", and "data6-p82.txt":

   % foreach f (data4-p* data6-p*)
   foreach? echo $f
   foreach? list-size.pl $f
   foreach? end

   data4-p81.txt

   list-size: List number of netCDF files: 177
   list-size: List total netCDF file size: 149147469924 => (139 GB)
   list-size: File sizes were obtained by: Reference: 177

   data4-p82.txt

   list-size: List number of netCDF files: 168
   list-size: List total netCDF file size: 148596360024 => (138 GB)
   list-size: File sizes were obtained by: Reference: 168

   data6-p81.txt

   list-size: List number of netCDF files: 217
   list-size: List total netCDF file size: 149489610900 => (139 GB)
   list-size: File sizes were obtained by: Reference: 217

   data6-p82.txt

   list-size: List number of netCDF files: 225
   list-size: List total netCDF file size: 151308713864 => (141 GB)
   list-size: File sizes were obtained by: Reference: 225

   These should now be manageable chunks.

2. Commence the transfer of files in the "data3-p81.txt" list from
   the LaCie TB disk #8 to the "gwork" area:

   % nxt-run 3 81
   nxt-run: 20050802,110938: Running part file: data3-p81.txt
   ftp log file is: ftp-20050802-110938.log

   The "ftp-20050802-110938.log" file contained error messages like:

   ERROR: Error changing directory!
   afm-ftp: ERROR: Directory\
   (/mnt/lacie1/data3/sresa2/atm/da/tasmin/gfdl_cm2_1/run1) not found!

   The LaCie TB disk #8 is mounted on "/mnt/lacie2", NOT "/mnt/lacie1" !!!

3. Edit the "${HOME}/afm-ftp.config" file to change the line:

   set RemTopDir "/mnt/lacie1"

   to read, instead:

   set RemTopDir "/mnt/lacie2"

4. Re-commence the transfer of files in the "data3-p81.txt" list from
   the LaCie TB disk #8 to the "gwork" area:

   % nxt-run 3 81
   nxt-run: 20050802,111416: Running part file: data3-p81.txt
   ftp log file is: ftp-20050802-111416.log

   And a little later, check the progress:

   % gwstats
   ...
   nxt-run: 20050802,111416: Running part file: data3-p81.txt

   Tue Aug  2 11:23:32 AEST 2005

   Storage space used on 'gwork' in 1k blocks:
   4484496 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 50
   Number of files already have: 0
   Number of files transferred.: 89