
Transfer of Data from Terabyte Disk to BMRC Archives

Lawson Hanson, and Aurel Moise
Bureau of Meteorology Research Centre

September 30, 2005

Abstract

This document describes the processes involved in copying netCDF files (which originated
from the PCMDI data portal) from a terabyte disk attached to a FireWire port on a PC
running Linux¹ which has a network connection to the BMRC LAN and provides FTP access
to enable files to be copied to another host on the LAN.

Commencing with a list of the netCDF files on the disk, another list (usually a sub-set) of
files to be transferred to BMRC is prepared. Data is extracted from the terabyte disk via
scripted FTP sessions running on the ‘gale’ server, using an FTP ‘get’ operation to take
copies of the required files from the Linux PC to ‘gale’, and then these files (some of which
require further processing to extract a sub-set, i.e., a geographical region, of the data) are
finally copied to the ‘sam’ system for long term archive storage.

1 Introduction

No matter how you look at it, a terabyte is a lot of data. It is a million megabytes. A typical CD holds
only 700 MB, so a terabyte would fill over 1428 CDs. A typical DVD holds about 4.7 GB of data, so a
terabyte would fill over 212 DVDs.

If you tried to transfer a terabyte of data over a 56k dial-up modem network link, which at best might
manage a sustained throughput of around three to four kilobytes per second, that would take at least
250 million seconds, which is over 2,893 days, that is, 7.9 years! Even with a high-speed broadband
network access link, running at say 300 kbps, that would still take over 26.7 million seconds, which is
more than 308 days (i.e., over 10 months). And that is just for the first terabyte, with no allowance
made for network outages, etc.

Starting on December 20th, 2004, and pausing to take stock on September 20th, 2005 (i.e., in
9 months), we have transferred approximately ten terabytes of data from the PCMDI data portal onto
removable terabyte disks, shipped them all the way from NCAR in Boulder, Colorado, USA, back to
BMRC in Melbourne, Victoria, Australia, and transferred the required data files onto long term storage
media such as that available on BMRC’s ‘sam’ archive server system. That equates to an end-to-end
(sustained) throughput rate of over one terabyte per month, which is like having a rented network link
providing in excess of 3.4 Mbps, running flat out, 24/7.

When we last tried to estimate the cost of such a network link, the best we could come up with was a
satellite link running at 1 Mbps, costing over $300 per month (each end) plus call costs, and subject to
a download limit of 3 GB per month with an excess data download fee of 10 cents per megabyte! The
cost of downloading a terabyte (i.e., one million megabytes) that way would be somewhere near
$100,000. By comparison, the method described in this document has cost less than $10,000 for the
hardware, plus the cost of software development programming time, and some program monitoring and
data handling time, which all up might amount to a few person-months, plus a few hundred (or perhaps
a thousand) dollars in TB disk shipping costs; the total is very much less than $100,000. And of course,
a network link running at only 1 Mbps would have taken more than three times longer.

¹ Linux is a registered trademark of Linus Torvalds.
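The transfer-time estimates above are easy to reproduce with integer shell arithmetic; the sketch below just recomputes the same back-of-envelope figures (it is illustration, not part of the transfer tooling):

```shell
# One terabyte, counted (as in the text) as a million megabytes.
TB=1000000000000                      # bytes

# 56k dial-up modem: assume ~4 kB/s sustained throughput.
modem_secs=$((TB / 4000))             # 250,000,000 seconds
modem_days=$((modem_secs / 86400))    # ~2,893 days
modem_years=$((modem_days / 365))     # ~7.9 years (integer arithmetic: 7)

# 300 kbps broadband: 300,000 bits/s = 37,500 bytes/s.
bband_secs=$((TB / 37500))            # ~26.7 million seconds
bband_days=$((bband_secs / 86400))    # ~308 days

echo "modem: ${modem_days} days, broadband: ${bband_days} days"
```

Running this prints the 2,893-day and 308-day figures quoted above.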


                    Linux PC
                    TB disk
                       ↕   LAN
                  ‘gale’ host
                  ‘gwork’ disk
                       ⇕   Jumbo-Frame
                   ‘sam’ host
                 archive system

     Figure 1: TB disk data transfer to intermediate and archive system

2 Description

The first part of the data transfer process (i.e., filling a TB disk with PCMDI data) uses as its starting
point the list of the files which are available on the PCMDI data portal (this list is obtained by running
the Unix² ‘ls -lR’ command at the PCMDI site). From this list, a required-files list is extracted, and
this forms the input to a list-directed script which was written using the ‘Tcl’ [2] language to read such
a files list, and to use a library of FTP functions to make connections to the PCMDI server to retrieve
copies of the files in that list, one by one, across a fairly high-speed network link back to a remote
Linux host which provides access to a removable LaCie³ TB disk. During this process, the hierarchical
directory (i.e., an inverted tree) structure, as found on the PCMDI site, is maintained, so that the file
copies are stored in analogous locations in the BMRC file archive systems. When a remote TB disk
becomes full, the disk is unmounted, packaged up, and physically shipped back to BMRC in Melbourne.

When a newly filled TB disk is received in Melbourne, it is connected to another similar Linux system
which provides FTP access to a user who runs another series of list-directed scripts to extract the
BMRC-required files from the TB disk, placing copies on an intermediate server (‘gale’), where, if
required, the (i.e., daily) data files may be processed to extract a sub-set of the data for use at BMRC,
and then these files are transferred to the ‘sam’ server for long term file archive. It is this second data
transfer phase which this document describes in more detail in the subsequent sections.

Each TB disk is then sent on to other parties who have shared in the cost of providing access to the
PCMDI data, and when these organisations have finished with the disk, it becomes available to be
re-cycled for use in collecting the next terabyte of data.

The diagram shown in Figure 1 depicts the interconnection over which data is transferred between the
Linux PC with the FireWire (i.e., IEEE 1394, high performance serial bus) interface to the TB disk,
and the ‘gale’ and ‘sam’ servers with intermediate temporary and long term archive storage,
respectively.

² UNIX is a registered trademark of The Open Group.
³ URL: www.lacie.com
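The list-directed retrieval described above is performed by a Tcl script (‘afm-ftp.tcl’) using an FTP package; that script is not reproduced in this document. As a rough sketch of the idea only — the file names below are illustrative, and the real tool also handles logging, restarts, and errors — one could generate a batch command file for the stock ftp client from a files list, recreating the directory hierarchy locally so each copy lands in an analogous location:

```shell
# A two-entry demo list standing in for a real data list file.
cat > demo-list.txt <<'EOF'
data1/pdcntrl/atm/da/hus/bcc_cm1/run1/hus_A2_2170_2171.nc
data1/pdcntrl/atm/da/psl/miub_echo_g/run1/psl_A2_a05_0276-0295.nc
EOF

{
    echo "binary"
    while IFS= read -r path; do
        dir=$(dirname "$path")
        mkdir -p "$dir"              # mirror the remote tree locally
        echo "get $path $path"       # fetch into the analogous location
    done < demo-list.txt
    echo "bye"
} > ftp-batch.txt
```

The session itself would then be, roughly, `ftp -i -n <host> < ftp-batch.txt` with a login macro; the point of the sketch is only the list-directed, hierarchy-preserving loop.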


3 Initial Setup On ‘gale’

The initial set-up⁴ on ‘gale’ required the creation of a new directory to hold a series of scripts, and
from which the data transfer processes get run:

1. Set up a new directory on ‘gale’ for the data transfer work, and change to that directory. For
   example:

   % mkdir -p /bm/gkeep/lih/bmrc/afm/p03
   % cd /bm/gkeep/lih/bmrc/afm/p03

2. Set up the ‘afm-ftp.tcl’, ‘run’, ‘nxt-run’, and ‘auto-nxt-run’ scripts for operation on ‘gale’ (at
   BMRC we have a different version of Tcl/Tk to that which was used at NCAR, so the ‘afm-ftp.tcl’
   script had to be modified to use a different version of the FTP package).

   Note: Other users will need to take copies of these scripts from the directory named:
   /bm/gkeep/lih/bmrc/afm/p03

3. Copy each of the configuration files for the scripts to your ‘HOME’ directory, and modify them so
   that the scripts can run on ‘gale’:

   % cp configs/afm-ftp.config $HOME
   % cp configs/nxt-run.config $HOME
   % cp configs/auto-nxt-run.config $HOME

   Note: Other users will need to take copies of these configuration files from the directory named:
   /bm/gkeep/lih/bmrc/afm/p03/configs

4. Edit the configuration files to point to directories on ‘gale’, and to set the user name and password
   for connection to ‘turtles’, i.e., the Linux PC⁵ to which the TB disk will be connected, and upon
   which you will need a user account. The following listings show the important lines at the end of
   each of the script configuration files. First, ‘afm-ftp.config’:

   % tail $HOME/afm-ftp.config
   #
   set ListFile "./remoteDirFileList.txt"
   set LocTopDir "/bm/gwork/lih/bmrc/afm/p03"
   set RemTopDir "/mnt/lacie1"
   set FtpHost "134.178.4.71"
   set FtpUsrName "username"
   set FtpPasswd "password"

   You will need to change at least the values for the ‘LocTopDir’, ‘FtpUsrName’, and ‘FtpPasswd’
   entries, and possibly ‘RemTopDir’ and ‘FtpHost’. The ‘LocTopDir’ item is a new directory in your
   ‘/bm/gwork’ area⁶ that you have set up to contain the data transferred by this work.

   The ‘nxt-run.config’ file should not require any modification, unless you had to change the file
   name ‘remoteDirFileList.txt’ because of some unforeseen ‘clash’.

   % tail $HOME/nxt-run.config
   #
   remDFlist="remoteDirFileList.txt"
   export remDFlist

⁴ Users attempting to make use of this system of scripts and processes on another platform will, of course, need
to make suitable changes to the absolute directory path names and other identifiers to suit their particular system.
⁵ Some additional configuration work was required to be done on the Linux PC to enable it to recognise the FireWire
interface and to have it ‘mount’ the TB disk at a known place.
⁶ You may need to negotiate a larger than normal (i.e., ‘quota’) reserved space for these file transfers. Trying to do this
in small chunks is rather inefficient.


   Finally, ‘auto-nxt-run.config’:

   % tail $HOME/auto-nxt-run.config
   #
   MailTo="l.hanson@bom.gov.au"
   export MailTo
   RunPath="/bm/gkeep/lih/bmrc/afm/p03"
   export RunPath
   RemDFlist="remoteDirFileList.txt"
   export RemDFlist

   You will need to change the ‘MailTo’ and ‘RunPath’ entries. The ‘RunPath’ item is the ‘gale’
   directory you have set up to hold the scripts and data lists, and from which you will run the scripts
   which automate the various processes of this work.

5. Create, or take a copy of, the ‘listalldirs.txt’ file (used by scripts like ‘list-size.pl’), which
   is an ‘ls -lR’ format listing of all of the files available on the PCMDI data portal.

4 Transfer Data From Terabyte Disk To ‘gale’

Data is transferred from the TB disk (connected to the Linux PC) over to a staging area on ‘gale’
(rather than directly across to the long term archive storage system) because some files (i.e., huge daily
data files) need to be further processed to extract a sub-set of the data they contain (to reduce the final
amounts of storage required). The steps required in this data transfer process are:

1. Partition the data list file (e.g., ‘data?.txt’), if required, into a series of more manageable part list
   files:

   % ~lih/bin/df-part.sh data1.txt

   This creates a series of files named like ‘data1-p1.txt’, ‘data1-p2.txt’, ‘data1-p3.txt’, etc.
   Or, if the data list file is deemed small enough to handle in one chunk, then make a copy of it with
   an appropriate part number, for example:

   % cp data1.txt data1-p1.txt

2. Use the ‘list-size.pl’ script to determine the total size of the netCDF files in the data list file
   (or part data list file), for example:

   % list-size.pl data1-p1.txt
   list-size: List number of netCDF files: 55
   list-size: List total netCDF file size: 49394698640 => ( 46 GB)
   list-size: File sizes were obtained by: SymLinks: 55
   list-size: Symbolic Links: file count: 55

3. Start the transfer of data from the terabyte disk to ‘gale’ by running the ‘nxt-run’ script with the
   appropriate arguments. For example, to commence the transfer of the files listed in the part file
   ‘data1-p1.txt’, you would run the command:

   % nxt-run 1 1

   This process can take a considerable amount of time. For example, when connected to a USB-1
   port, the transfer of approximately 46 gigabytes of data took just over seventeen (17) hours; hence,
   the effective data transfer rate was about 0.8 megabytes per second.

   Soon after we started, we replaced the USB port with a FireWire (i.e., IEEE 1394) interface, and this
   considerably reduced the data transfer times, producing data transfer rates of about 11.5 megabytes
   per second, so a data transfer of 46 GB now takes approximately sixty-eight (68) minutes.
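The ‘df-part.sh’ partitioner used in step 1 is not itself listed in this document. A minimal sketch of the same idea — the 50-entries-per-part figure is an arbitrary assumption, not taken from the original script, and the demo input below is fabricated — might look like:

```shell
list=data1.txt
per_part=50

# Demo input: 120 dummy entries standing in for netCDF file names.
seq 1 120 | sed 's|^|data1/atm/file-|;s|$|.nc|' > "$list"

# Write entry K of the list into part file number int((K-1)/per_part)+1,
# producing data1-p1.txt, data1-p2.txt, data1-p3.txt, ...
stem=${list%.txt}
awk -v stem="$stem" -v n="$per_part" '
    { part = int((NR - 1) / n) + 1
      print > (stem "-p" part ".txt") }
' "$list"

ls "${stem}"-p*.txt
```

With 120 entries and 50 per part this yields two full part files and one 20-entry remainder, mirroring the ‘data1-p1.txt’, ‘data1-p2.txt’, ... naming used above.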


5 Consistency Checks Of The Transferred Data

Check that the files transferred from the terabyte disk to the ‘/bm/gwork/lih/bmrc/afm/p03’ directory
on ‘gale’ are all there, and are of the correct size.

1. Collect a list of the files that now reside in the destination directory on ‘gale’:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR > gw-data1-p1.list
   % cd /bm/gkeep/lih/bmrc/afm/p03
   % cp /bm/gwork/lih/bmrc/afm/p03/gw-data1-p1.list .

   Other users will probably need to substitute their own directory path names and file names in place
   of the ones shown here.

2. Run the ‘list-size.pl’ script to display the size of the files listed in the data part list file:

   % ~/lih/bin/list-size.pl data1-p1.txt
   list-size: List number of netCDF files: 55
   list-size: List total netCDF file size: 49394698640
   list-size: File sizes were obtained by: Reference: 55

3. Run the ‘list-size.pl’ script to display the size of the files listed in the ‘ls -lR’ list file of the
   data now residing in the ‘gwork’ area:

   % ~/lih/bin/list-size.pl gw-data1-p1.list
   list-size: List number of netCDF files: 55
   list-size: List total netCDF file size: 49394698640
   list-size: File sizes were obtained by: List: 55

It should be apparent that the set of data has been transferred to ‘gale’ successfully, with all files present
and of the correct size. If the two total netCDF file size items are different, then you should check further
to discover what has changed, and perhaps re-run ‘nxt-run’ on the same data file part list again.

The words ‘Reference’ and ‘List’ following the phrase ‘File sizes were obtained by’ in the output
of the ‘list-size.pl’ command show where the individual file size numbers were obtained to calculate
the ‘List total netCDF file size’ figure.

The ‘data1-p1.txt’ list is of ‘ls -R’ form, hence the file sizes are obtained from the reference list (i.e.,
‘listalldirs.txt’), but the ‘gw-data1-p1.list’ file is an ‘ls -lR’ list which contains its own file size
information, hence the ‘list-size.pl’ script uses that information to calculate its total.

6 Extracting a Sub-Set of Huge Daily Data Files

If the data transferred to the ‘gwork’ area on ‘gale’ contains any daily data files, then we run another
script called ‘cut-to-oz’ on those daily data files (using another cut-down ‘ls -R’ list which contains
only entries for those daily files upon which we wish to operate).

1. Use the ‘get-lsR-ents’ script to extract a list of the daily data files that need to be processed:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % get-lsR-ents ’\/da\/’ gw-data1-p1.list > gw-daily.list

   Notice that the backslash character (’\’) must be used to escape or hide the meaning of characters
   (like ‘/’) which are considered special by AWK [1].


2. Run the ‘cut-to-oz’ script on the list of daily data files:

   % cut-to-oz gw-daily.list
   data1/pdcntrl/atm/da/hus/bcc_cm1/run1:
   Created file(1): hus_A2_2170_2171.nc
   Created file(2): hus_A2_2172_2173.nc
   ...
   data1/pdcntrl/atm/da/psl/miub_echo_g/run1:
   Created file(24): psl_A2_a05_0276-0295.nc

7 Archive Data From ‘gale’ To ‘sam’

When the set of data has been successfully transferred from the terabyte disk over to the ‘gwork’ area
on ‘gale’, the data needs to be transferred to the ‘sam’ server for long term archive.

To simplify this process, the shell script ‘dir2sam’ has been written to use ‘rcp -r’ to copy a directory
structure from ‘gale’ to an associated directory on ‘sam’ using the ‘samsrv2jf’ (i.e., ‘jumbo frame’) host.

1. Change to the ‘/bm/gwork’ area and run the ‘dir2sam’ script:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ~/lih/bin/dir2sam data1
   dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
   dir2sam: Project directory: bmrc/afm/p03
   dir2sam: Copying "data1" directory to: /sammrgh/ext/lih/bmrc/afm/p03
   dir2sam: Total directory size (1k blocks): 48237688 data1
   dir2sam: Commencing at date/time: Thu Jan 27 10:31:31 AEDT 2005
   dir2sam: Completion at date/time: Thu Jan 27 11:24:13 AEDT 2005

   The archival process, transferring the data from ‘gale’ to ‘sam’, took about 53 minutes, so the data
   transfer rate was approximately 15.5 megabytes per second.

2. On the ‘sam’ server, obtain an ‘ls -lR’ list of the ‘data1’ directory:

   % rlogin sam
   % cd /sammrgh/ext/lih/bmrc/afm/p03
   % ls -lR > sam-data1-p1.list

3. Back on ‘gale’, use ‘rcp’ to get the ‘sam-data1-p1.list’ file back from ‘sam’:

   % rcp sam:/sammrgh/ext/lih/bmrc/afm/p03/sam-data1-p1.list .

4. Check the ‘sam’ file list size:

   % list-size.pl sam-data1-p1.list
   list-size: List number of netCDF files: 55
   list-size: List total netCDF file size: 49394698640
   list-size: File sizes were obtained by: List: 55

   Note that the value ‘49394698640’ is identical to the size of the file lists defined in files ‘data1-p1.txt’
   and ‘gw-data1-p1.list’, i.e., as obtained by running the ‘list-size.pl’ script on those files.
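The internals of ‘list-size.pl’ are not shown in this document. The equivalent consistency check on an ‘ls -lR’ listing — count the netCDF entries and sum the size column — can be sketched in awk; the sample listing below is fabricated for illustration:

```shell
# A fabricated two-file 'ls -lR' listing standing in for gw-data1-p1.list.
cat > gw-sample.list <<'EOF'
./data1/pdcntrl/atm:
total 8
-rw-r--r-- 1 lih bm 1000000 Jan 27 10:31 hus_A2_2170_2171.nc
-rw-r--r-- 1 lih bm 2000000 Jan 27 10:32 psl_A2_2170_2171.nc
EOF

# Sum field 5 (the size column of a long listing) for regular-file
# entries whose names end in .nc, and report count and total.
nc_total() {
    awk '/\.nc$/ && $1 ~ /^-/ { n++; s += $5 }
         END { printf "files=%d bytes=%d\n", n, s }' "$1"
}

nc_total gw-sample.list     # prints: files=2 bytes=3000000
```

Running the same summation over both the reference list and the ‘gwork’ listing, and comparing the two totals, is exactly the check performed in steps 2-4 above.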


8 Removing The Intermediate Data

When you are satisfied that the data has been transferred correctly from the terabyte disk to ‘gale’, and
then on to the ‘sam’ server, you can delete the intermediate data now residing in your ‘gwork’ area on
‘gale’:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % rm -rf data1

This will free up the space to be used for transferring the next set of files from the TB disk on to ‘gale’.

9 Starting The Next Data Transfer

When the ‘gwork’ directory has been emptied in preparation for the next data transfer, you can change
back to the intermediate (i.e., ‘gkeep’) area and proceed to initiate the transfer of the next data part
list. For example, you could check the size of the data set listed in ‘data1-p2.txt’ with:

   % cd /bm/gkeep/lih/bmrc/afm/p03
   % list-size.pl data1-p2.txt
   list-size: List number of netCDF files: 89
   list-size: List total netCDF file size: 39773262696
   list-size: File sizes were obtained by: Reference: 89

And then commence the data transfer:

   % nxt-run 1 2

Considering that this set of data is approximately 39 GB, it will again take a considerable time to
complete this process. When the new data transfer has completed, you should again check the number
of files and file size correctness before you transfer the data on to the ‘sam’ server for long term archive
storage.

While the data is being transferred (from the TB disk to ‘gale’), you can check the data list to see if it
contains any daily data which will need to be processed with the ‘cut-to-oz’ script:

   % grep ’\/da\/’ data1-p2.txt
   ...

10 Conclusion

The data stored on each TB (terabyte) disk consists of model output data from many different models
(retrieved from ‘climate.llnl.gov’ using FTP from the UCAR host ‘mineral.cgd.ucar.edu’ via an
SSH login across to ‘moffatt.cgd.ucar.edu’). When each terabyte disk is filled (using a Tcl/FTP
script, running from a ‘crontab’ entry), the disk is sent back to BMRC, and connected via a FireWire
(i.e., IEEE 1394) interface to a Linux PC. We use a similar list-directed Tcl/FTP script to transfer the
data onto a couple of ‘gwork’ areas on ‘gale’ (in chunks of about 200 gigabytes), and then use the ‘rcp’
utility to copy the files across to ‘sam’ for long term archive (so we can delete the ‘gwork’ area to make
way for the next 200 GB chunk).

The FireWire interface enables data transfers to be about ten times faster (approximately 8 MB/s)
than our initial attempts with the terabyte disk connected to a USB-1 port (which delivered data at
approximately 0.8 MB/s).

The sustained end-to-end data throughput, which includes the following processes:


1. Copying the PCMDI data onto the terabyte disks

2. Shipping the terabyte disks from NCAR, in Boulder, Colorado, USA, to BMRC in Melbourne,
   Victoria, Australia

3. Copying the required data to ‘gale’

4. Checking the total file size for consistency

5. Processing the ‘daily’ data files to extract a sub-set (i.e., geographical region) of the data

6. Copying the data files to long term archive storage on ‘sam’

has been slightly better than 1 TB per month, or approximately equivalent to a continuous (24/7) data
rate of 3.4 Mbps, which is 0.425 MB/s (i.e., about half the speed of having a virtual USB-1 port connected
all the way across the thousands of kilometres between the two sites: NCAR and BMRC). Looking at
it another way, this is about 100 times faster than trying to download the data over a 56 kbps dial-up
modem: the ten (10) terabytes we have downloaded so far, in only 9 months, could have taken nearly
80 years by dial-up modem.

No matter how you look at it, a terabyte is a lot of data.

Acronyms and Abbreviations

BMRC     Bureau of Meteorology Research Centre
CD       Compact Disk
DVD      Digital Video Disk
FTP      File Transfer Protocol
GB       gigabyte
IEEE     Institute of Electrical and Electronic Engineers
kbps     kilobits per second
LAN      Local Area Network
MB       megabyte
Mbps     megabits per second
NCAR     National Center for Atmospheric Research (USA)
netCDF   Network Common Data Form (or Format)
NCO      netCDF Operators
PC       Personal Computer
PCMDI    Program for Climate Model Diagnosis and Intercomparison
SSH      Secure Shell
TB       terabyte
UCAR     University Corporation for Atmospheric Research (USA)
URL      Universal Resource Locator
USB      Universal Serial Bus


References

[1] A.V. Aho, B.W. Kernighan, and P.J. Weinberger. The AWK Programming Language. Addison-Wesley,
    first edition, 1988.

[2] J.K. Ousterhout. Tcl and the Tk Toolkit. Addison-Wesley, first edition, 1994.

[3] L. Wall, T. Christiansen, and R.L. Schwartz. Programming Perl. O'Reilly and Associates, second
    edition, 1996.

A Data Transfer Activity Log

The following listing shows the contents of a manual log file kept by the author (LH) during the data
transfer work performed to extract files from the eighth terabyte disk. It contains many things not
mentioned elsewhere in this document, and is included because it provides details of some of the more
mundane tasks associated with moving such large slabs of data:

bLog.disk8
----------

20050720,1100
--------
1. On the "turtles" Linux machine, connect the LaCie disk-8, re-boot,
   and obtain a list of files:

   # mount /mnt/lacie?
   # cd /mnt/lacie?
   # ls -lR > /tmp/disk8files.list

2. On "gale", retrieve the "ls -lR" file list and reduce it to "ls -R" form:

   % cd /bm/gkeep/lih/bmrc/afm/p03
   % ftp turtles
   ...

   ftp> cd /tmp
   ftp> get disk8files.list
   ftp> quit

   % strip-ls-lR disk8files.list | cat -r > disk8files.txt

3. Used email to send the file "disk8files.txt" to Aurel Moise to enable him
   to select the list of files he wants transferred from TB disk-8 to "sam".


20050801,1300
--------
1. Aurel sent me email with an attached file called "disk8-bmrc.txt"
   containing the list of files to be extracted from LaCie TB disk #8.
   Saved the email to a file and edited it to leave the list of files
   with some comments to explain the file contents.

2. Split the list into its separated "data{N}" part files:

   % vi disk8-bmrc.txt
   { ... look for "data{N}" boundaries ... }
   { ... and save separate "p81" files ... }

   % foreach f (data*-p81.txt)
   foreach? echo $f
   foreach? list-size.pl $f
   foreach? end

   data17-p81.txt

   list-size: List number of netCDF files: 131
   list-size: List total netCDF file size: 94924369408 => ( 88 GB)
   list-size: File sizes were obtained by: Reference: 131

   data18-p81.txt

   list-size: List number of netCDF files: 135
   list-size: List total netCDF file size: 83972739968 => ( 78 GB)
   list-size: File sizes were obtained by: Reference: 135

   data3-p81.txt

   list-size: List number of netCDF files: 50
   list-size: List total netCDF file size: 67138637448 => ( 63 GB)
   list-size: File sizes were obtained by: Reference: 50

   data4-p81.txt

   list-size: List number of netCDF files: 345
   list-size: List total netCDF file size: 297743829948 => (277 GB)
   list-size: File sizes were obtained by: Reference: 345

   data6-p81.txt

   list-size: List number of netCDF files: 442
   list-size: List total netCDF file size: 300798324764 => (280 GB)
   list-size: File sizes were obtained by: Reference: 442

   data9-p81.txt

   list-size: List number of netCDF files: 297
   list-size: List total netCDF file size: 78253874644 => ( 73 GB)
   list-size: File sizes were obtained by: Reference: 297

   File lists "data4-p81.txt" and "data6-p81.txt" are probably too large
   to be processed all at once, so split these into two parts each named
   "data4-p81.txt", "data4-p82.txt", "data6-p81.txt", and "data6-p82.txt".


20050802,1030
--------
1. Check the sizes of the list files "data4-p81.txt", "data4-p82.txt",
   "data6-p81.txt", and "data6-p82.txt":

   % foreach f (data4-p* data6-p*)
   foreach? echo $f
   foreach? list-size.pl $f
   foreach? end
   data4-p81.txt

   list-size: List number of netCDF files: 177
   list-size: List total netCDF file size: 149147469924 => (139 GB)
   list-size: File sizes were obtained by: Reference: 177

   data4-p82.txt

   list-size: List number of netCDF files: 168
   list-size: List total netCDF file size: 148596360024 => (138 GB)
   list-size: File sizes were obtained by: Reference: 168

   data6-p81.txt

   list-size: List number of netCDF files: 217
   list-size: List total netCDF file size: 149489610900 => (139 GB)
   list-size: File sizes were obtained by: Reference: 217

   data6-p82.txt

   list-size: List number of netCDF files: 225
   list-size: List total netCDF file size: 151308713864 => (141 GB)
   list-size: File sizes were obtained by: Reference: 225

   These should now be manageable chunks.

2. Commence the transfer of files in the "data3-p81.txt" list from
   the LaCie TB disk #8 to the "gwork" area:

   % nxt-run 3 81
   nxt-run: 20050802,110938: Running part file: data3-p81.txt
   ftp log file is: ftp-20050802-110938.log

   The "ftp-20050802-110938.log" file contained error messages like:

   ERROR: Error changing directory!
   afm-ftp: ERROR: Directory\
   (/mnt/lacie1/data3/sresa2/atm/da/tasmin/gfdl_cm2_1/run1) not found!

   The LaCie TB disk #8 is mounted on "/mnt/lacie2", NOT "/mnt/lacie1" !!!

3. Edit the "${HOME}/afm-ftp.config" file to change the line:

   set RemTopDir "/mnt/lacie1"

   to read, instead:

   set RemTopDir "/mnt/lacie2"

4. Re-commence the transfer of files in the "data3-p81.txt" list from
   the LaCie TB disk #8 to the "gwork" area:

   % nxt-run 3 81
   nxt-run: 20050802,111416: Running part file: data3-p81.txt
   ftp log file is: ftp-20050802-111416.log

   And a little later, check the progress:

   % gwstats
   ...
   nxt-run: 20050802,111416: Running part file: data3-p81.txt

   Tue Aug  2 11:23:32 AEST 2005

   Storage space used on ’gwork’ in 1k blocks:
   4484496 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 50
   Number of files already have: 0
   Number of files transferred.: 8


167168 Information <strong>from</strong> LastFtpLog.: ftp-20050802-111416.log169170 And again, some time later:171172 % gwstats173 ...174 nxt-run: 20050802,111416: Running part file: data3-p81.txt175176 Tue Aug 2 14:02:48 AEST 2005177178 S<strong>to</strong>rage space used on ’gwork’ in 1k blocks:179 65439696 /bm/gwork/lih/bmrc/afm/p03180181 Number <strong>of</strong> files in this part: 50182 Number <strong>of</strong> files already have: 0183 Number <strong>of</strong> files transferred.: 50184185 Information <strong>from</strong> LastFtpLog.: ftp-20050802-111416.log186187 5. Obtain a list <strong>of</strong> files now on "gwork", and run the "cut-<strong>to</strong>-oz" process:188189 % cd /bm/gwork/lih/bmrc/afm/p03190 % ls -lR data3 > gw-data3-p81-dk8.txt191 % grep -c ’\.nc’ gw-data3-p81-dk8.txt192 50193194 % cut-<strong>to</strong>-oz data3-p81.txt195 /bm/gwork/lih/bmrc/afm/p03/data3/sresa2/atm/da/tasmin/gfdl_cm2_1/run1196 Created file(1): tasmin_A2.20860101-20901231_OZ.nc197 Created file(2): tasmin_A2.20910101-20951231_OZ.nc198 ...199 ...200 ...201 /bm/gwork/lih/bmrc/afm/p03/data3/sresa2/atm/da/va/mpi_echam5/run1202 Created file(45): va_A2_2046-2052_OZ.nc203 Created file(46): va_A2_2053-2059_OZ.nc204 Created file(47): va_A2_2060-2065_OZ.nc205 Created file(48): va_A2_2081-2087_OZ.nc206 Created file(49): va_A2_2088-2094_OZ.nc207 Created file(50): va_A2_2095-2100_OZ.nc208209 % date210 Wed Aug 3 06:40:53 AEST 2005211212 6. 
Commence the transfer <strong>of</strong> files listed in "data4-p81.txt" <strong>from</strong> the213 LaCie TB disk #8 <strong>to</strong> the "gwork" area:214215 % nxt-run 4 81216 nxt-run: 20050802,141715: Running part file: data4-p81.txt217 ftp log file is: ftp-20050802-141715.log218219 A few minutes later, check the status:220221 nxt-run: 20050802,141715: Running part file: data4-p81.txt222223 Tue Aug 2 14:20:43 AEST 2005224225 S<strong>to</strong>rage space used on ’gwork’ in 1k blocks:226 65086912 /bm/gwork/lih/bmrc/afm/p03227228 Number <strong>of</strong> files in this part: 177229 Number <strong>of</strong> files already have: 0230 Number <strong>of</strong> files transferred.: 2231232 Information <strong>from</strong> LastFtpLog.: ftp-20050802-141715.log233234 On Wednesday morning:235236 % gwstats237 ...238 nxt-run: 20050802,141715: Running part file: data4-p81.txt239240 Wed Aug 3 06:41:19 AEST 2005241242 S<strong>to</strong>rage space used on ’gwork’ in 1k blocks:243 160521112 /bm/gwork/lih/bmrc/afm/p03244245 Number <strong>of</strong> files in this part: 177246 Number <strong>of</strong> files already have: 0247 Number <strong>of</strong> files transferred.: 177248249 Information <strong>from</strong> LastFtpLog.: ftp-20050802-141715.log250251252 20050803,0640253 --------254 1. Check how much disk s<strong>to</strong>rage space I’m using:255256 % quo257258 Wed Aug 3 06:43:26 AEST 2005259260 Filesystem Usage Limit Used261 ---------- ----- ----- ----262 /bm/gwork 162837248 300000000 54%263 /bm/gkeep 10011672 20000000 50%264 /bm/ghome 622536 1000000 62%265 /bm/gdata 1239832 40000000 3%266267 2. Obtain a list <strong>of</strong> the "data3" files now on "gwork" (after "cut-<strong>to</strong>-oz"):268269 % cd /bm/gwork/lih/bmrc/afm/p03270271 % ls -lR data3 > gw-data3-p81-oz-dk8.txt272273 3. 
Commence the copy <strong>of</strong> the "data3" files <strong>from</strong> "gwork" <strong>to</strong> "sam":274275 % cd /bm/gwork/lih/bmrc/afm/p03276277 % dir2sam data3278 dir2sam: Current direc<strong>to</strong>ry: /bm/gwork/lih/bmrc/afm/p03279 dir2sam: Project direc<strong>to</strong>ry: bmrc/afm/p03280 dir2sam: Copying "data3" direc<strong>to</strong>ry <strong>to</strong>: /sammrgh/ext/lih/bmrc/afm/p03281 dir2sam: Total direc<strong>to</strong>ry size (1k blocks): 14866824 data3282 dir2sam: Commencing rcp at date/time: Wed Aug 3 07:24:32 AEST 2005283 dir2sam: Completing rcp at date/time: Wed Aug 3 09:02:46 AEST 2005284 dir2sam: Setting direc<strong>to</strong>ry/file permissions285286287 4. Obtain a listing <strong>of</strong> the "data4" files now on "gwork":288289 % cd /bm/gwork/lih/bmrc/afm/p03290291 % ls -lR data4 > gw-data4-p81-dk8.txt292 % grep -c ’\.nc’ gw-data4-p81-dk8.txt293 177294295 5. Commence the "cut-<strong>to</strong>-oz" process on the "data4" files:296297 % cd /bm/gwork/lih/bmrc/afm/p03298299 % cut-<strong>to</strong>-oz data4-p81.txt300 /bm/gwork/lih/bmrc/afm/p03/\301 data4/sresa1b/atm/da/hus/cccma_cgcm3_1/run1302 Created file(1): hus_a2_sresa1b_1_cgcm3.1_t47_2046_2065_OZ.nc303 Created file(2): hus_a2_sresa1b_1_cgcm3.1_t47_2081_2100_OZ.nc304 ...305 ... (at approximately: 11:18 a.m.):306 ...307 /bm/gwork/lih/bmrc/afm/p03/data4/sresa1b/atm/da/ta/inmcm3_0/run1308 Created file(138): ta_A2_1_OZ.nc309 ...310 ... (at approximately: 12:45 p.m.):311 ...312 Created file(176): tas_A2.22910101-22951231_OZ.nc313 Created file(177): tas_A2.22960101-23001231_OZ.nc314315 6. 
Commence transferring files listed in "data17-p81.txt" <strong>from</strong> the316 LaCie TB disk <strong>to</strong> the "gwork" area:317318 % cd /bm/gkeep/lih/bmrc/afm/p03319320 % nxt-run 17 81321 nxt-run: 20050803,080838: Running part file: data17-p81.txt322 ftp log file is: ftp-20050803-080838.log323324 A few minutes later:325326 % gwstats327 ...328 nxt-run: 20050803,080838: Running part file: data17-p81.txt329330 Wed Aug 3 08:11:56 AEST 2005331332 S<strong>to</strong>rage space used on ’gwork’ in 1k blocks:333 140774544 /bm/gwork/lih/bmrc/afm/p03334335 Number <strong>of</strong> files in this part: 131336 Number <strong>of</strong> files already have: 0337 Number <strong>of</strong> files transferred.: 2338339 Information <strong>from</strong> LastFtpLog.: ftp-20050803-080838.log340341 7. Log in <strong>to</strong> "sam" and obtain a list <strong>of</strong> the "data3" files:342343 % rlogin sam344 % cd /sammrgh/ext/lih/bmrc/afm/p03345 % ls -lR data3 > s-data3dk8-p81-oz.list346 % exit347348 Copy the list back <strong>to</strong> "gkeep":349350 % cd /bm/gkeep/lih/bmrc/afm/p03351 % rcp sam:/sammrgh/ext/lih/bmrc/afm/p03/s-data3dk8-p81-oz.list .352353 Remove the "data3" direc<strong>to</strong>ry on "gwork":354355 % cd /bm/gwork/lih/bmrc/afm/p03356 % rm -rf data3357358 8. Check the status <strong>of</strong> the "data17" file transfers <strong>to</strong> "gwork":359360 % gwstats361 ...362 nxt-run: 20050803,080838: Running part file: data17-p81.txt363364 Wed Aug 3 10:50:37 AEST 2005365366 S<strong>to</strong>rage space used on ’gwork’ in 1k blocks:367 156389984 /bm/gwork/lih/bmrc/afm/p03368369 Number <strong>of</strong> files in this part: 131370 Number <strong>of</strong> files already have: 0371 Number <strong>of</strong> files transferred.: 131372373 Information <strong>from</strong> LastFtpLog.: ftp-20050803-080838.log374375 9. 
Obtain a list <strong>of</strong> the "data17" files now on "gwork":376377 % cd /bm/gwork/lih/bmrc/afm/p03378379 % ls -lR data17 > gw-data17-p81-dk8.txt380381 10. Commence the "cut-<strong>to</strong>-oz" process on the "data17" files:382383 % cd /bm/gwork/lih/bmrc/afm/p03384 % cut-<strong>to</strong>-oz data17-p81.txt385 /bm/gwork/lih/bmrc/afm/p03/data17/20c3m/atm/da/hus/cccma_cgcm3_1/run1386 Created file(1): hus_a2_20c3m_1_cgcm3.1_t47_1961_1980_OZ.nc387 Created file(2): hus_a2_20c3m_1_cgcm3.1_t47_1981_2000_OZ.nc388 ...389 ...390 ...391 /bm/gwork/lih/bmrc/afm/p03/data17/20c3m/atm/da/va/mpi_echam5/run1392 Created file(126): va_A2_1961-1967_OZ.nc393 Created file(127): va_A2_1968-1974_OZ.nc394 Created file(128): va_A2_1975-1981_OZ.nc395 Created file(129): va_A2_1982-1988_OZ.nc396 Created file(130): va_A2_1989-1995_OZ.nc397 Created file(131): va_A2_1996-2000_OZ.nc398399 11. Commence the transfer <strong>of</strong> the "data18" files <strong>from</strong> the LaCie TB disk400 <strong>to</strong> the "gwork" area:401402 % nxt-run 18 81403 nxt-run: 20050803,105947: Running part file: data18-p81.txt404 ftp log file is: ftp-20050803-105947.log10
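The scripted FTP sessions that "nxt-run" drives are listed elsewhere; as a rough orientation, the core idea (turning a part-file list into a batch of FTP "get" commands) can be sketched stand-alone. Everything in the sketch below is hypothetical: the sample list contents, the batch file name, and the generated commands are placeholders, not the actual "nxt-run"/"afm-ftp" code.

```shell
#!/bin/sh
# Hypothetical sketch of the scripted-FTP idea; NOT the real nxt-run/afm-ftp.
# It turns a part-file list (one path per line) into a batch of ftp(1)
# commands that could be fed to "ftp -n <host>".  All names are placeholders.

# A tiny stand-in part file (placeholder paths):
cat > demo-part.txt <<'EOF'
data3/sresa2/atm/da/tasmin/gfdl_cm2_1/run1/tasmin_A2.20860101-20901231.nc
data3/sresa2/atm/da/tasmin/gfdl_cm2_1/run1/tasmin_A2.20910101-20951231.nc
EOF

BATCH=ftp-batch.txt
{
    echo "binary"                       # netCDF files are binary data
    while read filePath
    do
        # One "get" per listed file, rooted at the disk mount point:
        echo "get /mnt/lacie2/${filePath} ${filePath}"
    done < demo-part.txt
    echo "bye"
} > "${BATCH}"

cat "${BATCH}"
```

In the real runs, "nxt-run N NN" selects the part file (e.g. "data3-p81.txt") and the "afm-ftp" script handles login, directory changes, and logging; the sketch shows only the list-to-commands step.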


    A few minutes later:

    % gwstats
    ...
    nxt-run: 20050803,105947: Running part file: data18-p81.txt

    Wed Aug 3 11:07:45 AEST 2005

    Storage space used on 'gwork' in 1k blocks:
    150367392 /bm/gwork/lih/bmrc/afm/p03

    Number of files in this part: 135
    Number of files already have: 0
    Number of files transferred.: 3

    Information from LastFtpLog.: ftp-20050803-105947.log

12. Obtain a list of the "data4" files on "gwork" after the "cut-to-oz"
    process has finished:

    % cd /bm/gwork/lih/bmrc/afm/p03
    % ls -lR data4 > gw-data4-p81-dk8-oz.txt
    % grep -c '\.nc' gw-data4-p81-dk8-oz.txt
    177

    Commence the copy of "data4" files from "gwork" to "sam":

    % dir2sam data4
    dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
    dir2sam: Project directory: bmrc/afm/p03
    dir2sam: Copying "data4" directory to: /sammrgh/ext/lih/bmrc/afm/p03
    dir2sam: Total directory size (1k blocks): 32923072 data4
    dir2sam: Commencing rcp at date/time: Wed Aug 3 12:48:04 AEST 2005
    dir2sam: Completing rcp at date/time: Wed Aug 3 15:37:31 AEST 2005
    dir2sam: Setting directory/file permissions

13. Check the status of the transfer of "data18" files to "gwork":

    % gwstats
    ...
    nxt-run: 20050803,105947: Running part file: data18-p81.txt

    Wed Aug 3 13:49:54 AEST 2005

    Storage space used on 'gwork' in 1k blocks:
    170693880 /bm/gwork/lih/bmrc/afm/p03

    Number of files in this part: 135
    Number of files already have: 0
    Number of files transferred.: 135

    Information from LastFtpLog.: ftp-20050803-105947.log

    This appears to have run to completion successfully.

14. Obtain a list of the "data18" files now on "gwork":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % ls -lR data18 > gw-data18-p81-dk8.txt
    % grep -c '\.nc' gw-data18-p81-dk8.txt
    135

15. Commence the "cut-to-oz" process on files listed in "data18-p81.txt":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % cp /bm/gkeep/lih/bmrc/afm/p03/data18-p81.txt .
    % cut-to-oz data18-p81.txt
    /bm/gwork/lih/bmrc/afm/p03/\
    data18/commit/atm/da/hus/cccma_cgcm3_1/run1
    Created file(1): hus_a2_commit_1_cgcm3.1_t47_2046_2065_OZ.nc
    ...
    /bm/gwork/lih/bmrc/afm/p03/data18/commit/atm/da/hus/gfdl_cm2_1/run1
    Created file(7): hus_A2.20460101-20501231_OZ.nc
    Created file(8): hus_A2.20510101-20551231_OZ.nc
    Created file(9): hus_A2.20560101-20601231_OZ.nc
    Created file(10): hus_A2.20610101-20651231_OZ.nc
    Created file(11): hus_A2.20810101-20851231_OZ.nc
    Created file(12): hus_A2.20860101-20901231_OZ.nc

    NOTE:
    ----
    Janice requested that I should use "qsub" to run the "ncks" processes
    which form a part of "cut-to-oz", to try to reduce the load on the
    "gale" computer ... so I used a "" to kill this batch.

    TODO:
    ----
    Need to examine the files to see if the next file after the last one
    processed, "hus_A2.20860101-20901231_OZ.nc", is still there in its
    full size, then comment out the first twelve file entries in the list,
    modify "cut-to-oz" to use "qsub", and re-run the process.

    NOTE:
    ----
    Janice suggests that I should just "qsub" the entire "cut-to-oz" job.

16. Check the amount of disk space I'm using on "gwork":

    % quo

    Wed Aug 3 14:32:23 AEST 2005

    Filesystem        Usage      Limit  Used
    ----------        -----      -----  ----
    /bm/gwork     155987544  300000000   51%
    /bm/gkeep      10013312   20000000   50%
    /bm/ghome        623296    1000000   62%
    /bm/gdata       1239832   40000000    3%

17. Commence the transfer of files in the list "data6-p81.txt" from
    the LaCie TB disk #8 to "gwork":

    % nxt-run 6 81
    nxt-run: 20050803,143354: Running part file: data6-p81.txt
    ftp log file is: ftp-20050803-143354.log

    A few minutes later:

    % gwstats
    ...
    nxt-run: 20050803,143354: Running part file: data6-p81.txt

    Wed Aug 3 14:36:37 AEST 2005

    Storage space used on 'gwork' in 1k blocks:
    153885768 /bm/gwork/lih/bmrc/afm/p03

    Number of files in this part: 217
    Number of files already have: 0
    Number of files transferred.: 2

    Information from LastFtpLog.: ftp-20050803-143354.log

18. Log in to "sam" to obtain a list of the "data4" files now on "sam":

    % rlogin sam
    % cd /sammrgh/ext/lih/bmrc/afm/p03
    % ls -lR data4 > s-data4dk8-p81-oz.list
    % exit

    % cd /bm/gkeep/lih/bmrc/afm/p03
    % rcp sam:/sammrgh/ext/lih/bmrc/afm/p03/s-data4dk8-p81-oz.list .

19. Remove the "data4" directory from "gwork":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % rm -rf data4

    Check the amount of disk space being used on "gale":

    % quo

    Wed Aug 3 16:18:00 AEST 2005

    Filesystem        Usage      Limit  Used
    ----------        -----      -----  ----
    /bm/gwork     149967320  300000000   49%
    /bm/gkeep      10014088   20000000   50%
    /bm/ghome        623312    1000000   62%
    /bm/gdata       1239832   40000000    3%

20. Obtain a list of the "data17" files on "gwork" after the "cut-to-oz"
    process has completed:

    % cd /bm/gwork/lih/bmrc/afm/p03
    % ls -lR data17 > gw-data17-p81-dk8-oz.txt

21. Rename the directory "data17" to become "data2", and commence the
    copying of data from "gwork" to "sam":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % mv data17 data2
    % dir2sam data2
    dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
    dir2sam: Project directory: bmrc/afm/p03
    dir2sam: Copying "data2" directory to: /sammrgh/ext/lih/bmrc/afm/p03
    dir2sam: Total directory size (1k blocks): 21082664 data2
    dir2sam: Commencing rcp at date/time: Wed Aug 3 16:34:10 AEST 2005
    dir2sam: Completing rcp at date/time: Wed Aug 3 17:39:03 AEST 2005
    dir2sam: Setting directory/file permissions

22. Check the status of the TB data transfers:

    % gwstats
    ...
    nxt-run: 20050803,143354: Running part file: data6-p81.txt

    Wed Aug 3 16:35:27 AEST 2005

    Storage space used on 'gwork' in 1k blocks:
    153565600 /bm/gwork/lih/bmrc/afm/p03

    Number of files in this part: 217
    Number of files already have: 0
    Number of files transferred.: 70

    Information from LastFtpLog.: ftp-20050803-143354.log

23. Commence running "cut-to-oz" as a batch job on "gale" for the "data18"
    files. After a couple of false starts, created the "cut-data18" shell
    wrapper to run "cut-to-oz":

    % cd /bm/gwork/lih/bmrc/afm/p03

    % cat cut-data18
    #!/bin/sh
    #
    # Program:
    #   cut-data18
    #
    # Author:
    #   Lawson Hanson, 20050803.
    #
    # Purpose:
    #   A shell wrapper to enable "cut-to-oz" to be run
    #   via a "qsub" job on "gale".
    #
    # Send mail at beginning and end of request execution:
    # QSUB -mb -me
    # Specify the batch queue:
    # QSUB -q large
    #
    cd /bm/gwork/lih/bmrc/afm/p03

    /bm/ghome/lih/bin/cut-to-oz data18-p81.txt

    exit 0

    % /opt/local/gnqs/bin/qsub ./cut-data18
    Request 49627.gale submitted to queue: large.

    % /opt/local/gnqs/bin/qstat
    Request I.D.      Owner  Queue  StartTime  TimeLimit  TimeUsed  St
    ----------------  -----  -----  ---------  ---------  --------  --
    cut-data18 49627  lih    large                                  Q

    NOTE: This job had NOT started by 0640 hours on the following morning,
    so ran the "qdel" command to remove the job from the queue:

    % /opt/local/gnqs/bin/qdel 49627


20050804,0640
--------
1. Check the status of the "data6-p81.txt" file transfer from the LaCie
   TB disk #8 to the "gwork" area:

   % gwstats
   ...
   nxt-run: 20050803,143354: Running part file: data6-p81.txt

   Thu Aug 4 06:38:24 AEST 2005

   Storage space used on 'gwork' in 1k blocks:
   236674632 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 217
   Number of files already have: 0
   Number of files transferred.: 217

   Information from LastFtpLog.: ftp-20050803-143354.log

   Check the amount of disk space I'm using on "gale":

   % quo

   Thu Aug 4 06:40:21 AEST 2005

   Filesystem        Usage      Limit  Used
   ----------        -----      -----  ----
   /bm/gwork     238990768  300000000   79%
   /bm/gkeep      10014608   20000000   50%
   /bm/ghome        623320    1000000   62%
   /bm/gdata       1239832   40000000    3%

2. Started NQSII batch jobs to run the "cut-to-oz" process on the files
   in lists "data18-p81.txt" and "data6-p81.txt" via the small shell
   wrapper scripts "cut-data18" and "cut-data6":

   % /opt/local/gnqs/bin/qsub ./cut-data18
   Request 49988.gale submitted to queue: medium.

   % /opt/local/gnqs/bin/qsub ./cut-data6
   Request 49991.gale submitted to queue: medium.

   % /opt/local/gnqs/bin/qstat -
   Request I.D.    Owner   Queue   Start Time  TimeLimit  TimeUsed   St
   --------------  ------  ------  ----------  ---------  ---------  --
   show_SMS_varia 20407  ajb     rtdss                                Q
   display_mlaps  45006  rto     short   8/04 07:12  0 0:01           R
   vertxlr        49548  rab     medium  8/03 14:24  0 0:00  0 0:00:42  R
   calculate_bias 49983  aifsop  medium  8/04 06:45  0 0:15  0 0:24:48  R
   cut-data18     49988  lih     medium  8/04 06:54  0 0:15  0 0:00:22  R
   cut-data6      49991  lih     medium                               Q
   verify         49322  sxy     medium                               W
   vertxlr        47078  rab     large   7/30 21:20  0 0:00  0 0:04:13  R

3. Commence transfer of the files in "data9-p81.txt" from the LaCie
   TB disk #8 to the "gwork" area:

   % nxt-run 9 81
   nxt-run: 20050804,073247: Running part file: data9-p81.txt
   ftp log file is: ftp-20050804-073247.log

   A little while later:

   % gwstats
   ...
   nxt-run: 20050804,073247: Running part file: data9-p81.txt

   Thu Aug 4 07:45:25 AEST 2005

   Storage space used on 'gwork' in 1k blocks:
   197166456 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 297
   Number of files already have: 0
   Number of files transferred.: 26

   Information from LastFtpLog.: ftp-20050804-073247.log

   And later still:

   % gwstats
   ...
   nxt-run: 20050804,073247: Running part file: data9-p81.txt

   Thu Aug 4 10:17:19 AEST 2005

   Storage space used on 'gwork' in 1k blocks:
   240890360 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 297
   Number of files already have: 0
   Number of files transferred.: 297

   Information from LastFtpLog.: ftp-20050804-073247.log

4. Received email informing me that the NQSII batch job "49988"
   (i.e., "cut-data18") has finished, so obtain a list of the files now
   on the "gwork" area, rename the "data18" directory to "data3", and
   commence copying the "data3" directory over to the "sam" archive
   server:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR data18 > gw-data18-p81-dk8-oz.txt
   % grep -c 'OZ\.nc' gw-data18-p81-dk8-oz.txt
   135

   % mv data18 data3
   % dir2sam data3
   dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
   dir2sam: Project directory: bmrc/afm/p03
   dir2sam: Copying "data3" directory to: /sammrgh/ext/lih/bmrc/afm/p03
   dir2sam: Total directory size (1k blocks): 18571208 data3
   dir2sam: Commencing rcp at date/time: Thu Aug 4 09:04:56 AEST 2005
   dir2sam: Completing rcp at date/time: Thu Aug 4 10:05:45 AEST 2005
   dir2sam: Setting directory/file permissions

5. Obtain a list of the "data9" files now on "gwork", then set up a new
   wrapper script called "cut-data9" and use "qsub" to submit the job
   to run the "cut-to-oz" process on the "data9-p81.txt" files:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR data9 > gw-data9-p81-dk8.txt

   % cp cut-data6 cut-data9
   % vi cut-data9
   < ... modify to use "data9-p81.txt" files ... >

   % /opt/local/gnqs/bin/qsub ./cut-data9
   Request 50071.gale submitted to queue: medium.

6. Log in to "sam" and obtain a list of the "data3" files now there,
   then delete the "data3" directory from the "gwork" area:

   % rlogin sam
   % cd /sammrgh/ext/lih/bmrc/afm/p03
   % ls -lR data3 > s-data3dk8-p82-oz.list
   % exit

   % cd /bm/gwork/lih/bmrc/afm/p03
   % rm -rf data3

7. Check how much disk space I'm using on "gwork":

   % quo

   Thu Aug 4 15:18:34 AEST 2005

   Filesystem        Usage      Limit  Used
   ----------        -----      -----  ----
   /bm/gwork     147869208  300000000   49%
   /bm/gkeep      10016144   20000000   50%
   /bm/ghome        633784    1000000   63%
   /bm/gdata       1239832   40000000    3%

   Since usage is below 50%, and there are more "cut-to-oz" runs to come
   (one current, and one in the queue), start the transfer of the
   "data4-p82.txt" files from the LaCie TB disk #8 to "gwork":

   % cd /bm/gkeep/lih/bmrc/afm/p03
   % nxt-run 4 82
   nxt-run: 20050804,152144: Running part file: data4-p82.txt
   ftp log file is: ftp-20050804-152144.log

   A few minutes later:

   % gwstats
   ...
   nxt-run: 20050804,152144: Running part file: data4-p82.txt

   Thu Aug 4 15:29:17 AEST 2005

   Storage space used on 'gwork' in 1k blocks:
   145735336 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 168
   Number of files already have: 0
   Number of files transferred.: 7

   Information from LastFtpLog.: ftp-20050804-152144.log


20050809,0720
--------
1. Obtain a list of the "data4" files now on "gwork":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR data4 > gw-data4-p82-dk8.txt

2. Submit the "cut-data4" job to the NQSII batch system on "gale":

   % /opt/local/gnqs/bin/qsub ./cut-data4
   Request 52987.gale submitted to queue: medium.

3. Obtain lists of the "data6" and "data9" files now on "gwork", after
   the "cut-to-oz" processes have been run:

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR data6 > gw-data6-p81-dk8-oz.txt
   % ls -lR data9 > gw-data9-p81-dk8-oz.txt

4. Commence copying "data6" directory files to "sam":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % dir2sam data6
   dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
   dir2sam: Project directory: bmrc/afm/p03
   dir2sam: Copying "data6" directory to: /sammrgh/ext/lih/bmrc/afm/p03
   dir2sam: Total directory size (1k blocks): 33016376 data6
   dir2sam: Commencing rcp at date/time: Tue Aug 9 07:35:54 AEST 2005
   dir2sam: Completing rcp at date/time: Tue Aug 9 09:44:46 AEST 2005
   dir2sam: Setting directory/file permissions

5. Obtain a list of "data6" files on "sam":

   % rlogin sam
   % cd /sammrgh/ext/lih/bmrc/afm/p03
   % ls -lR data6 > s-data6dk8-p81-oz.list
   % exit

   % cd /bm/gkeep/lih/bmrc/afm/p03
   % rcp sam:/sammrgh/ext/lih/bmrc/afm/p03/s-data6dk8-p81-oz.list .

6. Remove the "data6" directory on "gwork":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % rm -rf data6

7. Commence copying "data9" directory files to "sam":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % dir2sam data9
   dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
   dir2sam: Project directory: bmrc/afm/p03
   dir2sam: Copying "data9" directory to: /sammrgh/ext/lih/bmrc/afm/p03
   dir2sam: Total directory size (1k blocks): 17535504 data9
   dir2sam: Commencing rcp at date/time: Tue Aug 9 11:04:34 AEST 2005
   dir2sam: Completing rcp at date/time: Tue Aug 9 12:11:51 AEST 2005
   dir2sam: Setting directory/file permissions

8. Obtain a list of "data4" files on "gwork" after "cut-to-oz":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR data4 > gw-data4-p82-dk8-oz.txt

9. Commence the file transfer of "data6-p82.txt" files to "gwork":

   % nxt-run 6 82
   nxt-run: 20050809,110857: Running part file: data6-p82.txt
   ftp log file is: ftp-20050809-110858.log

   A few minutes later:

   % gwstats
   ...
   nxt-run: 20050809,110857: Running part file: data6-p82.txt

   Tue Aug 9 11:21:56 AEST 2005

   Storage space used on 'gwork' in 1k blocks:
   57958816 /bm/gwork/lih/bmrc/afm/p03

   Number of files in this part: 225
   Number of files already have: 0
   Number of files transferred.: 10

   Information from LastFtpLog.: ftp-20050809-110858.log

10. Obtain a list of "data9" files on "sam":

    % rlogin sam
    % cd /sammrgh/ext/lih/bmrc/afm/p03
    % ls -lR data9 > s-data9dk8-p81-oz.list
    % exit

    % cd /bm/gkeep/lih/bmrc/afm/p03
    % rcp sam:/sammrgh/ext/lih/bmrc/afm/p03/s-data9dk8-p81-oz.list .

11. Remove the "data9" directory on "gwork":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % rm -rf data9

    NOTE: Just realised that "data9" should be under "data1" ... so,
    extract a list of the "data9" files, and move them to "data1":

    % rlogin sam
    % cd /sammrgh/ext/lih/bmrc/afm/p03
    % cd data9
    % find . -type f -name "*.nc" -print > zzz
    % cd ..
    % cat yyy

    #!/bin/sh
    cat zzz | while read filePath
    do
        dirPath=`dirname ${filePath}`
        mkdir -p data1/${dirPath}
        mv -i data9/${filePath} data1/${filePath}
        echo data1/${filePath}
    done

    % sh yyy
    ... # -> all "data9" files moved to "data1"

    % ls -lR data1 > s-data1dk8-p819-oz.list
    % exit

12. Transfer "data4" directory files to "sam":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % dir2sam data4
    dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
    dir2sam: Project directory: bmrc/afm/p03
    dir2sam: Copying "data4" directory to: /sammrgh/ext/lih/bmrc/afm/p03
    dir2sam: Total directory size (1k blocks): 32822256 data4
    dir2sam: Commencing rcp at date/time: Tue Aug 9 12:42:27 AEST 2005
    dir2sam: Completing rcp at date/time: Tue Aug 9 15:16:16 AEST 2005
    dir2sam: Setting directory/file permissions

13. Obtain a list of "data4" files on "sam":

    % rlogin sam
    % cd /sammrgh/ext/lih/bmrc/afm/p03
    % ls -lR data4 > s-data4dk8-p82-oz.list
    % exit

14. Remove the "data4" directory from "gwork":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % rm -rf data4

15. Obtain a list of "data6" files now on "gwork":

    % cd /bm/gwork/lih/bmrc/afm/p03
    % ls -lR data6 > gw-data6-p82-dk8.txt

16. Set up a shell wrapper to run "cut-to-oz" on "data6-p82.txt":

    % /opt/local/gnqs/bin/qsub ./cut-data6
    Request 53241.gale submitted to queue: medium.


20050810,0700
--------
1. Obtain a list of "data6" files on "gwork" after "cut-to-oz":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % ls -lR data6 > gw-data6-p82-dk8-oz.txt

2. Copy "data6" directory files to "sam":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % dir2sam data6
   dir2sam: Current directory: /bm/gwork/lih/bmrc/afm/p03
   dir2sam: Project directory: bmrc/afm/p03
   dir2sam: Copying "data6" directory to: /sammrgh/ext/lih/bmrc/afm/p03
   dir2sam: Total directory size (1k blocks): 33467112 data6
   dir2sam: Commencing rcp at date/time: Wed Aug 10 06:51:50 AEST 2005
   dir2sam: Completing rcp at date/time: Wed Aug 10 08:16:19 AEST 2005
   dir2sam: Setting directory/file permissions

3. Obtain the "data6" files list on "sam":

   % rlogin sam
   % cd /sammrgh/ext/lih/bmrc/afm/p03
   % ls -lR data6 > s-data6dk8-p82-oz.list

   Determine the amount of space used on "sam" for PCMDI data:

   % /opt/LSCsamfs/bin/sdu -sk data1
   320,971,742 data1
   % /opt/LSCsamfs/bin/sdu -sk data2
   355,305,771 data2
   % /opt/LSCsamfs/bin/sdu -sk data3
   271,242,762 data3
   % /opt/LSCsamfs/bin/sdu -sk data4
   381,471,131 data4
   % /opt/LSCsamfs/bin/sdu -sk data5
   390,083,986 data5
   % /opt/LSCsamfs/bin/sdu -sk data6
   108,482,991 data6
   % /opt/LSCsamfs/bin/sdu -sk data8
   96,766,420 data8
   % /opt/LSCsamfs/bin/sdu -sk .
   1,924,324,810 .
   % exit

   Total file space used on "sam" is approximately 1.9 TB.

4. Remove the "data6" directory from "gwork":

   % cd /bm/gwork/lih/bmrc/afm/p03
   % rm -rf data6

5. The total file transfers from LaCie TB disk #8, onto "gwork",
   processed through "cut-to-oz", and finally archived onto "sam", are:

   Data Set  Files  Total Size  Un-Cut Size
   --------  -----  ----------  -----------
   data1       297       17 GB        73 GB
   data2       131       20 GB        88 GB
   data3       185       32 GB       141 GB
   data4       345       62 GB       277 GB
   data6       442       63 GB       280 GB
   --------  -----  ----------  -----------
   Totals:    1400      194 GB       859 GB


- LH, 20050810.


B Source Code

The following listings show the source code of the scripts used during the
process of extracting data from the LaCie TB disks, and in transferring
those files from 'gale' over to the 'sam' archive server for long-term
storage. Other scripts mentioned in the body of this document, but not
listed here, can be found in another document [7] which describes the
initial phase of these data transfers.

[7] Lawson Hanson and Aurel Moise, 2005: 'PCMDI Data Transfer via
    Terabyte Disk to BMRC'.
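As an orientation before the individual listings: the size accounting used throughout the body (the "list-size" reports, and GB totals like those above) amounts to summing the size field (field 5 of "ls -l" output) over the netCDF entries in a list. The sketch below is a hypothetical stand-alone rendering of that idea, not the actual "list-size.pl"; the sample list and its byte sizes are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the "list-size" idea; NOT the actual list-size.pl.
# Sum the size field over the ".nc" entries in an "ls -l" style list and
# report a count, a byte total, and an approximate GB figure.

# Made-up sample list (sizes in bytes, field 5 as in "ls -l"):
cat > sample-list.txt <<'EOF'
-rw-r--r-- 1 8547 2200 39357936 Dec 23 09:22 clt_A1.1801-1900.nc
-rw-r--r-- 1 8547 2200 19682736 Dec 23 09:23 clt_A1.1901-1950.nc
EOF

awk '/\.nc$/ { n++; total += $5 }
     END {
         printf "list-size: List number of netCDF files: %d\n", n
         printf "list-size: List total netCDF file size: %d => (%.2f GB)\n", total, total / (1024 * 1024 * 1024)
     }' sample-list.txt | tee list-size-out.txt
```

The GB figure here uses 1 GB = 2^30 bytes, which matches the order of magnitude of the "list-size" reports in the body.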


B.1   Program: ‘strip-ls-lR’

The ‘strip-ls-lR’ script extracts information from an ‘ls -lR’ list, producing a more
compact list containing only directories (which are lines ending in ‘run[0-9]:’), and
netCDF files, which are the ninth field from lines ending in ‘.nc’. The resulting list
resembles the output from a Unix ‘ls -R’ command.

    #!/bin/sh
    #
    # Program:
    #       strip-ls-lR
    #
    # RCS-Strings:
    #       $Source: /bm/gkeep/lih/src/sh/RCS/strip-ls-lR,v $
    #       $Revision: 1.2 $
    #
    # Author:
    #       Lawson Hanson, 20050121.
    #
    # Purpose:
    #       Extracts only the required information from an "ls -lR" list,
    #       producing a more compact list which contains only directories
    #       which are lines ending in "run[0-9]:", and netCDF files which
    #       are the ninth field from lines ending in ".nc".
    #
    # Sample Input:
    #       +--------
    #       |./data1/pdcntrl/atm/mo/clt/mri_cgcm2_3_2a:
    #       |total 4
    #       |drwxrwxr-x  2 8547  2200      4096 Dec 23 09:22 run1
    #       |
    #       |./data1/pdcntrl/atm/mo/clt/mri_cgcm2_3_2a/run1:
    #       |total 57728
    #       |-rw-r--r--  1 8547  2200  39357936 Dec 23 09:22 clt_A1.1801-1900.nc
    #       |-rw-r--r--  1 8547  2200  19682736 Dec 23 09:23 clt_A1.1901-1950.nc
    #       |
    #       |./data1/pdcntrl/atm/mo/clt/ncar_ccsm3_0:
    #       |total 4
    #       |drwxrwxr-x  2 8547  2200      4096 Dec 23 09:23 run1
    #       |
    #       |./data1/pdcntrl/atm/mo/clt/ncar_ccsm3_0/run1:
    #       |total 476736
    #       |-rw-r--r--  1 8547  2200 487690508 Dec 23 09:27 \

B.2   Program: ‘extract-req-list.pl’

                "D|Debug" => \$Debug,
                "M|Man"   => \$Man,
                "h|help"  => \$help
    ) or pod2usage(2);

    pod2usage(1) if $help;
    pod2usage(-exitstatus => 0, -verbose => 2) if $Man;

    # $MyRunPath = dirname($0);

    $Prog = basename($0, ".pl");

    $numNcFiles = 0;
    $totNcFiles = 0;

    if (scalar(@ARGV) != 1) {
        print "${Prog}: ERROR: Require one 'ls-R' list from which to extract\n";
        exit 3;
}6162 $file = $ARGV[0];6364 # Prefix is "Req-" instead <strong>of</strong> "New-" (if it exists):65 #66 ($reqFile = $file) =~ s/^New-/Req-/;6768 # Emit a small data file header:69 #70 print "#\n";71 print "# File:\n";72 print "# ${reqFile}\n";73 print "#\n";74 print "# Generation:\n";75 print "# Script: ${Prog}\n";76 print "# Source: ${file}\n";77 printf "# Time..: %s\n", scalar(localtime(time));78 print "#\n";79 print "# Purpose:\n";80 print "# List <strong>of</strong> files required <strong>to</strong> be transferred <strong>from</strong> TB disk <strong>to</strong> ’sam’.\n";81 print "#\n";82 print "\n";14


8384 print STDERR "\n";85 print STDERR "${Prog}: Info: Input file: $file\n";868788 # Using a hash <strong>of</strong> hashes <strong>of</strong> lists structure,89 # define lists <strong>of</strong> the required variables for90 # "atm", "ice", "land", and "ocn" data types91 # with "da", "mo", "yr", and "fixed" frequency:92 #93 %HoHoLvars = (94 atm => {95 da => [ qw(hus pr psl ta tas tasmax tasmin ua va) ],96 mo => [ qw(hfls pr psl tas tauu tauv ua uas va vas zg) ],97 yr => [ qw(cdd etr fd gsl hwdi r10 r5d r95t sdii tn90) ],98 },99 ice => {100 mo => [],101 },102 land => {103 fixed => [ qw(sftgif sftlf) ],104 mo => [],105 },106 ocn => {107 fixed => [],108 mo => [],109 },110 );111112113 # Using another hash <strong>of</strong> hashes <strong>of</strong> lists structure,114 # define lists <strong>of</strong> the required experiment-identifiers115 # for "atm", "ice", "land", and "ocn" data types116 # with "da", "mo", "yr", and "fixed" frequency:117 #118 %HoHoLexpts = (119 atm => {120 da => [ qw(20c3m commit pdcntrl picntrl121 sresa1b sresa2 sresb1) ],122 mo => [ qw(1pct<strong>to</strong>2x 1pct<strong>to</strong>4x 20c3m commit pdcntrl picntrl123 sresa1b sresa2 sresb1) ],124 yr => [ qw(1pct<strong>to</strong>2x 20c3m commit pdcntrl picntrl125 sresa1b sresa2 sresb1) ],126 },127 ice => {128 mo => [],129 },130 land => {131 fixed => [ qw(1pct<strong>to</strong>2x 1pct<strong>to</strong>4x 20c3m commit pdcntrl picntrl132 sresa1b sresa2 sresb1) ],133 mo => [],134 },135 ocn => {136 fixed => [],137 mo => [],138 },139 );140141 # Initialise some variables:142 #143 $reqd = 0;144 $data = ’’;145 $expt = ’’;146 $freq = ’’;147 $other = ’’;148 $run = ’’;149 $type = ’’;150 $var = ’’;151152 # Read the first "ls -R" list file <strong>to</strong> get direc<strong>to</strong>ries153 # and filenames for comparison:154 #155 open FILE, "< $file"156 or die "Opening: $file: $!\n";157158 while() {159 chomp;160161 # Remove leading and trailing blank space:162 #163 s/^\s+//;164 s/\s+$//;165166 # 
Remove any leading "/" or "./" characters:167 #168 s%^/%%;169 s%^\./%%;170171 # Ignore any comment lines:172 #173 s/^#.*//;174175 # Ignore all (now) blank entries:176 #177 next unless length;178179 $line = $_;180181 # An example <strong>of</strong> the type <strong>of</strong> entry we are interested in:182 #183 # data1/pdcntrl/atm/da/hus/ipsl_cm4/run1:184 # hus_A2_2390-2399.nc185 # hus_A2_2400-2409.nc186 #187 # The direc<strong>to</strong>ry components are:188 #189 # //////:190 #191192 # Check for required direc<strong>to</strong>ry entries:193 #194 if ($line =~ /:$/) {195 $reqd = 0;196197 # Remove the trailing colon (’:’) character:198 #199 ($dirPath = $line) =~ s/:$//;200201 ($data, $expt, $type, $freq, $var, $other, $run) =202 split(/\//, $dirPath);203204 # Some lists may have partial direc<strong>to</strong>ry paths,205 # so we just try <strong>to</strong> ignore those:206 #207 next unless defined $run;208209 if ($Debug > 0) {210 print STDERR "${Prog}: expt: $expt\n";211 print STDERR "${Prog}: type: $type\n";212 print STDERR "${Prog}: freq: $freq\n";213 print STDERR "${Prog}: var: $var\n";214 }215216 if (exists($HoHoLexpts{$type}{$freq})) {217 REQMNT:218 foreach $e (@{$HoHoLexpts{$type}{$freq}}) {219 if ($e eq $expt) {220 if (exists($HoHoLvars{$type}{$freq})) {221 foreach $v (@{$HoHoLvars{$type}{$freq}}) {222 if ($v eq $var) {223 $reqd = 1;224 last REQMNT;225 }226 }227 }228 }229 }230 }231232 if ($reqd > 0) {233 print $line . "\n";234 }235 }236237 # Emit the netCDF file entries:238 #239 # A typical set <strong>of</strong> entries:240 # +-----------------------241 # |hus_A2_2390-2399.nc242 # |hus_A2_2400-2409.nc243 # +-----------------------244 #245 if ($line =~ /\.nc$/) {246 $<strong>to</strong>tNcFiles++;247 next if $reqd < 1;248 print $line . 
"\n";249 $numNcFiles++;250 }251 }252 close FILE;253254 # Add a blank last line on stdout:255 #256 print "\n";257258 print STDERR "${Prog}: Total number <strong>of</strong> new netCDF files: $<strong>to</strong>tNcFiles\n";259 print STDERR "${Prog}: Number <strong>of</strong> netCDF files required.: $numNcFiles\n";260 print STDERR "\n";261262 exit 0;263264265 __END__266267 =head1 NAME268269 extract-req-list.pl - Extract required files <strong>from</strong> an ’ls -R’ list270271272 =head1 SYNOPSIS273274 extract-req-list.pl [options] ls-R-file-list275276 Options:277278 --Man (or ’-M’)279 --help (or ’-h’)280281282 =head1 OPTIONS283284 =over 4285286 =item --Man287288 Display the on-line manual page and exit.289290 =item --help291292 Display the help message and exit.293294 =back295296297 =head1 DESCRIPTION298299 Reads an ’ls -R’ file list <strong>to</strong> extract direc<strong>to</strong>ry paths and filenames300 <strong>of</strong> the required files for <strong>BMRC</strong>, i.e., ’atm’, and ’land’ for various301 variables and scenarios,302 and these (if any) are reported on the standard output.303304 The argument given <strong>to</strong> the program must correspond <strong>to</strong> the filename305 <strong>of</strong> an ’ls -R’ listing.306307308 =head1 AUTHOR309310 Lawson Hanson311312313 =head1 BUGS314315 None known.316317 =cut31815


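The directory paths that ‘extract-req-list.pl’ filters follow a fixed component order:
data set, experiment, type, frequency, variable, model, and run. As a quick illustration
(not part of the original scripts; the example path is taken from the script's own
comments), the same split can be done directly in the shell:

```shell
#!/bin/sh
# Split a PCMDI directory path into the components that
# extract-req-list.pl assigns to $data, $expt, $type, $freq,
# $var, $other (the model name) and $run.
path="data1/pdcntrl/atm/da/hus/ipsl_cm4/run1"
oldIFS=$IFS
IFS=/
set -- $path          # field-split the path on "/"
IFS=$oldIFS
parts="$*"            # the seven components, space separated
echo "data=$1 expt=$2 type=$3 freq=$4 var=$5 model=$6 run=$7"
```

This matches the check the Perl script performs: it looks up the experiment and variable
in its %HoHoLexpts and %HoHoLvars tables, keyed by type and frequency, before marking a
directory as required.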
B.3   Program: ‘list-size.pl’

The ‘list-size.pl’ script reads a list of files (in either ‘ls -R’ or ‘ls -lR’ format),
and calculates the total size of the listed files. The script has a special ‘-l’ option
which enables it to output a file containing the list of files with any symbolic links
resolved. The format of that list is lines which contain both the directory and the
netCDF file name, which makes it easy to compare two such lists which originate from
different source lists. To use the list produced with the ‘-l’ option with the
‘afm-ftp.tcl’ script, the list will need to be pre-processed by the ‘path-to-dir-list’
script.

    eval 'exec perl -w -S $0 ${1+"$@"}'
        if 0;

    # ! /usr/bin/env perl
    #
    # Program:
    #       list-size.pl
    #
    # RCS-Strings:
    #       $Source: /bm/gkeep/lih/src/perl/RCS/list-size.pl,v $
    #       $Revision: 1.6 $
    #
    # Author:
    #       Lawson Hanson, 20050113.
    #
    # Purpose:
    #       Reads the "listalldirs.txt" file to get directory path and filenames
    #       and also file sizes for comparison.
Next reads the specified (i.e.,19 # on the command-line) ’ls -R’ or ’ls -lR’ list file and compares its20 # file sizes with those in the original ("listalldirs.txt") list.21 # Reports <strong>to</strong>tals, and any differences or unknown file path names.22 #23 # Updates | By | Description24 # --------+----+------------25 # 20050125| LH | Added code <strong>to</strong> handle either ’ls -R’, or ’ls -lR’ lists.26 # 20050303| LH | Added code <strong>to</strong> ignore "*_OZ.nc" files in the lists.27 # 20050506| LH | Added code <strong>to</strong> ignore ".nc.bad" files in the lists.28 # 20050509| LH | Added code <strong>to</strong> resolve the PCMDI (double) symbolic links.29 # 20050510| LH | Added the ’-l’ option <strong>to</strong> emit the symLink resolved file list.30 # 20050913| LH | Added the ’-s size’ option <strong>to</strong> control the maximum list size.31 #3233 use File::Basename;34 use Ge<strong>to</strong>pt::Long;35 use Pod::Usage;3637 Ge<strong>to</strong>pt::Long::Configure("bundling", "pass_through", "no_ignore_case");3839 my $Debug = 0;40 my $Man = 0;41 my $help = 0;42 my $listOut = 0;43 $maxSize = 9999;44 my $verbose = 0;4546 GetOptions(47 "D|Debug" => \$Debug,48 "M|Man" => \$Man,49 "h|help" => \$help,50 "l|list" => \$listOut,51 "s|size=i" => \$maxSize,52 "v|verbose" => \$verbose53 ) or pod2usage(2);5455 pod2usage(1) if $help;56 pod2usage(-exitstatus => 0, -verbose => 2) if $Man;5758 # $MyRunPath = dirname($0);5960 $Prog = basename($0, ".pl");6162 $oneMegaByte = 1024 * 1024;63 $oneGigaByte = 1024 * 1024 * 1024;6465 $maxSize = $maxSize * $oneGigaByte;6667 $numAllNcFiles = 0;68 $numBadNcFiles = 0;69 $numNcFiles = 0;70 $<strong>to</strong>tSize = 0;7172 @realPaths = ();73 @linkInserts = ();7475 print "\n";7677 # Read the "listalldirs.txt" file <strong>to</strong> get direc<strong>to</strong>ry78 # and filenames and file sizes for comparison:79 #80 $allFiles = "listalldirs.txt";8182 if ($verbose > 0) {83 print "${Prog}: Reference list 
<strong>of</strong> netCDF files: $allFiles\n";84 }8586 open ALLFILES, "< $allFiles"87 or die "Opening: $allFiles: $!\n";8889 while() {90 chomp;9192 # Remove comments (’# ...’), any leading "./" prefix,93 # leading and trailing blank space, and "^<strong>to</strong>tal ..." lines:94 #95 s/#.*//;96 s%^\./%%;97 s/^\s+//;98 s/\s+$//;99 s/^<strong>to</strong>tal.*//;100101 # Ignore all (now) blank entries:102 #103 next unless length;104105 $line = $_;106107 # S<strong>to</strong>re direc<strong>to</strong>ry entries:108 #109 if ($line =~ /:$/) {110 # Remove the trailing colon (’:’) character:111 ($dir = $line) =~ s/:$//;112 }113114 # Extract symbolic links:115 #116 # 1pct<strong>to</strong>2x:117 # <strong>to</strong>tal 0118 # lrwxrwxrwx 1 root 10 21 Oct 31 2004 atm -> ../data8/1pct<strong>to</strong>2x/atm119 # lrwxrwxrwx 1 root 10 21 Oct 31 2004 ice -> ../data8/1pct<strong>to</strong>2x/ice120 # lrwxrwxrwx 1 root 10 22 Oct 31 2004 land -> ../data8/1pct<strong>to</strong>2x/land121 # lrwxrwxrwx 1 root 10 22 Oct 31 2004 ocn -> ../data15/1pct<strong>to</strong>2x/ocn122 # 0 1 2 3 4 5 6 7 8 9 10 (@items)123 #124 # Another example:125 # ---------------126 # data1/picntrl/atm:127 # lrwxrwxrwx 1 root 201 29 Mar 4 09:35 da -> \128 # ../../../data9/picntrl/atm/da129 # And:130 # ---131 # data4/sresa1b/atm:132 # lrwxrwxrwx 1 root 201 31 Mar 16 10:47 mo -> \133 # ../../../data16/sresa1b/atm/mo/134 #135 # Need <strong>to</strong> remove the parent direc<strong>to</strong>ry references (i.e., "../../../"):136 #137 if ($line =~ /^lrwxrwxrwx/) {138 @items = split / +/, $line;139 $realPath = $dir . ’/’ . 
$items[8];140 push @realPaths, $realPath;141 ($linkPath = $items[10]) =~ s%(\.\./)+%%;142 @parts = split /\//, $linkPath;143 push @linkInserts, $parts[0];144145 if ($verbose > 0) {146 print STDERR "pathLink: $realPath -> $parts[0]\n";147 }148 }149 # Example Result:150 # --------------151 # $realPaths[2] = "1pct<strong>to</strong>2x/land";152 # $linkInserts[2] = "data8";153154 # S<strong>to</strong>re netCDF file entries:155 #156 # A typical set <strong>of</strong> entries:157 # +-----------------------158 # |data1/pdcntrl/atm/3h/hfls/mri_cgcm2_3_2a/run1:159 # |<strong>to</strong>tal 186944160 # |-rw-r--r-- 1 8441 201 95713360 Nov 29 14:49 hfls_A3.1950.nc161 # +-----------------------162 # |data8/1pct<strong>to</strong>2x/land/mo/snd/miroc3_2_medres/run3:163 # |<strong>to</strong>tal 107632164 # |-rw-r----- 1 8441 201 27552904 Nov 22 2004 snd_A1.nc.bad165 # +-----------------------166 # 0 1 2 3 4 5 6 7 8 (@items)167 #168 if ($line =~ /\.nc$/) {169 @items = split / +/, $line;170 if (scalar(@items) != 9) {171 print STDERR "${Prog}: ERROR: wrong number <strong>of</strong> data fields\n";172 print STDERR " $line\n";173 print "\n";174 }175 else {176 $numAllNcFiles++;177 $file = $items[8];178 $size = $items[4];179 $path = $dir . ’/’ . $file;180 $hash{$path} = $size;181 }182 }183 elsif ($line =~ /\.nc.bad$/) {184 @items = split / +/, $line;185 if (scalar(@items) != 9) {186 print STDERR "${Prog}: ERROR: wrong number <strong>of</strong> data fields\n";187 print STDERR " $line\n";188 print "\n";189 }190 else {191 $numBadNcFiles++;192 $file = $items[8];193 $size = $items[4];194 $path = $dir . ’/’ . $file;195 $hash{$path} = $size;196 }197 }198 }199 close ALLFILES;200201 if ($verbose > 0) {202 print "${Prog}: Reference list netCDF file count: $numAllNcFiles\n";16


203 print "${Prog}: Reference list netCDF bad files.: $numBadNcFiles\n";204 }205206 $origs = 0;207 $lists = 0;208 $guess = 0;209 $ozFiles = 0;210 $badFiles = 0;211 $links = 0;212 $badLinks = 0;213214 $prefix = ’’;215216 $listOutput = "file-list.txt";217218 if ($listOut > 0) {219 open LISTOUT, "> $listOutput"220 or die "Opening: $listOutput: $!\n";221 }222223 $numRealPaths = scalar(@realPaths);224225 # Read the specified (on the command-line) file226 # and compare file sizes with the original data:227 #228 while() {229 chomp;230231 # Remove comments (’# ...’), any leading "./" prefix,232 # leading and trailing blank space, and "^<strong>to</strong>tal ..." lines:233 #234 s/#.*//;235 s%^\./%%;236 s/^\s+//;237 s/\s+$//;238 s/^<strong>to</strong>tal.*//;239240 # Remove the leading "/export/project/bmrc/" text241 # so that hash keys are identical:242 #243 s/^.export.project.bmrc.//;244 s%^/%%;245246 # Ignore all (now) blank entries:247 #248 next unless length;249250 $line = $_;251252 # S<strong>to</strong>re direc<strong>to</strong>ry entries:253 #254 if ($line =~ /:$/) {255 # Remove the trailing colon (’:’) character:256 ($dir = $line) =~ s/:$//;257 }258259 # Handle netCDF file entries:260 #261 # A typical set <strong>of</strong> entries:262 # +-----------------------263 # |/export/project/bmrc/data1/pdcntrl/atm/mo/clt/mri_cgcm2_3_2a/run1:264 # +-----------------------265 # But by now, that is:266 # +-----------------------267 # |data1/pdcntrl/atm/mo/clt/mri_cgcm2_3_2a/run1:268 # |-rw-r--r-- 1 afm cgdccr 39357936 Dec 22 15:22 clt_A1.1801-1900.nc269 # |-rw-r--r-- 1 afm cgdccr 19682736 Dec 22 15:23 clt_A1.1901-1950.nc270 # +-----------------------271 #272 if ($line =~ /\.nc$/) {273 @items = split / +/, $line;274 $numItems = scalar(@items);275 $numNcFiles++;276277 # If its an ’ls -lR’ list, and includes the file size information,278 # then use it:279 #280 if ($numItems == 9) {281 $file = $items[8];282 $size = $items[4];283 $lists++;284 }285 # Otherwise try 
<strong>to</strong> extract it <strong>from</strong> the reference list:286 #287 elsif ($numItems == 1) {288 $file = $line;289 $path = $dir . ’/’ . $file;290 $badPath = $path . ’.bad’;291292 if (exists($hash{$path})) {293 $size = $hash{$path};294 $origs++;295296 if ($listOut > 0) {297 print LISTOUT $prefix . $path . "\n";298 }299 }300 elsif (exists($hash{$badPath})) {301 if ($listOut > 0) {302 print LISTOUT $prefix . $path . "\n";303 }304 $badFiles++;305 }306 elsif ($line =~ /_OZ\.nc$/) {307 #308 # Ignore the "*_OZ.nc" files, which were probably309 # generated by running the "cut-<strong>to</strong>-oz" utility:310 #311 $ozFiles++;312 }313 else {314 # The required file could be s<strong>to</strong>red in a re-linked location,315 # or it could even be re-linked AND renamed with ".nc.bad":316 #317 $symPath = "";318319 SYMLOOP:320 for ($i = 0; $i < $numRealPaths; $i++) {321 $link = $realPaths[$i];322 # Do not handle "data" sym-links until last:323 next SYMLOOP if ($link =~ /^data/);324 $linkDir = $linkInserts[$i];325 #326 # e.g.: $link = "1pct<strong>to</strong>2x/land";327 # and: $linkDir = "data8";328329 # Resolve symbolic links if there is any match:330 #331 if ($path =~ m/${link}/) {332 ($symPath = $path)333 =~ s%([^/]*/)?${link}%${linkDir}/${link}%;334335 # Loop around through the symbolic links again,336 # because some <strong>of</strong> them appear <strong>to</strong> be doubly linked:337 #338 DATASYMS:339 for ($j = 0; $j < $numRealPaths; $j++) {340 $link = $realPaths[$j];341 # Handle only "data" sym-links now:342 next DATASYMS if ($link !~ /^data/);343 $linkDir = $linkInserts[$j];344 #345 # e.g.: $link = "1pct<strong>to</strong>2x/land";346 # and: $linkDir = "data8";347348 # Resolve symbolic links if there is any match:349 #350 if ($symPath =~ m/${link}/) {351 $symPathTwo = $symPath;352 ($symPath = $symPathTwo)353 =~ s%data[0-9]+%${linkDir}%;354 }355 }356 }357 }358 $badSymPath = $symPath . 
’.bad’;359360 if (exists($hash{$symPath})) {361 $size = $hash{$symPath};362 $links++;363364 if ($listOut > 0) {365 print LISTOUT $prefix . $symPath . "\n";366 }367 }368 elsif (exists($hash{$badSymPath})) {369 if ($listOut > 0) {370 print LISTOUT $prefix . $symPath . "\n";371 }372 $badLinks++;373 }374 else {375 print "${Prog}: ERROR: unknown file ?\n";376 print " Path: $path\n";377 print " symPath: $symPath\n";378 print "\n";379 $size = 100* 1024 * 1024;380 $guess++;381382 if ($listOut > 0) {383 if (length($symPath) > 0) {384 print LISTOUT $prefix . $symPath . "\n";385 }386 else {387 print LISTOUT $prefix . $path . "\n";388 }389 }390 }391 }392 }393 else {394 print STDERR "${Prog}: ERROR: unknown data field format\n";395 print STDERR " $line\n";396 print "numItems: $numItems\n";397 print "\n";398 }399 $<strong>to</strong>tSize = $<strong>to</strong>tSize + $size;400401 if ($<strong>to</strong>tSize > $maxSize) {402 $prefix = ’# ’;403 }404 }405 }406407 if ($<strong>to</strong>tSize > $oneGigaByte) {408 $myUnit = ’GB’;409 $myQty = int(0.5 + ($<strong>to</strong>tSize * 1.0 / $oneGigaByte));410 }411 else {412 $myUnit = ’MB’;413 $myQty = int(0.5 + ($<strong>to</strong>tSize * 1.0 / $oneMegaByte));414 }415416 print "${Prog}: List number <strong>of</strong> netCDF files: $numNcFiles\n";417 print "${Prog}: List <strong>to</strong>tal netCDF file size:";418 printf " %12s => (%3d %s)\n", $<strong>to</strong>tSize, $myQty, $myUnit;419 print "${Prog}: File sizes were obtained by: ";420 $some = 0;421422 if ($origs > 0) {423 print "Reference: $origs";424 $some++;425 }426 if ($links > 0) {427 print "SymLinks: $links";428 $some++;429 }430 if ($lists > 0) {431 if ($some > 0) {432 print ", ";433 }434 print "List: $lists";435 $some++;436 }437 if ($guess > 0) {438 if ($some > 0) {439 print ", ";440 }17


441 print "Guess: $guess";442 }443 print "\n";444445 if ($links > 0) {446 print "${Prog}: Symbolic Links: file count: $links\n";447 }448449 if ($ozFiles > 0) {450 print "${Prog}: Ignore ’*_OZ.nc’ file count: $ozFiles\n";451 }452453 if ($badFiles > 0) {454 print "${Prog}: Ignore ’.nc.bad’ file count: $badFiles\n";455 }456457 if ($badLinks > 0) {458 print "${Prog}: Ignore ’.nc.bad’ link count: $badLinks\n";459 }460 print "\n";461462 if ($listOut > 0) {463 close LISTOUT;464 print "${Prog}: The file ’" . $listOutput . "’ now contains a list\n";465 print " <strong>of</strong> the required files with resolved symbolic links.\n";466 print " To use this list with ’afm-ftp.tcl’ it will need <strong>to</strong> be\n";467 print " pre-processed by the ’path-<strong>to</strong>-dir-list’ script.\n";468 print "\n";469 }470471 exit 0;472473474 __END__475476 =head1 NAME477478 list-size.pl - Perl script <strong>to</strong> <strong>to</strong>tal up data file transfer sizes.479480481 =head1 SYNOPSIS482483 list-size.pl [options] ls-[l]R-file-list484485 Options:486487 --Man (or ’-M’)488 --help (or ’-h’)489 --list (or ’-l’)490 --size (or ’-s’)491 --verbose (or ’-v’)492493494 =head1 OPTIONS495496 =over 4497498 =item --Man499500 Display the on-line manual page and exit.501502 =item --help503504 Display the help message and exit.505506 =item --list507508 Produces a file called "file-list.txt" containing the paths509 <strong>to</strong> each <strong>of</strong> the files in the input with resolved symbolic links.510 To use this list with ’afm-ftp.tcl’ it will need <strong>to</strong> be511 pre-processed by the ’path-<strong>to</strong>-dir-list’ script.512513 =item --size514515 Controls the maximum size (<strong>of</strong> the collection <strong>of</strong> NetCDF files516 which make up the ’active’ entries in the "file-list.txt" file.517518 =item --verbose519520 Displays the filename <strong>of</strong> the reference netCDF file list,521 and the number <strong>of</strong> netCDF files in 
that list.522523 =back524525526 =head1 DESCRIPTION527528 Reads the "listalldirs.txt" file <strong>to</strong> get direc<strong>to</strong>ry path and filenames529 and also file sizes. Next reads the specified (i.e., on the command-line)530 file and <strong>to</strong>tals up the file sizes either <strong>from</strong> the original ("listalldirs")531 entries, or <strong>from</strong> the actual "ls -lR" entries if possible.532 Reports the <strong>to</strong>tal size <strong>of</strong> all <strong>of</strong> the named files.533534 The argument given <strong>to</strong> the program must correspond <strong>to</strong> the filename535 <strong>of</strong> an "ls -lR" (or "ls -R") listing <strong>of</strong> the target <strong>to</strong>p level direc<strong>to</strong>ry,536 such as output <strong>from</strong> the command:537538 ls -lR /export/project/bmrc539540 In some cases the "listalldirs.txt" file contains symbolic link541 entries which indicate that some file sets have been relocated542 on the PCMDI server. 
This script now attempts <strong>to</strong> resolve these543 symbolic links and reports <strong>of</strong> files which remain "unknown", etc.544545546 =head1 FILES547548 listalldirs.txt549 List <strong>of</strong> all direc<strong>to</strong>ry and netCDF file names for data transfer550 <strong>from</strong> the source data filesystem.551552 file-list.txt553 Output file produced when the user specifies the ’-l’ option.554555556 =head1 AUTHOR557558 Lawson Hanson559560561 =head1 BUGS562563 None known.564565 =cut566B.4 Program: ‘path-<strong>to</strong>-dir-list’The ‘path-<strong>to</strong>-dir-list’ script reads a specified input-file (i.e., output <strong>from</strong> ‘list-size.pl’ when runwith its ‘-l’ option) and converts the netCDF file paths it finds there in<strong>to</strong> separate ‘direc<strong>to</strong>ry:’ and‘filename.nc’ lines, as required by ‘afm-ftp.tcl’, and generates separate files named ‘?-dataN .txt’for each data set where ‘?’ is the file prefix character (which is ‘r’ by default):1 #!/bin/sh2 #3 # Program:4 # path-<strong>to</strong>-dir-list5 #6 # RCS-Strings:7 # $Source: /bm/gkeep/lih/src/sh/RCS/path-<strong>to</strong>-dir-list,v $8 # $Revision: 1.6 $9 #10 # Author:11 # Lawson Hanson, 20050510.12 #13 # Purpose:14 # Reads a specified input-file (i.e., output <strong>from</strong> "list-size.pl -l")15 # and converts the netCDF file paths it finds there in<strong>to</strong> separate16 # "direc<strong>to</strong>ry:" and "filename.nc" lines, as required by "afm-ftp.tcl",17 # and generates separate files named "?-data{N}.txt" for each data set18 # where ’?’ is the file prefix character (’r’ by default).19 #20 # Updates | By | Description21 # --------+----+------------22 # 20050510| LH | Added code <strong>to</strong> emit the information <strong>to</strong> separate "r-data" files.23 # 20050511| LH | Added code <strong>to</strong> ignore comment lines (’#’) and blank data lines.24 # 20050511| LH | Added code <strong>to</strong> enable 
variable file prefix character.25 #26 Prog=‘basename $0 .sh‘2728 fnUsage ( )29 {30 echo31 echo "Usage: ${Prog} [-h] [-p c] [-w output-file ] input-file"32 echo "Where: -h = Display this help/usage message and exit"33 echo " -p c = Specifies the file prefix character (default=’r’)"34 echo " -w file = Specifies an output filename for the whole list"35 echo36 echo "Reads the named input-file (i.e., output <strong>from</strong> ’list-size.pl -l’)"37 echo "and converts the netCDF file paths it finds there in<strong>to</strong> separate"38 echo "’direc<strong>to</strong>ry:’ and ’filename.nc’ lines, as required by ’afm-ftp.tcl’"39 echo "and generates separate files named "?-data{N}.txt" for each data set"40 echo "where ’?’ is the file prefix character (’r’ by default)."41 echo42 }4344 pfx="r"45 wFile=""46 wOut=047 noArgs=14849 while [ \( $# -gt 0 \) -a \( $noArgs -gt 0 \) ]50 do51 case $1 in52 -h)53 fnUsage54 exit 055 ;;56 -p)57 pfx="$2"58 shift 259 ;;60 -w)61 wFile="$2"62 shift 263 ;;64 *)65 noArgs=066 ;;67 esac68 done6970 if [ $# -gt 0 ]71 then72 inFile="$1"73 else74 inFile="file-list.txt"18


75 fi7677 if [ ! -f "${inFile}" ]78 then79 echo80 echo "${Prog}: Error: ${inFile}: File not found"81 echo82 exit 283 fi8485 echo "${Prog}: Input file is: ${inFile}"8687 if [ -n "${wFile}" ]88 then89 wOut=190 > $wFile91 fi9293 lastDir=""94 data=""95 dataFile="${pfx}-data0.txt"9697 cat $inFile | while read line98 do99 comment=‘expr "$line" : ’[ ]*#’‘100101 if [ $comment -gt 0 ]102 then103 continue104 fi105 length=‘expr "$line" : ’.*’‘106107 if [ $length -lt 5 ]108 then109 continue110 fi111 dir=‘dirname $line‘112 file=‘basename $line‘113114 if [ "${dir}" != "${lastDir}" ]115 then116 if [ $wOut -gt 0 ]117 then118 echo "${dir}:" >> $wFile119 fi120 data=‘echo ${dir} | awk ’{121 split($0,ary,"/")122 print ary[1]123 }’ -‘124 dataFile="${pfx}-${data}.txt"125126 if [ ! -f "${dataFile}" ]127 then128 echo "${Prog}: Info: Creating: ${dataFile}" 1>&2129 > $dataFile130 fi131 echo "${dir}:" >> $dataFile132 fi133 if [ $wOut -gt 0 ]134 then135 echo "${file}" >> $wFile136 fi137 echo "${file}" >> $dataFile138 lastDir="${dir}"139 done140141 exit 0142B.5 Program: ‘df-part.sh’The ‘df-part.sh’ script partitions a direc<strong>to</strong>ry/data file list in<strong>to</strong> more manageable chunks. 
Finite resourcessuch as the amount <strong>of</strong> space available for intermediate file s<strong>to</strong>rage may require huge lists <strong>to</strong> behandled in smaller segments.1 #!/bin/sh2 #3 # Program:4 # df-part.sh5 #6 # RCS-Strings:7 # $Source: /bm/gkeep/lih/src/sh/RCS/df-part.sh,v $8 # $Revision: 1.2 $9 #10 # Author:11 # Lawson Hanson, 20041223.12 #13 # Purpose:14 # Partition a data direc<strong>to</strong>ry/file list in<strong>to</strong> manageable chunks.15 # It seems that the FTP process can drop out sometimes,16 # so this is an attempt <strong>to</strong> provide data sets with which17 # <strong>to</strong> run the "afm-ftp.tcl" script in smaller sections.18 #19 # Updates | By | Description20 # --------+----+------------21 # 20050208| LH | Added code <strong>to</strong> handle files with no blank lines.22 # 20050208| LH | Added "-s part" option <strong>to</strong> enable specified part files.23 #24 Prog=‘basename $0 .sh‘2526 usage ( )27 {28 echo29 echo "Usage: ${Prog} [-h] [-s part] data-file"30 echo "Where: -h = Display this help/usage message and exit."31 echo " : -s part = Start with the specified part number."32 echo33 echo " e.g.: ${Prog} data1.txt"34 echo35 echo " Note: This will partition ’data1.txt’ in<strong>to</strong> smaller files"36 echo " named ’data1-p1.txt’, ’data1-p2.txt’, etc."37 echo38 }3940 part=14142 case $1 in43 -h)44 usage45 exit 046 ;;47 -s)48 part=$249 shift 250 ;;51 *)52 ;;53 esac5455 if [ $# -ne 1 ]56 then57 usage58 exit 159 fi6061 dataFile="$1"6263 if [ ! -f ${dataFile} ]64 then65 echo66 echo "${Prog}: ERROR: ${dataFile}: file does not exist."67 echo68 exit69 fi7071 pfx=‘basename $dataFile .txt‘7273 lino=07475 cat $dataFile | while read line76 do77 if [ $lino -gt 100 ]78 then79 gotDir=‘echo $line | grep ’:$’‘80 sts=$?8182 if [ $sts -eq 0 ]83 then84 echo "${Prog}: Lines: $lino"85 part=‘expr $part + 1‘86 lino=087 fi88 fi8990 partFile=${pfx}-p${part}.txt9192 if [ ! 
-f ${partFile} ]93 then94 echo "${Prog}: New part file: ${partFile}"95 > $partFile96 fi9798 echo $line >> $partFile99100 lino=‘expr $lino + 1‘101102 if [ $lino -gt 100 ]103 then104 if [ "x${line}" = "x" ]105 then106 # Found blank line, so start next part file:107 #108 echo "${Prog}: Lines: $lino"109 part=‘expr $part + 1‘110 lino=0111 fi112 fi113 done114115 exit 011619
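The partitioning rule in ‘df-part.sh’ (start a new part once the current part has grown
past the line threshold, but only at a ‘directory:’ boundary, so a directory stays with
its files) can be sketched in a few lines of awk. The sample list and the threshold of 3
lines here are hypothetical; the real script writes part files and uses a threshold of
about 100 lines:

```shell
#!/bin/sh
# Assign each line of a "directory:" / "filename.nc" list to a part,
# breaking to a new part only at a directory line once the current
# part exceeds 'max' lines (toy data; df-part.sh uses ~100).
result=$(
{
    echo "data1/a/run1:"; echo "f1.nc"; echo "f2.nc"; echo "f3.nc"
    echo "data1/b/run1:"; echo "f4.nc"
} | awk -v max=3 '
    BEGIN { part = 1 }
    /:$/ && n > max { part++; n = 0 }   # break only at a directory line
    { printf "p%d %s\n", part, $0; n++ }
')
echo "$result"
```

Because the break is deferred until the next directory line, a part may run somewhat over
the threshold, which is exactly the behaviour of the original script.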


B.6   Program: ‘afm-ftp.tcl’

The ‘afm-ftp.tcl’ script, although similar to the version described previously, is
included here because it contains changes that were necessary to enable it to run with
the different version of Tcl/Tk that we have at BMRC (compared to what is in use at
NCAR). The script reads a specially formatted (‘ls -R’ style) list of directory paths
and file names, then connects via FTP to the Linux PC which (in this case) has the TB
disk attached to it, and executes an FTP ‘get’ operation to copy each file over to the
current host:

    # # #!/bin/sh
    #
    # Program:
    #       afm-ftp.tcl
    #
    # Author:
    #       Lawson Hanson, 20041220.
    #
    # Purpose:
    #       Helps Dr. Aurel Moise to download and save data files using ftp
    #       to connect from NCAR to PCMDI and get files back to NCAR.
    #
    # Input:
    #       A file which was constructed from an edited "ls -R" output.
    #       Basically, the file needs to supply a remote directory path
    #       and a list of files required to be downloaded from that path.
    #
    # Example Input:
    #       --+----------
    #       |data1/some/where/blah:
    #       |file1.nc
    #       |file2.nc
    #       |file3.nc
    #       |data1/some/other/blah:
    #       |file1.nc
    #       |file2.nc
    #       +----------
    #
    # Action:
    #       Using the supplied data path, the program should:
    #
    #       1. Construct the associated directories on the local machine,
    #       2. Open an FTP session to the remote host machine
    #       3.
Get the specified list <strong>of</strong> files <strong>from</strong> the remote host (PCMDI)35 # back <strong>to</strong> the local host (NCAR) and s<strong>to</strong>re them in the named36 # location (as per the supplied data path)37 # 4. Close the FTP connection.38 #39 # Assumptions:40 # 1. The input text file will reside in the current direc<strong>to</strong>ry41 # 2. The retrieved files will be located below a "tmp" direc<strong>to</strong>ry42 # in the current direc<strong>to</strong>ry <strong>of</strong> the local host machine (NCAR)43 #44 # Updates | By | Description45 # --------+----+------------46 # 20050111| LH | Modified "pGet<strong>Data</strong>Files" <strong>to</strong> open/close FTP for each file,47 # | | because the FTP connection kept having time out errors.48 # 20050120| LH | Modified <strong>to</strong> read configuration data <strong>from</strong> a file.49 # 20050124| LH | Added the $openPortal variable <strong>to</strong> track ftp::Open50 # 20050125| LH | Modified <strong>to</strong> respond <strong>to</strong> request <strong>to</strong> cease operation.51 #52 # NOTE:53 # Do NOT remove the line with the ’#’ followed by ’\’,54 # nor separate it <strong>from</strong> the following ’exec wish ...’ line.55 # They are interpreted by the Tcl/Tk interpreter, "wish",56 # as a comment which continues on the next line, whereas57 # the shell simply executes the ’exec wish "$0" "$@" line.58 #\59 # exec tclsh "$0" "$@"6061 # The name <strong>of</strong> a Tcl/Tk script is held in the special variable "argv0"62 #63 set Prog [exec basename $argv0 .tcl]6465 # package require Tcl 8.566 # package require ftp676869 # Our version <strong>of</strong> Tcl/Tk is older than the required one,70 # hence, I downloaded the older "FTP" library code,71 # and have used "source" instead ... and needed <strong>to</strong>72 # change lowercase "ftp::..." 
<strong>to</strong> uppercase "FTP::...",73 # and remove most references <strong>to</strong> "$conn" in my code:74 #75 source /bm/gkeep/lih/src/tcl/ftp_lib1.2/ftp_lib.tcl767778 # Read in the configuration file:79 #80 set ConfigFile $env(HOME)/${Prog}.config8182 if [file exists $ConfigFile] {83 source $env(HOME)/${Prog}.config84 } else {85 puts stderr "${Prog}: ERROR: Configuration file not found"86 exit 487 }888990 # Set direc<strong>to</strong>ry path and make if necessary:91 #92 set MyTmpDir [file nativename [file join $LocTopDir "tmp"]]9394 if {[file exists $MyTmpDir] == 0} {95 file mkdir $MyTmpDir96 }979899 # p R e a d D i r L i s t100 #101 # Purpose:102 # Read a list <strong>of</strong> direc<strong>to</strong>ry paths and associated files103 # <strong>from</strong> a plain text file with entries such as:104 #105 # Example Input:106 # --+----------107 # |data1/some/where/blah:108 # |file1.nc109 # |file2.nc110 # |file3.nc111 # |112 # |data1/some/other/blah:113 # |file1.nc114 # |file2.nc115 # +----------116 #117 # Action:118 # Process each line:119 # 1. Ignore ("#") comments,120 # 2. Determine direc<strong>to</strong>ry path,121 # 3. Split and use "mkdir" if required122 # <strong>to</strong> make local host direc<strong>to</strong>ry path,123 # 4. 
S<strong>to</strong>re list <strong>of</strong> files names124 #125 proc pReadDirList { } {126 global Prog127 global LocTopDir128 global ListFile129 global DirPath130 global FileName131 global NumDirs132 global NumFiles133134 set NumDirs 0135 set NumFiles($NumDirs) 0136137 if [file exists $ListFile] {138 if [catch {open $ListFile r} fileId] {139 puts stderr "${Prog}: ERROR: Opening $ListFile: $fileId"140 } else {141 while {[gets $fileId line] >= 0} {142 #143 # Ignore ("#") comments:144 #145 set sts [regexp {^ *#} $line match]146147 if {$sts > 0} {148 continue149 }150 # Ignore ("^ *$") blank lines:151 #152 set sts [regexp {^ *$} $line match]153154 if {$sts > 0} {155 continue156 }157 # Determine direc<strong>to</strong>ry path:158 #159 set sts [regexp {run[1-9][0-9]*: *$} $line match]160161 if {$sts > 0} {162 set dirNum $NumDirs163 set NumFiles($dirNum) 0164 set DirPath($dirNum) [string trim $line { :}]165166 # Use "mkdir" <strong>to</strong> make local host direc<strong>to</strong>ry path:167 #168 set locDir [file nativename [file join \169 $LocTopDir $DirPath($dirNum)]]170171 if {[catch {exec mkdir -p $locDir} result]} {172 puts stderr "${Prog}: ERROR(mkdir): $result"173 exit174 }175 incr NumDirs176 continue177 }178179 # S<strong>to</strong>re list <strong>of</strong> netCDF (".nc") files names:180 #181 set sts [regexp {\.nc *$} $line match]182183 if {$sts > 0} {184 set FileName($dirNum,$NumFiles($dirNum)) [string trim $line]185 incr NumFiles($dirNum)186 continue187 }188 # Should only get here on unrecognized data list entries:189 #190 puts stderr "${Prog}: WARNING: Unrecognized <strong>Data</strong> List Entry:"191 puts stderr " Direc<strong>to</strong>ry: $DirPath($dirNum)"192 puts stderr " Bad Entry: $line"193 }194 close $fileId195196 if {$NumDirs < 1} {197 puts stderr "${Prog}: WARNING: (${ListFile}) appears empty"198 exit199 }200 }201 } else {202 puts stderr "${Prog}: WARNING: (${ListFile}) file does not exist"20


        exit
    }
}

# p G e t D a t a F i l e s
#
# 1. Open an FTP connection to a remote host
# 2. Change to some directory
# 3. Get the requested data files
# 4. Write the data out to local disk files
# 5. Close the FTP connection
#
proc pGetDataFiles {} {
    global Prog
    global ListFile
    global DirPath
    global FileName
    global NumDirs
    global NumFiles
    global MyTmpDir
    global LocTopDir
    global RemTopDir
    global FtpHost FtpUsrName FtpPasswd
    global GetData
    global EndOfData
    global Data
    global Log
    global env

    puts stdout "${Prog}: pGetDataFiles: Data Transfer"

    # TESTING --- TESTING --- TESTING:
    # ---------- from here ----------
    #
    ##  puts stdout "${Prog}: ListFile: $ListFile"
    ##  set dnum 0
    ##  while {$dnum < $NumDirs} {
    ##      puts stdout "${Prog}: DirPath: $DirPath($dnum)"
    ##      set fnum 0
    ##      while {$fnum < $NumFiles($dnum)} {
    ##          if [info exists FileName($dnum,$fnum)] {
    ##              puts stdout "${Prog}:\
    ##                      File($dnum,$fnum): $FileName($dnum,$fnum)"
    ##          }
    ##          incr fnum
    ##      }
    ##      incr dnum
    ##  }
    ##  puts stdout "${Prog}: ----------"
    ##  return
    #
    # -------- down to here ---------
    # TESTING --- TESTING --- TESTING.

    set numGetFailures 0
    set limGetFailures 5
    set totFsize 0
    set remFsize 0
    set numGBs 0
    set totGBs 0
    set oneGB [expr 1024 * 1024 * 1024]

    set openPortal 0

    # Change to the required directory:
    #
    set dnum 0
    while {$dnum < $NumDirs} {
        puts stdout "${Prog}: DirPath: $DirPath($dnum)"
        set remDir [file nativename [file join $RemTopDir $DirPath($dnum)]]

        # Get a sorted listing of the current directory:
        #
        set fnum 0
        while {$fnum < $NumFiles($dnum)} {

            # Check for the existence of a "cease.${Prog}" file,
            # and cease operation if necessary:
            #
            set ceaseFile $env(HOME)/cease.${Prog}

            if [file exists $ceaseFile] {
                puts "${Prog}: ${ceaseFile}: file exists"
                puts "${Prog}: Exiting at the request to cease operation"
                exit 0
            }
            if [info exists FileName($dnum,$fnum)] {
                set fileName $FileName($dnum,$fnum)

                puts stdout "${Prog}: Connecting via FTP to: $FtpHost"

                ##  set ftp::VERBOSE 1

                # Added the "-mode passive" option ... which made this work
                # for the PCMDI to NCAR file transfers (so perhaps there
                # is a firewall between those two sites).
                # Thanks to AFM for suggesting this. - LH, 20041222.
                #
                ##  if {[set conn [ftp::Open $FtpHost $FtpUsrName $FtpPasswd
                ##          -mode passive -blocksize 102400 -timeout 500 ]] < 0} { }
                ##
                #
                # Note:
                #       Increased blocksize to reduce progress messages.
                #       - LH, 20050125.
                #
                if {[set conn [FTP::Open $FtpHost $FtpUsrName $FtpPasswd \
                        -mode passive -blocksize 10485760 -timeout 500 ]] < 0} {
                    puts stderr "${Prog}: ERROR: Connection refused!"
                    return
                } else {
                    # Assume no error in connecting:
                    #
                    set openPortal 1
                    ##  ftp::Type $conn binary
                    FTP::Type binary

                    ##  if {[ftp::Cd $conn $remDir] == 0} { }
                    ##
                    if {[FTP::Cd $remDir] == 0} {
                        puts stderr "${Prog}: ERROR:\
                                Directory ($remDir) not found!"
                    } else {
                        #
                        # Download the data file if possible:
                        #
                        ##  set remFsize [ftp::FileSize $conn $fileName]
                        ##
                        set remFsize [FTP::FileSize $fileName]

                        # Check for a genuine (non-null) remote file size,
                        # otherwise assume the file does not exist, or
                        # that it has been renamed as a ".nc.bad" file.
                        # Note, the "WARNING" message will get picked up
                        # by the "auto-nxt-run" script (assuming it is
                        # being run) and will be emailed to the user.
                        # So we proceed to the next file in the list
                        # if this one is not available:
                        #
                        if {[string length $remFsize] < 1} {
                            puts stdout "${Prog}:\
                                    WARNING: Null file size: $fileName"
                            ##  ftp::Close $conn
                            FTP::Close
                            set openPortal 0
                            puts stdout "${Prog}:\
                                    Closed FTP connection to: $FtpHost"
                            incr fnum
                            continue
                        }
                        set locFile [file nativename [file join \
                                "$LocTopDir" "$DirPath($dnum)" "$fileName"]]

                        # Check if the file already exists locally:
                        #
                        if [file exists $locFile] {
                            set locFsize [file size $locFile]

                            # Proceed to the next file if we already
                            # have it (at the same size):
                            #
                            if {$locFsize == $remFsize} {
                                puts stdout "${Prog}:\
                                        Already have file: $fileName"
                                ##  ftp::Close $conn
                                FTP::Close
                                set openPortal 0
                                puts stdout "${Prog}:\
                                        Closed FTP connection to: $FtpHost"
                                incr fnum
                                continue
                            }
                        }
                        puts stdout "${Prog}: FTP get file: $fileName"

                        ##  if {[ftp::Reget $conn $fileName $locFile] == 0} { }
                        ##
                        if {[FTP::Reget $fileName $locFile] == 0} {
                            puts stderr "${Prog}: ERROR: Get failed: $fileName"
                            if {$numGetFailures > $limGetFailures} {
                                puts stderr "${Prog}: ERROR:\
                                        Get failure limit\
                                        ($limGetFailures) exceeded."
                                exit 10
                            }
                            incr numGetFailures
                        } else {
                            puts stdout "${Prog}: Data Saved in: $locFile"

                            # Compare the remote and (now) local file sizes
                            # and report any discrepancies:
                            #
                            ##  set remFsize [ftp::FileSize $conn $fileName]
                            set remFsize [FTP::FileSize $fileName]
                            set locFsize [file size $locFile]
                            puts stdout "${Prog}: Transferred $locFsize bytes"

                            if {$locFsize != $remFsize} {
                                puts stderr "${Prog}: ERROR: File Size Differs"
                                puts stderr "       Remote File Size: $remFsize"
                                puts stderr "       Local File Size:  $locFsize"
                            }
                            set totFsize [expr $totFsize + $locFsize]
                            set numGBs [expr int($totFsize / $oneGB)]

                            if {$numGBs > 0} {
                                set totGBs [expr $totGBs + $numGBs]
                                set totFsize [expr $totFsize - \
                                        ($numGBs * $oneGB)]
                            }
                        }
                    }
                    ##  ftp::Close $conn
                    FTP::Close
                    set openPortal 0
                    puts stdout "${Prog}: FTP close connection to: $FtpHost"
                }
            }
            incr fnum
        }
        incr dnum
    }
    puts stdout "${Prog}: Finished."
    puts stdout "${Prog}: Info: Transferred $totGBs GB and $totFsize bytes."

    if {$openPortal > 0} {
        ##  ftp::Close $conn
        FTP::Close
        set openPortal 0
    }
}

pReadDirList

pGetDataFiles
exit

B.7 Program: 'run'

The 'run' script generates a log file name and runs the 'afm-ftp.tcl' script, capturing both the standard output and standard error output into the named log file:

#!/bin/sh
#
# Program:
#       run
#
# Author:
#       Lawson Hanson, 20041223.
#
# Purpose:
#       Generates a log file name and runs the "afm-ftp.tcl" script
#       capturing both the standard output and standard error output
#       into the named log file.
#
usage ( )
{
    echo
    echo "Usage: sh run [-h] [-v]"
    echo "Where: -h = Display this help/usage text and exit."
    echo "       -v = Provides a more verbose mode of operation"
    echo "            where output is displayed as well as being"
    echo "            captured into the named log file."
    echo
}

umask 002

# Define PATH so we do not run programs
# from any other version of reality:
#
PATH=/bin:/usr/bin:/usr/local/bin
export PATH

LogFile="ftp-`date +%Y%m%d-%H%M%S`.log"

echo "ftp log file is: ${LogFile}"

withTee=0

case $1 in
-h)
    usage
    exit 0
    ;;
-v)
    withTee=1
    ;;
*)
    ;;
esac

## LD_LIBRARY_PATH=/home/strandwg/data/tcl8.5a2/lib
## export LD_LIBRARY_PATH

if [ $withTee -gt 0 ]
then
    (
        umask 002
        ## /home/strandwg/data/tcl8.5a2/bin/tclsh8.5 afm-ftp.tcl 2>&1
        tclsh afm-ftp.tcl 2>&1
    ) | sed -e 's/PASS .*/PASS xxxxxxxxxx/' | tee $LogFile
else
    (
        umask 002
        ## /home/strandwg/data/tcl8.5a2/bin/tclsh8.5 afm-ftp.tcl 2>&1
        tclsh afm-ftp.tcl 2>&1
    ) | sed -e 's/PASS .*/PASS xxxxxxxxxx/' > $LogFile &
fi

exit

B.8 Program: 'nxt-run'

Given the 'data' file and 'part' numbers of a previously partitioned data directory/file (see 'df-part.sh'), the 'nxt-run' script generates the associated list file name, copies it to the 'remoteDirFileList.txt' file, and then calls the 'run' script to start 'afm-ftp.tcl' again:

#!/bin/sh
#
# Program:
#       nxt-run
#
# Author:
#       Lawson Hanson, 20041223.
#
# Purpose:
#       Given the "data" file and "part" numbers of a previously partitioned
#       data directory/file (see "df-part.sh"), this script generates the
#       associated file name, copies it to the "remoteDirFileList.txt" file,
#       and then calls the "run" script to start "afm-ftp.tcl" again.
#
# Updates | By | Description
# --------+----+------------
# 20050120| LH | Modified to read configuration data from a file.
# 20050125| LH | Modified to set variable MyUser from USER or LOGNAME.
# 20050125| LH | Modified to respond to request to cease operation.
#
Prog=`basename $0 .sh`

usage ( )
{
    echo
    echo "Usage: ${Prog} [-h] data-file-number part-number"
    echo "Where: -h = Display this help/usage message and exit."
    echo
    echo " e.g.: ${Prog} 1 1"
    echo "   or: ${Prog} 1 2"
    echo "   or: ${Prog} 2 10"
    echo
}

case $1 in
-h)
    usage
    exit 0
    ;;
*)
    ;;
esac

umask 002

# When this script is run on remote hosts,
# this may help to keep my sanity:
#
# TZ=AEDT-11
#
# TZ=AEST-10
# export TZ

MyUser=${USER:-${LOGNAME:-jma}}

# Define PATH so we do not run programs
# from any other version of reality:
#
PATH=/bin:/usr/bin:/usr/local/bin
export PATH

# Check for the existence of a "cease.${Prog}" file,
# and obey the request if necessary:
#
ceaseFile=${HOME}/cease.${Prog}

if [ -f $ceaseFile ]
then
    echo "${Prog}: ${ceaseFile}: file exists"
    echo "${Prog}: Exiting on request to cease operation"
    exit 0
fi

# Check if the "tclsh8.5" process is already running, because we
# probably should not attempt to run another session in parallel:
#
ps -u $MyUser | egrep '[0-9] tclsh8.5'
sts=$?

if [ $sts -eq 0 ]
then
    lastFtpLog=`ls -1 ftp-*.log | tail -1`
    echo
    echo "${Prog}: WARNING: It appears that you are already running another"
    echo "       'tclsh8.5' (i.e., another Tcl-FTP) process."
    echo "       Please wait until that run has finished."
    echo "       You can check on that by running the 'ps' command,"
    echo "       and examining the output for the 'tclsh8.5' process."
    echo
    echo "Note:"
    echo "       If the previous Tcl-FTP ('tclsh8.5') process appears"
    echo "       to be stuck, examine the latest 'ftp-*.log' file for"
    echo "       any sign of 'error', i.e., try running the command:"
    echo
    echo "       grep -i error ${lastFtpLog}"
    echo
    echo "       Investigate any error messages, and if warranted,"
    echo "       then you may need to 'kill' the 'tclsh8.5' process"
    echo "       so that you can start the next run."
    echo "       It might be useful to re-run the previous one again;"
    echo "       it does not take very long to run through and check"
    echo "       that it already has all of the files."
    echo
    exit 1
fi

if [ $# -ne 2 ]
then
    usage
    exit 2
fi

dfn=$1
pfn=$2

partFile="data${dfn}-p${pfn}.txt"

if [ ! -f ${partFile} ]
then
    echo
    echo "${Prog}: ERROR: ${partFile}: file does not exist."
    echo
    exit 3
fi

remDFlist="remoteDirFileList.txt"

# Read the configuration file:
#
ConfigFile="${HOME}/${Prog}.config"

if [ ! -r $ConfigFile ]
then
    echo
    echo "${Prog}: ERROR: Configuration file not found"
    echo
    exit 4
else
    . $ConfigFile
fi

start=`date +%Y%m%d,%H%M%S`

echo "${Prog}: ${start}: Running part file: ${partFile}" | tee -a ${Prog}.log

cp $partFile $remDFlist

sh run

exit 0

B.9 Program: 'auto-nxt-run'

When run without any options, the 'auto-nxt-run' script checks that no other 'tclsh8.3' process (i.e., another scripted FTP session) is already running for the user; if one is, it simply exits. If no other process is running, the script checks for errors in the last FTP log file: if there were any, it restarts the previous data file part; otherwise, it increments the part number and starts the next data file part. This script can be run from a 'crontab' entry to automate the data download process for multiple (carefully named) file lists.

#!/bin/sh
#
# Program:
#       auto-nxt-run
#
# Author:
#       Lawson Hanson, 20041224.
#
# Purpose:
#       When run without any options, the script will check that there is
#       no other "tclsh8.5" (i.e., the FTP session) process already running
#       for the user, otherwise it will just exit. If no other process is
#       running, then the script will check for errors in the last FTP log
#       file, and if there were any, then restart the previous data file part,
#       otherwise, increment the part number and start the next data file part.
#
#       Given the "data" file and "part" numbers of a previously partitioned
#       data directory/file (see "df-part.sh"), this script generates the
#       associated file name, copies it to the "remoteDirFileList.txt" file,
#       and then calls the "run" script to start "afm-ftp.tcl" again.
#
# Updates | By | Description
# --------+----+------------
# 20050105| LH | Added code to handle the next data file path name series.
# 20050106| LH | Modified code for one "auto-nxt-run-.log" per day.
# 20050117| LH | Modified to include a check for warnings and email to user.
# 20050120| LH | Modified to read configuration data from a file.
# 20050125| LH | Modified to set variable MyUser from USER or LOGNAME.
# 20050125| LH | Modified to respond to request to cease operation.
# 20050720| LH | Modified to try next file number with same part number.
#
Prog=`basename $0 .sh`

usage ( )
{
    echo
    echo "Usage: ${Prog} [-h]"
    echo "Where: -h = Display this help/usage message and exit."
    echo "       This script is designed to be run by 'cron'."
    echo
}

case $1 in
-h)
    usage
    exit 0
    ;;
*)
    ;;
esac

umask 002

TZ=AEDT-11
export TZ

MyUser=${USER:-${LOGNAME:-jma}}

# Define PATH so we do not run programs
# from any other version of reality:
#
PATH=/bin:/usr/bin:/usr/local/bin
export PATH

# Check for the existence of a "cease.${Prog}" file,
# and obey the request if necessary:
#
ceaseFile=${HOME}/cease.${Prog}

if [ -f $ceaseFile ]
then
    echo "${Prog}: ${ceaseFile}: file exists"
    echo "${Prog}: Exiting on request to cease operation"
    exit 0
fi

# Read the configuration file or die:
#
ConfigFile="${HOME}/${Prog}.config"

if [ ! -r $ConfigFile ]
then
    echo
    echo "${Prog}: ERROR: Configuration file not found"
    echo
    exit 1
else
    . $ConfigFile
fi

# Change to the "run" directory, so everything is relative:
#
cd $RunPath

StartHms=`date +%H%M%S`
StartYmd=`date +%Y%m%d`

LogFile="${Prog}-${StartYmd}.log"

# Initialise log file if it does not exist:
#
if [ ! -f $LogFile ]
then
    echo "${Prog}: ${StartYmd}-${StartHms}" > $LogFile
    echo "${Prog}: MailTo:    ${MailTo}" >> $LogFile
    echo "${Prog}: RunPath:   ${RunPath}" >> $LogFile
    echo "${Prog}: RemDFlist: ${RemDFlist}" >> $LogFile
fi

LastFtpLog=`ls -1 ftp-*.log | tail -1`
export LastFtpLog

# Check if the "tclsh8.5" process is already running, because we
# probably should not attempt to run another session in parallel:
#
## ps -u $MyUser | egrep '[0-9] tclsh8.5' >> /dev/null 2>&1
#
ps -u $MyUser | egrep '[0-9] tclsh8.3' >> /dev/null 2>&1
sts=$?

# The 'egrep' process returns '0' for success (i.e., found some):
#
if [ $sts -eq 0 ]
then
    echo "${Prog}: ${StartHms}:" >> $LogFile
    ## echo "       A 'tclsh8.5' process was still running." >> $LogFile
    echo "       A 'tclsh8.3' process was still running." >> $LogFile
    echo "       The last FTP log file was: ${LastFtpLog}" >> $LogFile
    exit 0
fi

# Looks like it is ok to start the next run.
#
# Check if there were any warnings in the previous run,
# and if there were, then send email to the user to alert
# them of the incident:
#
warnCount=`grep -c -i warning ${LastFtpLog}`

# Check if there were any errors in the previous run,
# and if there were, then re-start the same data file part,
# otherwise increment the part number and run that:
#
errCount=`grep -c -i error ${LastFtpLog}`

# Get the data file and part numbers from the previous run:
#
# Example:
#       nxt-run: 20041223,171054: Running part file: data1-p6.txt
#
LastRunLine=`tail -1 nxt-run.log`

# Note, AWK arrays are indexed starting from 1:
#
LastFileNum=`echo $LastRunLine | awk '{
        split($6,ary,"-")
        print substr(ary[1],5)
}' -`
LastPartNum=`echo $LastRunLine | awk '{
        split($6,ary,"-")
        split(ary[2],bry,".")
        print substr(bry[1],2)
}' -`

if [ $warnCount -gt 0 ]
then
    # There were some warnings, so send an email to the user:
    #
    LastPartFile="data${LastFileNum}-p${LastPartNum}.txt"
    export LastPartFile
    (
        echo "Data Transfer Warnings"
        echo "----------------------"
        echo "Please see log file: $LastFtpLog"
        echo
        echo "Data List Part File: $LastPartFile"
        echo "produced the following warning messages:"
        echo
        grep -C 3 -i warning $LastFtpLog
    ) | mail -s "Data Transfer Warnings" $MailTo
fi

if [ $errCount -gt 0 ]
then
    # There were some errors, but perhaps we have attempted re-running
    # the same data part file several times already (i.e., there may
    # be a non-existent file, such as "*.nc.bad", in the list):
    #
    PrevFtpLog=`ls -1 ftp-*.log | tail -2 | head -1`

    result=`cmp $PrevFtpLog $LastFtpLog`
    stsOne=$?

    if [ $stsOne -eq 0 ]
    then
        # The last two ftp log files are identical,
        # so lets try the one before that, too:
        #
        BeforeFtpLog=`ls -1 ftp-*.log | tail -3 | head -1`

        result=`cmp $BeforeFtpLog $LastFtpLog`
        stsTwo=$?

        if [ $stsTwo -eq 0 ]
        then
            # The last three ftp log files are identical,
            # so assume we have some unresolvable error,
            # and re-set the error count to zero
            # so that we can make some progress:
            #
            errCount=0

            # And send some email to alert the user:
            #
            LastPartFile="data${LastFileNum}-p${LastPartNum}.txt"
            export LastPartFile
            (
                echo "Data Transfer Error Loop"
                echo "------------------------"
                echo "The last three ftp log files are identical,"
                echo "so assume we have some unresolvable error,"
                echo "and re-set the error count to zero"
                echo "so that we can make some progress:"
                echo
                echo "Please see FTP log files:"
                echo "       $BeforeFtpLog"
                echo "       $PrevFtpLog"
                echo "       $LastFtpLog"
                echo
                echo "Data List Part File: $LastPartFile"
                echo "produced some error messages:"
                echo
                grep -i error $LastFtpLog
                echo
            ) | mail -s "Data Transfer Error Loop" $MailTo
        fi
    fi
fi

if [ $errCount -gt 0 ]
then
    # There were some errors, so re-run the same data file part:
    #
    PartFile="data${LastFileNum}-p${LastPartNum}.txt"
else
    # No errors, that's good. Run the next part:
    #
    NewPartNum=`expr $LastPartNum + 1`
    PartFile="data${LastFileNum}-p${NewPartNum}.txt"
fi

if [ ! -f ${PartFile} ]
then
    echo "${Prog}: ${StartHms}:" >> $LogFile
    echo "       WARNING: ${PartFile}: file does not exist." >> $LogFile

    # Try the next file number and reset part to 1:
    #
    NewFileNum=`expr ${LastFileNum} + 1`
    PartFile="data${NewFileNum}-p1.txt"

    if [ ! -f ${PartFile} ]
    then
        echo "${Prog}: ${StartHms}:" >> $LogFile
        echo "       WARNING: ${PartFile}: file does not exist." >> $LogFile

        # Try the next file number with the same part number,
        # e.g., "data10-p92.txt" followed by "data11-p92.txt":
        #
        PartFile="data${NewFileNum}-p${LastPartNum}.txt"

        if [ ! -f ${PartFile} ]
        then
            echo "${Prog}: ${StartHms}:" >> $LogFile
            echo "       ERROR: ${PartFile}: file does not exist." >> $LogFile
            echo "       Stopping 'crontab' entry for $MyUser." >> $LogFile
            crontab -r
            exit 3
        fi
    fi
fi

echo "${Prog}: ${StartHms}:" >> $LogFile

if [ $errCount -gt 0 ]
then
    echo "       Re-running part file: ${PartFile}" >> $LogFile
else
    echo "       Running part file: ${PartFile}" >> $LogFile
fi

# Also need to record this in the "nxt-run.log" file:
#
NxtRunLog="nxt-run.log"
StartDateTime="${StartYmd}-${StartHms}"

# NOTE:
#       Please DO NOT change the format of the following line,
#       this script depends on "${PartFile}" being the 6th argument:
#
echo "${Prog}: ${StartDateTime}: Running part file: ${PartFile}" >> $NxtRunLog

cp $PartFile $RemDFlist

NewFtpLogFile="${RunPath}/ftp-${StartDateTime}.log"

## LD_LIBRARY_PATH=/home/strandwg/data/tcl8.5a2/lib
## export LD_LIBRARY_PATH

## Tcl85sh=/home/strandwg/data/tcl8.5a2/bin/tclsh8.5
## export Tcl85sh

# Finally, run the selected part of the data file number:
#
## ( umask 002; ${Tcl85sh} ${RunPath}/afm-ftp.tcl 2>&1 ) > $NewFtpLogFile &
##
( umask 002; tclsh ${RunPath}/afm-ftp.tcl 2>&1 ) > $NewFtpLogFile &

exit 0

B.10 Program: 'gwstats'

The 'gwstats' script runs a few commands which provide a simple snapshot view of the status of the multiple-terabyte file transfer process from a LaCie Terabyte disk on a Linux PC to the 'gwork' temporary storage area on host 'gale':

#!/bin/sh
#
# Program:
#       gwstats
#
# RCS-Strings:
#       $Source: /bm/gkeep/lih/src/sh/RCS/gwstats,v $
#       $Revision: 1.1 $
#
# Author:
#       Lawson Hanson, 20050209.
#
# Purpose:
#       Run a few commands which provide a simple snapshot view of
#       the status of the multiple-terabyte file transfer process
#       from a LaCie Terabyte disk on a Red Hat Linux PC to gwork.
#
Prog=`basename $0 .sh`

MyUser=${USER:-${LOGNAME:-nobody}}

case $MyUser in
afm)
    runDir=/bm/gkeep/afm/bmrc/lacie1
25 gworkDir=/bm/gwork/afm/bmrc/lacie126 ;;27 lih)28 runDir=/bm/gkeep/lih/bmrc/afm/p0329 gworkDir=/bm/gwork/lih/bmrc/afm/p0330 ;;31 *)32 echo33 echo "${Prog}: Error: Unknown user $MyUser"34 echo35 exit 136 ;;37 esac3839 cd $runDir4041 echo4243 ls -lrt | tail44 echo4546 tail nxt-run.log47 echo4849 date50 echo5152 echo "S<strong>to</strong>rage space used on ’gwork’ in 1k blocks:"53 du -k -s $gworkDir54 echo5556 if [ ! -f nxt-run.log ]57 then58 echo59 echo "${Prog}: Error: nxt-run.log: file not found"60 echo61 exit 162 fi6364 LastRunPart=‘tail -1 nxt-run.log | awk ’{print $6}’ -‘6566 if [ ! -f $LastRunPart ]67 then68 echo69 echo "${Prog}: Error: ${LastRunPart}: file not found"70 echo71 exit 172 fi7374 echo "Number <strong>of</strong> files in this part: ‘grep -c ’\.nc$’ $LastRunPart‘"7576 LastFtpLog=‘ls -1 ftp-*.log | tail -1‘7778 if [ ! -f $LastFtpLog ]79 then80 echo81 echo "${Prog}: Error: ${LastFtpLog}: file not found"82 echo83 exit 184 fi8586 echo "Number <strong>of</strong> files already have: ‘grep -c -i ’already have’ $LastFtpLog‘"8788 echo "Number <strong>of</strong> files transferred.: ‘grep -c ’get file’ $LastFtpLog‘"89 echo9091 echo "Information <strong>from</strong> LastFtpLog.: $LastFtpLog"92 echo9394 numErrs=‘grep -c ’[Ee][Rr][Rr][Oo][Rr]’ $LastFtpLog‘9596 if [ $numErrs -gt 0 ]97 then98 isAre="is"99 msgWord="message"100101 if [ $numErrs -gt 1 ]102 then103 isAre="are"104 msgWord="messages"105 fi106 echo "There ${isAre} $numErrs FTP error ${msgWord}"107 echo108 fi109110 exit 0111B.11 Program: ‘get-lsR-ents’The ‘get-lsR-ents’ script can be used <strong>to</strong> extract particular entries <strong>from</strong> an ‘ls -R’ files list. 
See section6 on page 5 for an example <strong>of</strong> how <strong>to</strong> use this script <strong>to</strong> extract the daily (i.e., ‘/da/’) list <strong>of</strong> direc<strong>to</strong>riesand associated files <strong>from</strong> a data list such as ‘data1-p1.txt’:1 #!/bin/sh2 #3 # Program:4 # get-lsR-ents5 #6 # RCS-Strings:7 # $Source: /bm/gkeep/lih/src/sh/RCS/get-lsR-ents,v $8 # $Revision: 1.2 $9 #10 # Author:11 # Lawson Hanson, 20050317.12 #13 # Purpose:14 # Given a defining string, extract the corresponding entries <strong>from</strong> an15 # FTP site’s "ls -R" command (or an "ls -lR" shell command) listing.16 # The script uses the technique <strong>of</strong> a simple finite state machine.17 #18 # Example:19 # To extract the list <strong>of</strong> "/da/" (i.e., daily) data files which need20 # <strong>to</strong> be further processed by the "cut-<strong>to</strong>-oz" script:21 #22 # % cd /bm/gwork/lih/bmrc/afm/p0323 # % ls -R data1 > gw-data1-p1.list24 # % get-lsR-ents ’\/da\/’ gw-data1-p1.list > gw-data1-p1-da.list25 #26 Prog=‘basename $0 .sh‘2728 usage ( )29 {30 echo31 echo "Usage: ${Prog} pattern file"32 echo33 }3435 case $1 in36 -h)37 usage38 exit 039 ;;40 *)41 ;;42 esac4344 pattern=$145 shift4647 awk ’BEGIN {48 gotMatch = 049 }5051 # Keep any comment ("#") lines that exist in the input file:52 #53 /^ *#/ {54 print55 next56 }5758 # Use direc<strong>to</strong>ry entries <strong>to</strong> turn output listing "<strong>of</strong>f":59 #60 /: *$/ {61 gotMatch = 062 }6364 # Use direc<strong>to</strong>ry entries which match the specified pattern65 # <strong>to</strong> turn the output listing "on":66 #67 /’${pattern}’.*: *$/ {68 gotMatch = 169 }7071 # Ignore the "<strong>to</strong>tal" lines:72 #73 /^<strong>to</strong>tal / {74 next75 }7677 {78 if (gotMatch > 0) {79 print80 }81 }’ $*8283 exit 084B.12 Program: ‘cut-<strong>to</strong>-oz’The ‘cut-<strong>to</strong>-oz’ script is run on a list <strong>of</strong> huge daily data files 
to extract a portion of the data which is relevant to Australian researchers. That sub-set of the data (the details of which were determined by Dr. Aurel Moise) is bounded by the region spanning latitudes 60 degrees South to 20 degrees North and longitudes 90 degrees East to 270 degrees East. This data set extraction is performed by using the NCO command 'ncks', and the output is saved in files named with an '_OZ' suffix before the '.nc':

#!/bin/sh
#
# Program:
#       cut-to-oz
#
# RCS-Strings:
#       $Source: /bm/gkeep/lih/src/sh/RCS/cut-to-oz,v $
#       $Revision: 1.1 $
#
# Author:
#       Lawson Hanson, 20050211.
#
# Purpose:
#       Read a list of "directory:" and "file.nc" entries
#       from the "ls -R" file list specified on the command-line,
#       and use the "ncks" netCDF file utility to extract
#       a region of interest to Australian researchers,
#       which was determined by Dr. Aurel Moise to be:
#
#               -d lat,-60.,20. -d lon,90.,270.
#
#       and name the new file "${baseFile}_OZ.nc".
#
# Note:
#       Janice Sisson requested that I should use the "qsub" batch system
#       to run the "ncks" processes which form a part of "cut-to-oz", this
#       in an attempt to reduce the load on the "gale" computer system.
#       Janice suggests that I should just "qsub" the whole script.
#
# Send mail at beginning and end of request execution:
# QSUB -mb -me
# Specify the batch queue:
# QSUB -q large
#
Prog=`basename $0 .sh`

usage ( )
{
    echo
    echo "Usage: ${Prog} [-h] [-n] data-list-file"
    echo "Where: -h = Display this help/usage message and exit."
    echo "       -n = Do NOT remove the original netCDF file."
    echo
    echo " e.g.: ${Prog} data1.txt"
    echo
}

part=1
removeFlag=1

case $1 in
-h)
    usage
    exit 0
    ;;
-n)
    removeFlag=0
    shift
    ;;
*)
    ;;
esac

if [ $# -ne 1 ]
then
    usage
    exit 1
fi

dataFile="$1"

if [ ! -f ${dataFile} ]
then
    echo
    echo "${Prog}: ERROR: ${dataFile}: file does not exist."
    echo
    exit 2
fi

fNum=0

topDir=`pwd`

cat $dataFile | while read line
do
    gotComment=`echo $line | grep '^#'`
    cSts=$?

    if [ $cSts -eq 0 ]
    then
        continue
    fi

    gotDir=`echo $line | grep ':$'`
    sts=$?

    if [ $sts -eq 0 ]
    then
        dir=`echo $line | tr -d ':'`

        if [ ! -d ${topDir}/$dir ]
        then
            echo
            echo "${Prog}: Error: ${topDir}/${dir}: Directory not found"
            echo
            exit 3
        fi

        cd ${topDir}/$dir
        pwd
    fi

    gotNcFile=`echo $line | grep '\.nc$'`
    sts=$?

    if [ $sts -eq 0 ]
    then
        baseFile=`basename $line .nc`
        nice ncks -Oha -d lat,-60.,20. -d lon,90.,270. $line ${baseFile}_OZ.nc
        cmdSts=$?

        if [ $cmdSts -ne 0 ]
        then
            echo "${Prog}: ERROR: ncks process failure:"
            echo "    directory: `pwd`"
            echo "    with file: $line"
        else
            fNum=`expr $fNum + 1`
            echo "Created file(${fNum}): ${baseFile}_OZ.nc"

            if [ $removeFlag -gt 0 ]
            then
                /bin/rm -f $line
            fi
        fi
    fi
done

exit 0

B.13 Program: 'dir2sam'

The 'dir2sam' script uses 'rcp' to transfer directory structures with sub-files from the current directory on, say, 'gale' to the associated directory on 'sam' using 'samsrv2jf' (i.e., the jumbo frame).
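The core of such a transfer is a single remote-copy command. Because the 'rcp' invocation is partly illegible in this copy of the report, the following is only a sketch of the idea: 'samsrv2jf' is the jumbo-frame alias mentioned in the text, while the '-pr' option set (preserve modes and times, recurse) and the target-path layout are assumptions.

```shell
#!/bin/sh
# Sketch only: build (but do not run) the remote-copy command that a
# script like dir2sam would issue.  "samsrv2jf" comes from the report;
# the "-pr" options and mirrored path layout are assumptions.
RemoteHost="samsrv2jf"
copyDir="p03"                           # example directory argument
currDir=/bm/gwork/lih/bmrc/afm          # example current directory on 'gale'

# Mirror the local directory into the same path on the remote host:
transferCmd="rcp -pr ${copyDir} ${RemoteHost}:${currDir}/${copyDir}"
echo "${transferCmd}"
# -> rcp -pr p03 samsrv2jf:/bm/gwork/lih/bmrc/afm/p03
```

Printing the command first, as a dry run, makes it easy to confirm the target path before committing to a long transfer.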
After the 'rcp' process has completed, the script calls the 'setPerms' script to set directories to 775 and files to 664 so that several people can be working on the same areas, i.e., perhaps the same directories, but hopefully different files:

#!/bin/sh
#
# Program:
#       dir2sam
#
# RCS-Strings:
#       $Source: /bm/gkeep/lih/src/sh/RCS/dir2sam,v $
#       $Revision: 1.3 $
#
# Author:
#       Lawson Hanson, 20050127.
#
# Purpose:
#       Use "rcp" to transfer directory structures with sub-files
#       from the current directory on say "gale" to the associated
#       directory on "sam" using "samsrv2jf" (the jumbo frame).
#       After the "rcp" process has completed, the script calls the
#       "setPerms" script to set directories to 775 and files to 664
#       so that several people can be working on the same areas, i.e.,
#       perhaps the same directories, but hopefully different files.
#
Prog=`basename $0 .sh`

usage ( )
{
    echo
    echo "Usage: ${Prog} directory"
    echo
}

if [ $# -ne 1 ]
then
    usage
    exit 1
fi

copyDir="$1"

if [ ! -d "${copyDir}" ]
then
    usage
    echo "${Prog}: Error: ${copyDir}: directory not found"
    echo
    exit 2
fi

currDir=`pwd`

umask 002

echo "${Prog}: Current directory: $currDir"

# An example of the current directory is:
#
#       /bm/gwork/lih/bmrc/afm/p03
#       1 /2 /3 /4 /5 /6 /7
        ... >&- 2>&- &\""

exit 0

B.14 Program: 'setPerms'

The 'setPerms' script sets the directory and file permissions so that user and group both have access to modify items:

#!/bin/sh
#
# Program:
#       setPerms
#
# Author:
#       Lawson Hanson, 20050214.
#
# Purpose:
#       Set the directory and file permissions so that user and group
#       both have access to modify items.
#
# Usage:
#       setPerms directory
#
Prog=`basename $0 .sh`

usage ( )
{
    echo
    echo "Usage: ${Prog} directory"
    echo
}

case $1 in
-h)
    usage
    exit 0
    ;;
*)
    ;;
esac

if [ $# -ne 1 ]
then
    usage
    exit 1
fi

dir="$1"

if [ ! -d "${dir}" ]
then
    usage
    echo "${Prog}: Error: ${dir} directory not found"
    echo
    exit 2
fi

MyUser=${LOGNAME:-${USER:-unknown}}

if [ "${MyUser}" = "unknown" ]
then
    echo
    echo "${Prog}: Error: User is ${MyUser}"
    echo
    exit 3
fi

find ${dir} -type d -user ${MyUser} -exec chmod 775 {} \;
find ${dir} -type f -user ${MyUser} -exec chmod 664 {} \;

exit 0

B.15 Program: 'get-file-dates.sh'

The 'get-file-dates.sh' script parses the data transfer logs to gather the actual date of the data file transfers from the PCMDI data portal to the remote LaCie TB disk.
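The underlying log search can be sketched as follows, assuming transfer-log records of the form 'YYYYMMDD: path' as in the report's examples; the temporary file name and the grep pattern are illustrative only.

```shell
#!/bin/sh
# Sketch only: look up the transfer date(s) recorded for a given file.
# The "YYYYMMDD: path" record format follows the report's examples;
# the file name and sample records here are illustrative.
logFile=/tmp/all-info.$$.txt

cat > ${logFile} <<'EOF'
20050108: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
20050110: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
20050207: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
EOF

# Every date on which this file was (re-)fetched:
dates=`grep '/run2/tasmax_A2\.1961-1970\.nc$' ${logFile} | cut -d: -f1`
echo ${dates}
# -> 20050108 20050110 20050207

/bin/rm -f ${logFile}
```

A date later than a PCMDI withdrawal announcement indicates the replacement copy was fetched; an earlier date flags a file that needs to be re-transferred.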
From time to time, PCMDI announces that certain data files have been updated (withdrawn or replaced) due to some problem that has been uncovered with the original file; when those announcements are made, the 'get-file-dates.sh' script makes it easy to check whether we have the good or bad versions of those files.

#!/bin/sh
#
# Program:
#       get-file-dates.sh
#
# RCS-Strings:
#       $Source: /bm/gkeep/lih/src/sh/RCS/get-file-dates.sh,v $
#       $Revision: 1.4 $
#
# Author:
#       Lawson Hanson, 20050915.
#
# Purpose:
#       Given a set of control parameters, uses 'grep' to determine
#       the date of the FTP log file(s) containing matching filename
#       and specific directory paths which match the parameters.
#
# Example:
#+-------
#|% get-file-dates.sh -s 20c3m -f da -k atm -r 2 -v tasmax \
#|      -e mri_cgcm2_3_2a | grep 1961-1970
#|20050108: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
#|20050109: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
#|20050110: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
#|20050207: data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc
#+-------
#
# Note:
#       Another utility written for the PCMDI data retrieval project.
#
Prog=`basename $0 .sh`

ScriptDir=`dirname $0`

fLst="da"
eLst="inmcm3_0"
kLst="atm"
rLst="1"
sLst="20c3m,picntrl,sresa1b,sresa2,sresb1"
vLst="pr,psl,tas,tasmax,tasmin"
info=0

# A series of configuration files should provide the flexibility
# to configure the default values for most script run scenarios;
# the last configuration file to be sourced will, of course,
# have the over-riding influence:
#
ConfigFile="${Prog}.cfg"

# Source (in order, if they exist) the ${ScriptDir}, ${HOME},
# and current directory ('.') versions of the configuration file:
#
cfgDirs="${ScriptDir} ${HOME} ."

for dir in $cfgDirs
do
    cfgFile="${dir}/${ConfigFile}"


    if [ -f "${cfgFile}" ]
    then
        echo "${Prog}: Sourcing: ${cfgFile}"
        . ${cfgFile}
    fi
done

fnUsage ( )
{
    opts="[-h] [-i] [-s sL] [-k kL] [-f fL] [-v vL] [-e eL] [-r rL]"
    echo
    echo "Usage: ${Prog} ${opts}"
    echo "Where: -h = Displays this help/usage message and exits"
    echo "       -i = Display some additional run-time information"
    echo "       -s sL = Specifies a list of one or more Scenarios"
    echo "       -k kL = Specifies a list of one or more Kinds of data"
    echo "               (i.e., 'atm', 'ice', 'land' or 'ocn')"
    echo "       -f fL = Specifies a list of one or more data Frequencies"
    echo "               (i.e., 'da', 'mo', or 'fixed')"
    echo "       -v vL = Specifies a list of one or more Variables"
    echo "       -e eL = Specifies a list of one or more Experiment-ids"
    echo "       -r rL = Specifies a list of one or more Run numbers"
    echo
    echo " Note: Multiple list items should be separated by comma characters"
    echo "       for example: -k atm,ice or: -r 1,2,3"
    echo
    echo "Example:"
    astr="data2/20c3m/atm/da/tasmax/mri_cgcm2_3_2a/run2/tasmax_A2.1961-1970.nc"
    echo "    ${astr}"
    echo
    echo "Format:"
    echo "    'data'*/<scenario>/<kind>/<freq>/<var>/<expt>/'run'<run>/*.nc"
    echo
    echo "Where:"
    echo "    scenario = 20c3m"
    echo "    kind = atm"
    echo "    freq = da"
    echo "    var = tasmax"
    echo "    expt = mri_cgcm2_3_2a"
    echo "    run = 2"
    echo
}

# Read the command line options.
#
set -- `getopt "hie:f:k:r:s:v:" "$@"`

for key in "$@"
do
    case $key in
    --)
        shift
        break
        ;;
    -e)
        eLst=$2
        shift 2
        ;;
    -f)
        fLst=$2
        shift 2
        ;;
    -h)
        fnUsage
        exit 0
        ;;
    -i)
        info=1
        ;;
    -k)
        kLst=$2
        shift 2
        ;;
    -r)
        rLst=$2
        shift 2
        ;;
    -s)
        sLst=$2
        shift 2
        ;;
    -v)
        vLst=$2
        shift 2
        ;;
    esac
done

dataFile="/bm/gkeep/lih/bmrc/afm/p02/ftpLogs/all-info.txt"

if [ ! -f $dataFile ]
then
    echo
    echo "${Prog}: Error: $dataFile: File not found"
    echo
    exit 2
fi


#   f n C o m m a S e p L i s t
#
fnCommaSepList ( )
{
    echo $1 | awk -F, '{
        printf "%s", $1
        for(i=2;i
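The listing is truncated at this point in the available copy. A helper with the same apparent intent as 'fnCommaSepList', i.e. turning a comma-separated option list such as 'pr,psl,tas' into a space-separated word list suitable for a 'for' loop, could be written as follows; this is a reconstruction of the truncated awk loop, not the author's code.

```shell
#!/bin/sh
# Reconstruction sketch: split a comma-separated list into words.
# The original fnCommaSepList body is truncated in this copy of the
# report, so the awk loop below is an assumption about its intent.
fnCommaSepList ( )
{
    echo "$1" | awk -F, '{
        printf "%s", $1
        for (i = 2; i <= NF; i++)
            printf " %s", $i
        printf "\n"
    }'
}

vars=`fnCommaSepList "pr,psl,tas"`
echo "${vars}"
# -> pr psl tas

# The space-separated result is then convenient to iterate over:
for v in ${vars}
do
    echo "variable: ${v}"
done
```

Splitting on commas in awk (rather than with IFS tricks) keeps the caller's field-splitting behaviour untouched, which matters in a script that also loops over whitespace-separated directory lists.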
