ECMWF Data Handling System ECFS
ECMWF Data Handling System ECFS
ECMWF Data Handling System ECFS
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>ECMWF</strong> <strong>Data</strong> <strong>Handling</strong> <strong>System</strong><br />
<strong>ECFS</strong><br />
Anne Fouilloux<br />
User Support<br />
advisory@ecmwf.int<br />
Slide 1
Content<br />
• Introduction<br />
• The <strong>ECFS</strong> client: a UNIX-like interface<br />
• <strong>ECFS</strong> User commands<br />
• Remarks<br />
• Trouble-shooting<br />
• Recommendations<br />
• Status & Future plans<br />
Slide 2<br />
Slide 2
Introduction<br />
•For many years (from 1983), <strong>ECMWF</strong> has operated a large-scale <strong>Data</strong><br />
<strong>Handling</strong> <strong>System</strong> (DHS), in which all <strong>ECMWF</strong> users can store and retrieve<br />
data.<br />
•The <strong>Data</strong> <strong>Handling</strong> <strong>System</strong> is composed of three main components:<br />
- IBM’s High Performance Storage <strong>System</strong> (HPSS), used as the<br />
underlying archiving system in which data is kept<br />
- MARS - Meteorological Archive and Retrieval <strong>System</strong><br />
• GRIB and BUFR data, over 4.2 Petabyte of data (including backup copies),<br />
more than 5 million files (approx. 1Gb/file) and about 3 TB stored every<br />
day.<br />
- <strong>ECFS</strong> - <strong>ECMWF</strong>’s File Storage system<br />
• Any kind of data, 1400 TB of data (including backup copy), about 40 Million<br />
files (~=23Mb/file) and between 2.5 and 3 TB stored daily<br />
Slide 3<br />
MARS and <strong>ECFS</strong> were developed by <strong>ECMWF</strong> to hide the complexities of HPSS and data<br />
management from the users.<br />
Slide 3
The <strong>ECFS</strong> client: a UNIX-like interface (1/2)<br />
•A Unix style of file interface has been adopted by <strong>ECFS</strong> :<br />
- Files are mapped to a Unix-compatible directory tree<br />
- Either absolute and relative pathnames can be used<br />
- Concept of current <strong>ECFS</strong> working directories ($ECDIR and<br />
$EC_TMPDIR), analogous to the Unix current working<br />
directory $PWD<br />
- Limited wildcard support is provided. Wildcard characters are<br />
NOT supported in <strong>ECFS</strong> DIRECTORY positions but CAN be<br />
used for the rightmost (file) position. (For instance, you cannot<br />
use els ec:t*/file*.out)<br />
- The <strong>ECFS</strong> filesize limit is 16 GB. Be aware that certain Unix<br />
Slide 4<br />
systems (not at <strong>ECMWF</strong>) or software packages cannot handle<br />
files over 2GB in size;<br />
Slide 4
The <strong>ECFS</strong> client: a UNIX-like interface (2/2)<br />
•But this is not a UNIX file system:<br />
- Files are migrated off to tape(s) behind the scenes.<br />
- There are (Network) overheads when files are transferred<br />
to/from <strong>ECFS</strong>, UNLESS file is on disk cache (small and<br />
recent data).<br />
•<strong>ECFS</strong> commands: els, erm, ermdir, emkdir, ecd, epwd,<br />
echmod, echgrp, ecp, emv, eumask and emove, ecfsdir,<br />
ecfs_status<br />
•<strong>ECFS</strong> commands are implemented as aliases/functions with the<br />
execution mapping to Perl scripts. Environment is set up for both<br />
Slide 5<br />
C-shell and korn-shell users. Other shells are not supported!<br />
Slide 5
Documentation & Availability at <strong>ECMWF</strong><br />
•<strong>ECFS</strong> commands are available on all our platforms (ecgate<br />
and HPCF systems) except ecfs_status command for<br />
monitoring your <strong>ECFS</strong> usage (available on ecgate only).<br />
•Documentation is available on our website at<br />
http://www.ecmwf.int/services/computing/docs/archives/ecfs/index.html<br />
•<strong>ECFS</strong> man page:<br />
man ecfs<br />
•In addition there are man pages for each specific command<br />
e.g.:<br />
man els<br />
Slide 6<br />
Slide 6
<strong>ECFS</strong> domains<br />
•<strong>ECFS</strong> files are stored in domains. Currently two domains exist:<br />
ec: and ectmp:<br />
- ec: is the permanent domain where files are stored indefinitely. This is the<br />
default domain<br />
- ectmp: is the temporary domain where files are stored for 90 days, after which<br />
they are automatically deleted. Note that once a file has been automatically<br />
deleted it CANNOT be recovered.<br />
NB: Co-Operating states may ONLY use ectmp: domain.<br />
•The domain names shown above (ec:, ectmp:) should used in<br />
most of the <strong>ECFS</strong> commands to indicate which domain you are<br />
working with.<br />
•Note that, as an alternative, the ectmp: domain can be<br />
referenced by ec:/TMP. Thus the following are equivalent:<br />
- ectmp:/uid/newdir<br />
- ec:/TMP/uid/newdir<br />
Slide 7<br />
Slide 7
<strong>ECFS</strong> user commands: Exploring the <strong>ECFS</strong> file system<br />
•List (ls –al) <strong>ECFS</strong> files described by target:<br />
els [-R] <br />
•Change the current <strong>ECFS</strong> working directory for the specified<br />
<strong>ECFS</strong> domain:<br />
ecd <br />
Target should be prefixed by an <strong>ECFS</strong><br />
domain either ec: or ectmp:<br />
To list subdirectories recursively.<br />
els can time out for very large <strong>ECFS</strong> directory trees. (see ecfs_audit)<br />
Set the environment variables ECDIR<br />
(ec:) or EC_TMPDIR (ectmp:)<br />
NB: Defaults to login name of user if target omitted.<br />
•Print name of the <strong>ECFS</strong> current working directory for the<br />
specified <strong>ECFS</strong> domain:<br />
Slide 8<br />
epwd ec:<br />
Or epwd ectmp:<br />
Print the content of the environment<br />
variables ECDIR (ec:) or EC_TMPDIR<br />
(ectmp:)<br />
Slide 8
Practical: Exploring the <strong>ECFS</strong> file system<br />
•For this first practical, try the following commands on ecgate:<br />
epwd ec:<br />
epwd ectmp:<br />
•Use els to list all the files contained in both domains:<br />
els ec:<br />
els ectmp:<br />
• Change the working directories and use els to list their contents:<br />
ecd ec:/trx<br />
epwd ec:<br />
els ec:<br />
ecd ec:<br />
Slide 9<br />
Slide 9
<strong>ECFS</strong> user commands: Transferring files between <strong>ECFS</strong><br />
and client storage<br />
Overwrite existing file<br />
unconditionally<br />
Create a backup copy in Disaster Recovery <strong>System</strong> (DRS)<br />
Use sparingly, for files impossible/expensive to recreate.<br />
Do not overwrite<br />
if file exists.<br />
Not an error<br />
(DEFAULT)<br />
Do not overwrite.<br />
Treat as an error<br />
if attempted<br />
(Return 1)<br />
file's timestamp, group and permission will be kept<br />
(optional) destroy disk cache copy<br />
after retrieval<br />
ecp [-e|n|o|t] [-b] [-p] [-m] <br />
emv [-e|n|o|t] [-b] [-p] <br />
Either target or source should be prefixed by an<br />
<strong>ECFS</strong> domain (ec: or ectmp:)<br />
Overwrite only if target older Slide than10<br />
requested source<br />
(Time standards differ on local workstations and servers).<br />
NB: emv is similar to ecp but files are removed after being transfered<br />
Slide 10
Example: Transferring files between <strong>ECFS</strong> and client storage<br />
> ecp $SCRATCH/my_file ectmp:Backup/Feb/ecfs_scratch_file<br />
mk_ecdirs: Created <strong>ECFS</strong> directory ectmp:/TMP/trx/Backup<br />
mk_ecdirs: Created <strong>ECFS</strong> directory ectmp:/TMP/trx/Backup/Feb<br />
File: ecfs_scratch_file.EcFs_XfEr successfully moved<br />
Transferred /scratch/ectrain/trx/my_file from ecgate to <strong>ECFS</strong> as<br />
-> ectmp:/trx/Backup/Feb/ecfs_scratch_file<br />
599894 bytes (599 Kbytes per second)<br />
ecp returning 0<br />
> emv ectmp:ecfs_scratch_file $SCRATCH/my_file<br />
Transferred ectmp:/trx/ecfs_scratch_file from <strong>ECFS</strong> to ecgate as<br />
-> /scratch/ectrain/trx/my_file<br />
599894 bytes (599 Kbytes per second)<br />
get: removed local source file /trx/ecfs_scratch_file<br />
Slide 11<br />
emv returning 0<br />
Client storage<br />
<strong>ECFS</strong><br />
Slide 11
Practical: ecp and emv<br />
Client storage<br />
<strong>ECFS</strong><br />
•Work in your $SCRATCH<br />
cd $SCRATCH<br />
•Make a copy of the practicals directory in your $SCRATCH<br />
tar –xvf /scratch/ectrain/trx/ecfs_practicals.tar<br />
•This will create a directory in your $SCRATCH containing the<br />
data files for all the practicals<br />
• Copy the files $SCRATCH/ecfs_practicals/data/file*.out in ectmp:<br />
• Move the file ectmp:file1.out in your $SCRATCH<br />
Slide 12<br />
Slide 12
<strong>ECFS</strong> user commands: file deletion<br />
erm <br />
Target should be prefixed by an <strong>ECFS</strong><br />
domain either ec: or ectmp:<br />
No client files are affected<br />
><br />
erm ec:ecfs_scratch_file<br />
It never prompt before any removal<br />
Successfully deleted ec:/trx/ecfs_scratch_file<br />
><br />
erm ec:test*<br />
Successfully deleted ectmp:/trx/test1<br />
Successfully deletec ectmp:/trx/test2<br />
erm: 2 files deleted<br />
Files are removed from <strong>ECFS</strong> with<br />
a soft-delete: files will still be<br />
kept for 28 days during which it<br />
will be possible, on request, to undelete<br />
any file that was deleted<br />
by mistake. After that period any<br />
removal will become permanent.<br />
Slide 13<br />
Contact us if you have large entire trees to remove<br />
Slide 13
<strong>ECFS</strong> user commands: creation and removal of directories<br />
•Make directory:<br />
Creates all the<br />
non-existing<br />
parent directories<br />
first<br />
emkdir [-p] [-m octal_mode] <br />
•Remove a specified empty directory:<br />
ermdir <br />
> emkdir –p ectmp:DIR1/DIR2/DIR3<br />
mk_ecdirs: Created <strong>ECFS</strong> directory ectmp://TMP/trx/DIR1<br />
mk_ecdirs: Created <strong>ECFS</strong> directory ectmp://TMP/trx/DIR1/DIR2<br />
mk_ecdirs: Created <strong>ECFS</strong> directory ectmp://TMP/trx/DIR1/DIR2/DIR3<br />
> ermdir ectmp:DIR1/DIR2/DIR3<br />
Successfully deleted ectmp:/trx/DIR1/DIR2/DIR3<br />
Specifies the mode to be used for new<br />
directories. If not present, it uses <strong>ECFS</strong><br />
umask (775 by default)<br />
Delete Slideempty 14 directories only<br />
Slide 14
Examples<br />
<strong>ECFS</strong> User Commands: changing permissions<br />
•Change permissions: echmod octal_mode <br />
> echmod 640 ec:myecdir<br />
echmod: ec:myecdir now has mode 640<br />
•Change the current <strong>ECFS</strong> eumask: eumask []<br />
><br />
eumask 022<br />
Only numerical values can be used as umasks.<br />
The default umask for <strong>ECFS</strong> is set to 002.<br />
One can directly define the ECUMASK environment<br />
variable directly (without calling eumask).<br />
•Change group of file(s): echgrp group <br />
> echgrp mysecgrp ec:/uid/*<br />
echgrp: ec:/uid/mydir1 now has group_id = mysecgrp<br />
Slide 15<br />
echgrp: ec:/uid/mydir2 now has group_id = mysecgrp<br />
echgrp: ec:/uid/myfile1 now has group_id = mysecgrp<br />
Slide 15
<strong>ECFS</strong> user commands: save or retrieve a complete UNIX<br />
directory as one <strong>ECFS</strong> file<br />
Date and time of last access<br />
ecfsdir [-o] [-b] [-m|-a] <br />
The date/time of last modification will<br />
be used as time stamp. This is the<br />
default for ecfsdir. Only meaningful at<br />
retrieval.<br />
> ecfsdir $SCRATCH/Results ectmp:Model/results_backup<br />
/scratch/ectrain/Results<br />
mk_ecdirs: Created <strong>ECFS</strong> directory ectmp:/TMP/trx/Model<br />
file: results_backup.EcFs_XfEr successfully moved<br />
Transferred /scratch/ectrain/trx/scratchdir/ecgate.139928/file287836/Results from ecgate to <strong>ECFS</strong> as<br />
ectmp:/trx/Model/results_backup<br />
341468160 bytes (18970 Kbytes per second)<br />
ecp returning 0<br />
Results directory saved<br />
NB: ecfsdir uses cpio to “compact”the files.<br />
Slide 16<br />
Slide 16<br />
Source orTarget should be<br />
prefixed by an <strong>ECFS</strong> domain<br />
either ec: or ectmp:<br />
Results is a directory and all the files in<br />
Results will be packed into a single file<br />
called results_backup
<strong>ECFS</strong> user commands: save or retrieve a complete<br />
UNIX directory as one <strong>ECFS</strong> file<br />
> cat $HOME/<strong>ECFS</strong>/Results_1217.03Feb2006<br />
Contents of the directory saved:<br />
========================<br />
./DIR1/DIR2/file1<br />
./DIR1/DIR2/DIR3/file2<br />
.<br />
./DIRn/…/DIRm/filep<br />
.<br />
Name of the directory saved:<br />
/scratch/ectrain/trx/Results<br />
Ecfs backup in :<br />
/trx/mp:Model/results_backup<br />
Date : Fri Feb 3 12:19:04 GMT 2006<br />
From : ecgate<br />
Slide 17<br />
This file is stored to in<br />
$HOME/<strong>ECFS</strong> to give you<br />
the list of files/directories<br />
saved. However, you can<br />
delete this file or move it<br />
(it is not needed when<br />
retrieved from <strong>ECFS</strong>).<br />
Slide 17
Practical: ecfsdir<br />
•Use ecfsdir to copy the content of the directory<br />
$SCRATCH/ecfs_practicals/data in ec:mydata<br />
><br />
ecfsdir $SCRATCH/ecfs_practicals/data ec:mydata<br />
Client storage<br />
<strong>ECFS</strong><br />
Faster than the<br />
equivalent ecp<br />
• Check the content your $HOME/<strong>ECFS</strong> (search for a file named<br />
data_*25Feb2008).<br />
><br />
cat $HOME/<strong>ECFS</strong>/data_*25Feb2008<br />
• Then retrieve ec:mydata in your $SCRATCH/ecfs_practicals/mydata<br />
><br />
><br />
ecfsdir ec:mydata $SCRATCH/ecfs_practicals/mydata<br />
cd $SCRATCH/ecfs_practicals/mydata<br />
Slide 18<br />
Slide 18
<strong>ECFS</strong> user commands: renaming or moving files within<br />
the same <strong>ECFS</strong> domain<br />
emove [-o|t|n|e]<br />
<br />
Source and target should be<br />
prefixed by the same <strong>ECFS</strong><br />
domain (ec: or ectmp:)<br />
><br />
emove ectmp:ecfs_file ectmp:DIR1/ecfs_fileFeb06<br />
move: entered with cmd = emove –n ectmp:ecfs_file ectmp:DIR1/ecfs_fileFeb06<br />
file: ecfs_file successfully moved<br />
move: : /TMP/trx/ecfs_file renamed to /TMP/trx/DIR1/ecfs_fileFeb06<br />
•DIR1 must exist!<br />
Slide 19<br />
•Not possible to move data between ec: and ectmp: domains<br />
Slide 19
<strong>ECFS</strong> user commands: monitoring<br />
•The ecfs_status command to be run on ecgate to get the most<br />
recent usage by project account<br />
Add this option to get a<br />
detailed report per user<br />
If omitted, print <strong>ECFS</strong><br />
status for you default<br />
project account<br />
ecfs_status [-h] [-u] [-a] [-d YYYYMMDD] [Project_name]<br />
To display a help screen<br />
to report summary for the<br />
project account (default)<br />
To print the <strong>ECFS</strong> status<br />
around a given date (default<br />
is the most recent)<br />
•To get an overview on their <strong>ECFS</strong> usage, you can also refer to<br />
the audit files ec:ecfs_audit and/or ectmp:ecfs_audit.tmp which<br />
are created once per month and contain a complete list of a<br />
Slide 20<br />
user's files in each <strong>ECFS</strong> domain.<br />
Slide 20
Examples: <strong>ECFS</strong> monitoring<br />
•Running ecfs_status on ecgate:<br />
ecfs_status<br />
Client storage<br />
<strong>ECFS</strong><br />
<strong>ECFS</strong> status on 20070110 for my_acc<br />
Account my_acc Total: 963713 MB –135942 files Transfer previous month: 499690 MB –3613 files<br />
Total: 963713 MB –135942 files Transfer previous month: 499590 MB - 3613 files.<br />
•To read ecfs_audit or ecfs_audit.tmp, you need first to copy them locally (these two<br />
files don’t exist for new accounts; they will be created after the first month):<br />
><br />
ecp ec:ecfs_audit $SCRATCH/ecfs_audit<br />
cat $SCRATCH/ecfs_audit<br />
><br />
-- uid gid size(bytes) creation last_access path today=2006-01-09<br />
* trx ectrain 1945665 2005-12-16 2005-12-16 /trx/test1<br />
* trx ectrain 1305088 2005-12-16 2005-12-16 /trx/test2<br />
…<br />
Slide 21<br />
Total files =20 megabytes = 116.864808082581<br />
total directories = 2 total files not accessed since 20040708 = 0<br />
Slide 21
Backup support<br />
• No automatic backup copy is made of <strong>ECFS</strong> data. Specify the “-b”<br />
option on the <strong>ECFS</strong> commands (ecp, emv, ecfsdir) to request a backup<br />
copy to be made:<br />
ecp –b myfile ec:essential_data<br />
emv –b myfile ec:essential_data<br />
ecfsdir –b $SCRATCH/results ec:essential_directory<br />
• The existence of backup copies will be indicated by the first character<br />
of the line listing:<br />
b for files with a secondary copy<br />
br--r----- 1 uid group 100 Nov 19 2003 essential_data<br />
-rw-rw---- 1 uid group 512 Nov 19 2003 unimportant_data<br />
- for files with no secondary copy<br />
Slide 22<br />
• NOTE: Irrespective of the existence of backup copies: any <strong>ECFS</strong> files<br />
removed (deleted) by a user can only be recovered for a limited period<br />
of time (28 days)<br />
Slide 22
Recommendations<br />
•Do not copy in/out the same files frequently. Use temporary local disk<br />
space such as $SCRATCH (on ecgate) or $TEMP (on hpce) to keep a<br />
local copy of these files (by default ecp will not overwrite a file if it exists;<br />
do not use the –o option in that case)<br />
•Create fewer larger files rather than many small files otherwise it can<br />
adversely affect performance of the whole system for everybody<br />
(overheads for EACH transfer).<br />
•Keep files locally THEN “compact”them using ecfsdir or cpio or tar and<br />
ONLY then store them into <strong>ECFS</strong>.<br />
•Use ectmp: if files do NOT need to be kept for long periods.<br />
•Delete files which you do not need in ec:. As there is no recursive<br />
delete, you may contact us if you need to delete entire trees.<br />
Slide 23<br />
•Never use <strong>ECFS</strong> commands in parallel jobs on hpce (in serial jobs only)<br />
Slide 23
<strong>ECFS</strong> within scripts (1/3)<br />
• <strong>ECFS</strong> commands within scripts (tcsh, csh, sh, ksh, ksh93, bash):<br />
- only C-shell and korn-shell supported<br />
- you should NOT use csh –f which will not define aliases.<br />
- When <strong>ECFS</strong> commands are not found, first check your environment:<br />
• if you are a ksh user:<br />
> echo $FPATH<br />
/usr/local/share/ecmwf/share/ecksh_functions<br />
• if you are a csh user:<br />
> which els<br />
els: aliased to (set noglob; $<strong>ECFS</strong>_SYS_PATH/els.p !{*};)<br />
• <strong>ECFS</strong> commands within cron(tab) tasks: cron tasks do not execute<br />
.profile, .cshrc, .login etc. Hence neither paths nor aliases would be setup.<br />
We advice you to use our time critical framework (See Dominique’s<br />
Slide 24<br />
presentation Friday morning) instead of cron tasks.<br />
Slide 24
<strong>ECFS</strong> within scripts (2/3)<br />
• Error <strong>Handling</strong>: the following techniques are suggested for trapping<br />
<strong>ECFS</strong> error codes when running batch scripts in the Korn shell<br />
environment<br />
- Set a trap for the entire script:<br />
#!/bin/ksh<br />
trap " echo <strong>ECFS</strong> call exited with RC= \$? " ERR<br />
ecp nofile ec:<br />
- or catch any errors on each call:<br />
#!/bin/ksh<br />
set +e<br />
ecp nofile ec:<br />
RC=$?<br />
set -e<br />
if [ $RC -gt 0 ]<br />
then<br />
Slide 25<br />
echo " <strong>ECFS</strong> call exited with RC= $RC"<br />
fi<br />
To avoid the script to stop after the first error<br />
Check the code returned by the <strong>ECFS</strong> command<br />
Slide 25
<strong>ECFS</strong> within scripts (3/3)<br />
•Check existence of a local copy before getting <strong>ECFS</strong> version of file:<br />
#!/bin/ksh<br />
if [ ! -r $SCRATCH/file2.out ]; then<br />
ecp ectmp:file2.out $SCRATCH/.<br />
fi<br />
•Loop over <strong>ECFS</strong> directories to change mode<br />
#!/bin/ksh<br />
<strong>ECFS</strong>dir=ec:/$USER/TESTDIR-1<br />
<strong>ECFS</strong>prefix=`dirname $<strong>ECFS</strong>dir`; dirs=`basename $<strong>ECFS</strong>dir`<br />
while [ -n "$dirs" ]; do To skip the four first lines<br />
newdirs=""<br />
for dir in $dirs; do<br />
for list in `els ${<strong>ECFS</strong>prefix}/$dir | tail -n +4 | tr -s ' ' ‘|'`; do<br />
name=`echo $list | cut -f 9 -d ‘|'`<br />
echmod 775 ${<strong>ECFS</strong>prefix}/${dir}/$name<br />
newdirs="$newdirs ${dir}/$name"<br />
done<br />
Slide 26<br />
done<br />
dirs=$newdirs<br />
done<br />
Slide 26<br />
Substitute space by |<br />
Keep the 9 th field which<br />
is the last one
Status & Future plans<br />
•An <strong>ECFS</strong> access control system is now in place to manage and<br />
control <strong>ECFS</strong> workload<br />
•Recursive commands (erm, echmod) will be provided. We enabled<br />
"soft delete" of files in preparation for the introduction later of<br />
recursive delete of entire trees.<br />
•The <strong>ECFS</strong> client part will be rewritten<br />
•To match the Supercomputer capacity the following aspects still<br />
have to be studied<br />
- The amount of daily transfers of data in/out from <strong>ECFS</strong><br />
(not retransferring the same file in/out helps us a lot)<br />
Slide 27<br />
- The growth in the number of physical tapes/disk cache<br />
Slide 27