28.10.2014 Views

ECMWF Data Handling System ECFS

ECMWF Data Handling System ECFS

ECMWF Data Handling System ECFS

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

<strong>ECMWF</strong> <strong>Data</strong> <strong>Handling</strong> <strong>System</strong><br />

<strong>ECFS</strong><br />

Anne Fouilloux<br />

User Support<br />

advisory@ecmwf.int<br />

Slide 1


Content<br />

• Introduction<br />

• The <strong>ECFS</strong> client: a UNIX-like interface<br />

• <strong>ECFS</strong> User commands<br />

• Remarks<br />

• Trouble-shooting<br />

• Recommendations<br />

• Status & Future plans<br />

Slide 2<br />

Slide 2


Introduction<br />

•For many years (from 1983), <strong>ECMWF</strong> has operated a large-scale <strong>Data</strong><br />

<strong>Handling</strong> <strong>System</strong> (DHS), in which all <strong>ECMWF</strong> users can store and retrieve<br />

data.<br />

•The <strong>Data</strong> <strong>Handling</strong> <strong>System</strong> is composed of three main components:<br />

- IBM’s High Performance Storage <strong>System</strong> (HPSS), used as the<br />

underlying archiving system in which data is kept<br />

- MARS - Meteorological Archive and Retrieval <strong>System</strong><br />

• GRIB and BUFR data, over 4.2 Petabyte of data (including backup copies),<br />

more than 5 million files (approx. 1Gb/file) and about 3 TB stored every<br />

day.<br />

- <strong>ECFS</strong> - <strong>ECMWF</strong>’s File Storage system<br />

• Any kind of data, 1400 TB of data (including backup copy), about 40 Million<br />

files (~=23Mb/file) and between 2.5 and 3 TB stored daily<br />

Slide 3<br />

MARS and <strong>ECFS</strong> were developed by <strong>ECMWF</strong> to hide the complexities of HPSS and data<br />

management from the users.<br />

Slide 3


The <strong>ECFS</strong> client: a UNIX-like interface (1/2)<br />

•A Unix style of file interface has been adopted by <strong>ECFS</strong> :<br />

- Files are mapped to a Unix-compatible directory tree<br />

- Either absolute and relative pathnames can be used<br />

- Concept of current <strong>ECFS</strong> working directories ($ECDIR and<br />

$EC_TMPDIR), analogous to the Unix current working<br />

directory $PWD<br />

- Limited wildcard support is provided. Wildcard characters are<br />

NOT supported in <strong>ECFS</strong> DIRECTORY positions but CAN be<br />

used for the rightmost (file) position. (For instance, you cannot<br />

use els ec:t*/file*.out)<br />

- The <strong>ECFS</strong> filesize limit is 16 GB. Be aware that certain Unix<br />

Slide 4<br />

systems (not at <strong>ECMWF</strong>) or software packages cannot handle<br />

files over 2GB in size;<br />

Slide 4


The <strong>ECFS</strong> client: a UNIX-like interface (2/2)<br />

•But this is not a UNIX file system:<br />

- Files are migrated off to tape(s) behind the scenes.<br />

- There are (Network) overheads when files are transferred<br />

to/from <strong>ECFS</strong>, UNLESS file is on disk cache (small and<br />

recent data).<br />

•<strong>ECFS</strong> commands: els, erm, ermdir, emkdir, ecd, epwd,<br />

echmod, echgrp, ecp, emv, eumask and emove, ecfsdir,<br />

ecfs_status<br />

•<strong>ECFS</strong> commands are implemented as aliases/functions with the<br />

execution mapping to Perl scripts. Environment is set up for both<br />

Slide 5<br />

C-shell and korn-shell users. Other shells are not supported!<br />

Slide 5


Documentation & Availability at <strong>ECMWF</strong><br />

•<strong>ECFS</strong> commands are available on all our platforms (ecgate<br />

and HPCF systems) except ecfs_status command for<br />

monitoring your <strong>ECFS</strong> usage (available on ecgate only).<br />

•Documentation is available on our website at<br />

http://www.ecmwf.int/services/computing/docs/archives/ecfs/index.html<br />

•<strong>ECFS</strong> man page:<br />

man ecfs<br />

•In addition there are man pages for each specific command<br />

e.g.:<br />

man els<br />

Slide 6<br />

Slide 6


<strong>ECFS</strong> domains<br />

•<strong>ECFS</strong> files are stored in domains. Currently two domains exist:<br />

ec: and ectmp:<br />

- ec: is the permanent domain where files are stored indefinitely. This is the<br />

default domain<br />

- ectmp: is the temporary domain where files are stored for 90 days, after which<br />

they are automatically deleted. Note that once a file has been automatically<br />

deleted it CANNOT be recovered.<br />

NB: Co-Operating states may ONLY use ectmp: domain.<br />

•The domain names shown above (ec:, ectmp:) should used in<br />

most of the <strong>ECFS</strong> commands to indicate which domain you are<br />

working with.<br />

•Note that, as an alternative, the ectmp: domain can be<br />

referenced by ec:/TMP. Thus the following are equivalent:<br />

- ectmp:/uid/newdir<br />

- ec:/TMP/uid/newdir<br />

Slide 7<br />

Slide 7


<strong>ECFS</strong> user commands: Exploring the <strong>ECFS</strong> file system<br />

•List (ls –al) <strong>ECFS</strong> files described by target:<br />

els [-R] <br />

•Change the current <strong>ECFS</strong> working directory for the specified<br />

<strong>ECFS</strong> domain:<br />

ecd <br />

Target should be prefixed by an <strong>ECFS</strong><br />

domain either ec: or ectmp:<br />

To list subdirectories recursively.<br />

els can time out for very large <strong>ECFS</strong> directory trees. (see ecfs_audit)<br />

Set the environment variables ECDIR<br />

(ec:) or EC_TMPDIR (ectmp:)<br />

NB: Defaults to login name of user if target omitted.<br />

•Print name of the <strong>ECFS</strong> current working directory for the<br />

specified <strong>ECFS</strong> domain:<br />

Slide 8<br />

epwd ec:<br />

Or epwd ectmp:<br />

Print the content of the environment<br />

variables ECDIR (ec:) or EC_TMPDIR<br />

(ectmp:)<br />

Slide 8


Practical: Exploring the <strong>ECFS</strong> file system<br />

•For this first practical, try the following commands on ecgate:<br />

epwd ec:<br />

epwd ectmp:<br />

•Use els to list all the files contained in both domains:<br />

els ec:<br />

els ectmp:<br />

• Change the working directories and use els to list their contents:<br />

ecd ec:/trx<br />

epwd ec:<br />

els ec:<br />

ecd ec:<br />

Slide 9<br />

Slide 9


<strong>ECFS</strong> user commands: Transferring files between <strong>ECFS</strong><br />

and client storage<br />

Overwrite existing file<br />

unconditionally<br />

Create a backup copy in Disaster Recovery <strong>System</strong> (DRS)<br />

Use sparingly, for files impossible/expensive to recreate.<br />

Do not overwrite<br />

if file exists.<br />

Not an error<br />

(DEFAULT)<br />

Do not overwrite.<br />

Treat as an error<br />

if attempted<br />

(Return 1)<br />

file's timestamp, group and permission will be kept<br />

(optional) destroy disk cache copy<br />

after retrieval<br />

ecp [-e|n|o|t] [-b] [-p] [-m] <br />

emv [-e|n|o|t] [-b] [-p] <br />

Either target or source should be prefixed by an<br />

<strong>ECFS</strong> domain (ec: or ectmp:)<br />

Overwrite only if target older Slide than10<br />

requested source<br />

(Time standards differ on local workstations and servers).<br />

NB: emv is similar to ecp but files are removed after being transfered<br />

Slide 10


Example: Transferring files between <strong>ECFS</strong> and client storage<br />

> ecp $SCRATCH/my_file ectmp:Backup/Feb/ecfs_scratch_file<br />

mk_ecdirs: Created <strong>ECFS</strong> directory ectmp:/TMP/trx/Backup<br />

mk_ecdirs: Created <strong>ECFS</strong> directory ectmp:/TMP/trx/Backup/Feb<br />

File: ecfs_scratch_file.EcFs_XfEr successfully moved<br />

Transferred /scratch/ectrain/trx/my_file from ecgate to <strong>ECFS</strong> as<br />

-> ectmp:/trx/Backup/Feb/ecfs_scratch_file<br />

599894 bytes (599 Kbytes per second)<br />

ecp returning 0<br />

> emv ectmp:ecfs_scratch_file $SCRATCH/my_file<br />

Transferred ectmp:/trx/ecfs_scratch_file from <strong>ECFS</strong> to ecgate as<br />

-> /scratch/ectrain/trx/my_file<br />

599894 bytes (599 Kbytes per second)<br />

get: removed local source file /trx/ecfs_scratch_file<br />

Slide 11<br />

emv returning 0<br />

Client storage<br />

<strong>ECFS</strong><br />

Slide 11


Practical: ecp and emv<br />

Client storage<br />

<strong>ECFS</strong><br />

•Work in your $SCRATCH<br />

cd $SCRATCH<br />

•Make a copy of the practicals directory in your $SCRATCH<br />

tar –xvf /scratch/ectrain/trx/ecfs_practicals.tar<br />

•This will create a directory in your $SCRATCH containing the<br />

data files for all the practicals<br />

• Copy the files $SCRATCH/ecfs_practicals/data/file*.out in ectmp:<br />

• Move the file ectmp:file1.out in your $SCRATCH<br />

Slide 12<br />

Slide 12


<strong>ECFS</strong> user commands: file deletion<br />

erm <br />

Target should be prefixed by an <strong>ECFS</strong><br />

domain either ec: or ectmp:<br />

No client files are affected<br />

><br />

erm ec:ecfs_scratch_file<br />

It never prompt before any removal<br />

Successfully deleted ec:/trx/ecfs_scratch_file<br />

><br />

erm ec:test*<br />

Successfully deleted ectmp:/trx/test1<br />

Successfully deletec ectmp:/trx/test2<br />

erm: 2 files deleted<br />

Files are removed from <strong>ECFS</strong> with<br />

a soft-delete: files will still be<br />

kept for 28 days during which it<br />

will be possible, on request, to undelete<br />

any file that was deleted<br />

by mistake. After that period any<br />

removal will become permanent.<br />

Slide 13<br />

Contact us if you have large entire trees to remove<br />

Slide 13


<strong>ECFS</strong> user commands: creation and removal of directories<br />

•Make directory:<br />

Creates all the<br />

non-existing<br />

parent directories<br />

first<br />

emkdir [-p] [-m octal_mode] <br />

•Remove a specified empty directory:<br />

ermdir <br />

> emkdir –p ectmp:DIR1/DIR2/DIR3<br />

mk_ecdirs: Created <strong>ECFS</strong> directory ectmp://TMP/trx/DIR1<br />

mk_ecdirs: Created <strong>ECFS</strong> directory ectmp://TMP/trx/DIR1/DIR2<br />

mk_ecdirs: Created <strong>ECFS</strong> directory ectmp://TMP/trx/DIR1/DIR2/DIR3<br />

> ermdir ectmp:DIR1/DIR2/DIR3<br />

Successfully deleted ectmp:/trx/DIR1/DIR2/DIR3<br />

Specifies the mode to be used for new<br />

directories. If not present, it uses <strong>ECFS</strong><br />

umask (775 by default)<br />

Delete Slideempty 14 directories only<br />

Slide 14


Examples<br />

<strong>ECFS</strong> User Commands: changing permissions<br />

•Change permissions: echmod octal_mode <br />

> echmod 640 ec:myecdir<br />

echmod: ec:myecdir now has mode 640<br />

•Change the current <strong>ECFS</strong> eumask: eumask []<br />

><br />

eumask 022<br />

Only numerical values can be used as umasks.<br />

The default umask for <strong>ECFS</strong> is set to 002.<br />

One can directly define the ECUMASK environment<br />

variable directly (without calling eumask).<br />

•Change group of file(s): echgrp group <br />

> echgrp mysecgrp ec:/uid/*<br />

echgrp: ec:/uid/mydir1 now has group_id = mysecgrp<br />

Slide 15<br />

echgrp: ec:/uid/mydir2 now has group_id = mysecgrp<br />

echgrp: ec:/uid/myfile1 now has group_id = mysecgrp<br />

Slide 15


<strong>ECFS</strong> user commands: save or retrieve a complete UNIX<br />

directory as one <strong>ECFS</strong> file<br />

Date and time of last access<br />

ecfsdir [-o] [-b] [-m|-a] <br />

The date/time of last modification will<br />

be used as time stamp. This is the<br />

default for ecfsdir. Only meaningful at<br />

retrieval.<br />

> ecfsdir $SCRATCH/Results ectmp:Model/results_backup<br />

/scratch/ectrain/Results<br />

mk_ecdirs: Created <strong>ECFS</strong> directory ectmp:/TMP/trx/Model<br />

file: results_backup.EcFs_XfEr successfully moved<br />

Transferred /scratch/ectrain/trx/scratchdir/ecgate.139928/file287836/Results from ecgate to <strong>ECFS</strong> as<br />

ectmp:/trx/Model/results_backup<br />

341468160 bytes (18970 Kbytes per second)<br />

ecp returning 0<br />

Results directory saved<br />

NB: ecfsdir uses cpio to “compact”the files.<br />

Slide 16<br />

Slide 16<br />

Source orTarget should be<br />

prefixed by an <strong>ECFS</strong> domain<br />

either ec: or ectmp:<br />

Results is a directory and all the files in<br />

Results will be packed into a single file<br />

called results_backup


<strong>ECFS</strong> user commands: save or retrieve a complete<br />

UNIX directory as one <strong>ECFS</strong> file<br />

> cat $HOME/<strong>ECFS</strong>/Results_1217.03Feb2006<br />

Contents of the directory saved:<br />

========================<br />

./DIR1/DIR2/file1<br />

./DIR1/DIR2/DIR3/file2<br />

.<br />

./DIRn/…/DIRm/filep<br />

.<br />

Name of the directory saved:<br />

/scratch/ectrain/trx/Results<br />

Ecfs backup in :<br />

/trx/mp:Model/results_backup<br />

Date : Fri Feb 3 12:19:04 GMT 2006<br />

From : ecgate<br />

Slide 17<br />

This file is stored to in<br />

$HOME/<strong>ECFS</strong> to give you<br />

the list of files/directories<br />

saved. However, you can<br />

delete this file or move it<br />

(it is not needed when<br />

retrieved from <strong>ECFS</strong>).<br />

Slide 17


Practical: ecfsdir<br />

•Use ecfsdir to copy the content of the directory<br />

$SCRATCH/ecfs_practicals/data in ec:mydata<br />

><br />

ecfsdir $SCRATCH/ecfs_practicals/data ec:mydata<br />

Client storage<br />

<strong>ECFS</strong><br />

Faster than the<br />

equivalent ecp<br />

• Check the content your $HOME/<strong>ECFS</strong> (search for a file named<br />

data_*25Feb2008).<br />

><br />

cat $HOME/<strong>ECFS</strong>/data_*25Feb2008<br />

• Then retrieve ec:mydata in your $SCRATCH/ecfs_practicals/mydata<br />

><br />

><br />

ecfsdir ec:mydata $SCRATCH/ecfs_practicals/mydata<br />

cd $SCRATCH/ecfs_practicals/mydata<br />

Slide 18<br />

Slide 18


<strong>ECFS</strong> user commands: renaming or moving files within<br />

the same <strong>ECFS</strong> domain<br />

emove [-o|t|n|e]<br />

<br />

Source and target should be<br />

prefixed by the same <strong>ECFS</strong><br />

domain (ec: or ectmp:)<br />

><br />

emove ectmp:ecfs_file ectmp:DIR1/ecfs_fileFeb06<br />

move: entered with cmd = emove –n ectmp:ecfs_file ectmp:DIR1/ecfs_fileFeb06<br />

file: ecfs_file successfully moved<br />

move: : /TMP/trx/ecfs_file renamed to /TMP/trx/DIR1/ecfs_fileFeb06<br />

•DIR1 must exist!<br />

Slide 19<br />

•Not possible to move data between ec: and ectmp: domains<br />

Slide 19


<strong>ECFS</strong> user commands: monitoring<br />

•The ecfs_status command to be run on ecgate to get the most<br />

recent usage by project account<br />

Add this option to get a<br />

detailed report per user<br />

If omitted, print <strong>ECFS</strong><br />

status for you default<br />

project account<br />

ecfs_status [-h] [-u] [-a] [-d YYYYMMDD] [Project_name]<br />

To display a help screen<br />

to report summary for the<br />

project account (default)<br />

To print the <strong>ECFS</strong> status<br />

around a given date (default<br />

is the most recent)<br />

•To get an overview on their <strong>ECFS</strong> usage, you can also refer to<br />

the audit files ec:ecfs_audit and/or ectmp:ecfs_audit.tmp which<br />

are created once per month and contain a complete list of a<br />

Slide 20<br />

user's files in each <strong>ECFS</strong> domain.<br />

Slide 20


Examples: <strong>ECFS</strong> monitoring<br />

•Running ecfs_status on ecgate:<br />

ecfs_status<br />

Client storage<br />

<strong>ECFS</strong><br />

<strong>ECFS</strong> status on 20070110 for my_acc<br />

Account my_acc Total: 963713 MB –135942 files Transfer previous month: 499690 MB –3613 files<br />

Total: 963713 MB –135942 files Transfer previous month: 499590 MB - 3613 files.<br />

•To read ecfs_audit or ecfs_audit.tmp, you need first to copy them locally (these two<br />

files don’t exist for new accounts; they will be created after the first month):<br />

><br />

ecp ec:ecfs_audit $SCRATCH/ecfs_audit<br />

cat $SCRATCH/ecfs_audit<br />

><br />

-- uid gid size(bytes) creation last_access path today=2006-01-09<br />

* trx ectrain 1945665 2005-12-16 2005-12-16 /trx/test1<br />

* trx ectrain 1305088 2005-12-16 2005-12-16 /trx/test2<br />

…<br />

Slide 21<br />

Total files =20 megabytes = 116.864808082581<br />

total directories = 2 total files not accessed since 20040708 = 0<br />

Slide 21


Backup support<br />

• No automatic backup copy is made of <strong>ECFS</strong> data. Specify the “-b”<br />

option on the <strong>ECFS</strong> commands (ecp, emv, ecfsdir) to request a backup<br />

copy to be made:<br />

ecp –b myfile ec:essential_data<br />

emv –b myfile ec:essential_data<br />

ecfsdir –b $SCRATCH/results ec:essential_directory<br />

• The existence of backup copies will be indicated by the first character<br />

of the line listing:<br />

b for files with a secondary copy<br />

br--r----- 1 uid group 100 Nov 19 2003 essential_data<br />

-rw-rw---- 1 uid group 512 Nov 19 2003 unimportant_data<br />

- for files with no secondary copy<br />

Slide 22<br />

• NOTE: Irrespective of the existence of backup copies: any <strong>ECFS</strong> files<br />

removed (deleted) by a user can only be recovered for a limited period<br />

of time (28 days)<br />

Slide 22


Recommendations<br />

•Do not copy in/out the same files frequently. Use temporary local disk<br />

space such as $SCRATCH (on ecgate) or $TEMP (on hpce) to keep a<br />

local copy of these files (by default ecp will not overwrite a file if it exists;<br />

do not use the –o option in that case)<br />

•Create fewer larger files rather than many small files otherwise it can<br />

adversely affect performance of the whole system for everybody<br />

(overheads for EACH transfer).<br />

•Keep files locally THEN “compact”them using ecfsdir or cpio or tar and<br />

ONLY then store them into <strong>ECFS</strong>.<br />

•Use ectmp: if files do NOT need to be kept for long periods.<br />

•Delete files which you do not need in ec:. As there is no recursive<br />

delete, you may contact us if you need to delete entire trees.<br />

Slide 23<br />

•Never use <strong>ECFS</strong> commands in parallel jobs on hpce (in serial jobs only)<br />

Slide 23


<strong>ECFS</strong> within scripts (1/3)<br />

• <strong>ECFS</strong> commands within scripts (tcsh, csh, sh, ksh, ksh93, bash):<br />

- only C-shell and korn-shell supported<br />

- you should NOT use csh –f which will not define aliases.<br />

- When <strong>ECFS</strong> commands are not found, first check your environment:<br />

• if you are a ksh user:<br />

> echo $FPATH<br />

/usr/local/share/ecmwf/share/ecksh_functions<br />

• if you are a csh user:<br />

> which els<br />

els: aliased to (set noglob; $<strong>ECFS</strong>_SYS_PATH/els.p !{*};)<br />

• <strong>ECFS</strong> commands within cron(tab) tasks: cron tasks do not execute<br />

.profile, .cshrc, .login etc. Hence neither paths nor aliases would be setup.<br />

We advice you to use our time critical framework (See Dominique’s<br />

Slide 24<br />

presentation Friday morning) instead of cron tasks.<br />

Slide 24


<strong>ECFS</strong> within scripts (2/3)<br />

• Error <strong>Handling</strong>: the following techniques are suggested for trapping<br />

<strong>ECFS</strong> error codes when running batch scripts in the Korn shell<br />

environment<br />

- Set a trap for the entire script:<br />

#!/bin/ksh<br />

trap " echo <strong>ECFS</strong> call exited with RC= \$? " ERR<br />

ecp nofile ec:<br />

- or catch any errors on each call:<br />

#!/bin/ksh<br />

set +e<br />

ecp nofile ec:<br />

RC=$?<br />

set -e<br />

if [ $RC -gt 0 ]<br />

then<br />

Slide 25<br />

echo " <strong>ECFS</strong> call exited with RC= $RC"<br />

fi<br />

To avoid the script to stop after the first error<br />

Check the code returned by the <strong>ECFS</strong> command<br />

Slide 25


<strong>ECFS</strong> within scripts (3/3)<br />

•Check existence of a local copy before getting <strong>ECFS</strong> version of file:<br />

#!/bin/ksh<br />

if [ ! -r $SCRATCH/file2.out ]; then<br />

ecp ectmp:file2.out $SCRATCH/.<br />

fi<br />

•Loop over <strong>ECFS</strong> directories to change mode<br />

#!/bin/ksh<br />

<strong>ECFS</strong>dir=ec:/$USER/TESTDIR-1<br />

<strong>ECFS</strong>prefix=`dirname $<strong>ECFS</strong>dir`; dirs=`basename $<strong>ECFS</strong>dir`<br />

while [ -n "$dirs" ]; do To skip the four first lines<br />

newdirs=""<br />

for dir in $dirs; do<br />

for list in `els ${<strong>ECFS</strong>prefix}/$dir | tail -n +4 | tr -s ' ' ‘|'`; do<br />

name=`echo $list | cut -f 9 -d ‘|'`<br />

echmod 775 ${<strong>ECFS</strong>prefix}/${dir}/$name<br />

newdirs="$newdirs ${dir}/$name"<br />

done<br />

Slide 26<br />

done<br />

dirs=$newdirs<br />

done<br />

Slide 26<br />

Substitute space by |<br />

Keep the 9 th field which<br />

is the last one


Status & Future plans<br />

•An <strong>ECFS</strong> access control system is now in place to manage and<br />

control <strong>ECFS</strong> workload<br />

•Recursive commands (erm, echmod) will be provided. We enabled<br />

"soft delete" of files in preparation for the introduction later of<br />

recursive delete of entire trees.<br />

•The <strong>ECFS</strong> client part will be rewritten<br />

•To match the Supercomputer capacity the following aspects still<br />

have to be studied<br />

- The amount of daily transfers of data in/out from <strong>ECFS</strong><br />

(not retransferring the same file in/out helps us a lot)<br />

Slide 27<br />

- The growth in the number of physical tapes/disk cache<br />

Slide 27

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!