13.08.2015 Views

PROGRAMMING GUIDE UNIX* SYSTEM

1DOKH4d

1DOKH4d

SHOW MORE
SHOW LESS
  • No tags were found...

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

341-930Issue 1, June 1982Add Issue 1, January 1983<strong>PROGRAMMING</strong> <strong>GUIDE</strong><strong>UNIX*</strong> <strong>SYSTEM</strong>e 1982 Western Electric• Trademark of Bell Laboratories.


This document was prepared with specific references to use of the UNIX system on aparticular processor, the Western Electric 3B20S, which is not presently availableexcept for internal use within the Bell System. However, the information containedherein is generally applicable to use of the UNIX system on various processors whichare available in the general trade.DEC, MASSBUS, PDP, UNIBUS, and V AX are trademarks of Digital EquipmentCorporation.KODAK and EKTAMATIC are registered trademarks of Eastman Kodak Company.Mohrflow, Mohrdry, and Mohrchem are registered trademarks of Mohr Lino~SawCompany.TEKTRONIX is a registered trademark of Tektronix, Inc.TELETYPE is a trademark of Teletype Corporation.TRENDATA 4000A ~ is a registered trademark of Trendata Corporation.Versatec is a registered trademark of Versatec Corporation.DIABLO is a registered trademark of Xerox Corporation.


1/83ADD ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong><strong>PROGRAMMING</strong> <strong>GUIDE</strong>UNIX <strong>SYSTEM</strong>CONTENTSPAGE1. INTRODUCTION2. AN INTRODUCTION TO SHELLINTRODUCTIONSIMPLE COMMANDSA. Background CommandsB. Input/Output RedirectionC. ~ipelines and Filters . .D. File Name GenerationE. Quoting . . . . . .F. Prompting by the ShellG. The Shell and LoginH. SummarySHELL PROCEDURESA. Control Flow-"forllB. Control Flow-"case"C. Here DocumentsD. Shell Variables911111111121213141414lSlS16171819E.F.G.The "test" CommandControl Flow-"while llControl Flow-"if"212122Page)


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 16/82CONTENTSH. Debugging Shell ProceduresI. The "man" CommandKEYWORD PARAMETERS. .A. Parameter TransmissionB. Parameter SubstitutionC. Command SubstitutionD. Evaluation and QuotingE. Error HandlingF. Fault HandlingG. Command ExecutionH. Invoking the ShellTHE SHELL TUTORIALINTRODUCTIONPAGE242S2626272728303133343737OVERVIEW OF THE UNIX <strong>SYSTEM</strong> ENVIRONMENTA. File System . . . .B. UNIX System ProcessesSHELL BASICSA. CommandsB. How the Shell Finds Commands .C. Generation of Argument ListsD. Shell Variables3737383939404041E.F.G.Quoting MechanismsRedirection of Input and OutputCommand Lines and Pipelines454547H. ExamplesI. Changing of the Shell and .profile State4748\Page 2


6/82 ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>CONTENTSUSING THE SHELL AS A COMMAND: SHELL PROCEDURESA. A Command's EnvironmentB. Invoking the ShellC. Passing Arguments to the Shell-"shift"D. Control CommandsE. Special Shell CommandsF. Creation and Organization of Shell ProceduresG. More about Execution FlagsMISCELLANEOUS SUPPORTING COMMANDS AND FEATURESA. Conditional Evaluation - "test"B. Reading a Line-"line"C. Simple Output-"echo"PAGE494949505159606161616262D.E.Expression Evaluation - "expr""true" and "false"6363F. Input/Output Redirection Using File DescriptorsG. Conditional SubstitutionH. Invocation FlagsEXAMPLES OF SHELL PROCEDURESEFFECTIVE AND EFFICIENT SHELL <strong>PROGRAMMING</strong>A. Overall ApproachB. Approximate Measures of Resource ConsumptionC. Efficient OrganizationREFERENCES3. THE C <strong>PROGRAMMING</strong> LANGUAGEINTRODUCTIONCLANGUAGE .636465651212121314151516Page 3


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 1 6/82CONTENTSLEXICAL CONVENTIONSPAGE76A.B.C.D.CommentsIdentifiers (Names)KeywordsConstants76767676E. StringsF. Hardware CharacteristicsSYNTAX NOTATIONNAMESOBJECTS AND LVALUESCONVERSIONSA. Characters and IntegersB. Float and Double . .C. Floating and Integral777778787979808080D.E.Pointers and IntegersUnsigned8080F. Arithmetic ConversionsG. VoidEXPRESSIONSA. Primary ExpressionsB. Unary OperatorsC. Multiplicative OperatorsD. Additive OperatorsE. Shift OperatorsF. Relational OperatorsG. Equality Operators80818181828484858585Page 4


6/82 ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>CONTENTSH. Bitwise AND OperatorI. Bitwise Exclusive OR OperatorJ. Bitwise Inclusive OR OperatorK. Logical AND OperatorL. Logical OR OperatorM. Conditional OperatorN. Assignment OperatorsO. Comma OperatorDECLARATIONSA. Storage Class SpecifiersB. Type SpecifiersC. DeclaratorsD. Meaning of DeclaratorsE. Structure, Union, and Enumeration DeclarationsF. InitializationG. Type NamesH. TypedefSTATEMENTSA. Expression StatementB. Compound Statement or BlockC. Conditional StatementD. While StatementE. Do StatementF. For StatementG. Switch StatementH. Break StatementPAGE8586868686868787878888898991939595969696969797979798Page 5


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 1 6/82CONTENTSPAGEI.J.Continue StatementReturn Statement9898K. Goto Statement99L.M.Labeled StatementNull Statement9999EXTERNAL DEFINITIONSA. External Function DefinitionsB. External Data DefinitionsSCOPE RULESA. Lexical ScopeB. Scope of ExternalsCOMPILER CONTROL LINESA. Token ReplacementB. File InclusionC. Conditional CompilationD. Line ControlIMPLICIT DECLARATIONSTYPES REVISITEDA. Structures and UnionsB. FunctionsC. Arrays, Pointers, and SubscriptingD. Explicit Pointer ConversionsCONSTANT EXPRESSIONSPORTABILITY CONSIDERATIONSANACHRONISMSSYNT AX SUMMARY9999. 100100100101· . . . 101· . 101· . . . 102102· . 103· . . . 103· . . . 103103104104105. 106· . . . 106· . 107· . . . 108Page 6


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>CONTENTSPAGEA. ExpressionsB. DeclarationsC. StatementsD. External DefinitionsE. PreprocessorLIBRARIESA. GeneralB. The C LibraryC. The Obiect File LibraryD. The Math LibraryTHE "cc" COMMANDA. GeneralB. UsageA C PROGRAM CHECKER-"Iint"A. GeneralB. Types of M~ssagesC. PortabilityA SYMBOLIC DEBUGGING PROGRAM-"sdb"A. GeneralB. UsageC. Source File Display and ManipulationD. A Controlled Environment for Program TestingE. Machine Language DebuggingF. Other Commands4. FORTRANINTRODUCTION108· . . 109· . . 111112. 112113· . . 113. . . . 114127129131131131· . . 132132. . . • 133· . . 139140140140. . 143. . 145147. 147149149Page 7


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82CONTENTSFORTRAN 77A. GeneralB. Language ExtensionsC. Violations of the StandardD. Interprocedure InterfaceE. File FormatsA RATIONAL FORTRAN PREPROCESSOR-"ratfor"PAGE· . . . 149· . . . 149· . . . 150· . . . 153. 154. . . . . . 156. . 157A.B.C.GeneralUsageStatement Grouping. . 157· . . . 157· . . . 158D. The "if-else" ConstructionE. The "switch" Statement· . . . 159· . . . 160F.G.The "do" StatementThe "break" and "next" Statements160161H. The "while" StatementI. The "for" StatementJ. The "repeat-until" StatementK. The "return" StatementL. The "define" Statement· . . . 161· . . . 162163163· . . . 163M.N.O.The "include" StatementF .. ee-Form InputTranslations164. 164164P. Warnings165Page 8


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>1. INTRODUCTIONThis volume describes the three main programming languages supported on the UNIX operating system.These languages are:• Shell-The shell language is both a command language (which handles commands entered from aterminal) and a programming language (where commands are specified in a file and the file is executed).• C Language-A medium-level programming language which was used to write most of the UNIX operatingsystem. This volume describes the grammar and usage of C language, the libraries that provide additionalroutines, the cc(1) command, and two programs that are useful for checking/debugging Cprograms.• Fortran-Fortran 77 and a rational Fortran preprocessor (ratfor) are available. This volume describeshow Fortran 77 is implemented in terms of the variations from the American National Standard andthe interfaces to the UNIX operating system. The ratfor preprocessor provides a means by whichFortran 77 can be written in a fashion similar to C language. This preprocessor provides (among otherthings) simplified control-flow statements.Throughout this volume, each reference of the form name(1M), name(7), or name(8) refers to entries inthe UNIX System Administrator's Manual. All other references to entries of the form name(N), where "N"is a number (1 through 6) possibly followed by a letter, refer to entry name in section N of the UNIX SystemUser's Manual.Page 9


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82NOTESPage 10


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>2. AN INTRODUCTION TO SHELLINTRODUCTIONThe shell is a command programming language that provides an interface to the UNIX operating system.Its features include control-flow primitives, parameter passing, variables, and string substitution. Constructssuch as while, if then else, case, and for are available. Two-way communication is possible between the shelland commands. String-valued parameters, typically file names or flags, may be passed to a command. A returncode is set by commands that may be used to determine control-flow, and the standard output from a commandmay be used as shell input.The shell can modify the environment in which commands run. Input and output can be redirected to files,and processes that communicate through pipes can be invoked. Commands are found by searching directoriesin the file system in a sequence that can be defined by the user. Commands can be read either from the terminalor from a file which allows command procedures to be stored for later use.The shell is both a command language and a programming language that provides an interface to the UNIXoperating system. This volume describes, with examples, the UNIX operating system shell. The "SIMPLECOMMANDS" part of this section covers most of the everyday requirements of terminal users. Some familiaritywith the UNIX operating system is an advantage when reading this section; refer to section "BASICS FOR BE­GINNERS" in the UNIX System User's Guide. The "SHELL PROCEDURES" part of this section describesthose features of the shell primarily intended for use within shell commands or procedures. These include thecontrol-flow primitives and string-valued variables provided by the shell. A knowledge of a programming languagewould also be helpful when reading this section. The last part, "KEYWORD PARAMETERS", describesthe more advanced features of the shell. See Table 2.A for a defined listing of grammar words used in this section.Throughout this section, each reference of the form name(lM), name(7), or name(8) refers to entries inthe UNIX System Administrator's Manual. All other references to entries of the form name(N), where "N"is a number (1 through 6) possibly followed by a letter, refer to entry name in section N of the UNIX SystemUser's Manual.SIMPLE COMMANDSSimple commands consist of one or more words separated by blanks. The first word is the name of the commandto be executed; any remaining words are passed as arguments to the command. For example,whois a command that prints the names of users logged in. The commandIs -1prints a list of files in the current directory. The argument -1 tells Is(l) to print status information, size, andthe creation date for each file.A. Background CommandsTo execute a command, the shell normally creates a new process and waits for it to finish. A command maybe run without waiting for it to finish. For example,cc pgm.c &calls the C compiler to compile the file pgm.c. The trailing "&" is an operator that instructs the shell not towait for the command to finish. To help keep track of such a process, the shell reports its process number followingits creation. A list of currently active processes may be obtained using the ps(l) command.Page 11


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82B. Input/Output RedirectionMost commands produce output to the standard output that is initially connected to the terminal. This outputmay be directed to a file by using the notation ">", for example:Is -1 >fileThe notation >file is interpreted by the shell and is not passed as an argument to Is(1). If file does not exist,the shell creates it; otherwise, the original contents of file are replaced with the output from Is(1). Output maybe appended to a file using the notation "> >" as follows:Is -1 > > fileIn this case, file is also created if it does not already exist.The standard input of a command may be taken from a file instead of the terminal by using the notation"


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>A pipeline may consist of more than two commands, for example,Is I grep old I wc -Iprints only the number of file names in the current directory containing the string "old".D. File Name GenerationMany commands accept arguments which are file names. For example,Is -I main.cprints only information relating to the file main.c. The "Is -I" command alone prints the same informationabout all files in the current directory.The shell provides a mechanism for generating a list of file names that match a pattern. For example,Is -I *.cgenerates as arguments to Is(l) all file names in the current directory that end in .c. The character "*,, is apattern that will match any string including the null string. In general, patterns are specified as follows:*?[ ... ]Matches any string of characters including the null string.Matches any single character.Matches anyone of the characters enclosed. A pair of characters separated by a minus willmatch any character lexically between the pair.For example,[a-z]*matches all names in the current directory beginning with one of the letters a through z. The inputlusr/fred/test/?matches all names in the directory /usr/fred/test that consist of a single character. If no file name is found'that matches the pattern then the pattern is passed, unchanged, as an argument.This mechanism is useful both to save typing and to select names according to some pattern. It may alsobe used to find files. For example,echo lusr/fred/*/corefinds and prints the names of all core files in subdirectories of /usr/fred. [The echo(l) command is a standardUNIX operating system command that prints its arguments, separated by blanks.] This last feature can be expensiverequiring a scan of all subdirectories of /usr/fred. .There is one exception to the general rules given for patterns. The character "." at the start of a file namemust be explicitly matched. The inputecho *will therefore echo all file names in the current directory not beginning with ".". The inputecho .*Page 13


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82will echo all those file names that begin with ".". This avoids inadvertent matching of the names "." and " .. "which mean "the current directory" and "the parent directory", respectively. [Notice that Is(l) suppresses informationfor the files "." and " .. ".]E. QuotingCharacters that have a special meaning to the shell, such as< > *? &are called metacharacters. A complete list of metacharacters is given in Table 2.B. Any character preceded bya \ is quoted and loses its special meaning, if any. The \ is elided so thatecho \?will echo a single ?, andecho \ \will echo a single \. To allow long strings to be continued over more than one line, the sequence \new-line (orRETURN) is ignored. The \ is convenient for quoting single characters. When more than one character needsquoting, the above mechanism is clumsy and error prone. A string of characters may be quoted by enclosingthe string between single quotes. For example,will echoecho xx'****'xxxx****xxThe quoted string may not contain a single quote but may contain new-lines which are preserved. This quotingmechanism is the most simple and is recommended for casual use. A third quoting mechanism using doublequotes is also available and prevents interpretation of some but not all metacharacters. Details of quoting aredescribed under "D. Evaluation and Quoting" in part "KEYWORD PARAMETERS".F. Prompting by the ShellWhen the shell is used from a terminal, it will issue a prompt to the terminal user indicating it is readyto read a command from the terminal. By default, this prompt is "$". The prompt may be changed by entering,PSI =newpromptwhich sets the prompt to be the string" newprompt" . If a new-line is typed and further input is needed, theshell will issue the prompt "> ". Sometimes this can be caused by mistyping a quote mark. If it is unexpected,then an interrupt (DEL) will return the shell to read another command. The other prompt (> ) may be changed(for example) by entering:PS2=moreG. The Shell and LoginFollowing the user's login(I), the shell is called to read and execute commands typed at the terminal. Ifthe user's login directory contains the file .profile, then it is assumed to contain commands and is read immediatelyby the shell before reading any commands from the terminal.Page 14


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>H. SummarySHELL PROCEDURESIsPrints the names of files in the current directory.Is >filePuts the output from Is into file.Is I we -IPrints the number of files in the current directory.Is I grep oldPrints those file names containing the string "old".Is I grep old I we -IPrints the number of files whose name contains the string "old".ee pgm.e &Runs ee in the background.The shell may be used to read and execute commands contained in a file. For example, the following callsh file [ args ... ]calls the shell to read commands from file. Such a file call is called a "command procedure" or "shell procedure".Arguments may be supplied with the call and are referred to in file using the positional parameters $1,$2, .... For example, if the file wg containsthen the 'callis equivalent towho I grep $1sh wg fredwho I grep fredAll UNIX operating system files have three independent attributes (often called "permissions"), read,write, and execute (rwx). The UNIX operating system command ehmod(1) may be used to make a file executable.For example,chmod +x wgwill ensure that the file wg has execute status (permission). Following this, the commandwg fredis equivalent to the callsh wg fredThis allows shell procedures and programs to be used interchangeably. In either case, a new process is createdto execute the command.Page 15


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82As well as providing names for the positional parameters, the number of positional parameters in the callis available as $#. The name of the file being executed is available as $0.A special shell parameter $* is used to substitute for all positional parameters except $0. A typical useof this is to provide some default arguments, as in,nroff -T450 -cm $*which simply prepends some arguments to those already given.A. Control Flow-"for"A frequent use of shell procedures is to loop through the arguments ($1, $2, ... ) executing commands oncefor each argument. An example of such a procedure is tel that searches the file lusrllibltelnos that containslines of the formfred mhOI23bert mh0789The text of tel isfor idodonegrep $i lusr/lib/telnosThe commandtel fredprints those lines in lusrlJibltelnos that contain the string "fred".The commandtel fred bertprints those lines containing "fred" followed by those for "bert".The for loop notation is recognized by the shell and has the general formfor name in wl w2docommand-listdoneA command-list is a sequence of one or more simple commands separated or terminated by anew-line or asemicolon. Furthermore, reserved words like do and done are only recognized following a new-line or semicolon.A name is a shell variable that is set to the words wl w2 ... in turn each time the command-list followingdo is executed. If "in wI w2 ... " is omitted, then the loop is executed once for each positional parameter; thatis, in $* is assumed.Page 16Another example of the use of the for loop is the create command whose text isfor i do > $i; done


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>The commandcreate alpha betaensures that two empty files alpha and beta exist and are empty. The notation >file may be used on its ownto create or clear the contents of a file. Notice also that a semicolon (or new-line) is required before done.B. Control Flow-"case"A multiple way (choice) branch is provided for by the case notation. For example,case $# in1) cat> >$1 ;;2) cat > > $2 < $1 ;;*) echo 'usage: append [ from] to' ;;esacis an append command. (Note the use of semicolons to delimit the cases.) When called with one argument asInappend file$# is the string "I". The standard input is appended (copied) onto the end of file using the cat(l) command,andappend filel file2appends the contents of filel onto file2. If the number of arguments supplied to append is other than 1 or 2,then a message is printed indicating proper usage.The general form of the case command iscase word inpattern) command-list ;;esacThe shell attempts to match word with each pattern in the order in which the patterns appear. If a match isfound, the associated command-list is executed and execution of the case is complete. Since * is the patternthat matches any string, it can be used for the default case.Caution:No check is made to ensure that only one pattern matches the case argument.The first match found defines the set of commands to be executed. In the example below, the commands followingthe second "*,, will never be executed since the first "*,, executes everything it receives.case $# in*) ... ;;*) ... ;;esacPage 17


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Another example of the use of the case construction is to distinguish between different forms of an argument.The following example is a fragment of a cc(l) command.for idodonecase $i in-[ocs]-*)*.c)*)esac..... "echo 'unknown flag $i' ;;/lib/ cO $i ... ;;echo 'unexpected argument $i' ;;To allow the same commands to be associated with more than one pattern, the case command provides foralternative patterns separated by a I. For example,is equivalent tocase $i in-x I-y)esaccase $i in-[xy])esacThe usual quoting conventions apply so thatcase $i in\?)will match the character ?C. Here DocumentsThe shell procedure tel described in subpart "A. Control Flow-for" uses the file lusrlJibltelnos to supplythe data for grep(l). An alternative is to include this data within the shell procedure as a here document, asin,for idogrep $i «!fred mh0123bert mh0789!doneIn this example, the shell takes the lines between <


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Parameters are substituted in the document before it is made available to grep(l) as illustrated by the followingprocedure called edg.The called $3 «%g/$1/s/ /$2/gw%edg string1 string2 fileis then equivalent to the commanded file «%g/ string1/ s/ / string2/ gw%and changes all occurrences of "stringl" in file to "string2". Substitution can be prevented using \ to quote thespecial character $ as ined $3 «+1,\ $s/$1/$2/gw+[This version of edg is equivalent to the first except that ed(1) will print a ? if there are no occurrences of thestring $1.] Substitution within a here document may be prevented entirely by quoting the terminating string,for example,grep $i «\##The document is presented without modification to grep. If parameter substitution is not required in a heredocument, this latter form is more efficient.D. Shell VariablesThe shell provides string-valued variables. Variable names begin with a letter and consist of letters, digits,and underscores. Variables may be given values by writinguser=fred box=mOOO acct=mhOOOOwhich assigns values to the variables user, box, and acct. A variable may be set to the null string by enteringnull=The value of a variable is substituted by preceding its name with $; for example,will echo fred.echo $userVariables may be used interactively to provide abbreviations for frequently used strings. For example,b= / usr / fred/binmv file $bPage 19


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82will move the filefrom the current directory to the directory /usr/fred/bin. A more general notation is availablefor parameter (or variable) substitution, as in,echo ${user}which is equivalent toecho $userand is used when the parameter name is followed by a letter or digit. For example,tmp=/tmp/psps a >${tmp}awill direct the output of ps(l) to the file /tmp/psa, whereas,ps a >$tmpawould cause the value of the variable tmpa to be substituted.Except for $?, the following are set initially by the shell. The $? is set after executing each command.$?$#$$The exit status (return code) of the last command executed as a decimal string. Most commandsreturn a zero exit status if they complete successfully; otherwise, a nonzero exit statusis returned. Testing the value of return codes is dealt with later under if and whilecommands.The number of positional parameters (in decimal). The $# is used, for example, in the appendcommand to check the number of parameters.The process number of this shell (in decimal). Since process numbers are unique amongall existing processes, this string is frequently used to generate unique temporary filenames. For example,ps a >/tmp/ps$$rm /tmp/ps$$$!$-The process number of the last process run in the background (in decimal).The current shell flags, such as -x and -v.Some variables have a special meaning to the shell and should be avoided for general use.$ MAIL$ HOMEWhen used interactively, the shell looks at the file specified by this variable before it issuesa prompt. If the specified file has been modified since it was last looked at, the shellprints the message "you have mail" before prompting for the next command. This variableis typically set in the file .profile in the user's login directory. For example:MAIL= /usr / mail/fredThe default argument for the cd(l) command. The current directory is used to resolve filename references that do not begin with a / and is changed using the cd command. For example,cd /usr/fred/binPage 20


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>makes the current directory /usr/fred/bin. Thencat wnwill print on the terminal the file wn in this directory. The command cd(l) with no argumentis equivalent tocd $HOMEThis variable is also typically set in the user's login profile.$PATHA list of directories containing commands (the search path). Each time a command is executedby the shell, a list of directories is searched for an executable file. If $PA TH is notset, the current directory, /bin, and /usr/bin are searched by default. Otherwise, $PATHconsists of directory names separated by a colon (:). For example,PATH =:1 usr I fred/bin:/bin:1 usr Ibinspecifies that the current directory (the null string before the first :), /usr/fred/bin, /bin,and /usr/bin are to be searched in that order. In this way, individual users can have theirown "private" commands that are accessible independently of the current directory. If thecommand name contains a /, this directory search is not used; a single attempt is made toexecute the command.$PSl$PS2$IFSThe primary shell prompt string, by default, "$ ".The shell prompt when further input is needed, by default, "> ".The set of characters used by blank interpretation (See "D. Evaluation and Quoting" inpart "KEYWORD PARAMETERS".).E. The "test" CommandThe test command is intended for use by shell programs. For example,test -f filereturns zero exit status if file exists and nonzero exit status otherwise. In general, test evaluates a predicateand returns the result as its exit status. Some of the more frequently used test arguments are given below [seetest(l) for a complete specification].test stest -f filetest -r filetest -w filetest -d filetrue if the argument s isnot the null stringtrue if file existstrue if file is readabletrue if file is writabletrue if file is a directoryF. Control Flow-"while"The actions of the for loop and the case branch are determined by data available to the shell. A whileor until loop and an if then else branch are also provided whose actions are determined by the exit status returnedby commands. A while loop has the general formwhile command-listldocommand-list2donePage 21


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The value tested by the while command is the exit status of the last simple command following while. Eachtime around the loop, command-listl is executed; if a zero exit status is returned, then command-list2is executed;otherwise, the loop terminates. For example,while test $1dodoneshiftis equivalent tofor idodoneThe shift command is a shell command that renames the positional parameters $2, $3, ... as $1, $2, ... andloses $1.Another kind of use for the while/until loop is to wait until some external event occurs and then run somecommands. In an until loop, the termination condition is reversed. For example,until test -f filedosleep 300donecommandswill loop until file exists. Each time around the loop, it waits for 5 minutes (300 seconds) before trying again.(Presumably, another process will eventually create the file.)G. Control Flow-"if"Also available is a general conditional branch of the form,if command-listthencommand-listelsecommand-listfithat tests the value returned by the last simple command following if.InThe if command may be used in conj unction with the test command to test for the existence of a file asif test -f filethenprocess fileelsedo something elsefiPage 22


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>An example of the use of if, case, and for constructions is given in "I. The Man Command" in part "SHELLPROCEDURES".A multiple test if command of the formif ...thenelseif ...thenfielsefiif ...fimay be written using an extension of the if notation as,if ...thenelif ...thenelif ...fiThe touch command changes the "last modified" time for a list of files. The command may be used in conjunctionwith make(l) to force recompilation of a list of files. The following example is the touch command:flag=for idodonecase $i in-c)*)esacflag=N ;;if test -f $ithenIn $i junk$$rm junk$$elif test $flagthenecho file \'$i\' does not existelse>$ifi ;;Page 23


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The -c flag is used in this command to force subsequent files to be created if they do not already exist. Otherwise,if the file does not exist, an error message is printed. The shell variable flagis set to some non-null stringif the -c argument is encountered. The commandsIn ... ; rm ...make a link to the file and then remove it.The sequencemay be writtenConversely,if command!then command2ficommand! && command2command! II command2executes command2 only if command! fails. In each case, the value returned is that of the last simple commandexecuted.Command GroupingandCommands may be grouped in two ways,{ command-list; }( command-list)The first form, command-list, is simply executed. The second form executes command-list as a separate process.For example,( cd x; rm junk )executes rm junk in the directory x without changing the current directory of the invoking shell.The commandscd x; rm junkhave the same effect but leave the invoking shell in the directory x.H. Debugging Shell ProceduresThe shell provides two tracing mechanisms to help when debugging shell procedures. The first is invokedwi thin the procedure asset -vPage 24


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>(v for verbose) and causes linesof the procedure to be printed as they are read. It is useful to help isolate syntaxerrors. It may be invoked without modifying the procedure by enteringsh -v proc ...where proc is the name of the shell procedure. This flag may be used in conjunction with the -n flag whichprevents execution of subsequent commands. (Note that typing "set -n" at a terminal will render the terminaluseless until an end-of-file is typed.)The commandset -xwill produce an execution trace with flag -x. Following parameter substitution, each command is printed asit is executed. (Try the above at the terminal to see what effect they have.) Both flags may be turned off by typingset -and the current setting of the shell flags is available as $-.I. The "man" CommandThe following is the man command which is used to print sections of the UNIX System User's Manual. Itis called by enteringman shman -t edman 2 forkIn the first call, the manual section for sh is printed. Since no section is specified, Section 1 is used. The secondcall will typeset (-t option) the manual section for ed. The last call prints the fork manual page from Section2 of the manual.A version of the man command follows:cd lusr/man: 'colon is the comment command': 'default is nroff ($N), section 1 ($s)'N=n s=1for idocase $i in[1-9]*) s=$i ;;-t) N=t;;-n) N=n;;-*) echo unknown flag \'$i\' ;;*) if test -f man$s/$i.$sthen${N}roff manOI${N}aa man$s/$i.$selse: 'look through all manual sections'Page 25


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 16/82doneesacfi ;;found=nofor j in 1 2 3 4 5 6 7 8 9doif test -f man$j/$i.$jthen man $j $ifound=yesfidonecase $found inno) echo '$i: manual page not found'esacKEYWORD PARAMETERSShell variables may be given values by assignment or when a shell procedure is invoked. An argument toa shell procedure of the form name=value tha t precedes the command name causes value to be assigned to namebefore execution of the procedure begins. The value of name in the invoking shell is not affected. For example,user=fred commandwill execute command with user set to fred. The -k flag causes arguments of the form name=valueto be interpretedin this way anywhere in the argument list. Such names are sometimes called keyword parameters. Ifany arguments remain, they are available as positional parameters $1, $2, ....The set command may also be used to set positional parameters from within a procedure. For example,set - *will set $1 to the first file name in the current directory, $2 to the next, etc. Note that the first argument, -,ensures correct treatment when the first file name begins with a -.A. Parameter TransmissionWhen a shell procedure is invoked, both positional and keyword parameters may be supplied with the call.Keyword parameters are also made available implicitly to a shell procedure by specifying in advance that suchparameters are to be exported. For example,export user boxmarks the variables user and box for export. When a shell procedure is invoked, copies are made of all exportablevariables for use within the invoked procedure. Modification of such variables within the procedure doesnot affect the values in the invoking shell. It is generally true of a shell procedure that it may not modify thestate of its caller without explicit request on the part of the caller. (Shared file descriptors are an exceptionto this rule.)Names whose value is intended to remain constant may be declared readonly. The form of this commandis the same as that of the export command,readonly name ...Subsequent attempts to set readonly variables are illegal.Page 26


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>B. Parameter SubstitutionIf a shell parameter is not set, then the null string is substituted for it. For example, if the variable disnot set,orecho $decho ${d}will echo nothing. A default string may be given as inecho ${d-.}which will echo the value of the variable d if it is set and H." otherwise. The default string is evaluated usingthe usual quoting conventions so thatecho ${d- '*'}will echo * if the variable d is not set. Similarly,echo ${d-$l}will echo the value of d if it is set and the value (if any) of $1 otherwise. A variable may be assigned a defaultvalue using the notationecho ${d=.}which substitutes the same string asecho ${d-.}and if d were not previously set, it will be set to the string H.". (The notation ${ ... = ... } is not available for positionalparameters.)If there is no sensible default, the notationecho ${d?message}will echo the value of the variable d if it has one; otherwise, message is printed by the shell and execution ofthe shell procedure is abandoned. If message is absent, a standard message is printed. A shell procedure thatrequires some parameters to be set might start as follows:: ${user?} ${acct?} ${bin?}Colon (:) is a command built in to the shell and does nothing once its arguments have been evaluated. If anyof the variables user, acct, or bin are not set, the shell will abandon execution of the procedure.C. Command SubstitutionThe standard output from a command can be substituted in a similar way to parameters. The commandpwd(l) prints on its standard output the name of the current directory. For example, if the current directoryis /usr/fred/bin, the commandd='pwd'Page 27


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82is equivalent tod=/usr/fred/binThe entire string between (' .... ) is taken as the command to be executed and is replaced with the output fromthe command. The command is written using the usual quoting conventions except that a 'must be escaped usinga \. For example,is equivalent toIs 'echo" $1" ,Is $1Command substitution occurs in all contexts where parameter substitution occurs (including here documents),and the treatment of the resulting text is the same in both cases. This mechanism allows string processing commandsto be used within shell procedures. An example of such a command is basename which removes a specifiedsuffix from a string. For example,basename main.c .cwill print the string "main". Its use is illustrated by the following fragment from a cc(1) command.case $A in*.c)B='basename $A .c'esacthat sets B to the part of $A with the suffix .c stripped.Here are some composite examples.• for i in 'Is -t'; do ...The variable i is setto the names of files in time order,most recent first .• set 'date'; echo $6 $2 $3, $4will print, e.g.,1977 Nov 1, 23:59:59D. Evaluation and QuotingThe shell is a macro processor that provides parameter substitution, command substitution, and file namegeneration for the arguments to commands. This section discusses the order in which these evaluations occurand the effects of the various quoting mechanisms.Commands are parsed initially according to the grammar given in Table 2.A. Before a command is executed,the following substitutions occur:1. parameter substitution, e.g., $userPage 28


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>2. command substitution, e.g., 'pwd'Only one evaluation occurs so that if, for example, the value of the variable X is the string "$y" thenecho $Xwill echo "$y".3. blank interpretationFollowing the- above substitutions, the resulting characters are broken into nonblank words (blankinterpretation). For this purpose, "blanks" are the characters of the string" $IFS': By default, this stringconsists of blank, tab, and new-line. The null string is not regarded as a word unless it is quoted. For example,echo' ,will pass on the null string as the first argument to echo, whereasecho $nullwill call echo with no arguments if the variable null is not set or set to the null string.4. file name generationEach word is then scanned for the file pattern characters *, ?, and [ ... ]; and an alphabetical list of filenames is generated to replace the word. Each such file name is a separate argument.The evaluations just described also occur in the list of words associated with a for loop. Only substitutionoccurs in the word used for a case branch.As well as the quoting mechanisms described earlier using \ and ' ... ', a third quoting mechanism is providedusing double quotes. Within double quotes, parameter and command substitution occurs; but file name generationand the interpretation of blanks does not. The following characters have a special meaning within doublequotes and may be quoted using \.For example,$ parameter substitutioncommand substitution" ends the quoted string\ quotes the special characters $ '" \echo" $x"will pass the value of the variable x as a single argument to echo. Similarly,echo" $*"will pass the positional parameters as a single argument and is equivalent toecho" $1 $2 ... "Page 29


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The notation $@ is the same as $* except when it is quoted. Inputtingecho" $@"will pass the positional parameters, unevaluated, to echo and is 'equivalent toecho" $1" "$2" ...The following illustration gives, for each quoting mechanism, the shell metacharacters that are evaluated.metacharacter\ $ * "n n n n n ty n n t n n" y y n y t ntynterminatorinterpretednot interpretedIn cases where more than one evaluation of a string is required, the built-in command eval may be used.For example, if the variable X has the value "$y" and if y has the value "pqr", theneval echo $Xwill echo the string "pqr".In general, the eval command evaluates its arguments (as do all commands) and treats the result as inputto the shell. The input is read and the resulting command(s) executed. For example,is equivalent towg='eval who I grep'$wg fredwho I grep fredIn this example, eval is required since there is no interpretation of metacharacters, such as I, following substitution.E. Error HandlingThe treatment of errors detected by the shell depends on the type of error and on whether the shell is beingused interactively. An interactive shell is one whose input and output are connected to a terminal [as determinedby gttY(2)]. A shell invoked with the -i flag is also interactive.Execution of a command (see also "G. Command Execution") may fail for any of the following reasons:• Input/output redirection may fail, e.g., if a file does not exist or cannot be created.Page 30


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>• The command itself does not exist or cannot be executed.• The command terminates abnormally, e.g., with a "bus error" or "memory fault" signal.• The command terminates normally but returns a nonzero exit status.In all of these cases, the shell will go on to execute the next command. Except for the last case, an errormessage will be printed by the shell. All remaining errors cause the shell to exit from a command procedure.An interactive shell will return to read another command from the terminal. Such errors include the following:• Syntax errors, e.g., if ... then ... done• A signal such as interrupt. The shell waits for the current command, if any, to finish execution andthen either exits or returns to the terminal.• Failure of any of the built-in commands such as cd(l).The shell flag -e causes the shell to terminate if any error is detected. The following is a list of the UNIXoperating system signals:1 hangup2 interrupt3* quit4 * illegal instruction5* trace trap6* lOT instruction7* EMT instruction8* floating point exception9 kill (cannot be caught or ignored)10* bus error11 * segmentation violation12* bad argument to system call13 write on a pipe with no one to read it14 alarm clock15 software termination [from kill(l)]The UNIX operating system signals marked with an asterisk "*,, as shown in the list produce a core dumpif not caught. However, the shell itself ignores quit which is the only external signal that can cause a dump.The signals in this list of potential interest to shell programs are 1, 2, 3, 14, and 15.F. Fault HandlingShell procedures normally terminate when an interrupt is received from the terminal. The trap commandis used if some cleaning up is required, such as removing temporary files. For example,trap 'rm /tmp/ps$$; exit' 2sets a trap for signal 2 (terminal interrupt); and if this signal is received, it will execute the following commands:rm /tmp/ps$$; exitThe exit is another built-in command that terminates execution of a shell procedure. The exit is required; otherwise,after the trap has been taken, the shell will resume executing the procedure at the place where it wasin terru pted.Page 31


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82UNIX operating system signals can be handled in one of three ways.1. They can be ignored, in which case the signal is never sent to the process.2. They can be caught, in which case the process must decide what action to take when the signal is received.3. They can be left to cause termination of the process without it having to take any further action.If a signal is being ignored on entry to the shell procedure, for example, by invoking it in the background (see"G. Command Execution"), trap commands (and the signal) are ignored.The use of trap is illustrated by this modified version of the touch command illustrated below:flag=trap 'rm -f junk$$; exit' 1 2 3 15for idocase $i in-c) flag=N ;;*) if test -f $ithenIn $i junk$$; rm junk$$elif test $flagthenecho file \'$i\' does not existelse>$ifi ;;esacdoneThe cleanup action is to remove the file junk$$. The trap command appears before the creation of the temporaryfile; otherwise, it would be possible for the process to die without removing the file.Since there is no signal 0 in the UNIX operating system, it is used by the shell to indicate the commandsto be executed on exit from the shell procedure.A procedure may, itself, elect to ignore signals by specifying the null string as the argument to trap. Thefollowing:trap' , 1 2 3 15is a fragment taken from the nohup(l) command which causes the UNIX operating system HANG UP, INTER­RUPT, QUIT, and SOFTWARE TERMINATION signals to be ignored both by the procedure and by invokedcommands.Traps may be reset by enteringtrap 2 3which resets the traps for signals 2 and 3 to their default values. A list of the current values of traps may beobtained by writingtrapPage 32


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>The scan procedure is an example of the use of trap where there is no exit in the trap command. The scantakes each directory in the current directory, prompts with its name, and then executes commands typed at theterminal until an end of file or an interrupt is received. Interrupts are ignored while executing the requestedcommands but cause termination when scan is waiting for input. The scan procedure follows:d='pwd'for i in *doif test -d $d/$ithencd $d/$iwhile echo" $i:" && trap exit 2 && read xdonefidodonetrap: 2eval $xThe read x is a built-in command that reads one line from the standard input and places the result in thevariable x. It returns a nonzero exit status if either an end-of-file is read or an interrupt is received.G. Command ExecutionTo run a command (other than a built-in), the shell first creates a new process using the system callfork(2). The execution environment for the command includes input, output, and the states of signals and isestablished in the child process before the command is executed. The built-in command exec is used in rarecases when no fork is required and simply replaces the shell with a new command. For example, a simple versionof the nohup command looks liketrap , , 1 2 3 15exec $*The trap turns off the signals specified so that they are ignored by subsequently created commands, and execreplaces the shell by the command specified.Most forms of input/output redirection have already been described. In the following, word is only subjectto parameter and command substitution. No file name generation or blank interpretation takes place so that,for example,echo ... >*.cwill write its output into a file whose name is *.c. Input/output specifications are evaluated left to right as theyappear in the command. Some input/output specifications are as follows:> word» word< word« wordThe standard output (file descriptor 1) is sent to the file word which is created if it doesnot already exist.The standard output is sent to file word. If the file exists, then output is appended (by seekingto the end); otherwise, the file is created.The standard input (file descriptor 0) is taken from the file word.The standard input is taken from the lines of shell input that follow up to but not includinga line consisting only of word. If word is quoted, no interpretation of the document occurs.If word is not quoted, parameter and command substitution occur and \ is used toPage 33


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82quote the characters \, $, " and the first character of word. In the latter case, \new-lineis ignored (e.g., quoted strings).>& digit&1runs a command with its standard output and message output merged. (Strictly speaking, file descriptor 2 iscreated by duplicating file descriptor 1; but the effect is usually to merge the two streams.)The environment for a command run in the background such aslist *.c Ilpr &is modified in two ways. First, the default standard input for such a command is the empty file Idevlnull. Thisprevents two processes (the shell and the command), which are running in parallel, from trying to read thesame input. Chaos would ensue if this were not the case. For example,ed file &would allow both the editor and the shell to read from the same input at the same time.The other modification to the environment of a background command is to turn off the QUIT and INTER­RUPT signals so that they are ignored by the command. This allows these signals to be used at the terminalwithout causing background commands to terminate. For this reason, the UNIX operating system conventionfor a signal is that if it is set to 1 (ignored) then it is never changed even for a short time. Note that the shellcommand trap has no effect for an ignored signal.H. Invoking the ShellThe following flags are interpreted by the shell when it is invoked. If the first character of argument zerois a minus, commands are read from the file .profile.-c string-s-iIf the -c flag is present, then commands are read from string.If the -s flag is present or if no arguments remain, commands are read from the standardinput. Shell output is written to file descriptor 2.If the -i flag is present or if the shell input and output are attached to a terminal [as toldby getty(8)], this shell is interactive. In this case, TERMINATE is ignored (so that killo does not kill an interactive shell, and INTE.RRUPT is caught and ignored (so that waitis interruptible). In all cases, QUIT is ignored by the shell.Page 34


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>item:simple-command:command:pipeline:andor:command-list:input-output:TABLE 2.AGRAMMARwordinput-outputname = valueitemsimple-command itemsim pIe-command( command-list){ command-list 1for name do command-list donefor name in word ... do command-list donewhile command-list do command-list doneuntil command-list do command-list donecase word in case-part ... esacif command-list then command-list else-part ficommandpipeline I commandpipelineandor && pipelineandor II pipelineandorcommand-list;command-list &command-list; andorcommand-list & andor> file< file> > word« wordfile:case-part:pattern:else-part:empty:word:name:digit:word& digit&-pattern) command-list;;wordpattern I wordelif command-list then command-list else-partelse command-listemptya sequence of non blank charactersa sequence of letters, digits, or underscores starting with a letter0123456789Page 3S


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82TABLE 2.BMETACHARACTERS AND RESERVED WORDS(a)syntactic:&&IIII.."&( )»pipe symbol'andf' symbol'orf' symbolcommand separatorcase delimiterbackground commandscommand groupinginput redirectioninput from a here documentoutput creationoutput append(b)patterns:*?[ ... Jmatch any character(s) including nonematch any single charactermatch any of the enclosed characters(c)substitution:${ ... l, ,substitute shell variablesubstitute command output(d)quoting:\, ," "quote the next characterquote the enclosed characters except for'quote the enclosed characters except for the $, ' ,\, and "(e)reserved words:if then else elif ficase in esacfor while until do done{ } [ J testPage 36


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>THE SHELL TUTORIALINTRODUCTIONIn any programming proj ect, some effort is used to build the end product. The remainder is consumed inbuilding the supporting tools and procedures used to manage and maintain that end product. The second effortcan far exceed the first, especially in larger projects. A good command language can be an invaluable tool insuch situations. If it is a flexible programming language, it can be used to solve many internal support problemswithout requiring compilable programs to be written, debugged, and maintained. The most important advantageof a good command language is the ability to get the job done now. For a perspective on the motivationsfor using a command language in this way, see [1,2,3,4]. Throughout this section references of the type [1through 10] refer to documents listed in part "REFERENCES".When users log into a UNIX system, they communicate with an instance of the shell that reads commandstyped at the terminal and arranges for the execution of the commands entered. Thus, the shell's most importantfunction is to provide a good interface for human beings. In addition, a sequence of commands may be preservedfor repeated use by saving it in a file, called a shell procedure, command file, or run com according to local preference.Some UNIX system users need little knowledge of the shell to do their work while others make heavy useof its programming features. This section may be read in several different ways, depending on the reader's interests.A brief discussion of the UNIX system environment is found in part "OVERVIEW OF THE UNIX <strong>SYSTEM</strong>ENVIRONMENT". The discussion in part "SHELL BASICS" covers aspects of the shell that are important foreveryone, while all of part "USING THE SHELL AS A COMMAND: SHELL PROCEDURES" and most of part"MISCELLANEOUS SUPPORTING COMMANDS AND FEATURES" are mainly of interest to those who writeshell procedures. A group of annotated shell procedure examples is given in part "EXAMPLES OF SHELLPROCEDURES". Finally, a brief discussion of efficiency is offered in part "EFFECTIVE AND EFFICIENTSHELL <strong>PROGRAMMING</strong>". The discussion on efficiency is found in its proper place (at the end) and is intendedfor those who write especially time-consuming shell procedures.Complete beginners should not be reading this section, but should work their way through other availabletutorials first. See [10] for an appropriate plan of study.Throughout this section, each reference of the form name(IM), name(7), or name(8) refers to entries inthe UNIX System Administrator's Manual. All other references to entries of the form name(N), where "N"is a number (1 through 6) possibly followed by a letter, refer to entry name in section N of the UNIX SystemUser's Manual.OVERVIEW OF THE UNIX <strong>SYSTEM</strong> ENVIRONMENTFull understanding of what follows depends on familiarity with the UNIX system; [9] is useful for that, andit would be helpful to read [5] and at least one of [6,7]. For completeness, a short overview of the most relevantconcepts are given below.A. File SystemThe UNIX system file system's overall structure is that of a rooted tree composed of directories and otherfiles. A simple file name is a sequence of characters other than a slash (I). A pathname is a sequence of directorynames followed by a simple file name, each separated from the previous one by a I. If a pathname begins witha I, the search for the file begins at the root of the entire tree; otherwise, it begins at the user's current directory(also known as the working directory). The first kind of name is often called a full (or absolute) pathname becauseit is invariant with regard to the user's current directory. The latter is often called a relative pathname,because it specifies a path relative to the current directory. The user may change the current directory at anyPage 37


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82time by using the cd(l) command. In most cases, a file name and its corresponding pathname may be used interchangeably.Some sample names are:IIbinla11tfljtblbinbinlxmemoxabsolute pathname of the root directory of the entire file structure.directory containing most of the frequently used public commands.a full (or absolute) pathname typical of multiperson programming projects. This one happensto be a private directory of commands belonging to person jtb in project tf; a1 is thename of a file system.a relative pathname; it names file xin subdirectory bin of the current directory. If the currentdirectory is /, it names Ibinlx. If, on the other hand, the current directory isla1ltfljtb, it names la1ltfljtblbinlx.name of a file in the current directory.The file system for the UNIX operating system provides special shorthand notations for the current directoryand the parent directory of the current directory:B. UNIX System Processesis the generic name of the current directory. A .1m em ox names the same file as memoxif such a file exists in the current directory.is the generic name of the parent directory of the current directory. If the user types:cd ..then the parent directory of your current working directory will become your new currentdirectory.An image is a computer execution environment, including contents of memory, register values, name of thecurrent directory, status of open files, information recorded at login time, and various other items. A processis the execution of an image. Most UNIX system commands execute as separate processes. One process mayspawn another using the fork(2) system call, which duplicates the image of the original (parent) process. Thenew (child) process continues to execute the same program as the parent. The two images are identical, exceptthat each program can determine whether it is executing as parent or child. Each program may continue executionof the image or may abandon it by issuing an exec(2) system call, thus initiating execution of another program.In any case, each process is free to proceed in parallel with the other, although the parent most commonlyissues a wait(2) system call to suspend execution until a child terminates (exits).Figure 2.1 illustrates these ideas. Program A is executing (as process 1) and wishes to run program B.It forks and spawns a child (process 2) that continues to run program A. The child abandons A by execingB, while the parent goes to sleep until the child exits.Page 38


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>FORKPROCESS 2PROGRAM ACHILDPROGRAM BEXITFig. 2.1- The Shell Executing a Typical UNIX System CommandA child inherits its parent's open files. This mechanism permits processes to share common input streamsin various ways. In particular, an open file possesses a poin ter that indicates a position in the file and is modifiedby various operations on the file. The read(2) and write(2) system calls copy a requested number of bytes fromand to a file beginning at the position given by the current value of the pointer. As a side effect, the pointeris incremented by the number of bytes transferred yielding the effect of sequential 1/0. The Iseek(2) systemcall can be used to obtain random-access 1/0 by setting the pointer to an absolute position within the file orto a position offset either from the end of the file or from the current pointer position.When a process terminates, it can set an 8-bit exit status (see $? in "Predefined Special Variables" insubpart "D. Shell Variables") that is available to its parent. This code is usually used to indicate success (zero)or failure (nonzero).Signals indicate the occurrence of events that may have some impact on a process. A signal may be sentto a process by another process from the terminal or by the UNIX system itself. A child process inherits itsparent's signals. For most signals, a process can arrange to be terminated on receipt of a signal, to ignore itcompletely, or to catch it and take appropriate action as described in "Interrupt Handling-trap" in subpart"D. Control Commands". For example, an INTERRUPT signal may be sent by depressing an appropriate key(del, break, or rubout). The action taken depends on the requirements of the specific program being executed:• The shell invokes most commands in such a way that they immediately die when an interrupt is received.For example, the pr(l) (print) command normally dies allowing the user to terminate unwantedoutput.• The shell itselfignores interrupts when reading from the terminal because it should continue executioneven when the user terminates a command like pro• The editor ed(l) chooses to catch interrupts so that it can halt its current action (especially printing)without being terminated.SH ELL BASICSThe shell [Le., the sh(l) command] implements the command language visible to most UNIX system users.The shell reads input from a terminal or a file and arranges for the execution of the requested commands. Itis a program written in the C language [8]. The shell is not part of the operating system but is an ordinaryuser program.A. CommandsA simple command is a sequence of nonblank arguments separated by blanks or tabs. The first argument(numbered zero) usually specifies the name of the command to be executed. Any remaining arguments, witha few exceptions, are passed as arguments to that command. A command may be as simple as:whoPage 39


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82which prints the login names of users who are currently logged into the system. The following line requests thepr(l) command to print files a, b, and c:pr abcIf the first argument of a command names a file that is executable (as indicated by an appropriate set ofpermission bits associated with that file) and is actually a compiled program, the shell (as parent) spawns anew (child) process that immediately executes that program. If the file is marked as being executable but isnot a compiled program, it is assumed to be a shell procedure, i.e., a file of ordinary text containing shell commandlines, as well as possibly lines meant to be read by other programs. In this case, the shell spawns anotherinstance of itself (a subshell) to read the file and execute the commands included in it. The shell forks to dothis, but no exec call is made. The man(l) command requests that entries in the on-line UNIX System User'sManual be printed on the terminal. For example, the section that describes the who and pr commands can beprinted by entering the following:man who pr(Incidentally, the man(1) command itself is actually implemented as a shell procedure.) From the user'sviewpoint, compiled programs and shell procedures are invoked in exactly the same way. The shell determineswhich implementation has been used rather than requiring the user to do so. This preserves the uniformity ofinvocation and the ease of changing the choice of implementation for a given command. The actions of the shellin executing any of these commands are illustrated in Fig. 2.1.B. How the Shell Finds CommandsThe shell normally searches for commands in a way that permits them to be found in three distinct locationsin the file structure. The shell first attempts to find the command (as given on the command line) in thecurrent directory. If this fails, the shell prepends the string Ibin to the name and, finally, lusrlbin. The effectis to search, in order, the current directory, then the directory Ibin, and finally, lusrlbin. For example the pr(l)and man(l) commands are actually the files Ibinlpr and lusrlbinlman, respectively. A more complexpathname may be given either to locate a file relative to the user's current directory or to access a commandvia an absolute pathname. If a command name as given begins with a /, .I, or . .1 (e.g., Ibinlsort or . ./cmd), theprepending is not performed. Instead, a single attempt is made to execute the command as given.This mechanism gives the user a convenient way to execute public commands and commands in or near thecurrent directory as well as the ability to execute any accessible command regardless of its location in the filestructure. Because the current directory is usually searched first, anyone can possess a private version of a publiccommand without affecting other users. Similarly, the creation of a new public command will not affect auser who already has a private command with the same name. The particular sequence of directories searchedmay be changed by resetting the PATHvariable as described in "User-defined Variables" in subpart "D. ShellVaria bles".C. . Generation of Argument ListsCommand arguments are very often file names. A list of file names can be automatically generated as argumentson a command line by specifying a pattern that the shell matches against the file names in a directory.Most characters in such a pattern match themselves, but there are also special metacharacters that maybe included in a pattern. These special characters follow:* Matches any string including the null string? Matches anyone characterPage 40


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>[ ... ][! ..• ]Matches any sequence of characters enclosed within the square brackets. Be warned thatsquare brackets are also used to indicate that the enclosed argument is optional. See "A.Conditional Evaluation-test" in part "MISCELLANEOUS SUPPORTING COMMANDSAND FEATURES" for more details.Any sequence of characters preceded by a ! and enclosed within [ ... ] will match anyonecharacter otherthan one of the enclosed characters. Inside square brackets, a pair of charactersseparated by a - includes in the set all characters lexically within the inclusiverange of that pair, so that [a-de] is equivalent to [abcde].For example, the * matches all file names in the current directory. The *temp* matches all file names containingstring" temp" . A [a-f]* matches all file names that begin with a through f. The [!0-9] matches all single-characternames other than the digits, and *.c matches all file names ending in .c. The lal/tf/binl? matchesall single-character file names found in /al/tf/bin. This pattern matching capability saves much typing and,more importantly, makes it possible to organize information in large collections of small files that are namedin disciplined ways.Pattern matching has some restrictions. If the first character of a file name is a period (.), it can be matchedonly by an argument that literally begins with a period. If a pattern does not match any file names, then thepattern itself is returned as the result of the match, for example:will print:echo *.c*.cif the current directory contains no files ending in .c.Directory names should not contain the characters *, ?, [, or ] because this may cause infinite recursion duringpattern matching attempts. This may be changed in a future release.D. Shell VariablesThe shell has several mechanisms for creating variables. A variable is a name representing a string value.Certain variables are usually referred to as parameters. Parameters are the variables which are normally setonly on a command line. There are also positional parameters (see "Positional Parameters") and keywordparameters (see "A. A Command's Environment" in part "USING THE SHELL AS A COMMAND: SHELLPROCEDURES"). Other variables are simply names to which the user or the shell itself may assign string values.Positional Parameters: When a shell procedure is invoked, the shell implicitly creates positional parameters.The argument in position zero on the command line (the name of the shell procedure itself) is called$0, the first argument is called $1, etc. The shift command (see "Passing Arguments to the Shell-shift" inpart "USING THE SHELL AS A COMMAND: SHELL PROCEDURES") may be used to access arguments inpositions numbered higher than nine.One can explicitly force values into these positional parameters by using the set command:set abc def ghiwhich assigns stringl(" abc" ) to the first positional parameter ($1), string2 to the second ($2), and string3 tothe third ($3). For this example, set also unsets $4, $5, etc. even if they were previously set. The $0 may notPage 41


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82be assigned a value so that it always refers to the name of the shell procedure or to the name of the shell (inthe login shell).User-defined Variables: The shell also recognizes alphanumeric variables to which string values maybe assigned. Positional parameters may not appear on the left-hand side of an assignment statement. Positionalparameters can only be set as described in "Positional Parameters". A simple assignment is of the form:name = stringThereafter, $name will yield the value" string" . A name is a sequence of letters, digits, and underscores thatbegins with a letter or an underscore. Note that no spaces surround the = in an assignment statement.More than one assignment may appear in an assignment statement, but beware since the shell performsthe assignments from right to left. The following command line results in the variable a acquiring the value"abc": fa=$b b=abcThe following are examples of simple assignments. Double quotes around the right-hand side allow blanks,tabs, semicolons, and new-lines to be included in" string" , while also allowing variable substitution (also knownas parameter substitution) to occur. In parameter substitution references to positional parameters and othervariable names that are prefaced by $ are replaced by the corresponding values, if any. Single quotes inhibitvariable substitution. Some examples follow:MAIL=/usr /mail/ gasvar=" echo $1 $2 $3 $4"stars=*****asterisks='$stars'The variable var has as its value the string consisting of the values of the first four positional parameters, separatedby blanks. No quotes are needed around the string of asterisks being assigned to stars because patternmatching (expansion of *, ?, [ ... ]) does not apply in this context. Note that the value of $asterisks'is the literalstring "$stars", not the string" *****" , because the single quotes inhibit substitution.In assignments, blanks are not reinterpreted after variable substitution, so that the following example resultsin $first and $second having the same value:first='a string with embedded blanks'second=$firstIn accessing the value of a variable, one may enclose the variable's name (or the digit designating the positionalparameter) in braces {} to delimit the variable name from any following string. See "CommandGrouping-Parentheses and Braces" in "D. Control Commands" and "G. Conditional Substitution" in part"MISCELLANEOUS SUPPORTING COMMANDS AND FEATURES" for other meanings of braces in theshell. In particular, if the character immediately following the name is a letter, digit, or underscore (digit onlyfor positional parameters), then the braces are required:a='This is a string'echo" ${a}ent test"The following variables are used by the shell. Some of them are set by the shell, and all of them can beset and reset by the user:HOMEis initialized by the login(1) program to the name of the user's login directory, i.e., the directorythat becomes the current directory upon completion of a login. The cd commandPage 42


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>without arguments uses $ HOME as the directory to switch to. Using this variable helpsone to keep full pathnames out of shell procedures. This is a big help when the pathnameof your login directory is changed (e.g., to balance disk loads).MAILPATHis the pathname of a file where your mail is deposited. If MAlLis set, then the shell checksto see if anything has been added to the file it names and announces the arrival of newmail every time you return to command level (e.g., by leaving the editor). MAIL must beset by the user. (The presence of mail in the standard mail file is also announced at login,regardless of whether MAIL is set.)is the variable that specifies where the shell is to look when it is searching for commands.Its value is an ordered list of directory pathnames separated by colons. A null characteranywhere in that list represents the current directory. The shell initializes PATHto thelist :Ibin:lusrlbin where, by convention, a null character appears in front of the first colon.Thus if you wish to search your current directory last, rather than first, you would type:PATH = Ibin:/usr Ibin::where the two colons together represent a colon followed by a null followed by a colon, thusnaming the current directory. A user often has a personal directory of i commands (say,$HOME) and causes it to be searched before the Ibin and usrlbin directories by using:CDPATHPATH =:$HO ME/bin:/bin:/usr IbinThe setting of PATHto other than the default value is normally done in a user's .profilefile (see "The .profile File" in subpart "I. Changing the State of the shell and the .profileFile").is the variable that specifies where the shell is to look when searching for the argumentof the cd command whenever that argument is not null and does not begin with /, .I, or•.1 [see cd(l), "A. File System" in part "OVERVIEW OF THE UNIX <strong>SYSTEM</strong> ENVIRON­MENT", and "E. Special Shell Commands" in part "USING THE SHELL AS A COM­MAND: SHELL PROCEDURES"]. The value of CDPATHis an ordered list of directorypathnames separated by colons. A null character anywhere in that list represents the currentdirectory. By convention, if the list begins with a colon, a null character is assumedto precede that colon. Initially, CDPATH is unset, resulting in only the current directorybeing searched. Thus if you wish the cd command to first search your current directoryand then your home directory, you would type:CDPATH=:$HOMEThe setting of CDPATH to other than the default value is normally done in a user's .profilefile (see "The .profile File" in subpart "I. Changing the State of the shell and the. profileFile"). Note that if the cd command changes to a directory that is not a descendent of thecurrent directory, it writes the full name of the new directory on the diagnostic output(see"Standard Input and Standard Output" and "Diagnostic and Other Outputs" insubpart "F. Redirection of Input and Output").PSIPS2is the variable that specifies what string is to be used as the primary prompt string. If theshell is interactive, it prompts with the value of PSl when it expects input. The defaultvalue of PSl is "$" (a $ followed by a blank).is the variable that specifies the secondary prompt string. If the shell expects more inputwhen it encounters a new-line in its input, it will prompt with the value of PS2. The defaultvalue of PS2 is ">" (a > followed by a blank).Page 43


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82IFSis the variable that specifies which characters are internal field separators. These are thecharacters the shell uses during blank interpretation. (If you want to parse somedelimiter-separated data easily, you can set IFS to include that delimiter.) The shell initiallysets IFS to include the blank, tab, and new-line characters.Command Substitution: Any command line can be placed within gr2,ve accents (' ...') to capture the outputof the command. This concept is known as commandisubstitution:'The command or commands enclosed betweengrave accents are first executed by the shell and then· their o~tput replaces the whole expression, graveaccents and all. This feature is often combined with shell variables so thattodaY='date'assigns the string representing the current date to the variable today (e.g., Tue Nov 27 16:01:09 EST 1979).The commandusers='who I wc -1'saves the number of logged-in users in the variable users. Any command that writes to the standard output canbe enclosed in grave accents. Grave accents (see "E. Quoting Mechanisms") may be nested. The inside sets mustbe escaped with \. For example:logmsg='echo Your login directory is\'pwd\ "Shell variables can also be given values indirectly by using the read(2) command. The read command takesa line from the standard input (usually your terminal) and assigns consecutive words on that line to anyvariables named:read first ini t lastwill take an input line of the form:G. A. Snyderand have the same effect as if you had typed:first=G. init=A. last=SnyderThe read command assigns any excess "words" to the last variable.Predefined Special Variables: Several variables have special meanings. The following are set only bythe shell:$#records the number of positional arguments passed to the shell, not counting the nameof the shell procedure itself. The variable $# yields the number of the highest-numberedpositional parameter that is set. Thus, sh x abc sets $# to 3. One of its primary uses isin checking for the presence of the required number of arguments:if test $# -It 2thenecho 'two or more args required'; exitfi$?$$is the exit status (also referred to as return code, exit code, or value) of the last commandexecuted. Its value is a decimal string. Most UNIX system commands return 0 to indicatesuccessful completion. The shell itself returns the current value of $? as its exit status.is the process number of the current process. Since process numbers are unique among allexisting processes, this string of up to five digits is often used to generate unique namesPage 44


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>for temporary files. The UNIX system provides no mechanism for the automatic creationand deletion of temporary files. A file exists until it is explicitly removed. Temporary filesare generally undesirable. The UNIX system pipe mechanism is far superior for many applications.However, the need for uniquely-named temporary files does occasionally occur.The following example also illustrates the recommended practice of creating temporaryfiles in a directory used only for that purpose:temp=$HOME/temp/$$ # use current process numberIs > $temp# to form unique temp filecommands, some of which use $temp, go hererm $temp# clean up at end$! is the process number of the last process run in the background (using &-see "D. ControlCommands" in part "USING THE SHELL AS A COMMAND: SHELL PROCEDURES").Again, this is a string of up to five digits.$- is a string consisting of names of execution flags (see "Execution Flags-set" in subpart"F. Redirection of Input and Output" and "G. More about Execution Flags" in part "US­ING THE SHELL AS A COMMAND: SHELL PROCEDURES") currently turned on in theshell. The $- variable might have the value xv if you are tracing your output.E. Quoting MechanismsMany characters have a special meaning to the shell which is sometimes necessary to conceal. Single quotes(' ') and double quotes (" II ) surrounding a string or backslash (\) before a single character provide this functionin somewhat different ways. (Grave accents [' '] are sometimes called back quotes but are used only for commandsubstitution [see "Command Substitution" in subpart "D. Shell Variables"] in the shell and do not hide specialmeanings of any characters.)Within right single quotes, all characters (except' itself) are taken literally with any special meaning removed.Thus:stuff='echo $? $*; Is * I we'results only in the string echo $? $*; Is * I we being assigned to the variable stuff but not in any other commandsbeing executed.Within double quotes, the special meaning of certain characters does persist while all other characters aretaken literally. The characters that retain their special meaning are $, ',and II itself. Thus, within double quotes,variables are expanded and command substitution takes place. However, any commands in a command substitutionare not affected by double quotes outside of the grave accents, so that characters such as * retain their specialmeaning.To hide the special meaning of $, " and II within double quotes, you can precede these characters with abackslash (\). Outside of double quotes, preceding a character with \ is equivalent to placing single quotesaround that character. A \ followed by a new-line causes that new-line to be ignored, thus allowing continuationof long command lines.F. Redirection of Input and OutputIn general, most commands neither know nor care whether their input (output) is coming from (going to)a terminal or a file. Thus, a command can be used conveniently either at a terminal or in a pipeline (see subpartPage 45


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82"G. Command Lines and Pipelines"). Depending on the nature of a command's input or output, the actions takenby a few commands can be varied either for efficiency's sake or to avoid useless actions (such as attemptingrandom access I/O on a terminal).Standard Input/Output: When a command begins execution, it usually expects that three files are alreadyopen-a standard input, a standard output, and a diagnostic (error) output. A number called a file descriptoris associated with each of these files. By convention, the file descriptor 0 is associated with standard input,file descriptor 1 with standard output, and file descriptor 2 with diagnostic output. A child process normallyinherits these files from its parent. All three files are initially connected to the terminal (0 to the keyboard,1 and 2 to the printer or screen). The shell permits them to be redirected elsewhere before control is passedto an invoked command. An argument to the shell of the form < file or > file opens the specified file as thestandard input or output, respectively (in the case of output, destroying the previous contents of file, if any).An argument of the form> > file directs the standard output to the end of file, thus providing a way to appenddata to it without destroying its existing contents. In either of the two output cases, the shell creates file ifit does not already exist (thus> output alone on a line creates a zero-length file). The following appends tofile log the list of users who are currently logged on:who » logSuch redirection arguments are only subject to variable and command substitution. Neither blank interpretationnor pattern matching of file names occurs after these substitutions. Thus:and:echo 'this is a test' > * .gggcat < ?will produce, respectively, a l-line file named *.ggg (a rather disastrous name for a file) and an error message(unless you have a file named?, which is also not a wise choice for a file name-see end of part "C. Generationof Argument Lists").Diagnostic & Other Outputs: Diagnostic output from UNIX system commands is traditionally directedto the file associated with file descriptor 2. (There is often a need for an error output file that is different fromstandard output so that error messages do not get lost down pipelines-see subpart "G. Command Lines andPipelines".) One can redirect this error output to a file by immediately prepending the number of the file descriptor(Le., 2 in this case) to either output redirection symbol (> or > ». The following line will append errormessages from the cc(l) command to file ERRORS:cc testfile.c 2> > ERRORSNote that the file descriptor number must be prepended to the redirection symbol without any interveningblanks or tabs. Otherwise, the number will be passed as an argument to the command.This method may be generalized to allow one to redirect output associated with any of the first ten filedescriptors (numbered 0 through 9) so that, for instance, if cmd puts output on file descriptor 9, the followingline will capture that output in file savedata:cmd 9> sa veda taA command often generates standard output and error output and might even have some other output, perhapsa data file. In this case, one can redirect independently all the different outputs. Suppose that cmd directs itsstandard output to file descriptor 1, its error output to file descriptor 2, and builds a data file on file descriptor9. The following would direct each of these three outputs to a different file:cmd > standard 2> error 9> dataPage 46


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Other forms of input/output redirection are described in "Input/Output Redirection and Control Commands"and "In-line Input Documents" in subpart "D. Control Commands" and "F. Input/Output RedirectionUsing File Descriptors" in part "MISCELLANEOUS SUPPORTING COMMANDS AND FEATURES".G. Command Lines and PipelinesA sequence of one or more commands separated by I (or A) make up a pipeline. In a pipeline consisting ofmore than one command, each command is run as a separate process connected to its neighbor(s) by pipes, i.e.,the output of each command (except the last one) becomes the input of the next command in line. A filter isa command that reads its standard input, transforms it in some way, then writes it as its standard output. Apipeline normally consists of a series of filters. Although the processes in a pipeline are permitted to executein parallel, they are synchronized to the extent that each program needs to read the output of its predecessor.Many commands operate on individual lines of text, reading a line, processing it, writing it out, and looping backfor more input. Some must read larger amounts of data before producing output. The sort(1) command is anexample of the extreme case that requires all input to be read before any output is produced.The following is an example of a typical pipeline: nroff (see troff in Section 1) is a text formatter whoseoutput may contain reverse line motions; eol(1) converts these motions to a form that can be printed on a terminallackingreverse-motion capability; greek(1) is used to adapt the output to a specific terminal, here specifiedby -Thp. The flag -em indicates one of the commonly used formatting options, and text is the name of thefile to be formatted:H. Examplesnroff -cm text I colI greek -ThpThe following examples illustrate the variety of effects that can be obtained by combining a few commandsin the ways described above. It may be helpful to try these examples at a terminal:whoPrints (on the terminal) the list of logged-in users.who »logAppends the list of logged-in users to the end of file log.who I we-IPrints the number of logged-in users. (The argument to we is minus ell.)who I prPrints a paginated list of logged-in users.who I sortPrints an alphabetized list of logged-in users.who I grep pwPrints the list of logged-in users whose login names contain the string" pw" .who I grep pw I sort I prPrints an alphabetized, paginated list of logged-in users whose login names contain the string"pw".{ date; who I we -I; } > > logAppends (to file log) the current date followed by the count of logged-in users (see "CommandGrouping-Parentheses and Braces" in subpart "D. Control Commands" for the meaning of {...}in this context).Page 47


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82who I sed 'sf .*//' I sort I uniq -dPrints only the login names of all users who are logged in more than once.The who command does not by itself provide options to yield all of the results which can be obtained bycombining who with other commands. Note that who just serves as the data source in these examples. As anexercise, replace who I by < /etc/passwd in the above examples to see how a file can be used as a data sourcein the same way. Notice that redirection arguments may appear anywhere on the command line.I. Changing of the Shell and .profile StateThe state of a given instance of the shell includes the values of positional parameters (see "Positional Parameters"in subpart "D. Shell Variables"), user-defined variables (see "User-defined Variables" in subpart "D.Shell Variables"), environmental variables (see "A. A Command's Environment"), modes of execution (see "G.More about Execution Flags"), and the current working directory.The state of a shell may be altered in various ways. These include the cd command, several flags that canbe set by the user, and a file in one's login directory called .profile that is treated specially by the shell."CD": The cd(l) command changes the current directory to the one specified as its argument. This can (andshould) be used to change to a convenient place in the directory structure. The cd command is often combinedwith 0 to cause a subshell to change to a different directory and execute a group of commands without affectingthe original shell. The first sequence below extracts the component files of the archive file /al/tf/q.aand placesthem in whatever directory is the current one. The second sequence of commands places them in directory /al/tf.ar x I al/tfl q.a(cd lal/tf; ar x q.a)The .profile File: When you log in, the shell is invoked to read your commands. First, however, the shellchecks to see if a file name /etc/profile exists on your UNIX system and, if it does, commands are read fromit. The /etc/profile is used by system administrators to set up variables needed by all users. Typecat I etc/profileto see what your system administrator has already done for you. After this, the shell proceeds to see if youhave a file named .profile in your login directory. If so, commands are read and executed from it. For a sample.profile, see profile(5). Finally, the shell is ready to read commands from your standard input-usually theterminal.The Execution Flags-"set":The set command provides the capability of altering several aspects of thebehavior of the shell by setting certain shell flags. In particular, the x and v flags may be useful from the terminal.Flags may be set by typing, for example:set -xv(to turn on flags x and v). The same flags may be turned off by typingset +xv-vThese two flags have the following meaning:Input lines are printed as they are read by the shell. This flag is particularly useful forisolating syntax errors. The commands on each input line are executed after that input lineis printed.Page 48


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>-x Commands and their arguments are printed as they are executed. (Shell control commands,such as for, while, etc., are not printed, however.) Note that -x causes a traceof only those commands that are actually executed, whereas -v prints each line of inputuntil a syntax error is detected.The set command is also used to set these and other flags within shell procedures (see "G. More about ExecutionFlags").USING THE SHELL AS A COMMAND: SHELL PROCEDURESA. A Command's EnvironmentAll the variables (with their associated values) that are known to a command at the beginning of executionof that command constitute its environment. This environment includes variables that the command inheritsfrom its parent process and variables specified as keyword parameters on the command line that invokes thecommand.The variables that a shell passes to its child processes are those that have been named as arguments tothe export (see sh) command. The export command places the named variables in the environments of boththe shell and all its future child processes.Keyword parameters are variable-value pairs that appear in the form of assignments, normally before theprocedure name on a command line (see also -k flag in "G. More about Execution Flags"). Such variables areplaced in the environment of the procedure being invoked. For example:# key_commandecho $a $bis a simple procedure that echoes the values of two variables. If it is invoked as:then the output is:a=keyl b=key2 key_commandkeyl key2A procedure's keyword parameters are not included In the argument count $# (see "Predefined SpecialVariables" in "D. Shell Variables").A procedure may access the value of any variable in its environment. However, if changes are made to thevalue of a variable, these changes are not reflected in the environment. The changes are local to the procedurein question. In order for these changes to be placed in the environment that the procedure passes to its childprocesses, the variable must be named as an argument to the export command within that procedure (see "B.Invoking the Shell"). To obtain a list of variables that have been made exportable from the current shell, type:export(You will also get a list of variables that have been made readonly-see "E. Special Shell Commands".) Toget a list of name-value pairs in the current environment, type:envPage 49


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82B. Invoking the ShellThe shell is an ordinary command and may be invoked in the same way as other commands:sh proc [arg .•. ]A new instance of the shell is explicity invoked to read proc. Arguments, if any, can bemanipulated as described in "C. Passing Arguments to the Shell; shift".sh -v proc [arg .•. ] This is equivalent to putting set-vat the beginning of proc. Similarly for the x, e, u, andn flags (see "Execution Flags-set" in subpart "I. Changing the State of the Shell and theprofile File" and "G. More about Execution Flags").proc [arg ••. ]If proc is marked executable and is not a compiled, executable program, the effect is similarto that of sh proc [args •.. ].An advantage of this form is that proc may be found bythe search procedure described in "B. How the Shell Finds Commands" and "User-definedVariables" in subpart "D. Shell Variables". Also, variables that have been exported inthe shell will still. be exported from procwhen this form is used (because the shell onlyforks to read commands from proc). Thus any changes made within proc to the values ofexported variables will be passed on to subsequent commands invoked from within proc.c. Passing Arguments to the Shell-"shift"When a command line is scanned, any character sequence of the form $n is replaced by the nth argumentto the shell counting the name of the shell procedure itself as $0. This notation permits direct reference tothe procedure name and to as many as nine positional parameters (see "Positional Parameters" in subpart "D.Shell Variables"). Additional arguments can be processed using the shift command or by using a for loop (see"Looping over a List-for" in subpart "D. Control Commands").The shift command shifts arguments to the left; i.e., the value of $1 is thrown away, $2 replaces $1, $3replaces $2, etc .. The highest-numbered positional parameter becomes unset. ($0 is never shifted.) The commandshift n is a shorthand notation for n consecutive shifts. A shift 0 does nothing. For example, considerthe shell procedure ripple below. The echo command writes its arguments to the standard output. The whilecommand is discussed in "Conditional Looping-while and until" in subpart "D. Control Commands". Thelines that begin with # are comments.# ripple commandwhile test $# != 0doecho $1 $2 $3 $4 $5 $6 $7 $8 $9shiftdoneIf the procedure were invoked byit would printripple abcabcbccThe notation $* causes substitution of all positional parameters except $0. Thus, the echo line in the rippleexample above could be written more compactly as:echo $*These two echo commands are not equivalent. The first prints at most nine positional parameters. The secondprints all of the current positional parameters. The $* notation is more concise and less error-prone. OnePage 50


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>obvious application is in passing an arbitrary number of arguments to a command such as the nroff textformatter:nroff -h -rW120 -T450 -cm $*It is important to understand the sequence of actions used by the shell in scanning command lines and substitutingarguments. The shell first reads input up to a new-line or semicolon and then parses that much ofthe input. Variables are replaced by their values and then command substitution (via grave accents) is attempted.I/O redirection arguments are detected, acted upon, and deleted from the command line. Next the shellscans the resulting command line for internal field separators, that is, for any characters specified by IFS tobreak the command line into distinct arguments. Any explicit null arguments (specified by "" or' ') are retained,while implicit null arguments resulting from evaluation of variables that are null or not set are removed.Then file name generation occurs, with all metacharacters being expanded. The resulting command line is executedby the shell.Sometimes, one builds command lines inside a shell procedure. In this case one might want to have theshell rescan the command line after all the initial substitutions and expansions are done. The special commandeval is available for this purpose. The eval command takes a command line as its argument and simply rescansthe line performing any variable or command substitutions that are specified. Consider the following (simplified)situation:command=whooutput=' I wc -1'eval $command $outputThis segment of code results in the pipeline who I we -I being executed.The output of eval cannot be redirected. The uses of eval can, however, be nested.D. Control CommandsThe shell provides several flow-of-control commands that are useful in creating shell procedures. To explainthem, we first need a few definitions.A simple command is defined in "A. Commands". Input/Output redirection arguments can appear in asimple command line and are passed to the shell, not to the command.A command is a simple command or any of the shell control commands described below. A pipeline isa sequence of one or more commands separated by I. (For historical reasons, A is a synonym for I in this context.)The standard output of each command but the last in a pipeline is connected [by a pipe(2)] to the standard inputof the next command. Each command in a pipeline is run separately. The shell waits for the last command tofinish. The exit status of a pipeline is nonzero if the exit status of either the first or last process in the pipelineis nonzero. (This is a bit weird and may be changed in the future.)A command list is a sequence of one or more pipelines separated by;, &, &&, or II, and optionally terminatedby ; or &. A semicolon (;) causes sequential execution of the previous pipeline (i.e., the shell waits forthe pipeline to finish before reading the next pipeline), while & causes asynchronous execution of the precedingpipeline. Both sequential and asynchronous execution are thus allowed. An asynchronous pipeline continuesexecution until it terminates voluntarily or until its processes are killed. Figure 2.2 shows the actions of theshell involved in executing these two command lists:who> log; datewho >log& date&For the first command list in Fig. 2.2, the shell executes who, waits for it to terminate, then executes dateand waits for it to terminate. For the second command list in Fig. 2.2, the shell invokes both commands in orderbut does not wait for either one to finish.Page 51


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82SH FORK WAIT FORK WAITI- "( ASLEEP)- I- -(ASLEEP) ~2,3 EXECDATE 1------1SH FORK FORK (FREE TO DO OTHER COMMANDS)32WHODATEEXITEXITFig. 2.2-The Shell Executing Typical Command ListsMore typical uses of & include off-line printing, background compilation, and generation of jobs to be sentto other computers. For example, if you typenohup cc prog.c&you may continue working while the C compiler runs in the background. A command line ending with & is immuneto interrupts and quits, but it is wise to make it immune to hang-ups as well. The nohup command isused for this purpose. Without nohup, if you hang up while cc in the above example is still executing, cc willbe killed and your output will disappear.Note: The & operator should be used with restraint, especially on heavily-loaded systems. Other userswill not consider you a good citizen if you start up a large number of simultaneous, asynchronous processeswithout a compelling reason for doing so.The && and II operators, which are of equal precedence (but lower than & and I), cause conditional executionof pipelines. In cmdl II cmd2, cmdl is executed and its exit status examined. Only if cmdl fails (Le., has anonzero exit, status) is cmd2 executed. This is thus a more terse notation for:if cmdltest $? != 0thencmd2fiSee writemail in part "EXAMPLES OF SHELL PROCEDURES" for an example of use of I LThe && operator yields the complementary test: in cmdl && cmd2, the second command is executed onlyif the first succeeds (has a zero exit status). In the sequence below, each command is executed in order untilone fails:cmdl && cmd2 && cmd3 && ... && cmdnPage 52


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>A simple command in a pipeline may be replaced by a command list enclosed in either parentheses or braces.The output of all the commands so enclosed is combined into one stream that becomes the input to the next commandin the pipeline. The following line prints two separate documents in a way similar to that shown in a previousexample (see "G. Command Lines and Pipelines"):{ nroff -cm text1; nroff -cm text2; } I colI greek -ThpSee "Command Grouping-Parentheses and Braces" in subpart "D. Control Commands" for further details oncommand grouping.All of the following commands are formally described in sh(l).is:Structured Con dition al-&iir': The shell provides an if command. The simplest form of the if commandif command listthencommand listfiThe command list following if is executed. If the last command in this list has a zero exit status, then the commandlist that follows then is executed. The fi indicates the end of the if command.In order to cause an alternative set of commands to be executed in the case where the command list followingif has a nonzero exit status, one may add an else clause to the form given above. This results in the followingstructure:if command listthencommand listelsecommand listfiMultiple tests can be achieved in an if command by using the elif clause. For example:if test -f" $1" # is $1 a file?thenpr $1elif test -d " $1" # else, is $1 a directory?then(cd $1; pr *)elseecho $1 is neither a file nor a directoryfiThe above example is executed as follows. If the value of the first positional parameter is a file name, then printthat file. If not, then check to see if it is the name of a directory. If so, change to that directory and print allthe files there. Otherwise, echo the error message.The if command may be nested (but be sure to end each one with an fi). The new-lines in the above examplesof if may be replaced by semicolons.The exit status of the if command is the exit status of the last command executed in any then clause orelse clause. If no such command was executed, if returns a zero exit status.Page 53


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Multiway Branch-"case": A multiple way branch is provided by the case command. The basic formatof case is:case string inpattern) command list ;;pattern) command list ;;esacThe shell tries to match string against each pattern in turn, using the same pattern-matching conventions asin file-name generation ("C. Generation of Argument Lists"). If a match is found, the command list followingthe matched pattern is executed. The ;; serves as a break out of the case and is required after each commandlist except the last. Note that only one pattern is ever matched and that matches are attempted in order, so thatif * is the first pattern in as case, no other patterns will ever be looked at.More than one pattern may be associated with a given command list by specifying alternate patterns separatedby I. For example:case $i in*.c)esac*.hl*.sh)*)cc $i.." # do nothing..echo"" $i of unknown type".."In the above example, no action is taken for the second set of patterns because the null command is specified.The * is used as a default pattern because it matches any word.The exit status of case is the exit status of the last command executed in the case command. If no commandswere executed, then case has a zero exit status.Conditional Looping-"while" and "until": A while command has the general form:while command listdocommand listdoneThe commands in the first command list are executed; and if the exit status of the last command in that listis zero, then the commands in the second list are executed. This sequence is repeated as long as the exit statusof the first command list is zero. A loop can be executed as long as the first command list returns a nonzeroexit status by replacing while with until.Any new-line in the above example may be replaced by a semicolon. The exit status of a while (until) commandis the exit status of the last command executed in the second command list. If no such command is executed,while (until) has exit status zero.Looping over a List-"for":Often, one wishes to perform some set of operations for each in a set of filesor execute some command once for each of several arguments. The for command can be used to accomplish this.Page 54


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>The for command has the format:for variable in word listdocommand listdonewhere word list is a list of strings separated by blanks. The commands in the command list are executed oncefor each word in word list Variable takes on as its value each word from word list, in turn, word list is fixedafter it is evaluated the first time. For example, the following for loop will cause each of the C source files xec.c,cmd.c, and word.c in the current directory to be diffed with a file of the same name in the directorylusrlsrclcmdlslr.for cfile in xec cmd worddodiff $cfile.c /usr/src/cmd/sh/$cfile.cdoneOne can omit the "in word lise' part of a for command. This will cause the current set of positional parametersto be used in place of word list. This is very convenient when one wishes to write a command that performsthe same set of commands for each of an unknown number of arguments. See null in part "Examples of ShellProcedures" for an example of this feature.Loop Control-"break" and "continue": The break command can be used to terminate execution ofa while, until,.or a for loop. The continue command requests the execution of the next iteration of the loop.These commands are effective only when they appear between do and done.The break command terminates execution of the smallest (Le., innermost) enclosing loop causing executionto resume after the nearest following unmatched done. Exit from n levels is obtained by break n.The continue command causes execution to resume at the nearest enclosing while, until, or for, i.e., theone that begins the innermost loop containing the continue. One can also specify an argument n to continueand execution will resume at the nth enclosing loop:# This procedure is interactive; 'break' and 'continue'# commands are used to allow the user to control data entry.while truedodoneecho " Please en ter data"read responsecase"$response" in" done" ) break..# no more dataesac"" " ) continue.."*)process the data here.."End-of-file and "exit": When the shell reaches end-of-file, it terminates execution, returning to its parentthe exit status of the last command executed prior to the end-of-file. The exit command simply reads tothe end-of-file and returns, setting the exit status to the value of its argument, if any. Thus, a procedure canbe terminated "normally" by using exit o.Command Grouping-Parentheses/Braces: There are two methods for grouping commands in theshell. As mentioned in "Cd" in subpart "I. Changing the State of the Shell and the .profile File', parenthesesPage 55


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82o cause the shell to spawn a subshell that reads the enclosed commands. Both the right and left parenthesesare recognized wherever they appear in a command line. The left and right parentheses can appear as literalparentheses only by being quoted. For example, if you type garble(stuff), the shell interprets this as fourseparate words: garble, (, stuff, and ).This subshell capability is useful if one wishes to perform some operations without affecting the values ofvariables in the current shell or to temporarily change directory and execute some commands in the new directorywithout having to explicitly return to the current directory. The current environment is passed to thesubshell and variables that are exported in the current shell are also exported in the subs hell. Thus:andcurrent='pwd'; cd /usr/docs/sh_tut;nohup mm -TIp sc_? llpr& cd $current(cd /usr/docs/sh_tut; nohup mm -TIp sc_? llpr&)accomplish the same result. Both examples are used to print a copy of a document on the line printer. However,the second example automatically puts you back in your original working directory. In the second exampleabove, blanks or new-lines surrounding the parentheses are allowed but not necessary. The shell will promptwith $PS2 is expected. See also the example in "Cd" in subpart "I. Changing the State of the Shell and the.profile File".Braces {} may also be used to group commands together. See "User-defined Variables" in subpart "D. ShellVariables" and "G. Conditional Substitution" for other meanings of braces in the shell. Both the left and theright brace are recognized only if they appear as the first (unquoted) word of a command. The opening brace{ may be followed by a new-line (in which case the shell will prompt for more input). Unlike the case involvingparentheses, no subshell is spawned for braces. The enclosed commands are simply read by the shell. The bracesare convenient when you wish to use the (sequential) output of several commands as input to one command.See the last example in "D. Control Commands".The exit status of a set of commands grouped by either parentheses or braces is the exit status of the lastenclosed executed command.110 Redirection and Control Commands: The shell normally does not fork when it recognizes thecontrol commands (other than parentheses) described above. However, each command in a pipeline is run asa separate process in order to direct input (output) to (from) each command. Also, when redirection of input!output is specified explicitly for a control command, a separate process is spawned to execute that command.Thus, when if, while, until, case, or for is used in a pipeline consisting of more than one command, the shellforks and a subshell runs the control command. This has certain implications. The most noticeable one is thatany changes made to variables within the control command are not effective once that control command finishes(similar to the effect of using parentheses to group commands). The control commands run slightly slower whenredirection is specified.Beginners should skip to "E. Special Shell Commands" on first reading.In-line Input Documents: Upon seeing a command line of the form:command < < eofstringwhere eofstring is any arbitrary string, the shell will take the subsequent lines as the standard input of commanduntil a line is read consisting only of eofstring(possibly preceded by one or more tab characters). By appendinga minus (-) to <


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>The shell creates a temporary file containing the input document and performs variable and command substitution("Command Substitution" in subpart "D. Shell Variables") on its contents before passing it to the command.Pattern matching on file names is performed on the arguments of command lines in commandsubstitutions. In order to prohibit all substitutions, one may quote any character of eofstring:command < < \ eofstringTypically eofstring consists of a single chararcter like! which is often used for this purpose.The in-line input document feature is especially useful for small amounts of input data (e.g., an editor"script"), where it is more convenient to place the data in the shell procedure than to keep it in a separate file.For instance, one could type:cat «- xyzThis message will be printed on theterminal with leading tabs removed.xyzThis in-line input document feature is most useful in shell procedures. See edfind, edlast, and mmt in part"EXAMPLES OF SHELL PROCEDURES". Note that in-line input documents may not appear within grave accents.This is an implementation bug that may be changed in the future.Transfer to Another File and Back via Dot (.J: A command line of the form. proccauses the shell to read commands from proc without spawning a new process. Changes made to variables inproc are in effect after the dot command finishes. This is thus a good way to gather a number of shell variableinitializations into one file. Note that an exit command in a file executed in this manner will cause an exit fromyour current shell. If you are at login level, you will be logged out.Interrupt Handling-~~trap": As noted in "B. UNIX System Processes", a program may choose to catchan interrupt from the terminal, ignore it completely, or be terminated by it. Shell procedures can use the trapcommand to obtain the same effects.trap arg signal-listis the form of the trap command, where arg is a string to be interpreted as a command list and signal-list consistsof one or more signal numbers [as described in signal(2)]. The commands in argare scanned at least oncewhen the shell first encounters the trap command. Because of this, it is usually wise to use single rather thandouble quotes to surround these commands. The single quotes inhibit immediate command and variable substitution.This becomes important, for instance, when one wishes to remove temporary files and the names of thosefiles have not yet been determined when the trap command is first read by the shell. The following procedurewill print the name of the current directory on the file errdirect when it is interrupted, thus giving the userinformation as to how much of the job was done:-,trap 'echo 'pwd' >errdirect' 2 3 15for i in Ibin lusr/bin lusr/gas/bindocd $icommands to be executed in directory $i heredonewhile the same procedure with double (rather than single) quotes trap "echo'pwd'>errdirect" 2 3 15 will,instead, print the name of the directory from which the procedure was executed.Page 57


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Signal 11 (SEGMENTATION VIOLATION) may never be trapped because the shell itself needs to catchit to deal with memory allocation. Zero is not a UNIX system signal but is effectively interpreted by the trapcommand as a signal generated by exiting from a shell (either via an exit command or by "falling through"the end of a procedure). If argis not specified, then the action taken upon receipt of any of the signals in signallistis reset to the default system action. If argis an explicit null string (" or" "), then the signals in signal-listare ignored by the shell.! The most frequent use of trap is to assure removal of temporary files upon termination of a procedure. Thesecond example of "Predefined Special Variables" in subpart "D. Shell Variables" would be written more typicallyas follows:temp=$HOME/temp/$$trap 'rm $temp; trap 0; exit' 0 1 2 3 15Is > $tempcommands, some of which use $temp, go hereIn this example whenever signals 1 (HANGUP), 2 (INTERRUPT), 3 (QUIT), or 15 (SOFTWARE TERMINA­TION) are received by the shell procedure or whenever the shell procedure is about to exit, the commands enclosedbetween the single quotes will be executed. The exit command must be included or else the shellcontinues reading commands where it left off when the signal was received. The trap 0 turns off the originaltrap on exits from the shell so that the exit command does not reactivate the execution of the trap commands.Sometimes it is useful to take advantage of the fact that the shell continues reading commands after executingthe trap commands. The following procedure takes each directory in the current directory, changes toit, prompts with its name, and executes commands typed at the terminal until an end-of-file (control-d) or aninterrupt is received. An end-of-file causes the read commandto return a nonzero exit status, thus terminatingthe while loop and restarting the cycle for the next directory. The entire procedure is terminated if interruptedwhen waiting for input; but during the execution of a command, an interrupt terminates only that command:dir='pwd'for i in *dodoneif test -d $dir/$ithencd $dir/$iwhile echo" $i:"trap exit 2read xdotrap: 2eval $xdonefi# ignore interruptsSeveral traps may be in effect at the same time. If multiple signals are received simultaneously, they areserviced in ascending order. To check what traps are currently set, type:trapIt is important to understand some things about the way in which the shell implements the trap commandin order not to be surprised. When a signal (other than 11) is received by the shell, it is passed on to whateverPage 58


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>child processes are currently executing. When those (synchronous) processes terminate, normally or abnormally,the shell then polls any traps that happen to be set and executes the appropriate trap commands. This processis straightforward except in the case of traps set at the command (outermost or login) level. In this case,it is possible that no child process is running, so the shell waits for the termination of the first process spawnedafter the signal is received before it polls the traps.For internal commands, the shell normally polls traps on completion of the command. An exception to thisrule is made for the read command, for which traps are serviced immediately, so that read can be interruptedwhile waiting for input.E. Special Shell CommandsThere are several special commands that are internal to the shell (some of which have already been mentioned).These commands should be used in preference to other UNIX system commands whenever possible becausethey are faster and more efficient. The shell does not fork to execute these commands, so no additionalprocesses are spawned. The trade-off for this efficiency is that redirection of input/output is not allowed formost of these special commands.Several of the special commands have already been described in "D. Control Commands" because they affectthe flow of control. They are break, continue, exit, dot (.), and trap. The set command described in "PositionalParameters" in subpart "D. Shell Variables" and "Execution Flags-set" in subpart "I. Changing theState of the Shell and the .profile File" is also a special command. Descriptions of the remaining special commands[see sh(l)] are given here:The null command does nothing. The exit status is zero (true). Beware: any arguments tothe null command are parsed for syntactic correctness; when in doubt, quote such arguments.Parameter substitution takes place just as in other commands.cd argexec arg ...newgrp arg ...read var ...readonly var ...testMake argthe current directory. If argdoes not begin with I, ./, or . ./, cd uses the CDPATHshell variable ("User-defined Variables" in subpart "D. Shell Variables") to locate a parentdirectory that contains the directory argo If arg is not a directory or the user is notauthorized to access it, a nonzero exit status is returned. Specifying cd with no argis equivalentto typing cd $HOME.If argis a command, then the shell executes it without forking. No new process is created.Input/ output redirection arguments are allowed on the command line. If only input/output redirection arguments appear, then the input/output of the shell itself is modifiedaccordingly. See merge in part "Examples of Shell Procedures" for an example of this useof exec.The newgrp(l) command is executed replacing the shell. The newgrp command in turnspawns a new shell. Beware: Only variables in the environment will be known in the shellthat is spawned by the newgrp command. Any variables that were exported will no longerbe marked as such.One line (up to a new-line) is read from standard input and the first word is assigned tothe first variable, the second word to the second variable, etc. All leftover words are assignedto the last variable. The exit status of read is zero unless an end-of-file is read.The specified variables are made readonly so that no subsequent assignments may bemade to them. If no arguments are given, a list of all readonly and of all exportedvariables is given.A conditional expression is evaluated. More details are given in "A. ConditionalEvaluation-test".Page 59


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82timesumask nnnulimit nwait nThe accumulated user and system times for processes run from the current shell are printed.The user file creation mask is set to nnn. See umask(2) for details. If nnn is omitted, thenthe current value of the mask is printed.This command imposes a limit of n blocks on the size of files written by the shell and itschild processes (files of any size may be read). If n is omitted, the current value of this limitis printed. The default value for n varies from one installation to another.The shell waits for the child process whose process number is n to terminate. The exit statusof the wait command is that of the process waited on. If n is omitted or is not a childof the current shell, then all currently active processes are waited for and the return codeof the wait command is zero.F. Creation and Organization of Shell ProceduresA shell procedure can be created in two simple steps:1. Build an ordinary text file.2. Change the file's mode to make it executable.Changing the mode allows a shell procedure to be invoked by proc args rather than by sh proc args. The secondstep may be omitted for a procedure to be used once or twice and then discarded but is recommended for longerlivedones. Here is the entire input needed to set up a simple procedure (the executable part of draft in part"Examples of Shell Procedures"):ed .anroff -rC3 -T450-12 -cm $*w draftqchmod +x draftIt may then be invoked as draft filel file2. Note that shell procedures must always be at least readable sothat the shell itself can read commands from the file.If draft were thus created in a directory whose name appears in the user's PATHvariable, the user couldchange working directories and still invoke the draft command.Shell procedures may be created dynamically. A procedure may generate a file of commands, invoke anotherinstance of the shell to execute that file, and then remove it. An alternate approach is that of using the dotcommand (.) to make the current shell read commands from the new file, allowing use of existing shellvariables, and avoiding the spawning of an additional process for another shell. .Many users prefer to write shell procedures instead of C programs. First, it is easy to create and maintaina shell procedure because it is only a file of ordinary text. Second, it has no corresponding object program thatmust be generated and maintained. Third, it is easy to create a procedure on the fly, use it a few times, and thenremove it. Finally, because shell procedures are usually short in length, written in a high-level programminglanguage, and kept only in their source-language form, they are generally easy to find, understand, and modify.By convention, directories that contain only commands and/or shell procedures are usually named bin. Mostgroups of users sharing common interests have one or more bin directories set up to hold common procedures.Page 60


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Some users have their PATH variable list several such directories. Although you can have a number of suchdirectories, it is unwise to go overboard. It may become difficult to keep track of your environment, and efficiencymay suffer (see "C. Efficient Organization").G. More about Execution FlagsThere are several execution flags available in the shell that can be useful in shell procedures:-e-u-t-n-kThe shell will exit immediately if any command that it executes exits with a nonzero exitstatus.When this flag is set, the shell treats the use of an unset variable as an error. This flagcan be used to perform a global check on variables.The shell exits after reading and executing the commands on the remainder of the currentinput line.This is a don't execute flag. On occasion, one may want to check a procedure for syntaxerrors but not to execute the commands in the procedure. Writing set -nv at the beginningof the file will accomplish this.All arguments of the form variable= value are treated as keyword parameters. When thisflag is not set, only such arguments that appear before the command name are treated askeyword parameters.MISCELLANEOUS SUPPORTING COMMANDS AND FEATURESShell procedures can make use of any UNIX system command. The commands described in this part areeither used especially frequently in shell procedures or are explicitly designed for such use. More detailed descriptionsof each of these commands can be found in Section 1 of the UNIX System User's Manual.A. Conditional Evaluation-"test"The test(l) command evaluates the expression specified by its arguments and, if the expression is true, returnsa zero exit status. Otherwise, a nonzero (false) exit status is returned. The test command also returnsa nonzero exit status if it has no arguments. Often it is convenient to use the test command as the first commandin the command list following an if or a while. Shell variables used in test expressions should be enclosedin double quotes if there is any chance of their being null or not set.On some UNIX systems, the square brackets ([]) may be used as an alias for test; e.g., [expression] has thesame effect as test expression.The following is a partial list of the primaries that can be used to construct a conditional expression:-r file true if the named file exists and is readable by the user.-w file true if the named file exists and is writable by the user.-x file true if the named file exists and is executable by the user.-s file true if the named file exists and has a size greater than zero.-d file true if the named file exists and is a directory.-f file true if the named file exists and is an ordinary file.Page 61


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82-p file-z sl-n sl-t fildessl = s2sl != s2sln1 -eq n2true if the named file exists and is a named pipe (fifo).true if the length of string" sl" is zero.true if the length of the string" sl" is nonzero.true if the open file whose file descriptor number is fildes is associated with a terminaldevice. If fildes is not specified, file descriptor 1 is used by default.true if strings" sl" and" s2" are identical.true if strings" sl" and" s2" are not identical.true if "sl" is not the null string.true if the integers nl and n2 are algebraically equal. Other algebraic comparisons are indicatedby -ne, -gt, -ge, -It, and -Ie.These primaries may be combined with the following operators:unary negation operator.-a-0( expr)binary logical and operator.binary logical or operator. The -0 has lower precedence than -a.parentheses for grouping; they must be escaped to remove their significance to the shell.When parentheses are absent the evaluation proceeds from left to right.Note that all primaries, operators, file names, etc. are separate arguments to test.B. Reading a Line-"line"The line(l) command takes one line from standard input and prints it on standard output. This is usefulwhen you need to read a line from a file or capture the line in a variable. The functions of line and of the readcommand that is internal to the shell differ in that input/output redirection is possible only with line. If theuser does not require input/output redirection, read is faster and more efficient. An example of a usage of linefor which read would not suffice is:firstline='line < somefile'c. Simple Output-"echo"The echo(l) command, invoked as echo [ arg ... ], copies its arguments to the standard output, each followedby a single space except for the last argument which is normally followed by a new-line. Often, echo is usedto prompt the user for input to issue diagnostics in shell procedures or to add a few lines to an output streamin the middle of a pipeline. Another use is to verify the argument list generation process before issuing a commandthat does something drastic. The command Is is often replaced by echo * because the latter is faster andprints fewer lines of output.The echo command recognizes several escape sequences. A \n yields a new-line character. A \c removesthe new-line from the end of the echoed line. The following prompts the user, allowing one to type on the sameline as the prompt:echo 'enter name:\c'read namePage 62


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>The echo command also recognizes octal escape sequences for all characters whether printable or not. An echo"\007" typed at a terminal will cause the bell on that terminal to ring.D. Expression Evaluation-"expr"The expr(1) command provides arithmetic and logical operations on integers and some pattern matchingfacilities on its arguments. It evaluates a single expression and writes the result on the standard output. Theexpr command can be used inside grave accents to set a variable. Typical examples are:# increment $aa='expr $a + l'# put third through last characters of# $1 into substringsubstring='expr" $1" : ' .. \('*\)"# obtain length of $1c='expr "$1" : '.*"The most common uses of expr are in counting iterations of a loop and in using its pattern matching capabilityto pick apart strings. See expr(1) for more details.E. "true" and "false"The true(1) and false [see true(1)] commands perform the obvious functions of exiting with zero and nonzeroexit status, respectively. The true command is often used to implement an unconditional loop.F. Input/Output Redirection Using File DescriptorsBeginners should skip this subpart on first reading. A command occasionally directs output to some file associatedwith a file descriptor other than 1 or 2 (see "Diagnostic and Other Outputs" in subpart "F. Redirectionof Input and Standard Output"). In languages such as C, one can associate output with any file descriptor byusing the write(2) system call. The shell provides its own mechanism for creating an output file associatedwith a particular file descriptor. By typingfdl>&fd2where fdl and fd2 are valid file descriptors, one can direct output that would normally be associated with filedescriptor fdl onto the file associated with fd2. The default value for fdl and fd2 is 1. If, at execution time, nofile is associated with fd2, then the redirection is void. The most common use of this mechanism is that of directingstandard error output to the same file as standard output. This is accomplished by typingcommand 2>&1If one wanted to redirect both standard output and standard error output to the same file, one would typecommand 1> file 2>&1The order here is significant. First, file descriptor 1 is associated with file. Then file descriptor 2 is associatedwith the same file that is currently associated with file descriptor 1. If the order of the redirections were reversed,standard error output would go to the terminal, and standard output would go to file because at the timeof the error output redirection file descriptor 1 still would have been associated with the terminal.This mechanism can also be generalized to the redirection of standard input. One could typefda


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82to cause both file descriptors fda and fdb to be associated with the same input file. If fda or fdb is not specified,file descriptor 0 is assumed. Such input redirection is useful for commands that use two or more input sources.Another use of this notation is for sequential reading and processing of a file. See merge in part "EXAMPLESOF SHELL PROCEDURES" for an example of use of this feature.G. Conditional SubstitutionNormally, the shell replaces occurrences of $ variable by the string value assigned to variable, if any. However,there exists a special notation to allow conditional substitution depending upon whether the variable isset and/or not null. By definition, a variable is set if it has ever been assigned a value. The value of a variablecan be the null string which may be assigned to a variable in anyone of the following ways:A=bcd=" "Ef~="set' '"''The first three of these examples assign the null string to each of the corresponding shell variables. The lastexample sets the first and second positional parameters to the null string and unsets all other positional parameters.The following conditional expressions depend upon whether a variable is set and not null. (Note that, inthese expressions, variable refers to either a digit or a variable name and the meaning of braces differs fromthat described in "User-defined Variables" in subpart "D. Shell Variables" and "Command Grouping-Parenthesesand Braces" in subpart "D. Control Commands".)${ variable :-stringj${variable :=stringj${ variable :? stringjIf variable is set and is non-null, then substitute the value $ variable in place of this expression.Otherwise, replace the expression with string. Note that the value of variableis not changed by the evaluation of this expression.If variable is set and is non-null, then substitute the value $ variable in place of this expression.Otherwise, set variable to' string, and then substitute the value $ variable inplace of this expression. Positional parameters may not be assigned values in this fashion.If variable is set and is non-null, then substitute the value of variable for the expression.Otherwise, print a message of the formvariable:stringand exit from the current shell. (If the shell is the login shell, it is not exited.) If stringis omitted in this form, then the messagevariable:parameter null or not setis printed instead.${ variable :+ stringjIf variable is set and is non-null, then substitute string for this expression; otherwise,substitute the null string. Note that the value of variable is not altered by the evaluationof this expression.Page 64


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>These expressions may also be used without the colon (:), in which case the shell does not check whethervariable is null or not. It only checks whether variable has ever been set.The two examples below illustrate the use of this facility:1. If PATH has ever been set and is not null, then keep its current value. Otherwise, set it to the string:/bin:/usr/bin. Note that one needs an explicit assignment to set PATH in this form:PATH=${PATH:-':/bin:/usr/bin'}2. If HOME is set and is not null, then change directory to it; otherwise, set it to the given value and changedirectory to it. Note that HOME is automatically assigned a value in this case:H. Invocation Flagscd ${HOME:=' /usr/gas'}There are four flags that may be specified on the command line invoking the shell. These flags may notbe turned on via the set command:-i-s-c-rIf this flag is specified or if the shell's input and output are both attached to a terminal,the shell is interactive. In such a shell, INTERRUPT (signal 2) is caught and ignored,while QUIT (signal 3) and SOFTWARE TERMINATION (signal 15) are ignored.If this flag is specified or if no input/output redirection arguments are given, the shellreads commands from standard input. Shell output is written to file descriptor 2. The shellyou get upon logging into the system effectively has the -s flag turned on.When this flag is turned on, the shell reads commands from the first string following theflag. Remaining arguments are ignored. Double quotes should be used to enclose amultiword string in order to allow for variable substitution.When this flag is specified on invocation, then the restricted shell is invoked. This is a versionof the shell in which certain actions are disallowed. In particular, the cd commandproduces an error message, and the user cannot set PATH See sh(l) for a more detaileddescription.EXAMPLES OF SHELL PROCEDURESSome examples in this subpart are quite difficult for beginners. For ease of reference, the examples are arrangedalphabetically by name, rather than by degree of difficulty.copypairs# usage: copypairs file1 file2 ...# copy file1 to file2, file3 to file4, ...while test" $2" !=""docp $1 $2shift; shiftdoneif test" $1" !=""thenecho" $0: odd number of arguments"fiPage 65


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82copy todistinctNote: This procedure illustrates the use of a while loop to process a list of positional parameters thatare somehow related to one another. Here a while loop is much better than a for loop because you canadjust the positional parameters via shift to handlerelated arguments.# usage: copy to dir file ...# copy argument files to 'dir', making sure that at least# two arguments exist and that 'dir' is a directoryif test $# -It 2thenecho" $0: usage: copy to directory file ..."elif test ! -d $1thenecho" $0: $1 is not a directory" ;elsefidir=$l; shiftfor eachfiledocp $eachfile $dirdoneNote: This procedure uses an if command with two tests in order to screen out improper usage. The forloop at the end of the procedure loops over all of the arguments to copy to but the first. The original $1is shifted off.# usage: distinct# reads standard input and reports list of alphanumeric strings# that differ only in case, giving lowercase form of eachtr -cs '[A-Z][a-z][0-9]' '[\012*]' I sort -u Itr '[A-ZJ' '[a-z]' I sort I uniq -dNote: This procedure is an example of the kind of process that is created by the left-to-right constructionof a long pipeline. It may not be immediately obvious how this works. [See tr(I), sort(l), and uniq(l)if you are completely unfamiliar with these commands.] The tr translates all characters except letters anddigits into new-line characters and then squeezes out repeated new-line characters. This leaves each string(in this case, any contiguous sequence of letters and digits) on a separate line. The sort command sortsthe lines and emits only one line from any sequence of one or more repeated lines. The next tr convertseverything to lowercase, so that identifiers differing only in case become identical. The output is sortedagain to bring such duplicates together. The uniq -d prints (once) only those lines that occur more thanonce yielding the desired list.The process of building such a pipeline uses the fact that pipes and files can usually be interchanged. Thetwo lines below are equivalent assuming that sufficient disk space is available:cmd! I cmd2 I cmd3cmdl > tempI; < tempI cmd2 > temp2; < temp2 cmd3; rm temp[12]Page 66


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Starting with a file of test data on the standard input and working from left to right, each command is executedtaking its input from the previous file and putting its output in the next file. The final output is then examinedto make sure that it contains the expected result. The goal is to create a series of transformations that will convertthe input to the desired output. As an exercise, try to mimic distinct with such a step-by-step process usinga file of test data containing:ABC:DEF/DEFABC1 ABCAbc abcAlthough pipelines can give a concise notation for complex processes exercise some restraint lest you succumbto the "one-line syndrome" sometimes found among users of especially concise languages. This syndromeoften yields incomprehensible code.draftedfindedlast# usage: draft file ( s)# prints the draft (-rC3) of a document on a DASI 450# terminal in 12-pitch using memorandum macros (MM).nroff -re3 -T450-12 -cm $*Note: Users often write this kind of procedure for convenience in dealing with commands that requirethe use of many distinct flags that cannot be given default values that are reasonable for all (or even most)users.# usage: edfind file arg# find the last occurrence in 'file' of a line whose# beginning matches 'arg', then print 3 lines (the one# before, the line itself, and the one after)ed-$l«!H?A$2?;-,+p!Note: This procedure illustrates the practice of using editor (ed) in-line input scripts into which theshell can substitute the values of variables. It is a good idea to turn on the H option of ed when embeddingan ed script in a shell procedure [see ed(l)].# usage: edlast file# prints the last line of file, then deletes that lineed - $1 < < - \ eof # no variable substitutions in "ed" scriptH$p$dwqeofecho Done.Page 67


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82[splitNote: This procedure contains an in-line input document or script (see "In-line Input Documents" insubpart "D. Control Commands"); it also illustrates the effect of inhibiting substitution by escaping acharacter in the eofstring (here, eof) of the input redirection. If this had not been done, $p and $d wouldhave been treated as shell variables.# usage: fsplit filel file2# read standard input and divide it into three parts:# append any line containing at least one letter# to filel, any line containing at least one digit# but no letters to file2, and throw the rest awaytotal =0 lost=Owhile read nextdototal=" 'expr $total + l' "case " $next" in*[A-Za-z]*)echo " $next" » $1;;*[0-9]*)echo " $next" » $2;;*)lost=" 'expr $lost + l' "esacdoneecho "$totallines read, $lost thrown away"Note: In this procedure, each iteration of the while loop reads a line from the input and analyzes it.The loop terminates only when read encounters an end-of-file.Do not use the shell to read a line at a time unless you must-it can be grotesquely slow (see "Number ofProcesses Generated" in subpart "B. Approximate Measures of Resource Consumption").initvars# usage: . initvars# use carriage return to indicate" no change"echo" initializations? \c"read responseif test" $response" = ythenecho" PSl=\c" ; read tempPSI =${temp:-$PSl}echo" PS2=\ c " ; read tempPS2=${temp:-$PS2}echo" PATH=\c"; read tempPATH =${temp:-$P ATH}echo" TERM=\c" ; read tempTERM =${temp:-$TERM}fiNote: This procedure would be invoked by a user at the terminal or as part of a .profile file. The assignmentsare effective even when the procedure is finished because the dot command is used to invoke it. ToPage 68


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>mergemkfilesbetter understand the dot command invoke initvars as indicated above and check the values of PS1, PS2,PATH, and TERM; then make initvars executable, type initvars, assigning different values to the threevariables, and check again the values of these three shell variables after initvars terminates. It is assumedthat PS1, PS2, PATH, and TERM have been exported, presumably by your .profile (see "The.profile File" in subpart "I. Changing the State of the Shell and the .profile File" and "A. A Command'sEnvironment").# usage: merge src1 src2 [ dest ]# merge two files, every other line.# the first argument starts off the merge,# excess lines of the longer file are appended to# the end of the resultant fileexec 4$dest II { more=4; break ;}ed - $dest < $destdo :; done# delete the last line of destination# file, because it is blank.# read the remainder of the longer# file-the body of the 'while' loop# does nothing; the work of the loop# is done in the command list following# 'while'Note: This procedure illustrates a technique for reading sequential lines from a file or files without creatingany subshells to do so. When the file descriptor is used to access a file, the effect is that of openingthe file and moving a file pointer along until the end of the file is read. If the input redirections used srcland src2 explicitly rather than the associated file descriptors, this procedure would never terminate becausethe first line of each file would be read over and over again.# usage: mkfiles pref [ quantity]# makes 'quantity' (default = 5) files, named prefl, pref2, ...quantity=${2-5}i=1while test" $i" -Ie" $quantity"do> $1$ii=" 'expr $i + 1'"donePage 69


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82mmtNote: This procedure uses input/output redirection to create zero-length files. The expr command isused for counting iterations of the while loop. Compare this procedure with procedure null below.if test" $#" = 0; then cat < output to terminal-e => preprocess input with eqn-t = > preprocess input with tbl-Tst => output to STARE phototypesetter-T4014-Tvpmanufactured by Honeywell=> output to 4014 manufactured by Tektronix=> output to printer manufactured by Versatec=> use instead of" files" when mmt usedinside a pipeline.Other options as required by TROFF and the MM macros.!exit 1fiPATH='/bin:/usr/bin'; O='-g'; 0=' I gcat -ph';# Assumes typesetter is accessed via gcat(l)# If typesetter is on-line, use 0=' '; 0="while test -n "$1" -a! -r " $1"do#case" $1" in-a)-Tst)esacshift-T4014)-Tvp)-e)-t)-)*)doneif test -z "$1"thenfiif test" $0" =' -g'thenfid=" $*"if test" $d"then, ,O='-a';O='-g';Above line for STARE onlyO='-t';O='-t';e='eqn';;f='tbl';;break;;a=" $a $1";;echo 'mmt: no input file'exit 1x=" -f$l"shiftx=' ,d=' ,0=" .."o='lgcat -st';;o='ltc';;o='lvpr -t';;Page 70


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>fiif test -n "$f"thenf=" tbl $* I"d=' ,fiif test -n "$e"thenif test -n " $f"then e='eqn I 'else e=" eqn $* I "d="fifieval "$f $e troff $0 -cm $a $d $0 $x" ; exit 0Note: This is a slightly simplified version of an actual UNIX system command (although this is not theversion included in UNIX system Release 4.0). It uses many of the features available in the shell. If youcan follow through it without getting lost, you have a good understanding of shell programming. Pay particularattention to the process of building a command line from shell variables and then using eval toexecute it.nullphone# usage: null file# create each of the named files as an empty filefor eachfiledo> $eachfiledoneNote: This procedure uses the fact that output redirection creates the (empty) output file if that filedoes not already exist. Compare this procedure with procedure mkfiles above.# usage: phone initials# prints the phone number(s) of person with given initialsecho 'inits ext home'grep" "'$1" «\!abc 1234def 2234ghi 3342xyz 4567!999-2345583-2245988-1010555-1234Note: This procedure is an example of using an in-line input document or script to maintain a small database.writemail# usage: writemail message user# if user is logged in, write message on terminal;# otherwise, mail it to userecho "$1" I { write "$2" II mail "$2" ;}Page 71


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Note: This procedure illustrates command grouping. The message specified by $1 is piped to the writecommand and, if write fails, to the mail command.EFFECTIVE AND EFFICIENT SHELL <strong>PROGRAMMING</strong>A. Overall ApproachThis subpart outlines strategies for writing efficient shell procedures, i.e., ones that do not waste resourcesunreasonably in accomplishing their purposes. The primary reason for choosing the shell procedure as the implementationmethod is to achieve a desired result at a minimum human cost. Emphasis should always be placedon simplicity, clarity, and readability; but efficiency can also be gained through awareness of a few design strategies.In many cases, an effective redesign of an existing procedure improves its efficiency by reducing its sizeand often increases its comprehensibility. In any case, one should not worry about optimizing shell proceduresunless they are intolerably slow or are known to consume a lot of resources.The same kind of iteration cycle should be applied to shell procedures as to other programs-write code,measure it, and optimize only the few important parts. The user should become familiar with the time commandwhich can be used to measure both entire procedures and parts thereof. Its use is strongly recommended; humanintuition is notoriously unreliable when used to estimate timings of programs even when the style of programmingis a familiar one. Each timing test should be run several times because the results are easily disturbedby, for instance, variations in system load.B. Approximate Measures of Resource ConsumptionNumber of Processes Generated: When large numbers of short commands are executed, the actual executiontime of the commands may well be dominated by the overhead of creating processes. The procedures thatincur significant amounts of such overhead are those that perform much looping arid those that generate commandsequences to be interpreted by another shell.If you are worried about efficiency, it is important to know which commands are currently built into theshell and which are not. Here is the alphabetical list of those that are built-in:break case cd continue eva execexit export for if newgrp readreadonly set shift test times trapulimit umask{...}until wait whileThe ( ... ) command executes as a child process, i.e., the shell does a fork, but no exec. Any command notin the above list requires both fork and exec.The user should always have at least a vague idea of the number of processes generated by a shell procedure.In the bulk of observed procedures, the number of processes spawned (not necessarily simultaneously) canbe described byprocesses = k*n + cwhere k and c are constants, and n is the number of procedure arguments, the number of lines in some inputfile, the number of entries in some directory, or some other obvious quantity. Efficiency improvements are mostcommonly gained by reducing the value of k, sometimes to zero. Any procedure whose complexity measure includesn2 terms or higher powers of n is likely to be intolerably expensive.As an example, here is an analysis of procedure fsplit of part "EXAMPLES OF SHELL PROCEDURES".For each iteration of the loop, there is one expr plus either an echo or another expr. One additional echoPage 72


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>is executed at the end. If n is the number of lines of input, the number of processes is 2*n + 1. On the otherhand, the number of processes in the following (equivalent) procedure is 12 regardless of the number of linesof input:# faster fsplittrap 'rm temp$$; trap 0; exit' 0 1 2 3 15startl =0 start2=0b='[A-Za-z]'cat > temp$$# read standard input into temp file# save original lengths of $1, $2if test -s " $1" ; then startl = 'wc -1 < $1'; fiif test -s " $2" ; then start2= 'wc -1 < $2'; figrep "$b" temp$$ > > $1 # lines with letters onto $1grep -v" $b" temp$$1 grep '[0-9]' > >$2# lines with only numbers onto $2total=" 'wc -1 < temp$$' "endl=" 'wc -1 < $1'"end2=" 'wc -1 < $2'"lost=" "expr $total - \( $endl - $startl \) - \ ($end2 -$start2\)""echo" $totallines read, $lost thrown away"This version is often ten times faster than fsplit, and it is even faster for larger input files.Some types of procedures should not be written using the shell. For example, if one or more processes aregenerated for each character in some file, it is a good indication that the procedure should be rewritten in C.Note: Shell procedures should not be used to scan or build files a character at a time.Number of Data Bytes Accessed: It is worthwhile considering any action that reduces the number ofbytes read or written. This may be important for those procedures whose time is spent passing data aroundamong a few processes rather than in creating large numbers of short processes. Some filters shrink their output,others usually increase it. It always pays to put the shrinkers first when the order is irrelevant. Which ofthe following is likely to be faster?sort file I grep patterngrep pattern file I sortDirectory Searches: Directory searching can consume a great deal of time, especially in those applicationsthat utilize deep directory structures and long pathnames. Judicious use of cd can help shorten longpathnames and thus reduce the number of directory searches needed. As an exercise, try the following commands(on a fairly quiet system):time sh -c 'Is -1 lusr/bin/* >/dev/null'time sh -c 'cd lusr/bin; Is -1 * >/dev/null'If you do not understand exactly what is going on in these examples, read Section 7 in the UNIX System User'sManual.c. Efficient OrganizationDirectory-Search Order and PATH Variable: The PATHvariable is a convenient mechanism for allowingorganization and sharing of procedures. However, it must be used in a sensible fashion; or the result maybe a great increase in system overhead that occurs in a subtle, but avoidable, way.The process of finding a command involves reading every directory included in every pathname that precedesthe needed pathname in the current PATH variable. As an example, consider the effect of invoking nroffPage 73


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ADD ISSUE 1 1/83(Le., lusrlbinlnrofi) when $PATH is :Ibin:lusrlbin. The sequence of directories read is: ., I, Ibin, I, lusr, andlusrlbin, i.e., a total of six directories. A long path list assigned to PATH can increase this number significantly.The vast majority of command executions are of commands found in Ibin and, to a somewhat lesser extent,in lusrlbin. Careless PATHsetup may lead to a great deal of unnecessary searching. The following four examplesare ordered from worst to best (but only with respect to the efficiency of command searches)::1 a1/tf/j tb/bin:/usr Ilbin:/bin:/usr Ibin:/bin:1 all tfl j tb/bin:1 usr Ilbin:1 usr Ibin:/bin:/usr Ibin:1 a1/tf/j tb/bin:1 usr IlbinIbin::1 usr Ibin:1 all tfl j tb/bin:1 usr II binThe first one above should be avoided. The others are acceptable; the choice among them is dictated by therate of change in the set of commands kept in Ibin and lusrlbin.A procedure that is expensive because it invokes many short-lived commands may often be speeded up bysetting the PATH variable inside the procedure such that the fewest possible directories are searched in anoptimum order. The mmt example in part "EXAMPLES OF SHELL PROCEDURES" does this.Setting Up Directories: It is wise to avoid directories that are larger than necessary. You should beaware of several magic sizes. A directory that contains entries for up to 30 files (plus the required. and .. ) fitsin a single disk block and can be searched very efficiently. One that has up to 286 entries is still a small file.Anything larger is usually a disaster when used as a working directory. It is especially important to keep logindirectories small, preferably one block at most. Note that, as a rule, directories never shrink.REFERENCES[I]-Bianchi, M. H., and Wood, J. L. A User's Viewpoint on the Programmer's Workbench. Proc. SecondInt. Conf. on Software Engineering, pp. 193-99 (Oct. 13-15, 1976).[2]-Dolotta, T. A., Haight, R. C., and Mashey, J. R. The Programmer's Workbench. The Bell SystemTechnical Journal, Vol. 57, No.6, Part 2, pp. 2177-200 (July-Aug. 1978).[3]-Dolotta, T. A., and Mashey, J. R. An Introduction to the Programmer's Workbench. Proc. SecondInt. Conf. on Software Engineering, pp. 164-68 (Oct. 13-15, 1976).[4]-Dolotta, T. A., and Mashey, J. R. Using a Command Language as the Primary Programming Tool.In: Beech, D. (ed.), Command Language Directions (Proc. of the Second IFIP Working Conf. on CommandLanguages), pp. 35-55. Amsterdam: North Holland (1980).[5]-Kernighan, B. W., and Mashey, J. R. The UNIX Programming Environment. COMPUTER, Vol. 14,No.4, pp. 12-24 (April 1981); an earlier version of this paper was published in Software-Practice &Experience, Vol. 9, No.1, pp. 1-15 (Jan. 1979).[6]-Kernighan, B. W., and PI auger, P. J. Software Tools. Proc. First Nat. Conf. on Software Engineering,pp. 8-13 (Sept. 11-12, 1975).[7]-Kernighan, B. W., and PI auger, P. J. Software Tools. Reading, MA: Addison-Wesley (1976).[B]-Kernighan, B. W., and Ritchie, D. M. The C Programming Language. Englewood Cliffs, NJ: Prentice­Hall (1978).[9]-Ritchie, D. M., and Thompson, K. The UNIX Time-Sharing System. The Bell System TechnicalJournal, Vol. 57, No.6, Part 2, pp. 1905-29 (July-Aug. 1978).[IO]-Snyder, G. A., and Mashey, J. R. UNIX System Documentation Road Map. Bell Laboratories (January1981).Page 74


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>3. THE C <strong>PROGRAMMING</strong> LANGUAGEINTRODUCTIONThis section describes C language as it is implemented and supported on the 3B20S Processor, the PDP*-II,the VAX*-11/780, the Honeywell 6000, the IBM System/370, and the Interdata 8/32. Where differences exist,this section concentrates on the PDP-II but tries to point out implementation-dependent details. With few exceptions,such dependencies follow directly from the properties of the hardware. The various compilers are generallyquite compatible. This section contains the following subsections:• C LANGUAGE-A summary of the grammar and rules of the C programming language.• LIBRARIES-Descriptions of functions and declarations that support C language and how to usethese functions.• THE "cc" COMMAND-The command used to compile C language programs, assemble assembly languageprograms, and produce executable programs is briefly described in terms of usage.• A C PROGRAM CHECKER-"lint"-A program that attempts to detect bugs in C programs duringcompilation.• A SYMBOLIC DEBUGGER-"sdb" -A symbolic debugging program that is used to debug compiledC language programs.Throughout this section, each reference of the form name(IM), name(7), or name(8) refers to entries inthe UNIX System Administrator's Manual. All other references to entries of the form name(N), where "N"is a number (1 through 6) possibly followed by a letter, refer to entry name in section N of the UNIX SystemUser's Manual.*Trademarks of the Digital Equipment Corporation.Page 7S


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82C LANGUAGELEXICAL CONVENTIONSThere are six classes of tokens-identifiers, keywords, constants, strings, operators, and other separators.Blanks, tabs, new-lines, and comments (collectively, "white space") as described below are ignored except asthey serve to separate tokens. Some white space is required to separate otherwise adjacent identifiers,keywords, and constants.If the input stream has been parsed into tokens up to a given character, the next token is taken to includethe longest string of characters which could possibly constitute a token.A. CommentsThe characters 1* introduce a comment which terminates with the characters *1. Comments do not nest.B. Identifiers (Names)An identifier is a sequence of letters and digits. The first character must be a letter. The underscore (_)counts as a letter. Uppercase and lowercase letters are different. No more than the first eight characters aresignificant, although more may be used. External identifiers, which are used by various assemblers and loaders,are more restricted:PDP-11VAX-IIHoneywell 6000IBM 360/370Interdata 8/32WECo 3B207 characters, 2 cases7 characters, 2 cases6 characters, 1 case7 characters, 1 case8 characters, 2 cases8 characters, 2 casesC. KeywordsThe following identifiers are reserved for use as keywords and may not be used otherwise:auto do float register switchbreak double for return typedefcase else goto short unionchar entry if sizeof unsignedcontinue enum int static voiddefault extern long struct whileThe entry keyword is not currently implemented by any compiler but is reserved for future use. Someimplementations also reserve the words fortran and asm .D. ConstantsThere are several kinds of constants. Some of the more important constants are integer, long, character,floating, and enumeration. Hardware characteristics that affect sizes are summarized in "F. Hardware Characteristics"under part "LEXICAL CONVENTIONS".Integer ConstantsAn integer constant consisting of a sequence of digits is taken to be octal if it begins with 0 (digit zero),decimal otherwise. A sequence of digits preceded by Ox or OX (digit zero) is taken to be a hexadecimal integer.Page 76


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>The hexadecimal digits include a or A through for F with values 10 through 15. A decimal constant whose valueexceeds the largest signed machine integer is taken to be long; an octal or hex constant which exceeds the largestunsigned machine integer is likewise taken to be long.Explicit Long ConstantsA decimal, octal, or hexadecimal integer constant immediately followed by I (letter ell) or L is a long constant.As discussed below, on some machines integer and long values may be considered identical.Character ConstantsA character constant is a character enclosed in single quotes, as in 'x'. The value of a character constantis the numerical value of the character in the machine's character set.Certain nongraphic characters, the single quote (') and the backslash (\), may be represented according tothe following table of escape sequences:new-line NL (LF) \nhorizontal tab HTvertical tab VT\t\vbackspace BS \bcarriage return CR \rform feed FF \fbacks lash \,\\single quote\'bit pattern ddd \dddThe escape \ddd consists of the backslash followed by 1, 2, or 3 octal digits which are taken to specify the valueof the desired character. A special case of this construction is \0 (not followed by a digit), which indicates thecharacter NUL. If the character following a backslash is not one of those specified, the backslash is ignored.Floating ConstantsA floating constant consists of an integer part, a decimal point, a fraction part, an e or E, and an optionallysigned integer exponent. The integer and fraction parts both consist of a sequence of digits. Either the integerpart or the fraction part (not both) may be missing. Either the decimal point or the e and the exponent (notboth) may be missing. Every floating constant is taken to be double precision.Enumeration ConstantsNames declared as enumerators (see "E. Structure, Union, and Enumeration Declarations" in part "DEC­LARATIONS") are constants of the corresponding enumeration type. They behave like int constants.E. StringsA string is a sequence of characters surrounded by double quotes, as in " ... " . A string has type "array ofcharacters" and storage class static (see part "NAMES") and is initialized with the given characters. All strings,even when written identically, are distinct. The compiler places a null byte (\0) at the end of each string sothat programs which scan the string can find its end. In a string, the double quote character (" ) must be precededby a \; in addition, the same escapes as described for character constants may be used. Finally, a \ andthe immediately following new-line are ignored.F. Hardware CharacteristicsThe following table summarizes certain hardware properties that vary from machine to machine.Page 77


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82TABLE 3.AHARDWARE CHARACTERISTICSDEC PDP-ll DEC VAX-ll HONEYWELL 6000 IBM 370 INTERDATA 8/32 WECO 3BASCII ASCII ASCII EBCDIC ASCII ASCIIchar 8 bits 8 bits 9 bits 8 bits 8 bits 8 bitsint 16 32 36 32 32 32short 16 16 36 16 16 16long 32 32 36 32 32 32float 32 32 36 32 32 32double 64 64 72 64 64 64float range ±10±38 ±10±38 ±10±38 ±10±76 ±10±76 ±10±38double range ±10±38 ±10±38 ±10±38 ±10±76 ±10±76 ±10±308SYNTAX NOTATIONSyntactic categories are indicated by italic type and literal words and characters in bold type. Alternativecategories are listed on separate lines. An optional terminal or nonterminal symbol is indicated by the subscript"opt," so that{ expression opt }Iindicates an optional expression enclosed in braces. The syntax is summarized in part "SYNTAX SUMMARY".NAMESThe C language bases the interpretation of an identifier upon two attributes of the identifier-its storageclass and its type. The storage class determines the location and lifetime of the storage associated with an identifier;the type determines the meaning of the values found in the identifier's storage.There are four declarable storage classes:• automatic• static• external• register.Automatic variables are local to each invocation of a block (see "B. Compound Statement or Block" in part"STATEMENTS") and are discarded upon exit from the block. Static variables are local to a block but retaintheir values upon reentry to a block even after control has left the block. External variables exist and retaintheir values throughout the execution of the entire program and may be used for communication between functions,even separately compiled functions. Register variables are (if possible) stored in the fast registers of themachine; like automatic variables they are local to each block and disappear on exit from the block.The C language supports several fundamental types of objects. Objects declared as characters (char) arelarge enough to store any member of the implementation's character set. If a genuine character from that characterset is stored in a character variable, its value is equivalent to the integer code for that character. Otherquantities may be stored into character variables, but the implementation is machine dependent.Page 78


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Up to three sizes of integer, declared short int, int, and long int, are available. Longer integers provideno less storage than shorter ones, but the implementation may make either short integers or long integers, orboth, equivalent to plain integers. "Plain" integers have the natural size suggested by the host machine architecture.The other sizes are provided to meet special needs.Each enumeration (see "E. Structure, Union, and Enumeration Declarations" in part "DECLARATIONS")is conceptually a separate type with its own set of named constants. The properties of an enum type are identicalto those of int type.Unsigned integers, declared unsigned, obey the laws of arithmetic modulo 2 n where n is the number of bitsin the representation. (On the PDP-ll, unsigned long quantities are not supported.)Single-precision floating point (float) and double precision floating point (double) may be synonymousin some implementations.Because objects of the foregoing types can usefully be interpreted as numbers, they will be referred to asarithmetic types. Types char, int of all sizes, and enum will collectively be called integral types. The floatand double types will collectively be called floating types.The void type specifies an empty set of values. It is used as the type returned by functions that generateno value.Besides the fundamental arithmetic types, there is a conceptually infinite class of derived types constructedfrom the f~ndamental types in the following ways:• arrays of objects of most types• functions which return objects of a given type• pointers to objects of a given type• structures containing a sequence of objects of various types• unions capable of containing anyone of several objects of various types.In general these methods of constructing objects can be applied recursively.OBJECTS AND LVALUESAn object is a manipulatable region of storage. An lvalueis an expression referring to an object. An obviousexample of an lvalue expression is an identifier. There are operators which yield lvalues: for example, if E isan expression of pointer type, then *E is an lvalue expression referring to the object to which E points. Thename "lvalue" comes from the assignment expression El = E2 in which the left operand El must be an lvalueexpression. The discussion of each operator below indicates whetherit expects lvalue operands and whether ityields an lvalue.CONVERSIONSA number of operators may, depending on their operands, cause conversion of the value of an operand fromone type to another. This part explains the result to be expected from such conversions. The conversions demandedby most ordinary operators are summarized under "F. Arithmetic Conversions". The summary will besupplemented as required by the discussion of each operator.Page 79


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82A. Characters and IntegersA character or a short integer may be used wherever an integer may be used. In all cases the value is convertedto an integer. Conversion of a shorter integer to a longer always involves sign extension; integers aresigned quantities. Whether or not sign-extension occurs for characters is machine dependent, but it is guaranteedthat a member of the standard character set is non-negative. Of the machines treated here, only the PDP-IIand VAX-II sign-extend. On these machines, char variables range in value from -128 to 127. The more explicittype unsigned char forces the values to range from 0 to 255.On machines that treat characters as signed, the characters of the ASCII set are all positive. However, acharacter constant specified with an octal escape suffers sign extension and may appear negative; for example,'\377' has the value -1.When a longer integer is converted to a shorter or to a char, it is truncated on the left. Excess bits are simplydiscarded.B. Float and DoubleAll floating arithmetic in C is carried out in double precision. Whenever a float appears in an expressionit is lengthened to double by zero padding its fraction. When a double must be converted to float, for exampleby an assignment, the double is rounded before truncation to float length.C. Floating and IntegralConversions of floating values to integral type tend to be rather machine dependent. In particular the directionof truncation of negative numbers varies from machine to machine. The result is undefined if the valuewill not fit in the space provided.Conversions of integral values to floating type are well behaved. Some loss of precision occurs if the destinationlacks sufficient bits.D. Pointers and IntegersAn expression of integral type may be added to or subtracted from a pointer; in such a case, the first is convertedas specified in the discussion of the addition operator. Two pointers to objects of the same type may besubtracted; in this case, the result is converted to an integer as specified in the discussion of the subtractionoperator.E. UnsignedWhenever an unsigned integer and a plain integer are combined, the plain integer is converted to unsignedand the result is unsigned. The value is the least unsigned integer congruent to the signed integer (modulo2wordsize). In a 2's complement representation, this conversion is conceptual; and there is no actual change in thebi t pattern.When an unsigned integer is converted to long, the value of the result is the same numerically as that ofthe unsigned integer. Thus the conversion amounts to padding with zeros on the left.F. Arithmetic ConversionsA great many operators cause conversions and yield result types in a similar way. This pattern will be calledthe "usual arithmetic conversions."• First, any operands of type char or short are converted to int, and any of type float are convertedto double.Page 80


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>G. Void• Then, if either operand is double, the other is converted to double and that is the type of the result.• Otherwise, if either operand is long, the other is converted to long and that is the type of the result.• Otherwise, if either operand is unsigned, the other is converted to unsigned and that is the type ofthe result.• Otherwise, both operands must be int, and that is the type of the result.The (nonexistent) value of a void object may not be used in any way, and neither explicit nor implicit conversionmay be applied. Because a void expression denotes a nonexistent value, such an expression may be usedonly as an expression statement (see "A. Expression Statement" in part "STATEMENTS") or as the left operandof a comma expression (see "Comma Operator" in part "EXPRESSIONS").An expression may be converted to type void by use of a cast. For example, this makes explicit the discardingof the value of a function call used as an expression statement.EXPRESSIONSThe precedence of expression operators is the same as the order of the major subsections of this section,highest precedence first. Thus, for example, the expressions referred to as the operands of + (see "D. AdditiveOperators") are those expressions defined under "A. Primary Expressions", "B. Unary Operators", and "C. MultiplicativeOperators". Within each subpart, the operators have the same precedence. Left- or rightassociativityis specified in each subsection for the operators discussed therein. The precedence and associativityof all the expression operators is summarized in the grammar of part "SYNTAX SUMMARY".Otherwise, the order of evaluation of expressions is undefined. In particular the compiler considers itselffree to compute subexpressions in the order it believes most efficient even if the subexpressions involve sideeffects. The order in which side effects take place is unspecified. Expressions involving a commutative and associativeoperator (*, +, &, I, A) may be rearranged arbitrarily even in the presence of parentheses; to force a particularorder of evaluation, an explicit temporary must be used.The handling of overflow and divide check in expression evaluation is machine dependent. Most existingimplementations of C ignore integer overflows; treatment of division by 0 and all floating-point exceptions variesbetween machines and is usually adjustable by a library function.A. Primary ExpressionsPrimary expressions involving., - >, subscripting, and function calls group left to right.primary-expression:identifierconstantstring( expression)primary-expression [ expression]primary-expression ( expression-list opt )primary-expression. identifierprimary-expression -> identifierexpression-list:expressionexpression-list, expressionPage 81


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82An identifier is a primary expression provided it has been suitably declared as discussed below. Its type is specifiedby its declaration. If the type of the identifier is "array of ... ", however, then the value of the identifier expressionis a pointer to the first object in the array; and the type of the expression is "pointer to ... ". Moreover,an array identifier is not an lvalue expression. Likewise, an identifier which is declared "function returning ... ",when used except in the function-name position of a call, is converted to "pointer to function returning ... ".A constant is a primary expression. Its type may be int, long, or double depending on its form. Characterconstants have type int and floating constants are double.A string is a primary expression. Its type is originally "array of char", but following the same rule givenabove for identifiers, this is modified to "pointer to char" and the result is a pointer to the first character inthe string. (There is an exception in certain initializers; see "F. Initialization" in part "DECLARATIONS".)A parenthesized expression is a primary expression whose type and value are identical to those of the unadornedexpression. The presence of parentheses does not affect whether the expression is an lvalue.A primary expression followed by an expression in square brackets is a primary expression. The intuitivemeaning is that of a subscript. Usually, the primary expression has type "pointer to ... ", the subscript expressionis int, and the type of the result is " ... ". The expression El [E2] is identical (by definition) to *( (El )+(E2)).All the clues needed to understand this notation are contained in this subpart together with the discussions in"B. Unary Operators" and "D. Additive Operators" on identifiers, * and +, respectively. The implications aresummarized under "C. Arrays, Pointers, and Subscripting" in part "TYPES REVISITED".A function call is a primary expression followed by parentheses containing a possibly empty, commaseparatedlist of expressions which constitute the actual arguments to the function. The primary expressionmust be of type "function returning ... ", and the result of the function call is of type " ... ". As indicated below,a hitherto unseen identifier followed immediately by a left parenthesis is contextually declared to representa function returning an integer; thus in the most common case, integer-valued functions need not be declared.Any actual arguments of type float are converted to double before the call. Any of type char or shortare converted to into Array names are converted to pointers. No other conversions are performed automatically;in particular, the compiler does not compare the types of actual arguments with those of formal arguments.If conversion is needed, use a cast; see "B. Unary Operators" and "G. Type Names" in part "DECLARATIONS".In preparing for the call to a function, a copy is made of each actual parameter. Thus, all argument passingin C is strictly by value. A function may change the values of its formal parameters, but these changes cannotaffect the values of the actual parameters. It is possible to pass a pointer on the understanding that the functionmay change the value of the object to which the pointer points. An array name is a pointer expression. The orderof evaluation of arguments is undefined by the language; take note that the various compilers differ. Recursivecalls to any function are permitted.A primary expression followed by a dot followed by an identifier is an expression. The first expression mustbe a structure or a union, and the identifier must name a member of the structure or union. The value is thenamed member of the structure or union, and it is an lvalue if the first expression is an lvalue.A primary expression followed by an arrow (built from - and» followed by an identifier is an expression.The first expression must be a pointer to a structure or a union and the identifier must name a member of thatstructure or union. The result is an lvalue referring to the named member of the structure or union to whichthe pointer expression points. Thus the expression El->MOS is the same as (*El).MOS. Structures andunions are discussed in "E. Structure, Union, and Enumeration Declarations" in part "DECLARATIONS".B. Unary OperatorsExpressions with unary operators group right to left.Page 82


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>unary -expression:* expression& lvalue- expression! expressionexpression++ lvalue--lvaluelvalue ++lvalue --(type-name) expressionsizeof expressionsize of ( type-name)The unary * operator means indirection: the expression must be a pointer, and the result is an lvalue referringto the object to which the expression points. If the type of the expression is "pointer to ... ", the type of the resultis " ... ".The result of the unary & operator is a pointer to the object referred to by the lvalue. If the type of the lvalueis " ... ", the type of the result is "pointer to ... ".The result of the unary - operator is the negative of its operand. The usual arithmetic conversions are performed.The negative of an unsigned quantity is computed by subtracting its value from 2 n where n is the numberof bits in an into There is no unary + operator.The result of the logical negation operator! is 1 if the value of its operand is zero, zero if the value of itsoperand is nonzero. The type of the result is into It is applicable to any arithmetic type or to pointers.The - operator yields the l's complement of its operand. The usual arithmetic conversions are performed.The type of the operand must be integral.The object referred to by the lvalue operand of prefix ++ is incremented. The value is the new value of theoperand but is not an lvalue. The expression ++x is equivalent to x+= 1. See the discussions "D. Additive Operators"and "N. Assignment Operators" for information on conversions.The lvalue operand of prefix -- is decremented analogously to the prefix ++ operator.When postfix ++ is applied to an lvalue, the result is the value of the object referred to by the lvalue. Afterthe result is noted, the object is incremented in the same manner as for the prefix ++ operator. The type ofthe result is the same as the type of the lvalue expression.When postfix -- is applied to an lvalue, the result is the value of the object referred to by the lvalue. Afterthe result is noted, the object is decremented in the manner as for the prefix -- operator. The type of the resultis the same as the type of the lvalue expression.An expression preceded by the parenthesized name of a data type causes conversion of the value of the expressionto the named type. This construction is called a cast. Type names are described under "G. Type Names"in part "Declarations".The sizeof operator yields the size in bytes of its operand. (A byte is undefined by the language except interms of the value of sizeof. However, in all existing implementations, a byte is the space required to hold achar.) When applied to an array, the result is the total number of bytes in the array. The size is determinedfrom the declarations of the objects in the expression. This expression is semantically an unsigned constantPage 83


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82and may be used anywhere a constant is required. Its major use is in communication with routines like storageallocators and I/O systems.The sizeof operator may also be applied to a parenthesized type name. In that case it yields the size in bytesof an object of the indicated type.The construction sizeof( type) is taken to be a unit, so the expression sizeof( type)-2 is the same as(sizeof( type) )-2.c. Multiplicative OperatorsThe multiplicative operators *, /, and % group left to right. The usual arithmetic conversions are performed.multiplicative expression:expression * expressionexpression / expressionexpression % expressionThe binary * operator indicates multiplication. The * operator is associative and expressions with severalmultiplications at the same level may be rearranged by the compiler.The binary / operator indicates division. When positive integers are divided, truncation is toward 0; but theform of truncation is machine-dependent if either operand is negative. On all machines covered by this manual,the remainder has the same sign as the dividend. It is always true that (a/b)*b + a%b is equal to a (if b isnot 0).The binary % operator yields the remainder from the division of the first expression by the second. Theusual arithmetic conversions are performed. The operands must not be floating.D. Additive OperatorsThe additive operators + and - group left to right. The usual arithmetic conversions are performed. Thereare some additional type possibilities for each operator.additive-expression:expression + expressionexpression - expressionThe result of the + operator is the sum of the operands. A pointer to an object in an array and a value of anyintegral type may be added. The latter is in all cases converted to an address offset by multiplying it by thelength of the object to which the pointer points. The result is a pointer of the same type as the original pointer,and which points to another object in the same array, appropriately offset from the original object. Thus if Pis a pointer to an object in an array, the expression P+ 1 is a pointer to the next object in the array. No furthertype combinations are allowed for pointers.The + operator is associative and expressions with several additions at the same level may be rearrangedby the compiler.The result of the - operator is the difference of the operands. The usual arithmetic conversions are performed.Additionally, a value of any integral type may be subtracted from a pointer, and then the same conversionsfor addition apply.If two pointers to objects of the same type are subtracted, the result is converted (by division by the lengthof the object) to an int representing the number of objects separating the pointed-to objects. This conversionPage 84


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>will in general give unexpected results unless the pointers point to objects in the same array, since pointers,even to objects of the same type, do not necessarily differ by a multiple of the object length.E. Shift OperatorsThe shift operators < < and> > group left to right. Both perform the usual arithmetic conversions on theiroperands, each of which must be integral. Then the right operand is converted to int; the type of the result isthat of the left operand. The result is undefined if the right operand is negative or greater than or equal to thelength of the object in bits.shift-expression:expression < < expressionexpression > > expressionThe value of El«E2 is El (interpreted as a bit pattern) left-shifted E2 bits. Vacated bits are 0 filled. Thevalue of El»E2 is El right-shifted E2 bit positions. The right shift is guaranteed to be logical (0 fill) if Elis unsigned; otherwise, it may be arithmetic (fill by a copy of the sign bit).F. Relational Operatorsto.The relational operators group left to right. This fact is not very useful; a (greater than), < = (less than or equal to), and> = (greater than or equal to) allyield 0 if the specified relation is false and 1 if it is true. The type of the result is into The usual arithmetic conversionsare performed. Two pointers may be compared; the result depends on the relative locations in the addressspace of the pointed-to objects. Pointer comparison is portable only when the pointers point to objectsin the same array.G. Equality Operatorsequality-expression:expression == expressionexpression != expressionThe == (equal to) and the!= (not equal to) operators are exactly analogous to the relational operators exceptfor their lower precedence. (Thus a


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 1 6/82I. Bitwise Exclusive OR Operatorexclusive-or-expression:expression A expressionThe A operator is associative, and expressions involving A may be rearranged. The usual arithmetic conversionsare performed; the result is the bitwise exclusive OR function of the operands. The operator applies onlyto integral operands.J. Bitwise Inclusive OR Operatorinclusive-or-expression:expression I expressionThe I operator is associative, and expressions involving I maybe rearranged. The usual arithmetic conversionsare performed; the result is the bitwise inclusive OR function of its operands. The operator applies onlyto integral operands.K. Logical AND Operatorlogical-and-expression:expression && expressionThe && operator groups left to right. It returns l' if both its operands are nonzero, 0 otherwise. Unlike &,&& guarantees left to right evaluation; moreover, the second operand is not evaluated if the first operand iso.The operands need not have the same type, but each must have one of the fundamental types or be a pointer.The result is always intoL. Logical OR Operatorlogical-or-expression:expression II expressionThe II operator groups left to right. It returns 1 if either of its operands is nonzero, 0 otherwise. Unlike I,II guarantees left to right evaluation; moreover, the second operand is not evaluated if the value of the first operandis nonzero.The operands need not have the same type, but each must have one of the fundamental types or be a pointer.The result is always intoM. Conditional Operatorconditional-expression:expression ? expression : expressionConditional expressions group right to left. The first expression is evaluated; and if it is nonzero, the resultis the value of the second expression, otherwise that of third expression. If possible, the usual arithmetic conversionsare performed to bring the second and third expressions to a common type. If both pointers are of the sametype, the result has the common type. Otherwise, one must be a pointer and the other the constant 0, and theresult has the type of the pointer. Only one of the second and third expressions is evaluated.Page 86


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>N. Assignment OperatorsThere are a number of assignment operators, all of which group right to left. All require an lvalue as theirleft operand, and the type of an assignment expression is that of its left operand. The value is the value storedin the left operand after the assignment has taken place. The two parts of a compound assignment operatorare separate tokens.assignmen t-expressi on:lvalue = expressionlvalue += expressionlvalue -= expressionlvalue *= expressionlvalue /= expressionlvalue % = expressionlvalue > > = expressionlvalue < < = expressionlvalue &= expressionlvalue A = expressionlvalue 1= expressionIn the simple assignment with =, the value of the expression replaces that of the object referred to by thelvalue. If both operands have arithmetic type, the right operand is converted to the type of the left preparatoryto the assignment. Second, both operands may be structures or unions of the same type. Finally, if the left operandis a pointer, the right operand must in general be a pointer of the same type. However, the constant 0 maybe assigned to a pointer; and it is guaranteed that this value will produce a null pointer distinguishable froma pointer to any object.The behavior of an expression of the form El op = E2 may be inferred by taking it as equivalent to El= El op (E2); however, El is evaluated only once. In += and -=, the left operand maybe a pointer; in whichcase, the (integral) right operand is converted as explained in "D. Additive Operators". All right operands andall nonpointer left operands must have arithmetic type.O. Comma Operatorcomma-expression:expression, expressionA pair of expressions separated by a comma is evaluated left to right, and the value of the left expressionis discarded. The type and value of the result are the type and value of the right operand. This operator groupsleft to right. In contexts where comma is given a special meaning, e.g., in lists of actual arguments to functions("A. Primary Expressions") and lists of initializers ("F. Initialization" in part "Declarations"), the comma operatoras described in this subpart can only appear in parentheses. For example,f(a, (t=3, t+2), c)has three arguments, the second of which has the value 5.DECLARATIONSDeclarations are used to specify the interpretation which C gives to each identifier; they do not necessarilyreserve storage associated with the identifier. Declarations have the formdeclaration:decl-specifiers declarator-list oPt;Page 87


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The declarators in the declarator-list contain the identifiers being declared. The decl-specifiers consist of a sequenceof type and storage class specifiers.decl-specifiers:type-specifier decl-specifiers optsc-specifier decl-specifiers optThe list must be self-consistent in a way described below.A. Storage Class SpecifiersThe sc-specifiers are:sc-specifier:autostaticexternregistertypedefThe typedef specifier does not reserve storage and is called a "storage class specifier" only for syntactic convenience.See "H. Typedef" for more information. The meanings of the various storage classes were discussed inpart "Names".The auto, static, and register declarations also serve as definitions in that they cause an appropriateamount of storage to be reserved. In the extern case, there must be an external definition (see part "ExternalDefinitions") for the given identifiers somewhere outside the function in which they are declared.A register declaration is best thought of as an auto declaration, together with a hint to the compiler thatthe variables declared will be heavily used. Only the first few such declarations are effective. Moreover, onlyvariables of certain types will be stored in registers; on the PDP-II, they are int or pointer. One other restrictionapplies to register variables: the address-of operator & cannot be applied to them. Smaller, faster programscan be expected if register declarations are used appropriately, but future improvements in code generation mayrender them unnecessary.At most one sc-specifier may be given in a declaration. If the sc-specifier is missing from a declaration, itis taken to be auto inside a function, extern outside. Exception: functions are never automatic.B. Type SpecifiersThe type-specifiers aretype-specifier:charshortintlongunsignedfloatdoublevoidstruct-or-union-specifiertypedef-nameenum-specifierPage 88


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>The words long, short, and unsigned may be thought of as adjectives. The following combinations are acceptable.short intlong intunsigned intunsigned charlong floatThe meaning of the last is the same as double. Otherwise, at most one type-specifier may be given in a declaration.If the type-specifier is missing from a declaration, it is taken to be intoSpecifiers for structures, unions, and enumerations are discussed in "E. Structure, Union, and EnumerationDeclarations". Declarations with typedef names are discussed in "R. Typedef".C. DeclaratorsThe declarator-list appearing in a declaration is a comma-separated sequence of declarators, each of whichmay have an initializer.declara tor-list:ini t-declara torinit-declarator , declarator-listinit-declarator:declarator initializer oPtInitializers are discussed in "F. Initialization". The specifiers in the declaration indicate the type and storageclass of the objects to which the declarators refer. Declarators have the syntax:declarator:identifier( declarator)* declaratordeclarator ()declarator [ constant-expression opt ]The grouping is the same as in expressions.D. Meaning of DeclaratorsEach declarator is taken to be an assertion that when a construction of the same form as the declaratorappears in an expression, it yields an object of the indicated type and storage class.Each declarator contains exactly one identifier; it is this identifier that is declared. If an unadorned identifierappears as a declarator, then it has the type indicated by the specifier heading the declaration.A declarator in parentheses is identical to the unadorned declarator, but the binding of complex declaratorsmay be altered by parentheses. See the examples below.Now imagine a declarationT Dlwhere T is a type-specifier (like int, etc.) and DI is a declarator. Suppose this declaration makes the identifierhave type " ... T," where the " ... " is empty if DI is just a plain identifier (so that the type of x in "int x" is justint). Then if DI has the form*DPage 89


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82the type of the contained identifier is " ... pointer to T."If D 1 has the formDOthen the contained identifier has the type " ... function returning T."orIf D 1 has the formD [constan t-expression]D[]then the contained identifier has type " ... array of T." In the first case, the constant expression is an expressionwhose value is determinable at compile time and whose type is into (Constant expressions are defined preciselyin part "Constant Expressions".) When several "array of" specifications are adjacent, a multidimensional arrayis created; the constant expressions which specify the bounds of the arrays may be missing only for the firstmember of the sequence. This elision is useful when the array is external and the actual definition, which allocatesstorage, is given elsewhere. The first constant expression may also be omitted when the declarator is followedby initialization. In this case the size is calculated from the number of initial elements supplied.An array may be constructed from one of the basic types, from a pointer, from a structure or union, or fromanother array (to generate a multidimensional array).Not all the possibilities allowed by the syntax above are actually permitted. The restrictions are as follows:functions may not return arrays or functions although they may return pointers to such things; there are noarrays of functions although there may be arrays of pointers to functions. Likewise, a structure or union maynot contain a function; but it may contain a pointer to a function.As an example, the declarationint i, *ip, fO, *fipO, (*pfi)O;declares an integer i, a pointer ip to an integer, a function f returning an integer, a function fip returning apointer to an integer, and a pointer pfi to a function which returns an integer. It is especially useful to comparethe last two. The binding of *fip() is *(fip(». The declaration suggests, and the same construction in an expressionrequires, the calling of a function fip. Using indirection through the (pointer) result to yield an integer.In the declarator (*pfi)(), the extra parentheses are necessary, as they are also in an expression, to indicatethat indirection through a pointer to a function yields a function, which is then called; it returns an integer.As another example,float fa[17], *afp[17];declares an array of float numbers and an array of pointers to float numbers. Finally,static int x3d[3] [5] [7];declares a static 3-dimensional array of integers, with rank 3X5X7. In complete detail, X3d is an array of threeitems; each item is an array of five arrays; each of the latter arrays is an array of seven integers. Any of thePage 90


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>expressions X3d, X3d[i], X3d[i] [j], X3d[i] [j] [k] may reasonably appear in an expression. The first three havetype "array," the last has type intoE. Structure, Union, and Enumeration DeclarationsA structure is an object consisting of a sequence of named members. Each member may have any type. Aunion is an object which may, at a given time, contain anyone of several members. Structure and union specifiershave the same form.struct-or-union-specifier:struct-or-union { struct-decl-list }struct-or-union identifier { struct-decl-list }struct-or-union iden tifierstruct-or-union:structunwnThe struct-decl-list is a sequence of declarations for the members of the structure or union:struct-decl-list:struct-declara tionstruct-declaration struct-decl-Jiststruct-declara tion:type-specifier struct-declarator-list ;struct-declarator-list:struct-declara torstruct-declarator, struct-declarator-listIn the usual case, a struct-declarator is just a declarator for a member of a structure or union. A structure membermay also consist of a specified number of bits. Such a member is also called a field; its length is set off fromthe field name by a colon.struct-declara tor:declaratordeclarator: constant-expression: constant-expressionWithin a structure, the objects declared have addresses which increase as the declarations are read left to right.Each nonfield member of a structure begins on an addressing boundary appropriate to its type; therefore, theremay be unnamed holes in a structure. Field members are packed into machine integers; they do not straddlewords. A field which does not fit into the space remaining in a word is put into the next word. No field maybe wider than a word.Fields are assigned right to left on the PDP-II and VAX-II, left to right on other machines.A struct-declarator with no declarator, only a colon and a width, indicates an unnamed field useful for paddingto conform to externally-imposed layouts. As a special case, an unnamed field with a width of 0 specifiesalignment of the next field at a word boundary. The "next field" presumably is a field, not an ordinary structuremember because in the latter case the alignment would have been automatic.The language does not restrict the types of things that are declared as fields, but implementations are notrequired to support any but integer fields. Moreover, even int fields may be considered to be unsigned. On thePage 91


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82PDP-II, fields are not signed and have only integer values; on the VAX-II, fields declared with int are treatedas containing a sign. For these reasons, it is strongly recommended that fields be declared as unsigned. In allimplementations, there are no arrays of fields, and the address-of operator & may not be applied to them, sothat there are no pointers to fields.A union may be thought of as a structure all of whose members begin at offset 0 and whose size is sufficientto contain any of its members. At most one of the members can be stored in a union at any time.A structure or union specifier of the second form, that is, one ofstruct identifier { struct-decl-list }union identifier { struct-decl-list}declares the identifier to be the structure tag (or union tag) of the structure specified by the list. A subsequentdeclaration may then use the third form of specifier, one ofstruct identifierunion identifierStructure tags allow definition of self-referential structures. Structure tags also permit the long part of the declarationto be given once and used several times. It is illegal to declare a structure or union which contains aninstance of itself, but a structure or union may contain a pointer to an instance of itself.The names of members and tags do not conflict with each other or with ordinary variables. A particularname may not be used twice in the same structure, but the same name may be used in several different structuresin the same scope.A simple example of a structure declaration isstruct tnode{char tword [20] ;int count;struct tnode *left;struct tnode *right;};which contains an array of 20 characters, an integer, and two pointers to similar structures. Once this declarationhas been given, the declarationstruct tnode s, *sp;declares s to be a structure of the given sort and sp to be a pointer to a structure of the given sort. With thesedeclarations, the expressionsp->countrefers to the count field of the structure to which sp points;s.leftrefers to the left subtree pointer of the structure s; ands.right->tword[O]Page 92


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>refers to the first character of the tword member of the right subtree of s.Enumerations are unique types with named constants. However, the current language treats enumerationvariables and constants as being of int type.en um -specifier:enum { enum-list }enum identifier { enum-list }enum identifierenum-list:enumeratorenum-list, enumeratorenumerator:identifieridentifier = constant-expressionThe identifiers in an enum-list are declared as constants and may appear wherever constants are required. Ifno enumerators with = appear, then the values of the corresponding constants begin at ° and increase by 1 asthe declaration is read from left to right. An enumerator with = gives the associated identifier the value indicated;subsequent identifiers continue the progression from the assigned value.The names of enumerators in the same scope must all be distinct from each other and from those of ordinaryvariables.The role of the identifier in the enum-specifier is entirely analogous to that of the structure tag in a structspecifier;it names a particular enumeration. For example,enum color { chartreuse, burgundy, claret=20, winedark };enum color *cp, col;col = claret;cp = &col;if (*cp == burgundy) ...makes color the enumeration-tag of a type describing various colors, and then declares cp as a pointer to anobject of that type, and col as an object of that type. The possible values are drawn from the set {O,1,20,21}.F. InitializationA declarator may specify an initial value for the identifier being declared. The initializer is preceded by =and consists of an expression or a list of values nested in braces.initializer:= expression= { initializer-list }= { initializer-list , }initializer-list:expressioninitializer-list, initializer-list{ initializer-list }All the expressions in an initializer for a static or external variable must be constant expressions, whichare described in part "CONSTANT EXPRESSIONS", or expressions which reduce to the address of a previouslyPage 93


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82declared variable, possibly offset by a constant expression. Automatic or register variables may be initializedby arbitrary expressions involving constants and previously declared variables and functions.Static and external variables which are not initialized are guaranteed to start off as O. Automatic and registervariables which are not initialized are guaranteed to start off as garbage.When an initializer applies to a scalar (a pointer or an object of arithmetic type), it consists of a single expression,perhaps in braces. The initial value of the object is taken from the expression; the same conversionsas for assignment are performed.When the declared variable is an aggregate (a structure or array), the initializer consists of a braceenclosed,comma-separated list of initializers for the members of the aggregate written in increasing subscriptor member order. If the aggregate contains subaggregates, this rule applies recursively to the members of theaggregate. If there are fewer initializers in the list than there are members of the aggregate, then the aggregateis padded with O's. It is not permitted to initialize unions or automatic aggregates.Braces may be elided as follows. If the initializer begins with a left brace, then the succeeding commaseparatedlist of initializers initializes the members of the aggregate; it is erroneous for there to be moreinitializers than members. If, however, the initializer does not begin with a left brace, then only enough elementsfrom the list are taken to account for the members of the aggregate; any remaining members are left to initializethe next member of the aggregate of which the current aggregate is a part.A final abbreviation allows a char array to be initialized by a string. In this case successive characters ofthe string initialize the members of the array.For example,int x [ J = { I, 3, 5 };declares and initializes x as a I-dimensional array which has three members, since no size was'specified andthere are three initializers.float y [4] [3]{{ I, 3, 5 },{ 2, 4, 6 },{ 3, 5, 7 },};is a completely-bracketed initialization: 1, 3, and 5 initialize the first row of the array y[O], namely y[O][O],y[O][I], and y[O][2]. Likewise, the next two lines initialize y[l] and y[2]. The initializer ends early and thereforey[3] is initialized with o. Precisely, the same effect could have been achieved byfloat y [4] [3] ={1,3,5,2,4,6,3,5,7};The initializer for y begins with a left brace but that for y[O] does not; therefore, three elements from the listare used. Likewise, the next three are taken successively for y[l] and y[2]. Also,float y [4] [3] ={{ 1 }, { 2 }, { 3 }, { 4 }};Page 94


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>initializes the first column of y (regarded as a 2-dimensional array) and leaves the rest o.Finally,charmsg[] ="Syntax error on line %s\n";shows a character array whose members are initialized with a string.G. Type NamesIn two contexts (to specify type conversions explicitly by means of a cast and as an argument of sizeof),it is desired to supply the name of a data type. This is accomplished using a "type name", which in essence isa declaration for an object of that type which omits the name of the object.type-name:type-specifier abstract-declaratorabstract-declarator:empty( abstract-declarator)* abstract-declaratorabstract-declarator ()abstract-declarator [ constant-expression opt ]To avoid ambiguity, in the construction( abstract-declarator)the abstract-declarator is required to be nonempty. Under this restriction, it is possible to identify uniquely thelocation in the abstract-declarator where the identifier would appear if the construction were a declarator ina declaration. The named type is then the same as the type of the hypothetical identifier. For example,intint *int * [3]int (*) [3]int *()int (*)0name respectively the types "integer", "pointer to integer", "array of 3 pointers to integers", "pointer to anarray of 3 integers","function returning pointer to integer", and "pointer to function returning an integer".H. TypedefDeclarations whose "storage class" is typedef do not define storage but instead define identifiers whichcan be used later as if they were type keywords naming fundamental or derived types.typedef-name:identifierWithin the scope of a declaration involving typedef, each identifier appearing as part of any declarator thereinbecomes syntactically equivalent to the type keyword naming the type associated with the identifier in the waydescribed in "D. Meaning of Declarators". For example, aftertypedef int MILES, *KLICKSP;typedef struct { double re, im; } complex;Page 9S


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 1 6/82the constructionsMILES distance;extern KLICKSP metricp;complex z, *zp;are all legal declarations; the type of distance is int, that of metricp is "pointer to int," and that of z is thespecified structure. The zp is a pointer to such a structure.The typedef does not introduce brand new types, only synonyms for types which could be specified in anotherway. Thus in the example above distance is considered to have exactly the same type as any other intobject.STATEMENTSExcept as indicated, statements are executed in sequence.A. Expression StatementMost statements are expression statements, which have the formexpression;Usually expression statements are assignments or function calls.B. Compound Statement or BlockSo that several statements can be used where one is expected, the compound statement (also, and equivalently,called "block") is provided:compound-statement:{ declaration-list oPtstatement-list oPt}declaration-list:declarationdeclaration declaration-liststatement-list:statementstatement statement-listIf any of the identifiers in the declaration-list were previously declared, the outer declaration is pushed downfor the duration of the block, after which it resumes its force.Any initializations of auto or register variables are performed each time the block is entered at the top.It is currently possible (but a bad practice) to transfer into a block; in that case the initializations are not performed.Initializations of static variables are performed only once when the program begins execution. Insidea block, extern declarations do not reserve storage so initialization is not permitted.C. Conditional StatementThe two forms of the conditional statement areif ( expression) statementif ( expression) statement else statementPage 96


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>In both cases, the expression is evaluated; and if it is nonzero, the first sub statement is executed. In the secondcase, the second substatement is executed if the expression is O. As usual the "else" ambiguity is resolved byconnecting an else with the last encountered else-less if.D. While StatementThe while statement has the formwhile ( expression) statementThe sub statement is executed repeatedly so long as the value of the expression remains nonzero. The test takesplace before each execution of the statement.E. Do StatementThe do statement has the formdo statement while ( expression) ;The substatement is executed repeatedly until the value of the expression becomes O. The test takes place aftereach execution of the statement.F. For StatementThe for statement has the form:for ( expression-l opt ; expression-2 opt ; expression-3 opt ) statementThis statement is equivalent toexpression-l ;while ( expression-2 ){statementexpression-3 ;Thus the first expression specifies initialization for the loop; the second specifies a test, made before each iteration,such that the loop is exited when the expression becomes O. The third expression often specifies anincrementing that is performed after each iteration.Any or all of the expressions may be dropped. A missing expression-2 makes the implied while clause equivalentto while(l); other missing expressions are simply dropped from the expansion above.G. Switch StatementThe switch statement causes control to be transferred to one of several statements depending on the valueof an expression. It has the formswitch ( expression) statementThe usual arithmetic conversion is performed on the expression, but the result must be into The statement istypically compound. Any statement within the statement may be labeled with one or more case prefixes as follows:case constant-expression:Page 97


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82where the constant expression must be into No two of the case constants in the same switch may have the samevalue. Constant expressions are precisely defined in part "CONSTANT EXPRESSIONS".There may also be at most one statement prefix of the formdefault :When the switch statement is executed, its expression is evaluated and compared with each case constant. Ifone of the case constants is equal to the value of the expression, control is passed to the statement followingthe matched case prefix. If no case constant matches the expression and if there is a default, prefix, controlpasses to the prefixed statement. If no case matches and if there is no default, then none of the statementsin the switch is executed.The prefixes case and default do not alter the flow of control, which continues unimpeded across such prefixes.To exit from a switch, see "R. Break Statement".Usually, the statement that is the subject of a switch is compound. Declarations may appear at the headof this statement, but initializations of automatic or register variables are ineffective.H. Break StatementThe statementbreak;causes termination of the smallest enclosing while, do, for, or switch statement; control passes to the statementfollowing the terminated statement.LContinue StatementThe statementcontinue;causes control to pass to the loop-continuation portion of the smallest enclosing while, do, or for statement;that is to the end of the loop. More precisely, in each of the statementswhile (...) do for (...){ { {contin: ; contin: ; contin: ;} } while (...); }a continue is equivalent to goto contino (Following the contin: is a null statement, see "M. Null Statement".)J. Return StatementA function returns to its caller by means of the return statement, which has one of the formsreturn;return expression;In the first case, the returned value is undefined. In the second case, the value of the expression is returned tothe caller of the function. If required, the expression is converted, as if by assignment, to the type of the functionin which it appears. Flowing off the end of a function is equivalent to a return with no returned value.Page 98


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>K. Goto StatementControl may be transferred unconditionally by means of the statementgoto identifier;The identifier must be a label (see "L. Labeled Statement") located in the current function.L. Labeled StatementAny statement may be preceded by label prefixes of the formidentifier:which serve to declare the identifier as a label. The only use of a label is as a target of a goto. The scope ofa label is the current function, excluding any subblocks in which the same identifier has been redeclared. Seepart "SCOPE RULES".M. Null StatementThe null statement has the formA null statement is useful to carry a label just before the} of a compound statement or to supply a null bodyto a looping statement such as while.EXTERNAL DEFINITIONSA C program consists of a sequence of external definitions. An external definition declares an identifier tohave storage class extern (by default) or perhaps static, and a specified type. The type-specifier (see "B. TypeSpecifiers" in part "DECLARATIONS") may also be empty, in which case the type is taken to be into The scopeof external definitions persists to the end of the file in which they are declared just as the effect of declarationspersists to the end of a block. The syntax of external definitions is the same as that of all declarations exceptthat only at this level may the code for functions be given.A. External Function DefinitionsFunction definitions have the formfunction-definition:decl-specifiers optfunction-declarator function-bodyThe only sc-specifiers allowed among the decl-specifiers are extern or static; see "B. Scope of Externals" inpart "SCOPE RULES" for the distinction between them. A function declarator is similar to a declarator fora "function returning ... " except that it lists the formal parameters of the function being defined.function-declarator:declarator ( parameter-list oPt)parameter-list:identifieridentifier, parameter-listThe function-body has the formfunction-body:declaration-list compound-statementPage 99


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The identifiers in the parameter list, and only those identifiers, may be declared in the declaration list. Anyidentifiers whose type is not given are taken to be into The only storage class which may be specified is register;if it is specified, the corresponding actual parameter will be copied, if possible, into a register at the outset ofthe function.A simple example of a complete function definition is/int max(a, b, c)int a, b, c;int m;m = (a > b) ? a : b;return«m > c) ? m : c);Here int is the type-specifier; max(a, b, c) is the function-declarator; int a, b, c; is the declaration-list for theformal parameters; { ... } is the block giving the code for the statement.The C program converts all float actual parameters to double, so formal parameters declared float havetheir declaration adjusted to read double. Also, since a reference to an array in any context (in particular asan actual parameter) is taken to mean a pointer to the first element of the array, declarations of formal parametersdeclared "array of ... " are adjusted to read "pointer to ... ".B. External Data DefinitionsAn external data definition has the formdata-definition:declarationThe storage class of such data may be extern (which is the default) or static but not auto or register.SCOPE RULESA C program need not all be compiled at the same time. The source text of the program may be kept in severalfiles, and precompiled routines may be loaded from libraries. Communication among the functions of a programmay be carried out both through explicit calls and through manipulation of external data.Therefore, there are two kinds of scope to consider: first, what may be called the lexical scope of an identifier,which is essentially the region of a program during which it may be used without drawing "undefined identifier"diagnostics; and second, the scope associated with external identifiers, which is characterized by the rulethat references to the same external identifier are references to the same object.A. Lexical ScopeThe lexical scope of identifiers declared in external definitions persists from the definition through the endof the source file in which they appear. The lexical scope of identifiers which are formal parameters persiststhrough the function with which they are associated. The lexical scope of identifiers declared at the head of ablock persists until the end of the block. The lexical scope of labels is the whole of the function in which theyappear.In all cases, however, if an identifier is explicitly declared at the head of a block, including the block constitutinga function, any declaration of that identifier outside the block is suspended until the end of the block.Page 100


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Remember also ("E. Structure, Union, and Enumeration Declarations" in part "DECLARATIONS") thatidentifiers associated with ordinary variables on the one hand and those associated with structure and unionmembers and tags on the other form two disjoint classes which do not conflict. Members and tags follow thesame scope rules as other identifiers. The typedef names are in the same class as ordinary identifiers. Theymay be redeclared in inner blocks, but an explicit type must be given in the inner declaration:typedef float distance;auto int distance;The int must be present in the second declaration, or it would be taken to be a declaration with no declaratorsand type distance.B. Scope of ExternalsIf a function refers to an identifier declared to be extern, then somewhere among the files or libraries constitutingthe complete program there must be an external definition for the identifier. All functions in a givenprogram which refer to the same external identifier refer to the same object, so care must be taken that thetype and size specified in the definition are compatible with those specified by each function which referencesthe data.The appearance of the extern keyword in an external definition indicates that storage for the identifiersbeing declared will be allocated in another file. Thus in a multifile program, an external data definition withoutthe extern specifier must appear in exactly one of the files. Any other files which wish to give an external definitionfor the identifier must include the extern in the definition. The identifier can be initialized only in thedeclaration where storage is allocated.Identifiers declared static at the top level in external definitions are not visible in other files. Functionsmay be declared static.COMPILER CONTROL LINESThe C compiler contains a preprocessor capable of macro substitution, conditional compilation, and inclusionof named files. Lines beginning with # communicate with this preprocessor. These lines have syntax independentof the rest of the language; they may appear anywhere and have effect which lasts (independent ofscope) until the end of the source program file.A. Token ReplacementA compiler-control line of the form#define identifier token-stringcauses the preprocessor to replace subsequent instances of the identifier with the given string of tokens. Semicolonsin or at the end of the token-string are part of that string. A line of the form#define identifier( identifier, ... , identifier) token-stringwhere there is no space between the first identifier and the (, is a macro definition with arguments. Subsequentinstances of the first identifier followed by a (, a sequence of tokens delimited by commas, and a ) are replacedPage 101


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82by the token string in the definition; Each occurrence of an identifier mentioned in the formal parameter listof the definition is replaced by the corresponding token string from the call. The actual arguments in the callare token strings separated by commas; however, commas in quoted strings or protected by parentheses do notseparate arguments. The number of formal and actual parameters must be the same. Strings and character constantsin the token-string are scanned for formal parameters, but strings and character constants in the restof the program are not scanned for defined identifiers to replacement.In both forms the replacement string is rescanned for more defined identifiers. In both forms a long definitionmay be continued on another line by writing \ at the end of the line to be continued.This facility is most valuable for definition of "manifest constants," as in#define TABSIZE 100int table [TABSIZE];A control line of the form#undef identifiercauses the identifier's preprocessor definition to be forgotten.B. File InclusionA compiler control line of the form#include " filename"causes the replacement of that line by the entire contents of the file filename. The named file is searched forfirst in the· directory of the original source file and then in a sequence of specified or standard places. Al ternatively,a control line of the form#include searches only the specified or standard places and not the directory of the source file. (How the places are specifiedis not part of the language.)#include's may be nested.C. Conditional CompilationA compiler control line of the form#if constant-expressionchecks whether the constant expression evaluates to nonzero. (Constant expressions are discussed in part "CON­STANT EXPRESSIONS"; the following additional restriction applies here: the constant expression may notcontain sizeof or an enumeration constant.) A control line of the form#ifdef identifierchecks whether the identifier is currently defined in the preprocessor; i.e., whether it has been the subject ofa #define control line. A control line of the form#ifndef identifierPage 102


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>checks whether the identifier is currently undefined in the preprocessor.All three forms are followed by an arbitrary number of lines, possibly containing a control line#eIseand then by a control line#endifIf the checked condition is true, then any lines between #else and #endif are ignored. If the checked conditionis false, then any lines between the test and a #else or, lacking a leIse, the #endif are ignored.These constructions may be nested.D. Line ControlFor the benefit of other preprocessors which generate C programs, a line of the form#line constant" filename"causes the compiler to believe, for purposes of error diagnostics, that the line number of the next source lineis given by the constant and the current input file is named by the identifier. If the identifier is absent, the rememberedfile name does not change.IMPLICIT DECLARATIONSIt is not always necessary to specify both the storage class and the type of identifiers in a declaration. Thestorage class is supplied by the context in external definitions and in declarations of formal parameters andstructure members. In a declaration inside a function, if a storage class but no type is given, the identifier isassumed to be int; if a type but no storage class is indicated, the identifier is assumed to be auto. An exceptionto the latter rule is made for functions because auto functions do not exist. If the type of an identifier is "functionreturning ... ", it is implicitly declared to be extern.In an expression, an identifier followed by ( and not already declared is contextually declared to be "functionreturning int".TYPES REVISITEDThis part summarizes the operations which can be performed on objects of certain types.A. Structures and UnionsStructures and unions may be assigned, passed as arguments to functions, and returned by functions. Otherplausible operators, such as equality comparison and structure casts, are not implemented.In a reference to a structure or union member, the name on the right must specify a member of the aggregatenamed or pointed to by the expression on the left. In general, a member of a union may not be inspected unlessthe value of the union has been assigned using that same member. However, one special guarantee is made bythe language in order to simplify the use of unions: if a union contains several structures that share a commonPage 103


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82initial sequence and if the union currently contains one of these structures, it is permitted to inspect the commoninitial part of any of the contained structures. For example, the following is a legal fragment:union} u;struct{} n;struct{} ni;struct{} nf;intintintintfloattype;type;intnode;type;floatnode;B. Functionsu.nf.type = FLOAT;u.nf.floatnode = 3.14;if (u.n.type == FLOAT)... sin(u.nf.floatnode) ...There are only two things that can be done with a function-call it or take its address. If the name of a functionappears in an expression not in the function-name position of a call, a pointer to the function is generated.Thus, to pass one function to another, one might sayint fO;g(f);Then the definition of g might readg(funcp)int (*funcp)O;(*funcp)O;Notice that f must be declared explicitly in the calling routine since its appearance in g(f) was not followedby (.C. Arrays, Pointers, and SubscriptingEvery time an identifier of array type appears in an expression, it is converted into a pointer to the firstmember of the array. Because of this conversion, arrays are not lvalues. By definition, the subscript operator[] is interpreted in such a way that El[E2] is identical to *«El)+(E2». Because of the conversion rules whichapply to +, if El is an array and E2 an integer, then El[E2] refers to the E2-th member of El. Therefore,despite its asymmetric appearance, subscripting is a commutative operation.Page 104


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>A consistent rule is followed in the case of multidimensional arrays. If E is an n-dimensional array of rankixjx···xk, then E appearing in an expression is converted to a pointer to an (n-l)-dimensional array with rankjx··.xk. If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer,the result is the pointed-to (n-l)-dimensional array, which itself is immediately converted into a pointer.If the * operator, either explicitly or implicitly as a result of subscripting, is applied to this pointer, the resultis the pointed-to (n-l)-dimensional array, which itself is immediately converted into a pointer.For example, considerint x[3][5];;Here x is a 3X5 array of integers. When x appears in an expression, it is converted to a pointer to (the firstof three) 5-membered arrays of integers. In the expression x[i], which is equivalent to *(x+i), x is first convertedto a pointer as described; then i is converted to the type of x, which involves multiplying i by the lengththe object to which the pointer points, namely 5-integer objects. The results are added and indirection appliedto yield an array (of five integers) which in turn is converted to a pointer to the first of the integers. If thereis another subscript, the same argument applies again; this time the result is an integer.It follows from all this that arrays in C are stored row-wise (last subscript varies fastest) and that the firstsubscript in the declaration helps determine the amount of storage consumed by an array but plays no otherpart in subscript calculations.D. Explicit Pointer ConversionsCertain conversions involving pointers are permitted but have implementation-dependent aspects. They areall specified by means of an explicit type-conversion operator, see "E. Unary Operators" in part "EXPRES­SIONS" and "G. Type Names" in part "DECLARATIONS".A pointer may be converted to any of the integral types large enough to hold it. Whether an int or longis required is machine dependent. The mapping function is also machine dependent but is intended to beunsurprising to those who know the addressing structure of the machine. Details for some particular machinesare given below.An object of integral type may be explicitly converted to a pointer. The mapping always carries an integerconverted from a pointer back to the same pointer but is otherwise machine dependent.A pointer to one type may be converted to a pointer to another type. The resulting pointer may cause addressingexceptions upon use if the subject pointer does not refer to an object suitably aligned in storage. It isguaranteed that a pointer to an object of a given size may be converted to a pointer to an object of a smallersize and back again without change.For example, a storage-allocation routine might accept a size (in bytes) of an object to allocate, and returna char pointer; it might be used in this way.extern char *alloc();double *dp;dp = (double *) alloc(sizeof( double));*dp = 22.0 / 7.0;The alloc must ensure (in a machine-dependent way) that its return value is suitable for conversion to a pointerto double; then the use of the function is portable.The pointer representation on the PDP-ll corresponds to a 16-bit integer and measures bytes. The charshave no alignment requirements; everything else must have an even address.Page lOS


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>On the VAX-II, pointers are 32 bits long and measure bytes. Elementary objects are aligned on a boundaryequal to their length, except that double quantities need be aligned only on even 4-byte boundaries. Aggregatesare aligned on the strictest boundary required by any of their constituents.On the Honeywell 6000, a pointer corresponds to a 36-bit integer; the word part is in the left 18 bits, andthe two bits that select the character in a word lie just to their right. Thus char pointers measure units of 2 16bytes; everything else is measured in units of 2 18 machine words. The double quantities and aggregates containingthem must lie on an even word address (0 mod 2 19 ).The IBM 370 and the Interdata 8/32 are similar. On each, pointers are 32-bit quantities that measure bytes;elementary objects are aligned on a boundary equal to their length, so pointers to short must be 0 mod 2, toint and float 0 mod 4, and to double 0 mod 8. Aggregates are aligned on the strictest boundary required byany of their constituents.The 3B20 and 3B5 Processors characteristics are the same. On each, pointers are 24-bit quantities. Most objectsare aligned on 4-byte boundaries. Shorts are aligned in all cases on 2-byte boundries. Arrays of characters,all structures, inits, longs, floats, and doubles are aligned on 4-byte boundries; but structure members maybe packed tighter.CONSTANT EXPRESSIONSIn several places C requires expressions which evaluate to a constant: after case, as array bounds, and ininitializers. In the first two cases, the expression can involve only integer constants, character constants, enumerationconstants, and sizeof expressions, possibly connected by the binary operatorsor by the unary operators+ * / % & « > > != < > =or by the ternary operator?:Parentheses can be used for grouping but not for function calls.More latitude is permitted for initializers; besides constant expressions as discussed above, one can alsoapply the unary & operator to external or static objects and to external or static arrays subscripted with a constantexpression. The unary & can also be applied implicitly by appearance of unsubscripted arrays and functions.The basic rule is that initializers must evaluate either to a constant or to the address of a previouslydeclared external or static object plus or minus a constant.Less latitude is allowed for constant expressions after #if; sizeof expressions and enumeration constantsare not permitted.PORTABILITY CONSIDERATIONSCertain parts of C are inherently machine dependent. The following list of potential trouble spots is notmeant to be all-inclusive but to point out the main ones.Purely hardware issues like word size and the properties of floating point arithmetic and integer divisionhave proven in practice to be not much of a problem. Other facets of the hardware are reflected in differingPage 106


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>implementations. Some of these, particularly sign extension (converting a negative character into a negativeinteger) and the order in which bytes are placed in a word, are a nuisance that must be carefully watched. Mostof the others are only minor problems.The number of register variables that can actually be placed in registers varies from machine to machineas does the set of valid types. Nonetheless, the compilers all do things properly for their own machine; excessor invalid register declarations are ignored.Some difficulties arise only when dubious coding practices are used. It is exceedingly unwise to write programsthat depend on any of these properties.The order of evaluation of function arguments is not specified by the language. It is right to left on thePDP-11 and VAX-11; left to right on the others. The order in which side effects take place is also unspecified.Since character constants are really objects of type int, multicharacter character constants may be permitted.The specific implementation is very machine dependent because the order in which characters are assignedto a word varies from one machine to another.Fields are assigned to words and characters to integers right to left on the PDP-1I and VAX-II and leftto right on other machines. These differences are invisible to isolated programs which do not indulge in typerunning (e.g., by converting an int pointer to a char pointer and inspecting the pointed-to storage) but mustbe accounted for when conforming to externally-imposed storage layouts.The language accepted by the various compilers differs in minor details. Most notably, the current PDP-IIcompiler will not initialize structures containing bit fields and does not accept a few assignment operators incertain contexts where the value of the assignment is used.ANACHRONISMSBecause C is an evolving language, certain obsolete constructions may be found in older programs. Althoughsome versions of the compiler support such anachronisms, they have by and large disappeared leaving only aportability problem behind.Earlier versions of C used the form =op instead of op= for assignment operators. This leads to ambiguities,typified byx=-1which assigns -1 to x, but previously decremented x.The syntax of initializers has changed. Previously, the equals sign that introduces an initializer was notpresent, so instead ofone usedint x = 1;int x 1;The change was made because the initializationint f (1)resembles a function declaration closely enough to confuse the compilers.Page 107


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82A structure or union member reference is a chain of member references (qualifications) that are prefixedby either a pointer to a structure or union or a structure or union proper. Because each qualification impliesthe addition of an offset within an address computation, older compilers (which failed to check for membershipin the appropriate structure or union) allowed omission of those qualifications with an offset of zero. Completequalification is now required.Previous versions of the compiler were lax in detecting mixed assignments involving pointers and arithmeticquantities. These are now remarked upon.SYNTAX SUMMARYThis summary of C syntax is intended more for aiding comprehension than as an exact statement of thelanguage.A. ExpressionsThe basic expressions are:expression:primaryprimary:lvalue:* expression& lvalue- expression! expressionexpression++ lvalue--lvaluelvalue ++lvalue --sizeof expression( type-name) expressionexpression binop expressionexpression ? expression: expressionlvalue asgnop expressionexpression, expressionidentifierconstantstring( expression)primary ( expression-list opt )primary [ expression]primary. identifierprimary - > identifieridentifierprimary [ expression]lvalue . identifierprimary - > identifier* expression( lvalue )The primary-expression operators() [] ->Page 108


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>have highest priority and group left to right. The unary operators* & ++ sizeof ( type-name)have priority below the primary operators but higher than any binary operator and group right to left. Binaryoperators group left to right; they have priority decreasing as indicated below. The conditional operator groupsright to left.binop:* / %+» «< > =&!=&&IIII?:Assignment operators all have the same priority and all group right to left.asgnop:+= *= /= %= > >=«= &=I­ I-Page 109The comma operator has the lowest priority and groups left to right.B. Declarationsdeclaration:decl-specifiers init-declarator-list opt ;decl-specifiers:type-specifier decl-specifiers optsc-specifier decl-specifiers optsc-specifier:autostaticexternregistertypedeftype-specifier:charshortintlongunsignedfloatdoublevoidstruct-or-union -specifiertypedef-nameen um -specifier


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82enum-specifier:enum { enum-list }enum identifier { enum-list }enum identifierenum-list:enumeratorenum-list, enumeratorenumerator:identifieridentifier = constant-expressionini t-declara tor-list:ini t-declara torinit-declarator, init-declarator-listini t-declara tor:declarator initializer oPtdeclarator:identifier( declarator)* declaratordeclarator ( )declarator [ constant-expression opt ]struct-or-union-specifier:struct { struct-decl-list }struct identifier { struct-decl-list }struct identifierunion { struct-decl-list}union identifier { struct-decl-Jist }union identifierstruct-decl-list:struct-declarationstruct-declaration struct-decl-Jiststruct-declara ti on:type-specifier struct-declarator-list ;struct-declarator-list:struct-declaratorstruct-declarator, struct-declarator-liststruct-declarator:declaratordeclarator: constant-expression: constant-expressionPage 110


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>initializer:= expression= { initializer-list }= { initializer-list , }ini tializer-list:expressioninitializer-list, initializer-list{ initializer-list }type-name:type-specifier abstract-declaratorabstract-declarator:empty( abstract-declarator)* abstract-declaratorabstract-declarator ( )abstract-declarator [ constant-expression opt ]typedef-name:identifierC. Statementscompound-statement:{ declaration-list oPtstatement-list oPt }declaration-list:declarationdeclaration declaration-liststatement-list:statementstatement statement-liststatement:compound-sta temen texpression;if ( expression) statementif ( expression) statement else statementwhile ( expression) statementdo statement while ( expression) ;for ( expression-l opt ; expression-2 opt ; expression-3 opt ) statementswitch ( expression) statementcase constant-expression: statementdefault: statementbreak;continue;return;return expression;goto identifier;identifier: statementPage 111


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82D. External Definitionsprogram:external-definitionexternal-definition programE. Preprocessorexternal-defini tion:function-definitiondata-definitionfunction-definition:type-specifier opt function-declarator function-bodyfunction-declarator:declarator ( parameter-list opt )parameter-list:identifieridentifier, parameter-listfunction-body:declara tion -list compound-sta temen tdata-defini tion:extern opt type-specifier opt init-declarator-list opt ;static oPttype-specifier opt init-declarator-list oPt;#define identifier token-string#define identifier ( identifier, ... , identifier) token-string#undef identifier#include "filename'#include < filename>#if cons tan t-expression#ifdef identifier#ifndef identifier#else#endif#line constant "filename'Page 112


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>LIBRARIESA. GeneralThis part describes the libraries that are supported on the UNIX operating system. A library is a collectionof related functions and/or declarations that simplify programming effort. All of the functions described arealso described in Part 3 of the UNIX System User's Manual. Most of the declarations described are in Part 5of the UNIX System User's Manual. The three main libraries on the UNIX system are:C libraryObject fileMath libraryThis is the basic library for C language programs. The C library is composed of functionsand declarations used for file access, string testing and manipulation, character testingand manipulation, memory allocation, and other functions.This library provides functions for the access and manipulation of object files.This library provides exponential, bessel functions, logarithmic, hyperbolic, and trigonometricfunctions.Some libraries consist of two portions-functions and declarations. In some cases, the user must requestthat the functions (and/or declarations) of a specific library be included in a program being complied. In othercases, the functions (and/or declarations) are included automatically.Including FunctionsWhen a program is being complied, the complier will automatically search the C language library to locateand include functions that are used in the program. This is the case only for the C library and no other library.In order for the complier to locate and include functions from other libraries, the user must specifiy these librarieson the command line for the complier. For example, when using functions of the math library, the usermust request that the math library be searched by including the argument -1m on the command line, such as:cc file.c -1mThis method should be used for all functions that are not part of the C language library.Including DeclarationsSome functions require a set of declarations in order to operate properly. The declarations of all librariesmust be included by request of the user. A set of declarations is stored in a file under the lusrlincludedirectory.These files are referred to as header files. In order to include a certain header file, the user must specifiy thisrequest within the C language program. The request is in the form:#include where file is the name of the file. Since this request is handled by the preprocessor, header files should appearat the beginning of the (first) file being complied.The remainder of this part decribes the functions and header files of the various libraries. The descriptionof each library begins with the actions required by the user to include the functions and/or header files in aprogram being complied (if any). Following the description of the actions required, is information in three columnformat of the form:function reference(N) Brief description.The functions are grouped by type while the reference refers to section 'N' in the UNIX System User's Manual.Following this, are descriptions of the header files associated with these functions (if any).Page 113


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82B. The C LibraryThe C library consists of several types of functions. All the functions of the C library are loaded automaticallyby the complier. Various declarations must be included by the user as required. The functions of the C libraryare divided into the following types:• input/ output control• string manipulation• character manipulation• time functions• miscellaneous functions.Input/Output ControlThese functions of the C library are included as needed during the compiling of a C language program automatically.No command line request is needed.The header file required by the input/output functions should be included in the program being compiled.This is accomplished by including the line:#include < stdio.h >near the beginning of the (first) file being compiled.The input/output functions are grouped into the following categories:• file access• file status• input• output• miscellaneous.File Access FunctionsFUNCTIONfclosefdopenfilenofopenfreopenREFERENCEfclose(3S)fopen(3S)ferror(3S)fopen(3S)fopen(3S)Close an open stream.BRIEF DESCRIPTIONAssociate stream with an open(2) ed file.Integer associated with an open stream.Open a stream with specified permissions. A "stream" isdefined to be what fopen returns.Substitute named file in place of open stream.Page 114


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>fseek fseek(3S) Reposition stream pointer.pclose popen(3S) Close a stream opened by popen.popen popen(3S) Create pipe as a stream between calling process and command.rewind fseek(3S) Reposition stream pointer at beginning of file.setbuf setbuf(3S) Assign buffering to stream.File Status FunctionsFUNCTION REFERENCE BRIEF DESCRIPTIONclearerr ferror(3S) Reset error condition on stream.feof ferror(3S) Test for "end of file" on stream.ferror ferror(3S) Test for error condition on stream.ftell fseek(3S) Return current stream pointer.Input FunctionsFUNCTION REFERENCE BRIEF DESCRIPTIONfgetc getc(3S) True function for getc (3S).fgets gets(3S) Read string from stream.fread fread(3S) General buffered read from stream.fscanf scanf(3S) Read using format from stream.getc getc(3S) Return next character from stream.getchar getc(3S) Return next character from stdin.gets gets(3S) Read string from stdin.getw getc(3S) Read word from stream.scanf scanf(3S) Read using format from stdin.sscanf scanf(3S) Read using format from string.ungetc ungetc(3S) Put back one character on stream.Output FunctionsFUNCTION REFERENCE BRIEF DESCRIPTIONfflush fclose(3S) W rite all currently buffered characters from stream.Page 115


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82fprintf printf(3S) Print using format to stream.fpute pute(3S) True function for pute (3S).fputs puts(3S) Write string to stream.fwrite fread(3S) General buffered write to stream.printf printf(3S) Print using format to stdout.pute pute(3S) Write next character to stream.putehar pute(3S) Write next character to stdout.puts puts(3S) Write string to stdout.putw pute(3S) Write word to stream.sprintf printf(3S) W rite using format to string.Miscellaneous FunctionsFUNCTION REFERENCE BRIEF DESCRIPTIONetermid etermid(3S) Return file name for controlling terminal.euserid euserid(3S) Return login name for owner of current process.system system(3S) Execute system command.tempnam tmpnam(3S) Create temporary file name using directory and prefix.tmpnam tmpnam(3S) Create temporary file name.tmpfile tmpfile(3S) Create temporary file.Input/Output Header File (stdio.h)The following listing is the contents of stdio.h for the 3B20S Processor. Note that along with parametersused by the stdio functions several macros are defined. The following files are open automatically by the system:stdinstdoutstderrstandard input filestandard output filestandard error fileAlso, note that the functions fopen, fdopen, and freopen are declared as returning a structure of typeFILE; fgets and gets are declared as returning character pointers; and ftell is declared as returning a longinteger./* this example is included as * //* a typical illustration of the * //* header file stdio.h * /#ifndef _NFILE#define _NFILE 20Page 116


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>#define BUFSIZ 512typedef struct {unsigned char * _ptr;int _cnt;unsigned char * _base;char _flag;char _file;} FILE;#define _IOREAD 01#define _IOWRT 02#define _IONBF 04#define _IOMYBUF 010#define _IOEOF 020#define _IOERR 040#define _IOUNK 0100 /* indicates buffering status unknown */#define _IORW 0200#ifndef NULL#define NULL 0#endif#ifndef EOF#define EOF (-1)#endif#define stdin (&_iob[O])#define stdout (&_iob[1])#define stderr (&_iob[2])#ifndef lint#define getc(p) (( --(p)-> _cnt) >= O)?((int) *((p)-> _ptr)++): _filbuf(p))#endif#define getchar() getc(stdin)#ifndef lint#define putc(x,p) ((--((p)->_cnt) >= O)?((int)) (*((p)->_ptr)++ = (unsigned char)(x))); \_flsbuf((unsigned char)(x),p))#endif#define putchar(x) putc(x,stdout)#define feof(p) (((p)->_flag & _IOEOF) != 0)#define ferror(p) (((p)-> _flag & _IOERR) != 0)#define fileno(p) p-> fileextern FILE _iobLNFILE];extern FILE *fopen( );extern FILE *fdopen( );extern FILE *freopen( );extern long ftell( );extern char *fgets( );extern char *gets( );#define L_ctermid 9#define L_cuserid 9#define P _tmpdir "/usr/tmp/"#define L_tmpnam 10+sizeof(P _tmpdir)#endifString Manipulation FunctionsThese functions are used to locate characters within a string, copy, concatenate, and compare strings. Thesefunctions are located and loaded during the compling of a C language program automatically. No command linePage 117


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82request is needed since these functions are part of the C library. There are no declarations associated with thesefunctions.FUNCTION REFERENCE BRIEF DESCRIPTIONstrcat string(3C) Concatenate two strings.strchr string(3C) Search string for character.strcmp string(3C) Compares two strings.strcpy string(3C) Copy string over string.strcspn string(3C) Length of initial string not containing set of characters ..strlen string(3C) Length of string.strncat string(3C) Concatenate two strings with fixed length.strncmp string(3C) Compares two strings with fixed length.strncpy string(3C) Copy string over string with fixed length.strpbrk string(3C) Search string for any set of characters.strrchr string(3C) Search string backwards for character.strspn string(3C) Length of initial string containing set of characters.strtok string(3C) Search string for token separated by any of a set of characters.Character ManipulationThe following functions and declarations are used for testing and translating ASCII characters. These functionsare located and loaded automatically during the compiling of a C language program. No command linerequest is needed since these functions are part of the C library.The declarations associated with these functions should be included in the program being compiled. Thisis accomplished by including the line:#include < ctype.h >near the beginning of the (first) file being compiled.Character Testing FunctionsThese functions can be used to identify characters as uppercase or lowercase letters, digits, punctuation,etc.FUNCTIONisalnumisalphaREFERENCEctype(3C)ctype(3C)BRIEF DESCRIPTIONIs character alphanumeric?Is character alphabetic?Page 118


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>is ascii ctype(3C) Is integer ASCII character?iscntrl ctype(3C) Is character a control character?isdigit ctype(3C) Is character a digit?is graph ctype(3C) Is character a printing character?islower ctype(3C) Is character a lowercase letter?isprint ctype(3C) Is character a printing character including space?ispunct ctype(3C) Is character a punctuation character?isspace ctype(3C) Is character a white space character?isupper ctype(3C) Is character an uppercase letter?isxdigit ctype(3C) Is character a hex digit?Character Translation FunctionsThese functions provide translation of uppercase to lowercase, lowercase to uppercase, and integer to ASCII.FUNCTIONtoasciitolowertoupperREFERENCEcODv(3C)cODv(3C)cODv(3C)BRIEF DESCRIPTIONConvert integer to ASCII character.Convert character to lowercase.Convert character to uppercase.Character Header File (ctype.h)The following listing is the ctype.h file which is located in the lusrlinclude directory. This file is includedin a program with the line:#include < ctype.h >This file provides a few data declarations and defines the macros./* this example is included as *//* a typical illustration of the *//* header file ctype.h */#define _ U 01#define _L 02#define _N 04#define _S 010#define _P 020#define _ C 040#define _B 0100#define _X 0200extern char _ctype[];Page 119


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 16/82#define#define#define#define#define#define#define#define#define#define#define#define#define#define#defineisalpha(c)isupper(c)islower(c)isdigit(c)isxdigi t( c)isspace(c)ispunct(c)isalnum(c)isprint(c)isgraph(c)iscntrl(c)isascii(c)_toupper(c)_tolower(c)toascii(c)((_ctype+ 1)[c]&(_ULL»((_ctype+ 1)[c]&_U)(Lctype+ 1)[c]&_L)((_ctype+ 1)[c]&_N)((_ctype+ 1)[c]&_X)((_ctype+ 1)[c]&_S)((_ctype+ 1)[c]&_P)(Lctype+ 1)[c]&(_ULLLN»((_ctype+ 1) [c] &(_PL ULLLNLB»((_ctype+ 1) [c] &(_PL ULLI_N»((_ctype+ 1)[c]&_C)((unsigned char)( c) < =0177)((c)-'a'+' A')((c)-' A'+'a')((c)&0177)Time FunctionsThese functions are used for accessing and reformatting the systems idea of the current date and time.These functions are located and loaded automatically during the compling of a C language program. No commandline request is needed since these functions are part of the C library.The header file associated with these functions should be included in the program being compiled. This isaccomplished by including the line:#include near the beginning of the (first) file being compiled.These functions (except tzset) convert a time such as returned by time(2).FUNCTIONasctimectimegmtimelocaltimetzsetREFERENCEctime(3C)ctime(3C)ctime(3C)ctime(3C)ctime(3C)BRIEF DESCRIPTIONReturn string representation of date and time.Return string representation of date and time, given integerform.Return Greenwich Mean Time.Return local time.Set time zone field from environment variable.Time Header File (time.h)The following listing is the time.h file which is located in the lusrlinclude directory. The tm structure isthe type of structure returned by gmtime and localtime. Note that the gmtime and localtime functions aredeclared as returning pointers to structures of type tm and ctime and asctime are declared as returning characterpointers.The long timezone variable contains the difference (in seconds) between GMT and local standard time. ForEST, this difference is 18,000 seconds. The daylight variable is nonzero if Daylight Savings Time conversionPage 120


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>should be applied. The tzname variable defines the time zone names. For EST, the declaration would be char*tzname [2] = " EST" ," EDT" ;If the environment variable TZ is present, asctime uses the contents of the variable to override the defaulttime zone. The value of TZ must be a 3-letter time zone name, followed by a number representing the differencebetween GMT and local time (in hours). Following the time difference is an optional3-letter name for a daylighttime zone. For example, for users in EST, the value of TZwould be EST5EDT. By setting the TZvariable, thevalues of tim ezone, daylight, and tzname are changed accordingly./* this example is included as * //* a typical illustration of the * //* header file time.h * /struct tm {int tm_sec;int tm_min;int tm_hour; /* hour of day (0 to 24) */int tm_mday; /* day of month (1 to 31) * /int tm_mon; /* month of year (0 to 11) * /int tmJear; /* last two digits of current year * /int tm_wday; /* day of week (Sunday = 0) */int tmJday; /* day of year (0 to 365) * /int tm_isdst; /* nonzero if DST in effect * /};extern struct tm *gmtime( ), *localtime( );extern char *ctime( ), *asctime( );extern void tzset( );extern long timezone;extern in t day ligh t;extern char *tzname[ ];Miscellaneous FunctionsThese functions support a wide variety of operations. Some of these are numerical conversion, password fileand group file access, memory allocation, random number generation, and table management. These functionsare located and included in a program being complied automatically. No command line request is needed sincethese functions are part of the C library.Some of these functions require declarations to be included. These are described following the descriptionsof the functions.Numerical ConversionThe following functions perfprm numerical conversion./FUNCTION REFERENCE BRIEF DESCRIPTIONa641 a641(3C) Convert string to base 64 ASCII.atof atof(3C) Convert string to floating.atoi atof(3C) Convert string to integer.atol atof(3C) Convert string to long.frexp frexp(3C) Split floating into mantissa and exponent.Page 121


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 1 6/8213tolltol3ldexp164amodf13tol(3C)13tol(3C)frexp(3C)a641(3C)frexp(3C)Convert 3-byte integer to long.Convert long to 3-byte integer.Combine mantissa and exponent.Convert base 64 ASCII to string.Split mantissa into integer and fraction.DES Algorithm AccessThe following functions allow access to the DES algorithm used on the UNIX operating system. The DESalgorithm is implemented with variations to frustrate use of hardware implementations of the DES for keysearch.FUNCTIONcryptencryptsetkeyGroup File AccessREFERENCEcrypt(3C)crypt(3C)crypt(3C)BRIEF DESCRIPTIONEncode string using salt.Encode/decode string of O's and l's.Initialize for subsequent use of encrypt.The following functions are used to obtain entries from the group file. Declarations for these functions mustbe included in the program being compiled with the line:#include FUNCTION REFERENCE BRIEF DESCRIPTIONendgrent getgrent(3C) Close group file being processed.getgrent getgrent(3C) Get next group file entry.getgrgid getgrent(3C) Return next group with matching gid.getgrnam getgrent(3C) Return next group with matching name.setgrent getgrent(3C) Rewind group file being processed.Password File AccessThese functions are used to search and access information stored in the password file (I etc/passwd). Somefunctions require declarations that can be included in the program being compiled by adding the line:#include FUNCTIONendpwentgetpwREFERENCEgetpwent(3C)getpw(3C)BRIEF DESCRIPTIONClose password file being processed.Search password file for uid.Page 122


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>getpwent getpwent(3C) Get next password file entry.getpwnam getpwent(3C) Return next entry with matching name.getpwuid getpwent(3C) Return next entry with matching uid.putpwent putpwent(3C) Write entry on stream.setpwent getpwent(3C) Rewind password file being accessed.Parameter AccessThe following functions provide access to several different types of paramenters. None require any declarations.FUNCTIONgetoptgetcwdgetenvgetpassREFERENCEgetopt(3C)getcwd(3C)getenv(3C)getpass(3C)BRIEF DESCRIPTIONGet next option from option list.Return string representation of current working directory.Return string value associated with environment variable.Read string from terminal without echoing.Hash Table ManagementThe following functions are used to manage hash search tables.FUNCTION REFERENCE BRIEF DESCRIPTIONhcreate hsearch(3C) Create hash table.hdestroy hsearch(3C) Destroy hash table.hsearch hsearch(3C) Search hash table for entry.. Binary Tree ManagementThe following functions are used to manage a binary tree.FUNCTION REFERENCE BRIEF DESCRIPTIONtdelete tsearch(3C) Deletes nodes from binary tree.tsearch tsearch(3C) Search binary tree.twalk tsearch(3C) Walk binary tree.Table ManagementThe following functions are used to manage a table. Th~ "table" is basically a 2-dimensional character array.The first subscript defines the maximum number of entries in the table. The second subscript defines the widthPage 123


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82(or length) of a single entry. Since none of these functions allocate storage, sufficient memory must be allocatedbefore using these functions.FUNCTION REFERENCE BRIEF DESCRIPTIONbsearch bsearch(3C) Search table using binary search.lsearch Isearch(3C) Search table using linear search.qsort qsort(3C) Sort table using quicker-sort algorithm.Memory AllocationThe following functions provide a means by which memory can be dynamically allocated or freed.FUNCTION REFERENCE BRIEF DESCRIPTIONcalloc malloc(3C) Allocate zeroed storage.free malloc(3C) Free previously allocated storage.malloc malloc(3C) Allocate storage.realloc malloc(3C) Change size of allocated storage.Pseudorandom Number GenerationThe following functions are used to generate pseudorandom numbers. The functions that end with 48 area family of interfaces to a pseudorandom number generator based upon the linear congruential algorithm and48-bit integer arithmetic. The rand and srand functions provide an interface to a multiplicative congruentialrandom number generator with period of 232.FUNCTIONdrand48Icong48Irand48mrand48randseed48srandsrand48REFERENCEdrand48(3C)drand48(3C)drand48(3C)drand48(3C)rand(3C)drand48(3C)rand(3C)drand48(3C)BRIEF DESCRIPTIONRandom double over the interval [0 to 1).Set parameters for drand48, Irand48,and mrand48.Random long over the interval [0 to 2 31 ).Random long over the interval [-2 31 to 2 31 ).Random integer over the interval [0 to 214).Seed the generator for drand48, Irand48, and mrand48.Seed the generator for rand.Seed the generator for drand48, Irand48, and mrand48using a long.Page 124


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Signal Handling FunctionsThe functions gsignal and ssignal implement a software facility similar to signal(2) in the UNIX SystemUser's Manual. This facility enables users to indicate the disposition of error conditions and allows users to handlesignals for their own purposes. The declarations associated with these functions can be included in the programbeing complied by the line#include These declarations define ASCII names for the 15 software signals.FUNCTIONgsinalssignalMiscellaneousREFERENCEssignal(3C)ssignal(3C)BRIEF DESCRIPTIONSend a software signal.Arrange for handling of software signals.The following functions do not fall into any previously described category.FUNCTIONabortabseevtfevtgevtisattymktempmonitorswabttynameREFERENCEabort(3C)abs(3C)eevt(3C)eevt(3C)eevt(3C)ttyname(3C)mktemp(3C)monitor(3C)swab(3C)ttyname(3C)BRIEF DESCRIPTIONCause an lOT signal to be sent to the process.Return the absolute integer value.Convert double to string.Convert double to string using Fortran format.Convert double to string using Fortran F or E format.Test whether integer file descriptor is associated with a terminal.Create file using template.Cause process to record a histogram of program counter location.Swap and copy bytes.Return path name of terminal associated with integer filedescriptor.Group File Header File (grp.h)The grp.h file provides the structure used by several group file functions. This file can be included in a programby adding the line:#include Page 125


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 16/82/* this example is included as *//* a typical illustration of the *//* header file grp.h */struct group {char *gr_name; /* group name */char *gr_passwd; /* group password */int gr...,gid; /* group id */char **gr_mem; /* list of pointers to group members */};Password File Header File (pwd.h)The following listing describes the contents of pwd.h which is used by several of the password file functions.This file can be included in the program with the line:#include The pw _comment field is actually a structure of type commen t./* this example is included as * //* a typical illustration of the * //* header file pwd.h * /struct passwd {char *pw _name; /* login name * /char *pw _passwd; /* password * /int pw _uid; /* user id * /int pw...,gid; /* group id * /char *pw _age; /* age of password * /char *pw _comment; /* comments * /char *pw...,gecos; /* optional GCOS user id * /char *pw _dir; /* login directory * /char *pw _shell; /*shellused by this login * /};struct comment {char *c_dept; /* user's department * /char *c_name; /* user's name */char *c_acct; /* user's account number * /char *c_bin; /* user's mail bin * /};Signal Handling Header File (signal.h)The following listing describes the signa1.h file which is under the lusrlinclude directory. This file can beincluded by adding the line:#include The signa1.h file contains the declarations used by the functions that handle signals. Most of this file definesnames to each numerical signal./* this example is included as * //* a typical illustration of the * /Page 126


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>/* header file signal.h */#define SIGHUP 1 /* hangup */#define SIGINT 2 /* interrupt (rubout) */#define SIGQUIT 3 /* quit (ASCII FS) */#define SIGILL 4 /* illegal instruction (not reset when caught)* /#define SIGTRAP 5 /* trace trap (not reset when caught) * /#define SIGIOT 6 /* lOT instruction */#define SIG EMT 7 /* EMT instruction * /#define SIGFPE 8/* floating point exception */#define SIGKILL 9 /* kill (cannot be caught or ignored) */#define SIGBUS 10/* bus error */#define SIGSEGV 11 /* segmentation violation */#define SIGSYS 12 /* bad argument to system call * /#define SIGPIPE 13 /* write on a pipe with no one to read it */#define SIGALRM 14 /* alarm clock */#define SIGTERM 15 /* software termination signal from kill * /#define SIG USR116 /* user defined signal 1 * /#define SIGUSR2 17 /* user defined signal 2 */#define SIGCLD 18 /* death of a child * /#define SIGPWR 19 /* power-fail restart */#define NSIG 20#define SIG_DFL (int (*)( ))0#if lint#define SIG_IGN (int (*)( ))0#else#define SIG_IGN (int (*)( ))1#endifextern (*signal( ))( );C. The Object File LibraryThe object file library provides functions to access object files. The functions allow access closing to singleobject files or object files that are part of an archive. Some functions locate portions of an object file such asthe symbol table, the file header, sections, and lines within functions. Other functions read these types of entriesinto memory.This library consists of several portions. The functions reside in lusrlJihlJihJd.a and are located and loadedduring the compiling of a C language program by a command line request. The form of this request is:cc file -lIdwhich causes the link editor to search the object file library.In addition, various header files must be included. This is accomplished by including the line:#include where file is the name of the appropriate header file. The header files required for each function are definedat the beginning of each function description. Following the descriptions of the functions is a description of theheader files. The HEADER macro returns a pointer to the HEADER field of the LDFILE structure pointedto by Idptr. The HEADER field is the file header structure of the object file. The IOPTR macro returns thecontents of the IOPTRfield of the LDFILEstructure pointed to by Jdptr. The IOPTRfield contains a file pointerreturned by fopen and used by the input/output functions of the C library. The TYPE macro returns the TYPEPage 127


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82field of the LDFILEstructure pointed to by ldptr. The TYPE field contains the file magic number which is usedto distinguish between archive members and simple object files.FUNCTIONldacloseldahreadldaopenldcloseldfhreadldlinitldlitemldlreadldlseekldnlseekldnrseekldnshreadldnsseekldohseekldopenldrseekldshreadldsseekldtbindexldtbreadldtbseekREFERENCEIdclose(3X)Idahread(3X)Idopen(3X)Idclose(3X)Idfhread(3X)Idlread(3X)Idlread(3X)Idlread(3X)Idlseek(3X)Idlseek(3X)Idrseek(3X)Idshread(3X)Idsseek(3X)Idohseek(3X)Idopen(3X)Idrseek(3X)Idshread(3X)Idsseek(3X)Idtbindex(3X)Idtbread(3X)Idtbseek(3X)BRIEF DESCRIPTIONClose object file being processed.Read archive header.Open object file for reading.Close object file being processed.Read file header of object file being processed.Prepare object file for reading line number entries via ldlitem.Read line number entry from object file after ldlinit.Read line number entry from object file.Seeks to the line number entries of the object file being processed.Seeks to the line number entries of the object file being processedgiven name.Seeks to the relocation entries of the object file being processedgiven name.Read section header of the named section of the object filebeing processed.Seeks to the section of the object file being processed givenname.Seeks to the optional file header of the object file being processed.Open object file for reading.Seeks to the relocation entries of the object file being processed.Read section header of an object file being processed.Seeks to the section of the object file being processed.Returns the long index of the symbol table entry at the currentposition of the object file being processed.Reads the symbol table entry specified by symindex of the objectfile being processed.Seeks to the symbol table of the object file being processed.Page 128


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>D. The Math LibraryThe math library consists of functions and a header file. The functions are located and loaded during thecompiling of a C language program by a command line request. The form of this request is:cc file -1mwhich causes the link editor to search the math library. In addition to the request to load the functions, theheader file of the math library should be included in the program being compiled. This is accomplished by includingthe line:#include near the beginning of the (first) file being compiled.The functions are grouped into the following categories:• trigonometric functions• bessel functions• hyperbolic functions• miscellaneous functions.Trigonometric FunctionsThese functions are used to compute angles (in decimal radian measure), sines, cosines, and tangents. Allof these values are expressed in double precision and should always be declared as such.FUNCTIONacosasinatanatan2coshypotsintanBessel FunctionsREFERENCEtrig(3M)trig(3M)trig(3M)trig(3M)trig(3M)hypot(3M)trig(3M)trig(3M)BRIEF DESCRIPTIONReturn arc cosine.Return arc sine.Return arc tangent.Return arc tangent of a ratio.Return cosine.Return the square root of the sum of the squares of two numbers.Return sine.Return tangent.These functions calculate bessel functions of the first and second kinds of several orders for real values. Thebessel functions are jO, jl, jn, yO, yl, and yn. The functions are located in section bessel(3M).Page 129


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Hyperbolic FunctionsThese functions are used to compute the hyperbolic sine, cosine, and tangent for real values.FUNCTIONcoshsinhtanhREFERENCEsinh(3M)sinh(3M)sinh(3M)BRIEF DESCRIPTIONReturn hyperbolic cosine.Return hyperbolic sine.Return hyperbolic tangent.Miscellaneous FunctionsThese functions cover a wide variety of operations, such as natural logarithm, exponential, and absolute value.In addition, several are provided to truncate the decimal portion of double precision numbers.FUNCTIONceilexpfabsfloorfmodgammaREFERENCEfloor(3M)exp(3M)floor(3M)floor(3M)floor(3M)gamma(3M)BRIEF DESCRIPTIONReturns the smallest integer not less than a given value.Returns the exponential function of a given value.Returns the absolute value of a given value.Returns the largest integer not greater than a given value.Returns the remainder produced by the division of two givenvalues.Returns the natural log of gamma as a function of theabsolute value of a given value.logpowsqrtexp(3M)exp(3M)exp(3M)Returns the natural logarithm of a given value.Returns the result of a given value raised to another givenvalue.Returns the square root of a given value.Math Header File (math.h)The following listing is the math.h file which is located in the lusrlinclude directory. Note that all the mathfunctions are declared as returning double-precision values. Also note that HUGE is defined as1.701411733192644270 X 10 38/* this example is included as * //* a typical illustration of the * //* math header file (math.h) * /extern double fabs( ), floor( ), ceil( ), fmod( ), ldexp( );extern double sqrt( ), hypot( ), atof( );extern double sine ), cos( ), tan( ), asin( ), acos( ), atan( ), atan2( );Page 130


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>extern double exp( ), log( ), 10glO( ), pow( );extern double sinh( ), cosh( ), tanh( );extern double gamma( );extern double jO( ), jl( ), jn( ), yO( ), yl( ), yn( );#define HUGEL 701411733192644270e38THE "ee" COMMANDA. GeneralThe C compiler cc(l) is used to compile C language programs or assembly language programs into machinelanguage. This document briefly describes the usage of the C compiler. Most of this information is in the UNIXSystem User's Manual. --B. UsageThe cc command is invoked as:cc options fileswhere options control the compiling; files are the files to be compiled.The following options are interpreted by cc. See ld(l) for link editor options.-cSuppresses the link edit phase of the compilation and forces an object file to be producedeven if only one program is compiled.-Dname=def-Dname-E-g-Idir-0-p-pDefines the name to the preprocessor, as if by #define. If no definition is given, the nameis defined as 1.Runs only the macro preprocessor on the named C language programs and sends the resultto the standard output.Arranges for the compiler to produce additional information needed for the use of sdb(l).Changes the algorithm for searching for #include files whose names do not begin with/ to look in dir before looking in the directories on the standard list. Thus, #include fileswhose names are enclosed in" " will be searched for first in the directory of the file argument,then in directories named in the -I options, and last in directories on a standardlist. For #include files whose names are enclosed by < .•. >, the directory of the file argumentis not searched.Invokes an object-code optimizer. The optimizer will move, merge, and delete code so symbolicdebugging with line numbers could be confusing when the optimizer is used.Arranges for the compiler to produce code which counts the number of times each routineis called. Also, if link editing takes place, replaces the standard startoff routine by onewhich automatically calls monitor(3C) at the start and arranges to write out a mon.outfile at normal termination of execution of the object program. An execution profile can begenerated by use of prof(l).Runs only the macro preprocessor on the named C language programs and leaves the resulton corresponding files suffixed .i.Page 131


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ADD ISSUE 1 1/83-S-UnameCompiles the named C language programs and leaves the assembler-language output oncorresponding files suffixed .s.Removes any initial definition of name, where nameis a reserved symbol that is predefinedby the particular preprocessor. The current list of these possibly reserved symbols includesthe operating system [unix (this reserved symbol refers to the UNIX operating system),gcos, os, tss, or u370], the hardware (ibm, interdata, pdp11, u3b, or VAX), and the UNIXsystem variant (RES, RT, or mert).Other options are taken to be either link editor options or C-compatible object programs.Arguments that end with.c are taken to be C language source programs; they are compiled, and each objectprogram is left on the file whose name is that of the source with .0 substituted for .c. The .0 file is normallydeleted.In the same way, arguments whose names end with .s are taken to be assembly source programs and areassembled producing a .0 file.These programs, together with the results of any compilations specified, are linked (in the order given) toproduce an executable program with the name a.out.A C PROGRAM CHECKER -"lint"A. GeneralThe lint program examines C language source programs detecting a number of bugs and obscurities. It enforcesthe type rules of C language more strictly than the C compiler. It may also be used to enforce a numberof portability restrictions involved in moving programs between different machines and/or operating systems.Another option detects a number of wasteful or error prone constructions which nevertheless are legal. The lintprogram accepts multiple input files and library specifications and checks them for consistency.UsageThe lint (1) command has the form:lint [options] files ... library-descriptors ...where options are optional flags to control lint checking and messages; files are the files to be checked whichend with .c; and library-descriptors are the names of libraries to be used in checking the program.The options that are currently supported by the lint command are:-a-b-c-h-n-p-uSuppress messages about assignments of long values to variables that are not long.Suppress messages about break statements that cannot be reached.Suppress messages about casts that have questionable portability.Do not apply heuristics (which attempt to detect bugs, improve style, and reduce waste).Do not check for compatibility with either the standard or the portable lint library.Attempt" to check portability to other dialects of C language (IBM and Honeywell).Suppress messages about function and external variables used and not defined or definedand not used.Page 132


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>-v Suppress messages about unused arguments in functions.-x Do not report variables referred to by external declarations but never used.When more than one option is used, they should be combined into a single argument, such as, -ab or -xha.The names of files that contain C language programs should end with the suffix .c which is mandatory orlint and the C complier.The lint program accepts certain arguments, such as:-lyThese arguments specify libraries that contain functions used in the C language program. The source code istested for compatibility with these libraries. This is done by accessing library description files whose namesare constructed from the library arguments. These files all begin with the comment:/* LINTLIBRARY */which is followed by a series of dummy function definitions. The critical parts of these definitions are the declarationof the function return type, whether the dummy function returns a value, and the number and types ofarguments to the function. The V ARARGS and ARGSUSED comments can be used to specify features of thelibrary functions.The lint library files are processed almost exactly like ordinary source files. The only difference is that functionswhich are defined on a library file but are not used on a source file do not result in messages. The lintprogram does not simulate a full library search algorithm and will print messages if the source files containa redefinition of a library routine.By default, lint checks the programs it is given against a standard library file which contains descriptionsof the programs which are normally loaded when a C language program is run. When the -p option is used,another file is checked containing descriptions of the standard I/O library routines which are expected to beportable across various machines. The -n option can be used to suppress all library checking.B. Types of MessagesThe following paragraphs describe the major categories of messages printed by lint.Unused Variables and FunctionsAs sets of programs evolve and develop, previously used variables and arguments to functions may becomeunused. It is not uncommon for external variables or even entire functions to become unnecessary and yet notbe removed from the source. These types of errors rarely cause working programs to fail, but are a source ofinefficiency and make programs harder to understand and change. Also, information about such unusedvariables and functions can occasionally serve to discover bugs.The lint program prints messages about variables and functions which are defined but not otherwise mentioned.An exception is variables which are declared through explicit extern statements but are never referenced;thus the statementextern float sinO;will evoke no comment if sin is never used. Note that this agrees with the semantics of the C compiler. In somecases, these unused external declarations might be of some interest and can be discovered by using the -x optionwi th the lint command.Page 133


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Certain styles of programming require many functions to be written with similar interfaces; frequently,some of the arguments may be unused in many of the calls. The -v option is available to suppress the printingof messages about unused arguments. When v is in effect, no messages are produced about unused argumentsexcept for those arguments which are unused and also declared as register arguments. This can be consideredan active (and preventable) waste of the register resources of the machine.Messages about unused arguments can be suppressed for one function by adding the comment:/* ARGSUSED * /to the program before the function. This has the effect of the -v option for only one function. Also, the comment:/* V ARARGS * /can be used to suppress messages about variable number of arguments in calls to a function. The commentshould be added before the function definition. In some cases, it is desirable to check the first several argumentsand leave the later arguments unchecked. This can be done with a digit giving the number of arguments whichshould be checked, such as:/* V ARARGS2 * /will cause only the first two arguments to be checked.There is one case where information about unused or undefined variables is more distracting than helpful.This is when lint is applied to some but not all files out of a collection which are to be loaded together. In thiscase, many of the functions and variables defined may not be used. Conversely, many functions and variablesdefined elsewhere may be used. The -u option may be used to suppress the spurious messages which might otherwiseappear.Set/Used InformationThe lint program attempts to detect cases where a variable is used before it is set. The lint program detectslocal variables (automatic and register storage classes) whose first use appears physically earlier in the inputfile than the first assignment to the variable. It assumes that taking the address of a variable constitutes a"use", since the actual use may occur at any later time, in a data dependent fashion.The restriction to the physical appearance of variables in the file makes the algorithm very simple and quickto implement since the true flow of control need not be discovered. It does mean that lint can print messagesabout some programs which are legal, but these programs would probably be considered bad on stylisticgrounds. Because static and external variables are initialized to zero, no meaningful information can be discoveredabout their uses. The lint program does deal with initialized automatic variables.The set/used information also permits recognition of those local variables which are set and never used.These form a frequent source of inefficiencies and may also be symptomatic of bugs.Flow of ControlThe lint program attempts to detect unreachable portions of the programs which it processes. It will printmessages about unlabeled statements immediately following goto, break, continue, or return statements.An attempt is made to detect loops which can never be left at the bottom and recognize the special caseswhile(l) and for(;;) as infinite loops. The lint program also prints messages about loops which cannot be enteredat the top. Some valid programs may have such loops which are considered to be bad style at best andbugs at worst.The lint program has no way of detecting functions which are called and never return. Thus, a call to exitmay cause an unreachable code which lint does not detect. The most serious effects of this are in the determinationof returned function values (see "Function Values"). If a particular place in the program cannot be reachedPage 134


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>but it is not apparent to lint, the comment/* NOTREACHED */can be added at the appropriate place. This comment will inform lint that a portion of the program cannot bereached.The lint program will not print a message about unreachable break statements. Programs generated byyacc and especially lex may have hundreds of unreachable break statements. The -0 option in the C compilerwill often eliminate the resulting object code inefficiency. Thus, these unreached statements are of little importance.There is typically nothing the user can do about them, and the resulting messages would clutter up thelint output. If these messages are desired, lint can be invoked with the -b option.Function ValuesSometimes functions return values which are never used. Sometimes programs incorrectly use function"values" which have never been returned. The lint program addresses this problem in a number of ways.andLocally, within a function definition, the appearance of bothreturn( expr );return;statements is cause for alarm; the lint program will give the messagefunction name contains return(e) and returnThe most serious difficulty with this is detecting when a function return is implied by flow of control reachingthe end of the function. This can be seen with a simple example:f ( a ) {if ( a ) return ( 3 );g ( );}Notice that, if a tests false, fwill call gand then return with no defined return value; this will trigger a messagefrom lint. If g, like exit, never returns, the message will still be produced when in fact nothing is wrong.In practice, some potentially serious bugs have been discovered by this feature. It also accounts for a substantialportion of the redundant messages produced by lint.On a global scale, lint detects cases where a function returns a value; and this value is sometimes or neverused. When the value is never used, it may constitute an inefficiency in the function definition. When the valueis sometimes unused, it may represent bad style (e.g., not testing for error conditions).The dual problem, using a function value when the function does not return one, is also detected. This isa serious problem.Type CheckingThe lint program enforces the type checking rules of C language more strictly than the compilers do. Theadditional checking is in four major areas:• Across certain binary operators and implied assignmentsPage 135


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82• At the structure selection operators• Between the definition and uses of functions• In the use of enumerations.There are a number of operators which have an implied balancing between types of the operands. The assignment,conditional ( ?: ), and relational operators have this property. The argument of a return statementand expressions used in initialization suffer similar conversions. In these operations, char, short, int, long,unsigned, float, and double types may be freely intermixed. The types of pointers must agree exactly exceptthat arrays of xs can, of course, be intermixed with pointers to xs.The type checking rules also require that, in structure references, the left operand of the -> be a pointerto structure, the left operand of the. be a structure, and the right operand of these operators be a member ofthe structure implied by the left operand. Similar checking is done for references to unions.Strict rules apply to function argument and return value matching. The types float and double may befreely matched, as may the types char, short, int, and unsigned. Also, pointers can be matched with the associatedarrays. Aside from this, all actual arguments must agree in type with their declared counterparts.With enumerations, checks are made that enumeration variables or members are not mixed with othertypes or other enumerations and that the only operations applied are =, initialization, ==, !=, and function argumentsand return values.If it is desired to turn off strict type checking for an expression, the comment/* NO STRICT */should be added to the program immediately before the expression. This comment will prevent strict type checkingfor only the next line in the program.Type CastsThe type cast feature in C language was introduced largely as an aid to producing more portable programs.Consider the assignmentp = 1;where p is a character pointer. The lint program will print a message as a result of detecting this. Considerthe assignmentp = (char *)1 ;in which a cast has been used to convert the integer to a character pointer. The programmer obviously had astrong motivation for doing this and has clearly signaled his/her intentions. It seems harsh for lint to continueto print messages about this. On the other hand, if this code is moved to another machine, such code should belooked at carefully. The -c flag controls the printing of comments about casts. When -c is in effect, casts aretreated as though they were assignments subject to messages; otherwise, all legal casts are passed without comment,no matter how strange the type mixing seems to be.Nonportable Character UseOn some systems, characters are signed quantities with a range from -128 to 127. On other C languageimplementations, characters take on only positive values. Thus, lint will print messages about certain comparisonsand assignments as being illegal or nonportable. For example, the fragmentchar c;if( (c = getcharO) < 0 ) ...Page 136


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>will work on one machine but will fail on machines where characters always take on positive values. The realsolution is to declare c as an integer since getchar is actually returning integer values. In any case, lint willprint the message "nonportable character comparison".A similar issue arises with bit fields. When assignments of constant values are made to bit fields, the fieldmay be too small to hold the value. This is especially true because on some machines bit fields are consideredas signed quantities. While it may seem logical to consider that a two bit field declared of type int cannot holdthe value 3, the problem disappears if the bit field is declared to have type unsigned.Assignments of "longs" to "ints"Bugs may arise from the assignment of long to an int, which will truncate the contents. This may happenin programs which have been incompletely converted to use typedefs. When a typedef variable is changedfrom int to long, the program can stop working because some intermediate results may be assigned to ints,which is truncated. Since there are a number of legitimate reasons for assigning longs to ints, the detectionof these assignments is enabled by the - a option.Strange ConstructionsSeveral perfectly legal, but somewhat strange, constructions are detected by lint. The messages hopefullyencourage better code quality, clearer style, and may even point out bugs. The -h option is used to enable thesechecks. For example, in the statement*p++ ;the * does nothing. This provokes the message "null effect" from lint. The following program fragment:unsigned x ;if( x < 0 ) ...results in a test that will never succeed. Similarly, the testis equivalent toif( x > 0 ) ...if( x != 0 )which may not be the intended action. The lint program will print the message "degenerate unsigned comparison"in these cases. If a program contains something similar toif( 1 != 0 ) ...lint will print the message "constant in conditional context" since the comparison of 1 with 0 gives a constantresult.Another construction detected by lint involves operator precedence. Bugs which arise from misunderstandingsabout the precedence of operators can be accentuated by spacing and formatting making such bugs extremelyhard to find. For example, the statementif( x&077 == 0 ) ...Page 137


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82orx«2 + 40probably do not do what was intended. The best solution is to parenthesize such expressions, and lint encouragesthis by an appropriate message.Finally, when the -h option has been used, lint prints messages about variables which are redeclared ininner blocks in a way that conflicts with their use in outer blocks. This is legal but is considered to be bad style,usually unnecessary, and frequently a bug.Old SyntaxSeveral forms of older syntax are now illegal. These fall into two classes-assignment operators and initialization.The older forms of assignment operators (e.g., =+, =-, ... ) could cause ambiguous expressions, such as:a =-1;which could be taken as eitherora =-1;a= -1;The situation is especially perplexing if this kind of ambiguity arises as the result of a macro substitution. Thenewer and preferred operators (e.g., +=, -=, ... ) have no such ambiguities. To encourage the abandonment ofthe older forms, lint prints messages about these old-fashioned operators.A similar issue arises with initialization. The older language allowedint xl;to initialize x to 1. This also caused syntactic difficulties. For example, the initializationint x ( -1 ) ;looks somewhat like the beginning of a function declaration:int x ( y ) { ...and the compiler must read past x in order to determine the correct meaning. Again, the problem is even moreperplexing when the initializer involves a macro. The current syntax places an equals sign between the variableand the initializer:int x = -1 ;This is free of any possible syntactic ambiguity.Pointer AlignmentCertain pointer assigRments may be reasonable on some machines and illegal on others due entirely toalignment restrictions. For example, on the PDP-II, it is reasonable to assign integer pointers to double pointersPage 138


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>since double-precision values may begin on any integer boundary. On the Honeywell 6000, double-precision valuesmust begin on even word boundaries. Thus, not all such assignments make sense. The lint program triesto detect cases where pointers are assigned to other pointers and such alignment problems might arise. The message"possible pointer alignment problem" results from this situation whenever either the -p or -h optionsare used.Multiple Uses and Side EffectsIn complicated expressions, the best order in which to evaluate subexpressions may be highly machine dependent.For example, on machines (like the PDP-ll) in which the stack runs backwards, function argumentswill probably be best evaluated from right to left. On machines with a stack r:unning forward, left to right seemsmost attractive. Function calls embedded as arguments of other functions mayor may not be treated similarlyto ordinary arguments. Similar issues arise with other operators which have side effects, such as the assignmentoperators and the increment and decrement operators.In order that the efficiency of C language on a particular machine not be unduly compromised, the C languageleaves the order of evaluation of complicated expressions up to the local compiler. In fact, the variousC compilers have considerable differences in the order in which they will evaluate complicated expressions. Inparticular, if any variable is changed by a side effect and also used elsewhere in the same expression, the resultis explicitly undefined.The lint program checks for the important special case where a simple scalar variable is affected. For example,the statementa[i] = b[i++];will cause lint to print the messagewarning: i evaluation order undefinedin order to call attention to this condition.c. PortabilityC language on the Honeywell and IBM systems is used, in part, to write system code for the host operatingsystem. This means that the implementation of C language tends to follow local conventions rather than adherestrictly to UNIX system conventions. Despite these differences, many C language programs have been successfullymoved to GCOS and the various IBM installations with little effort. This section describes some of the differencesbetween the implementations and discusses the lint features which encourage portability.A related difficulty comes from the amount of information retained about external names during the loadingprocess. On the UNIX system, externally known names have seven significant characters with the uppercaseand lowercase distinction kept. On the IBM systems, there are eight significant characters; but the case distinctionis lost. On GCOS, there are only six characters of a single case. This leads to situations where programsrun on the UNIX system but encounter loader problems on the IBM or GCOS systems. The -p option causesall external symbols to be mapped to one case and truncated to six characters, providing a worst-case analysis.A number of differences arise in the area of character handling. Characters in the UNIX system are 8-bitASCII, while they are 8-bit EBCDIC on the IBM and 9-bit ASCII on GCOS. Also, character strings go from hightolow-bit positions "left to right" on GCOS and IBM and low to high "right to left" on the PDP-ll. This meansthat code attempting to construct strings out of character constants or attempting to use characters as indicesinto arrays must be looked at with great suspicion. The lint program is of little help here except to flagmulti character character constants.Of course, the word size differs from machine to machine. This causes less trouble than might be expectedat least when moving from the UNIX system (16-bit words) to the IBM (32-bits) or GCOS (36-bits). The mainPage 139


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82problems are likely to arise in shifting or masking. The C language now supports a bit-field facility, which canbe used to write much of this code in a reasonably portable way. Frequently, portability of such code can beenhanced by slight rearrangements in coding style. Many of the incompatibilities seem to have the flavor ofwritingx &= 0177700 ;to clear the low order six bits of x. This suffices on the PDP-II but fails badly on GCOS and IBM. If the bit-fieldfeature cannot be used, the same effect can be obtained by writingx &= - 077;which will work on all these machines.The right shift operator is arithmetic shift on the PDP-II and logical shift on most other machines. To obtaina logical shift on all machines, the left operand can be typed unsigned. Characters are considered signedintegers on the PDP-II and unsigned on the other machines. This persistence of the sign bit may be reasonablyconsidered a bug in the PDP-II hardware which has infiltrated itself into the C language. If there were a goodway to discover the programs which would be affected, the C language could be changed. In any case, lint isno help here.The previous discussion may have made the problem of portability seem bigger than it is. The issues involvedhere are rarely subtle or mysterious, at least to the implementor of the program, although they can involvesome work to straighten out. The most serious bar to the portability of UNIX system utilities has beenthe inability to mimic essential UNIX system functions on the other systems. The inability to seek to a randomcharacter position in a text file or to establish a pipe between processes has involved far more rewriting anddebugging than any of the differences in C compilers. On the other hand, lint has been very helpful in movingthe UNIX operating system and associated utility programs to other machines.A SYMBOLIC DEBUGGING PROGRAM-"sdb"A. GeneralThis part describes the symbolic debugger sdb(I) as implemented for C language and Fortran 77 programson the UNIX operating system. The sdb program is useful both for examining "core images" of aborted programsand for providing an environment in which execution of a program can be monitored and controlled.The sdb program allows interaction with a debugged program at the source language level. When debugginga core image from an aborted program, sdb reports which line in the source program caused the error and allowsall variables to be accessed symbolically and to be displayed in the correct format.Breakpoints may be placed at selected statements or the program may be single stepped on a line-by-linebasis. To facilitate specification of lines in the program without a source listing, sdb provides a mechanism forexamining the source text. Procedures may be called directly from the debugger. This feature is useful both fortesting individual procedures and for calling user-provided routines which provided formatted printout of structureddata.B. UsageIn order to use sdb to its full capabilities, it is necessary to compile the source program with the -g option.This causes the compiler to generate additional information about the variables and statements of the compiledprogram. When the -g option has been specified, sdb can be used to obtain a trace of the called functions atthe time of the abort and interactively display the values of variables.Page 140


6/82 ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>A typical sequence of shell commands for debugging a core image is$ cc -g prgm.c -0 prgm$ prgmBus error - core dumped$ sdb prgmmain:25: x[i] = 0;*The program prgm was compiled with the -g option and then executed. An error occurred which causeda core dump. The sdb program is then invoked to examine the core dump to determine the cause of the error.It reports that the bus error occurred in function main at line 25 (line numbers are always relative to the beginningof the file) and outputs the source text of the offending line. The sdb program then prompts the user withan * indicating that it awaits a command.It is useful to know that sdb has a notion of current function and current line. In this example, they areinitially set to main and "25", respectively.In the above example, sdb was called with one argument, prgm. In general, it takes three arguments onthe command line. The first is the name of the executable file which is to be debugged; it defaults to a.outwhennot specified. The second is the name of the core file, defaulting to core; and the third is the name of the directorycontaining the source of the program being debugged. The sdb program currently requires all source to residein a single directory. The default is the working directory. In the example, the second and third arguments defaultedto the correct values, so only the first was specified.It is possible that the error occurred in a function which was not compiled with the -g option. In this case,sdb prints the function name and the address at which the error occurred. The current line and function areset to the first executable line in main. The sdb program will print an error message if main was not compiledwith the -g option, but debugging can continue for those routines compiled with the -g option. Figure 3.1 showsa typical example of sdb usage.Printing a Stack TraceIt is often useful to obtain a listing of the function calls which led to the error. This is obtained with thet command. For example:*tsub(x=2,y=3) [prgm.c:25]inter(i = 16012) [prgm.c:96]main( argc= 1,argv=Ox7fffff54,envp=Ox7fffff5c)[prgm.c:15]This indicates that the error occurred within the function sub at line 25 in file prgm.c. The sub function wascalled with the arguments x=2 and y=3 from inter at line 96. The inter function was called from main at line15. The main function is always called by the shell with three arguments often referred to as argc, argv, andenvp. Note that argvand envp are pointers, so their values are printed in hexadecimal.Examining VariablesThe sdb program can be used to display variables in the stopped program. Variables are displayed by typingtheir name followed by a slash, so*errflag/causes sdb to display the value of variable errflg. Unless otherwise specified, variables are assumed to be eitherlocal to or accessible from the current function. To specify a different function, use the form*sub:i/Page 141


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82to display variable i in function sub. F77 users can specify a common block variable in the same manner.The sdb program supports a limited form of pattern matching for variable and function names. The symbol* is used to match any sequence of characters of a variable name and? to match any single character. Considerthe following commands:*x*/*sub:y?/**/The first prints the values of all variables beginning with x, the second prints the values of all two lettervariables in function sub beginning with y, and the last prints all variables. In the first and last examples, onlyvariables accessible from the current function are printed. The command**:*/displays the variables for each function on the call stack.The sdb program normally displays the variable in a format determined by its type as declared in the sourceprogram. To request a different format, a specifier is placed after the slash. The specifier consists of an optionallength specification followed by the format. The length specifiers are:bhIOne byteTwo bytes (half word)Four bytes (long word).The lengths are only effective with the formats d, 0, x, and u. If no length is specified, the word length of thehost machine is used. A numeric length specifier may be used for the s or a commands. These commands normallyprint characters until either a null is reached or 128 characters are printed. The number specifies howmany characters should be printed. There are a number of format specifiers available:cduoxfgsapCharacterDecimalDecimal unsignedOctalHexadecimal32-bit single-precision floating point64-bit double-precision floating pointAssume variable is a string pointer and print characters until a null is reachedPrint characters starting at the variable's address until a null is reachedPointer to functionInterpret as a machine-language with addresses printed symbolicallyIInterpret as a machine-language with addresses printed symbolically.Page 142


6/82 ISSUE'l <strong>PROGRAMMING</strong> <strong>GUIDE</strong>For example, the variable i can be displayed with*i/xwhich prints out the value of i in hexadecimal.The sdb program also knows about structures, arrays, and pointers so that all of the following commandswork.*array[2] [3]1*sym.id/*psym - > usage/*xsym[20] .p-> usage/The only restriction is that array subscripts must be numbers. Depending on your machine, accessing arraysmay be limited to I-dimensional arrays. Note that as a special case:*psym->/ddisplays the location pointed to by psym in decimal.Core locations can also be displayed by specifying their absolute addresses. The command*1024/displays location 1024 in decimal. As in C language, numbers may also be specified in octal or hexadecimal sothe above command is equivalent to both*02000/*Ox400/It is possible to mix numbers and variables so that*IOOO.x/refers to an element of a structure starting at address 1000, and*IOOO->x/refers to an element of a structure whose address is at 1000. For commands of the type *IOOO.x/ and *IOOO->x/,the sdb program uses the structure template of the last structured referenced.The address of a variable is printed with the = p, so*i=displays the address of i. Another feature whose usefulness will become apparent later is the command*./which displays the last variable typed again.c. Source File Display and ManipulationThe sdb program has been designed to make it easy to debug a program without constant reference to acurrent source listing. Facilities are provided which perform context searches within the source files of the programbeing debugged and to display selected portions of the source files. The commands are similar to thosePage 143


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82of the UNIX system text editor ed(l). Like the editor, sdb has a notion of current file and line within the file.The sdb program also knows how the lines of a file are partitioned into functions, so it also has a notion of currentfunction. As noted in other parts of this document, the current function is used by a number of sdb commands.Displaying the Source FileFour commands exist for displaying lines in the source file. They are useful for perusing the source programand for determining the context of the current line. The commands are:pwzcontrol-dPrints the current line.Window; prints a window of ten lines around the current line.Prints ten lines starting at the current line. Advances the current line by ten.Scrolls; prints the next ten lines and advances the current line by ten. This command isused to cleanly display long segments of the program.When a line from a file is printed, it is preceded by its line number. This not only gives an indication of itsrelative position in the file but is also used as input by some sdb commands.Changing the Current Source File or FunctionThe e command is used to change the current source file. Either of the forms*e function*e file.cmay be used. The first causes the file containing the named function to become the current file, and the currentline becomes the first line of the function. The other form causes the named file to become current. In this case,the current line is set to the first line of the named file. Finally, an e command with no argument causes thecurrent function and file named to be printed.Changing the Current Line in the Source FileThe z and control-d commands have a side effect of changing the current line in the source file. The followingparagraphs describe other commands that change the current line.There are two commands for searching for instances of regular expressions in source files. They are* / regular expression/* ?regular expression?The first command searches forward through the file for a line containing a string that matches the regularexpression and the second searches backwards. The trailing/ and? may be omitted from these commands. Regularexpression matching is identical to that of ed(l).The + and - commands may be used to move the current line forwards or backwards by a specified numberof lines. Typing a new-line advances the current line by one, and typing a number causes that line to becomethe current line in the file. These commands may be combined with the display commands so that*+15zPage 144


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>advances the current line by 15 and then prints 10 lines.D. A Controlled Environment for Program TestingOne very useful feature of sdb is breakpoint debugging. After entering sdb, certain lines in the source programmay be specified to be breakpoints. The program is then started with a sdb command. Execution of theprogram proceeds as normal until it is about to execute one of the lines at which a breakpoint has been set. Theprogram stops and sdb reports the breakpoint where the program stopped. Now, sdb commands may be usedto display the trace of function calls and the values of variables. If the user is satisfied the program is workingcorrectly to this point, some breakpoints can be deleted and others set; and then program execution may be continuedfrom the point where it stopped.A useful alternative to setting breakpoints is single stepping. The sdb program can be requested to executethe next line of the program and then stop. This feature is especially useful for testing new programs, so theycan be verified on a statement-by-statement basis. Note that if an attempt is made to single step through a functionwhich has not been compiled with the -g option execution proceeds until a statement in a function compiledwith the -g option is reached. It is also possible to have the program execute one machine level instruction ata time. This is particularly useful when the program has not been compiled with the -g option.Setting and Deleting BreakpointsBreakpoints can be set at any line in a function which contains executable code. The command format is:*12b*proc:12b*proc:b*bThe first form sets a breakpoint at line 12 in the current file. The line numbers are relative to the beginningof the file as printed by the source file display commands. The second form sets a breakpoint at line 12 of functionproc, and the third sets a breakpoint at the first line of proc. The last sets a breakpoint at the current line.Breakpoints are deleted similarly with the commands*12d*proc:12d*proc:dIn addition, if the command d is given alone, the breakpoints are deleted interactively. Each breakpoint locationis printed, and a line is read from the user. If the line begins with a y or d, the breakpoint is deleted.A list of the current breakpoints is printed in response to a B command, and the D command deletes allbreakpoints. It is sometimes desirable to have sdb automatically perform a sequence of commands at abreakpoint and then have execution continue. This is achieved with another form of the b command*12b t;xlcauses both a trace back and the value of x to be printed each time execution gets to line 12. The a commandis a variation of the above command. There are two forms:*proc:a*proc:12aThe first prints the function name and its arguments each time it is called, and the second prints the sourceline each time it is about to be executed. For both forms of the a command, execution continues after the functionname or source line is printed.Page 145


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Running the ProgramThe r command is used to begin program execution. It restarts the program as if it were invoked from theshell. The command*r argsruns the program with the given arguments as if they had been typed on the shell command line. If no argumentsare specified, then the arguments from the last execution of the program are used. To run a programwith no arguments, use the R command.After the program is started, execution continues until a breakpoint is encountered, a signal such as INTER­RUPT or QUIT occurs, or the program terminates. In all cases after an appropriate message is printed, controlreturns to sdb.The c command may be used to continue execution of a stopped program. A line number may be specified,as in:*proc:12cThis places a temporary breakpoint at the named line. The breakpoint is deleted when the c command finishes.There is also a C command which continues but passes the signal which stopped the program back to the program.This is useful for testing user-written signal handlers. Execution may be continued at a specified line withthe g command. For example:*17 gcontinues at line 17 of the current function. A use for this command is to avoid executing a section of code whichis known to be bad. The user should not attempt to continue execution in a function different than that of thebreakpoint.The s command is used to run the program for a single line. It is useful for slowly executing the programto examine its behavior in detail. An important alternative is the S command. This command is like the s commandbut does not stop within called functions. It is often used when one is confident that the called functionworks correctly but is interested in testing the calling routine.The i command is used to run the program one machine level instruction at a time while ignoring the signalwhich stopped the program. Its uses are similar to the s command. There is also an I command which causesthe program to execute one machine level instruction at a time, but also passes the signal which stopped theprogram back to the program.Calling FunctionsIt is possible to call any of the functions of the program from sdb. This feature is useful both for testingindividual functions with different arguments and for calling a function which prints structured data in a niceway. There are two ways to call a function:*proc(arg1, arg2, ... )*proc(arg1, arg2, ... )/mThe first simply executes the function. The second is intended for calling functions (it executes the function andprints the value that it returns). The value is printed in decimal unless some other format is specified by m.Arguments to functions may be integer, character or string constants, or values of variables which are accessiblefrom the current function.Page 146


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>An unfortunate bug in the current implementation is that if a function is called when the program is notstopped at a breakpoint (such as when a core image is being debugged) all variables are initialized before thefunction is started. This makes it impossible to use a function which formats data from a dump.E. Machine Language DebuggingThe sdb program has facilities for examining programs at the machine language level. It is possible to printthe machine language statements associated with a line in the source and to place breakpoints at arbitrary addresses.The sdb program can also be used to display or modify the contents of the machine registers.Displaying Machine Language StatementsTo display the machine language statements associated with line 25 in function main, use the command*main:25?The? command is identical to the / command except that it displays from text space. The default format forprinting text space is the i format which interprets the machine language instruction. The control-d commandmay be used to print the next ten instructions.Absolute addresses may be specified instead of line numbers by appending a : to them so that*Oxl024:?displays the contents of address Oxl024 in text space. Note that the command*Oxl024?displays the instruction corresponding to line Oxl024 in the current function. It is also possible to set or deletea breakpoint by specifying its absolute address;*Oxl024:bsets a breakpoint at address Oxl024.Manipulating RegistersThe x command prints the values of all the registers. Also, individual registers may be named instead ofvariables by appending a % to their name so that*r3%displays the value of register r3.F. Other CommandsTo exit sdb, use the q command.The! command is identical to that in ed(l) and is used to have the shell execute a command.It is possible to change the values of variables when the program is stopped at a breakpoint. This is donewi th the command*variable!valuePage 147


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82which sets the variable to the given value. The value may be a number, character constant, register, or the nameof another variable. If the variable is of type float or double, the value can also be a floating-point constant.$ cat testdiv2.cmain(argc, argv, envp)char **argv, **envp; {int i;i = div2( -1);printf( " -1/2 = % d\n" ,i);}div2(i) {int j;j=i»I;return(j);}$ cc -g testdiv2.c$ a.out-1/2 = -1$ sdbNo core image*/"div27: div2(i) {*z7: div2(i) {8: int j;9: j = i»I;10: return(j);11: }# Warning message from sdb# Search for function" div2"# It starts on line 7# Print the next few lines*div2:b# Place a breakpoint at the beginning of" div2"div2:9 b # Sdb echoes proc name and line number*r # Run the functiona.out# Sdb echoes command line executedBreakpoint at # Executions stops just before line 9div2:9: j = i> > 1;*t # Print trace of subroutine callsdiv2(i=-I) [testdiv2.c:9]main( argc= 1 ,argv=Ox7fffff50,en vp=Ox7fffff58)*i/# Print i-1*sdiv2:l0:*j/-1*9d*div2(1)/o*div2(-2)/-1*div2(-3)/-2*q$[ testdiv2.c:4]# Single stepreturn(j); # Execution stops just before line 10# Print j# Delete the breakpoint# Try running" div2" with different argumentsFig. 3.1-Example of sdb UsagePage 148


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>4. FORTRANINTRODUCTIONThis section describes the implementation of the Fortran programming language on the UNIX operatingsystem. This section contains the following parts:• FORTRAN 77 -Describes the implementation of Fortran 77 on the system in terms of the variationsfrom the American National Standard .• A RATIONAL FORTRAN PREPROCESSOR-"ratfor"-A preprocessor which provides a meansfor writing Fortran in a fashion similar to C language. This preprocessor provides (among other things)simplified control-flow statements.Both parts assume that the user is already familiar with Fortran 77.Throughout this section, each reference of the form name(1M), name(7), or name(8) refers to entries inthe UNIX System Administrator's Manual. All other references to entries of the form name(N), where "N"is a number (1 through 6) possibly followed by a letter, refer to entry name in section N of the UNIX SystemUser's Manual.FORTRAN 77A. GeneralThis part describes the compiler and run-time system for Fortran 77 as implemented on the UNIX operatingsystem. Fortran 77 became an official American National Standard on April 3, 1978. The implementation ofFortran 77 on the UNIX operating system varies from the American National Standard. This document describesthe difference between the American National Standard and the current implementation. Also, this documentdescribes the interfaces between procedures and the file formats assumed by the 1/0 system.UsageThe command to run the compiler isf77 options fileThe f77(1) command is a general purpose command for compiling and loading Fortran and Fortran-relatedfiles. Ratfor source files will be preprocessed before being presented to the Fortran compiler. C language andassembler source files will be compiled by the appropriate programs. Object files will be loaded. [The f77(1)and cc(1) commands cause slightly different loading sequences to be generated since Fortran programs needa few extra libraries and a different startup routine than do C language programs.] The following file namesuffixes are understood:.f Fortran source file.r Ratfor source file.c C language source file.s Assembler source file.0 Object filePage 149


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The following flags are understood:-sGenerate assembler output for each source file, but do not assemble it. Assembler outputfor a source file x.f, x.r, or x.c is put on file x.s.-c-m-f-p-of-w-w66-0-c-onetrip-u-u-12-R-FCompile but do not load. Output for x.f, x.r, X.C, or x.s is put on file x.o.Apply the M4 macro preprocessor to each Ratfor source file before using the appropriatecompiler.Apply the Ratfor processor to all relevant files and leave the output from x.r on x.f. Donot compile the resulting Fortran program.Generate code to produce usage profiles.Put executable module on file f. (Default is a.out).Suppress all warning messages.Suppress warnings about Fortran 66 features used.Invoke the C language object code optimizer.Compile code the checks that subscripts are within array bounds.Compile code that performs every do loop at least once.Do not convert uppercase letters to lowercase. The default is to convert Fortran programsto lowercase.Make the default type of a variable undefined.On machines which support short integers, make the default integer constants andvariables short. (-14 is the standard value of this option.) All logical quantities will beshort.The remaining characters in the argument are used as a Ratfor flag argument.Ratfor and source programs are preprocessed into Fortran files, but those files are notcompiled or removed.All library names (arguments beginning -1) and other options not ending with one of the special suffixesare passed to the loader. See f77(1) for additional options.B. Language ExtensionsFortran 77 includes almost all of Fortran 66 as a subset. The most important additions are a character stringdata type, file-oriented input/output statements, and random access I/O. Also, the language has been cleanedup considerably.In addition to implementing the language specified in the new Fortran 77 American National Standard, thiscompiler implements a few extensions. Most are useful additions to the language. The remainder are extensionsto make it easier to communicate with C language procedures or to permit compilation of old (1966 StandardFortran) programs.Page 150


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>Double Complex Data TypeThe data type double complex is added. Each datum is represented by a pair of double-precision realvariables. A double complex version of every complex built-in function is provided. The specific function namesbegin with z instead of c.Internal FilesThe Fortran 77 American National Standard introduces internal files (memory arrays) but restricts theiruse to formatted sequential I/O statements. This I/O system also permits internal files to be used in direct andunformatted reads and writes.Implicit Undefined StatementFortran has a fixed rule that the type of a variable that does not appear in a type statement is integer ifits first letter is i, j, k, 1, m or n. Otherwise, it is teal. Fortran 77 has an implicit statement for overriding thisrule. An additional type statement, undefined, is permitted. The statementimplicit undefined(a-z)turns off the automatic data typing mechanism, and the compiler will issue a diagnostic for each variable thatis used but does not appear in a type statement. Specifying the -u compiler option is equivalent to beginningeach procedure with this statement.RecursionProcedures may call themselves directly or through a chain of other procedures.Automatic StorageTwo new keywords recognized are static and automatic. These keywords may appear as "types" in typestatements and in implicit statements. Local variables are static by default; there is exactly one copy of thedatum, and its value is retained between calls. There is one copy of each variable declared automatic for eachinvocation of the procedure. Automatic variables may not appear in equivalence, data, or save statements.Variable Length Input LinesThe Fortran 77 American National Standard expects input to the compiler to be in a 72-column format: exceptin comment lines, the first five characters are the statement number, the next is the continuation character,and the next 66 are the body of the line. (If there are fewer than 72 characters on a line, the compiler pads itwith blanks; characters after the first 72 are ignored.) In order to make it easier to type Fortran programs, thiscompiler also accepts input in variable length lines. An ampersand (&) in the first position of a line indicatesa continuation line; the remaining characters form the body of the line. A tab character in one of the first sixpositions of a line signals the end of the statement number and continuation part of the line; the remaining charactersform the body of the line. A tab elsewhere on the line is treated as another kind of blank by the compiler.In the Fortran 77 Standard, there are only 26 letters-Fortran is a I-case language. Consistent with ordinarysystem usage, the new compiler expects lowercase input. By default, the compiler converts all uppercasecharacters to lowercase except those inside character constants. However, if the - U compiler option is specified,uppercase letters are not transformed. In this mode, it is possible to specify external names with uppercase lettersin them and to have distinct variables differing only in case. Regardless of the setting of the option,keywords will only be recognized in lowercase.Page 151


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Include StatementThe statementinclude II stuff"is replaced by the contents of the file stuff Includes may be nested to a reasonable depth, currently ten.Binary Initialization ConstantsA logical, real, or integer variable may be initialized in a data statement by a binary constant denotedby a letter followed by a quoted string. If the letter is b, the string is binary, and only zeroes and ones are permitted.If the letter is 0, the string is octal with digits zero through seven. If the letter is z or x, the string ishexadecimal with digits zero through nine, a through f Thus, the statements --in teger a( 3)data a/b'lOlO',o'12',z'a'/initialize all three elements of a to ten.Character StringsFor compatibility with C language usage, the following backslash escapes are recognized:\n new-line\t tab\b backspace\f form feed\0 null\'apostrophe (does not terminate a string)\" quotation mark (does not terminate a string)\\ \\x where x is any other character.Fortran 77 only has one quoting character-the apostrophe ('). This compiler and I/O system recognize boththe apostrophe and the double quote (" ). If a string begins with one variety of quote mark, the other may beembedded within it without using the repeated quote or backslash escapes.Every unequivalenced scalar local character variable and every character string constant is aligned on aninteger word boundary. Each character string constant appearing outside a data statement is followed by a nullcharacter to ease communication with C language routines.HollerithFortran 77 does not have the old Hollerith (nh) notation though the new Standard recommends implementingthe old Hollerith feature in order to improve compatibility with old programs. In this compiler, HollerithPage 152


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>data may be used in place of character string constants and may also be used to initialize noncharacter variablesin data statements.Equivalence StatementsThis compiler permits single subscripts in equivalence statements under the interpretation that all missingsubscripts are equal to 1. A warning message is printed for each such incomplete subscript.One-Trip DO LoopsThe Fortran 77 American National Standard requires that the range of a do loop not be performed if theinitial value is already past the limit value, as indo 10 i = 2, 1The 1966 Standard stated that the effect of such a statement was undefined, but it was common practice thatthe range of a do loop would be performed at least once. In order to accommodate old programs though theywere in violation of the 1966 Standard, the -one trip compiler option causes nonstandard loops to be generated.Commas in Formatted InputThe I/O system attempts to be more lenient than the Fortran 77 American National Standard when it seemsworthwhile. When doing a formatted read of noncharacter variables, commas may be used as value separatorsin the input record overriding the field lengths given in the format statement. Thus, the formatwill read the recordcorrectly.Short Integers(ilO, f20.10, i4)-345,.05e-3,12On machines that support half word integers, the compiler accepts declarations of type integer*2. (Ordinaryintegers follow the Fortran rules about occupying the same space as a REAL variable; they are assumedto be of C language type long int; half word integers are of C language type short int.) An expression involvingonly objects of type integer*2 is of that type. Generic functions return short or long integers depending onthe actual types of their arguments. If a procedure is compiled using the -12 flag, all small integer constantswill be of type integer*2. If the precision of an integer-valued intrinsic function is not determined by the genericfunction rules, one will be chosen that returns the prevailing length (integer*2 when the -12 commandflag is in effect). When the -12 option is in effect, all quantities of type logical will be short. Note that theseshort integer and logical quantities do not obey the standard rules for storage association.Additional Intrinsic Fu nctionsThis compiler supports all of the intrinsic functions specified in the Fortran 77 Standard. In addition, thereare functions for performing bitwise Boolean operations (or, and, xor, and not) and for accessing the commandarguments (getarg and iargc).C. Violations of the StandardThe following paragraphs describe only three known ways in which the UNIX system implementation ofFortran 77 violates the new American National Standard.Page 153


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82Double Precision AlignmentThe Fortran 77 American National Standard permits common or equivalence statements to force a doubleprecision quantity onto an odd word boundary, as in the following example:real a(4)double precision b,cequivalence (a(I),b), (a(4),c)Some machines (e.g., Honeywell 6000, IBM 360) require that double precision quantities be on double wordboundaries; other machines (e.g., IBM 370) run inefficiently if this alignment rule is not observed. It is possibleto tell which equivalenced and common variables suffer from a forced odd alignment, but every double-precisionargument would have to be assumed on a bad boundary. To load such a quantity on some machines, it wouldbe necessary to use two separate operations. The first operation would be to move the upper and lower halvesinto the halves of an aligned temporary. The second would be to load that double-precision temporary. The reversewould be needed to store a result. All double-precision real and complex quantities are required to fallon even word boundaries on machines with corresponding hardware requirements and to issue a diagnostic ifthe source code demands a violation of the rule.Dummy Procedure ArgumentsIf any argument of a procedure is of type character, all dummy procedure arguments of that procedure mustbe declared in an external statement. This requirement arises as a subtle corollary of the way we representcharacter string arguments. This requirement arises because of the way character string arguments are representedand of the I-pass nature of the compiler. A warning is printed if a dummy procedure is not declaredexternal. Code is correct if there are no character arguments.T and TL FormatsThe implementation of the t (absolute tab) and t1 (leftward tab) format codes is defective. These codes allowrereading or rewriting part of the record which has already been processed. The implementation uses "seeks";so if the unit is not one which allows seeks (such as a terminal) the program is in error. A benefit of the implementationchosen is that there is no upper limit on the length of a record nor is it necessary to predeclare anyrecord lengths except where specifically required by Fortran or the operating system.D. Interprocedure InterfaceTo be able to write C language procedures that call or are called by Fortran procedures, it is necessary toknow the conventions for procedure names, data representation, return values, and argument lists that the compiledcode obeys.Procedure NamesOn UNIX systems, the name of a common block or a Fortran procedure has an underscore appended to itby the compiler to distinguish it from a C language procedure or external variable with the same user-assignedname. Fortran library procedure names have embedded underscores to avoid clashes with user-assigned subroutinenames.Data RepresentationsThe following is a table of corresponding Fortran and C language declarations:FortranC Languageinteger*2 x short int x;Page 154


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>integer x long int x;logical x long int x;real x float x;double precision x double x;complex x struct { float r,i; } x;double complex x struct { double dr, di; } x;character*6 xchar x[6];By the rules of Fortran, integer, logical, and real data occupy the same amount of memory.Return ValuesA function of type integer, logical, real, or double precision declared as a C language function returnsthe corresponding type. A complex or double complex function is equivalent to a C language routine withan additional initial argument that points to the place where the return value is to be stored. Thus, the following:is equivalent tocomplex function f( ... )f_(temp, ... )struct { float r, i; } *temp;A character-valued function is equivalent to a C language routine with two extra initial arguments-a data addressand a length. Thus,is equivalent tocharacter*15 function g( ... )g_(result, length, ... )char result[ ];long int length;and could be invoked in C language bychar chars[15];g_( chars, 15L, . . . );Subroutines are invoked as if they were integer-valued functions whose value specifies which alternatereturn to use. Alternate return arguments (statement labels) are not passed to the function but are used to doan indexed branch in the calling procedure. (If the subroutine has no entry points with alternate return arguments,the returned value is undefined.) The statementcall nret(*l, *2, *3)Page 1 SS


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82is treated exactly as if it were the computed gotogoto (1, 2, 3),nret()Argument ListsAll Fortran arguments are passed by address. In addition, for every argument that is of type character orthat is a dummy procedure, an argument giving the length of the value is passed. (The string lengths are longint quantities passed by value.) The order of arguments is then:Th us, the call inExtra arguments for complex and character functionsAddress for each datum or functionA long int for each character or procedure argumentexternal fcharacter*7 sinteger b(3)call sam(f, b(2), s)is equivalent to that inint f( );char s[7];long int b[3];sam_(f, &b[l], s, OL, 7L);Note that the first element of a C language array always has subscript 0, but Fortran arrays begin at 1 by default.Fortran arrays are stored in column-major order; C language arrays are stored in row-major order.E. File FormatsStructure of Fortran FilesFortran requires four kinds of external files: sequential formatted and unformatted, and direct formattedand unformatted. On UNIX systems, these are all implemented as ordinary files which are assumed to have theproper internal structure.Fortran 1/0 is based on "records". When a direct file is opened in a Fortran program, the record length ofthe records must be given; and this is used by the Fortran 1/0 system to make the file look as if it is made upof records of the given length. In the special case that the record length is given as 1, the files are not consideredto be divided into records but are treated as byte-addressable byte strings; i.e., as ordinary files on the UNIXsystem. (A read or write request on such a file keeps consuming bytes until satisfied rather than being restrictedto a single record.)The peculiar requirements on sequential unformatted files make it unlikely that they will ever be read orwritten by any means except Fortran 1/0 statements. Each record is preceded and followed by an integer containingthe record's length in bytes.The Fortran 1/0 system breaks sequential formatted files into records while reading by using each new-lineas a record separator. The result of reading off the end of a record is undefined according to the Fortran 77Page 156


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>American National Standard. The I/O system is permissive and treats the record as being extended by blanks.On output, the 110 system will write a new-line at the end of each record. It is also possible for programs towrite new-lines for themselves. This is an error, but the only effect will be that the single record the userthought was written will be treated as more than one record when being read or backspaced over.Preconnected Files and File PositionsUnits 5, 6, and 0 are preconnected when the program starts. Unit 5 is connected to the standard input, unit6 is connected to the standard output, and unit 0 is connected to the standard error unit. All are connected forsequential formatted I/O.All the other units are also preconnected when execution begins. Unit n is connected to a file named fort.n.These files need not exist nor will they be created unless their units are used without first executing an open.The default connection is for sequential formatted I/O.The Fortran 77 Standard does not specify where a file which has been explicitly opened for sequential I/Ois initially positioned. In fact, the I/O system attempts to position the file at the end. A write will append tothe file and a read will result in an "end of file" indication. To position a file to its beginning, use a rewindstatement. The preconnected units 0, 5, and 6 are positioned as they come from the parent process.A RATIONAL FORTRAN PREPROCESSOR -"ratfor"A. GeneralThis part describes the ratfor(l) preprocessor. It is assumed that the user is familiar with the current implementationof Fortran 77 on the UNIX operating system.The ratfor language allows users to write Fortran programs in a fashion similar to C language. The ratforprogram is implemented as a preprocessor that translates this "simplified" language into Fortran. The facilitiesprovided by ratfor are:B. Usage• statement grouping• if-else and switch for decision making• while, for, do, and repeat-until for looping• break and next for controlling loop exits• free form input such as multiple statements/lines and automatic continuation• simple comment convention• translation of >, >=, etc., into .gt., .ge., etc.• return statement for functions• define statement for symbolic parameters• include statement for including source files.The ratfor program takes either a list of file names or the standard input and writes Fortran on the standardoutput. Options include -6x, which uses x as a continuation character in column 6 (the UNIX system uses& in column 1), and -C, which causes ratfor comments to be copied into the generated Fortran.Page 157


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 1 6/82The program rc(lM) provides an interface to the ratfor(l) command. This command is similar to cc(l).Thus:rc options filescompiles the files specified by files. Files with names ending in .r are ratfor source; other files are assumedto be for the loader. The options -C and -6x described above are recognized, as are-c Compile only; don't load-f Save intermediate Fortran .f files-r Ratfor only; implies -c and -f-2 Use big Fortran compiler (for large programs)-U Flag undeclared variables (not universally available).Other options are passed on to the loader.c. Statement GroupingFortran provides no way to group statements together short of making them into a subroutine. The ratforlanguage does provide a statement grouping facility. A group of statements can be treated as a unit by enclosingthem in the braces { and }. For example, the ratfor codeif (x > 100){ call error(" x> 100" ); err = 1; return}will be translated by the ratfor preprocessor into Fortran equivalent to10if (x .Ie. 100) go to 10call error(5hx>100)err = 1returnwhich should simplify programming effort. By using {and}, a group of statements can be used instead of a singlestatement.Also note in the previous ratfor example that the character> was used instead of .GT. in the if statement.The ratfor preprocessor translates this C language type operator to the appropriate Fortran operator. Moreon relationship operators later.In addition, many Fortran compilers permit character strings in quotes (like "x> 100" ). Quotes are not allowedin ANSI Fortran, so ratfor converts it into the right number of Hs.The ratfor language is free form. Statements may appear anywhere on a line, and several may appear onone line if they are separated by semicolons. The previous example could also be written asif (x > 100) {call error("x> 100" )err = 1returnPage 158


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>which shows grouped statements spread over several lines. In this case, no semicolon is needed at the end ofeach line because ratfor assumes there is one statement per line unless told otherwise.Of course, if the statement that follows the if is a single statement, no braces are needed.D. The "if-else" ConstructionThe ratfor language provides an else statement. The syntax of the if-else construction is:if (legal Fortran condition)ratfor statementelseratfor statementwhere the else part is optional. The legal Fortran condition is anything that can legally go into a Fortran LogicalIF statement. The ratfor preprocessor does not check this clause since it does not know enough Fortranto know what is permitted. The ratfor statement is any ratfor or Fortran statement or any collection of themin braces. For example,if (a 100)x = 100will ensure that x is not less than 0 and not greater than 100.Nested if and else statements could result in ambiguous code. In ratfor when there is more if statementsthan else statements, else statements are associated with the closest previous if statement that currently doesPage 1 S9


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82not have an associated else statement. For example:if (x > 0)if(y > 0)wri te( 6,1) x, yelsewrite(6,2) yis interpreted by the ratfor preprocessor asif (x > 0) {if (y > 0)write(6, 1) x, yelsewrite(6, 2) yin which the braces are assumed. If the other association is desired, it must be written asif (x > 0) {if (y > 0)write(6, 1) x, y}elsewrite(6, 2) ywith the braces specified.E. The "switch" StatementThe switch statement provides a way to express multiway branches which branch on the value of someinteger-valued expression. The syntax isswitch (expression) {case exprl:statementscase expr2, expr3:statementsdefault:statementswhere each case is followed by an integer expression (or several integer expressions separated by commas).The switch expression is compared to each case expr until a match is found. Then the statements followingthat case are executed. If no cases match expression, then the statements following default are executed.The default section of a switch is optional.When the statements associated with a case is executed, the entire switch is exited immediately. This issomewhat different than C language.F. The "do" StatementThe do statement in ratfor is quite similar to the DO statement in Fortran except that it uses no statementnumber (braces are used to mark the end of the do instead of a statement number). The syntax of the ratforPage 160


6/82ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>do statement isdo legal-Fortran-DO-textrat/or statementsThe legal-Fortran-DO-text must be something that can legally be used in a Fortran DO statement. Thus if alocal version of Fortran allows DO limits to be expressions (which is not currently permitted in ANSI Fortran),they can be used in a ratfor do statement. The ratfor statements are enclosed in braces; but as with the if,a single statement need not have braces around it. For example, the following code sets an array to 0:and the codedo i = 1, nx(i) = 0.0do i = 1, ndo j = 1, nm(i, j) = 0sets the entire array m to zero.G. The "break" and "next" StatementsThe ratfor break and next statements provide a means for leaving a loop early and one for beginningthe next iteration. The break causes an immediate exit from the do; in effect, it is a branch to the statementafterthe do. The next is a branch to the bottom of the loop, so it causes the next iteration to be done. For example,this code skips over negative values in an arraydo i = 1, n {if (x(i) < 0.0)nextprocess postive elementThe break and next statements will also work in the other ratfor looping constructions and will be discussedwith each looping construction.The break and next can be followed by an integer to indicate breaking or iterating that level of enclosingloop. For example:break 2exits from two levels of enclosing loops, andbreak 1is equivalent to break. Thenext 2iterates the second enclosing loop.Page 161


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82H. The "while" StatementThe ratfor language provides a while statement. The syntax of the while statement iswhile (legal-Fortran-condition)ratfor statementAs with the if, legal-Fortran-condition is something that can go into a Fortran Logical IF, and ratfor statementis a single statement which may be multiple statements enclosed in braces.For example, suppose nextch is a function which returns the next input character both as a function valueand in its argument. Then a while loop to find the first nonblank character could bewhile (nextch(ich) == iblank)where a semicolon by itself is a null statement (which is necessary here to mark the end of the while). If thesemicolon were not present, the while would control the next statement. When the loop is exited, ich containsthe first non blank. -I. The "for" StatementThe for statement is another ratfor loop. A for statement allows explicit initialization and increment stepsas part of the statement.The syntax of the for statement isfor ( init; condition; increment)ratfor statementwhere init is any single Fortran statement which is executed once before the loop begins. The increment is anysingle Fortran statement that is executed at the end of each pass through the loop before the test. The conditionis again anything that is legal in a Fortran Logical IF. Any of init, condition, and increment may be omittedalthough the semicolons must always be present. A nonexistent condition is treated as always true, sofor (;;)is an infinite loop.For example, a Fortran DO loop could be written aswhich is equivalent tofor (i = 1; i


6/82 ISSUE 1<strong>PROGRAMMING</strong> <strong>GUIDE</strong>The increment in a for need not be an arithmetic progression. The programsum = 0.0for (i = first; i > 0; i == ptr(i»)sum = sum + value(i)steps through a list (stored in an integer array ptr) until a zero pointer is found while adding up elements froma parallel array of values. Notice that the code also works correctly if the list is empty.J. The "repeat-until" StatementThere are times when a test needs to be performed at the bottom of a loop after one pass through. This facilityis provided by the repeat-until statement. The syntax for the repeat-until statement isrepeatratior statementuntil (legal-Fortran-condition )where ratior-statement is done once, then the condition is evaluated. If it is true, the loop is exited; if it is false,another pass is made.The until part is optional, so a repeat by itself is an infinite loop. A repeat-until loop can be exited bythe use of a stop, return, or break statement or an implicit stop such as running out of input with a READstatement.As stated before, a break statement causes an immediate exit from the enclosing repeat-until loop. Anext statement will cause a skip to the bottom of a repeat-until loop (Le., to the until part).K. The "return" StatementThe standard Fortran mechanism for returning a value from a routine uses the name of the routine as avariable. This variable can be assigIJed a value. The last value stored in it is the value returned by the function.For example, in a Fortran routine named equal, the statementsequal = 0returncause equal to return zero.The ratfor language provides a return statement similar to the C language return statement. In orderto return a value from any routine, the return statement has the syntaxreturn ( expression )were expression is the value to be returned.If there is no parenthesized expression after return, no value is returned.L. The "define" StatementThe ratfor language provides a define statement similar to the C language version. Any string of alphanumericcharacters can be defined as a name. Whenever that name occurs in the input (delimited bynonalphanumerics), it is replaced by the rest of the definition line. (Comments and trailing white spaces arestripped off.) A defined name can be arbitrarily long and must begin with a letter.Usually the define statement is used for symbolic parameters. The syntax of the define statement isdefine name valuePage 163


<strong>PROGRAMMING</strong> <strong>GUIDE</strong> ISSUE 16/82where name is a symbolic name that represents the quantity of value. For example:define ROWS 100define CLOS 50dimension a(ROWS), b(ROWS, COLS)if (i > ROWS j > COLS) ...causes the preprocessor to replace the name ROWS with the value 100 and the name GOLS with the value 50.Alternately, definitions may be written asdefine(ROWS, 100)in which case the defining text is everything after the comma up to the right parenthesis. This allows multiplelinedefinitions.M. The "include" StatementThe ratfor language provides an include statement similar to the #include < ... > statement in C language.The syntax for this statement isinclude filewhich inserts the contents of the named file into the ratfor input file in place of the include statement. Thestandard usage is to place COMMON blocks on a file and use the include statement to include the commoncode whenever needed.N. Free-Form InputIn ratfor, statements can be placed anywhere on a line. Long statements are continued automatically asare long conditions in if, for, and until statements. Blank lines are ignored. Multiple statements may appearon one line if they are separated by semicolons. No semicolon is needed at the end of a line if ratfor can makesome reasonable guess about whether the statement ends there. Lines ending with any of the characters+ * &are assumed to be continued on the next line. Underscores are discarded wherever they occur. All other charactersremain as part of the statement.Any statement that begins with an all-numeric field is assumed to be a Fortran label and placed in columns1 through 5 upon output. Thus:is converted intoO. Translationswrite(6, 100); 100 format("hello" )write(6, 100)100 format(5hhello)Text enclosed in matching single or double quotes is converted to nH .. but is otherwise unaltered (exceptfor formatting-it may get split across card boundaries during the reformatting process). Within quotedstrings, the backslash (\) serves as an escape character; i.e., the next character is taken literally. This providesa way to get quotes and the backslash itself into quoted strings. For example:" \\\'"Page 164


6/82 ISSUE 1 <strong>PROGRAMMING</strong> <strong>GUIDE</strong>is a string containing a backs lash and an apostrophe. (This is not the standard convention of doubled quotes,but it is easier to use and more general.) -Any line that begins with the character % is left a.bsolutely unaltered except for stripping off the % andmoving the line one position to the left. This is useful for inserting control cards and other things that shouldnot be preprocessed (like an existing Fortran program). Use % only for ordinary statements not for the conditionparts of if, while, etc., or the output may come out in an unexpected place.The following character translations are made (except within single or double quotes or on a line beginningwith a %):.eq.!= .ne .> . gt.>= .ge.< .It.


<strong>PROGRAMMING</strong> <strong>GUIDE</strong>ISSUE 1 6/82NOTESPage 166166 Pages

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!