OpenMP Application Program Interface


2.6.1 Parallel loop Construct . . . . . 45
2.6.2 parallel sections Construct . . . . . 46
2.6.3 parallel workshare Construct . . . . . 48
2.7 Master and Synchronization Constructs . . . . . 49
2.7.1 master Construct . . . . . 49
2.7.2 critical Construct . . . . . 50
2.7.3 barrier Construct . . . . . 52
2.7.4 atomic Construct . . . . . 53
2.7.5 flush Construct . . . . . 56
2.7.6 ordered Construct . . . . . 59
2.8 Data Environment . . . . . 60
2.8.1 Sharing Attribute Rules . . . . . 61
2.8.1.1 Sharing Attribute Rules for Variables Referenced in a Construct . . . . . 61
2.8.1.2 Sharing Attribute Rules for Variables Referenced in a Region, but not in a Construct . . . . . 63
2.8.2 threadprivate Directive . . . . . 63
2.8.3 Data-Sharing Attribute Clauses . . . . . 67
2.8.3.1 default clause . . . . . 68
2.8.3.2 shared clause . . . . . 69
2.8.3.3 private clause . . . . . 71
2.8.3.4 firstprivate clause . . . . . 73
2.8.3.5 lastprivate clause . . . . . 74
2.8.3.6 reduction clause . . . . . 76
2.8.4 Data Copying Clauses . . . . . 80
2.8.4.1 copyin clause . . . . . 80
2.8.4.2 copyprivate clause . . . . . 82
2.9 Nesting of Regions . . . . . 83
3. Runtime Library Routines . . . . . 85
3.1 Runtime Library Definitions . . . . . 86

ii   OpenMP API • Version 2.5 Public Draft, November 2004


A.4 Using the nowait Clause . . . . . 117
A.5 Using the critical Construct . . . . . 118
A.6 Using the lastprivate Clause . . . . . 120
A.7 Using the reduction Clause . . . . . 121
A.8 Using the parallel sections Construct . . . . . 125
A.9 Using the single Construct . . . . . 126
A.10 Specifying Sequential Ordering . . . . . 128
A.11 Specifying a Fixed Number of Threads . . . . . 129
A.12 Using the atomic Directive . . . . . 131
A.13 Illustration of the OpenMP Memory Model . . . . . 133
A.14 Using the flush Directive with a List . . . . . 135
A.15 Using the flush Directive without a List . . . . . 138
A.16 Determining the Number of Threads Used . . . . . 141
A.17 Using Locks . . . . . 143
A.18 Using Nestable Locks . . . . . 145
A.19 Nested Loop Constructs . . . . . 148
A.20 Examples Showing Incorrect Nesting of Regions . . . . . 150
A.21 Binding of barrier Regions . . . . . 156
A.22 Using the private Clause . . . . . 158
A.23 Examples of Non-conforming Storage Association . . . . . 159
A.24 Using the default(none) Clause . . . . . 162
A.25 Example of Race Conditions Caused by Implied Copies of Shared Variables . . . . . 164
A.26 Examples of Syntax of Parallel Loops . . . . . 165
A.27 Loop Iteration Variables . . . . . 167
A.28 Examples of the atomic Construct . . . . . 168
A.29 Examples of the ordered Construct . . . . . 170
A.30 Examples of threadprivate Data . . . . . 172
A.31 Example of the private Clause . . . . . 177
A.32 Examples of the Data-Sharing Attribute Clauses: shared and private . . . . . 179






CHAPTER 1

Introduction

This document specifies a collection of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in C, C++ and Fortran programs. The functionality described in this document is collectively known as the OpenMP Application Program Interface (OpenMP API). The goal of this specification is to provide a model for parallel programming that is portable across shared memory architectures from different vendors. Compilers from numerous vendors support the OpenMP API. More information about OpenMP can be found at the following web site:

http://www.openmp.org

The directives, library routines, and environment variables defined in this document allow users to create and manage parallel programs while permitting portability. The directives extend the C, C++ and Fortran base languages with single program multiple data (SPMD) constructs, work-sharing constructs, and synchronization constructs, and they provide support for the sharing and privatization of data. The functionality to control the runtime environment is provided by library routines and environment variables. Compilers that support the OpenMP API often include a command line option that activates and allows interpretation of all OpenMP directives.

1.1 Scope

The OpenMP API covers only user-directed parallelization, wherein the user explicitly specifies the actions to be taken by the compiler and run-time system in order to execute the program in parallel. OpenMP-compliant implementations are not required to check for dependencies, conflicts, deadlocks, race conditions, or other problems that result from non-conforming programs. The user is responsible for using OpenMP in his or her application to produce a conforming program. OpenMP does not cover compiler-generated automatic parallelization or directives to the compiler to assist such parallelization.
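The directives and library routines described above combine in even the smallest OpenMP program. The following C sketch is not from the specification (the function name is invented for illustration); it only assumes the standard `omp_get_num_threads` routine and the `_OPENMP` macro defined during OpenMP compilation:

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>   /* OpenMP runtime library routines */
#endif

/* Returns the size of the team that executed the parallel region.
 * Compiled without OpenMP support, the directive is ignored, the
 * block runs once sequentially, and the result is 1. */
int parallel_team_size(void) {
    int nthreads = 1;
    #pragma omp parallel
    {
        #pragma omp master
        {
#ifdef _OPENMP
            nthreads = omp_get_num_threads();
#endif
        }
    }
    return nthreads;
}
```

The same source is intended to execute correctly both as a parallel program and, with directives ignored, as a sequential one.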


1.2 Glossary

1.2.1 Threading Concepts

thread: An execution entity having a serial flow of control and an associated stack.

thread-safe routine: A routine that performs the intended function even when executed concurrently (by more than one thread).

1.2.2 OpenMP Language Terminology

base language: A programming language which serves as the foundation of the OpenMP specification.
    COMMENT: Current base languages for OpenMP are C90, C99, C++, Fortran 77, Fortran 90, and Fortran 95.

original program: A program written in a base language.

OpenMP directive: In C/C++, a #pragma, and in Fortran, a comment, that specifies OpenMP program behavior.
    COMMENT: See Section 2.1 on page 16 for a description of OpenMP directive syntax.

white space: A non-empty sequence of space and/or horizontal tab characters.

OpenMP program: A program that consists of an original program, annotated with OpenMP directives.

declarative directive: An OpenMP directive that may only be placed in a declarative context. A declarative directive has no associated executable user code, but instead has one or more associated user declarations.
    COMMENT: Only the threadprivate directive is a declarative directive.

executable directive: An OpenMP directive that is not declarative, i.e., it may be placed in an executable context.
    COMMENT: All directives except threadprivate are executable directives.
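As a C sketch of the declarative/executable distinction (the function is invented for illustration): threadprivate annotates a declaration, while a directive such as parallel applies to executable code.

```c
#include <assert.h>

int counter = 0;                     /* one copy per thread, because of... */
#pragma omp threadprivate(counter)   /* ...this declarative directive      */

/* Executable directives, by contrast, are placed in an executable
 * context and apply to the succeeding statement or structured block. */
int bump_twice(void) {
    counter += 1;
    counter += 1;
    return counter;   /* the calling thread's own copy */
}
```

Called from the initial thread, `bump_twice` updates that thread's copy of `counter`; other threads would see their own, independently initialized copies.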


structured block: For C/C++, an executable statement, possibly compound, with a single entry at the top and a single exit at the bottom. For Fortran, a block of executable statements with a single entry at the top and a single exit at the bottom.
    COMMENTS: For both languages, the point of entry cannot be a labeled statement and the point of exit cannot be a branch of any type.
    For C/C++:
    • The point of entry cannot be a call to setjmp().
    • longjmp() and throw() must not violate the entry/exit criteria.
    • Calls to exit() are allowed in a structured block.
    • An expression statement, iteration statement, selection statement, or try block is considered to be a structured block if the corresponding compound statement obtained by enclosing it in { and } would be a structured block.
    For Fortran:
    • STOP statements are allowed in a structured block.

construct: An OpenMP executable directive (and for Fortran, the paired end directive, if any) and the associated statement, loop or structured block, if any, not including the code in any called routines, i.e., the lexical extent of an executable directive.

region: All code encountered during a specific instance of the execution of a given construct, OpenMP library routine, or the whole OpenMP program itself. A region includes any code in called routines as well as any implicit code introduced by the OpenMP implementation.
    COMMENTS: A region may also be thought of as the dynamic or run-time extent of a construct, OpenMP library routine, or the whole OpenMP program. During the execution of an OpenMP program, a construct may give rise to many regions.

sequential part: The set of executable statements not enclosed by any parallel region corresponding to an explicit parallel construct.
    COMMENT: The sequential part executes as if it were enclosed by an inactive parallel region called the implicit parallel region.
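A C sketch of the entry/exit rules for a structured block (the helper is invented for the example):

```c
#include <assert.h>
#include <stdlib.h>

int work_done = 0;

/* The compound statement below is a structured block: one entry at
 * the top, one exit at the bottom. num_threads(1) keeps the example
 * deterministic whether or not OpenMP compilation is enabled. */
void run_block(void) {
    #pragma omp parallel num_threads(1)
    {                          /* single entry at the top */
        work_done += 1;
        if (work_done < 0)
            exit(1);           /* calls to exit() are allowed */
        /* goto elsewhere;        not allowed: a branch out of the block */
        /* return;                not allowed: a branch out of the block */
    }                          /* single exit at the bottom */
}
```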


binding region: For a region whose binding thread set is the current team, the enclosing region which determines the execution context and limits the scope of the effects of the bound region. Binding region is not defined for regions whose binding thread set is all threads or the encountering thread.
    COMMENTS: The binding region for an ordered region is the innermost enclosing loop region. For all other regions whose binding thread set is the current team, the binding region is the innermost enclosing parallel region. A parallel region need not be active to be a binding region. For regions whose binding region is a parallel region, the innermost enclosing parallel region may be the implicit parallel region enclosing the sequential part. A region never binds to any region outside of the innermost enclosing parallel region.

orphaned construct: A construct that gives rise to a region whose binding thread set is the current team, but that is not nested within another construct giving rise to the binding region.

work-sharing construct: A construct that defines units of work, each of which is executed exactly once by a thread in the team executing the construct. For C, the work-sharing constructs are for, sections, and single. For Fortran, the work-sharing constructs are do, sections, single and workshare.

active parallel region: A parallel region whose if clause evaluates to true.
    COMMENT: A missing if clause is equivalent to an if clause which evaluates to true.

inactive parallel region: A parallel region which is not an active parallel region, i.e., a serialized parallel region. An inactive parallel region is always executed by a team of only one thread.
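The active/inactive distinction can be observed directly with the if clause; a sketch (the function name is invented, and only the standard `omp_get_num_threads` routine is assumed):

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* With if(0), the parallel region is inactive: it is executed by a
 * team of exactly one thread. With if(1), the region is active and
 * the team may contain additional threads. */
int team_size(int cond) {
    int n = 1;
    #pragma omp parallel if(cond)
    {
        #pragma omp master
        {
#ifdef _OPENMP
            n = omp_get_num_threads();
#endif
        }
    }
    return n;
}
```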


implicit parallel region: The inactive parallel region that encloses the sequential part of an OpenMP program.

initial thread: The thread that executes the sequential part.

master thread: A thread that encounters (the start of) a parallel region and creates a team.

team: A set of one or more threads participating in the execution of a parallel region. For an active parallel region, the team comprises the master thread and additional threads that may be launched. For an inactive parallel region, the team only includes the master thread.

barrier: A point in the execution of a program encountered by a team, beyond which no thread in the team may execute until all threads in the team have encountered the point.

1.2.3 Data Terminology

variable: A data object whose value can be defined and redefined during the execution of a program. For C/C++, a variable is an identifier, optionally qualified by namespace names. For Fortran, a variable is a named data object. Only whole objects are considered variables. Array elements, structure components, array sections and substrings are not considered variables.

private variable: A variable whose name provides access to a different block of storage for each thread in a team.

shared variable: A variable whose name provides access to the same block of storage for all threads in a team.

global-lifetime memory: Memory locations that persist during the entire execution of the original program, according to the base language specification.

threadprivate memory: Global-lifetime memory locations that are replicated, one per thread, by the OpenMP implementation.
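A sketch of the private/shared distinction in C (the function name is invented for illustration):

```c
#include <assert.h>

/* 'total' is shared: one block of storage for the whole team.
 * 'tmp' is made private by the clause below: each thread writes its
 * own copy. The atomic construct makes the concurrent updates of the
 * shared variable well defined. */
int accumulate(void) {
    int total = 0;
    int tmp;
    #pragma omp parallel private(tmp)
    {
        tmp = 1;           /* each thread's own copy */
        #pragma omp atomic
        total += tmp;      /* all threads update the same storage */
    }
    return total;          /* one contribution per team member */
}
```

After the region, `total` equals the team size, whatever that size turned out to be.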


defined: For variables, the property of having a valid value.
    For Fortran: For the contents of variables, the property of having a valid value. For the allocation or association status of variables, the property of having a valid status.
    For C: For the contents of variables, the property of having a valid value.
    For C++: For the contents of variables of POD type, the property of having a valid value. For variables of non-POD class type, the property of having been constructed but not subsequently destructed.
    COMMENT: Programs that rely upon variables that are not defined are non-conforming programs.

1.2.4 Implementation Terminology

supporting n levels of parallelism: Implies allowing an active parallel region to be enclosed by n-1 active parallel regions, where the team associated with each active parallel region has more than one thread.

supporting OpenMP: Supporting at least one level of parallelism.

supporting nested parallelism: Supporting more than one level of parallelism.

conforming program: An OpenMP program that follows all the rules and restrictions of the OpenMP specifications.

compliant implementation: An implementation of the OpenMP specifications that compiles and executes any conforming program as defined by the specifications.
    COMMENT: A compliant implementation may exhibit unspecified behavior when compiling or executing a non-conforming program.


unspecified behavior: A behavior or result that is not specified by the OpenMP specifications or not known prior to the compilation or execution of an OpenMP program. Such unspecified behavior may result from:
    • Issues documented by the OpenMP specifications as having unspecified behavior.
    • A non-conforming program.
    • A conforming program exhibiting an implementation defined behavior.

implementation defined: Behavior that is allowed to vary among different compliant implementations, but must be documented by the implementation. An implementation is allowed to define this behavior as unspecified.
    COMMENT: All such features are documented in Appendix E.

1.3 Execution Model

The OpenMP API uses the fork-join model of parallel execution. Although this fork-join model can be useful for solving a variety of problems, it is somewhat tailored for large array-based applications. OpenMP is intended to support programs that will execute correctly both as parallel programs (multiple threads of execution and a full OpenMP support library) and as sequential programs (directives ignored and a simple OpenMP stubs library). However, it is possible and permitted to develop a program that executes correctly as a parallel program but not as a sequential program, or that produces different results when executed as a parallel program, compared to when it is executed as a sequential program. Furthermore, using different numbers of threads may result in different numeric results because of changes in the association of numeric operations. For example, a serial addition reduction may have a different pattern of addition associations than a parallel reduction. These different associations may change the results of floating-point addition.

An OpenMP program begins as a single thread of execution, called the initial thread. The initial thread executes sequentially, as if enclosed in an implicit inactive parallel region surrounding the whole program.

When any thread encounters a parallel construct, the thread creates a team of itself and zero or more additional threads and becomes the master of the new team. All members of the new team execute the code inside the parallel construct. There is an implicit barrier at the end of the parallel construct. Only the master thread continues execution of user code beyond the end of the parallel construct. Any number of parallel constructs can be specified in a single program.
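The effect of association on floating-point results can be seen without any OpenMP machinery at all. The sketch below (function names invented, and assuming IEEE single-precision arithmetic) compares a serial left-to-right sum with the two-partial-sum association a two-thread reduction might use:

```c
#include <assert.h>

/* Serial left-to-right sum: each 1.0f added to ~1e8f is lost, because
 * 1.0 is smaller than the spacing between float values near 1e8. */
float serial_sum(const float *v, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* One association a two-thread parallel reduction might use: each
 * half is summed into its own partial, then the partials are combined.
 * The small values in the second half accumulate before they meet the
 * large value, so their contribution survives. */
float two_partial_sum(const float *v, int n) {
    float s0 = 0.0f, s1 = 0.0f;
    for (int i = 0; i < n / 2; i++) s0 += v[i];
    for (int i = n / 2; i < n; i++) s1 += v[i];
    return s0 + s1;
}

/* Builds [1e8, 1, 1, ..., 1] (16 ones) and reports whether the two
 * associations disagree, as they do on typical IEEE platforms. */
int associations_differ(void) {
    float v[17];
    v[0] = 1.0e8f;
    for (int i = 1; i < 17; i++) v[i] = 1.0f;
    return serial_sum(v, 17) != two_partial_sum(v, 17);
}
```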


parallel regions may be arbitrarily nested inside each other. If nested parallelism is disabled, or is not supported by the OpenMP implementation, then the new team created by a thread encountering a parallel construct inside a parallel region will consist only of the encountering thread. However, if nested parallelism is supported and enabled, then the new team can consist of more than one thread.

Certain regions bind to an enclosing parallel region. If a region that binds to a parallel region is used without an enclosing user-specified parallel region, then it binds to the implicit inactive parallel region surrounding the whole program.

When any team encounters a work-sharing construct, the work inside the construct is divided among the members of the team and executed co-operatively instead of being executed by every thread. There is an optional barrier at the end of work-sharing constructs. Execution of code by every thread in the team resumes after the end of the work-sharing construct.

Synchronization constructs and library routines are available in OpenMP to co-ordinate threads and data in parallel and work-sharing constructs. In addition, library routines and environment variables are available to control or query the runtime environment of OpenMP programs.

OpenMP makes no guarantee that input or output to the same file is synchronous when executed in parallel. In this case, the programmer is responsible for synchronizing input and output statements (or routines) using the provided synchronization constructs or library routines. For the case where each thread accesses a different file, no synchronization by the programmer is necessary.

1.4 Memory Model

OpenMP assumes a relaxed-consistency, shared-memory model by default. All OpenMP threads have access to a place to store and retrieve variables, called the memory. In addition, each thread has its own temporary view of the memory (this can be thought of as machine registers or any kind of temporary private storage). The temporary view of memory allows the thread to cache variables from memory and to avoid going to memory for every reference to a variable. Each thread also has access to another type of memory that must not be accessed by other threads, called threadprivate memory.

Data-sharing attribute clauses may be used on a parallel directive or a work-sharing directive to specify private variables. The effect of this is to replicate the variable, in that one copy of the variable is provided per thread in the current team by the implementation. Such a variable resides in memory, and is accessible to the thread for which it was created until execution reaches the end of the structured block for the directive.

A private variable in an outer parallel region belonging to, or accessible from, a thread that eventually becomes the master thread of an inner nested parallel region, is permitted to be accessed by any of the threads of the team executing the inner parallel region, unless the variable is also private with respect to the inner parallel region. Any other access by one thread to the private variables of another thread results in unspecified behavior.

The memory model has relaxed consistency because a thread's temporary view of memory is not required to be consistent with memory at all times. A value written to a variable can remain in the thread's temporary view until it is forced to memory at a later time. Likewise, a read from a variable may retrieve the value from the thread's temporary view, unless it is forced to read from memory. The OpenMP flush operation enforces consistency between the temporary view and memory.

If a thread has written to its temporary view of a variable since the last flush of that variable, then a flush causes the value of that variable in the temporary view to be written to the variable in memory. If the thread has not written to the temporary view of the variable since the last flush of the variable, then a flush causes the temporary view of that variable to be discarded, so that the next read of the variable by that thread will read from memory into the temporary view. The flush operation may specify multiple variables and does not complete until the appropriate flush operation is complete for all specified variables.

The flush operation provides a guarantee of consistency between a thread's temporary view and memory. Therefore, flush can be used to guarantee that a value written to a variable by one thread may be read by a second thread. To accomplish this, the programmer must ensure that the second thread has not written to the variable since its last flush of the variable, and that the following sequence of events happens in the specified order:

1. The value is written to the variable by the first thread.
2. The variable is flushed by the first thread.
3. The variable is flushed by the second thread.
4. The value is read from the variable by the second thread.

The volatile keyword in the C/C++ languages specifies a consistency mechanism that is related to the OpenMP memory consistency mechanism in the following way: specifying a set of variables that are visible from a certain part of the program as volatile is equivalent to doing a single flush operation for that set of variables at every sequence point in that part of the program.
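The four-step sequence above can be sketched as a producer/consumer handshake using the sections construct. This is an illustration only (the function and variable names are invented, and the busy-wait is not a pattern to copy into production code):

```c
#include <assert.h>

int data = 0;   /* payload, shared */
int flag = 0;   /* signal, shared  */

/* Producer/consumer handshake following the four steps in the text.
 * Compiled without OpenMP support, the two sections simply execute
 * one after another and the same value is read. */
void handshake(int *out) {
    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {                                 /* first thread */
            data = 42;                    /* 1. value written        */
            #pragma omp flush(data)       /* 2. flushed by writer    */
            flag = 1;
            #pragma omp flush(flag)       /*    signal made visible  */
        }
        #pragma omp section
        {                                 /* second thread */
            int ready = 0;
            while (!ready) {              /* busy-wait for the signal */
                #pragma omp flush(flag)
                ready = flag;
            }
            #pragma omp flush(data)       /* 3. flushed by reader    */
            *out = data;                  /* 4. value read           */
        }
    }
}
```

The flag carries the "happens in order" requirement: the consumer's flush of `data` cannot precede the producer's, because the consumer only proceeds once it observes the flushed `flag`.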


The flush operation can be specified using the flush directive, and is also implied at various locations in an OpenMP program: see Section 2.7.5 on page 56 for details.

1.5 OpenMP Compliance

An implementation of the OpenMP API is compliant if and only if it compiles and executes all conforming programs according to the syntax and semantics laid out in Chapters 1, 2, 3, 4, and Appendix C. Appendices A, B, D, and E and sections designated as Notes (see Section 1.7 on page 13) are for information purposes only and are not part of the specification.

The OpenMP API defines constructs that operate in the context of the base language that is supported by an implementation. If the base language does not support a language construct that appears in this document, a compliant OpenMP implementation is not required to support it.

All library, intrinsic and built-in routines provided by the base language must be thread-safe in a compliant implementation. In addition, the implementation of the base language must also be thread-safe (e.g., ALLOCATE and DEALLOCATE statements must be thread-safe in Fortran). Unsynchronized concurrent use of such routines by different threads must produce correct results (though not necessarily the same results as serial execution, as in the case of random number generation routines).

In both Fortran 90 and Fortran 95, variables with explicit initialization have the SAVE attribute implicitly. This is not the case in Fortran 77. However, a compliant OpenMP Fortran implementation must give such a variable the SAVE attribute, regardless of the underlying base language version.

Appendix E lists certain aspects of the OpenMP API that are implementation-defined. A compliant implementation is required to define and document its behavior for each of the items in Appendix E.

1.6 Normative References

• ISO/IEC 9899:1990, Information Technology - Programming Languages - C.
  This OpenMP API specification refers to ISO/IEC 9899:1990 as C90.


• ISO/IEC 9899:1999, Information Technology - Programming Languages - C.
  This OpenMP API specification refers to ISO/IEC 9899:1999 as C99.

• ISO/IEC 14882:1998, Information Technology - Programming Languages - C++.
  This OpenMP API specification refers to ISO/IEC 14882:1998 as C++.

• ISO/IEC 1539:1980, Information Technology - Programming Languages - Fortran.
  This OpenMP API specification refers to ISO/IEC 1539:1980 as Fortran 77.

• ISO/IEC 1539:1991, Information Technology - Programming Languages - Fortran.
  This OpenMP API specification refers to ISO/IEC 1539:1991 as Fortran 90.

• ISO/IEC 1539-1:1997, Information Technology - Programming Languages - Fortran.
  This OpenMP API specification refers to ISO/IEC 1539-1:1997 as Fortran 95.

Where this OpenMP API specification refers to C, C++ or Fortran, reference is made to the base language supported by the implementation.

1.7 Organization of this document

The remainder of this document is structured as follows:

• Chapter 2: Directives
• Chapter 3: Run-time library routines
• Chapter 4: Environment variables
• Appendix A: Examples
• Appendix B: Stubs for the run-time library
• Appendix C: OpenMP Grammar for C and C++
• Appendix D: Interface Declaration Module for Fortran
• Appendix E: Implementation-defined behaviors


Some sections of this document only apply to programs written in a certain base language. Text which applies only to programs whose base language is C or C++ is shown as follows:

C/C++
C/C++ specific text...
C/C++

Text which applies only to programs whose base language is Fortran is shown as follows:

Fortran
Fortran specific text...
Fortran

Where an entire page consists of, for example, Fortran specific text, a marker is shown in the centre of the page like this:

Fortran

Some text is for information only, and is not part of the normative specification. Such text is designated as a note, like this:

Note – Non-normative text...


CHAPTER 2

Directives

This chapter describes the syntax and behavior of OpenMP directives, and is divided into the following sections:

• The language-specific directive format (Section 2.1 on page 16)
• Mechanisms to control conditional compilation (Section 2.2 on page 19)
• Control of OpenMP API internal control variables (Section 2.3 on page 21)
• Details of each OpenMP directive (Section 2.4 on page 24 to Section 2.9 on page 83)

C/C++
In C/C++, OpenMP directives are specified by using the #pragma mechanism provided by the C and C++ standards.
C/C++

Fortran
In Fortran, OpenMP directives are specified by using special comments that are identified by unique sentinels. Also, a special comment form is available for conditional compilation.
Fortran

Compilers can therefore ignore OpenMP directives and conditionally compiled code if support of OpenMP is not provided or enabled. A compliant implementation must provide an option or interface that ensures that underlying support of all OpenMP directives and OpenMP conditional compilation mechanisms is enabled. In the remainder of this document, the phrase OpenMP compilation is used to mean a compilation with these OpenMP features enabled.

Restrictions

The following restriction applies to all OpenMP directives:

Fortran
• OpenMP directives may not appear in PURE or ELEMENTAL procedures.
Fortran

2.1 Directive Format

C/C++
OpenMP directives for C/C++ are specified with the pragma preprocessing directive. The syntax of an OpenMP directive is formally specified by the grammar in Appendix C, and informally as follows:

#pragma omp directive-name [clause[ [,] clause]...] new-line

Each directive starts with #pragma omp. The remainder of the directive follows the conventions of the C and C++ standards for compiler directives. In particular, white space can be used before and after the #, and sometimes white space must be used to separate the words in a directive. Preprocessing tokens following the #pragma omp are subject to macro replacement.

Directives are case-sensitive.

An OpenMP directive applies to at most one succeeding statement, which must be a structured block.
C/C++

Fortran
OpenMP directives for Fortran are specified as follows:

sentinel directive-name [clause[[,] clause]...]

All OpenMP compiler directives must begin with a directive sentinel. The format of a sentinel differs between fixed and free-form source files, as described in Section 2.1.1 on page 17 and Section 2.1.2 on page 18.

Directives are case-insensitive. Directives cannot be embedded within continued statements, and statements cannot be embedded within directives.

In order to simplify the presentation, free form is used for the syntax of OpenMP directives for Fortran in the remainder of this document, except as noted.
Fortran


Only one directive-name can be specified per directive (note that this includes combined directives, see Section 2.6 on page 44). The order in which clauses appear in directives is not significant. Clauses in directives may be repeated as needed, subject to the restrictions listed in the description of each clause.

Some data-sharing attribute clauses (Section 2.8.3 on page 67), data copying clauses (Section 2.8.4 on page 80), the threadprivate directive (Section 2.8.2 on page 63), and the flush directive (Section 2.7.5 on page 56) accept a list. A list consists of a comma-separated collection of one or more list items.

C/C++
A list item is a variable name, subject to the restrictions specified in each of the sections describing clauses and directives for which a list appears.
C/C++

Fortran
A list item is a variable name or common block name (enclosed in slashes), subject to the restrictions specified in each of the sections describing clauses and directives for which a list appears.
Fortran

Fortran

2.1.1 Fixed Source Form Directives

The following sentinels are recognized in fixed form source files:

    !$omp | c$omp | *$omp

Sentinels must start in column one and appear as a single word with no intervening white space (spaces and/or tab characters). Fortran fixed form line length, white space, continuation, and column rules apply to the directive line. Initial directive lines must have a space or zero in column six, and continuation directive lines must have a character other than a space or a zero in column six.

Comments may appear on the same line as a directive. The exclamation point initiates a comment when it appears after column 6. The comment extends to the end of the source line and is ignored. If the first nonblank character after the directive sentinel of an initial or continuation directive line is an exclamation point, the line is ignored.

Note – In the following example, the three formats for specifying the directive are equivalent (the first line represents the position of the first 9 columns):


c23456789
!$omp parallel do shared(a,b,c)

c$omp parallel do
c$omp+shared(a,b,c)

c$omp paralleldoshared(a,b,c)

2.1.2 Free Source Form Directives

The following sentinel is recognized in free form source files:

    !$omp

The sentinel can appear in any column as long as it is preceded only by white space (spaces and tab characters). It must appear as a single word with no intervening white space. Fortran free form line length, white space, and continuation rules apply to the directive line. Initial directive lines must have a space after the sentinel. Continued directive lines must have an ampersand as the last nonblank character on the line, prior to any comment placed inside the directive. Continuation directive lines can have an ampersand after the directive sentinel with optional white space before and after the ampersand.

Comments may appear on the same line as a directive. The exclamation point initiates a comment. The comment extends to the end of the source line and is ignored. If the first nonblank character after the directive sentinel is an exclamation point, the line is ignored.

One or more blanks or tabs must be used to separate adjacent keywords in directives in free source form, except in the following cases, where white space is optional between the given pair of keywords:

    end critical
    end do
    end master
    end ordered


    end parallel
    end sections
    end single
    end workshare
    parallel do
    parallel sections
    parallel workshare

Note – In the following example, the three formats for specifying the directive are equivalent (the first line represents the position of the first 9 columns):

!23456789
!$omp parallel do &
!$omp shared(a,b,c)

!$omp parallel &
!$omp&do shared(a,b,c)

!$omp paralleldo shared(a,b,c)

Fortran

2.2 Conditional Compilation

In implementations that support a preprocessor, the _OPENMP macro name is defined to have the decimal value YYYYMM where YYYY and MM are the year and month designations of the version of the OpenMP API that the implementation supports.

If this macro is the subject of a #define or a #undef preprocessing directive, the behavior is unspecified.

Fortran
The OpenMP Fortran API permits Fortran lines to be compiled conditionally, as described in the following sections.


2.2.1 Fixed Source Form Conditional Compilation Sentinels

The following conditional compilation sentinels are recognized in fixed form source files:

    !$ | *$ | c$

The sentinel must start in column 1 and appear as a single word with no intervening white space. Fortran fixed form line length, white space, continuation, and column rules apply to the line. After the sentinel is replaced with two spaces, initial lines must have a space or zero in column 6 and only white space and numbers in columns 1 through 5. After the sentinel is replaced with two spaces, continuation lines must have a character other than a space or zero in column 6 and only white space in columns 1 through 5. If these criteria are not met, the line is treated as a comment and ignored.

Note – In the following example, the two forms for specifying conditional compilation in fixed source form are equivalent (the first line represents the position of the first 9 columns):

c23456789
!$ 10 iam = omp_get_thread_num() +
!$ &        index

#ifdef _OPENMP
   10 iam = omp_get_thread_num() +
     &      index
#endif


2.2.2 Free Source Form Conditional Compilation Sentinel

The following conditional compilation sentinel is recognized in free form source files:

    !$

This sentinel can appear in any column as long as it is preceded only by white space. It must appear as a single word with no intervening white space. Fortran free source form line length, white space, and continuation rules apply to the line. Initial lines must have a space after the sentinel. Continued lines must have an ampersand as the last nonblank character on the line, prior to any comment appearing on the conditionally compiled line. Continuation lines can have an ampersand after the sentinel, with optional white space before and after the ampersand.

Note – In the following example, the two forms for specifying conditional compilation in free source form are equivalent (the first line represents the position of the first 9 columns):

c23456789
!$ iam = omp_get_thread_num() + &
!$&      index

#ifdef _OPENMP
   iam = omp_get_thread_num() + &
         index
#endif

Fortran

2.3 Internal Control Variables

The OpenMP implementation must act as if there were internal control variables that store the information for determining the number of threads to use for a parallel region and how to schedule a work-sharing loop. The control variables are given values at various times (described below) during execution of an OpenMP program. They are initialized by the implementation itself and may be given values by using OpenMP environment variables, and by calls to OpenMP API routines. The only way for the program to retrieve the values of these control variables is by calling OpenMP API routines.

For purposes of exposition, this document refers to the control variables by certain names (below), but an implementation is not required to use these names or to offer any way to access the variables other than through the ways shown in Table 2-1.

The following control variables store values that affect the operation of parallel regions:
• nthreads-var - stores the number of threads requested for future parallel regions.
• dyn-var - controls whether dynamic adjustment of the number of threads to be used for future parallel regions is enabled.
• nest-var - controls whether nested parallelism is enabled for future parallel regions.

The following control variables store values that affect the operation of loop regions:
• run-sched-var - stores scheduling information to be used for loop regions using the runtime schedule clause.
• def-sched-var - stores implementation defined default scheduling information for loop regions.

Table 2-1 shows the methods for modifying and retrieving the values of each control variable, as well as their initial values.

TABLE 2-1       Control variables

Control variable  Ways to modify value                    Way to retrieve value   Initial value
nthreads-var      OMP_NUM_THREADS,                        omp_get_max_threads()   Implementation defined
                  omp_set_num_threads()
dyn-var           OMP_DYNAMIC,                            omp_get_dynamic()       Implementation defined
                  omp_set_dynamic()
nest-var          OMP_NESTED,                             omp_get_nested()        false
                  omp_set_nested()
run-sched-var     OMP_SCHEDULE                            (none)                  Implementation defined
def-sched-var     (none)                                  (none)                  Implementation defined

The effect of the API routines in Table 2-1 on the internal control variables described in this specification applies only during the execution of the sequential part of the program. During execution of the sequential part, only one copy of each internal control variable may exist. The effect of these API routines on the internal control variables is implementation defined when the API routines are executed from within any explicit parallel region. Additionally, the number of copies of the internal control variables, and their effects, during the execution of any explicit parallel region are implementation defined.

The internal control variables are each given values before any OpenMP construct or OpenMP API routine executes. The initial values of nthreads-var, dyn-var, run-sched-var, and def-sched-var are implementation defined. The initial value of nest-var is false. After the initial values are assigned, but also before any OpenMP construct or OpenMP API routine executes, the values of any OpenMP environment variables that were set by the user are read and the associated control variables are modified accordingly. After this point, no changes to any OpenMP environment variables will be reflected in the control variables. During execution of the user's code, certain control variables can be further modified by certain OpenMP API routine calls. An OpenMP construct clause does not modify the value of any of these control variables.

Table 2-2 shows the override relationships between various construct clauses, OpenMP API routines, environment variables, and initial values.

TABLE 2-2       Override relationships

construct clause,   overrides previous call to  overrides environment        overrides initial value
if used...          OpenMP API routine...       variable, if set...
num_threads clause  omp_set_num_threads()       OMP_NUM_THREADS              initial value of nthreads-var
(none)              omp_set_dynamic()           OMP_DYNAMIC                  initial value of dyn-var
(none)              omp_set_nested()            OMP_NESTED                   initial value of nest-var
(none)              (none)                      OMP_SCHEDULE (only used      initial value of run-sched-var
                                                when schedule kind is
                                                runtime)
schedule clause     (none)                      (none)                       initial value of def-sched-var

Cross References:
• parallel construct, see Section 2.4 on page 24.
• Loop construct, see Section 2.5.1 on page 31.
• omp_set_num_threads routine, see Section 3.2.1 on page 87.
• omp_set_dynamic routine, see Section 3.2.7 on page 93.
• omp_set_nested routine, see Section 3.2.9 on page 96.
• omp_get_max_threads routine, see Section 3.2.3 on page 90.
• omp_get_dynamic routine, see Section 3.2.8 on page 95.
• omp_get_nested routine, see Section 3.2.10 on page 97.


• OMP_NUM_THREADS environment variable, see Section 4.2 on page 109.
• OMP_DYNAMIC environment variable, see Section 4.3 on page 110.
• OMP_NESTED environment variable, see Section 4.4 on page 110.
• OMP_SCHEDULE environment variable, see Section 4.1 on page 108.

2.4 parallel Construct

Summary

This is the fundamental construct that starts parallel execution. See Section 1.3 on page 9 for a general description of the parallel execution model.

Syntax

C/C++
The syntax of the parallel construct is as follows:

    #pragma omp parallel [clause[ [, ]clause] ...] new-line
        structured-block

where clause is one of the following:

    if(scalar-expression)
    private(list)
    firstprivate(list)
    default(shared | none)
    shared(list)
    copyin(list)
    reduction(operator: list)
    num_threads(integer-expression)

C/C++


Fortran
The syntax of the parallel construct is as follows:

    !$omp parallel [clause[[,] clause]...]
        structured-block
    !$omp end parallel

where clause is one of the following:

    if(scalar-logical-expression)
    private(list)
    firstprivate(list)
    default(private | shared | none)
    shared(list)
    copyin(list)
    reduction({operator|intrinsic_procedure_name}:list)
    num_threads(scalar-integer-expression)

The end parallel directive denotes the end of the parallel construct.
Fortran

Binding

The binding thread set for a parallel region is the encountering thread. The encountering thread becomes the master thread of the new team.

Description

When a thread encounters a parallel construct, a team of threads is created to execute the parallel region (see Section 2.4.1 on page 27 for more information about how the number of threads in the team is determined, including the evaluation of the if and num_threads clauses). The thread which encountered the parallel construct becomes the master thread of the new team, with a thread number of zero for the duration of the parallel region. All threads in the new team, including the master thread, execute the region. Once the team is created, the number of threads in the team remains constant for the duration of that parallel region.


Within a parallel region, thread numbers uniquely identify each thread. Thread numbers are consecutive whole numbers ranging from zero for the master thread up to one less than the number of threads within the team. A thread may obtain its own thread number by a call to the omp_get_thread_num library routine.

The structured block of the parallel construct is executed by each thread, although each thread can execute a path of statements that is different from the other threads.

There is an implied barrier at the end of a parallel region. Only the master thread of the team continues execution after the end of a parallel region.

If a thread in a team executing a parallel region encounters another parallel directive, it creates a new team, according to the rules in Section 2.4.1 on page 27, and it becomes the master of that new team.

If execution of a thread terminates while inside a parallel region, execution of all threads in all teams terminates. The order of termination of threads is unspecified. All the work done by a team prior to any barrier which the team has passed in the program is guaranteed to be complete. The amount of work done by each thread after the last barrier that it passed and before it terminates is unspecified.

Restrictions

Restrictions to the parallel construct are as follows:
• A program which branches into or out of a parallel region is non-conforming.
• A program must not depend on any ordering of the evaluations of the clauses of the parallel directive, or on any side effects of the evaluations of the clauses.
• At most one if clause can appear on the directive.
• At most one num_threads clause can appear on the directive. The num_threads expression must evaluate to a positive integer value.

C/C++
• A throw executed inside a parallel region must cause execution to resume within the same parallel region, and it must be caught by the same thread that threw the exception.
C/C++

Fortran
• Unsynchronized use of Fortran I/O statements by multiple threads on the same unit has unspecified behavior.
Fortran


Cross References
• default, shared, private, firstprivate, and reduction clauses, see Section 2.8.3 on page 67.
• copyin clause, see Section 2.8.4 on page 80.
• omp_get_thread_num routine, see Section 3.2.4 on page 91.

2.4.1 Determining the Number of Threads for a parallel Region

When execution encounters a parallel directive, the value of the if clause or num_threads clause (if any) on the directive, the current parallel context, the number of levels of parallelism supported, and the values of the nthreads-var, dyn-var and nest-var internal control variables are used to determine the number of threads to use in the region. Figure 2-1 describes how the number of threads is determined. The if clause expression and the num_threads clause expression are evaluated in the context outside of the parallel construct, and no ordering of those evaluations is specified. It is also unspecified whether, in what order, or how many times any side-effects of the evaluation of the num_threads or if clause expressions occur.

If nested parallelism is disabled, or no further levels of parallelism are supported by the OpenMP implementation, then the new team which is created by a thread encountering a parallel construct inside an active parallel region will consist only of the encountering thread. However, if nested parallelism is enabled and additional levels of parallelism are supported, then the new team can consist of more than one thread.

The number of levels of parallelism supported is implementation defined. If only one level of parallelism is supported (i.e. nested parallelism is not supported) then the value of the nest-var internal control variable is always false.

If dynamic adjustment of the number of threads is enabled, the number of threads that are used for executing subsequent parallel regions may be adjusted automatically by the runtime environment. Once the number of threads is determined, it remains fixed for the duration of that parallel region. If dynamic adjustment of the number of threads is disabled, the number of threads that are used for executing subsequent parallel regions may not be so adjusted by the runtime environment.

It is implementation defined whether the ability to dynamically adjust the number of threads is provided. If this ability is not provided, then the value of the dyn-var internal control variable is always false.

Implementations may deliver fewer threads than indicated in Figure 2-1, in exceptional situations, such as when there is a lack of resources, even if dynamic adjustment is disabled. In these exceptional situations the behavior of the program is implementation defined: this may, for example, include interrupting program execution.


Note – Since the initial value of the dyn-var internal control variable is implementation defined, programs that depend on a specific number of threads for correct execution should explicitly disable dynamic adjustment of the number of threads.


FIGURE 2-1      Determining the number of threads for a parallel region. Note that no ordering of evaluation of the if and num_threads clauses is implied.

[Flowchart summary: if an if clause is present and evaluates to false, 1 thread is used; otherwise, if an enclosing parallel region is active and nest-var is false, or no further level of parallelism is supported, 1 thread is used; otherwise, if dyn-var is false, the num_threads value is used when that clause is present, else nthreads-var threads; if dyn-var is true, between 1 and the num_threads value (or, without the clause, between 1 and nthreads-var) threads are used, inclusive.]


2.5 Work-sharing Constructs

A work-sharing construct distributes the execution of the associated region among the members of the team that encounters it. A work-sharing region must bind to an active parallel region in order for the work-sharing region to execute in parallel. If execution encounters a work-sharing region in the sequential part, it is executed by the initial thread.

A work-sharing construct does not launch new threads, and a work-sharing region has no barrier on entry. However, an implied barrier exists at the end of the work-sharing region, unless a nowait clause is specified. If a nowait clause is present, an implementation may omit code to synchronize the threads at the end of the work-sharing region. In this case, threads that finish early may proceed straight to the instructions following the work-sharing region without waiting for the other members of the team to finish the work-sharing region (see Section A.4 on page 117 for an example).

OpenMP defines the following work-sharing constructs, and these are described in the sections that follow:
• loop construct
• sections construct
• single construct

Fortran
• workshare construct
Fortran

Restrictions

The following restrictions apply to work-sharing constructs:
• Each work-sharing region must be encountered by all threads in a team or by none at all.
• The sequence of work-sharing regions and barrier regions encountered must be the same for every thread in a team.


2.5.1 Loop Construct

Summary

The loop construct specifies that the iterations of the associated loop will be executed in parallel. The iterations of the loop are distributed across threads that already exist in the team executing the parallel region to which the loop region binds.

Syntax

C/C++
The syntax of the loop construct is as follows:

    #pragma omp for [clause[[,] clause] ... ] new-line
        for-loop

The clause is one of the following:

    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction(operator: list)
    ordered
    schedule(kind[, chunk_size])
    nowait


The for directive places restrictions on the structure of the corresponding for-loop. Specifically, the corresponding for-loop must have the following canonical form:

    for (init-expr; var relational-op b; incr-expr)

init-expr        One of the following:
                 var = lb
                 integer-type var = lb

incr-expr        One of the following:
                 ++var
                 var++
                 --var
                 var--
                 var += incr
                 var -= incr
                 var = var + incr
                 var = incr + var
                 var = var - incr

var              A signed integer variable. If this variable would otherwise be shared, it is implicitly made private on the construct. This variable must not be modified during the execution of the for statement, other than in incr-expr. Unless the variable is specified lastprivate, its value after the loop is indeterminate.

relational-op    One of the following:
                 <
                 <=
                 >
                 >=

lb, b, and incr  Loop invariant integer expressions. There is no synchronization during the evaluation of these expressions. It is unspecified whether, in what order, or how many times any side effects within the lb, b, or incr expressions occur.

Note that the canonical form allows the number of loop iterations to be computed on entry to the loop. This computation is performed with values in the type of var, after integral promotions. In particular, if the value of b - lb + incr, or any intermediate result required to compute this value, cannot be represented in that type, the behavior is unspecified.
C/C++


Fortran
The syntax of the loop construct is as follows:

    !$omp do [clause[[,] clause] ... ]
        do-loop
    [!$omp end do [nowait] ]

The clause is one of the following:

    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction({operator|intrinsic_procedure_name}:list)
    ordered
    schedule(kind[, chunk_size])

If an end do directive is not specified, an end do directive is assumed at the end of the do-loop.

The do-loop must be a do-construct as defined in Section 8.1.4.1 on page 126 of the Fortran 95 standard. If an end do directive follows a do-construct in which several DO statements share a DO termination statement, then a do directive can only be specified for the first (i.e. outermost) of these DO statements. See Section A.24 for examples.

If the loop iteration variable would otherwise be shared, it is implicitly made private on the construct. Unless the variable is specified lastprivate, its value after the loop is indeterminate.
Fortran

Binding

The binding thread set for a loop region is the current team. A loop region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the loop iterations and (optional) implicit barrier of the loop region.

Description

There is an implicit barrier at the end of a loop construct unless a nowait clause is specified.


The schedule clause specifies how iterations of the loop are divided into contiguous non-empty subsets, called chunks, and how these chunks are assigned among threads of the team. The correctness of a program should not depend on which thread executes a particular iteration. The chunk_size expression is evaluated in the context outside of the loop construct. It is unspecified whether, in what order, or how many times, any side effects of the evaluation of this expression occur.

See Section 2.5.1.1 on page 36 for details of how the schedule for a work-sharing loop is determined.

The schedule kind can be one of the following:

TABLE 2-3       schedule clause kind values

static    When schedule(static, chunk_size) is specified, iterations are divided into chunks of size chunk_size, and the chunks are statically assigned to threads in the team in a round-robin fashion in the order of the thread number. Note that the last chunk to be assigned may have a smaller number of iterations.
          When no chunk_size is specified, the iteration space is divided into chunks which are approximately equal in size, and such that each thread is assigned at most one chunk.

dynamic   When schedule(dynamic, chunk_size) is specified, the iterations are divided into chunks of size chunk_size, and each chunk is assigned when a thread requests it. The thread executes the chunk of iterations, then requests another chunk, until no chunks remain to be assigned. Note that the last chunk to be assigned may have a smaller number of iterations.
          When no chunk_size is specified, it defaults to 1.

guided    When schedule(guided, chunk_size) is specified, the iterations are assigned to threads in chunks as the threads request them, with chunk sizes decreasing to chunk_size (except that the last chunk may have a smaller size). The thread executes the chunk of iterations, then requests another chunk, until no chunks remain to be assigned.
          For a chunk_size of 1, the size of each chunk is proportional to the number of unassigned iterations divided by the number of threads, decreasing to 1. For a chunk_size with value k (greater than 1), the size of each chunk is determined in the same way with the restriction that the chunks do not contain fewer than k iterations (except possibly for the last chunk).
          When no chunk_size is specified, it defaults to 1.

runtime   When schedule(runtime) is specified, the decision regarding scheduling is deferred until runtime, and the schedule and chunk size are taken from the run-sched-var control variable.


Note – For a team of p threads and a loop of n iterations, let ceiling(n/p) be the integer q which satisfies n = p*q - r, with 0 <= r < p.


Fortran
• The do-loop must be a structured block, and in addition, its execution must not be terminated by an EXIT statement.
• The do-loop iteration variable must be of type integer.
• If used, the end do directive must appear immediately after the end of the do-loop.
• The do-loop cannot be a DO WHILE or a DO loop without loop control.
Fortran

Cross References
• private, firstprivate, lastprivate, and reduction clauses, see Section 2.8.3 on page 67.
• OMP_SCHEDULE environment variable, see Section 4.1 on page 108.
• ordered construct, see Section 2.7.6 on page 59.

2.5.1.1 Determining the Schedule of a Work-sharing Loop

When execution encounters a loop directive, the schedule clause (if any) on the directive, and the run-sched-var and def-sched-var internal control variables are used to determine how loop iterations are assigned to threads. See Section 2.3 on page 21 for details of how the values of the internal control variables are determined. If no schedule clause is used on the work-sharing loop directive, then the schedule is taken from the current value of def-sched-var. If the schedule clause is used and specifies the runtime schedule kind, then the schedule is taken from the run-sched-var control variable. Otherwise, the schedule is taken from the value of the schedule clause. Figure 2-2 describes how the schedule for a work-sharing loop is determined.

Cross References
• Internal control variables, see Section 2.3 on page 21.


FIGURE 2-2      Determining the schedule for a work-sharing loop.

[Flowchart summary: if no schedule clause is present, the def-sched-var schedule kind is used; if the schedule kind is runtime, the run-sched-var schedule kind is used; otherwise, the schedule kind specified in the schedule clause is used.]

2.5.2 sections Construct

Summary

The sections construct is a noniterative work-sharing construct that contains a set of structured blocks that are to be divided among, and executed by, the threads in a team. Each structured block is executed once by a thread in the team.


Syntax

C/C++
The syntax of the sections construct is as follows:

    #pragma omp sections [clause[[,] clause] ...] new-line
    {
    [#pragma omp section new-line]
        structured-block
    [#pragma omp section new-line
        structured-block ]
    ...
    }

The clause is one of the following:

    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction(operator: list)
    nowait

C/C++

Fortran
The syntax of the sections construct is as follows:

    !$omp sections [clause[[,] clause] ...]
    [!$omp section]
        structured-block
    [!$omp section
        structured-block ]
    ...
    !$omp end sections [nowait]


The clause is one of the following:

    private(list)
    firstprivate(list)
    lastprivate(list)
    reduction({operator|intrinsic_procedure_name}: list)

Fortran

Binding

The binding thread set for a sections region is the current team. A sections region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the structured blocks and the (optional) implicit barrier of the sections region.

Description

Each structured block is preceded by a section directive, except possibly the first block, for which a preceding section directive is optional.

The method of scheduling the structured blocks among threads in the team is implementation defined.

There is an implicit barrier at the end of a sections construct, unless a nowait clause is specified.

Restrictions

Restrictions to the sections construct are as follows:
• The section directives must appear within the sections construct and not elsewhere in the sections region.
• The code enclosed in a sections construct must be a structured block.

C/C++
• Only a single nowait clause can appear on a sections directive.
C/C++
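The construct can be sketched in C as follows. This is a hypothetical example (the function name and the min/max split are illustrative, not from the specification): two independent scans are divided between threads, each section runs exactly once, and each section writes a different shared variable, so no further synchronization is needed. Compiled without OpenMP, the sections simply run one after another with the same result.

```c
/* Illustrative sketch: the range of an array computed with two
 * sections.  One section finds the minimum, the other the maximum;
 * some thread of the team executes each section exactly once. */
int range(const int *a, int n)
{
    int lo = a[0];
    int hi = a[0];
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                for (int i = 1; i < n; i++)
                    if (a[i] < lo)
                        lo = a[i];
            }
            #pragma omp section
            {
                for (int i = 1; i < n; i++)
                    if (a[i] > hi)
                        hi = a[i];
            }
        }   /* implicit barrier here, since no nowait was given */
    }
    return hi - lo;
}
```

The implicit barrier at the end of the sections construct guarantees both lo and hi are final before the subtraction.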


Cross References
• private, firstprivate, lastprivate, and reduction clauses, see Section 2.8.3 on page 67.

2.5.3 single Construct

Summary

The single construct specifies that the associated structured block is executed by only one thread in the team (not necessarily the master thread). The other threads in the team do not execute the block, and wait at an implicit barrier at the end of the single construct, unless a nowait clause is specified.

Syntax

C/C++

The syntax of the single construct is as follows:

    #pragma omp single [clause[[,] clause] ...] new-line
        structured-block

The clause is one of the following:

    private(list)
    firstprivate(list)
    copyprivate(list)
    nowait

C/C++

Fortran

The syntax of the single construct is as follows:

    !$omp single [clause[[,] clause] ...]
        structured-block
    !$omp end single [end_clause[[,] end_clause] ...]


The clause is one of the following:

    private(list)
    firstprivate(list)

and end_clause is one of the following:

    copyprivate(list)
    nowait

Fortran

Binding

The binding thread set for a single region is the current team. A single region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the structured block and the (optional) implicit barrier of the single region.

Description

The method of choosing a thread to execute the structured block is implementation defined. There is an implicit barrier after the single construct unless a nowait clause is specified.

Restrictions

Restrictions to the single construct are as follows:
• The copyprivate clause must not be used with the nowait clause.
• At most one nowait clause can appear on a single construct.

Cross References
• private and firstprivate clauses, see Section 2.8.3 on page 67.
• copyprivate clause, see Section 2.8.4 on page 80.
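A minimal C sketch of the construct (the function name and the value 42 are illustrative, not from the specification): one thread performs a shared initialization inside single, and the implicit barrier at the end of the construct guarantees that every thread of the team observes the initialized value before continuing.

```c
/* Illustrative sketch: exactly one thread of the team executes the
 * assignment; the implicit barrier at the end of the single
 * construct ensures all threads see val == 42 afterwards. */
int init_once(void)
{
    int val = 0;
    #pragma omp parallel shared(val)
    {
        #pragma omp single
        val = 42;   /* executed by exactly one thread */
        /* implicit barrier: every thread sees val == 42 here */
    }
    return val;
}
```

Had a nowait clause been added to the single directive, other threads could read val before the assignment completed, so the barrier is essential to this pattern.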


2.5.4 workshare Construct

Fortran

Summary

The workshare construct divides the execution of the enclosed structured block into separate units of work, and causes the threads of the team to share the work such that each unit is executed only once.

Syntax

The syntax of the workshare construct is as follows:

    !$omp workshare
        structured-block
    !$omp end workshare [nowait]

The enclosed structured block must consist of only the following:
• array assignments
• scalar assignments
• FORALL statements
• FORALL constructs
• WHERE statements
• WHERE constructs
• atomic directives
• critical constructs
• parallel constructs

Statements contained in any enclosed critical construct are also subject to these restrictions. Statements in any enclosed parallel construct are not restricted.

Binding

The binding thread set for a workshare region is the current team. A workshare region binds to the innermost enclosing parallel region. Only the threads of the team executing the binding parallel region participate in the execution of the units of work and the (optional) implicit barrier of the workshare region.


Description

There is an implicit barrier at the end of a workshare construct unless a nowait clause is specified.

An implementation of the workshare construct must insert any synchronization that is required to maintain standard Fortran semantics. For example, the effects of one statement within the structured block must appear to occur before the execution of succeeding statements, and the evaluation of the right hand side of an assignment must appear to have been completed prior to the effects of assigning to the left hand side.

The statements in the workshare construct are divided into units of work as follows:
• For array expressions within each statement, including transformational array intrinsic functions that compute scalar values from arrays:
  • Evaluation of each element of the array expression, including any references to ELEMENTAL functions, is a unit of work.
  • Evaluation of transformational array intrinsic functions may be freely subdivided into any number of units of work.
• If a workshare directive is applied to an array assignment statement, the assignment of each element is a unit of work.
• If a workshare directive is applied to a scalar assignment statement, the assignment operation is a single unit of work.
• If a workshare directive is applied to a WHERE statement or construct, the evaluation of the mask expression and the masked assignments are workshared.
• If a workshare directive is applied to a FORALL statement or construct, the evaluation of the mask expression, expressions occurring in the specification of the iteration space, and the masked assignments are workshared.
• For atomic directives and their corresponding assignments, the update of each scalar variable is a single unit of work.
• For critical constructs, each construct is a single unit of work.
• For parallel constructs, each construct is a single unit of work with respect to the workshare construct. The statements contained in parallel constructs are executed by new teams of threads formed for each parallel construct.
• If none of the rules above apply to a portion of a statement in the block, then that portion is a single unit of work.

The transformational array intrinsic functions are MATMUL, DOT_PRODUCT, SUM, PRODUCT, MAXVAL, MINVAL, COUNT, ANY, ALL, SPREAD, PACK, UNPACK, RESHAPE, TRANSPOSE, EOSHIFT, CSHIFT, MINLOC, and MAXLOC.


It is unspecified how the units of work are assigned to the threads executing a workshare region.

If an array expression in the block references the value, association status, or allocation status of private variables, the value of the expression is undefined, unless the same value would be computed by every thread.

If an array assignment, a scalar assignment, a masked array assignment, or a FORALL assignment assigns to a private variable in the block, the result is unspecified.

The workshare directive causes the sharing of work to occur only in the lexically enclosed block.

Restrictions

The following restrictions apply to the workshare directive:
• The construct must not contain any user-defined function calls unless the function is ELEMENTAL.

Fortran

2.6 Combined Parallel Work-sharing Constructs

Combined parallel work-sharing constructs are shortcuts for specifying a work-sharing construct nested immediately inside a parallel construct. The semantics of these directives are identical to those of explicitly specifying a parallel construct containing one work-sharing construct and no other statements.

The combined parallel work-sharing constructs admit certain clauses which are permitted on both parallel constructs and on work-sharing constructs. If a program would have different behavior depending on whether the clause were applied to the parallel construct or to the work-sharing construct, then the program's behavior is unspecified.

The following sections describe the combined parallel work-sharing constructs:
• The parallel loop construct.
• The parallel sections construct.

Fortran
• The parallel workshare construct.
Fortran


2.6.1 Parallel loop construct

Summary

The parallel loop construct is a shortcut for specifying a parallel construct containing one loop construct and no other statements.

Syntax

C/C++

The syntax of the parallel loop construct is as follows:

    #pragma omp parallel for [clause[[,] clause] ...] new-line
        for-loop

The clause can be one of the clauses accepted by the parallel and for directives, except the nowait clause, with identical meanings and restrictions.

C/C++

Fortran

The syntax of the parallel loop construct is as follows:

    !$omp parallel do [clause[[,] clause] ...]
        do-loop
    [!$omp end parallel do]

The clause can be one of the clauses accepted by the parallel and do directives, with identical meanings and restrictions. However, nowait may not be specified on an end parallel do directive.

If an end parallel do directive is not specified, an end parallel do directive is assumed at the end of the do-loop. If used, the end parallel do directive must appear immediately after the end of the do-loop.

Fortran

Description

C/C++

The semantics are identical to explicitly specifying a parallel directive immediately followed by a for directive.

C/C++
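A hedged C sketch of the combined form (the function name and the dot-product computation are illustrative choices, not from the specification): the single pragma below is shorthand for a parallel construct containing one for construct and nothing else.

```c
/* Illustrative sketch: the combined parallel loop construct.
 * Equivalent to:
 *     #pragma omp parallel
 *     {
 *         #pragma omp for reduction(+ : d)
 *         ...
 *     }
 */
double dot(const double *x, const double *y, int n)
{
    double d = 0.0;
    #pragma omp parallel for reduction(+ : d)
    for (int i = 0; i < n; i++)
        d += x[i] * y[i];
    return d;
}
```

Note that nowait cannot appear here: the combined construct always ends with the barrier of the enclosing parallel region.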


Fortran

The semantics are identical to explicitly specifying a parallel directive immediately followed by a do directive, and an end do directive immediately followed by an end parallel directive.

Fortran

Restrictions

The restrictions for the parallel construct and the loop construct apply.

Cross References
• parallel construct, see Section 2.4 on page 24.
• loop construct, see Section 2.5.1 on page 31.
• Data attribute clauses, see Section 2.8.3 on page 67.

2.6.2 parallel sections Construct

Summary

The parallel sections construct is a shortcut for specifying a parallel construct containing one sections construct and no other statements.

Syntax

C/C++

The syntax of the parallel sections construct is as follows:

    #pragma omp parallel sections [clause[[,] clause] ...] new-line
    {
    [#pragma omp section new-line]
        structured-block
    [#pragma omp section new-line
        structured-block ]
    ...
    }

The clause can be one of the clauses accepted by the parallel and sections directives, except the nowait clause, with identical meanings and restrictions.

C/C++


Fortran

The syntax of the parallel sections construct is as follows:

    !$omp parallel sections [clause[[,] clause] ...]
    [!$omp section]
        structured-block
    [!$omp section
        structured-block ]
    ...
    !$omp end parallel sections

The clause can be one of the clauses accepted by the parallel and sections directives, with identical meanings and restrictions. However, nowait cannot be specified on an end parallel sections directive.

The last section ends at the end parallel sections directive.

Fortran

Description

C/C++

The semantics are identical to explicitly specifying a parallel directive immediately followed by a sections directive.

C/C++

Fortran

The semantics are identical to explicitly specifying a parallel directive immediately followed by a sections directive, and an end sections directive immediately followed by an end parallel directive.

Fortran

Restrictions

The restrictions for the parallel construct and the sections construct apply.

Cross References
• parallel construct, see Section 2.4 on page 24.
• sections construct, see Section 2.5.2 on page 37.
• Data attribute clauses, see Section 2.8.3 on page 67.


2.6.3 parallel workshare Construct

Fortran

Summary

The parallel workshare construct is a shortcut for specifying a parallel construct containing one workshare construct and no other statements.

Syntax

The syntax of the parallel workshare construct is as follows:

    !$omp parallel workshare [clause[[,] clause] ...]
        structured-block
    !$omp end parallel workshare

The clause can be one of the clauses accepted by the parallel directive, with identical meanings and restrictions. However, nowait may not be specified on an end parallel workshare directive.

Description

The semantics are identical to explicitly specifying a parallel directive immediately followed by a workshare directive, and an end workshare directive immediately followed by an end parallel directive.

Restrictions

The restrictions for the parallel construct and the workshare construct apply.

Cross References
• parallel construct, see Section 2.4 on page 24.
• workshare construct, see Section 2.5.4 on page 42.
• Data attribute clauses, see Section 2.8.3 on page 67.

Fortran


2.7 Master and Synchronization Constructs

The following sections describe:
• the master construct.
• the critical construct.
• the barrier construct.
• the atomic construct.
• the flush construct.
• the ordered construct.

2.7.1 master Construct

Summary

The master construct specifies a structured block that is executed by the master thread of the team.

Syntax

C/C++

The syntax of the master construct is as follows:

    #pragma omp master new-line
        structured-block

C/C++

Fortran

The syntax of the master construct is as follows:

    !$omp master
        structured-block
    !$omp end master

Fortran


Binding

The binding thread set for a master region is the current team. A master region binds to the innermost enclosing parallel region. Only the master thread of the team executing the binding parallel region participates in the execution of the structured block of the master region.

Description

Other threads in the team do not execute the associated structured block. There is no implied barrier either on entry to, or exit from, the master construct.

2.7.2 critical Construct

Summary

The critical construct restricts execution of the associated structured block to a single thread at a time.

Syntax

C/C++

The syntax of the critical construct is as follows:

    #pragma omp critical [(name)] new-line
        structured-block

C/C++

Fortran

The syntax of the critical construct is as follows:

    !$omp critical [(name)]
        structured-block
    !$omp end critical [(name)]

Fortran


Binding

The binding thread set for a critical region is all threads. Region execution is restricted to a single thread at a time among all the threads in the program, without regard to the team(s) to which the threads belong.

Description

An optional name may be used to identify the critical construct. All critical constructs without a name are considered to have the same unspecified name. A thread waits at the beginning of a critical region until no other thread is executing a critical region with the same name. The critical construct enforces exclusive access with respect to all critical constructs with the same name in all threads, not just in the current team.

C/C++

Identifiers used to identify a critical construct have external linkage and are in a name space which is separate from the name spaces used by labels, tags, members, and ordinary identifiers.

C/C++

Fortran

The names of critical constructs are global entities of the program. If a name conflicts with any other entity, the behavior of the program is unspecified.

Fortran

Restrictions

Fortran

The following restrictions apply to the critical construct:
• If a name is specified on a critical directive, the same name must also be specified on the end critical directive.
• If no name appears on the critical directive, no name can appear on the end critical directive.

Fortran
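A hedged C sketch of named critical constructs (the function, the region names heads_lock and tails_lock, and the packed return value are illustrative, not from the specification): two differently named regions protect two independent counters, so a thread updating one counter never blocks a thread updating the other, while same-named regions exclude each other program-wide.

```c
/* Illustrative sketch: two named critical constructs guarding two
 * independent counters.  The packed return value heads*100 + tails
 * is just a convenience for the example. */
int tally(const int *coins, int n)
{
    int heads = 0;
    int tails = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
    {
        if (coins[i]) {
            /* excludes only other "heads_lock" regions */
            #pragma omp critical(heads_lock)
            heads++;
        } else {
            /* excludes only other "tails_lock" regions */
            #pragma omp critical(tails_lock)
            tails++;
        }
    }
    return heads * 100 + tails;
}
```

Had both updates used an unnamed critical, they would share the same unspecified name and serialize against each other unnecessarily.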


2.7.3 barrier Construct

Summary

The barrier construct specifies an explicit barrier at the point at which the construct appears.

Syntax

C/C++

The syntax of the barrier construct is as follows:

    #pragma omp barrier new-line

Note that because the barrier construct does not have a C language statement as part of its syntax, there are some restrictions on its placement within a program. The barrier directive may only be placed in the program at a position where ignoring or deleting the directive would result in a program with correct syntax. See Appendix C for the formal grammar. The examples in Section A.40 on page 194 illustrate these restrictions.

C/C++

Fortran

The syntax of the barrier construct is as follows:

    !$omp barrier

Fortran

Binding

The binding thread set for a barrier region is the current team. A barrier region binds to the innermost enclosing parallel region.

Description

All of the threads of the team executing the binding parallel region must execute the barrier region before any are allowed to continue execution beyond the barrier.
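A hedged C sketch of a typical use (the function name and buffer size are illustrative, not from the specification): a nowait clause removes the implicit barrier after a producer loop, so an explicit barrier is needed before a consumer loop may read the shared buffer.

```c
/* Illustrative sketch: an explicit barrier separating a write phase
 * from a read phase of a shared buffer.  Requires n <= 64. */
int barrier_sum(int n)
{
    int buf[64];
    int sum = 0;
    #pragma omp parallel
    {
        /* phase 1: fill the buffer; nowait removes the implicit
         * barrier at the end of this work-sharing loop */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            buf[i] = 2 * i;

        /* explicit barrier: no thread starts reading buf until
         * every thread has finished writing its share */
        #pragma omp barrier

        /* phase 2: reduce over the fully written buffer */
        #pragma omp for reduction(+ : sum)
        for (int i = 0; i < n; i++)
            sum += buf[i];
    }
    return sum;
}
```

Without the barrier, a fast thread could begin phase 2 while another thread was still writing its portion of buf in phase 1.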


Restrictions

The following restrictions apply to the barrier construct:
• Each barrier region must be encountered by all threads in a team or by none at all.
• The sequence of work-sharing regions and barrier regions encountered must be the same for every thread in a team.

2.7.4 atomic Construct

Summary

The atomic construct ensures that a specific storage location is updated atomically, rather than exposing it to the possibility of multiple, simultaneous writing threads.

Syntax

C/C++

The syntax of the atomic construct is as follows:

    #pragma omp atomic new-line
        expression-stmt

expression-stmt is an expression statement with one of the following forms:

    x binop= expr
    x++
    ++x
    x--
    --x

In the preceding expressions:
• x is an lvalue expression with scalar type.
• expr is an expression with scalar type, and it does not reference the object designated by x.
• binop is not an overloaded operator and is one of +, *, -, /, &, ^, |, <<, or >>.

C/C++


Fortran

The syntax of the atomic construct is as follows:

    !$omp atomic
        statement

where statement has one of the following forms:

    x = x operator expr
    x = expr operator x
    x = intrinsic_procedure_name (x, expr_list)
    x = intrinsic_procedure_name (expr_list, x)

In the preceding statements:
• x is a scalar variable of intrinsic type.
• expr is a scalar expression that does not reference x.
• expr_list is a comma-separated, non-empty list of scalar expressions that do not reference x. When intrinsic_procedure_name refers to IAND, IOR, or IEOR, exactly one expression must appear in expr_list.
• intrinsic_procedure_name is one of MAX, MIN, IAND, IOR, or IEOR.
• operator is one of +, *, -, /, .AND., .OR., .EQV., or .NEQV..
• The operators in expr must have precedence equal to or greater than the precedence of operator, x operator expr must be mathematically equivalent to x operator (expr), and expr operator x must be mathematically equivalent to (expr) operator x.
• intrinsic_procedure_name must refer to the intrinsic procedure name and not to other program entities.
• operator must refer to the intrinsic operator and not to a user-defined operator.
• The assignment must be intrinsic assignment.

Fortran

Binding

The binding thread set for an atomic region is all threads. atomic regions enforce exclusive access with respect to other atomic regions that update the same storage location x among all the threads in the program, without regard to the team(s) to which the threads belong.


Description

Only the load and store of the object designated by x are atomic; the evaluation of expr is not atomic. To avoid race conditions, all updates of the location which could potentially occur in parallel must be protected with an atomic directive. atomic regions do not enforce exclusive access with respect to any critical or ordered regions which access the same storage location x.

It is implementation defined whether an implementation replaces all atomic constructs with critical constructs that have the same unique name.

Note – Although such a replacement is compliant, the atomic construct permits better optimization. Hardware instructions may be available that can perform the atomic update with lower overhead.

Restrictions

C/C++

The following restriction applies to the atomic construct:
• All atomic references to the storage location x throughout the program are required to have a compatible type.

C/C++

Fortran

The following restriction applies to the atomic construct:
• All atomic references to the storage location of variable x throughout the program are required to have the same type and type parameters.

Fortran

Cross References
• critical construct, see Section 2.7.2 on page 50.
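A hedged C sketch of the x++ form from the syntax above (the function name and the evenness test are illustrative choices, not from the specification): a shared counter is incremented concurrently, and the atomic construct makes each increment indivisible without the heavier machinery of a critical region.

```c
/* Illustrative sketch: an atomic update of a single shared counter.
 * Only the load and store of count are atomic; the evenness test
 * itself is ordinary unsynchronized code. */
int count_even(const int *a, int n)
{
    int count = 0;
    #pragma omp parallel for shared(count)
    for (int i = 0; i < n; i++)
        if (a[i] % 2 == 0)
        {
            #pragma omp atomic
            count++;        /* the x++ form of expression-stmt */
        }
    return count;
}
```

Because atomic protects exactly one storage location, it can often map to a single hardware instruction, whereas an unnamed critical would serialize all unrelated unnamed regions as well.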


2.7.5 flush Construct

Summary

The flush construct executes the OpenMP flush operation. This operation makes a thread's temporary view of memory become consistent with memory, and enforces an order on the memory operations of the variables explicitly or implicitly specified. See the memory model description in Section 1.4 on page 10 for more details.

Syntax

C/C++

The syntax of the flush construct is as follows:

    #pragma omp flush [(list)] new-line

C/C++

Fortran

The syntax of the flush construct is as follows:

    !$omp flush [(list)]

Fortran

Binding

The binding thread set for a flush region is the encountering thread. Execution of a flush region only affects the view of memory from the thread which executes the region.

Description

A flush construct with a list applies the flush operation to the items in the list, and does not return until the operation is complete for all specified list items. If a pointer is present in the list, the pointer itself is flushed, not the object to which the pointer refers. A flush construct without a list, executed on a given thread, operates as if the whole thread-visible data state of the program, as defined by the base language, is flushed.


The flush operation applied to a set of variables (the flush-set), specified explicitly or implicitly on the flush construct, enforces an order between the flush construct and references to the flush-set variables in the rest of the program. References to all flush-set variables that occur prior to the flush operation are guaranteed to be complete before execution proceeds beyond the flush region. Likewise, the flush operation is guaranteed to complete before a subsequent reference to a flush-set variable may begin. The flush operation does not imply any ordering between itself and operations on variables not in the flush-set, nor does it imply any ordering between two or more flush constructs if the intersection of their flush-sets is empty.

Note – The following examples illustrate the ordering properties of the flush operation. In the following incorrect pseudocode example, the programmer intends to prevent simultaneous execution of the critical section by the two threads, but the program does not work properly because it does not enforce the proper ordering of the operations on variables a and b.

Incorrect example:

    a = b = 0

    thread 1                 thread 2
    b = 1                    a = 1
    flush(b)                 flush(a)
    flush(a)                 flush(b)
    if (a == 0) then         if (b == 0) then
        critical section         critical section
    end if                   end if

The problem with this example is that operations on variables a and b are not ordered with respect to each other. For instance, nothing prevents the compiler from moving the flush of b on thread 1 or the flush of a on thread 2 to a position completely after the critical section (assuming that the critical section on thread 1 does not reference b and the critical section on thread 2 does not reference a). If either re-ordering happens, the critical section can be active on both threads simultaneously.


The following correct pseudocode example ensures that the critical section is executed by not more than one of the two threads at any one time. Notice that execution of the critical section by neither thread is considered correct in this example.

Correct example:

    a = b = 0

    thread 1                 thread 2
    b = 1                    a = 1
    flush(a,b)               flush(a,b)
    if (a == 0) then         if (b == 0) then
        critical section         critical section
    end if                   end if

The compiler is prohibited from moving the flush at all for either thread, ensuring that the respective assignment is complete and the data is flushed before the if statement is executed.

C/C++

A reference that reads the value of an object with a volatile-qualified type behaves as if there were a flush construct specifying that object at the previous sequence point. A reference that modifies the value of an object with a volatile-qualified type behaves as if there were a flush construct specifying that object at the next sequence point.

Note that because the flush construct does not have a C language statement as part of its syntax, there are some restrictions on its placement within a program. The flush directive may only be placed in the program at a position where ignoring or deleting the directive would result in a program with correct syntax. See Appendix C for the formal grammar. See Section A.40 on page 194 for an example that illustrates these placement restrictions.

C/C++

A flush region without a list is implied at the following locations:
• During a barrier region.
• At entry to and exit from parallel, critical, and ordered regions.
• At exit from work-sharing regions, unless a nowait is present.
• At entry to and exit from combined parallel work-sharing regions.

A flush region with a list is implied at the following locations:


• At entry to and exit from atomic regions, where the list contains only the object updated in the atomic construct.

Note – A flush region is not implied at the following locations:
• At entry to work-sharing regions.
• At entry to or exit from a master region.
• At entry to or exit from lock routines.

2.7.6 ordered Construct

Summary

The ordered construct specifies a structured block in a loop region which will be executed in the order of the loop iterations. This sequentializes and orders the code within an ordered region while allowing code outside the region to run in parallel.

Syntax

C/C++

The syntax of the ordered construct is as follows:

    #pragma omp ordered new-line
        structured-block

C/C++

Fortran

The syntax of the ordered construct is as follows:

    !$omp ordered
        structured-block
    !$omp end ordered

Fortran
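A hedged C sketch of the construct (the function name and the squared values are illustrative, not from the specification): the loop body may execute out of order across threads, but the ordered region commits results strictly in iteration order. Note that the loop directive must carry the ordered clause.

```c
/* Illustrative sketch: results are appended to out in iteration
 * order, even though the surrounding loop runs in parallel. */
void collect_in_order(int n, int *out)
{
    int pos = 0;
    #pragma omp parallel for ordered shared(pos)
    for (int i = 0; i < n; i++)
    {
        int val = i * i;      /* may execute out of order */
        #pragma omp ordered
        out[pos++] = val;     /* committed in iteration order */
    }
}
```

The shared index pos is safe to update here only because the ordered regions execute one at a time, in iteration order.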


Binding

The binding thread set for an ordered region is the current team. An ordered region binds to the innermost enclosing loop region. ordered regions that bind to different loop regions execute completely independently of each other.

Description

The threads in the team executing the loop region execute ordered regions sequentially, in the order of the loop iterations. When the thread executing the first iteration of the loop encounters an ordered construct, it can enter the ordered region without waiting. When a thread executing any subsequent iteration encounters an ordered region, it waits at the beginning of that ordered region until each of the previous iterations that contains an ordered region has completed the ordered region. During the execution of each loop iteration, a thread may only execute a single ordered region.

Restrictions

Restrictions to the ordered construct are as follows:
• The loop region to which an ordered region binds must have an ordered clause specified on the corresponding loop (or parallel loop) construct.
• During execution of an iteration of a loop within a loop region, the executing thread must not execute more than one ordered region which binds to the same loop region.

Cross References
• loop construct, see Section 2.5.1 on page 31.
• parallel loop construct, see Section 2.6.1 on page 45.

2.8 Data Environment

This section presents a directive and several clauses for controlling the data environment during the execution of parallel regions.
• Section 2.8.1 on page 61 describes how the sharing attributes of variables referenced in parallel regions are determined.
• The threadprivate directive, which is provided to create threadprivate memory, is described in Section 2.8.2 on page 63.


• Clauses that may be specified on directives to control the sharing attributes of variables for the duration of parallel or work-sharing regions are described in Section 2.8.3 on page 67.
• Clauses that may be specified on directives to copy data values from private or threadprivate objects on one thread to the corresponding objects on other threads in the team are described in Section 2.8.4 on page 80.

2.8.1 Sharing Attribute Rules

This section describes how the sharing attributes of variables referenced in parallel regions are determined. The following two cases are described separately:
• Section 2.8.1.1 on page 61 describes the sharing attribute rules for variables referenced in a construct.
• Section 2.8.1.2 on page 63 describes the sharing attribute rules for variables referenced in a region, but outside any construct.

2.8.1.1 Sharing Attribute Rules for Variables Referenced in a Construct

The sharing attributes of variables which are referenced in a construct may be either predetermined, explicitly determined, or implicitly determined.

Note that specifying a variable on a firstprivate, lastprivate, or reduction clause of an enclosed construct causes an implicit reference to the variable in the enclosing construct. Such implicit references are also subject to the following rules.

The following variables have predetermined sharing attributes:

C/C++
• Variables appearing in threadprivate directives are threadprivate.
• Variables with automatic storage duration which are declared in a scope inside the construct are private.
• Variables with heap allocated storage are shared.
• The loop iteration variable in the for-loop of a for or parallel for construct is private in that construct.
• Variables with const-qualified type having no mutable member are shared.
C/C++


Fortran
• Variables and common blocks appearing in threadprivate directives are threadprivate.
• The loop iteration variable in the do-loop of a do or parallel do construct is private in that construct.
• Variables used as loop iteration variables in sequential loops in a parallel construct are private in the parallel construct.
• implied-do and forall indices are private in the enclosing construct.
• Cray pointees inherit the sharing attribute of the storage with which their Cray pointer is associated.
Fortran

Variables with predetermined sharing attributes may not be listed in data-sharing attribute clauses, with the following exceptions:

C/C++
• The loop iteration variable in the for-loop of a for or parallel for construct may be listed in a private or lastprivate clause.
C/C++

Fortran
• The loop iteration variable in the do-loop of a do or parallel do construct may be listed in a private or lastprivate clause.
• Variables used as loop iteration variables in sequential loops in a parallel construct may be listed in a private, firstprivate, lastprivate, shared, or reduction clause.
Fortran

Additional restrictions on the variables which may appear in individual clauses are described with each clause in Section 2.8.3 on page 67.

Variables referenced in the construct are said to have an explicitly determined sharing attribute if they are listed in a data-sharing attribute clause on the construct.

Variables referenced in the construct whose sharing attribute is not predetermined or explicitly determined will have their sharing attribute implicitly determined. In a parallel construct, the sharing attributes of these variables are determined by the default clause, if present (see Section 2.8.3.1 on page 68). If no default clause is present, variables with implicitly determined sharing attributes are shared. For other constructs, variables with implicitly determined sharing attributes inherit their sharing attributes from the enclosing context.


2.8.1.2 Sharing Attribute Rules for Variables Referenced in a Region, but not in a Construct

The sharing attributes of variables which are referenced in a region, but not in a construct, are determined as follows:

C/C++
• Static variables declared in called routines in the region are shared.
• Variables with const-qualified type having no mutable member declared in called routines are shared.
• Other variables declared in called routines in the region are private.
• File-scope or namespace-scope variables referenced in called routines in the region are shared unless they appear in a threadprivate directive.
• Variables with heap-allocated storage are shared.
• Formal arguments of called routines in the region that are passed by reference inherit the data-sharing attributes of the associated actual argument.
C/C++

Fortran
• Local variables declared in called routines in the region that have the save attribute, or that are data initialized, are shared unless they appear in a threadprivate directive.
• Other local variables declared in called routines in the region are private.
• Variables belonging to common blocks, or declared in modules, and referenced in called routines in the region are shared unless they appear in a threadprivate directive.
• Dummy arguments of called routines in the region that are passed by reference inherit the data-sharing attributes of the associated actual argument.
• implied-do and forall indices are private.
• Cray pointees inherit the sharing attribute of the storage with which their Cray pointer is associated.
Fortran

2.8.2 threadprivate Directive

Summary

The threadprivate directive specifies that named global-lifetime objects are replicated, with each thread having its own copy.


Syntax

C/C++
The syntax of the threadprivate directive is as follows:

#pragma omp threadprivate(list) new-line

where list is a comma-separated list of file-scope, namespace-scope, or static block-scope variables that do not have an incomplete type.
C/C++

Fortran
The syntax of the threadprivate directive is as follows:

!$omp threadprivate(list)

where list is a comma-separated list of named variables and named common blocks. Common block names must appear between slashes.
Fortran

Description

Each copy of a threadprivate object is initialized once, in the manner specified by the program, but at an unspecified point in the program prior to the first reference to that copy.

A thread may not reference another thread's copy of a threadprivate object. During the sequential part, and in non-nested inactive parallel regions, references will be to the initial thread's copy of the object. In parallel regions, references by the master thread will be to the copy of the object in the thread which encountered the parallel region.

The values of data in the initial thread's copy of a threadprivate object are guaranteed to persist between any two consecutive references to the object in the program.

The values of data in the threadprivate objects of threads other than the initial thread are guaranteed to persist between two consecutive active parallel regions only if all the following conditions hold:
• Neither parallel region is nested inside another parallel region.
• The number of threads used to execute both parallel regions is the same.
• The value of the dyn-var internal control variable is false at entry to the first parallel region and remains false until entry to the second parallel region.


• The value of the nthreads-var internal control variable is the same at entry to both parallel regions and has not been modified between these points.

If these conditions all hold, and if a threadprivate object is referenced in both regions, then threads with the same thread number in their respective regions will reference the same copy of that variable.

C/C++
If the above conditions hold, the storage duration, lifetime, and value of a thread's copy of a threadprivate variable that does not appear in any copyin clause on the second region will be retained. Otherwise, the storage duration, lifetime, and value of a thread's copy of the variable in the second region is undefined.

If an object is referenced in an explicit initializer of a threadprivate variable, and the value of the object is modified prior to the first reference to a copy of the variable, then the behavior is unspecified. See Section A.30 on page 172 for an example illustrating this behavior.
C/C++

Fortran
A variable is said to be affected by a copyin clause if the variable appears in the copyin clause or it is in a common block that appears in the copyin clause.

If the above conditions hold, the definition, association, or allocation status of a thread's copy of a threadprivate variable or a variable in a threadprivate common block, that is not affected by any copyin clause that appears on the second region, will be retained. Otherwise, the definition and association status of a thread's copy of the variable in the second region is undefined, and the allocation status of an allocatable array will be implementation defined.

If a common block, or a variable that is declared in the scope of a module, appears in a threadprivate directive, it implicitly has the SAVE attribute.

If a threadprivate variable or a variable in a threadprivate common block is not affected by any copyin clause that appears on the first parallel region in which it is referenced, the variable or any subobject of the variable is initially defined or undefined according to the following rules:
• If it has the ALLOCATABLE attribute, each copy created will have an initial allocation status of not currently allocated.
• If it has the POINTER attribute:
  • if it has an initial association status of disassociated, either through explicit initialization or default initialization, each copy created will have an association status of disassociated;
  • otherwise, each copy created will have an association status of undefined.
• If it does not have either the POINTER or the ALLOCATABLE attribute:


  • if it is initially defined, either through explicit initialization or default initialization, each copy created is so defined;
  • otherwise, each copy created is undefined.
Fortran

Restrictions

The restrictions to the threadprivate directive are as follows:
• A threadprivate object must not appear in any clause except the copyin, copyprivate, schedule, num_threads, and if clauses.

C/C++
• A threadprivate directive for file-scope variables must appear outside any definition or declaration, and must lexically precede all references to any of the variables in its list.
• A threadprivate directive for namespace-scope variables must appear outside any definition or declaration other than the namespace definition itself, and must lexically precede all references to any of the variables in its list.
• Each variable in the list of a threadprivate directive at file or namespace scope must refer to a variable declaration at file or namespace scope that lexically precedes the directive.
• A threadprivate directive for static block-scope variables must appear in the scope of the variable and not in a nested scope. The directive must lexically precede all references to any of the variables in its list.
• Each variable in the list of a threadprivate directive in block scope must refer to a variable declaration in the same scope that lexically precedes the directive. The variable declaration must use the static storage-class specifier.
• If a variable is specified in a threadprivate directive in one translation unit, it must be specified in a threadprivate directive in every translation unit in which it is declared.
• The address of a threadprivate variable is not an address constant.
• A threadprivate variable must not have an incomplete type or a reference type.
• A threadprivate variable with non-POD class type must have an accessible, unambiguous copy constructor if it is declared with an explicit initializer.
C/C++

Fortran
• The threadprivate directive must appear in the declaration section of a scoping unit in which the common block or variable is declared. Although variables in common blocks can be accessed by use association or host association, common block names cannot. This means that a common block name specified in a threadprivate directive must be declared to be a common block in the same scoping unit in which the threadprivate directive appears.
• If a threadprivate directive specifying a common block name appears in one program unit, then such a directive must also appear in every other program unit that contains a COMMON statement specifying the same name. It must appear after the last such COMMON statement in the program unit.
• A blank common block cannot appear in a threadprivate directive.
• A variable can only appear in a threadprivate directive in the scope in which it is declared. It must not be an element of a common block or be declared in an EQUIVALENCE statement.
• A variable that appears in a threadprivate directive and is not declared in the scope of a module must have the SAVE attribute.
Fortran

Cross References:
• Dynamic adjustment of threads, see Section 2.4.1 on page 27.
• copyin clause, see Section 2.8.4.1 on page 80.
• Internal control variables, see Section 2.3 on page 21.

2.8.3 Data-Sharing Attribute Clauses

Several constructs accept clauses that allow a user to control the sharing attributes of variables for the duration of the construct. Data-sharing attribute clauses apply only to variables whose names are visible in the construct on which the clause appears, except that formal arguments that are passed by reference inherit the data-sharing attributes of the associated actual argument.

Not all of the clauses listed in this section are valid on all directives. The set of clauses that is valid on a particular directive is described with the directive.

Most of the clauses accept a comma-separated list of list items (see Section 2.1 on page 16). All list items appearing in a clause must be visible, according to the scoping rules of the base language. With the exception of the default clause, clauses may be repeated as needed. A list item that specifies a given variable may not appear in more than one clause on the same directive, except that a variable may be specified in both firstprivate and lastprivate clauses.


C/C++
If a variable referenced in a data-sharing attribute clause has a type derived from a template, and there are no other references to that variable in the program, then any behavior related to that variable is undefined.
C/C++

Fortran
A named common block may be specified in a list by enclosing the name in slashes. When a named common block appears in a list, it has the same meaning as if every explicit member of the common block appeared in the list. An explicit member of a common block is a variable that is named in a COMMON statement that specifies the common block name and is declared in the same scoping unit in which the clause appears.

Although variables in common blocks can be accessed by use association or host association, common block names cannot. This means that a common block name specified in a data-sharing attribute clause must be declared to be a common block in the same scoping unit in which the data-sharing attribute clause appears.

When a named common block appears in a private, firstprivate, or lastprivate clause of a directive, none of its members may be declared in another data-sharing attribute clause in that directive. When individual members of a common block are privatized, the storage of the specified variables is no longer associated with the storage of the common block itself (see Section A.23 on page 159 for examples).
Fortran

2.8.3.1 default clause

Summary

The default clause allows the user to control the sharing attributes of variables which are referenced in a parallel construct, and whose sharing attributes are implicitly determined (see Section 2.8.1.1 on page 61).

Syntax

C/C++
The syntax of the default clause is as follows:

default(shared | none)

C/C++


Fortran
The syntax of the default clause is as follows:

default(private | shared | none)

Fortran

Description

The default(shared) clause causes all variables referenced in the construct which have implicitly determined sharing attributes to be shared.

Fortran
The default(private) clause causes all variables referenced in the construct which have implicitly determined sharing attributes to be private.
Fortran

The default(none) clause requires that each variable which is referenced in the construct, and that does not have a predetermined sharing attribute, must have its sharing attribute explicitly determined by being listed in a data-sharing attribute clause.

Restrictions

The restrictions to the default clause are as follows:
• Only a single default clause may be specified on a parallel directive.

2.8.3.2 shared clause

Summary

The shared clause declares one or more list items to be shared among all the threads in a team.

Syntax

The syntax of the shared clause is as follows:

shared(list)


Description

All threads within a team access the same storage area for each shared object.

Fortran
The association status of a shared pointer becomes undefined upon entry to and on exit from the parallel construct if it is associated with a target or a subobject of a target that is in a private, firstprivate, lastprivate, or reduction clause inside the parallel construct.

Under certain conditions, passing a shared variable to a non-intrinsic procedure may result in the value of the shared variable being copied into temporary storage before the procedure reference, and back out of the temporary storage into the actual argument storage after the procedure reference. This situation will occur when the following three conditions hold regarding an actual argument in a reference to a non-intrinsic procedure:

a. The actual argument is one of the following:
  • A shared variable.
  • A subobject of a shared variable.
  • An object associated with a shared variable.
  • An object associated with a subobject of a shared variable.

b. The actual argument is also one of the following:
  • An array section with a vector subscript.
  • An array section.
  • An assumed-shape array.
  • A pointer array.

c. The associated dummy argument for this actual argument is an explicit-shape array or an assumed-size array.

This effectively results in references to, and definitions of, the storage during the procedure reference. Any references to (or definitions of) the shared storage that is associated with the dummy argument by any other thread must be synchronized with the procedure reference to avoid possible race conditions.

It is implementation defined whether this situation might occur under other conditions. See Section A.25 on page 164 for an example of this behavior.
Fortran


2.8.3.3 private clause

Summary

The private clause declares one or more list items to be private to a thread.

Syntax

The syntax of the private clause is as follows:

private(list)

Description

Each thread in the team that references a list item that appears in a private clause in any statement in the construct receives a new list item whose language-specific attributes are derived from the original list item. Inside the construct, all references to the original list item are replaced by references to the new list item.

The value of the original list item is not defined upon entry to the region. The original list item must not be modified within the region. The value of the original list item is not defined upon exit from the region.

List items that are privatized in a parallel region may be privatized again on an enclosed parallel or work-sharing construct. As a result, list items that appear in a private clause on a parallel or work-sharing construct may be either shared or private in the enclosing context.

C/C++
A new list item with automatic storage duration is allocated for the construct. The size and alignment of the new list item are determined by the type of the variable. This allocation occurs once for each thread in the team that references the list item in any statement in the construct.

For variables of class type, the new object is default initialized. For other variables, the initial value of the new object is indeterminate. The order in which default constructors for different private objects are called is unspecified.
C/C++

Fortran
A new list item of the same type is declared once for each thread in the team that references the list item in any statement in the construct. The initial value of the new list item is undefined.


Within a parallel region, the initial status of a private pointer is undefined. private pointers that become allocated during the execution of a parallel region should be explicitly deallocated by the program prior to the end of the parallel region to avoid memory leaks.

A list item that appears in a private clause may be storage-associated with other variables when the private clause is encountered. Storage association may exist because of constructs such as EQUIVALENCE, COMMON, etc. If A is a variable appearing in a private clause and B is a variable which is storage-associated with A, then:
• The contents, allocation, and association status of B are undefined on entry to the parallel region.
• Any definition of A, or of its allocation or association status, causes the contents, allocation, and association status of B to become undefined.
• Any definition of B, or of its allocation or association status, causes the contents, allocation, and association status of A to become undefined.
Fortran

Restrictions

The restrictions to the private clause are as follows:
• A list item that appears in the reduction clause of a parallel construct must not appear in a private clause on a work-sharing construct if any of the work-sharing regions arising from the work-sharing construct ever bind to any of the parallel regions arising from the parallel construct.

C/C++
• A variable of class type that appears in a private clause must have an accessible, unambiguous default constructor.
• A variable that appears in a private clause must not have a const-qualified type unless it is of class type with a mutable member.
• A variable that appears in a private clause must not have an incomplete type or a reference type.
C/C++

Fortran
• A variable that appears in a private clause must be definable.
• Assumed-size arrays may not appear in a private clause.
• An allocatable array that appears in a private clause must have an allocation status of "not currently allocated" on entry to and on exit from the construct.


• Variables that appear in namelist statements, in variable format expressions, and in expressions for statement function definitions, may not appear in a private clause.
Fortran

2.8.3.4 firstprivate clause

Summary

The firstprivate clause declares one or more list items to be private to a thread, and initializes each of them with the value that the corresponding original item has when the construct is encountered.

Syntax

The syntax of the firstprivate clause is as follows:

firstprivate(list)

Description

The firstprivate clause provides a superset of the functionality provided by the private clause.

A list item that appears in a firstprivate clause is subject to the private clause semantics described in Section 2.8.3.3 on page 71. In addition, the new list item is initialized from the original list item existing before the construct. The initialization of the new list item is done once for each thread in the team that references the list item in any statement in the construct. The initialization is done prior to the thread's execution of the construct.

For a firstprivate clause on a parallel construct, the initial value of the new list item is the value of the original list item that exists immediately prior to the parallel construct for the thread that encounters the construct. For a firstprivate clause on a work-sharing construct, the initial value of the new list item for a thread that executes the work-sharing construct is the value of the original list item that exists immediately prior to the point in time that the thread encounters the work-sharing construct.

If a list item appears in both firstprivate and lastprivate clauses, the update required for lastprivate occurs after all the initializations for firstprivate.


C/C++
For variables of class type, the new object is copy constructed from the original object. The order in which copy constructors for different objects are called is unspecified. For other variables, the initialization occurs as if by assignment.
C/C++

Fortran
The initialization of the new list items occurs as if by assignment.
Fortran

Restrictions

The restrictions to the firstprivate clause are as follows:
• A list item that is private within a parallel region, or that appears in the reduction clause of a parallel construct, must not appear in a firstprivate clause on a work-sharing construct if any of the work-sharing regions arising from the work-sharing construct ever bind to any of the parallel regions arising from the parallel construct.

C/C++
• A variable of class type that appears in a firstprivate clause must have an accessible, unambiguous copy constructor.
• A variable that appears in a firstprivate clause must not have an incomplete type or a reference type.
C/C++

Fortran
• A variable that appears in a firstprivate clause must be definable.
• Fortran pointers, Cray pointers, assumed-size arrays, and allocatable arrays may not appear in a firstprivate clause.
• Variables that appear in namelist statements, in variable format expressions, and in expressions for statement function definitions, may not appear in a firstprivate clause.
Fortran

2.8.3.5 lastprivate clause

Summary

The lastprivate clause declares one or more list items to be private to a thread, and causes the corresponding original list item to be updated after the end of the region.


Syntax

The syntax of the lastprivate clause is as follows:

lastprivate(list)

Description

The lastprivate clause provides a superset of the functionality provided by the private clause.

A list item that appears in a lastprivate clause is subject to the private clause semantics described in Section 2.8.3.3 on page 71. In addition, when a lastprivate clause appears on the directive that identifies a work-sharing construct, the value of each new list item from the sequentially last iteration of the associated loop, or the lexically last section construct, is assigned to the original list item. List items that are not assigned a value by the last iteration of the loop, or by the lexically last section construct, have unspecified values after the construct. Unassigned subobjects also have an unspecified value after the construct.

The original list item becomes defined at the end of the construct if there is an implicit barrier at that point. Any concurrent uses or definitions of the list items must be synchronized with the definition that occurs at the end of the construct to avoid race conditions.

If the lastprivate clause is used on a construct to which nowait is also applied, the original list item remains undefined until a barrier synchronization has been performed to ensure that the thread that executed the sequentially last iteration, or the lexically last section construct, has stored that list item.

If a list item appears in both firstprivate and lastprivate clauses, the update required for lastprivate occurs after all initializations for firstprivate.

Restrictions

The restrictions to the lastprivate clause are as follows:
• A list item that is private within a parallel region, or that appears in the reduction clause of a parallel construct, must not appear in a lastprivate clause on a work-sharing construct if any of the corresponding work-sharing regions ever binds to any of the corresponding parallel regions.

C/C++
• A variable of class type that appears in a lastprivate clause must have an accessible, unambiguous default constructor, unless the variable is also specified in a firstprivate clause.


• A variable that appears in a lastprivate clause must not have a const-qualified type unless it is of class type with a mutable member.
• A variable that appears in a lastprivate clause must not have an incomplete type or a reference type.
• A variable of class type that appears in a lastprivate clause must have an accessible, unambiguous copy assignment operator.
C/C++

Fortran
• A variable that appears in a lastprivate clause must be definable.
• Fortran pointers, Cray pointers, assumed-size arrays, and allocatable arrays may not appear in a lastprivate clause.
• Variables that appear in namelist statements, in variable format expressions, and in expressions for statement function definitions, may not appear in a lastprivate clause.
Fortran

2.8.3.6 reduction clause

Summary

The reduction clause specifies an operator and one or more list items. For each list item, a private copy is created on each thread, and is initialized appropriately for the operator. After the end of the region, the original list item is updated with the values of the private copies using the specified operator.

Syntax

C/C++
The syntax of the reduction clause is as follows:

reduction(operator:list)

The following table lists the operators that are valid and their initialization values. The actual initialization value will be consistent with the data type of the reduction variable.

Operator    Initialization value
+           0
*           1
-           0
&           ~0
|           0
^           0
&&          1
||          0

C/C++

Fortran
The syntax of the reduction clause is as follows:

reduction({operator|intrinsic_procedure_name}:list)

The following table lists the operators and intrinsic_procedure_names that are valid and their initialization values. The actual initialization value will be consistent with the data type of the reduction variable.

Operator/Intrinsic    Initialization value
+                     0
*                     1
-                     0
.and.                 .true.
.or.                  .false.
.eqv.                 .true.
.neqv.                .false.
max                   Smallest representable number
min                   Largest representable number
iand                  All bits on
ior                   0
ieor                  0

Fortran


Description

The reduction clause can be used to perform some forms of recurrence calculations (involving mathematically associative and commutative operators) in parallel.

A private copy of each list item is created, one for each thread, as if the private clause had been used. The private copy is then initialized to the initialization value for the operator, as specified above. At the end of the region for which the reduction clause was specified, the original list item is updated by combining its original value with the final value of each of the private copies, using the operator specified. (The partial results of a subtraction reduction are added to form the final value.)

The value of the original list item becomes unspecified when the first thread reaches the construct that specifies the clause and remains so until the reduction computation is complete. Normally, the computation will be complete at the end of the construct; however, if the reduction clause is used on a construct to which nowait is also applied, the value of the original list item remains unspecified until a barrier synchronization has been performed to ensure that all threads have completed the reduction. Any concurrent uses or definitions of the list items must be synchronized with the definition that occurs at the end of the construct, or at the subsequent barrier, to avoid race conditions.

The order in which the values are combined is unspecified. Therefore, comparing sequential and parallel runs, or comparing one parallel run to another (even if the number of threads used is the same), there is no guarantee that bit-identical results will be obtained or that side effects (such as floating point exceptions) will be identical.

Note – List items specified in a reduction clause are typically used in the enclosed region in certain forms.

C/C++
A reduction is typically specified for statements of the form:

x = x op expr
x binop= expr
x = expr op x   (except for subtraction)
x++
++x
x--
--x


where expr has scalar type and does not reference x, op is not an overloaded operator, but one of +, *, -, &, ^, |, &&, or ||, and binop is not an overloaded operator, but one of +, *, -, &, ^, or |.
C/C++

Fortran
A reduction using an operator is typically specified for statements of the form:

x = x op expr
x = expr op x   (except for subtraction)

where op is +, *, -, .and., .or., .eqv., or .neqv., the expression does not involve x, and the reduction op is the last operation performed on the right hand side.

A reduction using an intrinsic is typically specified for statements of the form:

x = intr(x, expr_list)
x = intr(expr_list, x)

where intr is max, min, iand, ior, or ieor and expr_list is a comma-separated list of expressions not involving x.
Fortran

Restrictions

The restrictions to the reduction clause are as follows:
• A list item that appears in a reduction clause of a work-sharing construct must be shared in the parallel regions to which any of the work-sharing regions arising from the work-sharing construct bind.
• A list item that appears in a reduction clause of a parallel construct must not be privatized on any enclosed work-sharing construct if any of the work-sharing regions arising from the work-sharing construct bind to any of the parallel regions arising from the parallel construct.
• Any number of reduction clauses can be specified on the directive, but a list item can appear only once in the reduction clause(s) for that directive.

C/C++
• The type of a list item that appears in a reduction clause must be valid for the reduction operator.
• Aggregate types (including arrays), pointer types, and reference types may not appear in a reduction clause.


• A variable that appears in a reduction clause must not be const-qualified.
• The operator specified in a reduction clause cannot be overloaded with respect to the variables that appear in that clause.
C/C++

Fortran
• The type of a list item that appears in a reduction clause must be valid for the reduction operator or intrinsic.
• A variable that appears in a reduction clause must be definable.
• A list item that appears in a reduction clause must be a named variable of intrinsic type.
• Fortran pointers, Cray pointers, deferred-shape arrays, and assumed-size arrays may not appear in a reduction clause.
• Operators specified must be intrinsic operators and any intrinsic_procedure_name must refer to one of the allowed intrinsic procedures. Assignment to the reduction variables must be via intrinsic assignment. See Section A.7 on page 121 for examples.
Fortran

2.8.4 Data Copying Clauses

This section describes the copyin clause (valid on the parallel directive and combined parallel work-sharing directives) and the copyprivate clause (valid on the single directive).

These clauses support the copying of data values from private or threadprivate objects on one thread, to the corresponding objects on other threads in the team.

The clauses accept a comma-separated list of list items (see Section 2.1 on page 16). All list items appearing in a clause must be visible, according to the scoping rules of the base language. Clauses may be repeated as needed, but a list item that specifies a given variable may not appear in more than one clause on the same directive.

2.8.4.1 copyin clause

Summary

The copyin clause provides a mechanism to copy the value of the master thread's threadprivate variable to the threadprivate variable of each other member of the team executing the parallel region.


Syntax

The syntax of the copyin clause is as follows:

    copyin(list)

Description

The copy is done, as if by assignment, after the team is formed and prior to the start of execution of the parallel region.

Fortran
On entry to any parallel region, each thread's copy of a variable that is affected by a copyin clause for the parallel region will acquire the allocation, association, or definition status of the master thread's copy, according to the following rules:
• If it has the POINTER attribute:
  • if the master thread's copy is associated with a target that each copy can become associated with, each copy will become associated with the same target;
  • if the master thread's copy is disassociated, each copy will become disassociated;
  • otherwise, each copy will have an undefined association status.
• If it does not have the POINTER attribute, each copy becomes defined with the value of the master thread's copy as if by intrinsic assignment.
Fortran

Restrictions

The restrictions on the copyin clause are as follows:

C/C++
• A list item that appears in a copyin clause must be threadprivate.
• A variable that appears in a copyin clause must have an accessible, unambiguous copy assignment operator.
C/C++

Fortran
• A list item that appears in a copyin clause must be threadprivate. Named variables appearing in a threadprivate common block may be specified: it is not necessary to specify the whole common block.
• A common block name that appears in a copyin clause must be declared to be a common block in the same scoping unit in which the copyin clause appears.


• A variable with the ALLOCATABLE attribute may not appear in a copyin clause, although a structure that has an ultimate component with the ALLOCATABLE attribute may appear in a copyin clause.
Fortran

2.8.4.2 copyprivate clause

Summary

The copyprivate clause provides a mechanism to use a private variable to broadcast a value from one member of a team to the other members of the team.

Syntax

The syntax of the copyprivate clause is as follows:

    copyprivate(list)

Description

The effect of the copyprivate clause on the specified list items occurs after the execution of the structured block associated with the single construct, and before any of the threads in the team have left the barrier at the end of the construct.

C/C++
In all other threads in the team, each specified list item becomes defined with the value of the corresponding list item (as if by assignment) in the thread that executed the structured block.
C/C++

Fortran
If a list item is not a pointer, then in all other threads in the team, the list item becomes defined (as if by assignment) with the value of the corresponding list item in the thread that executed the structured block. If the list item is a pointer, then in all other threads in the team, the list item becomes pointer associated (as if by pointer assignment) with the corresponding list item in the thread that executed the structured block.
Fortran

Note – The copyprivate clause is an alternative to using a shared variable for the value when providing such a shared variable would be difficult (for example, in a recursion requiring a different variable at each level).
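The broadcast behavior described above can be sketched as follows (an illustrative example, not from the specification; the function name and the value 21.0 are ours). One thread assigns the private variable inside the single block, and copyprivate copies that value into every other thread's private copy before any thread leaves the barrier:

```c
#include <assert.h>
#include <stdio.h>

/* One thread produces a value in a single block; copyprivate broadcasts
 * it to each thread's private x. Every thread then verifies it received
 * the value; ok stays 1 only if all threads saw the broadcast. */
int broadcast_ok(void) {
    int ok = 1;
    double x = 0.0;
    #pragma omp parallel firstprivate(x) shared(ok)
    {
        #pragma omp single copyprivate(x)
        x = 21.0;                 /* executed by exactly one thread */
        if (x != 21.0) {          /* every thread sees the broadcast value */
            #pragma omp critical
            ok = 0;
        }
    }
    return ok;
}
```

Without an OpenMP compiler the pragmas are ignored and the block runs as straight-line serial code, which trivially satisfies the same check.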


Restrictions

The restrictions to the copyprivate clause are as follows:

• A list item that appears in a copyprivate clause may not appear in a private or firstprivate clause on the single construct.
• If the single directive is encountered in a parallel region but not in the parallel construct, all list items that appear in the copyprivate clause must be private in the enclosing context.

C/C++
• A variable that appears in the copyprivate clause must have an accessible, unambiguous copy assignment operator.
C/C++

Fortran
• A common block that appears in a copyprivate clause must be threadprivate.
• Assumed-size arrays may not appear in a copyprivate clause.
Fortran

2.9 Nesting of Regions

This section describes a set of restrictions on the nesting of regions. A region is said to be closely nested inside another region if no parallel region is nested between the two regions.

The restrictions on nesting are as follows:

• A work-sharing region may not be closely nested inside a work-sharing, critical, ordered, or master region.
• A barrier region may not be closely nested inside a work-sharing, critical, ordered, or master region.
• A master region may not be closely nested inside a work-sharing region.
• An ordered region may not be closely nested inside a critical region.
• An ordered region must be closely nested inside a loop region (or parallel loop region) with an ordered clause.
• A critical region may not be nested (closely or otherwise) inside a critical region with the same name. Note that this restriction is not sufficient to prevent deadlock.
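The one positive requirement in the list — an ordered region must be closely nested inside a loop region with an ordered clause — can be sketched as a conforming example (illustrative only; the function name is ours):

```c
#include <assert.h>

/* Conforming nesting: the ordered region is closely nested inside a
 * loop region that carries the ordered clause. The loop body may run
 * in parallel, but the ordered regions execute in iteration order,
 * so out[] records 0, 1, 2, ... regardless of thread count. */
void run_ordered(int n, int *out) {
    int k = 0;
    #pragma omp parallel for ordered shared(k)
    for (int i = 0; i < n; i++) {
        /* ...unordered parallel work for iteration i could go here... */
        #pragma omp ordered
        out[k++] = i;   /* serialized in iteration order, so k is safe */
    }
}
```

Moving the ordered directive inside a critical region, or dropping the ordered clause from the loop, would violate the nesting restrictions above.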




CHAPTER 3

Runtime Library Routines

This chapter describes the OpenMP API runtime library routines and is divided into the following sections:

• Runtime library definitions (Section 3.1 on page 86).
• Execution environment routines that can be used to control and query the parallel execution environment (Section 3.2 on page 87).
• Lock routines that can be used to synchronize access to data (Section 3.3 on page 98).
• Portable timer routines (Section 3.4 on page 104).

Throughout this chapter, true and false are used as generic terms to simplify the description of the routines.

C/C++
true means a nonzero integer value and false means an integer value of zero.
C/C++

Fortran
true means a logical value of .TRUE. and false means a logical value of .FALSE..
Fortran

Restrictions

The following restriction applies to all OpenMP runtime library routines:

• OpenMP runtime library routines may not be called from PURE or ELEMENTAL procedures.


3.1 Runtime Library Definitions

For each base language, a compliant implementation must supply a set of definitions for the OpenMP API runtime library routines and the special data types of their parameters. The set of definitions must contain a declaration for each OpenMP API runtime library routine and a declaration for the simple lock and nestable lock data types. In addition, each set of definitions may specify other implementation defined values.

C/C++
The library routines are external functions with "C" linkage.

Prototypes for the C/C++ runtime library routines described in this chapter shall be provided in a header file named omp.h. This file defines the following:

• The prototypes of all the routines in the chapter.
• The type omp_lock_t.
• The type omp_nest_lock_t.

See Section D.1 on page 217 for an example of this file.
C/C++

Fortran
The OpenMP Fortran API runtime library routines are external procedures. The return values of these routines are of default kind, unless otherwise specified.

Interface declarations for the OpenMP Fortran runtime library routines described in this chapter shall be provided in the form of a Fortran include file named omp_lib.h or a Fortran 90 module named omp_lib. It is implementation defined whether the include file or the module file (or both) is provided.

These files define the following:

• The interfaces of all of the routines in this chapter.
• The integer parameter omp_lock_kind.
• The integer parameter omp_nest_lock_kind.
• The integer parameter openmp_version with a value yyyymm where yyyy and mm are the year and month designations of the version of the OpenMP Fortran API that the implementation supports. This value matches that of the C preprocessor macro _OPENMP, when a macro preprocessor is supported (see Section 2.2 on page 19).

See Appendix D.1 and Appendix D.3 for examples of these files.


It is implementation defined whether any of the OpenMP runtime library routines that take an argument are extended with a generic interface so arguments of different KIND type can be accommodated. See Appendix D.4 for an example of such an extension.
Fortran

3.2 Execution Environment Routines

The routines described in this section affect and monitor threads, processors, and the parallel environment:

• the omp_set_num_threads routine.
• the omp_get_num_threads routine.
• the omp_get_max_threads routine.
• the omp_get_thread_num routine.
• the omp_get_num_procs routine.
• the omp_in_parallel routine.
• the omp_set_dynamic routine.
• the omp_get_dynamic routine.
• the omp_set_nested routine.
• the omp_get_nested routine.

A conforming program may call these routines at any point, except where rules about the calling context are explicitly stated.

3.2.1 omp_set_num_threads

Summary

The omp_set_num_threads routine affects the number of threads to be used for subsequent parallel regions that do not specify a num_threads clause, by setting the value of the nthreads-var internal control variable.


Format

C/C++
void omp_set_num_threads(int num_threads);
C/C++

Fortran
subroutine omp_set_num_threads(num_threads)
integer num_threads
Fortran

Constraints on Arguments

The value of the argument passed to this routine must evaluate to a positive integer.

Binding

The binding thread set for an omp_set_num_threads region is the encountering thread.

Effect

The effect of this routine is to set the value of the nthreads-var internal control variable to the value specified in the argument.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

If the number of threads requested exceeds the number the implementation can support, or is not a positive integer, the behavior of this routine is implementation defined.

Calling Context Rules

This routine has the described effect only when called from the sequential part of the program. If it is called from a portion of the program where the omp_in_parallel routine returns true, the behavior of this routine is implementation defined.

Cross References

• Internal control variables, see Section 2.3 on page 21.
• omp_in_parallel routine, see Section 3.2.6 on page 92.
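A typical use can be sketched as follows (an illustrative example, not from the specification; the function name request_team_of and the serial fallback stubs are ours, guarded by the _OPENMP macro the specification describes):

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins so the sketch also builds without an OpenMP compiler. */
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

/* Request a team of n threads, then report how many the team actually
 * got. Per the rules above, the implementation may grant fewer (for
 * example, when dynamic adjustment is enabled). */
int request_team_of(int n) {
    int got = 0;
    omp_set_num_threads(n);    /* sets nthreads-var; sequential part only */
    #pragma omp parallel shared(got)
    {
        #pragma omp single
        got = omp_get_num_threads();
    }
    return got;
}
```

Note the call is made from the sequential part of the program, as the calling context rules require.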


3.2.2 omp_get_num_threads

Summary

The omp_get_num_threads routine returns the number of threads in the current team.

Format

C/C++
int omp_get_num_threads(void);
C/C++

Fortran
integer function omp_get_num_threads()
Fortran

Binding

The binding thread set for an omp_get_num_threads region is the current team. The binding region for an omp_get_num_threads region is the innermost enclosing parallel region.

Effect

The omp_get_num_threads routine returns the number of threads in the team executing the parallel region to which the routine region binds. If called from the sequential part of a program, this routine returns 1.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

Cross References

• parallel construct, see Section 2.4 on page 24.
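The sequential-part behavior stated above can be sketched directly (illustrative; the function name and fallback stub are ours):

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-in: outside any team there is exactly one thread. */
static int omp_get_num_threads(void) { return 1; }
#endif

/* Called from the sequential part of the program, so the routine
 * returns 1 regardless of how many threads later regions will use. */
int serial_team_size(void) {
    return omp_get_num_threads();
}
```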


3.2.3 omp_get_max_threads

Summary

The value returned by the omp_get_max_threads routine is no less than the number of threads that would be used to form a team if an active parallel region without a num_threads clause were to be encountered at that point in the program.

Format

C/C++
int omp_get_max_threads(void);
C/C++

Fortran
integer function omp_get_max_threads()
Fortran

Binding

The binding thread set for an omp_get_max_threads region is the encountering thread.

Effect

The following expresses a lower bound on the value of omp_get_max_threads: the number of threads that would be used to form a team if an active parallel region without a num_threads clause were to be encountered at that point in the program is less than or equal to the value returned by omp_get_max_threads.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

Note – The return value of the omp_get_max_threads routine can be used to dynamically allocate sufficient storage for all threads in the team formed at the subsequent active parallel region.

Cross References

• Internal control variables, see Section 2.3 on page 21.
• parallel construct, see Section 2.4 on page 24.


• num_threads clause, see Section 2.4 on page 24.

3.2.4 omp_get_thread_num

Summary

The omp_get_thread_num routine returns the thread number, within the team, of the thread executing the parallel region from which omp_get_thread_num is called.

Format

C/C++
int omp_get_thread_num(void);
C/C++

Fortran
integer function omp_get_thread_num()
Fortran

Binding

The binding thread set for an omp_get_thread_num region is the current team. The binding region for an omp_get_thread_num region is the innermost enclosing parallel region.

Effect

The omp_get_thread_num routine returns the thread number of the current thread, within the team executing the parallel region to which the routine region binds. The thread number is an integer between 0 and one less than the value returned by omp_get_num_threads, inclusive. The thread number of the master thread of the team is 0. The routine returns 0 if it is called from the sequential part of a program.

Cross References

• omp_get_num_threads routine, see Section 3.2.2 on page 89.
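A classic use of omp_get_thread_num together with omp_get_num_threads is manual work partitioning: thread t of n handles elements t, t+n, t+2n, and so on. This is a sketch (the function name and fallback stubs are ours), not a fragment of the specification:

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins: one thread, numbered 0. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Each thread sums a strided slice of the array; the partial sums are
 * combined with a reduction. With one thread (or no OpenMP support),
 * thread 0 of 1 simply processes every element. */
long sum_array(const int *a, int n) {
    long total = 0;
    #pragma omp parallel reduction(+:total)
    {
        int tid  = omp_get_thread_num();    /* 0 .. team size - 1 */
        int nthr = omp_get_num_threads();
        for (int i = tid; i < n; i += nthr)
            total += a[i];
    }
    return total;
}
```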


3.2.5 omp_get_num_procs

Summary

The omp_get_num_procs routine returns the number of processors available to the program.

Format

C/C++
int omp_get_num_procs(void);
C/C++

Fortran
integer function omp_get_num_procs()
Fortran

Binding

The binding thread set for an omp_get_num_procs region is all threads.

Effect

The omp_get_num_procs routine returns the number of processors that are available to the program at the time the routine is called.

3.2.6 omp_in_parallel

Summary

The omp_in_parallel routine returns true if the call to the routine is enclosed by an active parallel region; otherwise, it returns false.


Format

C/C++
int omp_in_parallel(void);
C/C++

Fortran
logical function omp_in_parallel()
Fortran

Binding

The binding thread set for an omp_in_parallel region is all threads.

Effect

omp_in_parallel returns the logical OR of the if clauses of all enclosing parallel regions. If a parallel region does not have an if clause, this is equivalent to if(true).

If the routine is called from the sequential part of the program, then omp_in_parallel returns false.

Cross References

• if clause, see Section 2.4.1 on page 27.

3.2.7 omp_set_dynamic

Summary

The omp_set_dynamic routine enables or disables dynamic adjustment of the number of threads available for the execution of parallel regions by setting the value of the dyn-var internal control variable.


Format

C/C++
void omp_set_dynamic(int dynamic_threads);
C/C++

Fortran
subroutine omp_set_dynamic(dynamic_threads)
logical dynamic_threads
Fortran

Binding

The binding thread set for an omp_set_dynamic region is the encountering thread.

Effect

For implementations that provide the ability to dynamically adjust the number of threads, if the argument to omp_set_dynamic evaluates to true, dynamic adjustment of the number of threads is enabled; otherwise, dynamic adjustment is disabled.

For implementations that do not provide the ability to dynamically adjust the number of threads, this routine has no effect: the value of dyn-var remains false.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

Calling Context Rules

The omp_set_dynamic routine behaves as described above when it is called from a portion of the program where omp_in_parallel returns false. If it is called from a portion of the program where omp_in_parallel returns true, the behavior of omp_set_dynamic is implementation defined.

Cross References

• Internal control variables, see Section 2.3 on page 21.
• omp_get_num_threads routine, see Section 3.2.2 on page 89.
• omp_in_parallel routine, see Section 3.2.6 on page 92.
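A common pattern, sketched below (illustrative only; the function name and fallback stubs are ours), is to disable dynamic adjustment before requesting an exact team size, so that the implementation does not shrink the team:

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins: dynamic adjustment unsupported, so dyn-var is false. */
static void omp_set_dynamic(int d) { (void)d; }
static int  omp_get_dynamic(void)  { return 0; }
#endif

/* Disable dynamic adjustment (argument false clears dyn-var), called
 * from the sequential part as the calling context rules require.
 * Afterwards omp_get_dynamic reports false, whether the implementation
 * supports dynamic adjustment or not. */
int dynamic_disabled(void) {
    omp_set_dynamic(0);
    return omp_get_dynamic();
}
```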


3.2.8 omp_get_dynamic

Summary

The omp_get_dynamic routine returns the value of the dyn-var internal control variable, which determines whether dynamic adjustment of the number of threads is enabled or disabled.

Format

C/C++
int omp_get_dynamic(void);
C/C++

Fortran
logical function omp_get_dynamic()
Fortran

Binding

The binding thread set for an omp_get_dynamic region is the encountering thread.

Effect

This routine returns true if dynamic adjustment of the number of threads is enabled; it returns false, otherwise.

If the implementation does not provide the ability to dynamically adjust the number of threads, then this routine always returns false.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

Cross References

• Internal control variables, see Section 2.3 on page 21.


3.2.9 omp_set_nested

Summary

The omp_set_nested routine enables or disables nested parallelism, by setting the nest-var internal control variable.

Format

C/C++
void omp_set_nested(int nested);
C/C++

Fortran
subroutine omp_set_nested(nested)
logical nested
Fortran

Binding

The binding thread set for an omp_set_nested region is the encountering thread.

Effect

For implementations that support nested parallelism, if the argument to omp_set_nested evaluates to true, nested parallelism is enabled; otherwise, nested parallelism is disabled.

For implementations that do not support nested parallelism, this routine has no effect: the value of nest-var remains false.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

Calling Context Rules

The omp_set_nested routine behaves as described above when it is called from a portion of the program where omp_in_parallel returns false. If it is called from a portion of the program where omp_in_parallel returns true, the behavior of omp_set_nested is implementation defined.


Cross References

• Internal control variables, see Section 2.3 on page 21.
• omp_in_parallel routine, see Section 3.2.6 on page 92.

3.2.10 omp_get_nested

Summary

The omp_get_nested routine returns the value of the nest-var internal control variable, which determines if nested parallelism is enabled or disabled.

Format

C/C++
int omp_get_nested(void);
C/C++

Fortran
logical function omp_get_nested()
Fortran

Binding

The binding thread set for an omp_get_nested region is the encountering thread.

Effect

This routine returns true if nested parallelism is enabled; it returns false, otherwise.

If an implementation does not support nested parallelism, this routine always returns false.

See Section 2.4.1 on page 27 for the rules governing the number of threads used to execute a parallel region.

Cross References

• Internal control variables, see Section 2.3 on page 21.
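The pairing of these two routines can be sketched as follows (illustrative; the function name and fallback stubs are ours). Because an implementation is free not to support nesting, the result of the query after enabling is either true or false:

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins: nesting unsupported, so nest-var stays false. */
static void omp_set_nested(int n) { (void)n; }
static int  omp_get_nested(void)  { return 0; }
#endif

/* Ask for nested parallelism, then query whether it actually took
 * effect: 1 if supported and enabled, 0 if the implementation does
 * not support nesting (nest-var remains false). */
int try_enable_nesting(void) {
    omp_set_nested(1);
    return omp_get_nested();
}
```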


3.3 Lock Routines

The OpenMP runtime library includes a set of general-purpose locking routines that can be used for synchronization. These general-purpose locking routines operate on OpenMP locks that are represented by OpenMP lock variables. An OpenMP lock variable must be accessed only through the routines described in this section.

An OpenMP lock may be in one of the following states: uninitialized, unlocked, or locked. When a thread succeeds in locking an OpenMP lock, it is said that the thread "owns" the lock.

Two types of locks are supported: simple locks and nestable locks. Nestable locks may be locked multiple times by the same thread before being unlocked; simple locks may not be locked if they are already in a locked state. Simple lock variables are associated with simple locks and may only be passed to simple lock routines. Nestable lock variables are associated with nestable locks and may only be passed to nestable lock routines.

There are constraints on the state and ownership of the lock accessed by each of the lock routines: these are described with the routine. If these constraints are not met, the behavior of the routine is undefined.

The OpenMP lock routines access a lock variable in such a way that they always read and update the most current value of the lock variable. Therefore, it is not necessary for an OpenMP program to include explicit flush directives to ensure that the lock variable's value is consistent among different threads. (There may be a need for flush directives to ensure that the values of other variables are consistent.)

See Section A.17 on page 143 and Section A.18 on page 145, for examples of using the simple and the nestable lock routines, respectively.

Simple Lock Routines

C/C++
The type omp_lock_t is an object type capable of representing a simple lock. For the following routines, a lock variable must be of omp_lock_t type. All lock routines require an argument that is a pointer to a variable of type omp_lock_t.
C/C++


Fortran
For the following routines, svar must be an integer variable of kind=omp_lock_kind.
Fortran

The simple lock routines are as follows:

• The omp_init_lock routine initializes a simple lock.
• The omp_destroy_lock routine removes a simple lock.
• The omp_set_lock routine waits until a simple lock is available, and then sets it.
• The omp_unset_lock routine releases a simple lock.
• The omp_test_lock routine tests a simple lock, and sets it if it is available.

Nestable Lock Routines

C/C++
The type omp_nest_lock_t is an object type capable of representing a nestable lock. For the following routines, a lock variable must be of omp_nest_lock_t type. All nestable lock routines require an argument that is a pointer to a variable of type omp_nest_lock_t.
C/C++

Fortran
For the following routines, nvar must be an integer variable of kind=omp_nest_lock_kind.
Fortran

The nestable lock routines are as follows:

• The omp_init_nest_lock routine initializes a nestable lock.
• The omp_destroy_nest_lock routine removes a nestable lock.
• The omp_set_nest_lock routine waits until a nestable lock is available, and then sets it.
• The omp_unset_nest_lock routine releases a nestable lock.
• The omp_test_nest_lock routine tests a nestable lock, and sets it if it is available.
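The lifecycle of a simple lock listed above — uninitialized, then init, then set/unset pairs, then destroy — can be sketched with a lock guarding a shared counter (illustrative only; the function name and the serial stand-ins are ours):

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins with the same lifecycle, so the sketch builds anywhere. */
typedef int omp_lock_t;
static void omp_init_lock(omp_lock_t *l)    { *l = 0; }
static void omp_set_lock(omp_lock_t *l)     { *l = 1; }
static void omp_unset_lock(omp_lock_t *l)   { *l = 0; }
static void omp_destroy_lock(omp_lock_t *l) { (void)l; }
#endif

/* Increment a shared counter n times under a simple lock:
 * uninitialized -> init -> (set/unset)* -> destroy. */
int count_protected(int n) {
    omp_lock_t lck;
    int count = 0;
    omp_init_lock(&lck);              /* lock is now unlocked */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        omp_set_lock(&lck);           /* blocks until the lock is available */
        count++;                      /* only the lock's owner executes this */
        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);           /* back to the uninitialized state */
    return count;
}
```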


Binding

The binding thread set for all the lock routines is all threads.

3.3.1 omp_init_lock and omp_init_nest_lock

Summary

These routines provide the only means of initializing an OpenMP lock.

Format

C/C++
void omp_init_lock(omp_lock_t *lock);
void omp_init_nest_lock(omp_nest_lock_t *lock);
C/C++

Fortran
subroutine omp_init_lock(svar)
integer (kind=omp_lock_kind) svar

subroutine omp_init_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran

Constraints on Arguments

A lock variable accessed by either routine must be in the uninitialized state.

Effect

The effect of these routines is to set the lock variable to the unlocked state (that is, no thread owns the lock). In addition, the nesting count for a nestable lock is set to zero.


3.3.2 omp_destroy_lock and omp_destroy_nest_lock

Summary

These routines ensure that the OpenMP lock variable is uninitialized.

Format

C/C++
void omp_destroy_lock(omp_lock_t *lock);
void omp_destroy_nest_lock(omp_nest_lock_t *lock);
C/C++

Fortran
subroutine omp_destroy_lock(svar)
integer (kind=omp_lock_kind) svar

subroutine omp_destroy_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran

Constraints on Arguments

A lock variable accessed by either routine must be in the unlocked state.

Effect

The effect of these routines is to set the state of the lock variable to uninitialized.

3.3.3 omp_set_lock and omp_set_nest_lock

Summary

These routines provide a means of establishing ownership of an OpenMP lock. The calling thread blocks until ownership is established.


Format

C/C++
void omp_set_lock(omp_lock_t *lock);
void omp_set_nest_lock(omp_nest_lock_t *lock);
C/C++

Fortran
subroutine omp_set_lock(svar)
integer (kind=omp_lock_kind) svar

subroutine omp_set_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran

Constraints on Arguments

A lock variable accessed by either routine must not be in the uninitialized state. A simple lock variable accessed by omp_set_lock which is in the locked state must not be owned by the thread executing the routine.

Effect

Each of these routines blocks the thread executing the routine until the specified lock is available and then sets the lock.

A simple lock is available if it is unlocked. Ownership of the lock is granted to the thread executing the routine.

A nestable lock is available if it is unlocked or if it is already owned by the thread executing the routine. The thread executing the routine is granted, or retains, ownership of the lock, and the nesting count for the lock is incremented.

3.3.4 omp_unset_lock and omp_unset_nest_lock

Summary

These routines provide the means of releasing ownership of an OpenMP lock.


Format

C/C++
void omp_unset_lock(omp_lock_t *lock);
void omp_unset_nest_lock(omp_nest_lock_t *lock);
C/C++

Fortran
subroutine omp_unset_lock(svar)
integer (kind=omp_lock_kind) svar

subroutine omp_unset_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran

Constraints on Arguments

A lock variable accessed by either routine must be in the locked state, and be owned by the thread executing the routine.

Effect

For a simple lock, the omp_unset_lock routine causes the lock to become unlocked.

For a nestable lock, the omp_unset_nest_lock routine decrements the nesting count and causes the lock to become unlocked if the resulting nesting count is zero.

For either routine, if the lock becomes unlocked, and if one or more threads are waiting for this lock, the effect is that one thread is chosen and given ownership of the lock.

3.3.5 omp_test_lock and omp_test_nest_lock

Summary

These routines attempt to acquire ownership of an OpenMP lock but do not block execution of the thread executing the routine.


Format

C/C++
int omp_test_lock(omp_lock_t *lock);
int omp_test_nest_lock(omp_nest_lock_t *lock);
C/C++

Fortran
logical function omp_test_lock(svar)
integer (kind=omp_lock_kind) svar

integer function omp_test_nest_lock(nvar)
integer (kind=omp_nest_lock_kind) nvar
Fortran

Constraints on Arguments

A lock variable accessed by either routine must not be in the uninitialized state. A simple lock variable accessed by omp_test_lock which is in the locked state must not be owned by the thread executing the routine.

Effect

These routines attempt to set a lock in the same manner as omp_set_lock and omp_set_nest_lock, except that they do not block execution of the thread.

For a simple lock, the omp_test_lock routine returns true if the lock is successfully set; otherwise, it returns false.

For a nestable lock, the omp_test_nest_lock routine returns the new nesting count if the lock is successfully set; otherwise, it returns zero.

3.4 Timing Routines

The routines described in this section support a portable wall-clock timer:

• the omp_get_wtime routine.
• the omp_get_wtick routine.


3.4.1 omp_get_wtime

Summary

The omp_get_wtime routine returns elapsed wall clock time in seconds.

Format

C/C++
double omp_get_wtime(void);
C/C++

Fortran
double precision function omp_get_wtime()
Fortran

Binding

The binding thread set for an omp_get_wtime region is the encountering thread.

Effect

The omp_get_wtime routine returns a value equal to the elapsed wall clock time in seconds since some "time in the past". The actual "time in the past" is arbitrary, but it is guaranteed not to change during the execution of the application program. The times returned are "per-thread times", so they are not required to be globally consistent across all the threads participating in an application.

Note – It is anticipated that the routine will be used to measure elapsed times as shown in the following example:

C/C++
double start;
double end;
start = omp_get_wtime();
... work to be timed ...
end = omp_get_wtime();
printf("Work took %f seconds\n", end - start);
C/C++


Fortran
DOUBLE PRECISION START, END
START = omp_get_wtime()
... work to be timed ...
END = omp_get_wtime()
PRINT *, "Work took", END - START, "seconds"
Fortran

3.4.2 omp_get_wtick

Summary

The omp_get_wtick routine returns the precision of the timer used by omp_get_wtime.

Format

C/C++
double omp_get_wtick(void);
C/C++

Fortran
double precision function omp_get_wtick()
Fortran

Binding

The binding thread set for an omp_get_wtick region is all threads.

Effect

The omp_get_wtick routine returns a value equal to the number of seconds between successive clock ticks of the timer used by omp_get_wtime.


CHAPTER 4

Environment Variables

This chapter describes the OpenMP environment variables that specify the settings of the internal control variables that affect the execution of OpenMP programs. The names of environment variables must be uppercase. The values assigned to the environment variables are case insensitive and may have leading and trailing white space. Modifications to the environment variables after the program has started, even if modified by the program itself, are ignored. However, the settings of the internal control variables can be modified during the execution of the OpenMP program by the use of the appropriate directive clauses or OpenMP API routines.

The environment variables are as follows:

• OMP_SCHEDULE sets the run-sched-var internal control variable for the run-time schedule type and chunk size.
• OMP_NUM_THREADS sets the nthreads-var internal control variable for the number of threads to use for parallel regions.
• OMP_DYNAMIC sets the dyn-var internal control variable for the dynamic adjustment of threads to use in a parallel region.
• OMP_NESTED sets the nest-var internal control variable to enable or disable nested parallelism.

The examples in this chapter only demonstrate how these variables might be set in Unix C shell (csh) environments. In Korn shell (ksh) and DOS environments the actions are similar, as follows:

• csh:

    setenv OMP_SCHEDULE "dynamic"

• ksh:

    export OMP_SCHEDULE="dynamic"


• DOS:

    set OMP_SCHEDULE="dynamic"

4.1 OMP_SCHEDULE

The OMP_SCHEDULE environment variable controls the schedule type and chunk size of all loop directives that have the schedule type runtime, by setting the value of the run-sched-var internal control variable.

The value of this environment variable takes the form:

    type[,chunk]

where

• type is one of static, dynamic or guided
• chunk is an optional positive integer that specifies the chunk size

If chunk is present, there may be white space on either side of the ",". See Section 2.5.1 on page 31 for a detailed description of the schedule types.

If OMP_SCHEDULE is not set, the initial value of the run-sched-var internal control variable is implementation defined.

Example:

    setenv OMP_SCHEDULE "guided,4"
    setenv OMP_SCHEDULE "dynamic"

Cross References:

• Loop construct, see Section 2.5.1 on page 31.
• Combined parallel work-sharing constructs, see Section 2.6 on page 44.
• Internal control variables, see Section 2.3 on page 21.


4.2 OMP_NUM_THREADS

The OMP_NUM_THREADS environment variable sets the number of threads to use for parallel regions by setting the initial value of the nthreads-var internal control variable. See Section 2.3 for a comprehensive set of rules about the interaction between the OMP_NUM_THREADS environment variable, the num_threads clause, the omp_set_num_threads library routine and dynamic adjustment of threads.

The value of this environment variable must be a positive integer. The behavior of the program is implementation defined if the requested value of OMP_NUM_THREADS is greater than the number of threads an implementation can support, or if the value is not a positive integer.

If the OMP_NUM_THREADS environment variable is not set, the initial value of the nthreads-var internal control variable is implementation defined.

The nthreads-var internal control variable can be modified by using the omp_set_num_threads library routine. The number of threads in the current team can be queried using the omp_get_num_threads library routine. The maximum number of threads in future teams can be queried by using the omp_get_max_threads library routine.

Example:

      setenv OMP_NUM_THREADS 16

Cross References:
• Internal control variables, see Section 2.3 on page 21.
• num_threads clause, see Section 2.4 on page 24.
• omp_set_num_threads routine, see Section 3.2.1 on page 87.
• omp_get_num_threads routine, see Section 3.2.2 on page 89.
• omp_get_max_threads routine, see Section 3.2.3 on page 90.
• omp_get_dynamic routine, see Section 3.2.8 on page 95.
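The "positive integer" requirement above can be made concrete with a small sketch of how an implementation might derive the initial nthreads-var from the environment. This is illustrative only: the function name initial_nthreads_var and the fallback parameter are invented here, since the specification leaves both the unset case and malformed values implementation defined.

```c
#include <stdio.h>
#include <stdlib.h>

/* Illustrative only: derive the initial nthreads-var from an
   OMP_NUM_THREADS-style string. The fallback is an assumption standing in
   for whatever an implementation chooses in the implementation-defined
   cases (unset, non-numeric, or non-positive values). */
static int initial_nthreads_var(const char *value, int fallback)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return fallback;                  /* not set: implementation defined */

    n = strtol(value, &end, 10);          /* strtol skips leading white space */
    while (*end == ' ' || *end == '\t')   /* trailing white space is allowed */
        end++;
    if (*end != '\0' || n <= 0)
        return fallback;                  /* not a positive integer:
                                             implementation defined */
    return (int)n;
}
```

With this sketch, the string "16" from the setenv example above yields 16, while "-2" or "abc" falls back to the implementation-defined default.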


4.3 OMP_DYNAMIC

The OMP_DYNAMIC environment variable controls dynamic adjustment of the number of threads to use for executing parallel regions by setting the initial value of the dyn-var internal control variable. The value of this environment variable must be true or false. If the environment variable is set to true, the OpenMP implementation may adjust the number of threads to use for executing parallel regions in order to optimize the use of system resources. If the environment variable is set to false, the dynamic adjustment of the number of threads is disabled.

If the OMP_DYNAMIC environment variable is not set, the initial value of the dyn-var internal control variable is implementation defined.

The dyn-var internal control variable can be modified by calling the omp_set_dynamic library routine. The current value of dyn-var can be queried using the omp_get_dynamic library routine.

Example:

      setenv OMP_DYNAMIC true

Cross References:
• Internal control variables, see Section 2.3 on page 21.
• omp_set_dynamic routine, see Section 3.2.7 on page 93.
• omp_get_dynamic routine, see Section 3.2.8 on page 95.
• omp_get_num_threads routine, see Section 3.2.2 on page 89.

4.4 OMP_NESTED

The OMP_NESTED environment variable controls nested parallelism by setting the initial value of the nest-var internal control variable. The value of this environment variable must be true or false. If the environment variable is set to true, nested parallelism is enabled; if set to false, nested parallelism is disabled.

If the OMP_NESTED environment variable is not set, the initial value of the nest-var internal control variable is false.


The nest-var internal control variable can be modified by calling the omp_set_nested library routine. The current value of nest-var can be queried using the omp_get_nested library routine.

Example:

      setenv OMP_NESTED false

Cross References:
• Internal control variables, see Section 2.3 on page 21.
• omp_set_nested routine, see Section 3.2.9 on page 96.
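Both OMP_DYNAMIC and OMP_NESTED accept the values true and false, and the chapter introduction states that environment variable values are case insensitive and may carry leading and trailing white space. The sketch below shows one way such a boolean value could be matched; the function name parse_omp_bool and its return convention are invented for this illustration and are not part of the OpenMP API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative only: case-insensitive matching of the true/false values
   accepted by OMP_DYNAMIC and OMP_NESTED, with surrounding white space
   allowed. Returns 1 for true, 0 for false, -1 for anything else. */
static int parse_omp_bool(const char *value)
{
    char buf[8];
    size_t i, n;

    while (isspace((unsigned char)*value))      /* skip leading white space */
        value++;
    n = strlen(value);
    while (n > 0 && isspace((unsigned char)value[n-1]))
        n--;                                    /* drop trailing white space */
    if (n == 0 || n >= sizeof buf)
        return -1;

    for (i = 0; i < n; i++)                     /* compare case-insensitively */
        buf[i] = (char)tolower((unsigned char)value[i]);
    buf[n] = '\0';

    if (strcmp(buf, "true") == 0)  return 1;
    if (strcmp(buf, "false") == 0) return 0;
    return -1;
}
```

Under this reading, "TRUE", "True" and "  true  " all enable the feature, while any other value would fall into implementation-defined handling.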




APPENDIX A

Examples

The following are examples of the constructs and routines defined in this document.

C/C++
A statement following a directive is compound only when necessary, and a non-compound statement is indented with respect to a directive preceding it.
C/C++

A.1 Executing a Simple Loop in Parallel

The following example demonstrates how to parallelize a simple loop using the parallel loop construct (Section 2.6.1 on page 45). The loop iteration variable is private by default, so it is not necessary to specify it explicitly in a private clause.

C/C++
Example A.1.1c

void a1(int n, float *a, float *b)
{
    int i;

    #pragma omp parallel for
    for (i=1; i<n; i++) /* i is private by default */
        b[i] = (a[i] + a[i-1]) / 2.0;
}
C/C++


Fortran
Example A.1.1f

      SUBROUTINE A1(N, A, B)
      INTEGER I, N
      REAL B(N), A(N)
!$OMP PARALLEL DO !I is private by default
      DO I=2,N
        B(I) = (A(I) + A(I-1)) / 2.0
      ENDDO
!$OMP END PARALLEL DO
      END SUBROUTINE A1
Fortran

A.2 Specifying Conditional Compilation

C/C++
The following example illustrates the use of conditional compilation using the OpenMP macro _OPENMP (Section 2.2 on page 19). With OpenMP compilation, the _OPENMP macro becomes defined.

Example A.2.1c

#include <stdio.h>

int main()
{
# ifdef _OPENMP
    printf("Compiled by an OpenMP-compliant implementation.\n");
# endif
    return 1;
}
C/C++


Fortran
The following example illustrates the use of the conditional compilation sentinel (see Section 2.2 on page 19). With OpenMP compilation, the conditional compilation sentinel !$ is recognized and treated as two spaces. In fixed form source, statements guarded by the sentinel must start after column 6.

Example A.2.1f

      PROGRAM A2
C234567890
!$    PRINT *, "Compiled by an OpenMP-compliant implementation."
      END PROGRAM A2
Fortran

A.3 Using the parallel Construct

The parallel construct (Section 2.4 on page 24) can be used in coarse-grain parallel programs. In the following example, each thread in the parallel region decides what part of the global array x to work on, based on the thread number:

C/C++
Example A.3.1c

#include <omp.h>

void subdomain(float *x, int istart, int ipoints)
{
    int i;

    for (i = 0; i < ipoints; i++)
        x[istart+i] = 123.456;
}

void sub(float *x, int npoints)
{
    int iam, nt, ipoints, istart;

    #pragma omp parallel shared(x, npoints) private(iam, nt, ipoints, istart)
    {
        iam = omp_get_thread_num();
        nt = omp_get_num_threads();


        ipoints = npoints / nt;    /* size of partition */
        istart = iam * ipoints;    /* starting array index */
        if (iam == nt-1)           /* last thread may do more */
            ipoints = npoints - istart;
        subdomain(x, istart, ipoints);
    }
}

int main()
{
    float array[10000];

    sub(array, 10000);
    return 1;
}
C/C++

Fortran
Example A.3.1f

      SUBROUTINE SUBDOMAIN(X, ISTART, IPOINTS)
      INTEGER ISTART, IPOINTS
      REAL X(*)
      INTEGER I
      DO 100 I=1,IPOINTS
        X(ISTART+I) = 123.456
 100  CONTINUE
      END SUBROUTINE SUBDOMAIN

      SUBROUTINE SUB(X, NPOINTS)
      REAL X(*)
      INTEGER NPOINTS
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER IAM, NT, IPOINTS, ISTART
!$OMP PARALLEL DEFAULT(PRIVATE) SHARED(X,NPOINTS)
      IAM = OMP_GET_THREAD_NUM()
      NT = OMP_GET_NUM_THREADS()
      IPOINTS = NPOINTS/NT
      ISTART = IAM * IPOINTS
      IF (IAM .EQ. NT-1) THEN


        IPOINTS = NPOINTS - ISTART
      ENDIF
      CALL SUBDOMAIN(X,ISTART,IPOINTS)
!$OMP END PARALLEL
      END SUBROUTINE SUB

      PROGRAM A3
      REAL ARRAY(10000)
      CALL SUB(ARRAY, 10000)
      END PROGRAM A3
Fortran

A.4 Using the nowait Clause

If there are multiple independent loops within a parallel region, you can use the nowait clause (see Section 2.5.1 on page 31) to avoid the implied barrier at the end of the loop construct, as follows:

C/C++
Example A.4.1c

#include <math.h>

void a4(int n, int m, float *a, float *b, float *y, float *z)
{
    int i;

    #pragma omp parallel
    {
        #pragma omp for nowait
        for (i=1; i<n; i++)
            b[i] = (a[i] + a[i-1]) / 2.0;

        #pragma omp for nowait
        for (i=0; i<m; i++)
            y[i] = sqrt(z[i]);
    }
}
C/C++


Fortran
Example A.4.1f

      SUBROUTINE A4(N, M, A, B, Y, Z)
      INTEGER N, M
      REAL A(*), B(*), Y(*), Z(*)
      INTEGER I
!$OMP PARALLEL
!$OMP DO
      DO I=2,N
        B(I) = (A(I) + A(I-1)) / 2.0
      ENDDO
!$OMP END DO NOWAIT
!$OMP DO
      DO I=1,M
        Y(I) = SQRT(Z(I))
      ENDDO
!$OMP END DO NOWAIT
!$OMP END PARALLEL
      END SUBROUTINE A4
Fortran

A.5 Using the critical Construct

The following example includes several critical constructs (Section 2.7.2 on page 50). The example illustrates a queuing model in which a task is dequeued and worked on. To guard against multiple threads dequeuing the same task, the dequeuing operation must be in a critical region. Because the two queues in this example are independent, they are protected by critical constructs with different names, xaxis and yaxis.

C/C++
Example A.5.1c

int dequeue(float *a);
void work(int i, float *a);


void a5(float *x, float *y)
{
    int ix_next, iy_next;

    #pragma omp parallel shared(x, y) private(ix_next, iy_next)
    {
        #pragma omp critical (xaxis)
        ix_next = dequeue(x);
        work(ix_next, x);

        #pragma omp critical (yaxis)
        iy_next = dequeue(y);
        work(iy_next, y);
    }
}
C/C++

Fortran
Example A.5.1f

      SUBROUTINE A5(X, Y)
      REAL X(*), Y(*)
      INTEGER IX_NEXT, IY_NEXT
!$OMP PARALLEL SHARED(X, Y) PRIVATE(IX_NEXT, IY_NEXT)
!$OMP CRITICAL(XAXIS)
      CALL DEQUEUE(IX_NEXT, X)
!$OMP END CRITICAL(XAXIS)
      CALL WORK(IX_NEXT, X)
!$OMP CRITICAL(YAXIS)
      CALL DEQUEUE(IY_NEXT, Y)
!$OMP END CRITICAL(YAXIS)
      CALL WORK(IY_NEXT, Y)
!$OMP END PARALLEL
      END SUBROUTINE A5
Fortran


A.6 Using the lastprivate Clause

Correct execution sometimes depends on the value that the last iteration of a loop assigns to a variable. Such programs must list all such variables in a lastprivate clause (Section 2.8.3.5 on page 74) so that the values of the variables are the same as when the loop is executed sequentially.

C/C++
Example A.6.1c

void a6 (int n, float *a, float *b)
{
    int i;

    #pragma omp parallel
    {
        #pragma omp for lastprivate(i)
        for (i=0; i<n-1; i++)
            a[i] = b[i] + b[i+1];
    }
    a[i] = b[i];   /* i == n-1 here */
}
C/C++

Fortran
Example A.6.1f

      SUBROUTINE A6(N, A, B)
      INTEGER N
      REAL A(*), B(*)
      INTEGER I
!$OMP PARALLEL
!$OMP DO LASTPRIVATE(I)
      DO I=1,N-1
        A(I) = B(I) + B(I+1)
      ENDDO
!$OMP END PARALLEL
      A(I) = B(I)   ! I has the value of N here


      END SUBROUTINE A6
Fortran

A.7 Using the reduction Clause

The following example demonstrates the reduction clause (Section 2.8.3.6 on page 76):

C/C++
Example A.7.1c

void a7(float *x, int *y, int n)
{
    int i, b;
    float a;

    a = 0.0;
    b = 0;
    #pragma omp parallel for private(i) shared(x, y, n) \
                reduction(+:a) reduction(^:b)
    for (i=0; i<n; i++) {
        a += x[i];
        b ^= y[i];
    }
}
C/C++

Fortran
Example A.7.1f

      SUBROUTINE A7(A, B, X, Y, N)
      INTEGER N
      REAL X(*), Y(*), A, B
!$OMP PARALLEL DO PRIVATE(I) SHARED(X, Y, N)
!$OMP&   REDUCTION(+:A) REDUCTION(MIN:B)
      DO I=1,N


        A = A + X(I)
        B = MIN(B, Y(I))
!       Note that some reductions can be expressed in
!       other forms. For example, the MIN could be expressed as
!       IF (B > Y(I)) B = Y(I)
      END DO
      END SUBROUTINE A7
Fortran

A common implementation of the preceding example is to treat it as if it had been written as follows:

C/C++
Example A.7.2c

void a7_2(float *x, int *y, int n)
{
    int i, b, b_p;
    float a, a_p;

    a = 0.0;
    b = 0;
    #pragma omp parallel shared(a, b, x, y, n) \
                private(a_p, b_p)
    {
        a_p = 0.0;
        b_p = 0;
        #pragma omp for private(i)
        for (i=0; i<n; i++) {
            a_p += x[i];
            b_p ^= y[i];
        }


        #pragma omp critical
        {
            a += a_p;
            b ^= b_p;
        }
    }
}
C/C++

Fortran
Example A.7.2f

      SUBROUTINE A7_2 (A, B, X, Y, N)
      INTEGER N
      REAL X(*), Y(*), A, B, A_P, B_P
!$OMP PARALLEL SHARED(X, Y, N, A, B) PRIVATE(A_P, B_P)
      A_P = 0.0
      B_P = HUGE(B_P)
!$OMP DO PRIVATE(I)
      DO I=1,N
        A_P = A_P + X(I)
        B_P = MIN(B_P, Y(I))
      ENDDO
!$OMP END DO
!$OMP CRITICAL
      A = A + A_P
      B = MIN(B, B_P)
!$OMP END CRITICAL
!$OMP END PARALLEL
      END SUBROUTINE A7_2

The following program is non-conforming because the reduction is on the intrinsic procedure name MAX but that name has been redefined to be the variable named MAX.

Example A.7.3f

      PROGRAM A7_3
      MAX = HUGE(0)


      M = 0
!$OMP PARALLEL DO REDUCTION(MAX: M)  ! MAX is no longer the
                                     ! intrinsic so this
                                     ! is non-conforming
      DO I = 1, 100
        CALL SUB(M,I)
      END DO
      END PROGRAM A7_3

      SUBROUTINE SUB(M,I)
      M = MAX(M,I)
      END SUBROUTINE SUB

The following compliant program performs the reduction using the intrinsic procedure name MAX even though the intrinsic MAX has been renamed to REN.

Example A.7.4f

      MODULE M
        INTRINSIC MAX
      END MODULE M

      PROGRAM A7_4
      USE M, REN => MAX
      N = 0
!$OMP PARALLEL DO REDUCTION(REN: N)  ! still does MAX
      DO I = 1, 100
        N = MAX(N,I)
      END DO
      END PROGRAM A7_4

The following compliant program performs the reduction using the intrinsic procedure name MAX even though the intrinsic MAX has been renamed to MIN.

Example A.7.5f

      MODULE MOD
        INTRINSIC MAX, MIN
      END MODULE MOD

      PROGRAM A7_5
      USE MOD, MIN=>MAX, MAX=>MIN


      REAL :: R
      R = -HUGE(0.0)
!$OMP PARALLEL DO REDUCTION(MIN: R)  ! still does MAX
      DO I = 1, 1000
        R = MIN(R, SIN(REAL(I)))
      END DO
      PRINT *, R
      END PROGRAM A7_5
Fortran

A.8 Using the parallel sections Construct

In the following example (for Section 2.5.2 on page 37), routines xaxis, yaxis, and zaxis can be executed concurrently. The first section directive is optional. Note that all section directives need to appear in the parallel sections construct.

C/C++
Example A.8.1c

void XAXIS();
void YAXIS();
void ZAXIS();

void a8()
{
    #pragma omp parallel sections
    {
        #pragma omp section
        XAXIS();

        #pragma omp section
        YAXIS();

        #pragma omp section
        ZAXIS();
    }
}
C/C++


Fortran
Example A.8.1f

      SUBROUTINE A8()
!$OMP PARALLEL SECTIONS
!$OMP SECTION
      CALL XAXIS()
!$OMP SECTION
      CALL YAXIS()
!$OMP SECTION
      CALL ZAXIS()
!$OMP END PARALLEL SECTIONS
      END SUBROUTINE A8
Fortran

A.9 Using the single Construct

The following example demonstrates the single construct (Section 2.5.3 on page 40). In the example, only one thread prints each of the progress messages. All other threads will skip the single region and stop at the barrier at the end of the single construct until all threads in the team have reached the barrier. If other threads can proceed without waiting for the thread executing the single region, a nowait clause can be specified, as is done in the third single construct in this example. The user must not make any assumptions as to which thread will execute a single region.

C/C++
Example A.9.1c

#include <stdio.h>

void work1() {}
void work2() {}

void a9()
{
    #pragma omp parallel
    {
        #pragma omp single


        printf("Beginning work1.\n");

        work1();

        #pragma omp single
        printf("Finishing work1.\n");

        #pragma omp single nowait
        printf("Finished work1 and beginning work2.\n");

        work2();
    }
}
C/C++

Fortran
Example A.9.1f

      SUBROUTINE WORK1()
      END SUBROUTINE WORK1

      SUBROUTINE WORK2()
      END SUBROUTINE WORK2

      PROGRAM A9
!$OMP PARALLEL
!$OMP SINGLE
      print *, "Beginning work1."
!$OMP END SINGLE
      CALL WORK1()
!$OMP SINGLE
      print *, "Finishing work1."
!$OMP END SINGLE
!$OMP SINGLE
      print *, "Finished work1 and beginning work2."
!$OMP END SINGLE NOWAIT


      CALL WORK2()
!$OMP END PARALLEL
      END PROGRAM A9
Fortran

A.10 Specifying Sequential Ordering

Ordered sections (Section 2.7.6 on page 59) are useful for sequentially ordering the output from work that is done in parallel. The following program prints out the indexes in sequential order:

C/C++
Example A.10.1c

#include <stdio.h>

void work(int k)
{
    #pragma omp ordered
    printf(" %d\n", k);
}

void a10(int lb, int ub, int stride)
{
    int i;

    #pragma omp parallel for ordered schedule(dynamic)
    for (i=lb; i<ub; i+=stride)
        work(i);
}

int main()
{
    a10(0, 100, 5);
    return 0;
}
C/C++


Fortran
Example A.10.1f

      SUBROUTINE WORK(K)
      INTEGER K
!$OMP ORDERED
      WRITE(*,*) K
!$OMP END ORDERED
      END SUBROUTINE WORK

      SUBROUTINE SUBA10(LB, UB, STRIDE)
      INTEGER LB, UB, STRIDE
      INTEGER I
!$OMP PARALLEL DO ORDERED SCHEDULE(DYNAMIC)
      DO I=LB,UB,STRIDE
        CALL WORK(I)
      END DO
!$OMP END PARALLEL DO
      END SUBROUTINE SUBA10

      PROGRAM A10
      CALL SUBA10(1,100,5)
      END PROGRAM A10
Fortran

A.11 Specifying a Fixed Number of Threads

Some programs rely on a fixed, prespecified number of threads to execute correctly. Because the default setting for the dynamic adjustment of the number of threads is implementation defined, such programs can choose to turn off the dynamic threads capability and set the number of threads explicitly to ensure portability. The following example shows how to do this using omp_set_dynamic (Section 3.2.7 on page 93), and omp_set_num_threads (Section 3.2.1 on page 87):

C/C++
Example A.11.1c

#include <omp.h>
#include <stdlib.h>

void do_by_16(float *x, int iam, int ipoints) {}


void a11(float *x, int npoints)
{
    int iam, ipoints;

    omp_set_dynamic(0);
    omp_set_num_threads(16);
    #pragma omp parallel shared(x, npoints) private(iam, ipoints)
    {
        if (omp_get_num_threads() != 16)
            abort();
        iam = omp_get_thread_num();
        ipoints = npoints/16;
        do_by_16(x, iam, ipoints);
    }
}
C/C++

Fortran
Example A.11.1f

      SUBROUTINE DO_BY_16(X, IAM, IPOINTS)
      REAL X(*)
      INTEGER IAM, IPOINTS
      END SUBROUTINE DO_BY_16

      SUBROUTINE SUBA11(X, NPOINTS)
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER NPOINTS
      REAL X(NPOINTS)
      INTEGER IAM, IPOINTS
      CALL OMP_SET_DYNAMIC(.FALSE.)
      CALL OMP_SET_NUM_THREADS(16)
!$OMP PARALLEL SHARED(X,NPOINTS) PRIVATE(IAM, IPOINTS)
      IF (OMP_GET_NUM_THREADS() .NE. 16) THEN
        CALL ABORT()
      ENDIF
      IAM = OMP_GET_THREAD_NUM()
      IPOINTS = NPOINTS/16
      CALL DO_BY_16(X,IAM,IPOINTS)
!$OMP END PARALLEL


      END SUBROUTINE SUBA11
Fortran

In this example, the program executes correctly only if it is executed by 16 threads. If the implementation is not capable of supporting 16 threads, the behavior of this example is implementation defined. Note that the number of threads executing a parallel region remains constant during the region, regardless of the dynamic threads setting. The dynamic threads mechanism determines the number of threads to use at the start of the parallel region and keeps it constant for the duration of the region.

A.12 Using the atomic Directive

The following example avoids race conditions (simultaneous updates of an element of x by multiple threads) by using the atomic directive (Section 2.7.4 on page 53):

C/C++
Example A.12.1c

float work1(int i)
{
    return 1.0 * i;
}

float work2(int i)
{
    return 2.0 * i;
}

void a12(float *x, float *y, int *index, int n)
{
    int i;

    #pragma omp parallel for shared(x, y, index, n)
    for (i=0; i<n; i++) {
        #pragma omp atomic
        x[index[i]] += work1(i);
        y[i] += work2(i);
    }
}


int main()
{
    float x[1000];
    float y[10000];
    int index[10000];
    int i;

    for (i = 0; i < 10000; i++)
        index[i] = i % 1000;
    for (i = 0; i < 1000; i++)
        x[i] = y[i] = 0.0;

    a12(x, y, index, 10000);
    return 0;
}
C/C++

Fortran
Example A.12.1f

      REAL FUNCTION WORK1(I)
      INTEGER I
      WORK1 = 1.0 * I
      RETURN
      END FUNCTION WORK1

      REAL FUNCTION WORK2(I)
      INTEGER I
      WORK2 = 2.0 * I
      RETURN
      END FUNCTION WORK2

      SUBROUTINE SUBA12(X, Y, INDEX, N)
      REAL X(*), Y(*)
      INTEGER INDEX(*), N
      INTEGER I
!$OMP PARALLEL DO SHARED(X, Y, INDEX, N)
      DO I=1,N
!$OMP ATOMIC
        X(INDEX(I)) = X(INDEX(I)) + WORK1(I)
        Y(I) = Y(I) + WORK2(I)
      ENDDO
      END SUBROUTINE SUBA12

      PROGRAM A12
      REAL X(1000), Y(10000)
      INTEGER INDEX(10000)


      INTEGER I

      DO I=1,10000
        INDEX(I) = MOD(I, 1000) + 1
      ENDDO
      DO I = 1,1000
        X(I) = 0.0
        Y(I) = 0.0
      ENDDO
      CALL SUBA12(X, Y, INDEX, 10000)
      END PROGRAM A12
Fortran

The advantage of using the atomic directive in this example is that it allows updates of two different elements of x to occur in parallel. If a critical construct (see Section 2.7.2 on page 50) were used instead, then all updates to elements of x would be executed serially (though not in any guaranteed order).

Note that the atomic directive applies only to the statement immediately following it. As a result, elements of y are not updated atomically in this example.

Note that the array sizes in this example are chosen only for ease of presentation. They are not intended to imply that actual parallel speedup will be realized for arrays of this size.

A.13 Illustration of the OpenMP Memory Model

In the following example, at Print 1, the value of x could be either 2 or 5, depending on the timing of the threads, and the implementation of the assignment to x. There are two reasons that the value at Print 1 might not be 5. First, Print 1 might be executed before the assignment to x is executed. Second, even if Print 1 is executed after the assignment, the value 5 is not guaranteed to be seen by thread 1 because a flush may not have been executed by thread 0 since the assignment.

The barrier after Print 1 contains implicit flushes on all threads, as well as a thread synchronization, so the programmer is guaranteed that the value 5 will be printed by both Print 2 and Print 3.


C/C++
Example A.13.1c

#include <stdio.h>
#include <omp.h>

int main()
{
    int x;

    x = 2;
    #pragma omp parallel num_threads(2) shared(x)
    {
        if (omp_get_thread_num() == 0) {
            x = 5;
        } else {
            /* Print 1: the following read of x has a race */
            printf("1: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }

        #pragma omp barrier

        if (omp_get_thread_num() == 0) {
            /* Print 2 */
            printf("2: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        } else {
            /* Print 3 */
            printf("3: Thread# %d: x = %d\n", omp_get_thread_num(), x);
        }
    }
}
C/C++

Fortran
Example A.13.1f

      PROGRAM A_13
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER X

      X = 2
!$OMP PARALLEL NUM_THREADS(2) SHARED(X)
      IF (OMP_GET_THREAD_NUM() .EQ. 0) THEN
        X = 5
      ELSE
!       PRINT 1: The following read of x has a race
        PRINT *, "1: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
      ENDIF
!$OMP BARRIER


      IF (OMP_GET_THREAD_NUM() .EQ. 0) THEN
!       PRINT 2
        PRINT *, "2: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
      ELSE
!       PRINT 3
        PRINT *, "3: THREAD# ", OMP_GET_THREAD_NUM(), "X = ", X
      ENDIF
!$OMP END PARALLEL
      END PROGRAM A_13
Fortran

A.14 Using the flush Directive with a List

The following example uses the flush directive for point-to-point synchronization of specific objects between pairs of threads:

C/C++
Example A.14.1c

#include <omp.h>

#define NUMBER_OF_THREADS 256

int synch[NUMBER_OF_THREADS];
float work[NUMBER_OF_THREADS];
float result[NUMBER_OF_THREADS];

float fn1(int i)
{
    return i*2.0;
}

float fn2(float a, float b)
{
    return a + b;
}

int main()
{
    int iam, neighbor;

    #pragma omp parallel private(iam,neighbor) shared(work,synch)
    {


        iam = omp_get_thread_num();
        synch[iam] = 0;
        #pragma omp barrier

        /* Do computation into my portion of work array */
        work[iam] = fn1(iam);

        /* Announce that I am done with my work. The first flush
         * ensures that my work is made visible before synch.
         * The second flush ensures that synch is made visible.
         */
        #pragma omp flush(work,synch)
        synch[iam] = 1;
        #pragma omp flush(synch)

        /* Wait for neighbor. The first flush ensures that synch is read
         * from memory, rather than from the temporary view of memory.
         * The second flush ensures that work is read from memory, and
         * is done so after the while loop exits.
         */
        neighbor = (iam>0 ? iam : omp_get_num_threads()) - 1;
        while (synch[neighbor] == 0) {
            #pragma omp flush(synch)
        }
        #pragma omp flush(work,synch)

        /* Read neighbor's values of work array */
        result[iam] = fn2(work[neighbor], work[iam]);
    }
    /* output result here */
    return 0;
}
C/C++

Fortran
Example A.14.1f

      REAL FUNCTION FN1(I)
      INTEGER I
      FN1 = I * 2.0
      RETURN
      END FUNCTION FN1


      REAL FUNCTION FN2(A, B)
      REAL A, B
      FN2 = A + B
      RETURN
      END FUNCTION FN2

      PROGRAM A13
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER ISYNC(256)
      REAL WORK(256)
      REAL RESULT(256)
      INTEGER IAM, NEIGHBOR
!$OMP PARALLEL PRIVATE(IAM, NEIGHBOR) SHARED(WORK, ISYNC)
      IAM = OMP_GET_THREAD_NUM() + 1
      ISYNC(IAM) = 0
!$OMP BARRIER

C     Do computation into my portion of work array
      WORK(IAM) = FN1(IAM)

C     Announce that I am done with my work.
C     The first flush ensures that my work is made visible before
C     synch. The second flush ensures that synch is made visible.
!$OMP FLUSH(WORK,ISYNC)
      ISYNC(IAM) = 1
!$OMP FLUSH(ISYNC)

C     Wait until neighbor is done. The first flush ensures that
C     synch is read from memory, rather than from the temporary
C     view of memory. The second flush ensures that work is read
C     from memory, and is done so after the while loop exits.
      IF (IAM .EQ. 1) THEN
        NEIGHBOR = OMP_GET_NUM_THREADS()
      ELSE
        NEIGHBOR = IAM - 1
      ENDIF
      DO WHILE (ISYNC(NEIGHBOR) .EQ. 0)
!$OMP FLUSH(ISYNC)


      END DO
!$OMP FLUSH(WORK, ISYNC)
      RESULT(IAM) = FN2(WORK(NEIGHBOR), WORK(IAM))
!$OMP END PARALLEL
      END PROGRAM A13
Fortran

A.15 Using the flush Directive without a List

The following example (for Section 2.7.5 on page 56) distinguishes the shared objects affected by a flush directive with no list from the shared objects that are not affected:

C/C++
Example A.15.1c

int x, *p = &x;

void f1(int *q)
{
    *q = 1;
    #pragma omp flush
    /* x, p, and *q are flushed */
    /* because they are shared and accessible */
    /* q is not flushed because it is not shared. */
}

void f2(int *q)
{
    #pragma omp barrier
    *q = 2;
    #pragma omp barrier
    /* a barrier implies a flush */
    /* x, p, and *q are flushed */
    /* because they are shared and accessible */
    /* q is not flushed because it is not shared. */
}

int g(int n)
{
    int i = 1, j, sum = 0;

    *p = 1;


    #pragma omp parallel reduction(+: sum) num_threads(10)
    {
        f1(&j);
        /* i, n and sum were not flushed */
        /* because they were not accessible in f1 */
        /* j was flushed because it was accessible */
        sum += j;

        f2(&j);
        /* i, n, and sum were not flushed */
        /* because they were not accessible in f2 */
        /* j was flushed because it was accessible */
        sum += i + j + *p + n;
    }
    return sum;
}

int main()
{
    int result = g(7);
    return result;
}
C/C++

Fortran
Example A.15.1f

      SUBROUTINE F1(Q)
      INTEGER Q

      Q = 1
!$OMP FLUSH
!     X and P are flushed
!     because they are shared and accessible
!     Q is not flushed because it is not shared.
      END SUBROUTINE F1

      SUBROUTINE F2(Q)
      INTEGER Q
!$OMP BARRIER
      Q = 2
!$OMP BARRIER
!     a barrier implies a flush
!     X and P are flushed
!     because they are shared and accessible


!     Q is not flushed because it is not shared.
      END SUBROUTINE F2

      INTEGER FUNCTION G(N)
      COMMON X, P
      INTEGER, TARGET :: X
      INTEGER, POINTER :: P
      INTEGER N
      INTEGER I, J, SUM

      I = 1
      SUM = 0
      P = 1
!$OMP PARALLEL REDUCTION(+: SUM) NUM_THREADS(10)
      CALL F1(J)
!     I, N and SUM were not flushed
!     because they were not accessible in F1
!     J was flushed because it was accessible
      SUM = SUM + J
      CALL F2(J)
!     I, N, and SUM were not flushed
!     because they were not accessible in F2
!     J was flushed because it was accessible
      SUM = SUM + I + J + P + N
!$OMP END PARALLEL
      G = SUM
      END FUNCTION G

      PROGRAM A14
      COMMON X, P
      INTEGER, TARGET :: X
      INTEGER, POINTER :: P
      INTEGER RESULT, G
      P => X
      RESULT = G(7)
      PRINT *, RESULT
      END PROGRAM A14
Fortran


A.16 Determining the Number of Threads Used

Consider the following incorrect example (for Section 3.2.2 on page 89):

C/C++
Example A.16.1c

void work(int i);

void incorrect()
{
    int np, i;

    np = omp_get_num_threads();   /* misplaced */
    #pragma omp parallel for schedule(static)
    for (i=0; i < np; i++)
        work(i);
}
C/C++

Fortran
Example A.16.1f

      SUBROUTINE WORK(I)
      INTEGER I
      I = I + 1
      END SUBROUTINE WORK

      SUBROUTINE INCORRECT()
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER I, NP

      NP = OMP_GET_NUM_THREADS()   ! misplaced: will return 1
!$OMP PARALLEL DO SCHEDULE(STATIC)
      DO I = 0, NP-1
        CALL WORK(I)
      ENDDO
!$OMP END PARALLEL DO
      END SUBROUTINE INCORRECT
Fortran


The omp_get_num_threads call returns 1 in the serial section of the code, so np will always be equal to 1 in the preceding example. To determine the number of threads that will be deployed for the parallel region, the call should be inside the parallel region.

The following example shows how to rewrite this program without including a query for the number of threads:

C/C++
Example A.16.2c

void work(int i);

void correct()
{
    int i;

    #pragma omp parallel private(i)
    {
        i = omp_get_thread_num();
        work(i);
    }
}
C/C++

Fortran
Example A.16.2f

      SUBROUTINE WORK(I)
      INTEGER I
      I = I + 1
      END SUBROUTINE WORK

      SUBROUTINE CORRECT()
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER I
!$OMP PARALLEL PRIVATE(I)
      I = OMP_GET_THREAD_NUM()
      CALL WORK(I)
!$OMP END PARALLEL
      END SUBROUTINE CORRECT
Fortran


A.17 Using Locks

This is an example of the use of the simple lock routines. In the following example (for Section 3.3 on page 98), the lock functions cause the threads to be idle while waiting for entry to the first critical section, but to do other work while waiting for entry to the second. The omp_set_lock function blocks, but the omp_test_lock function does not, allowing the work in skip to be done.

C/C++
Example A.17.1c

#include <stdio.h>
#include <omp.h>

void skip(int i) {}
void work(int i) {}

int main()
{
    omp_lock_t lck;
    int id;

    omp_init_lock(&lck);
    #pragma omp parallel shared(lck) private(id)
    {
        id = omp_get_thread_num();

        omp_set_lock(&lck);
        /* only one thread at a time can execute this printf */
        printf("My thread id is %d.\n", id);
        omp_unset_lock(&lck);

        while (! omp_test_lock(&lck)) {
            skip(id);   /* we do not yet have the lock,
                           so we must do something else */
        }
        #pragma omp flush
        work(id);       /* we now have the lock
                           and can do the work */
        #pragma omp flush


        omp_unset_lock(&lck);
    }
    omp_destroy_lock(&lck);
    return 1;
}

Note that the argument to the lock routines should have type omp_lock_t, and that there is no need to flush it. However, flush constructs must be inserted after the lock has been acquired (to ensure that any shared variables are read from memory) and before the lock is released (to ensure that any modified shared variables are written back to memory).
C/C++

Fortran
Example A.17.1f

      SUBROUTINE SKIP()
      END SUBROUTINE SKIP

      SUBROUTINE WORK()
      END SUBROUTINE WORK

      PROGRAM A17
      INCLUDE "omp_lib.h"   ! or USE OMP_LIB
      INTEGER(OMP_LOCK_KIND) LCK
      INTEGER ID

      CALL OMP_INIT_LOCK(LCK)
!$OMP PARALLEL SHARED(LCK) PRIVATE(ID)
      ID = OMP_GET_THREAD_NUM()
      CALL OMP_SET_LOCK(LCK)
      PRINT *, 'My thread id is ', ID
      CALL OMP_UNSET_LOCK(LCK)
      DO WHILE (.NOT. OMP_TEST_LOCK(LCK))
        CALL SKIP(ID)   ! We do not yet have the lock
                        ! so we must do something else
      END DO
!$OMP FLUSH
      CALL WORK(ID)     ! We now have the lock


!$OMP FLUSH             ! and can do the work
      CALL OMP_UNSET_LOCK( LCK )
!$OMP END PARALLEL
      CALL OMP_DESTROY_LOCK( LCK )
      END PROGRAM A17

Note that there is no need to flush the lock variable. However, flush constructs must be inserted after the lock has been acquired (to ensure that any shared variables are read from memory) and before the lock is released (to ensure that any modified shared variables are written back to memory).
Fortran

A.18 Using Nestable Locks

The following example (for Section 3.3 on page 98) demonstrates how a nestable lock can be used to synchronize updates both to a whole structure and to one of its members.

C/C++
Example A.18.1c

#include <omp.h>

typedef struct {
    int a, b;
    omp_nest_lock_t lck;
} pair;

int work1();
int work2();
int work3();

void incr_a(pair *p, int a)
{
    /* Called only from incr_pair, no need to lock. */
    p->a += a;
}

void incr_b(pair *p, int b)
{
    /* Called both from incr_pair and elsewhere, */


    /* so need a nestable lock. */
    omp_set_nest_lock(&p->lck);
    #pragma omp flush
    p->b += b;
    #pragma omp flush
    omp_unset_nest_lock(&p->lck);
}

void incr_pair(pair *p, int a, int b)
{
    omp_set_nest_lock(&p->lck);
    #pragma omp flush
    incr_a(p, a);
    incr_b(p, b);
    #pragma omp flush
    omp_unset_nest_lock(&p->lck);
}

void f(pair *p)
{
    #pragma omp parallel sections
    {
        #pragma omp section
        incr_pair(p, work1(), work2());
        #pragma omp section
        incr_b(p, work3());
    }
}
C/C++

Fortran
Example A.18.1f

      MODULE DATA
        USE OMP_LIB, ONLY: OMP_NEST_LOCK_KIND
        ! or INCLUDE "omp_lib.h"
        TYPE LOCKED_PAIR
          INTEGER A
          INTEGER B
          INTEGER (OMP_NEST_LOCK_KIND) LCK
        END TYPE
      END MODULE DATA

      SUBROUTINE INCR_A(P, A)
      ! called only from INCR_PAIR, no need to lock
      USE DATA
      TYPE(LOCKED_PAIR) :: P


123456789101112131415161718192021222324252627282930313233343536INTEGER AP%A = P%A + AEND SUBROUTINE INCR_AFortran (cont.)SUBROUTINE INCR_B(P, B)! called from both INCR_PAIR and elsewhere,! so we need a nestable lockUSE OMP_LIB ! or INCLUDE "omp_lib.h"USE DATATYPE(LOCKED_PAIR) :: PINTEGER BCALL OMP_SET_NEST_LOCK(P%LCK)!$OMP FLUSH (P,B)P%B = P%B + B!$OMP FLUSH(P)CALL OMP_UNSET_NEST_LOCK(P%LCK)END SUBROUTINE INCR_BSUBROUTINE INCR_PAIR(P, A, B)USE OMP_LIB ! or INCLUDE "omp_lib.h"USE DATATYPE(LOCKED_PAIR) :: PINTEGER AINTEGER BCALL OMP_SET_NEST_LOCK(P%LCK)!$OMP FLUSH(P,A,B)CALL INCR_A(P, A)CALL INCR_B(P, B)!$OMP FLUSH(P)CALL OMP_UNSET_NEST_LOCK(P%LCK)END SUBROUTINE INCR_PAIRSUBROUTINE F(P)USE OMP_LIB ! or INCLUDE "omp_lib.h"USE DATATYPE(LOCKED_PAIR) :: PINTEGER WORK1, WORK2, WORK3EXTERNAL WORK1, WORK2, WORK3373839404142!$OMP!$OMP!$OMP!$OMPPARALLEL SECTIONSSECTIONCALL INCR_PAIR(P, WORK1, WORK2)SECTIONCALL INCR_B(P, WORK3)END PARALLEL SECTIONS43Appendix A Examples 147


1END SUBROUTINE F2Fortran3456789101112131415161718192021222324A.19 Nested Loop ConstructsThe following example of loop construct nesting (Section 2.9 on page 83) is conformingbecause the inner and outer loop constructs bind to different parallel regions:Example A.19.1cvoid work(int i, int j) {}C/C++void good_nesting(int n){int i, j;#pragma omp parallel default(shared){#pragma omp forfor (i=0; i


Fortran

Example A.19.1f

      SUBROUTINE WORK(I, J)
      END SUBROUTINE WORK

      SUBROUTINE GOOD_NESTING(N)
        INTEGER N
        INTEGER I, J
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
!$OMP     PARALLEL SHARED(I,N)
!$OMP     DO
          DO J = 1, N
            CALL WORK(I,J)
          END DO
!$OMP     END PARALLEL
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE GOOD_NESTING

Fortran

The following variation of the preceding example is also compliant:

C/C++

Example A.19.2c

void work(int i, int j) {}

void work1(int i, int n)
{
   int j;
   #pragma omp parallel default(shared)
   {
      #pragma omp for
      for (j=0; j<n; j++)
         work(i, j);
   }
}

void good_nesting2(int n)
{
   int i;
   #pragma omp parallel default(shared)
   {
      #pragma omp for
      for (i=0; i<n; i++)
         work1(i, n);
   }
}

C/C++

Fortran

Example A.19.2f

      SUBROUTINE WORK(I, J)
        INTEGER I, J
      END SUBROUTINE WORK

      SUBROUTINE WORK1(I, N)
        INTEGER J
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO J = 1, N
          CALL WORK(I,J)
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE WORK1

      SUBROUTINE GOOD_NESTING2(N)
        INTEGER N
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
          CALL WORK1(I, N)
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE GOOD_NESTING2

Fortran

A.20   Examples Showing Incorrect Nesting of Regions

The examples in this section illustrate the region nesting rules. For more information
on directive nesting, see Section 2.9 on page 83.

The following example is non-conforming because the inner and outer loop regions are
closely nested:


C/C++

Example A.20.1c

void work(int i, int j) {}

void wrong1(int n)
{
   #pragma omp parallel default(shared)
   {
      int i, j;
      #pragma omp for
      for (i=0; i<n; i++) {
         /* incorrect nesting of loop regions */
         #pragma omp for
         for (j=0; j<n; j++)
            work(i, j);
      }
   }
}

C/C++

Fortran

Example A.20.1f

      SUBROUTINE WORK(I, J)
      END SUBROUTINE WORK

      SUBROUTINE WRONG1(N)
        INTEGER N
        INTEGER I, J
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
!$OMP     DO                 ! incorrect nesting of loop regions
          DO J = 1, N
            CALL WORK(I,J)
          END DO
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE WRONG1

Fortran

The following orphaned version of the preceding example is also non-conforming:

C/C++

Example A.20.2c

void work1(int i, int n)
{
   int j;
   /* incorrect nesting of loop regions */
   #pragma omp for
   for (j=0; j<n; j++)
      work(i, j);
}

void wrong2(int n)
{
   #pragma omp parallel default(shared)
   {
      int i;
      #pragma omp for
      for (i=0; i<n; i++)
         work1(i, n);
   }
}

C/C++

Fortran

Example A.20.2f

      SUBROUTINE WORK1(I, N)
        INTEGER I, N
        INTEGER J
!$OMP   DO                   ! incorrect nesting of loop regions
        DO J = 1, N
          CALL WORK(I,J)
        END DO
      END SUBROUTINE WORK1

      SUBROUTINE WRONG2(N)
        INTEGER N
        INTEGER I
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
          CALL WORK1(I, N)
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE WRONG2

Fortran

The following example is non-conforming because the loop and single regions are
closely nested:

C/C++

Example A.20.3c

void wrong3(int n)
{
   #pragma omp parallel default(shared)
   {
      int i;
      #pragma omp for
      for (i=0; i<n; i++) {
         /* incorrect nesting of regions */
         #pragma omp single
            work(i, 0);
      }
   }
}

C/C++

Fortran

Example A.20.3f

      SUBROUTINE WRONG3(N)
        INTEGER N
        INTEGER I
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
!$OMP     SINGLE             ! incorrect nesting of regions
            CALL WORK(I, 1)
!$OMP     END SINGLE
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE WRONG3

Fortran

The following example is non-conforming because a barrier region cannot be closely
nested inside a loop region:

C/C++

Example A.20.4c

void wrong4(int n)
{
   #pragma omp parallel default(shared)
   {
      int i;
      #pragma omp for
      for (i=0; i<n; i++) {
         work(i, 0);
         /* incorrect nesting of barrier region in a loop region */
         #pragma omp barrier
         work(i, 1);
      }
   }
}

C/C++

Fortran

Example A.20.4f

      SUBROUTINE WRONG4(N)
        INTEGER N
        INTEGER I
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   DO
        DO I = 1, N
          CALL WORK(I, 1)
          ! incorrect nesting of barrier region in a loop region
!$OMP     BARRIER
          CALL WORK(I, 2)
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE WRONG4

Fortran

The following example is non-conforming because the barrier region cannot be
closely nested inside the critical region. If this were permitted, it would result in
deadlock due to the fact that only one thread at a time can enter the critical region:


C/C++

Example A.20.5c

void wrong5(int n)
{
   #pragma omp parallel
   {
      #pragma omp critical
      {
         work(n, 0);
         /* incorrect nesting of barrier region in a critical region */
         #pragma omp barrier
         work(n, 1);
      }
   }
}

C/C++

Fortran

Example A.20.5f

      SUBROUTINE WRONG5(N)
        INTEGER N
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   CRITICAL
          CALL WORK(N, 1)
          ! incorrect nesting of barrier region in a critical region
!$OMP     BARRIER
          CALL WORK(N, 2)
!$OMP   END CRITICAL
!$OMP   END PARALLEL
      END SUBROUTINE WRONG5

Fortran

The following example is non-conforming because the barrier region cannot be
closely nested inside the single region. If this were permitted, it would result in
deadlock due to the fact that only one thread executes the single region:

C/C++

Example A.20.6c

void wrong6(int n)
{
   #pragma omp parallel
   {
      #pragma omp single
      {
         work(n, 0);
         /* incorrect nesting of barrier region in a single region */
         #pragma omp barrier
         work(n, 1);
      }
   }
}

C/C++

Fortran

Example A.20.6f

      SUBROUTINE WRONG6(N)
        INTEGER N
!$OMP   PARALLEL DEFAULT(SHARED)
!$OMP   SINGLE
          CALL WORK(N, 1)
          ! incorrect nesting of barrier region in a single region
!$OMP     BARRIER
          CALL WORK(N, 2)
!$OMP   END SINGLE
!$OMP   END PARALLEL
      END SUBROUTINE WRONG6

Fortran

A.21   Binding of barrier Regions

The binding rules call for a barrier region to bind to the closest enclosing
parallel region.

In the following example, the call from main to sub2 is conforming because the
barrier region (in sub3) binds to the parallel region in sub2. The call from main
to sub1 is conforming because the barrier region binds to the parallel region in
subroutine sub2.

The call from main to sub3 is conforming because the barrier region binds to the
implicit inactive parallel region enclosing the sequential part. Also note that the
barrier region in sub3, when called from sub2, only synchronizes the team of
threads in the enclosing parallel region and not all the threads created in sub1.


C/C++

Example A.21.1c

void work(int n) {}

void sub3(int n)
{
   work(n);
   #pragma omp barrier
   work(n);
}

void sub2(int k)
{
   #pragma omp parallel shared(k)
      sub3(k);
}

void sub1(int n)
{
   int i;
   #pragma omp parallel private(i) shared(n)
   {
      #pragma omp for
      for (i=0; i<n; i++)
         sub2(i);
   }
}

int main()
{
   sub1(2);
   sub2(2);
   sub3(2);
   return 0;
}

C/C++

Fortran

Example A.21.1f

      SUBROUTINE WORK(N)
        INTEGER N
      END SUBROUTINE WORK

      SUBROUTINE SUB3(N)
        INTEGER N
        CALL WORK(N)
!$OMP   BARRIER
        CALL WORK(N)
      END SUBROUTINE SUB3

      SUBROUTINE SUB2(K)
        INTEGER K
!$OMP   PARALLEL SHARED(K)
          CALL SUB3(K)
!$OMP   END PARALLEL
      END SUBROUTINE SUB2

      SUBROUTINE SUB1(N)
        INTEGER N
        INTEGER I
!$OMP   PARALLEL PRIVATE(I) SHARED(N)
!$OMP   DO
        DO I = 1, N
          CALL SUB2(I)
        END DO
!$OMP   END PARALLEL
      END SUBROUTINE SUB1

      PROGRAM A21
        CALL SUB1(2)
        CALL SUB2(2)
        CALL SUB3(2)
      END PROGRAM A21

Fortran

A.22   Using the private Clause

In the following example the values of i and j are undefined on exit from the
parallel region. For more information on the private clause, see Section 2.8.3.3
on page 71.

C/C++

Example A.22.1c

#include <stdio.h>

int main()
{
   int i, j;

   i = 1;
   j = 2;
   #pragma omp parallel private(i) firstprivate(j)
   {
      i = 3;
      j = j + 2;
   }
   printf("%d %d\n", i, j); /* i and j are undefined */
   return 0;
}

C/C++

Fortran

Example A.22.1f

      PROGRAM A22
        INTEGER I, J
        I = 1
        J = 2
!$OMP   PARALLEL PRIVATE(I) FIRSTPRIVATE(J)
          I = 3
          J = J + 2
!$OMP   END PARALLEL
        PRINT *, I, J           ! I and J are undefined
      END PROGRAM A22

Fortran

A.23   Examples of Non-conforming Storage Association

The following non-conforming examples illustrate the implications of the private
clause rules with regard to storage association.


Example A.23.1f

      SUBROUTINE SUB()
        COMMON /BLOCK/ X
        PRINT *,X               ! X is undefined
      END SUBROUTINE SUB

      PROGRAM A23_1
        COMMON /BLOCK/ X
        X = 1.0
!$OMP   PARALLEL PRIVATE (X)
          X = 2.0
          CALL SUB()
!$OMP   END PARALLEL
      END PROGRAM A23_1

Example A.23.2f

      PROGRAM A23_2
        COMMON /BLOCK2/ X
        X = 1.0
!$OMP   PARALLEL PRIVATE (X)
          X = 2.0
          CALL SUB()
!$OMP   END PARALLEL

      CONTAINS
        SUBROUTINE SUB()
          COMMON /BLOCK2/ Y
          PRINT *,X             ! X is undefined
          PRINT *,Y             ! Y is undefined
        END SUBROUTINE SUB
      END PROGRAM A23_2

Example A.23.3f

      PROGRAM A23_3
        EQUIVALENCE (X,Y)
        X = 1.0
!$OMP   PARALLEL PRIVATE(X)
          PRINT *,Y             ! Y is undefined
          Y = 10
          PRINT *,X             ! X is undefined
!$OMP   END PARALLEL
      END PROGRAM A23_3

Example A.23.4f

      PROGRAM A23_4
        INTEGER I, J
        INTEGER A(100), B(100)
        EQUIVALENCE (A(51), B(1))
!$OMP   PARALLEL DO DEFAULT(PRIVATE) PRIVATE(I,J) LASTPRIVATE(A)
        DO I=1,100
          DO J=1,100
            B(J) = J - 1
          ENDDO
          DO J=1,100
            A(J) = J            ! B becomes undefined at this point
          ENDDO
        ENDDO
!$OMP   END PARALLEL DO
        DO J=1,50
          B(J) = B(J) + 1       ! B is undefined
                                ! A becomes undefined at this point
        ENDDO                   ! The LASTPRIVATE write for A has
                                ! undefined results
        PRINT *, B              ! B is undefined since the LASTPRIVATE
                                ! write of A was not defined
      END PROGRAM A23_4

Example A.23.5f

      SUBROUTINE SUB1(X)
        DIMENSION X(10)
        ! This use of X does not conform to the
        ! specification. It would be legal Fortran 90,
        ! but the OpenMP private directive allows the
        ! compiler to break the sequence association that
        ! A had with the rest of the common block.
        FORALL (I = 1:10) X(I) = I
      END SUBROUTINE SUB1

      PROGRAM A23_5
        COMMON /BLOCK5/ A
        DIMENSION B(10)
        EQUIVALENCE (A,B(1))
        ! the common block has to be at least 10 words
        A = 0
!$OMP   PARALLEL PRIVATE(/BLOCK5/)
          ! Without the private clause,
          ! we would be passing a member of a sequence
          ! that is at least ten elements long.
          ! With the private clause, A may no longer be
          ! sequence-associated.
          CALL SUB1(A)
!$OMP     MASTER
            PRINT *, A
!$OMP     END MASTER
!$OMP   END PARALLEL
      END PROGRAM A23_5

Fortran

A.24   Using the default(none) Clause

The following example distinguishes the variables that are affected by the
default(none) clause from those that are not. For more information on the
default clause, see Section 2.8.3.1 on page 68.


C/C++

Example A.24.1c

#include <omp.h>

int x, y, z[1000];
#pragma omp threadprivate(x)

void a24(int a) {
   const int c = 1;
   int i = 0;

   #pragma omp parallel default(none) private(a) shared(z)
   {
      int j = omp_get_num_threads();
                  /* O.K. - j is declared within parallel region */
      a = z[j];   /* O.K. - a is listed in private clause */
                  /*      - z is listed in shared clause */
      x = c;      /* O.K. - x is threadprivate */
                  /*      - c has const-qualified type */
      z[i] = y;   /* Error - cannot reference i or y here */

      #pragma omp for firstprivate(y)
      for (i=0; i<10; i++) {
         z[i] = y;  /* O.K. - i is the loop iteration variable */
                    /*      - y is listed in firstprivate clause */
      }
      z[i] = y;   /* Error - cannot reference i or y here */
   }
}

C/C++

Fortran

Example A.24.1f

      SUBROUTINE A24(A)
        INCLUDE "omp_lib.h"     ! or USE OMP_LIB
        INTEGER A
        INTEGER X, Y, Z(1000)
        COMMON /BLOCKX/ X
        COMMON /BLOCKY/ Y
        COMMON /BLOCKZ/ Z
!$OMP   THREADPRIVATE(/BLOCKX/)
        INTEGER I, J
        I = 1

!$OMP   PARALLEL DEFAULT(NONE) PRIVATE(A) SHARED(Z) PRIVATE(J)
          J = OMP_GET_NUM_THREADS()
                    ! O.K. - J is listed in PRIVATE clause
          A = Z(J)  ! O.K. - A is listed in PRIVATE clause
                    !      - Z is listed in SHARED clause
          X = 1     ! O.K. - X is THREADPRIVATE
          Z(I) = Y  ! Error - cannot reference I or Y here
!$OMP     DO FIRSTPRIVATE(Y)
          DO I = 1,10
            Z(I) = Y  ! O.K. - I is the loop iteration variable
                      !        Y is listed in FIRSTPRIVATE clause
          END DO
          Z(I) = Y  ! Error - cannot reference I or Y here
!$OMP   END PARALLEL
      END SUBROUTINE A24

Fortran

Fortran

A.25   Example of Race Conditions Caused by Implied Copies of Shared Variables

The following example contains a race condition because a shared variable, which is an
array section, is passed as an actual argument to a routine that has an explicit-shape
array as its dummy argument (see Section 2.8.3.2 on page 69). The subroutine call
passing an array section argument may cause the compiler to copy the argument into a
temporary location prior to the call and to copy from the temporary location into the
original variable when the subroutine returns. This copying would cause races in the
parallel region.

Example A.25.1f

      SUBROUTINE A25
        REAL A(20)
        INTEGER MYTHREAD


!$OMP   PARALLEL SHARED(A) PRIVATE(MYTHREAD)
        MYTHREAD = OMP_GET_THREAD_NUM()
        IF (MYTHREAD .EQ. 0) THEN
          CALL SUB(A(1:10))   ! compiler may introduce writes to A(6:10)
        ELSE
          A(6:10) = 12
        ENDIF
!$OMP   END PARALLEL
      END SUBROUTINE A25

      SUBROUTINE SUB(X)
        REAL X(:)
        X(1:5) = 4
      END SUBROUTINE SUB

Fortran

Fortran

A.26   Examples of Syntax of Parallel Loops

If an end do directive follows a do-construct in which several DO statements share a
DO termination statement, then a do directive can only be specified for the first (i.e.,
outermost) of these DO statements. For more information, see Section 2.5.1 on page 31.

The following example contains correct usages of loop constructs:

Example A.26.1f

      SUBROUTINE WORK(I, J)
        INTEGER I, J
      END SUBROUTINE WORK

      SUBROUTINE A26_GOOD()
        INTEGER I, J
        REAL A(1000)

        DO 100 I = 1,10
!$OMP     DO
          DO 100 J = 1,10
            CALL WORK(I,J)
100     CONTINUE        ! !$OMP ENDDO implied here

!$OMP   DO
        DO 200 J = 1,10
200       A(I) = I + 1
!$OMP   ENDDO

!$OMP   DO
        DO 300 I = 1,10
          DO 300 J = 1,10
            CALL WORK(I,J)
300     CONTINUE
!$OMP   ENDDO
      END SUBROUTINE A26_GOOD

The following example is non-conforming because the matching do directive for the
end do does not precede the outermost loop:

Example A.26.2f

      SUBROUTINE WORK(I, J)
        INTEGER I, J
      END SUBROUTINE WORK

      SUBROUTINE A26_WRONG
        INTEGER I, J
        DO 100 I = 1,10
!$OMP     DO
          DO 100 J = 1,10
            CALL WORK(I,J)
100     CONTINUE
!$OMP   ENDDO
      END SUBROUTINE A26_WRONG

Fortran


Fortran

A.27   Loop Iteration Variables

In general, loop iteration variables are private when they are used in the do-loop of a
do or parallel do construct, or in sequential loops in a parallel construct. In
the following example of a sequential loop in a parallel construct, the loop iteration
variable I is private.

Example A.27.1f

      SUBROUTINE A27_1(A,N)
        REAL A(*)
        INTEGER I, MYOFFSET, N
!$OMP   PARALLEL PRIVATE(MYOFFSET)
          MYOFFSET = OMP_GET_THREAD_NUM()*N
          DO I = 1, N
            A(MYOFFSET+I) = FLOAT(I)
          ENDDO
!$OMP   END PARALLEL
      END SUBROUTINE A27_1

In exceptional cases, loop iteration variables can be made shared, as in the following
example:

Example A.27.2f

      SUBROUTINE A27_2(A,B,N,I1,I2)
        REAL A(*), B(*)
        INTEGER I1, I2, N
!$OMP   PARALLEL SHARED(A,B,I1,I2)
!$OMP   SECTIONS
!$OMP   SECTION
          DO I1 = I1, N
            IF (A(I1).NE.0.0) EXIT
          ENDDO
!$OMP   SECTION
          DO I2 = I2, N
            IF (B(I2).NE.0.0) EXIT
          ENDDO
!$OMP   END SECTIONS
!$OMP   SINGLE
          IF (I1.LE.N) PRINT *, 'ITEMS IN A UP TO ', I1, ' ARE ALL ZERO.'
          IF (I2.LE.N) PRINT *, 'ITEMS IN B UP TO ', I2, ' ARE ALL ZERO.'
!$OMP   END SINGLE
!$OMP   END PARALLEL
      END SUBROUTINE A27_2

Note, however, that the use of shared loop iteration variables can easily lead to race
conditions.

Fortran

Fortran

A.28   Examples of the atomic Construct

All atomic references to the storage location of each variable that appears on the
left-hand side of an atomic assignment statement throughout the program are required
to have the same type and type parameters. For more information, see Section 2.7.4 on
page 53.

The following are some non-conforming examples:

Example A.28.1f

      SUBROUTINE A28_1_WRONG()
        INTEGER:: I
        REAL:: R
        EQUIVALENCE(I,R)
!$OMP   PARALLEL
!$OMP     ATOMIC
          I = I + 1
!$OMP     ATOMIC
          R = R + 1.0
          ! incorrect because I and R reference the same location
          ! but have different types
!$OMP   END PARALLEL
      END SUBROUTINE A28_1_WRONG


Example A.28.2f

      SUBROUTINE SUB()
        COMMON /BLK/ R
        REAL R
!$OMP   ATOMIC
        ! incorrect because I and R reference the same location
        ! but have different types
        R = R + 1.0
      END SUBROUTINE SUB

      SUBROUTINE A28_2_WRONG()
        COMMON /BLK/ I
        INTEGER I
!$OMP   PARALLEL
          ! incorrect because I and R reference the same location
          ! but have different types
!$OMP     ATOMIC
          I = I + 1
          CALL SUB()
!$OMP   END PARALLEL
      END SUBROUTINE A28_2_WRONG

Although the following example might work on some implementations, it is also
non-conforming:

Example A.28.3f

      SUBROUTINE A28_3_WRONG
        INTEGER:: I
        REAL:: R
        EQUIVALENCE(I,R)
!$OMP   PARALLEL
!$OMP     ATOMIC
          ! incorrect because I and R reference the same location
          ! but have different types
          I = I + 1
!$OMP   END PARALLEL

!$OMP   PARALLEL
!$OMP     ATOMIC
          ! incorrect because I and R reference the same location
          ! but have different types
          R = R + 1.0
!$OMP   END PARALLEL
      END SUBROUTINE A28_3_WRONG

Fortran

A.29   Examples of the ordered Construct

It is possible to have multiple ordered constructs within a loop region with the
ordered clause specified. The first example is non-conforming because all iterations
execute two ordered regions. An iteration of a loop must not execute more than one
ordered region:

C/C++

Example A.29.1c

void work(int i) {}

void a29_wrong(int n)
{
   int i;
   #pragma omp for ordered
   for (i=0; i<n; i++) {
      /* incorrect because an iteration may not execute more than one
         ordered region */
      #pragma omp ordered
         work(i);
      #pragma omp ordered
         work(i+1);
   }
}

C/C++

Fortran

Example A.29.1f

      SUBROUTINE A29_WRONG(N)
        INTEGER N


        INTEGER I
!$OMP   DO ORDERED
        DO I = 1, N
          ! incorrect because an iteration may not execute more than one
          ! ordered region
!$OMP     ORDERED
            CALL WORK(I)
!$OMP     END ORDERED

!$OMP     ORDERED
            CALL WORK(I+1)
!$OMP     END ORDERED
        END DO
      END SUBROUTINE A29_WRONG

Fortran

The following is a compliant example with more than one ordered construct. Each
iteration will execute only one ordered region:

C/C++

Example A.29.2c

void a29_good(int n)
{
   int i;
   #pragma omp for ordered
   for (i=0; i<n; i++) {
      if (i <= 10) {
         #pragma omp ordered
            work(i);
      }
      if (i > 10) {
         #pragma omp ordered
            work(i+1);
      }
   }
}

C/C++

Fortran

Example A.29.2f

      SUBROUTINE A29_GOOD(N)
        INTEGER N
        INTEGER I
!$OMP   DO ORDERED
        DO I = 1,N
          IF (I <= 10) THEN
!$OMP       ORDERED
              CALL WORK(I)
!$OMP       END ORDERED
          ENDIF
          IF (I > 10) THEN
!$OMP       ORDERED
              CALL WORK(I+1)
!$OMP       END ORDERED
          ENDIF
        ENDDO
      END SUBROUTINE A29_GOOD

Fortran

A.30   Examples of threadprivate Data

The following examples demonstrate how to use the threadprivate directive
(Section 2.8.2 on page 63) to give each thread a separate counter.

C/C++

Example A.30.1c

int counter = 0;
#pragma omp threadprivate(counter)

int increment_counter()
{
   counter++;
   return(counter);
}

C/C++

Fortran

Example A.30.1f

      INTEGER FUNCTION INCREMENT_COUNTER()
        COMMON/A32_COMMON/COUNTER
!$OMP   THREADPRIVATE(/A32_COMMON/)


        COUNTER = COUNTER + 1
        INCREMENT_COUNTER = COUNTER
        RETURN
      END FUNCTION INCREMENT_COUNTER

Fortran

C/C++

This example uses threadprivate on a static variable:

Example A.30.2c

int increment_counter_2()
{
   static int counter = 0;
   #pragma omp threadprivate(counter)
   counter++;
   return(counter);
}

The following example illustrates how modifying a variable that appears in an
initializer can cause unspecified behavior, and also how to avoid this problem by using
an auxiliary object and a copy-constructor.

Example A.30.3c

class T {
public:
   int val;
   T (int);
   T (const T&);
};

T :: T (int v)
{
   val = v;
}

T :: T (const T& t)
{
   val = t.val;
}

void g(T a, T b)
{
   a.val += b.val;
}

int x = 1;
T a(x);
const T b_aux(x);   /* Capture value of x = 1 */
T b(b_aux);
#pragma omp threadprivate(a, b)

void f(int n)
{
   x++;
   #pragma omp parallel for
   /* In each thread:
    * Object a is constructed from x (with value 1 or 2?)
    * Object b is copy-constructed from b_aux
    */
   for (int i=0; i<n; i++) {
      g(a, b);   /* Value of a is unspecified. */
   }
}

C/C++


Fortran

The following example is non-conforming:

Example A.30.5f

      SUBROUTINE A30_5_WRONG()
        COMMON /T/ A
!$OMP   THREADPRIVATE(/T/)

      CONTAINS
        SUBROUTINE A30_5S_WRONG()
!$OMP     PARALLEL COPYIN(/T/)
          ! non-conforming because /T/ not declared in A30_5S_WRONG
!$OMP     END PARALLEL
        END SUBROUTINE A30_5S_WRONG
      END SUBROUTINE A30_5_WRONG

The following example is a correct rewrite of the previous example:

Example A.30.6f

      SUBROUTINE A30_6_GOOD()
        COMMON /T/ A
!$OMP   THREADPRIVATE(/T/)

      CONTAINS
        SUBROUTINE A30_6S_GOOD()
          COMMON /T/ A
!$OMP     THREADPRIVATE(/T/)

!$OMP     PARALLEL COPYIN(/T/)
!$OMP     END PARALLEL
        END SUBROUTINE A30_6S_GOOD
      END SUBROUTINE A30_6_GOOD

The following is an example of the use of threadprivate for local variables:

Example A.30.7f

      PROGRAM A30_7_GOOD
        INTEGER, ALLOCATABLE, SAVE :: A(:)
        INTEGER, POINTER, SAVE :: PTR
        INTEGER, SAVE :: I
        INTEGER, TARGET :: TARG
        LOGICAL :: FIRSTIN = .TRUE.
!$OMP   THREADPRIVATE(A, I, PTR)

        ALLOCATE (A(3))
        A = (/1,2,3/)
        PTR => TARG
        I = 5

!$OMP   PARALLEL COPYIN(I, PTR)
!$OMP     CRITICAL
            IF (FIRSTIN) THEN
              TARG = 4           ! Update target of ptr
              I = I + 10
              IF (ALLOCATED(A)) A = A + 10
              FIRSTIN = .FALSE.
            END IF
            IF (ALLOCATED(A)) THEN
              PRINT *, 'a = ', A
            ELSE
              PRINT *, 'A is not allocated'
            END IF
            PRINT *, 'ptr = ', PTR
            PRINT *, 'i = ', I
            PRINT *
!$OMP     END CRITICAL
!$OMP   END PARALLEL
      END PROGRAM A30_7_GOOD

The above program, if executed by two threads, will print one of the following two sets
of output:

      a = 11 12 13
      ptr = 4
      i = 15

      A is not allocated
      ptr = 4
      i = 5

or

      A is not allocated
      ptr = 4
      i = 15

      a = 1 2 3
      ptr = 4
      i = 5

The following is an example of the use of threadprivate for module variables:

Example A.30.8f

      MODULE A30_MODULE8
        REAL, POINTER :: WORK(:)
        SAVE WORK
!$OMP   THREADPRIVATE(WORK)
      END MODULE A30_MODULE8

      SUBROUTINE SUB1(N)
        USE A30_MODULE8
!$OMP   PARALLEL PRIVATE(THE_SUM)
          ALLOCATE(WORK(N))
          CALL SUB2(N,THE_SUM)
          WRITE(*,*)THE_SUM
!$OMP   END PARALLEL
      END SUBROUTINE SUB1

      SUBROUTINE SUB2(N,THE_SUM)
        USE A30_MODULE8
        WORK = 10
        THE_SUM = SUM(WORK)
      END SUBROUTINE SUB2

      PROGRAM A30_8_GOOD
        USE A30_MODULE8
        N = 10
        CALL SUB1(N)
      END PROGRAM A30_8_GOOD

Fortran

A.31   Example of the private Clause

The private clause (Section 2.8.3.3 on page 71) of a parallel construct is only in
effect inside the construct, and not for the rest of the region. Therefore, in the example
that follows, any uses of the variable a within the loop in the routine f refer to a
private copy of a, while a use in routine g refers to the global a.


C/C++

Example A.31.1c

int a;

void g(int k) {
   a = k;   /* The global "a", not the private "a" in f */
}

void f(int n) {
   int a = 0;

   #pragma omp parallel for private(a)
   for (int i=1; i<n; i++) {
      a = i;
      g(a*2);   /* The private "a" */
   }
}

C/C++

Fortran

Example A.31.1f

      MODULE A31
        INTEGER A

      CONTAINS
        SUBROUTINE G(K)
          INTEGER K
          A = K     ! The global "A", not the private "A" in F
        END SUBROUTINE G

        SUBROUTINE F(N)
          INTEGER N, I
          INTEGER A
          A = 0
!$OMP     PARALLEL DO PRIVATE(A)
          DO I = 1, N
            A = I
            CALL G(A*2)   ! The private "A"
          ENDDO
        END SUBROUTINE F
      END MODULE A31

Fortran

A.32   Examples of the Data-sharing Attribute Clauses: shared and private

When a named common block is specified in a private, firstprivate, or
lastprivate clause of a directive, none of its constituent elements may be declared
in another data-sharing attribute clause in that directive. The following examples
illustrate this point. For more information, see Section 2.8.3 on page 67.

This example is conforming:

Example A.32.1f

      SUBROUTINE A32_1_GOOD()
        COMMON /C/ X,Y
        REAL X, Y

!$OMP   PARALLEL PRIVATE (/C/)
          ! do work here
!$OMP   END PARALLEL

!$OMP   PARALLEL SHARED (X,Y)
          ! do work here
!$OMP   END PARALLEL
      END SUBROUTINE A32_1_GOOD

This example is also conforming:

Example A.32.2f

      SUBROUTINE A32_2_GOOD()
        COMMON /C/ X,Y
        REAL X, Y
        INTEGER I


!$OMP   PARALLEL
!$OMP     DO PRIVATE(/C/)
          DO I=1,1000
            ! do work here
          ENDDO
!$OMP     END DO
!
!$OMP     DO PRIVATE(X)
          DO I=1,1000
            ! do work here
          ENDDO
!$OMP     END DO
!$OMP   END PARALLEL
      END SUBROUTINE A32_2_GOOD

This example is non-conforming because x is a constituent element of c:

Example A.32.3f

      SUBROUTINE A32_3_WRONG()
        COMMON /C/ X,Y
        ! Incorrect because X is a constituent element of C
!$OMP   PARALLEL PRIVATE(/C/), SHARED(X)
          ! do work here
!$OMP   END PARALLEL
      END SUBROUTINE A32_3_WRONG

This example is conforming:

Example A.32.4f

      SUBROUTINE A32_4_GOOD()
        COMMON /C/ X,Y

!$OMP   PARALLEL PRIVATE (/C/)
          ! do work here
!$OMP   END PARALLEL

!$OMP   PARALLEL SHARED (/C/)
          ! do work here
!$OMP   END PARALLEL
      END SUBROUTINE A32_4_GOOD

This example is non-conforming because a common block may not be declared both
shared and private:

Example A.32.5f

      SUBROUTINE A32_5_WRONG()
        COMMON /C/ X,Y
        ! Incorrect: common block C cannot be declared both
        ! shared and private
!$OMP   PARALLEL PRIVATE (/C/), SHARED(/C/)
          ! do work here
!$OMP   END PARALLEL
      END SUBROUTINE A32_5_WRONG

Fortran

A.33   Examples of the copyprivate Clause

The copyprivate clause (see Section 2.8.4.2 on page 82) can be used to broadcast
values acquired by a single thread directly to all instances of the private variables in the
other threads. In this example, if the routine is called from the sequential part, its
behavior is not affected by the presence of the directives. If it is called from a
parallel region, then the actual arguments with which a and b are associated must
be private. After the input routine has been executed by one thread, no thread leaves the
construct until the private objects designated by a, b, x, and y in all threads have
become defined with the values read.

C/C++

Example A.33.1c

#include <stdio.h>

float x, y;
#pragma omp threadprivate(x, y)

void init(float a, float b ) {


   #pragma omp single copyprivate(a,b,x,y)
   {
      scanf("%f %f %f %f", &a, &b, &x, &y);
   }
}

C/C++

Fortran

Example A.33.1f

      SUBROUTINE INIT(A,B)
        REAL A, B
        COMMON /XY/ X,Y
!$OMP   THREADPRIVATE (/XY/)

!$OMP   SINGLE
          READ (11) A,B,X,Y
!$OMP   END SINGLE COPYPRIVATE (A,B,/XY/)
      END

Fortran

In contrast to the previous example, suppose the input must be performed by a
particular thread, say the master thread. In this case, the copyprivate clause cannot
be used to do the broadcast directly, but it can be used to provide access to a temporary
shared object.

C/C++

Example A.33.2c

#include <stdio.h>
#include <stdlib.h>

float read_next( ) {
   float * tmp;
   float return_val;

   #pragma omp single copyprivate(tmp)
   {
      tmp = (float *) malloc(sizeof(float));
   }

   #pragma omp master
   {
      scanf("%f", tmp);
   }

   #pragma omp barrier
   return_val = *tmp;
   #pragma omp barrier

   #pragma omp single
   {
      free(tmp);
   }
   return return_val;
}

C/C++

Fortran

Example A.33.2f

      REAL FUNCTION READ_NEXT()
        REAL, POINTER :: TMP

!$OMP   SINGLE
          ALLOCATE (TMP)
!$OMP   END SINGLE COPYPRIVATE (TMP)

!$OMP   MASTER
          READ (11) TMP
!$OMP   END MASTER

!$OMP   BARRIER
        READ_NEXT = TMP
!$OMP   BARRIER

!$OMP   SINGLE
          DEALLOCATE (TMP)
!$OMP   END SINGLE NOWAIT
      END FUNCTION READ_NEXT

Fortran

Suppose that the number of lock objects required within a parallel region cannot
easily be determined prior to entering it. The copyprivate clause can be used to
provide access to shared lock objects that are allocated within that parallel region.

C/C++

Example A.33.3c

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>


omp_lock_t *new_lock()
{
   omp_lock_t *lock_ptr;

   #pragma omp single copyprivate(lock_ptr)
   {
      lock_ptr = (omp_lock_t *) malloc(sizeof(omp_lock_t));
      omp_init_lock( lock_ptr );
   }

   return lock_ptr;
}

C/C++

Fortran

Example A.33.3f

      FUNCTION NEW_LOCK()
        USE OMP_LIB       ! or INCLUDE "omp_lib.h"
        INTEGER(OMP_LOCK_KIND), POINTER :: NEW_LOCK

!$OMP   SINGLE
          ALLOCATE(NEW_LOCK)
          CALL OMP_INIT_LOCK(NEW_LOCK)
!$OMP   END SINGLE COPYPRIVATE(NEW_LOCK)
      END FUNCTION NEW_LOCK

Note that the effect of the copyprivate clause on a variable with the allocatable
attribute is different than on a variable with the pointer attribute.

Example A.33.4f

      SUBROUTINE S(N)
        INTEGER N
        REAL, DIMENSION(:), ALLOCATABLE :: A
        REAL, DIMENSION(:), POINTER :: B

        ALLOCATE (A(N))
!$OMP   SINGLE
          ALLOCATE (B(N))
          READ (11) A,B
!$OMP   END SINGLE COPYPRIVATE(A,B)
        ! Variable A designates a private object
        ! which has the same value in each thread
        ! Variable B designates a shared object
!$OMP   BARRIER
!$OMP   SINGLE
          DEALLOCATE (B)
!$OMP   END SINGLE NOWAIT
      END SUBROUTINE S

Fortran

C/C++

A.34   Use of C99 Variable Length Arrays

The following example demonstrates how to use C99 variable length arrays (VLAs) in
a firstprivate clause (Section 2.8.3.4 on page 73).

Example A.34.1c

void f(int m, int C[m][m])
{
   double v1[m];

   #pragma omp parallel firstprivate(C, v1)
   {
      /* do work here */
   }
}

C/C++

A.35   Using the num_threads Clause

The following example demonstrates the num_threads clause (Section 2.4 on page
24). The parallel region is executed with a maximum of 10 threads.


C/C++

Example A.35.1c

#include <omp.h>

main()
{
   omp_set_dynamic(1);

   #pragma omp parallel num_threads(10)
   {
      /* do work here */
   }
}

C/C++

Fortran

Example A.35.1f

      PROGRAM A35
        INCLUDE "omp_lib.h"     ! or USE OMP_LIB
        CALL OMP_SET_DYNAMIC(.TRUE.)
!$OMP   PARALLEL NUM_THREADS(10)
          ! do work here
!$OMP   END PARALLEL
      END PROGRAM A35

Fortran

A.36   Use of Work-Sharing Constructs Inside a critical Construct

The following example demonstrates using a work-sharing construct inside a
critical construct. This example is compliant because the work-sharing construct
and the critical construct are not closely nested.

C/C++

Example A.36.1c

void a36()
{
   int i = 1;

   #pragma omp parallel sections
   {
      #pragma omp section
      {
         #pragma omp critical (name)
         {
            #pragma omp parallel
            {
               #pragma omp single
               {
                  i++;
               }
            }
         }
      }
   }
}

C/C++

Fortran

Example A.36.1f

      SUBROUTINE A36()
        INTEGER I
        I = 1

!$OMP   PARALLEL SECTIONS
!$OMP   SECTION
!$OMP     CRITICAL (NAME)
!$OMP       PARALLEL
!$OMP         SINGLE
                I = I + 1
!$OMP         END SINGLE
!$OMP       END PARALLEL
!$OMP     END CRITICAL (NAME)
!$OMP   END PARALLEL SECTIONS
      END SUBROUTINE A36

Fortran


A.37   Use of Reprivatization

The following example demonstrates the reprivatization of variables. Private variables
can be marked private again in a nested construct. They do not have to be shared in
the enclosing parallel region.

C/C++

Example A.37.1c

void a37()
{
   int i, a;

   #pragma omp parallel private(a)
   {
      #pragma omp parallel for private(a)
      for (i=0; i<10; i++)
      {
         /* do work here */
      }
   }
}

C/C++

Fortran

Example A.37.1f

      SUBROUTINE A37()
        INTEGER I, A

!$OMP   PARALLEL PRIVATE(A)
!$OMP     PARALLEL DO PRIVATE(A)
          DO I = 1, 10
            ! do work here
          END DO
!$OMP     END PARALLEL DO
!$OMP   END PARALLEL
      END SUBROUTINE A37

Fortran

A.38   Thread-Safe Lock Functions

The following example demonstrates how to initialize an array of locks in a parallel
region by using omp_init_lock (Section 3.3.1 on page 100).

C/C++

Example A.38.1c

#include <omp.h>

omp_lock_t *new_locks()
{
   int i;
   omp_lock_t *lock = new omp_lock_t[1000];

   #pragma omp parallel for private(i)
   for (i=0; i<1000; i++)
   {
      omp_init_lock(&lock[i]);
   }
   return lock;
}

C/C++

Fortran

Example A.38.1f

      FUNCTION NEW_LOCKS()
        USE OMP_LIB       ! or INCLUDE "omp_lib.h"
        INTEGER(OMP_LOCK_KIND), DIMENSION(1000) :: NEW_LOCKS
        INTEGER I

!$OMP   PARALLEL DO PRIVATE(I)
        DO I=1,1000
          CALL OMP_INIT_LOCK(NEW_LOCKS(I))
        END DO
!$OMP   END PARALLEL DO
      END FUNCTION NEW_LOCKS

Fortran


Fortran

A.39   Examples of the workshare Construct

The following are examples of the workshare construct (see Section 2.5.4 on page
42).

In this example, workshare spreads work across some number of threads and there is
a barrier after the last statement. Implementations must enforce Fortran execution rules
inside of the workshare block.

Example A.39.1f

      SUBROUTINE A39_1(AA, BB, CC, DD, EE, FF, N)
        INTEGER N
        REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N), EE(N,N), FF(N,N)

!$OMP   PARALLEL
!$OMP   WORKSHARE
          AA = BB
          CC = DD
          EE = FF
!$OMP   END WORKSHARE
!$OMP   END PARALLEL
      END SUBROUTINE A39_1

In this example the final barrier can be eliminated with a nowait clause. Threads
doing CC = DD immediately begin work on EE = FF when they are done with
CC = DD.

Example A.39.2f

      SUBROUTINE A39_2(AA, BB, CC, DD, EE, FF, N)
        INTEGER N
        REAL AA(N,N), BB(N,N), CC(N,N)
        REAL DD(N,N), EE(N,N), FF(N,N)

!$OMP   PARALLEL
!$OMP   WORKSHARE
          AA = BB
          CC = DD
!$OMP   END WORKSHARE NOWAIT
!$OMP   WORKSHARE
          EE = FF
!$OMP   END WORKSHARE
!$OMP   END PARALLEL
      END SUBROUTINE A39_2

This example shows the use of an atomic directive inside a workshare construct.
The computation of SUM(AA) is workshared, but the update to I is atomic.

Example A.39.3f

      SUBROUTINE A39_3(AA, BB, CC, DD, N)
        INTEGER N
        REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N)

!$OMP   PARALLEL
!$OMP   WORKSHARE
          AA = BB
!$OMP     ATOMIC
          I = I + SUM(AA)
          CC = DD
!$OMP   END WORKSHARE
!$OMP   END PARALLEL
      END SUBROUTINE A39_3

Fortran WHERE and FORALL statements are compound statements, made up of a
control part and a statement part. When workshare is applied to one of these
compound statements, both the control and the statement parts are workshared. The
following example shows the use of a WHERE statement in a workshare construct:

Example A.39.4f

      SUBROUTINE A39_4(AA, BB, CC, DD, EE, FF, GG, HH, N)
        INTEGER N
        REAL AA(N,N), BB(N,N), CC(N,N)
        REAL DD(N,N), EE(N,N), FF(N,N)
        REAL GG(N,N), HH(N,N)

!$OMP   PARALLEL
!$OMP   WORKSHARE
          AA = BB
          CC = DD
          WHERE (EE .ne. 0) FF = 1 / EE
          GG = HH
!$OMP   END WORKSHARE
!$OMP   END PARALLEL
      END SUBROUTINE A39_4

Each task gets worked on in order by the threads:

      AA = BB then
      CC = DD then
      EE .ne. 0 then
      FF = 1 / EE then
      GG = HH

In the following example, an assignment to a shared scalar variable is performed by
one thread in a workshare while all other threads in the team wait.

Example A.39.5f

      SUBROUTINE A39_5(AA, BB, CC, DD, N)
        INTEGER N
        REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N)
        INTEGER SHR

!$OMP   PARALLEL SHARED(SHR)
!$OMP   WORKSHARE
          AA = BB
          SHR = 1
          CC = DD * SHR
!$OMP   END WORKSHARE
!$OMP   END PARALLEL
      END SUBROUTINE A39_5


The following example contains an assignment to a private scalar variable that is performed by one thread in a workshare while all other threads wait. It is non-conforming because the private scalar variable is undefined after the assignment statement.

Example A.39.6f

      SUBROUTINE A39_6_WRONG(AA, BB, CC, DD, N)
      INTEGER N
      REAL AA(N,N), BB(N,N), CC(N,N), DD(N,N)
      INTEGER PRI

!$OMP PARALLEL PRIVATE(PRI)
!$OMP WORKSHARE
      AA = BB
      PRI = 1
      CC = DD * PRI
!$OMP END WORKSHARE
!$OMP END PARALLEL
      END SUBROUTINE A39_6_WRONG

Fortran execution rules must be enforced inside a workshare construct. In the following program fragment, the same result is produced regardless of whether the code is executed sequentially or inside an OpenMP program with multiple threads:

Example A.39.7f

      SUBROUTINE A39_7(AA, BB, CC, N)
      INTEGER N
      REAL AA(N), BB(N), CC(N)


!$OMP PARALLEL
!$OMP WORKSHARE
      AA(1:50) = BB(11:60)
      CC(11:20) = AA(1:10)
!$OMP END WORKSHARE
!$OMP END PARALLEL
      END SUBROUTINE A39_7

A.40 Placement Restrictions for flush and barrier Directives

The following example is non-conforming, because the flush and barrier directives cannot be the immediate substatement of an if statement. See Section 2.7.3 on page 52 and Section 2.7.5 on page 56.

Example A.40.1c

void a40_wrong()
{
    int a = 1;

    #pragma omp parallel
    {
        if (a != 0)
            #pragma omp flush(a)
        /* incorrect as flush cannot be immediate substatement
           of if statement */

        if (a != 0)
            #pragma omp barrier
        /* incorrect as barrier cannot be immediate substatement
           of if statement */
    }
}

The following version of the above example is conforming because the flush and barrier directives are enclosed in a compound statement.


Example A.40.2c

void a40()
{
    int a = 1;

    #pragma omp parallel
    {
        if (a != 0) {
            #pragma omp flush(a)
        }
        if (a != 0) {
            #pragma omp barrier
        }
    }
}




APPENDIX B

Stubs for Run-time Library Functions

This section provides stubs for the run-time library routines defined in the OpenMP API. The stubs are provided to enable portability to platforms that do not support the OpenMP API. On these platforms, OpenMP programs must be linked with a library containing these stub routines. The stub routines assume that the directives in the OpenMP program are ignored. As such, they emulate serial semantics.

Note that the lock variable that appears in the lock routines must be accessed exclusively through these routines. It should not be initialized or otherwise modified in the user program.

Fortran
For the stub routines written in Fortran, the lock variable is declared as a POINTER to guarantee that it is capable of holding an address. Alternatively, for Fortran 90 implementations, it could be declared as an INTEGER(OMP_LOCK_KIND) or INTEGER(OMP_NEST_LOCK_KIND), as appropriate.
Fortran

In an actual implementation the lock variable might be used to hold the address of an allocated object, but here it is used to hold an integer value. Users should not make assumptions about mechanisms used by OpenMP implementations to implement locks based on the scheme used by the stub procedures.


B.1 C/C++ Stub Routines

#include <stdio.h>
#include <stdlib.h>
#include "omp.h"

#ifdef __cplusplus
extern "C" {
#endif

void omp_set_num_threads(int num_threads)
{
}

int omp_get_num_threads(void)
{
    return 1;
}

int omp_get_max_threads(void)
{
    return 1;
}

int omp_get_thread_num(void)
{
    return 0;
}

int omp_get_num_procs(void)
{
    return 1;
}

void omp_set_dynamic(int dynamic_threads)
{
}


int omp_get_dynamic(void)
{
    return 0;
}

int omp_in_parallel(void)
{
    return 0;
}

void omp_set_nested(int nested)
{
}

int omp_get_nested(void)
{
    return 0;
}

enum {UNLOCKED = -1, INIT, LOCKED};

void omp_init_lock(omp_lock_t *lock)
{
    *lock = UNLOCKED;
}

void omp_destroy_lock(omp_lock_t *lock)
{
    *lock = INIT;
}


void omp_set_lock(omp_lock_t *lock)
{
    if (*lock == UNLOCKED) {
        *lock = LOCKED;
    } else if (*lock == LOCKED) {
        fprintf(stderr, "error: deadlock in using lock variable\n");
        exit(1);
    } else {
        fprintf(stderr, "error: lock not initialized\n");
        exit(1);
    }
}

void omp_unset_lock(omp_lock_t *lock)
{
    if (*lock == LOCKED) {
        *lock = UNLOCKED;
    } else if (*lock == UNLOCKED) {
        fprintf(stderr, "error: lock not set\n");
        exit(1);
    } else {
        fprintf(stderr, "error: lock not initialized\n");
        exit(1);
    }
}

int omp_test_lock(omp_lock_t *lock)
{
    if (*lock == UNLOCKED) {
        *lock = LOCKED;
        return 1;
    } else if (*lock == LOCKED) {
        return 0;
    } else {
        fprintf(stderr, "error: lock not initialized\n");
        exit(1);
    }
}


#ifndef OMP_NEST_LOCK_T
typedef struct {            /* This really belongs in omp.h */
    int owner;
    int count;
} omp_nest_lock_t;
#endif

enum {MASTER = 0};

void omp_init_nest_lock(omp_nest_lock_t *nlock)
{
    nlock->owner = UNLOCKED;
    nlock->count = 0;
}

void omp_destroy_nest_lock(omp_nest_lock_t *nlock)
{
    nlock->owner = UNLOCKED;
    nlock->count = UNLOCKED;
}

void omp_set_nest_lock(omp_nest_lock_t *nlock)
{
    if (nlock->owner == MASTER && nlock->count >= 1) {
        nlock->count++;
    } else if (nlock->owner == UNLOCKED && nlock->count == 0) {
        nlock->owner = MASTER;
        nlock->count = 1;
    } else {
        fprintf(stderr, "error: lock corrupted or not initialized\n");
        exit(1);
    }
}


void omp_unset_nest_lock(omp_nest_lock_t *nlock)
{
    if (nlock->owner == MASTER && nlock->count >= 1) {
        nlock->count--;
        if (nlock->count == 0) {
            nlock->owner = UNLOCKED;
        }
    } else if (nlock->owner == UNLOCKED && nlock->count == 0) {
        fprintf(stderr, "error: lock not set\n");
        exit(1);
    } else {
        fprintf(stderr, "error: lock corrupted or not initialized\n");
        exit(1);
    }
}

int omp_test_nest_lock(omp_nest_lock_t *nlock)
{
    omp_set_nest_lock(nlock);
    return nlock->count;
}

double omp_get_wtime(void)
{
    /* This function does not provide a working
       wallclock timer. Replace it with a version
       customized for the target machine.
    */
    return 0.0;
}

double omp_get_wtick(void)
{
    /* This function does not provide a working
       clock tick function. Replace it with
       a version customized for the target machine.
    */
    return 365. * 86400.;
}

#ifdef __cplusplus
}
#endif


B.2 Fortran Stub Routines

      SUBROUTINE OMP_SET_NUM_THREADS(NUM_THREADS)
      INTEGER NUM_THREADS
      RETURN
      END SUBROUTINE

      INTEGER FUNCTION OMP_GET_NUM_THREADS()
      OMP_GET_NUM_THREADS = 1
      RETURN
      END FUNCTION

      INTEGER FUNCTION OMP_GET_MAX_THREADS()
      OMP_GET_MAX_THREADS = 1
      RETURN
      END FUNCTION

      INTEGER FUNCTION OMP_GET_THREAD_NUM()
      OMP_GET_THREAD_NUM = 0
      RETURN
      END FUNCTION

      INTEGER FUNCTION OMP_GET_NUM_PROCS()
      OMP_GET_NUM_PROCS = 1
      RETURN
      END FUNCTION

      SUBROUTINE OMP_SET_DYNAMIC(DYNAMIC_THREADS)
      LOGICAL DYNAMIC_THREADS
      RETURN
      END SUBROUTINE


      LOGICAL FUNCTION OMP_GET_DYNAMIC()
      OMP_GET_DYNAMIC = .FALSE.
      RETURN
      END FUNCTION

      LOGICAL FUNCTION OMP_IN_PARALLEL()
      OMP_IN_PARALLEL = .FALSE.
      RETURN
      END FUNCTION

      SUBROUTINE OMP_SET_NESTED(NESTED)
      LOGICAL NESTED
      RETURN
      END SUBROUTINE

      LOGICAL FUNCTION OMP_GET_NESTED()
      OMP_GET_NESTED = .FALSE.
      RETURN
      END FUNCTION

      SUBROUTINE OMP_INIT_LOCK(LOCK)
!     LOCK is  0 if the simple lock is not initialized
!             -1 if the simple lock is initialized but not set
!              1 if the simple lock is set
      POINTER (LOCK,IL)
      INTEGER IL
      LOCK = -1
      RETURN
      END SUBROUTINE

      SUBROUTINE OMP_DESTROY_LOCK(LOCK)
      POINTER (LOCK,IL)
      INTEGER IL
      LOCK = 0
      RETURN
      END SUBROUTINE


      SUBROUTINE OMP_SET_LOCK(LOCK)
      POINTER (LOCK,IL)
      INTEGER IL
      IF (LOCK .EQ. -1) THEN
        LOCK = 1
      ELSEIF (LOCK .EQ. 1) THEN
        PRINT *, 'ERROR: DEADLOCK IN USING LOCK VARIABLE'
        STOP
      ELSE
        PRINT *, 'ERROR: LOCK NOT INITIALIZED'
        STOP
      ENDIF
      RETURN
      END SUBROUTINE

      SUBROUTINE OMP_UNSET_LOCK(LOCK)
      POINTER (LOCK,IL)
      INTEGER IL
      IF (LOCK .EQ. 1) THEN
        LOCK = -1
      ELSEIF (LOCK .EQ. -1) THEN
        PRINT *, 'ERROR: LOCK NOT SET'
        STOP
      ELSE
        PRINT *, 'ERROR: LOCK NOT INITIALIZED'
        STOP
      ENDIF
      RETURN
      END SUBROUTINE


      LOGICAL FUNCTION OMP_TEST_LOCK(LOCK)
      POINTER (LOCK,IL)
      INTEGER IL
      IF (LOCK .EQ. -1) THEN
        LOCK = 1
        OMP_TEST_LOCK = .TRUE.
      ELSEIF (LOCK .EQ. 1) THEN
        OMP_TEST_LOCK = .FALSE.
      ELSE
        PRINT *, 'ERROR: LOCK NOT INITIALIZED'
        STOP
      ENDIF
      RETURN
      END FUNCTION

      SUBROUTINE OMP_INIT_NEST_LOCK(NLOCK)
!     NLOCK is  0 if the nestable lock is not initialized
!              -1 if the nestable lock is initialized but not set
!               1 if the nestable lock is set
!     no use count is maintained
      POINTER (NLOCK,NIL)
      INTEGER NIL
      NLOCK = -1
      RETURN
      END SUBROUTINE

      SUBROUTINE OMP_DESTROY_NEST_LOCK(NLOCK)
      POINTER (NLOCK,NIL)
      INTEGER NIL
      NLOCK = 0
      RETURN
      END SUBROUTINE


      SUBROUTINE OMP_SET_NEST_LOCK(NLOCK)
      POINTER (NLOCK,NIL)
      INTEGER NIL
      IF (NLOCK .EQ. -1) THEN
        NLOCK = 1
      ELSEIF (NLOCK .EQ. 0) THEN
        PRINT *, 'ERROR: NESTED LOCK NOT INITIALIZED'
        STOP
      ELSE
        PRINT *, 'ERROR: DEADLOCK USING NESTED LOCK VARIABLE'
        STOP
      ENDIF
      RETURN
      END SUBROUTINE

      SUBROUTINE OMP_UNSET_NEST_LOCK(NLOCK)
      POINTER (NLOCK,IL)
      INTEGER IL
      IF (NLOCK .EQ. 1) THEN
        NLOCK = -1
      ELSEIF (NLOCK .EQ. 0) THEN
        PRINT *, 'ERROR: NESTED LOCK NOT INITIALIZED'
        STOP
      ELSE
        PRINT *, 'ERROR: NESTED LOCK NOT SET'
        STOP
      ENDIF
      RETURN
      END SUBROUTINE


      INTEGER FUNCTION OMP_TEST_NEST_LOCK(NLOCK)
      POINTER (NLOCK,NIL)
      INTEGER NIL
      IF (NLOCK .EQ. -1) THEN
        NLOCK = 1
        OMP_TEST_NEST_LOCK = 1
      ELSEIF (NLOCK .EQ. 1) THEN
        OMP_TEST_NEST_LOCK = 0
      ELSE
        PRINT *, 'ERROR: NESTED LOCK NOT INITIALIZED'
        STOP
      ENDIF
      RETURN
      END FUNCTION

      DOUBLE PRECISION FUNCTION OMP_GET_WTIME()
!     This function does not provide a working
!     wall-clock timer. Replace it with a version
!     customized for the target machine.
      OMP_GET_WTIME = 0.0D0
      RETURN
      END FUNCTION

      DOUBLE PRECISION FUNCTION OMP_GET_WTICK()
!     This function does not provide a working
!     clock tick function. Replace it with
!     a version customized for the target machine.
      DOUBLE PRECISION ONE_YEAR
      PARAMETER (ONE_YEAR=365.D0*86400.D0)
      OMP_GET_WTICK = ONE_YEAR
      RETURN
      END FUNCTION


APPENDIX C

OpenMP C and C++ Grammar

C.1 Notation

The grammar rules consist of the name for a non-terminal, followed by a colon, followed by replacement alternatives on separate lines.

The syntactic expression term_opt indicates that the term is optional within the replacement.

The syntactic expression term_optseq is equivalent to term-seq_opt with the following additional rules:

term-seq:
    term
    term-seq term
    term-seq , term


C.2 Rules

The notation is described in section 6.1 of the C standard. This grammar appendix shows the extensions to the base language grammar for the OpenMP C and C++ directives.

/* in C++ (ISO/IEC 14882:1998) */
statement-seq:
    statement
    openmp-directive
    statement-seq statement
    statement-seq openmp-directive

/* in C90 (ISO/IEC 9899:1990) */
statement-list:
    statement
    openmp-directive
    statement-list statement
    statement-list openmp-directive

/* in C99 (ISO/IEC 9899:1999) */
block-item:
    declaration
    statement
    openmp-directive


statement:
    /* standard statements */
    openmp-construct

openmp-construct:
    parallel-construct
    for-construct
    sections-construct
    single-construct
    parallel-for-construct
    parallel-sections-construct
    master-construct
    critical-construct
    atomic-construct
    ordered-construct

openmp-directive:
    barrier-directive
    flush-directive

structured-block:
    statement

parallel-construct:
    parallel-directive structured-block

parallel-directive:
    # pragma omp parallel parallel-clause_optseq new-line

parallel-clause:
    unique-parallel-clause
    data-clause


unique-parallel-clause:
    if ( expression )
    num_threads ( expression )

for-construct:
    for-directive iteration-statement

for-directive:
    # pragma omp for for-clause_optseq new-line

for-clause:
    unique-for-clause
    data-clause
    nowait

unique-for-clause:
    ordered
    schedule ( schedule-kind )
    schedule ( schedule-kind , expression )

schedule-kind:
    static
    dynamic
    guided
    runtime

sections-construct:
    sections-directive section-scope

sections-directive:
    # pragma omp sections sections-clause_optseq new-line

sections-clause:
    data-clause
    nowait

section-scope:


    { section-sequence }

section-sequence:
    section-directive_opt structured-block
    section-sequence section-directive structured-block

section-directive:
    # pragma omp section new-line

single-construct:
    single-directive structured-block

single-directive:
    # pragma omp single single-clause_optseq new-line

single-clause:
    data-clause
    nowait

parallel-for-construct:
    parallel-for-directive iteration-statement

parallel-for-directive:
    # pragma omp parallel for parallel-for-clause_optseq new-line

parallel-for-clause:
    unique-parallel-clause
    unique-for-clause
    data-clause

parallel-sections-construct:
    parallel-sections-directive section-scope

parallel-sections-directive:
    # pragma omp parallel sections parallel-sections-clause_optseq new-line

parallel-sections-clause:
    unique-parallel-clause
    data-clause


master-construct:
    master-directive structured-block

master-directive:
    # pragma omp master new-line

critical-construct:
    critical-directive structured-block

critical-directive:
    # pragma omp critical region-phrase_opt new-line

region-phrase:
    ( identifier )

barrier-directive:
    # pragma omp barrier new-line

atomic-construct:
    atomic-directive expression-statement

atomic-directive:
    # pragma omp atomic new-line

flush-directive:
    # pragma omp flush flush-vars_opt new-line

flush-vars:
    ( variable-list )

ordered-construct:
    ordered-directive structured-block

ordered-directive:
    # pragma omp ordered new-line

declaration:
    /* standard declarations */
    threadprivate-directive

threadprivate-directive:


    # pragma omp threadprivate ( variable-list ) new-line

data-clause:
    private ( variable-list )
    copyprivate ( variable-list )
    firstprivate ( variable-list )
    lastprivate ( variable-list )
    shared ( variable-list )
    default ( shared )
    default ( none )
    reduction ( reduction-operator : variable-list )
    copyin ( variable-list )

reduction-operator:
    One of: + * - & ^ | && ||

/* in C */
variable-list:
    identifier
    variable-list , identifier

/* in C++ */
variable-list:
    id-expression
    variable-list , id-expression




APPENDIX D

Interface Declarations

This appendix gives examples of the C/C++ header file, the Fortran include file and Fortran 90 module that shall be provided by implementations as specified in Chapter 3. It also includes an example of a Fortran 90 generic interface for a library routine.

D.1 Example of the omp.h Header File

#ifndef _OMP_H_DEF
#define _OMP_H_DEF

/*
 * define the lock data types
 */
#ifndef _OMP_LOCK_T_DEF
# define _OMP_LOCK_T_DEF
typedef struct __omp_lock *omp_lock_t;
#endif

#ifndef _OMP_NEST_LOCK_T_DEF
# define _OMP_NEST_LOCK_T_DEF
typedef struct __omp_nest_lock *omp_nest_lock_t;
#endif


/*
 * exported OpenMP functions
 */
#ifdef __cplusplus
extern "C" {
#endif

#if defined(__stdc__) || defined(__STDC__) || defined(__cplusplus)
extern void omp_set_num_threads(int num_threads);
extern int omp_get_num_threads(void);
extern int omp_get_max_threads(void);
extern int omp_get_thread_num(void);
extern int omp_get_num_procs(void);
extern int omp_in_parallel(void);
extern void omp_set_dynamic(int dynamic_threads);
extern int omp_get_dynamic(void);
extern void omp_set_nested(int nested);
extern int omp_get_nested(void);
extern void omp_init_lock(omp_lock_t *lock);
extern void omp_destroy_lock(omp_lock_t *lock);
extern void omp_set_lock(omp_lock_t *lock);
extern void omp_unset_lock(omp_lock_t *lock);
extern int omp_test_lock(omp_lock_t *lock);
extern void omp_init_nest_lock(omp_nest_lock_t *lock);
extern void omp_destroy_nest_lock(omp_nest_lock_t *lock);
extern void omp_set_nest_lock(omp_nest_lock_t *lock);
extern void omp_unset_nest_lock(omp_nest_lock_t *lock);
extern int omp_test_nest_lock(omp_nest_lock_t *lock);
extern double omp_get_wtime(void);
extern double omp_get_wtick(void);
#else
extern void omp_set_num_threads();
extern int omp_get_num_threads();
extern int omp_get_max_threads();
extern int omp_get_thread_num();
extern int omp_get_num_procs();
extern int omp_in_parallel();
extern void omp_set_dynamic();


extern int omp_get_dynamic();
extern void omp_set_nested();
extern int omp_get_nested();
extern void omp_init_lock();
extern void omp_destroy_lock();
extern void omp_set_lock();
extern void omp_unset_lock();
extern int omp_test_lock();
extern void omp_init_nest_lock();
extern void omp_destroy_nest_lock();
extern void omp_set_nest_lock();
extern void omp_unset_nest_lock();
extern int omp_test_nest_lock();
extern double omp_get_wtime();
extern double omp_get_wtick();
#endif

#ifdef __cplusplus
}
#endif

#endif

D.2 Example of an Interface Declaration include File

C     the "C" of this comment starts in column 1
      integer omp_lock_kind
      parameter ( omp_lock_kind = 8 )
      integer omp_nest_lock_kind
      parameter ( omp_nest_lock_kind = 8 )


C     default integer type assumed below
C     default logical type assumed below
C     OpenMP Fortran API v1.1
      integer openmp_version
      parameter ( openmp_version = 200011 )

      external omp_destroy_lock
      external omp_destroy_nest_lock
      external omp_get_dynamic
      logical omp_get_dynamic
      external omp_get_max_threads
      integer omp_get_max_threads
      external omp_get_nested
      logical omp_get_nested
      external omp_get_num_procs
      integer omp_get_num_procs
      external omp_get_num_threads
      integer omp_get_num_threads
      external omp_get_thread_num
      integer omp_get_thread_num
      external omp_get_wtick
      double precision omp_get_wtick
      external omp_get_wtime
      double precision omp_get_wtime
      external omp_init_lock
      external omp_init_nest_lock
      external omp_in_parallel
      logical omp_in_parallel
      external omp_set_dynamic
      external omp_set_lock


      external omp_set_nest_lock
      external omp_set_nested
      external omp_set_num_threads
      external omp_test_lock
      logical omp_test_lock
      external omp_test_nest_lock
      integer omp_test_nest_lock
      external omp_unset_lock
      external omp_unset_nest_lock

D.3 Example of a Fortran 90 Interface Declaration module

! the "!" of this comment starts in column 1
module omp_lib_kinds
  integer, parameter :: omp_integer_kind = 4
  integer, parameter :: omp_logical_kind = 4
  integer, parameter :: omp_lock_kind = 8
  integer, parameter :: omp_nest_lock_kind = 8
end module omp_lib_kinds


module omp_lib
  use omp_lib_kinds

! OpenMP Fortran API v1.1
  integer, parameter :: openmp_version = 199910

  interface
    subroutine omp_destroy_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_lock_kind ), intent(inout) :: var
    end subroutine omp_destroy_lock
  end interface

  interface
    subroutine omp_destroy_nest_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_nest_lock_kind ), intent(inout) :: var
    end subroutine omp_destroy_nest_lock
  end interface

  interface
    function omp_get_dynamic ()
      use omp_lib_kinds
      logical ( kind=omp_logical_kind ) :: omp_get_dynamic
    end function omp_get_dynamic
  end interface

  interface
    function omp_get_max_threads ()
      use omp_lib_kinds
      integer ( kind=omp_integer_kind ) :: omp_get_max_threads
    end function omp_get_max_threads
  end interface

  interface
    function omp_get_nested ()
      use omp_lib_kinds
      logical ( kind=omp_logical_kind ) :: omp_get_nested
    end function omp_get_nested
  end interface


  interface
    function omp_get_num_procs ()
      use omp_lib_kinds
      integer ( kind=omp_integer_kind ) :: omp_get_num_procs
    end function omp_get_num_procs
  end interface

  interface
    function omp_get_num_threads ()
      use omp_lib_kinds
      integer ( kind=omp_integer_kind ) :: omp_get_num_threads
    end function omp_get_num_threads
  end interface

  interface
    function omp_get_thread_num ()
      use omp_lib_kinds
      integer ( kind=omp_integer_kind ) :: omp_get_thread_num
    end function omp_get_thread_num
  end interface

  interface
    function omp_get_wtick ()
      double precision :: omp_get_wtick
    end function omp_get_wtick
  end interface

  interface
    function omp_get_wtime ()
      double precision :: omp_get_wtime
    end function omp_get_wtime
  end interface

  interface
    subroutine omp_init_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_lock_kind ), intent(out) :: var
    end subroutine omp_init_lock
  end interface

  interface
    subroutine omp_init_nest_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_nest_lock_kind ), intent(out) :: var
    end subroutine omp_init_nest_lock
  end interface


  interface
    function omp_in_parallel ()
      use omp_lib_kinds
      logical ( kind=omp_logical_kind ) :: omp_in_parallel
    end function omp_in_parallel
  end interface

  interface
    subroutine omp_set_dynamic ( enable_expr )
      use omp_lib_kinds
      logical ( kind=omp_logical_kind ), intent(in) ::          &
        enable_expr
    end subroutine omp_set_dynamic
  end interface

  interface
    subroutine omp_set_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_lock_kind ), intent(inout) :: var
    end subroutine omp_set_lock
  end interface

  interface
    subroutine omp_set_nest_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_nest_lock_kind ), intent(inout) :: var
    end subroutine omp_set_nest_lock
  end interface

  interface
    subroutine omp_set_nested ( enable_expr )
      use omp_lib_kinds
      logical ( kind=omp_logical_kind ), intent(in) ::          &
        enable_expr
    end subroutine omp_set_nested
  end interface


  interface
    subroutine omp_set_num_threads ( number_of_threads_expr )
      use omp_lib_kinds
      integer ( kind=omp_integer_kind ), intent(in) ::          &
        number_of_threads_expr
    end subroutine omp_set_num_threads
  end interface

  interface
    function omp_test_lock ( var )
      use omp_lib_kinds
      logical ( kind=omp_logical_kind ) :: omp_test_lock
      integer ( kind=omp_lock_kind ), intent(inout) :: var
    end function omp_test_lock
  end interface

  interface
    function omp_test_nest_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_integer_kind ) :: omp_test_nest_lock
      integer ( kind=omp_nest_lock_kind ), intent(inout) :: var
    end function omp_test_nest_lock
  end interface

  interface
    subroutine omp_unset_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_lock_kind ), intent(inout) :: var
    end subroutine omp_unset_lock
  end interface

  interface
    subroutine omp_unset_nest_lock ( var )
      use omp_lib_kinds
      integer ( kind=omp_nest_lock_kind ), intent(inout) :: var
    end subroutine omp_unset_nest_lock
  end interface

end module omp_lib


D.4 Example of a Generic Interface for a Library Routine

Any of the OMP runtime library routines that take an argument may be extended with a generic interface so arguments of different KIND type can be accommodated.

Assume an implementation supports both default INTEGER as KIND = OMP_INTEGER_KIND and another INTEGER KIND, KIND = SHORT_INT. Then OMP_SET_NUM_THREADS could be specified in the omp_lib module as the following:

! the "!" of this comment starts in column 1
interface omp_set_num_threads

  subroutine omp_set_num_threads_1 ( number_of_threads_expr )
    use omp_lib_kinds
    integer ( kind=omp_integer_kind ), intent(in) ::            &
      number_of_threads_expr
  end subroutine omp_set_num_threads_1

  subroutine omp_set_num_threads_2 ( number_of_threads_expr )
    use omp_lib_kinds
    integer ( kind=short_int ), intent(in) ::                   &
      number_of_threads_expr
  end subroutine omp_set_num_threads_2

end interface omp_set_num_threads


APPENDIX E

Implementation-Defined Behaviors in OpenMP

This appendix summarizes the behaviors that are described as implementation defined in this API. Each behavior is cross-referenced back to its description in the main specification. An implementation is required to define and document its behavior in these cases.

• Internal control variables: The number of copies of the internal control variables, and their effects, during the execution of any explicit parallel region are implementation defined. The initial values of nthreads-var, dyn-var, run-sched-var, and def-sched-var are implementation defined (see Section 2.3 on page 21).

• Nested parallelism: The number of levels of parallelism supported is implementation defined (see Section 1.2.4 on page 8 and Section 2.4.1 on page 27).

• Dynamic adjustment of threads: It is implementation defined whether and how the adjustment of the number of threads actually occurs when dynamic adjustment is enabled. Implementations are allowed to deliver fewer threads (but at least one) than indicated in Figure 2-1 in exceptional situations, such as when there is a lack of resources, even if dynamic adjustment is disabled. In these situations, the behavior of the program is implementation defined (see Section 2.4.1 on page 27).

• sections construct: The method of scheduling the structured blocks among threads in the team is implementation defined (see Section 2.5.2 on page 37).

• single construct: The method of choosing a thread to execute the structured block is implementation defined (see Section 2.5.3 on page 40).

• atomic construct: It is implementation defined whether an implementation replaces all atomic constructs with critical constructs that have the same unique name (see Section 2.7.4 on page 53).


• omp_set_num_threads routine: If the number of threads requested exceeds the number the implementation can support, or is not a positive integer, the behavior of this routine is implementation defined. If this routine is called from a portion of the program where the omp_in_parallel routine returns true, the behavior of this routine is implementation defined (see Section 3.2.1 on page 87).

• omp_set_dynamic routine: If this routine is called from a portion of the program where the omp_in_parallel routine returns true, the behavior of this routine is implementation defined (see Section 3.2.7 on page 93).

• omp_set_nested routine: If this routine is called from a portion of the program where the omp_in_parallel routine returns true, the behavior of this routine is implementation defined (see Section 3.2.9 on page 96).

• OMP_NUM_THREADS environment variable: If the requested value of OMP_NUM_THREADS is greater than the number of threads an implementation can support, or if the value is not a positive integer, the behavior of the program is implementation defined (see Section 4.2 on page 109).

Fortran

• threadprivate directive: If the conditions for values of data in the threadprivate objects of threads (other than the initial thread) to persist between two consecutive active parallel regions do not all hold, the allocation status of an allocatable array in the second region is implementation defined (see Section 2.8.2 on page 63).

• shared clause: Passing a shared variable to a non-intrinsic procedure may result in the value of the shared variable being copied into temporary storage before the procedure reference, and back out of the temporary storage into the actual argument storage after the procedure reference. Situations where this occurs other than those specified are implementation defined (see Section 2.8.3.2 on page 69).

• Runtime library definitions: It is implementation defined whether the include file omp_lib.h or the module file omp_lib (or both) is provided. It is implementation defined whether any of the OMP runtime library routines that take an argument are extended with a generic interface so arguments of different KIND type can be accommodated (see Section 3.1 on page 86).

Fortran
