PGI Workstation User's Guide - 9 Optimization Directives and Pragmas
9 Optimization Directives and Pragmas
- 9.1 Adding Directives to Fortran
- 9.2 Fortran Directive Summary
- 9.3 Scope of Directives and Command Line options
- 9.4 Adding Pragmas to C and C++
- 9.5 C/C++ Pragma Summary
- 9.6 Scope of C/C++ Pragmas and Command Line Options
Directives are Fortran comments that the user may supply in a Fortran source file to provide information to the compiler. Directives alter the effects of certain command-line options or the default behavior of the compiler. While a command-line option affects the entire source file being compiled, a directive applies or disables the effect of an option (for example, an optimization) for selected subprograms or selected loops in the source file. Directives allow a user to tune selected routines or loops based on the user's knowledge of the code.
9.1 Adding Directives to Fortran
Directives may have any of the following forms:
cpgi$g directive
cpgi$r directive
cpgi$l directive
cpgi$ directive
The C must be in column 1. Either * or ! is allowed in place of C. The scope indicator occurs after the $; this indicator controls the scope of the directive. Some directives ignore the scope indicator. The valid scopes, as shown above, are:
- g
- (global) indicates the directive applies to the end of the source file.
- r
- (routine) indicates the directive applies to the next subprogram.
- l
- (loop) indicates the directive applies to the next loop (but not to any loop contained within the loop body). Loop-scoped directives are only applied to DO loops.
- blank
- indicates that the default scope for the directive is applied.
The body of the directive may immediately follow the scope indicator. Alternatively, any number of blanks may precede the name of the directive. Any names in the body of the directive, including the directive name, may not contain embedded blanks. Blanks may surround any special characters, such as a comma or an equals sign.
The directive name, including the directive prefix, may contain upper or lower case letters (case is not significant). Case is significant for any variable names which appear in the body of the directive if the command line option -Mupcase is selected. For compatibility with other vendors' directives, the prefix cpgi$ may be substituted with cdir$ or cvd$.
9.2 Fortran Directive Summary
Table 9-1 summarizes the supported Fortran directives. The scope entry indicates the allowed scope indicators for each directive; the default scope is surrounded by parentheses. The system field indicates the target system type for which the directive applies. Many of the directives can be preceded by NO. The default entry in the table indicates the default of the directive; N/A appears if a default does not apply. The name of a directive may also be prefixed with -M; for example, the directive -Mbounds is equivalent to bounds and -Mopt is equivalent to opt.
Table 9-1 Fortran Directive Summary
DIRECTIVE | FUNCTION | DEFAULT | SCOPE |
---|---|---|---|
altcode | Do/don't generate scalar code for vector regions | altcode | (l)rg |
assoc | Do/don't perform associative transformations | assoc | (l)rg |
bounds | Do/don't perform array bounds checking | nobounds | (r)g* |
cncall | Loops are considered for parallelization, even if they contain calls to user-defined subroutines or functions, or if their loop counts do not exceed usual thresholds | nocncall | (l)rg |
concur | Do/don't enable auto-concurrentization of loops | concur | (l)rg |
depchk | Don't/do ignore potential data dependences | depchk | (l)rg |
eqvchk | Do/don't check EQUIVALENCEs for data dependences | eqvchk | (l)rg |
invarif | Do/don't remove invariant if constructs from loops | invarif | (l)rg |
ivdep | Ignore potential data dependences | depchk | (l)rg |
lstval | Do/don't compute last values | lstval | (l)rg |
opt | Select optimization level | N/A | (r)g |
safe_lastval | Parallelize when loop contains a scalar used outside of loop | not enabled | (l) |
unroll | Do/don't unroll loops | nounroll | (l)rg |
vector | Do/don't perform vectorizations | vector | (l)rg |
vintr | Do/don't recognize vector intrinsics | vintr | (l)rg |
In the case of the vector/novector directive, the scope is the code following the directive until the end of the routine for r-scoped directives (as opposed to the entire routine), or until the end of the file for g-scoped directives (as opposed to the entire file).
altcode (noaltcode)
Instructs the parallelizer to generate alternate scalar code for parallelized loops. If altcode is specified the parallelizer determines an appropriate cutoff length and generates scalar code to be executed whenever the loop count is less than or equal to that length. The noaltcode directive disables these transformations.
This directive affects the compiler only when -Mconcur is enabled on the command line.
- altcode(n)concur
This directive sets the loop count threshold for parallelization of non-reduction loops to n. Without this directive, the compiler assumes a default of 100. Under this directive, innermost loops without reductions are executed in parallel only if their iteration counts exceed n.
- altcode(n)concurreduction
This directive sets the loop count threshold for parallelization of reduction loops to n. Without this directive, the compiler assumes a default of 200. Under this directive, innermost loops with reductions are executed in parallel only if their iteration counts exceed n.
- noaltcode
This directive sets the loop count thresholds for parallelization of all innermost loops to 0.
assoc (noassoc)
This directive toggles the effects of the -Mvect=noassoc command-line option (an Optimization -M control).
By default, when scalar reductions are present, the vectorizer may change the order of operations so that it can generate better code (for example, for a dot product). Because floating-point arithmetic is not associative, such transformations may change the result of the computation through roundoff error. The noassoc directive disables these transformations. This directive affects the compiler only when -Mvect is enabled on the command line.
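The sensitivity to operation order can be seen without any vectorizer at all. The following C sketch sums the same values with two different groupings; the function names are illustrative and not part of the PGI toolchain. A reordered reduction is free to use either grouping unless noassoc is in effect.

```c
#include <assert.h>
#include <stddef.h>

/* Sum left to right: ((x[0] + x[1]) + x[2]) + ... */
double sum_forward(const double *x, size_t n)
{
    volatile double s = 0.0;   /* volatile keeps each partial sum a true double */
    assert(x != NULL);
    for (size_t i = 0; i < n; i++)
        s = s + x[i];
    return s;
}

/* Sum right to left: x[0] + (x[1] + (x[2] + ...)) */
double sum_backward(const double *x, size_t n)
{
    volatile double s = 0.0;
    assert(x != NULL);
    for (size_t i = n; i > 0; i--)
        s = x[i - 1] + s;
    return s;
}
```

For the input {1.0e30, -1.0e30, 1.0}, the forward sum cancels the two large terms first and keeps the 1.0, while the backward sum absorbs the 1.0 into -1.0e30 before the cancellation and loses it.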
bounds (nobounds)
This directive alters the effects of the -Mbounds command line option. This directive enables the checking of array bounds when subscripted array references are performed; by default, array bounds checking is not performed.
cncall (nocncall)
Loops within the specified scope are considered for parallelization, even if they contain calls to user-defined subroutines or functions, or if their loop counts do not exceed the usual thresholds. A nocncall directive cancels the effect of a previous cncall.
concur (noconcur)
This directive alters the effects of the -Mconcur command-line option. The directive instructs the auto-parallelizer to enable auto-concurrentization of loops. If concur is specified, multiple processors will be used to execute loops which the auto-parallelizer determines to be parallelizable. The noconcur directive disables these transformations. This directive affects the compiler only when -Mconcur is enabled on the command line.
depchk (nodepchk)
This directive alters the effects of the -Mdepchk command line option. When potential data dependences exist, the compiler, by default, assumes that there is a data dependence which in turn may inhibit certain optimizations or vectorizations. nodepchk directs the compiler to ignore unknown data dependences.
eqvchk (noeqvchk)
When examining data dependences, noeqvchk directs the compiler to ignore any dependences between variables appearing in EQUIVALENCE statements.
invarif (noinvarif)
There is no command-line option corresponding to this directive. Normally, the compiler removes certain invariant if constructs from within a loop and places them outside of the loop. The directive noinvarif directs the compiler to not move such constructs. The directive invarif toggles a previous noinvarif.
ivdep
The ivdep directive is equivalent to the directive nodepchk.
opt
The syntax of this directive is:
cpgi$<scope> opt=<level>
where the optional <scope> is r or g, and <level> is an integer constant representing the optimization level to be used when compiling a subprogram (routine scope) or all subprograms in a file (global scope). The opt directive overrides the value specified by the command line option -On.
lstval (nolstval)
There is no command line option corresponding to this directive. The compiler determines whether or not the last values for loop iteration control variables and promoted scalars need to be computed. In certain cases, the compiler must assume that the last values of these variables are needed and therefore computes their last values. The directive nolstval directs the compiler not to compute the last values for those cases.
safe_lastval
During parallelization scalars within loops need to be privatized. Problems are possible if a scalar is accessed outside the loop. For example,
do i = 1, N
   if( f(x(i)) > 5.0 ) t = x(i)
enddo
v = t
creates a problem since the value of t may not be computed on the last iteration of the loop. Normally, if a scalar assigned within a loop is used outside the loop, we save the last value of the scalar; essentially the value of the scalar on the "last iteration" is saved, in this case when i = N. However, if the loop is parallelized, and the scalar is not assigned on every iteration, it may be difficult (without resorting to costly critical sections) to determine on what iteration t is last assigned. Analysis allows the compiler to determine whether a scalar is assigned on every iteration; if it is, the loop is safe to parallelize even when the scalar is used later. An example loop is:
do i = 1, N
   if( x(i) > 0.0 ) then
      t = 2.0
   else
      t = 3.0
   endif
   y(i) = ...t...
enddo
v = t
where t is assigned on every iteration of the loop. However, there are cases where a scalar may be privatizable, but if it is used after the loop it is unsafe to parallelize. Examine this loop:
do i = 1,N
   if( x(i) > 0.0 ) then
      t = x(i)
      ...
      ...
      y(i) = ...t..
   endif
enddo
v = t
where each use of t within the loop is reached by a definition from the same iteration. Here t is privatizable, but the use of t outside the loop may yield incorrect results since the compiler may not be able to detect on which iteration of the parallelized loop t is assigned last.
The compiler detects the above cases. Where a scalar is used after the loop, but is not defined on every iteration of the loop, parallelization will not occur.
When the programmer knows that the scalar is assigned on the last iteration of the loop, and this knowledge makes it safe to parallelize the loop, a directive is available to tell the compiler so. The Fortran directive that tells the compiler that, for a given loop, the last values computed for all scalars make it safe to parallelize the loop is:
cpgi$l safe_lastval
In addition, a command-line option, -Msafe_lastval, provides this information for all loops within the routines being compiled (essentially providing global scope).
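For comparison, the same situation in C might look like the sketch below. It assumes the loop-scoped spelling #pragma loop safe_lastval, which follows from the general pragma syntax in section 9.4 and the entry in Table 9-2; the function name and the promise about the final element are illustrative assumptions. A compiler that does not recognize the pragma ignores it, so the loop still runs correctly in serial.

```c
#include <assert.h>

/* Returns the last positive element of x[0..n-1].
 * Assumption (the caller's promise): x[n-1] > 0, so t is always
 * assigned on the final iteration and its last value is well defined. */
float last_positive(const float *x, int n)
{
    float t = 0.0f;
    int i;

    assert(n > 0 && x[n - 1] > 0.0f);   /* the guarantee the pragma relies on */

#pragma loop safe_lastval
    for (i = 0; i < n; i++) {
        if (x[i] > 0.0f)
            t = x[i];    /* conditional assignment: analysis cannot prove it */
    }
    return t;            /* t is used after the loop */
}
```

The assert documents the programmer's knowledge that the pragma encodes. If the promise does not hold, a parallelized loop could return a value of t from some iteration other than the last assigning one.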
unroll (nounroll)
The directive nounroll disables loop unrolling, and unroll enables it. The directive takes arguments c and n. A c specifies that c (complete) unrolling should be turned on or off. An n specifies that n (count) unrolling should be turned on or off. In addition, the following arguments may be added to the unroll directive:
unroll = c:v
This sets the threshold to which c unrolling applies; v is a constant; a loop whose constant loop count is <= v is completely unrolled.
unroll = n:v
This adjusts the threshold to which n unrolling applies; v is a constant; a loop to which n unrolling applies is unrolled v times.
The directives unroll and nounroll only apply if -Munroll is selected on the command line.
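To make count (n) unrolling concrete, the C sketch below shows by hand roughly what unrolling a loop four times amounts to: four copies of the loop body per trip, followed by a remainder loop. This only illustrates the transformation, not the code the compiler actually emits, and the function names are hypothetical.

```c
#include <assert.h>

/* Original loop: one element per iteration. */
void scale_rolled(float *y, const float *x, float a, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        y[i] = a * x[i];
}

/* The same loop unrolled four times by hand. */
void scale_unrolled4(float *y, const float *x, float a, int n)
{
    int i = 0;
    assert(n >= 0);
    for (; i + 3 < n; i += 4) {       /* main body: 4 elements per trip */
        y[i]     = a * x[i];
        y[i + 1] = a * x[i + 1];
        y[i + 2] = a * x[i + 2];
        y[i + 3] = a * x[i + 3];
    }
    for (; i < n; i++)                /* remainder: n not a multiple of 4 */
        y[i] = a * x[i];
}
```

In the notation above, this corresponds to unroll = n:4. Complete (c) unrolling would instead replace a loop with a small constant trip count by straight-line code with no loop at all.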
vector (novector)
The directive novector is used to disable vectorization.
The directives vector and novector only apply if -Mvect has been selected on the command line.
vintr (novintr)
The directive novintr directs the vectorizer to disable recognition of vector intrinsics. The -Mvect=transform option always disables vector intrinsic recognition. The directive norecog takes precedence over vintr. The directive vintr affects the compiler only when -Mvect is specified.
9.3 Scope of Directives and Command Line options
This section presents several examples showing the effect of directives and the scope of directives. Remember that during compilation the effect of a directive may be to either turn an option on, or turn an option off. Directives apply to the section of code following the directive, corresponding to the specified scope (that is, the following loop, the following routine, or the rest of the program).
Consider the following code:
      integer maxtime, time
      parameter (n = 1000, maxtime = 10)
      double precision a(n,n), b(n,n), c(n,n)
      do time = 1, maxtime
        do i = 1, n
          do j = 1, n
            c(i,j) = a(i,j) + b(i,j)
          enddo
        enddo
      enddo
      end
When compiled with -Mvect, both interior loops are interchanged with the outer loop.
$ pgf90 -Mvect dirvect1.f
Directives alter this behavior either globally or on a routine-by-routine or loop-by-loop basis. To ensure that vectorization is not applied, use the novector directive with global scope.
cpgi$g novector
      integer maxtime, time
      parameter (n = 1000, maxtime = 10)
      double precision a(n,n), b(n,n), c(n,n)
      do time = 1, maxtime
        do i = 1, n
          do j = 1, n
            c(i,j) = a(i,j) + b(i,j)
          enddo
        enddo
      enddo
      end
In this version the compiler disables vectorization for the entire source file. Another use of the directive scoping mechanism turns an option on or off locally either for a specific procedure or for a specific loop:
      integer maxtime, time
      parameter (n = 1000, maxtime = 10)
      double precision a(n,n), b(n,n), c(n,n)
cpgi$l novector
      do time = 1, maxtime
        do i = 1, n
          do j = 1, n
            c(i,j) = a(i,j) + b(i,j)
          enddo
        enddo
      enddo
      end
Loop level scoping does not apply to nested loops. That is, the directive only applies to the following loop. In this example, the directive turns off vector transformations for the top level loop. If the outer loop were a timing loop, this would be a practical use for a loop-scoped directive.
9.4 Adding Pragmas to C and C++
Pragmas may be supplied in a C/C++ source file to provide information to the compiler. Like directives in Fortran, pragmas alter the effects of certain command-line options or default behavior of the compiler (many pragmas have a corresponding command-line option). While a command-line option affects the entire source file that is being compiled, pragmas apply the effects of a particular command-line option to selected functions or to selected loops in the source file (pragmas may also toggle an option, selectively enabling and disabling the option). Pragmas let you tune selected functions or loops based on your knowledge of the code.
The general syntax of a pragma is:
#pragma [ scope ] pragma-body
The optional scope field is an indicator for the scope of the pragma; some pragmas ignore the scope indicator.
The valid scopes are:
- global
- indicates the pragma applies to the entire source file.
- routine
- indicates the pragma applies to the next function.
- loop
- indicates the pragma applies to the next loop (but not to any loop contained within the loop body). Loop-scoped pragmas are only applied to for and while loops.
If a scope indicator is not present, the default scope, if any, is applied. Whitespace must appear after the pragma keyword and between the scope indicator and the body of the pragma. Whitespace may also surround any special characters, such as a comma or an equals sign. Case is significant for the names of the pragmas and any variable names which appear in the body of the pragma.
9.5 C/C++ Pragma Summary
Table 9-2 summarizes the supported pragmas. The scope entry in the table indicates the permitted scope indicators for each pragma: the letters L, R, and G indicate loop, routine, and global scope, respectively. The default scope is surrounded by parentheses. The "*" in the scope field indicates that the scope is the code following the pragma until the end of the routine for R-scoped pragmas, as opposed to the entire routine, or until the end of the file for G-scoped pragmas, as opposed to the entire file.
Many of the pragmas can be preceded by no. The default entry in the table indicates the default of the pragma; N/A appears if a default does not apply. The name of any pragma may be prefixed with -M; for example, -Mnoassoc is equivalent to noassoc and -Mvintr is equivalent to vintr. The section following the table provides brief descriptions of the pragmas which are unique to C/C++. Pragmas that have a corresponding directive in Fortran are described above in section 9.2.
Table 9-2 C/C++ Pragma Summary
PRAGMA | FUNCTION | DEFAULT | SCOPE |
---|---|---|---|
altcode | Do/don't generate scalar code for vector regions | altcode | (L)RG |
assoc | Do/don't perform associative transformations | assoc | (L)RG |
bounds | Do/don't perform array bounds checking | nobounds | (R)G |
concur | Do/don't enable auto-concurrentization of loops | concur | (L)RG |
depchk | Don't/do ignore potential data dependences | depchk | (L)RG |
fcon | Do/don't assume unsuffixed real constants are single precision | nofcon | (R)G |
invarif | Do/don't remove invariant if constructs from loops | invarif | (L)RG |
lstval | Do/don't compute last values | lstval | (L)RG |
opt | Select optimization level | N/A | (R)G |
safe | Do/don't treat pointer arguments as safe | safe | (R)G |
safe_lastval | Parallelize when loop contains a scalar used outside of loop | not enabled | (L) |
safeptr | Do/don't ignore potential data dependences to pointers | nosafeptr | L(R)G |
single | Do/don't convert float parameters to double | nosingle | (R)G* |
unroll | Do/don't unroll loops | nounroll | (L)RG |
vector | Do/don't perform vectorizations | vector | (L)RG |
vintr | Do/don't recognize vector intrinsics | vintr | (L)RG |
fcon (nofcon)
This pragma alters the effects of the -Mfcon command-line option (a -M Language control).
This pragma instructs the compiler to treat non-suffixed floating-point constants as float rather than double; by default, all non-suffixed floating-point constants are treated as double.
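The difference between the two interpretations can be seen in standard C, where a suffix selects the type explicitly. In the sketch below (helper names are illustrative), the unsuffixed constant 0.1 is a double and 0.1f is a float; under fcon the unsuffixed form would be treated like the suffixed one.

```c
#include <assert.h>
#include <stddef.h>

/* Size of an unsuffixed constant: double in standard C. */
size_t unsuffixed_size(void) { return sizeof(0.1); }

/* Size of an f-suffixed constant. */
size_t suffixed_size(void) { return sizeof(0.1f); }

/* An f suffix pins the type regardless of compiler options. */
void check_suffixed_constant(void)
{
    assert(sizeof(0.1f) == sizeof(float));
}

/* Scale by one tenth entirely in single precision. */
float tenth(float x)
{
    return x * 0.1f;   /* with 0.1, x would first be widened to double */
}
```

Writing the f suffix in source code gives the single-precision behavior for individual constants without depending on -Mfcon or the pragma.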
safe (nosafe)
By default, the compiler assumes that all pointer arguments are unsafe. That is, the storage located by the pointer can be accessed by other pointers.
The forms of the safe pragma are:
#pragma [scope] [no]safe
#pragma safe (variable [, variable]...)
where scope is either global or routine.
When the pragma safe is not followed by a variable name (or name list), all pointer arguments appearing in a routine (if scope is routine) or all routines (if scope is global) will be treated as safe.
If variable names occur after safe, each name is the name of a pointer argument in the current function. The named argument is considered to be safe. Note that if just one variable name is specified, the surrounding parentheses may be omitted.
There is no command-line option corresponding to this pragma.
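As an illustration, consider a routine whose pointer arguments are known by the programmer never to overlap. The sketch below uses the named-variable form of the pragma shown above; the function name and the no-overlap guarantee are assumptions made for the example. A compiler that does not recognize the pragma ignores it.

```c
#include <assert.h>

/* dst[i] = src[i] + 1 for i in [0, n).
 * Assumption: the caller guarantees dst and src never overlap. */
void add_one(float *dst, const float *src, int n)
{
#pragma safe (dst, src)
    int i;

    assert(n >= 0);
    for (i = 0; i < n; i++)
        dst[i] = src[i] + 1.0f;
}
```

If a caller did pass overlapping storage, for example add_one(a + 1, a, n), the pragma would be a false promise and an optimized loop could produce results that differ from the serial order.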
safeptr (nosafeptr)
The pragma safeptr directs the compiler to treat pointer variables of the indicated storage class as safe. The pragma nosafeptr directs the compiler to treat pointer variables of the indicated storage class as unsafe. This pragma alters the effects of the -Msafeptr command-line option.
The syntax of this pragma is:
#pragma [scope] value
where value is:
[no]safeptr={arg|local|auto|global|static|all},...
Note that the values local and auto are equivalent.
For example, in a file containing multiple functions, the command-line option -Msafeptr might be helpful for one function, but can't be used because another function in the file would produce incorrect results. In such a file, the safeptr pragma, used with routine scope could improve performance and produce correct results.
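A file of that kind might look like the following sketch. The function names and the stated calling conventions are hypothetical; the pragma spelling combines the routine scope indicator with the safeptr=arg value from the syntax above, and it is placed ahead of the function it is meant to affect. A compiler that does not recognize the pragma ignores it.

```c
#include <assert.h>

/* Assumption: callers never pass overlapping arrays to axpy(), so its
 * pointer arguments can be treated as safe for this routine only. */
#pragma routine safeptr=arg
void axpy(float *y, const float *x, float a, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        y[i] = y[i] + a * x[i];
}

/* Callers may pass overlapping regions, e.g. smear(a + 1, a, n), so each
 * iteration can depend on the previous one. This routine keeps the
 * default (nosafeptr) behavior. */
void smear(float *out, const float *in, int n)
{
    int i;
    assert(n >= 0);
    for (i = 0; i < n; i++)
        out[i] = in[i] + 1.0f;
}
```

Compiling the whole file with -Msafeptr would extend the no-overlap assumption to smear as well, which is exactly the case the routine-scoped pragma avoids.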
single (nosingle)
The pragma single directs the compiler not to convert float parameters to double in non-prototyped functions. This can result in faster code if the program uses only float parameters.
Note: since ANSI C specifies that routines must convert float parameters to double in non-prototyped functions, this pragma results in non-ANSI conforming code.
9.6 Scope of C/C++ Pragmas and Command Line Options
This section presents several examples showing the effect of pragmas and the use of the pragma scope indicators. Note that during compilation a pragma either turns an option on or turns an option off. Pragmas apply to the section of code corresponding to the specified scope (that is, the entire file, the following loop, or the following or current routine). For pragmas that have only routine and global scope, there are two rules for determining the scope of a pragma. We cover these special scope rules at the end of this section. In all cases, pragmas override a corresponding command-line option.
Consider the program:
main()
{
    float a[100][100], b[100][100], c[100][100];
    int time, maxtime, n, i, j;
    maxtime=10;
    n=100;
    for (time=0; time<maxtime;time++)
        for (j=0; j<n;j++)
            for (i=0; i<n;i++)
                c[i][j] = a[i][j] + b[i][j];
}
When this is compiled using the -Mvect command-line option, both interior loops are interchanged with the outer loop. Pragmas alter this behavior either globally or on a routine-by-routine or loop-by-loop basis. To ensure that vectorization is not applied, use the novector pragma with global scope.
main()
{
#pragma global novector
    float a[100][100], b[100][100], c[100][100];
    int time, maxtime, n, i, j;
    maxtime=10;
    n=100;
    for (time=0; time<maxtime;time++)
        for (j=0; j<n;j++)
            for (i=0; i<n;i++)
                c[i][j] = a[i][j] + b[i][j];
}
In this version the compiler does not perform vectorization for the entire source file. Another use of the pragma scoping mechanism turns an option on or off locally either for a specific procedure or for a specific loop. The following example shows the use of a loop-scoped pragma.
main()
{
    float a[100][100], b[100][100], c[100][100];
    int time, maxtime, n, i, j;
    maxtime=10;
    n=100;
#pragma loop novector
    for (time=0; time<maxtime;time++)
        for (j=0; j<n;j++)
            for (i=0; i<n;i++)
                c[i][j] = a[i][j] + b[i][j];
}
Loop level scoping does not apply to nested loops. That is, the pragma only applies to the following loop. In this example, the pragma turns off vector transformations for the top level loop. If the outer loop were a timing loop, this would be a practical use for a loop-scoped pragma.
The following example shows routine pragma scope:
#include "math.h"
func1()
#pragma routine novector
{
    float a[100][100], b[100][100];
    float c[100][100], d[100][100];
    int i,j;
    for (i=0;i<100;i++)
        for (j=0;j<100;j++) {
            /* braces keep both updates inside the loop nest */
            a[i][j] = a[i][j] + b[i][j] * c[i][j];
            c[i][j] = c[i][j] + b[i][j] * d[i][j];
        }
}
func2()
{
    float a[200][200], b[200][200];
    float c[200][200], d[200][200];
    int i,j;
    for (i=0;i<200;i++)
        for (j=0;j<200;j++) {
            a[i][j] = a[i][j] + b[i][j] * c[i][j];
            c[i][j] = c[i][j] + b[i][j] * d[i][j];
        }
}
When this source is compiled using the -Mvect command-line option, func2 is vectorized, but func1 is not vectorized. In the following example, the global novector pragma turns off vectorization for the entire file.
#include "math.h"
func1()
#pragma global novector
{
    float a[100][100], b[100][100];
    float c[100][100], d[100][100];
    int i,j;
    for (i=0;i<100;i++)
        for (j=0;j<100;j++) {
            /* braces keep both updates inside the loop nest */
            a[i][j] = a[i][j] + b[i][j] * c[i][j];
            c[i][j] = c[i][j] + b[i][j] * d[i][j];
        }
}
func2()
{
    float a[200][200], b[200][200];
    float c[200][200], d[200][200];
    int i,j;
    for (i=0;i<200;i++)
        for (j=0;j<200;j++) {
            a[i][j] = a[i][j] + b[i][j] * c[i][j];
            c[i][j] = c[i][j] + b[i][j] * d[i][j];
        }
}
Special Scope Rules
For a pragma with loop, routine, and global scope, when the pragma is placed within a routine, it applies to the routine from its point in the routine to the end of the routine; likewise for one of these pragmas with global scope. However, there are several pragmas for which only routine and global scope apply and which affect code immediately following the pragma. These pragmas (bounds and fcon) behave in a similar manner to pragmas with loop scope; that is, they apply to the code following the pragma. In contrast, the opt and safe pragmas, when placed within a routine, apply to the entire routine, as if they had been placed at the beginning of the routine.