Monday, April 25, 2011

Using pragma opt

The Studio compiler has the ability to control the optimisation level that is applied to particular functions in an application. This can be useful if the functions are designed to work at a specific optimisation level, or if the application fails at a particular optimisation level, and you need to figure out where the problem lies.

The optimisation levels are controlled through pragma opt. The following steps need to be followed to use the pragma:

  • The directive needs to be inserted into the source file. The format of the directive is #pragma opt level (function). It must appear after the function's prototype (its declaration) but before the function's definition.
  • The code needs to be compiled with the flag -xmaxopt=level. This sets the maximum optimisation level for all functions in the file - including those tagged with #pragma opt.
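The debugging use mentioned earlier might be sketched as follows. This is an illustration rather than a recipe: parse(), transform() and emit() are hypothetical functions standing in for those of a real application, and the level 1 is an arbitrary choice. The __SUNPRO_C guard is an addition that keeps the Studio-specific pragma away from other compilers.

```c
/* Sketch only: parse(), transform() and emit() are hypothetical names. */
int parse(int x);
int transform(int x);
int emit(int x);

/* Each pragma follows the prototype and precedes the definition.
   One pragma per line makes it easy to remove them one at a time. */
#if defined(__SUNPRO_C)
#pragma opt 1 (parse)
#pragma opt 1 (transform)
#pragma opt 1 (emit)
#endif

int parse(int x)
{
  return x + 1;
}

int transform(int x)
{
  return x * 2;
}

int emit(int x)
{
  return x - 3;
}
```

If the failure disappears with all three functions held at the lower level, removing the pragmas one at a time and rebuilding should show which function is sensitive to the higher optimisation level.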

We can see the effect of the pragma using the following code snippet. It contains two identical functions, both of which return the square of a global variable. However, we are using #pragma opt to control the optimisation level of the function f().

int f();
int g();

#pragma opt 2 (f)

int d;

int f()
{
  return d*d;
}

int g()
{
  return d*d;
}
The code is compiled with the flag -xmaxopt=5, which specifies the maximum optimisation level that can be applied to any function in the file.

$ cc -O -xmaxopt=5 -S opt.c

If we compare the generated assembly for the functions f() (first listing) and g() (second listing), we can see that g() is better optimised: it loads the global variable once, whereas f() loads it twice.

/* 000000          0 */         sethi   %hi(d),%o5

!   10                !  return d*d;

/* 0x0004         10 */         ldsw    [%o5+%lo(d)],%o4 ! volatile    // First load of d
/* 0x0008            */         ldsw    [%o5+%lo(d)],%o3 ! volatile    // Second load of d
/* 0x000c            */         retl    ! Result =  %o0
/* 0x0010            */         mulx    %o4,%o3,%o0

/* 000000         14 */         sethi   %hi(d),%o5
/* 0x0004            */         ld      [%o5+%lo(d)],%o4               // Single load of d

!   15                !  return d*d;

/* 0x0008         15 */         sra     %o4,0,%o3
/* 0x000c            */         retl    ! Result =  %o0
/* 0x0010            */         mulx    %o3,%o3,%o0
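
The "! volatile" annotations in the listing for f() suggest that, at level 2, the compiler treats each read of the global as if it were a volatile access. A rough C analogy of the two listings follows, assuming that interpretation is correct. The names d_volatile, d_plain, square_two_loads() and square_one_load() are hypothetical and are not part of the original example.

```c
/* Hypothetical analogy: d_volatile stands in for how d is treated in f(),
   d_plain for how d is treated in g(). */
volatile int d_volatile;
int d_plain;

/* Like f(): each mention of the variable is a separate load. */
int square_two_loads(void)
{
  int first = d_volatile;   /* first load of the global */
  int second = d_volatile;  /* second load of the global */
  return first * second;
}

/* Like g(): the value is loaded once and reused. */
int square_one_load(void)
{
  int value = d_plain;      /* single load of the global */
  return value * value;
}
```

Both functions return the same result in a single-threaded program. The difference is only in how many times memory is read, which is what the two assembly listings show.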