Tuesday, May 22, 2012

Square roots

If you are spending significant time calling sqrt() then to improve this you should compile with -xlibmil. Here's some example code that calls both fabs() and sqrt():

#include <math.h>
#include <stdio.h>

int main()
{
  double d=23.3;
  printf("%f\n",fabs(d));
  printf("%f\n",sqrt(d));
}

If we compile this with Studio 12.2 we will see calls to both fabs() and fsqrt():

$ cc -S -O  m.c bash-3.2$ grep call m.s
$ grep call m.s|grep -v printf
/* 0x0018            */         call    fabs    ! params =  %o0 %o1     ! Result =  %f0 %f1
/* 0x0044            */         call    sqrt    ! params =  %o0 %o1     ! Result =  %f0 %f1

If we add -xlibmil then these calls get replaced by equivalent instructions:

$ cc -S -O -xlibmil  m.c
$ grep abs m.s|grep -v print; grep sqrt m.s|grep -v print
/* 0x0018          7 */         fabsd   %f4,%f0
/* 0x0038            */         fsqrtd  %f6,%f2

The default for Studio 12.3 is to inline fabs(), but you still need to add -xlibmil for the compiler to inline fsqrt(), so it is a good idea to include the flag.

You can see the functions that are replaced by inline versions by grepping the inline template file (libm.il) for the word "inline":

$ grep inline libm.il
        .inline sqrtf,1
        .inline sqrt,2
        .inline ceil,2
        .inline ceilf,1
        .inline floor,2
        .inline floorf,1
        .inline rint,2
        .inline rintf,1
        .inline min_subnormal,0
        .inline min_subnormalf,0
        .inline max_subnormal,0
...

The caveat with -xlibmil is documented:

          However, these substitutions can cause the setting of
          errno to become unreliable. If your program depends on
          the value of errno, avoid this option. See the NOTES
          section at the end of this man page for more informa-
          tion.

An optimisation in the inline versions of these functions is that they do not set errno. Which can be a problem for some codes, but most codes don't read errno.