Tuesday, January 17, 2012

Separation of debug and executable

To reduce the size of shipped binaries it can be useful to separate the debug information into a separate file. This procedure is covered in the dbx manual. We can use objdump to extract the debug information and then to link the executable with the extracted data.

Here's a short example executable:

#include <stdio.h>
#include <math.h>

int main()
{
  double d=1.0;
  d = sin(d);
  printf("sin(1.0) = %f\n",d);
}

Compiled with debug:

$ cc -g hello.c -lm
$ ./a.out
sin(1.0) = 0.841471

We can debug this executable with dbx. Note that, in this case, we compiled without optimisation in order to get the best debug information. Doing this does potentially sacrifice some performance. We can follow the same procedure with optimised code.

$ dbx ./a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop in main
(2) stop in main
(dbx) run
Running: a.out
(process id 53296)
stopped in main at line 6 in file "hello.c"
    6     double d=1.0;
(dbx) step
stopped in main at line 7 in file "hello.c"
    7     d = sin(d);
(dbx) print d
d = 1.0
(dbx) cont
Reading libc_psr.so.1
sin(1.0) = 0.841471

First of all we are going to use objcopy to extract the debug information from ./a.out and place it into ./a.out.debug:

$ /usr/sfw/bin/gobjcopy --only-keep-debug ./a.out ./a.out.debug

Now we can strip a.out of debug information:

$ strip ./a.out

To prove that this has removed the debug information we can try running under dbx:

$ dbx  ./a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop in main
dbx: warning: 'main' has no debugger info -- will trigger on first instruction
(2) stop in main
(dbx) quit

Now we want to use objcopy to make a link between the executable and its debug information:

$ /usr/sfw/bin/gobjcopy --add-gnu-debuglink=./a.out.debug ./a.out

Now when we debug the executable we are back to full debug:

$ dbx ./a.out
Reading ld.so.1
Reading libm.so.2
Reading libc.so.1
(dbx) stop  in main
(2) stop in main
(dbx) run
Running: a.out
(process id 58837)
stopped in main at line 6 in file "hello.c"
    6     double d=1.0;
(dbx) next
stopped in main at line 7 in file "hello.c"
    7     d = sin(d);
(dbx) print d
d = 1.0
(dbx) cont
Reading libc_psr.so.1
sin(1.0) = 0.841471

execution completed, exit code is 0
(dbx) quit

Friday, January 13, 2012

C++ and inline templates

A while back I wrote an article on using inline templates. It's a bit of a niche article as I would generally advise people to write in C/C++, and tune the compiler flags and source code until the compiler generates the code that they want to see.

However, one thing that I didn't mention in the article, it's implied but not stated, is that inline templates are defined as C functions. When used from C++ they need to be declared as extern "C", otherwise you get linker errors. Here's an example template:

.inline nothing
  nop
.end

And here's some code that calls it:

void nothing();

int main()
{
  nothing();
}

The code works when compiled as C, but not as C++:

$ cc i.c i.il
$ ./a.out
$ CC i.c i.il
Undefined                       first referenced
 symbol                             in file
void nothing()                   i.o
ld: fatal: Symbol referencing errors. No output written to a.out

To fix this, and make the code compilable with both C and C++ we use the __cplusplus feature test macro and conditionally include extern "C". Here's the modified source:

#ifdef __cplusplus
  extern "C"
  {
#endif
    void nothing();
#ifdef __cplusplus
  }
#endif

int main()
{
  nothing();
}

Thursday, January 12, 2012

Please mind the gap

I find the timeline view in the Performance Analyzer incredibly useful, but I've often been puzzled by what causes the gaps - like those in the example below:

Timeline view

One of my colleagues pointed out that it is possible to figure out what is causing the gaps. The call stack is indicated by the event after the gap. This makes sense. The Performance Analyzer works by sending a profiling signal to the thread multiple times a second. If the thread is not scheduled on the CPU then it doesn't get a signal. The first thing that the thread does when it is put back onto the CPU is to respond to those signals that it missed. Here's some example code so that you can try it out.

#include <stdio.h>

void write_file()
{
  char block[8192];
  FILE * file = fopen("./text.txt", "w");
  for (int i=0;i<1024; i++)
  {
    fwrite(block, sizeof(block), 1, file);
  }
  fclose(file);
}

void read_file()
{
  char block[8192];
  FILE * file = fopen("./text.txt", "rw");
  for (int i=0;i<1024; i++)
  {
    fread(block,sizeof(block),1,file);
    fseek(file,-sizeof(block),SEEK_CUR);
    fwrite(block, sizeof(block), 1, file);
  }
  fclose(file);
}

int main()
{
  for (int i=0; i<100; i++)
  {
    write_file();
    read_file();
  }
}

This is the code that generated the timeline shown above, so you know that the profile will have some gaps in it. If we select the event after the gap we determine that the gaps are caused by the application either opening or closing the file.

_close

But that is not all that is going on, if we look at the information shown in the Timeline details panel for the Duration of the event we can see that it spent 210ms in the "Other Wait" micro state. So we've now got a pretty clear idea of where the time is coming from.

Monday, January 9, 2012

A static function, an inline function, and a static variable walked into a bar....

... well, not really. Hacking around with some library code, so I thought I'd write up a quick refresher on scoping. Steve Clamage and I cover scoping in more detail in the series on libraries and linking. For the code I was working on today, the problem was much more limited.

I had a single file containing all the source code. I wanted to export only the minimal number of symbols that were needed to act as an interface for the library. You can imagine it being something like:

#include <stdio.h>

int count=0;

inline void printcount()
{
  printf("Count = %i\n",count);
  asm("nop");
}

void next()
{
  count++;
  printcount();
}

If I compile this, and then use nm to inspect the resulting library, I can see a global symbol for count. The function printcount() is defined with local scope. However, the only interface I want to export is next().

bash-3.00$ cc -g -G -O -o libt.so t.c
bash-3.00$ nm libt.so|grep GLOB
...
[45]    |     66468|       4|OBJT |GLOB |0    |11     |count
[43]    |       724|      40|FUNC |GLOB |0    |5      |next
[42]    |         0|       0|FUNC |GLOB |0    |UNDEF  |printf
bash-3.00$ nm libt.so |grep count
[44]    |     66460|       4|OBJT |GLOB |0    |11     |count
[32]    |       672|      52|FUNC |LOCL |0    |5      |printcount

So I can define count as a static variable, and that reduces its scope to the file in which it is defined. However, this does not actually make it disappear, it is still there, but with name mangling:

bash-3.00$ nm libt.so|grep count
[40]    |     66476|       4|OBJT |GLOB |0    |11     |$XAS4IkBuA_CPGtc.count
[33]    |       688|      52|FUNC |LOCL |0    |5      |printcount

The reason for this is that I'm building with debug (-g). With debug, I get a local version of the routine printcount(), and I get a globalised version of the variable count. If I remove -g, I get the following output from nm:

bash-3.00$ nm libt.so|grep count
[29]    |     66316|       4|OBJT |LOCL |0    |11     |count
[36]    |         0|       0|FUNC |GLOB |0    |UNDEF  |printcount

The variable count has local scope, which is what we expected - it is no longer exported from the file, so we have avoided possible name conflicts there. However, printcount() is now no longer defined. That might be ok so long as we don't actually call the routine:

bash-3.00$ dis libt.so|grep printcount
printcount()
         2e4:  7f ff ff ef  call        printcount      ! 0x2a0

Oops. We've hit the rule about needing to provide an extern version of any inline functions. Once again, I suggest parsing Douglas Walls' discussion of the topic for the gory details. Anyhow, the upshot is that this library wouldn't work. The fix is trivial, declare printcount() to be static inline, and the compiler will generate the local version of the function:

bash-3.00$ cc -G -O -o libt.so t.c
bash-3.00$ nm libt.so |grep count
[29]    |     66448|       4|OBJT |LOCL |0    |11     |count
[30]    |       664|      52|FUNC |LOCL |0    |5      |printcount

With these fixes the library no longer exports any functions but the ones I left with external linkage. This substantially reduces the risk of "undefined behaviour".

Understanding binary size

One of my colleagues, Miriam Blatt, has written a great article about understanding the size of binary objects. This is worth a read because it describes both what goes into the objects and what tools you can use to discover this information.

What's inlined by -xlibmil

The compiler flag -xlibmil provides inline templates for some critical maths functions, but it comes with the optimisation that it does not set errno for these functions. The functions it inlines can vary from release to release, so it's useful to be able to see which functions are inlined, and determine whether you care that they don't set errno. You can see the list of functions using the command:

grep inline /compilerpath/prod/lib/libm.il
        .inline sqrtf,1
        .inline sqrt,2
        .inline ceil,2
        .inline ceilf,1
        .inline floor,2
        .inline floorf,1
        .inline rint,2
        .inline rintf,1
...

From a cursory glance at the list I got when I did this just now, I can only see sqrt as a function that sets errno. So if you use sqrt and you care about whether it set errno, then don't use -xlibmil.