Thursday, July 14, 2011

Best practices for libraries and linkers (part 8)

Part 8 is the conclusion of the series on the best practices for libraries and linking. The core set of best practices are:

  • Ensure at link time that all symbols are resolved.
  • Minimise the number of symbols of global scope.
  • Specify the library search paths at link time.

Putting this series of articles together turned out to be a fair amount of work. Hopefully you can see from the scale of the topics why we chose to break it down into bite-sized chunks. I'll be happy to hear feedback on whether you found it useful, or what other topics you would like discussed.

Using symbol scoping. Libraries and linker best practices part 7

In general the compiler is going to scope symbols declared in object files as being global. This means that they can be seen and bound to by any object. There are two other settings for symbol scope - "symbolic" and "hidden".

Hidden scope is easiest to describe as it just means that the symbol can only be seen within the module and is not exported for applications or libraries to use. This is basically a locally defined symbol. There are multiple advantages to using hidden scoping when possible, it reduces the number of symbols that the linker needs to handle at runtime, so reduces start up time. It also reduces the number of names, so reduces the chance of duplicate names. Finally hidden symbols cannot be bound to externally, so they cannot cause a link order problem. This makes hidden scope a good choice for all those symbols that don't need to be exported.

The other option is symbolic scope. A symbol with symbolic scope is still available for other modules to bind to - so it is like a global symbol in that respect. However, a symbolic symbol can only be satisfied from within the library or application. So if I have an unresolved symbolic symbol foo() then that symbol can only bind within the library or application. So symbolic-scoped symbols avoid the cross-library issue that causes link order problems.

Symbols can be declared with their scoping; __global,__symbolic, or __hidden. We can also use the compiler flag -xldscope=<scope> to set the default scoping for all the symbols not otherwise scoped.

The details of all this are discussed much more thoroughly in Part 7 of the series.

The best practices for symbol scoping come in two flavours:

The easiest way of handling scoping is to declare all the defined symbols to have symbolic scoping (-xldscope=symbolic). This ensures that these symbols end up with local binding rather than pulling in definitions that are present in other libraries. The downside of this is that it could cause multiple definitions for the same symbol to become present in the address space of an application.

The other approach is to carefully define interfaces by declaring exported symbols to be __symbolic, so that other libraries can bind to them, but this library will bind to the local versions in preference. Then to declare imported symbols as __global which will ensure that the library can bind to an external definition for the symbol. Then finally use -xldscope=hidden to avoid further pollution of the name space. This is time consuming but reduces runtime link costs, and also increases the robustness of the application.

Setting the initialisation order for libraries (Best practices for libraries and linking part 6)

Part 5 of the series talked about diagnosing initialisation problems. These are situations where the libraries are loaded in the wrong order and this causes the application not to function correctly (or at all). Part 6 discusses how to resolve this problem.

The easiest, but the least reliable approach is to reorder the libraries on the link line until they get initialised in the right order. This is an easy fix since it is just a matter of changing the link line, but it's not reliable. There are various reasons why this is a poor fix. It is limited to just fixing the one application, and does not fix the root of the problem. It is not robust as a change in one of the libraries may cause the whole problem to recur. etc. Better fixes involve avoiding the duplicate symbol problem that causes the library load order to be indeterminate.

If the symbols are introduced because of C++ templates, then the -instlib=<library> flag causes the compiler not to generate symbols that are defined in the listed libraries.

Direct binding is another approach which records the exact library dependencies at link time so that the linker knows exactly which libraries are required, and hence can determine the appropriate load order. This has the downside that it enables different libraries to bind to different definitions of the same symbol, this could be a useful feature, but could also introduce problems.

Tuesday, July 12, 2011

Feature Test Macros

Feature test macros are a set of macros that are either:

  • Defined by the development environment indicating that the environment conforms to a particular standard

or

  • Defined by the source code for the application before the header files are included to indicate that the application requires a particular environment to build

The macros define what APIs are available, and what parameters are passed through the APIs. Adherence to a particular standard (like POSIX) will define a particular set of APIs, and define their parameters. A good example of this is on Solaris where munmap changes definition depending on what standards have been requested:

$ grep munmap /usr/include/sys/*.h
/usr/include/sys/mman.h:extern int munmap(void *, size_t);
/usr/include/sys/mman.h:extern int munmap(caddr_t, size_t);

The Linux man page for feature_test_macros includes useful source code (ftm.c) for reporting which feature test macros are set by default. This changes depending on the the OS and compiler used. One of the big differences between Linux and Solaris are the feature test macros that are set by default. Here's the output from the program compiled on a Linux box and a Solaris box - both using gcc.

Linux

$ gcc ftm.c
$ ./a.out
_POSIX_SOURCE defined
_POSIX_C_SOURCE defined: 200809L
_BSD_SOURCE defined
_SVID_SOURCE defined
_ATFILE_SOURCE defined

Solaris

$ gcc ftm.c
$ ./a.out
_LARGEFILE64_SOURCE defined
_FILE_OFFSET_BITS defined: 32

The list of standards that Solaris 10 adheres to is documented under man standards, the list for Linux is documented under man feature_test_macros.

Monday, July 11, 2011

OpenMP 3.1 specification released

OpenMP is a great way to produce parallel applications with the minimal amount of work. The 3.1 specification came out a couple of days ago. As should be apparent from the version number, its more incremental than significant. The significant changes I see are:

  • Support for min and max reductions in C/C++. This was a frustrating omission from the previous versions, so I'm pleased to see that fixed here.
  • Support for thread binding. The specification introduces OMP_PROC_BIND which binds threads to cores. This is rather similar to the original SUNW_MP_PROCBIND in Studio, which only took true or false, more recent compilers allow a much finer granularity of control. Still "true" or "false" is a good start!