Tuesday, March 30, 2010

Tuesday, March 2, 2010

Compiler memory barriers

I've written in the past about memory barriers. Basically, a membar instruction ensures that other processors see memory operations in the order in which they appear in the source code. The obvious example is a mutex lock, where you want the memory operations that occurred while the lock was held to be visible to other processors before the store that releases the lock.

There's actually another kind of memory ordering: the order in which the compiler emits memory operations. Suppose you write:

  *a=1;
  *b=2;

If the compiler can determine that a and b do not alias, then there's no reason for it not to swap the stores to a and b if it thinks that will produce a better code pattern.
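
As a minimal sketch of a case where the compiler knows there is no aliasing, C99's restrict qualifier can be used to promise that two pointers refer to different objects. The function name store_pair is hypothetical and only serves as an illustration:

```c
/* Hypothetical helper, not from the original post.
   'restrict' promises the compiler that a and b never
   point to the same object, so the two stores are independent. */
void store_pair(int *restrict a, int *restrict b)
{
    *a = 1;   /* the compiler may emit this store second ...      */
    *b = 2;   /* ... and this one first; the final values match   */
}
```

A single-threaded caller cannot tell the difference, since both locations hold the expected values once the function returns. The order only matters to another thread or processor that is watching the two locations.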

The most portable way of enforcing this ordering is to put a call to a function that the compiler cannot see into (for example, one defined in a separate source file) between the two stores:

  *a=1;
  reorder_barrier();
  *b=2;

At the call, memory needs to be in the state that the program defines, so the compiler cannot defer the store to a past the call and cannot hoist the store to b above it.
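
One way to sketch this in a single source file is to make the call through a volatile function pointer, so that the compiler cannot assume which function is reached. This is an assumption made for illustration rather than the approach used in the post; the names reorder_barrier_impl and ordered_stores are hypothetical:

```c
/* Hypothetical names; a sketch of a function-call barrier in one file. */
static void reorder_barrier_impl(void) {}

/* The pointer is volatile, so the compiler must load it at each call
   and cannot assume that it still points at the empty function. */
static void (*volatile reorder_barrier)(void) = reorder_barrier_impl;

void ordered_stores(int *a, int *b)
{
    *a = 1;
    reorder_barrier();   /* the callee might read *a or write *b,  */
                         /* so neither store can cross the call    */
    *b = 2;
}
```

Putting the function in a separate source file achieves the same effect, as long as the build does not optimise across files.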

This is a great solution, but it incurs the overhead of a function call, and function calls can have significant costs. Some compilers provide intrinsics or special statements that enforce the desired memory ordering without a call. Sun Studio 12 Update 1 supports the GCC flavour:

  *a=1;
  asm volatile ("":::"memory");
  *b=2;
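
A common way to package this statement is a small macro. The sketch below assumes a GCC-compatible compiler; the macro name COMPILER_BARRIER, the struct message type, and the publish function are all hypothetical:

```c
/* GCC-style compiler barrier; the macro name is hypothetical.
   It emits no instructions, but tells the compiler that memory
   may be read or written at this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

struct message { int payload; int ready; };

void publish(struct message *m, int value)
{
    m->payload = value;
    COMPILER_BARRIER();  /* the compiler may not move stores across this */
    m->ready = 1;
}
```

This only constrains the compiler. On hardware that can reorder stores, a membar instruction may still be needed for other processors to see the payload before the flag.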

You can test the performance overhead using the following code:

void barrier(){}

int main()
{
  for (int i=0; i<1000000000;i++)
  {
    barrier();
  }
} 

On the test system this code took about 5 seconds to run. The alternative code is:

int main()
{
  for (int i=0; i<1000000000;i++)
  {
    asm volatile ("":::"memory");
  }
}

This code took under a second.
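
To compare the two loops within one program, a timing harness along the following lines could be used. It is a sketch under assumptions: the function names are hypothetical, clock() measures CPU time at coarse resolution, and the call goes through a volatile function pointer so that an optimising compiler cannot inline or remove it, which may cost slightly more than a direct call:

```c
#include <time.h>

/* Hypothetical names; not the code that produced the timings above. */
static void call_barrier_impl(void) {}
static void (*volatile call_barrier)(void) = call_barrier_impl;

/* CPU seconds spent making 'iterations' function calls. */
double time_call_loop(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        call_barrier();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU seconds spent executing 'iterations' compiler barriers. */
double time_asm_loop(long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        __asm__ __volatile__("" ::: "memory");
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both functions with 1000000000 iterations and printing the results would give a comparison similar to the one described above, although the absolute times depend on the system and the compiler flags.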