Monday, July 26, 2010

What does take_deferred_signal() mean in my profile?

Every so often you'll see take_deferred_signal() appear in the profile of an application. Sometimes as quite a time consuming function. So, what does it mean?

It actually comes from signal handling code in libc. If a signal comes in while the application is in a critical section, the signal gets deferred until the critical section is complete. When the application exits the critical section, all the deferred signals get taken.

Typically, this function becomes hot due to mutex locks in malloc and free, but other library calls can also cause it. The way to diagnose what is happening is to examine the call stack. So let's run through an example. Here is some multithreaded malloc/free heavy code.

#include <stdlib.h>
#include <pthread.h>

void *work( void* param )
  while ( 1 ) { free( malloc(100) ); }

int main()
  pthread_t thread;
  pthread_create( &thread, 0, work, 0 );
  for ( int i=0; i<10000000; i++ )
    free ( malloc (100) );

Profiling, we can see that take_deferred_signal() is the hottest function. The other hot functions would probably give us a clue as to the problem, but that is an artifact of the rather simple demonstration code.

Excl.     Incl.      Name
User CPU  User CPU
  sec.      sec.
36.456    36.456     <Total>
14.210    14.210     take_deferred_signal
 4.203    21.265     mutex_lock_impl
 3.082     3.082     clear_lockbyte
 2.872    17.062     mutex_trylock_adaptive

The next thing to look at is the call stack for take_deferred_signal() as this will tell us who is calling the function.

Attr.      Name
User CPU
14.210     do_exit_critical
14.210    *take_deferred_signal

do_exit_critical() doesn't tell us anything, we already know that it is called when the code exits a critical section. Continuing up the call stack we find:

Attr.      Name
User CPU
14.190     mutex_trylock_adaptive
 0.020     mutex_unlock
 0.       *do_exit_critical
14.210     take_deferred_signal

Which is more useful, we now know that the time is spent in mutex locks, but we don't know the user of those mutex locks. In this case the bulk of the time comes from mutex_trylock_adaptive(), so that is the routine to investigate:

Attr.      Name
User CPU
17.062     mutex_lock_impl
 2.872    *mutex_trylock_adaptive
14.190     do_exit_critical

So we're still in the mutex lock code, we need to find who is calling the mutex locks:

Attr.      Name
User CPU
11.938     free
 9.327     malloc
 4.203    *mutex_lock_impl
17.062     mutex_trylock_adaptive

So we finally discover that the time is due to calls to mutex locks in malloc() and free().

No comments:

Post a Comment