A roughcut of Multicore Application Programming has been uploaded to Safari books. If you have access you can read it, and provide feedback or comments. If you don't have access to Safari, you can still see the table of contents, read the preface, and view the start of each chapter.
Wednesday, July 28, 2010
Monday, July 26, 2010
What does take_deferred_signal() mean in my profile?
Every so often you'll see take_deferred_signal() appear in the profile of an application, sometimes as quite a time-consuming function. So, what does it mean?
It actually comes from signal handling code in libc. If a signal comes in while the application is in a critical section, the signal gets deferred until the critical section is complete. When the application exits the critical section, all the deferred signals get taken.
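To make the deferral idea concrete, here is a minimal user-level sketch of it. This is not the libc implementation: the names enter_critical(), exit_critical(), on_signal(), process_signal() and demo_deferred_signal() are all hypothetical, SIGUSR1 is chosen arbitrarily, and the model only remembers a single pending signal.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical model of deferred signals; NOT the actual libc code. */
static volatile sig_atomic_t in_critical = 0; /* nonzero inside a critical section */
static volatile sig_atomic_t pending = 0;     /* deferred signal number, 0 if none */
static volatile sig_atomic_t handled = 0;     /* number of signals actually processed */

/* The real work of handling a signal; here it just counts. */
static void process_signal(int sig)
{
    (void)sig;
    handled++;
}

/* Installed handler: defer the signal if we are in a critical section. */
static void on_signal(int sig)
{
    if (in_critical) {
        pending = sig;   /* remember it for later (only one is kept) */
        return;
    }
    process_signal(sig);
}

static void enter_critical(void)
{
    in_critical = 1;
}

/* Leaving the critical section is where a deferred signal gets taken. */
static void exit_critical(void)
{
    assert(in_critical);  /* must be paired with enter_critical() */
    in_critical = 0;
    if (pending) {
        int sig = pending;
        pending = 0;
        process_signal(sig);
    }
}

/* Raises a signal inside a critical section and reports how many signals
   had been processed before and after leaving it. Returns 0 on success. */
static int demo_deferred_signal(int *inside, int *after)
{
    handled = 0;
    if (signal(SIGUSR1, on_signal) == SIG_ERR)
        return -1;
    enter_critical();
    raise(SIGUSR1);       /* handler runs now, but only records the signal */
    *inside = handled;
    exit_critical();      /* the deferred signal is processed here */
    *after = handled;
    return 0;
}
```

Calling demo_deferred_signal() from main() should report no signals processed while inside the critical section and one processed after leaving it. In this model, the work done at the end of exit_critical() plays the role that take_deferred_signal() plays in the profile.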
Typically, this function becomes hot due to mutex locks in malloc and free, but other library calls can also cause it. The way to diagnose what is happening is to examine the call stack. So let's run through an example. Here is some multithreaded malloc/free heavy code.
#include <stdlib.h>
#include <pthread.h>

void *work( void* param )
{
  while ( 1 ) { free( malloc(100) ); }
}

int main()
{
  pthread_t thread;
  pthread_create( &thread, 0, work, 0 );
  for ( int i=0; i<10000000; i++ )
  {
    free ( malloc (100) );
  }
}
Profiling, we can see that take_deferred_signal() is the hottest function. The other hot functions would probably give us a clue as to the problem, but that is an artifact of the rather simple demonstration code.
Excl.      Incl.      Name
User CPU   User CPU
 sec.       sec.
36.456     36.456     <Total>
14.210     14.210     take_deferred_signal
 4.203     21.265     mutex_lock_impl
 3.082      3.082     clear_lockbyte
 2.872     17.062     mutex_trylock_adaptive
The next thing to look at is the call stack for take_deferred_signal(), as this will tell us who is calling the function.
Attr.      Name
User CPU
 sec.
14.210     do_exit_critical
14.210    *take_deferred_signal
do_exit_critical() doesn't tell us anything new; we already know that the function is called when the code exits a critical section. Continuing up the call stack we find:
Attr.      Name
User CPU
 sec.
14.190     mutex_trylock_adaptive
 0.020     mutex_unlock
 0.       *do_exit_critical
14.210     take_deferred_signal
This is more useful: we now know that the time is spent in mutex locks, but we don't know who is using those mutex locks. In this case the bulk of the time comes from mutex_trylock_adaptive(), so that is the routine to investigate:
Attr.      Name
User CPU
 sec.
17.062     mutex_lock_impl
 2.872    *mutex_trylock_adaptive
14.190     do_exit_critical
We're still in the mutex lock code, so we need to find who is calling the mutex locks:
Attr.      Name
User CPU
 sec.
11.938     free
 9.327     malloc
 4.203    *mutex_lock_impl
17.062     mutex_trylock_adaptive
So we finally discover that the time is due to calls to mutex locks in malloc() and free().
Friday, July 9, 2010
White paper on using Oracle Solaris Studio
I contributed a fair amount of material to a recent white paper about Oracle Solaris Studio. The paper is available for download from the developer portal.
Optimizing Applications with Oracle Solaris Studio Compilers and Tools
Oracle Solaris Studio delivers a fully integrated development platform for generating robust high-performance applications for the latest Oracle Sun systems (SPARC and x86). In order to take full advantage of the latest multicore systems, applications must be compiled for optimal performance and tuned to exploit the capabilities of the hardware. Learn how Oracle Solaris Studio helps you generate the highest performance applications for your target platform, from selecting the right compiler flags and optimization techniques to simplifying development with advanced multicore tools.
Presenting at Oracle Develop
As part of Oracle Develop, I'll be presenting at the Hotel Nikko, in San Francisco on 20th September at 4pm. The session is S317573 titled "Multicore Application Programming with Oracle Solaris Studio". The abstract reads as follows:
Writing correct and fast parallel applications is often considered a hard problem. However, it doesn't need to be that way. This session will describe how Oracle Solaris Studio can be used to produce applications that are both fast and correct. The talk will cover parallelization strategies, implementation details, and common pitfalls, as well as describing how the tools provided by Oracle Solaris Studio can identify coding errors and performance opportunities in the application.
Thursday, July 8, 2010
Multicore application programming: update
It's 2am and I've just handed over the final manuscript for Multicore Application Programming. Those who know publishing will realise that this is not the final step. The publishers will lay out my text and send it back to me for a final review before it goes to press. It will probably take a few weeks to complete the process.
I've also uploaded the final version of the table of contents. I wrote the book using OpenOffice.org, so the pages in my draft almost certainly won't map one-to-one onto the pages in the finished book. But I expect the page count to be roughly the same, somewhere around 370 pages of text. It will be interesting to see what happens when it is properly typeset.