Sunday, August 8, 2010

I want my CMT

One problem with many parallel applications is that they don't scale to large numbers of threads. There are plenty of reasons why this might be the case. Perhaps the amount of work that needs to be done is insufficient for the number of threads being used.

On the other hand there are plenty of examples where codes scale to huge numbers of threads. Often they are called 'embarrassingly parallel codes', as if writing scaling codes is something to be ashamed of. The other term for this is 'delightfully parallel' which I don't really find any better!

So we have some codes that scale, and some that don't. Why don't all codes scale? There's a whole bunch of reasons:

  • Hitting some hardware constraint, like bandwidth. Adding more cores doesn't remove the constraint - although adding more processors, or systems might
  • Insufficient work. If the problem is too small, there just is not enough work to justify multiple threads
  • Algorithmic constraints or dependencies. If the code needs to calculate A and then B, there is no way that A and B can be calculated simultaneously.

These are all good reasons for poor scaling. But I think there's also another one that is, perhaps, less obvious. And that is access to machines with large numbers of cores.

Perhaps five years ago it was pretty hard to get time on a multicore system. The situation has completely reversed now. Obviously if a code is developed on a system with a single CPU, then it will run best on that kind of system. Over time applications are being "tuned" for multicore, but we're still looking at the 4-8 thread range in general. I would expect that to continue to change as access to large systems becomes more common place.

I'm convinced that as access to systems with large numbers of threads becomes easier, the ability of applications to utilise those threads will also increase. So all those applications that currently max out at eight threads, will be made to scale to sixteen, and beyond.

This is at the root of my optimism about multicore in general. Like many things, applications "evolve" to exploit the resources that are provided. You can see this in other domains like video games, where something new comes out of nowhere, and you are left wondering "How did they make the hardware do that?".

I also think that this will change attitudes to parallel programming. It has a reputation of being difficult. Whilst I agree that it's not a walk in the park, not all parallel programming is equally difficult. As developers become more familiar with it, coding style will evolve to avoid the common problems. Hence as it enters the mainstream, its complexity will be more realistically evaluated.