Wednesday, June 30, 2010

The solution is multicore

Professor David Patterson wrote an interesting article in IEEE Spectrum, 'The trouble with multicore'. The tag line is "Chipmakers are busy designing microprocessors that most programmers can't handle". The thrust of the article is that multicore processors are a hardware development that software is poorly equipped to utilise.

There are two main arguments made in the article. The first is that programming languages are very poor at describing parallelism. There has been a long list of languages that were either designed to tackle parallelism or have had parallelism imposed upon them. To be fair parallel programming is littered with the ill-conceived corpses of languages that were meant to solve the problem. So his view is correct, but perhaps this is not relevant.

The second point he makes is that not all tasks break down to independent work. His example is that of ten reporters writing the same story, and not being able to write the story ten times faster because each section of text has to build on the previous sections. Again, this is true. There are some tasks that have implicit (or explicit) dependencies, but perhaps this is not relevant.

The example in his paper that best illustrates how multicore is the solution, and not the problem, is that of cloud computing. As he says "Expert programmers can take advantage of the task-level parallelism inherent in cloud computing.". Are you an expert programmer when you type a search term into Google? A lot of computation goes into finding the results for you, but they appear nearly instantly. It could be argued that Google put a considerable amount of effort into designing a system that produced results so quickly. Of course they did. However, they did it once, and its used for millions of search queries every day.

Observation 1: Many problems just need parallelising once. Or conversely, not every developer needs to worry about the parallelism – in the same way as not every developer on a project needs to worry about the GUI.

But this only addresses part of the argument. It is all very well using an anecdotal example to demonstrate that it is possible to utilise multiple cores, but that does not disprove Professor Patterson's argument.

Lets return to the example of the reporters. The way the reporters are working is perhaps not the best use of their resources. Much of the work of reporting is fact checking, talking to people, and gathering data. The writing part of this is only the final step in a long pipeline. Perhaps a better way of utilising the ten reporters would be during the data gathering stages, multiple people could be interviewed simultaneously, multiple sources consulted at the same time. On the other hand, a newspaper would rarely allocate more than a single reporter to a single story. More progress would be made if each reporter was working on a different story. So perhaps the critical observation is that dependencies within a task are an indication that parallelism needs to be discovered outside that task.

Observation 2: It is rare that there are no other ways of productively utilising compute resources. Meaning that given a number of cores, it is almost always possible to find work to keep them busy. For example, rendering a movie could have cores working on separate frames, or separate segments of the same frame. Sequencing genes could have multiple genes being examined simultaneously. Simulation models of different scenarios could be completed in parallel.

But, it can be argued that there are times when you need to do a single task, and you care how long that task takes to complete. So, lets consider exactly what problems we encounter during our day where we would benefit from a faster processor.

  • "I waited for my PC to boot.". Well booting a PC is pretty much a serial process, however, the boot time is largely dominated by disk access time rather than processor speed.
  • "I waited for my e-mail to download". Any downloading activity, be it e-mail or webpages is going to be dominated by network latency or bandwidth issues. There is undoubtedly some processor activity in the mix, but it is unlikely that a fast processor would make a noticeable difference to performance.
  • "I was watching a video when my virus scanner kicked in and caused the movie to stutter." Assuming it wasn't a disc activity, this is a great example of where having multiple cores will help rather than hinder. Two cores would allow the video to continue playing while the virus scanner did its work. This was, of course, the frequently given example of why multicore processors were a good thing – as if virus scanner were a desirable use of processor time!
  • "I was compiling an application and it took all afternoon." Some stages of compilation, like linking or crossfile optimisation, are inherently serial. But, unless the entire source code was placed into a single file, most projects have multiple source files, so these could be compiled in parallel. Again, the performance can be dominated by disk or network performance, so it is not entirely a processor performance issue.

These are a few situations where you might possibly feel frustration at the length of time a task takes. You may have plenty more. The point is that it is rare that there is no parallelism available, and no opportunity to make parallel progress on some other task.

Observation 3: There are very few day to day tasks that are actually limited by processor performance. Most tasks have substantial bottlenecks in other parts of the system (disk, network, speed of devices). If anything having multiple cores enables a system to remain useful while other compute tasks are completed.

All this discussion has not truly refuted Professor Patterson's observation that there exist problems which are inherently serial, or fiendishly difficult to parallelise. But that's ok. Most commonly encountered computational activities are either easy to parallelise, or there are ways of extracting parallelism at other levels.

But what of software? There is great allure to using threads on a multicore processor to deliver many times the performance of a single core processor. And this is the crux of the matter. Advances in computer languages haven't 'solved' this problem for us. It can still be hard, for some problems, to write parallel programs that are both functionally correct and scale well.

However, we don't all need to solve the hard problems. There are plenty of opportunities for exploiting parallelism in a large number of common problems, and in other situations there are opportunities for task level parallelism. This combination should cover 90+% of the problem space.

Perhaps there are 10% of problems that don't map well to multicore processors, but why focus on those when the other 90% do?


  1. You're 10% number was pulled out of your ass. I'd guess you got it backward--90%. You ever actually tried to write for multicore? LoseThos is as good as it gets for multicofre -- complete control in your hands as master/slave configuration on one address space for everything. You're practically on bare metal, so you have no excuses and one address space is optimal as well.

    One problem is asymetric tasks, like prefetching and uncompressing files to save I/O for compiling. You have balancing issues and usually one task is an order of magnitude bigger -- tasks rarely are of similar size.

    With asymetric, you also have to worry about how many cores your users will have and they don't scale. A major thing that's holding me back is concerns for computers with only two cores when I have eight.

    Games and many things are real-time. The only option is degrading the quality for fewer cores. This is not always possible.

    If it's off-line, not real-time, like compiling, it's often but not always easy. I think such tasks are by far the exception, but maybe I imagine more people are interested in games.

    My operating system, LoseThos, doesn't use graphics acceleration, so multicore games help a lot. If you have graphic acceleration, multicore matters a lot less for games because the CPU interacts serially with the GPU.

  2. RE: 'email' I don't know about you... but I've seen email client UI's become unresponsive while fetching lots of email. It reminds me of a web browser that becomes unresponsive due to a web page locking up the whole browser in a tab. You may not improve performance by going multiprocess, but you might improve responsiveness. Chrome is more responsive even when 1 tab is locked up because each tab has its own process. A mail client could improve much in the same way by fetching each mail account with its own process in the background.

  3. Some problems are solved faster with multiple CPUs (or Cores). Password cracking for example. 16crack automatically detects the number of CPUs and is much faster when used with multiple cores. But it only scales so much. Similar problems have multiple starting points. If you have multiple CPUs it only makes sense to have one CPU start working on the problem here and another start there while ideally not duplicating any work. Unfortunately, not all problems are so easy to do this with. For the reporters example, the problem to solve is writing stories. So long as you have one reporter per story, you are OK. One does weather, the other obituaries and local news, the other international news and politics, etc. That seems multithreaded already. You don't need 10 guys all doing weather.

  4. @xenoterracide. Yup. Plenty of experience with apps freezing etc. Chrome is a good example, MP works better for browsing because of the danger of tabs locking up. The main difference between MT and MP is that MT shares address space and MP doesn't. If the entire app locks up it's normally because of some locking issue. One thread should not cause another thread to stall unless there's some resource sharing going on there. That said, MP gives you robustness - so one tab doesn't take out the entire system.

  5. @TerryD. Your OS looks interesting.

    The point I'm making is that multicore is not solely about developing a single application that takes advantage of all those cores. Often just having multiple cores makes your machine more productive.

    So rather than looking at these machines and complaining that they aren't a single core machine with twice the clock speed, we should be celebrating that our systems have become more responsive, and provide more throughput.

  6. @Brad. Scaling is an interesting problem. I've been meaning to write up a post about that.