2012-04-05

Concurrency vs. parallelism


Parallelism refers to physically simultaneous execution.  When you raise both arms above your head, you do so in parallel.  The degree of parallelism can never exceed the number of execution agents simultaneously available: on an 8-core system, you can add up 8 rows of a matrix in parallel, but not 16.  Furthermore, some problems cannot be parallelized at all, others can be executed entirely in parallel, and there are many cases in between.
Concurrency refers to conceptually simultaneous execution.  When you juggle balls, you are executing a concurrent program: despite appearances, jugglers only throw or catch one ball at a time.  A concurrent program can execute on a single execution agent, on as many agents as it has concurrent components, or anything in between, so concurrency does not depend on parallelism.  We usually speak of concurrency when there is interaction between the concurrent components.
Flow-based programming, of which a shell pipeline is the most familiar example, is based on the concurrent execution of mostly isolated processes that communicate via flows.  Because the individual components are isolated, they may be executed in parallel or not.  Because there are no constraints on what components may do with their inputs, parallel execution may or may not actually speed up the computation.

5 comments:

Barry Kelly said...

Some will not accept your definitions until you also include non-determinism; specifically, that concurrency is inherently non-deterministic.

John Cowan said...

Juggling is completely deterministic.

John Cowan said...

By "conceptually simultaneous" I mean that we understand or imagine concurrent processes as if they executed truly simultaneously. Consider a simple pipeline of three stages: one that reads from a file and breaks it into lines, writing each line to its output, one that changes all upper-case letters in each input line to lower case, and one that writes its input lines to a file. These can execute serially on a single processor, but constitute a concurrent program with three loops. It would be possible to write this program without concurrency, for example by using an event loop handling events like "Block successfully read from file" and "Block successfully written to file", but it would be far harder to understand.

Consider adding up the rows of a matrix of size 8 x N on an 8-core CPU. This can execute entirely in parallel because of the limited form its concurrency takes; it is a so-called embarrassingly parallel problem. In particular, there would be no point in making it concurrent if parallel execution engines were not available. The upper-casing example above can be executed in parallel as well, by taking separate parts of the input and writing to separate parts of the output, with a gain in speed (which is the point of parallelism).
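
A sketch of the row-sum example in the same vein, assuming the matrix is represented as a slice of rows: each row gets its own goroutine, the rows share nothing, and with 8 cores all 8 of them can run at once.

    package main

    import (
        "fmt"
        "sync"
    )

    // rowSums adds up each row independently; each goroutine writes only
    // its own slot in sums, so no coordination between rows is needed.
    func rowSums(m [][]int) []int {
        sums := make([]int, len(m))
        var wg sync.WaitGroup
        for i, row := range m {
            wg.Add(1)
            go func(i int, row []int) {
                defer wg.Done()
                for _, v := range row {
                    sums[i] += v
                }
            }(i, row)
        }
        wg.Wait()
        return sums
    }

    func main() {
        m := [][]int{{1, 2, 3}, {4, 5, 6}, {7, 8, 9}}
        fmt.Println(rowSums(m)) // [6 15 24]
    }

On fewer cores the same program still runs correctly; it just realizes less of its concurrency as parallelism.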

Some problems gain no speedup from parallelization: matching regular expressions is one of them.

John Cowan said...

Actually, you'd get only trivial speedup or none at all unless the files were on different disks attached to different disk controllers. If so, that would be the source of the parallelism. Assuming the files were on the same disk, the only gain would be from avoiding context switches on the single CPU, which might be overwhelmed by the cost of locking the communication buffers. Pipelines can always be executed in parallel, but only some of them gain thereby: consider a similar three-stage pipeline in which the middle stage reverses the order of its inputs.
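
Such a middle stage might look like this (a sketch using the same channel-of-lines convention as the pipeline above): it cannot emit its first line until it has consumed its last, so the stages before and after it can never usefully overlap, no matter how many cores are available.

    // reverse buffers its entire input before producing any output.
    func reverse(in <-chan string, out chan<- string) {
        var buf []string
        for line := range in {
            buf = append(buf, line)
        }
        for i := len(buf) - 1; i >= 0; i-- {
            out <- buf[i]
        }
        close(out)
    }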

It is not known whether any problems are inherently sequential, but it's suspected that addition of arbitrarily large integers is. It could be performed concurrently with one process per digit, but each digit must feed a carry into the next digit to the left, making parallel execution useless.
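
A sketch of that carry chain, with the numbers represented as slices of decimal digits, least significant first: every iteration depends on the carry produced by the one before it, which is exactly the dependency that defeats parallel execution in this digit-by-digit scheme.

    // addDigits adds two equal-length digit slices (least significant
    // digit first).  Digit i of the sum cannot be computed until the
    // carry out of digit i-1 is known.
    // Example: addDigits([]int{9, 9, 9}, []int{1, 0, 0}) is
    // []int{0, 0, 0, 1}, i.e. 999 + 1 = 1000.
    func addDigits(a, b []int) []int {
        sum := make([]int, len(a)+1)
        carry := 0
        for i := range a {
            s := a[i] + b[i] + carry
            sum[i] = s % 10
            carry = s / 10
        }
        sum[len(a)] = carry
        return sum
    }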

John Cowan said...

Rigor is not my department.

Parallel programs are technically concurrent, but for the most part, their concurrency is trivial.