2012-04-05

Concurrency vs. parallelism


Parallelism refers to physically simultaneous execution.  When you raise both arms above your head, you do so in parallel.  A program can be no more parallel than the number of execution agents simultaneously available to it: on an 8-core system, you can add up 8 rows of a matrix in parallel, but not 16.  Furthermore, some problems cannot be parallelized at all, others can be executed entirely in parallel, and there are many cases in between.
Concurrency refers to conceptually simultaneous execution.  When you juggle balls, you are executing a concurrent program: despite appearances, jugglers throw or catch only one ball at a time.  A concurrent program can execute on a single execution agent, on as many agents as it has concurrent components, or anything in between, so concurrency does not depend on parallelism.  We usually speak of concurrency when there is interaction between the concurrent components.
Flow-based programming, of which a shell pipeline is the most familiar example, is based on the concurrent execution of mostly isolated processes that communicate via flows.  Because the individual components are isolated, they may be executed in parallel or not.  Because there are no constraints on what components may do with their inputs, parallel execution may or may not actually speed up the computation.
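
Here is a minimal sketch of such a pipeline in Go (the language, stage names, and channel plumbing are illustrative choices, not anything prescribed by flow-based programming): three isolated stages that communicate only through their input and output flows, and that the runtime is free to schedule in parallel or not.

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "strings"
    )

    // reader breaks standard input into lines, writing each to its output flow.
    func reader(out chan<- string) {
        scan := bufio.NewScanner(os.Stdin)
        for scan.Scan() {
            out <- scan.Text()
        }
        close(out)
    }

    // lower converts each input line to lower case and passes it on.
    func lower(in <-chan string, out chan<- string) {
        for line := range in {
            out <- strings.ToLower(line)
        }
        close(out)
    }

    // writer copies its input lines to standard output.
    func writer(in <-chan string, done chan<- struct{}) {
        for line := range in {
            fmt.Println(line)
        }
        close(done)
    }

    func main() {
        lines, lowered := make(chan string), make(chan string)
        done := make(chan struct{})
        go reader(lines)
        go lower(lines, lowered)
        go writer(lowered, done)
        <-done // wait for the last stage to finish
    }

Each stage knows nothing about the others except the shape of its flows, which is exactly what makes the parallel-or-not question an implementation detail.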

9 comments:

Barry Kelly said...

Some will not accept your definitions until you also include non-determinism; specifically, that concurrency is inherently non-deterministic.

John Cowan said...

Juggling is completely deterministic.

Unknown said...

Can you define your terms?

"Concurrency refers to conceptually simultaneous execution." What does "conceptually simultaneous" mean?

"We usually speak of concurrency when there is interaction between the concurrent components." Any but the most trivial parallel programs have interaction between the components. How does this distinguish parallel from concurrent?

Can you give an example of a parallel program that is not concurrent? One that is concurrent but not parallel?

John Cowan said...

By "conceptually simultaneous" I mean that we understand or imagine concurrent processes as if they executed truly simultaneously. Consider a simple pipeline of three stages: one that reads from a file and breaks it into lines, writing each line to its output, one that changes all upper-case letters in each input line to lower case, and one that writes its input lines to a file. These can execute serially on a single processor, but constitute a concurrent program with three loops. It would be possible to write this program without concurrency, for example by using an event loop handling events like "Block successfully read from file" and "Block successfully written to file", but it would be far harder to understand.

Consider adding up the rows of a matrix of size 8 x N on an 8-core CPU. This can execute entirely in parallel because of the limited form its concurrency takes; it is a so-called embarrassingly parallel problem. In particular, there would be no point in making it concurrent if parallel execution agents were not available. The upper-casing example above can be executed in parallel as well, by having each agent take a separate part of the input and write to a separate part of the output, with a gain in speed (which is the point of parallelism).
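
A minimal sketch of that row-summing example in Go (the matrix size and contents are illustrative): one goroutine per row, and no row interacts with any other until the final join.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        const rows, cols = 8, 1000

        // Build an 8 x N matrix with arbitrary contents.
        matrix := make([][]int, rows)
        for i := range matrix {
            matrix[i] = make([]int, cols)
            for j := range matrix[i] {
                matrix[i][j] = i + j
            }
        }

        // One goroutine per row; each writes only its own slot of sums,
        // so the rows never communicate while they run.
        sums := make([]int, rows)
        var wg sync.WaitGroup
        for i := range matrix {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                for _, v := range matrix[i] {
                    sums[i] += v
                }
            }(i)
        }
        wg.Wait()
        fmt.Println(sums)
    }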

Some problems gain no speedup from parallelization: matching regular expressions is one of them.

Unknown said...

Your pipeline could also be executed by three processes running in parallel. After a little startup for filling the pipeline you'd get a decent speedup, of course depending on the granularity of the three activities.

So that example does not distinguish between concurrency and parallelism.

You say that regular expression matching is not parallel. Is it concurrent? I'm guessing not.

So I still don't see the difference between concurrent and parallel. Every parallel program is concurrent, and every concurrent program can be done in parallel, right?

John Cowan said...

Actually, you'd get only trivial speedup or none at all unless the files were on different disks attached to different disk controllers. If so, that would be the source of the parallelism. Assuming the files were on the same disk, the only gain would come from avoiding context switching on the single CPU, and it might be overwhelmed by the cost of locking the communication buffers. Pipelines can always be executed in parallel, but only some of them gain thereby: consider a similar three-stage pipeline in which the middle stage reverses the order of its inputs.
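
Such a reversing stage might look like the following sketch in Go, written to slot in where the lower-casing stage sits in the pipeline sketch near the top of this post (the channel conventions are carried over from there): because it must buffer its whole input before emitting anything, the stages upstream and downstream of it can never usefully overlap with it.

    // reverse buffers its entire input flow before emitting anything,
    // so no downstream work can proceed until all upstream work is done.
    func reverse(in <-chan string, out chan<- string) {
        var buf []string
        for line := range in {
            buf = append(buf, line)
        }
        for i := len(buf) - 1; i >= 0; i-- {
            out <- buf[i]
        }
        close(out)
    }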

It is not known whether any problems are inherently sequential, but it's suspected that addition of arbitrarily large integers is. It could be performed concurrently with one process per digit, but each digit must feed a carry into the next digit to the left, making parallel execution useless.
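
A minimal sketch in Go of that one-process-per-digit scheme (the digit encoding and values are illustrative): every digit position is its own concurrent component, yet each blocks until its right-hand neighbour delivers a carry, so the executions form a chain and extra cores buy nothing.

    package main

    import "fmt"

    func main() {
        a := []int{9, 9, 9, 1} // 1999, least significant digit first
        b := []int{2, 0, 0, 0} // 2, padded to the same length
        n := len(a)
        sum := make([]int, n+1)

        // carries[i] holds the carry flowing into digit position i.
        carries := make([]chan int, n+1)
        for i := range carries {
            carries[i] = make(chan int, 1)
        }
        carries[0] <- 0 // nothing flows into the lowest digit

        done := make(chan struct{})
        for i := 0; i < n; i++ {
            go func(i int) {
                c := <-carries[i] // blocks until the digit to the right is done
                d := a[i] + b[i] + c
                sum[i] = d % 10
                carries[i+1] <- d / 10
            }(i)
        }
        go func() {
            sum[n] = <-carries[n] // the final carry becomes the top digit
            close(done)
        }()
        <-done
        fmt.Println(sum) // [1 0 0 2 0], i.e. 2001
    }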

Unknown said...

I'm not sure that that answers my question.

Is parallelism a subset of concurrency? Is it the other way around?

Can you define either concept?

I think those nouns are not definable, but we can define the adjectives "concurrent" and "parallel". But I'm not sure what they apply to. Is an algorithm concurrent? Is a program concurrent? Is the execution of a program concurrent?

I would really appreciate a sort-of rigorous definition.

John Cowan said...

Rigor is not my department.

Parallel programs are technically concurrent, but for the most part, their concurrency is trivial.

Unknown said...

Now you have me completely confused. Take everyone's favourite parallel program: quicksort. What do you consider its concurrency, and why is it trivial?