.. _using-concurrent:

Using Concurrent Haskell
------------------------

.. index::
   single: Concurrent Haskell; using

GHC supports Concurrent Haskell by default, without requiring a special
option or libraries compiled in a certain way. To get access to the
support libraries for Concurrent Haskell, just import
:base-ref:`Control.Concurrent.`. More information on Concurrent Haskell is
provided in the documentation for that module.
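
As a minimal sketch of what this looks like in practice, the following
program forks one Haskell thread and waits for it to finish. The names
``done`` and the messages printed are purely illustrative; only
``forkIO`` and the ``MVar`` operations come from the ``base`` library.

.. code-block:: haskell

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

    main :: IO ()
    main = do
      -- An empty MVar used only as a completion signal.
      done <- newEmptyMVar
      -- Fork a lightweight Haskell thread; no special compiler flag is needed.
      _ <- forkIO $ do
        putStrLn "worker: running"
        putMVar done ()
      -- Block the main thread until the worker has signalled completion.
      takeMVar done
      putStrLn "main: worker finished"

Waiting on the ``MVar`` matters here because the program exits when the
main thread returns, whether or not other threads are still running.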

Optionally, the program may be linked with the :ghc-flag:`-threaded` option (see
:ref:`options-linker`). This provides two benefits:

- It enables the :rts-flag:`-N ⟨x⟩` RTS option to be used, which allows threads
  to run in parallel on a multi-processor or multi-core machine. See
  :ref:`using-smp`.

- If a thread makes a foreign call (and the call is not marked
  ``unsafe``), then other Haskell threads in the program will continue
  to run while the foreign call is in progress. Additionally,
  ``foreign export``\ ed Haskell functions may be called from multiple
  OS threads simultaneously. See :ref:`ffi-threads`.
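
The second benefit can be illustrated with a small sketch. It assumes a
POSIX system where ``sleep`` is declared in ``unistd.h``; the name
``c_sleep`` and the tick loop are hypothetical and exist only for this
example. When the program is linked with ``-threaded``, the ticks should
keep appearing while the foreign call is in progress.

.. code-block:: haskell

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Control.Concurrent (forkIO, threadDelay)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (forM_)
    import Foreign.C.Types (CUInt (..))

    -- A blocking C call imported as "safe", i.e. not marked "unsafe".
    foreign import ccall safe "unistd.h sleep"
      c_sleep :: CUInt -> IO CUInt

    main :: IO ()
    main = do
      done <- newEmptyMVar
      -- This thread spends about two seconds inside the foreign call.
      _ <- forkIO $ do
        _ <- c_sleep 2
        putMVar done ()
      -- Meanwhile the main thread carries on doing Haskell work.
      forM_ [1 :: Int .. 3] $ \i -> do
        putStrLn ("tick " ++ show i)
        threadDelay 500000  -- microseconds
      takeMVar done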

The following RTS option affects the behaviour of Concurrent Haskell
programs:

.. index::
   single: RTS options; concurrent

.. rts-flag:: -C ⟨s⟩

    :default: 20 milliseconds

    Sets the context switch interval to ⟨s⟩ seconds.
    A context switch will occur at the next heap block allocation after
    the timer expires (a heap block allocation occurs every 4k of
    allocation). With ``-C0`` or ``-C``, context switches will occur as
    often as possible (at every heap block allocation).

.. _using-smp:

Using SMP parallelism
---------------------

.. index::
   single: parallelism
   single: SMP

GHC supports running Haskell programs in parallel on an SMP (symmetric
multiprocessor).

There's a fine distinction between *concurrency* and *parallelism*:
parallelism is all about making your program run *faster* by making use
of multiple processors simultaneously. Concurrency, on the other hand,
is a means of abstraction: it is a convenient way to structure a program
that must respond to multiple asynchronous events.

However, the two terms are certainly related. By making use of multiple
CPUs it is possible to run concurrent threads in parallel, and this is
exactly what GHC's SMP parallelism support does. But it is also possible
to obtain performance improvements with parallelism on programs that do
not use concurrency. This section describes how to use GHC to compile
and run parallel programs; :ref:`lang-parallel` describes the
language features that affect parallelism.

.. _parallel-compile-options:

Compile-time options for SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to make use of multiple CPUs, your program must be linked with
the :ghc-flag:`-threaded` option (see :ref:`options-linker`). Additionally, the
following compiler options affect parallelism:

.. ghc-flag:: -feager-blackholing
    :shortdesc: Turn on :ref:`eager blackholing <parallel-compile-options>`
    :type: dynamic
    :category:
    :noindex:

    Blackholing is the act of marking a thunk (lazy computation) as
    being under evaluation. It is useful for three reasons: firstly it
    lets us detect certain kinds of infinite loop (the
    ``NonTermination`` exception), secondly it avoids certain kinds of
    space leak, and thirdly it avoids repeating a computation in a
    parallel program, because we can tell when a computation is already
    in progress.

    The option :ghc-flag:`-feager-blackholing` causes each thunk to be
    blackholed as soon as evaluation begins. The default is "lazy
    blackholing", whereby thunks are only marked as being under
    evaluation when a thread is paused for some reason. Lazy blackholing
    is typically more efficient (by 1-2% or so), because most thunks
    don't need to be blackholed. However, eager blackholing can avoid
    more repeated computation in a parallel program, and this often
    turns out to be important for parallelism.

    We recommend compiling any code that is intended to be run in
    parallel with the :ghc-flag:`-feager-blackholing` flag.
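
    One way to apply the flag to a single module is an ``OPTIONS_GHC``
    pragma, as in the sketch below. The workload ``nfib`` is a made-up
    example, and ``par`` and ``pseq`` are taken from ``GHC.Conc`` so that
    the sketch depends only on ``base``. It assumes the program is linked
    with ``-threaded -rtsopts`` and run with something like ``+RTS -N2``.

    .. code-block:: haskell

        {-# OPTIONS_GHC -feager-blackholing #-}
        module Main (main) where

        import GHC.Conc (par, pseq)

        -- Deliberately naive recursion, used only as a CPU-bound workload.
        nfib :: Int -> Integer
        nfib n
          | n < 2     = 1
          | otherwise = nfib (n - 1) + nfib (n - 2)

        main :: IO ()
        main = do
          let a = nfib 30
              b = nfib 31
          -- Spark `a` for possible parallel evaluation, evaluate `b` on
          -- this thread, then combine the two results.
          print (a `par` (b `pseq` (a + b)))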

.. _parallel-options:

RTS options for SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to run a program on multiple processors: call
:base-ref:`Control.Concurrent.setNumCapabilities` from your program, or
use the RTS :rts-flag:`-N ⟨x⟩` option.

.. rts-flag:: -N ⟨x⟩
              -maxN ⟨x⟩

    Use ⟨x⟩ simultaneous threads when running the program.

    The runtime manages a set of virtual processors, which we call
    *capabilities*, the number of which is determined by the ``-N``
    option. Each capability can run one Haskell thread at a time, so the
    number of capabilities is equal to the number of Haskell threads
    that can run physically in parallel. A capability is animated by one
    or more OS threads; the runtime manages a pool of OS threads for
    each capability, so that if a Haskell thread makes a foreign call
    (see :ref:`ffi-threads`) another OS thread can take over that
    capability.

    Normally ⟨x⟩ should be chosen to match the number of CPU cores on
    the machine [1]_. For example, on a dual-core machine we would
    probably use ``+RTS -N2 -RTS``.

    Omitting ⟨x⟩, i.e. ``+RTS -N -RTS``, lets the runtime choose the
    value of ⟨x⟩ itself based on how many processors are in your
    machine.

    Omitting ``-N⟨x⟩`` entirely means ``-N1``.

    With ``-maxN⟨x⟩``, e.g. ``+RTS -maxN3 -RTS``, the runtime will choose
    at most ⟨x⟩ capabilities, further limited by the number of processors
    on the system. Omitting ⟨x⟩ is an error; if you need a default, use
    ``-N`` instead.

    Be careful when using all the processors in your machine: if some of
    your processors are in use by other programs, this can actually harm
    performance rather than improve it. Asking GHC to create more capabilities
    than the machine has hardware threads is almost always a bad idea.

    Setting ``-N`` also has the effect of enabling the parallel garbage
    collector (see :ref:`rts-options-gc`).

    The current value of the ``-N`` option is available to the Haskell
    program via ``Control.Concurrent.getNumCapabilities``, and it may be
    changed while the program is running by calling
    ``Control.Concurrent.setNumCapabilities``.
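
    A minimal sketch of the programmatic route follows. It assumes the
    program is linked with ``-threaded``, and it uses
    ``GHC.Conc.getNumProcessors`` to pick a value, which is only an
    approximation of what ``+RTS -N`` without a number asks the runtime
    to do.

    .. code-block:: haskell

        import Control.Concurrent (getNumCapabilities, setNumCapabilities)
        import GHC.Conc (getNumProcessors)

        main :: IO ()
        main = do
          -- Reflects the -N value given on the command line, if any.
          before <- getNumCapabilities
          putStrLn ("capabilities at startup: " ++ show before)
          -- Ask the runtime how many processors it can see.
          procs <- getNumProcessors
          -- Change the number of capabilities while the program is running.
          setNumCapabilities procs
          after <- getNumCapabilities
          putStrLn ("capabilities now: " ++ show after)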

The following options affect the way the runtime schedules threads on
CPUs:

.. rts-flag:: -qa

    Use the OS's affinity facilities to try to pin OS threads to CPU
    cores.

    When this option is enabled, the OS threads for a capability :math:`i` are
    bound to the CPU core :math:`i` using the API provided by the OS for setting
    thread affinity. For example, on Linux GHC uses ``sched_setaffinity()``.

    Depending on your workload and the other activity on the machine,
    this may or may not result in a performance improvement. We
    recommend trying it out and measuring the difference.

.. rts-flag:: -qm

    Disable automatic migration for load balancing. Normally the runtime
    will automatically try to schedule threads across the available CPUs
    to make use of idle CPUs; this option disables that behaviour. Note
    that migration only applies to threads; sparks created by ``par``
    are load-balanced separately by work-stealing.

    This option is probably only of use for concurrent programs that
    explicitly schedule threads onto CPUs with
    :base-ref:`Control.Concurrent.forkOn`.
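
    As a rough sketch of such a program, the following forks one thread
    per capability with ``forkOn`` and reports where each one ran. The
    layout of one worker per capability is an assumption made for
    illustration, not a recommendation; ``threadCapability`` is used only
    to show the result.

    .. code-block:: haskell

        import Control.Concurrent
          (forkOn, getNumCapabilities, myThreadId, threadCapability)
        import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
        import Control.Monad (forM, forM_)

        main :: IO ()
        main = do
          n <- getNumCapabilities
          -- Fork one worker per capability, each placed explicitly.
          dones <- forM [0 .. n - 1] $ \cap -> do
            done <- newEmptyMVar
            _ <- forkOn cap $ do
              -- Ask the runtime where this thread is running and
              -- whether it is pinned there.
              (actual, pinned) <- threadCapability =<< myThreadId
              putMVar done (cap, actual, pinned)
            return done
          -- Print from the main thread so that output is not interleaved.
          forM_ dones $ \done -> do
            (cap, actual, pinned) <- takeMVar done
            putStrLn ("requested " ++ show cap ++ ", running on "
                      ++ show actual ++ ", pinned: " ++ show pinned)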

Hints for using SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add the :rts-flag:`-s [⟨file⟩]` RTS option when running the program to see
timing statistics, which will help you tell whether your program got faster by
using more CPUs. If the user time is greater than the elapsed time, then
the program used more than one CPU. You should also run the program without
:rts-flag:`-N ⟨x⟩` for comparison.
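
The same comparison can be approximated from inside a program. The
sketch below is not a substitute for ``+RTS -s``: it measures process
CPU time with ``System.CPUTime.getCPUTime`` and wall-clock time with
``GHC.Clock.getMonotonicTime``, and the workload ``work`` and its sizes
are invented for the example. It assumes linking with
``-threaded -rtsopts`` and running with ``+RTS -N``.

.. code-block:: haskell

    import Control.Concurrent (forkIO, getNumCapabilities)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Exception (evaluate)
    import Control.Monad (forM)
    import Data.List (foldl')
    import GHC.Clock (getMonotonicTime)
    import System.CPUTime (getCPUTime)

    -- A simple CPU-bound workload: a strict sum of 1..n.
    work :: Int -> Int
    work n = foldl' (+) 0 [1 .. n]

    main :: IO ()
    main = do
      n <- getNumCapabilities
      wall0 <- getMonotonicTime   -- seconds
      cpu0  <- getCPUTime         -- picoseconds
      -- Run one worker per capability and wait for all of them.
      dones <- forM [1 .. n] $ \i -> do
        done <- newEmptyMVar
        _ <- forkIO (evaluate (work (20000000 + i)) >>= putMVar done)
        return done
      mapM_ takeMVar dones
      cpu1  <- getCPUTime
      wall1 <- getMonotonicTime
      let cpuSecs  = fromIntegral (cpu1 - cpu0) / 1.0e12 :: Double
          wallSecs = wall1 - wall0
      putStrLn ("CPU time:     " ++ show cpuSecs ++ " s")
      putStrLn ("elapsed time: " ++ show wallSecs ++ " s")

If the reported CPU time is noticeably larger than the elapsed time,
more than one CPU was busy during the run.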

The output of ``+RTS -s`` tells you how many "sparks" were created and
executed during the run of the program (see :ref:`rts-options-gc`),
which will give you an idea of how well your ``par`` annotations are
working.

GHC's parallelism support has improved in 6.12.1 as a result of much
experimentation and tuning in the runtime system. We'd still be
interested to hear how well it works for you, and we're also interested
in collecting parallel programs to add to our benchmarking suite.

.. [1] Whether hyperthreading cores should be counted or not is an open
       question; please feel free to experiment and let us know what results you
       find.