.. _using-concurrent:

Using Concurrent Haskell
------------------------

.. index::
   single: Concurrent Haskell; using

GHC supports Concurrent Haskell by default, without requiring a special
option or libraries compiled in a certain way. To get access to the
support libraries for Concurrent Haskell, just import
:base-ref:`Control.Concurrent.`. More information on Concurrent Haskell is
provided in the documentation for that module.

Optionally, the program may be linked with the :ghc-flag:`-threaded` option (see
:ref:`options-linker`). This provides two benefits:

- It enables the :rts-flag:`-N ⟨x⟩` RTS option, which allows threads to run in
  parallel on a multi-processor or multi-core machine. See :ref:`using-smp`.

- If a thread makes a foreign call (and the call is not marked
  ``unsafe``), then other Haskell threads in the program will continue
  to run while the foreign call is in progress. Additionally,
  ``foreign export``\ ed Haskell functions may be called from multiple
  OS threads simultaneously. See :ref:`ffi-threads`.

The following RTS option(s) affect the behaviour of Concurrent Haskell
programs:

.. index::
   single: RTS options; concurrent

.. rts-flag:: -C ⟨s⟩

    :default: 20 milliseconds

    Sets the context switch interval to ⟨s⟩ seconds.
    A context switch will occur at the next heap block allocation after
    the timer expires (a heap block allocation occurs every 4k of
    allocation). With ``-C0`` or ``-C``, context switches will occur as
    often as possible (at every heap block allocation).

.. _using-smp:

Using SMP parallelism
---------------------

.. index::
   single: parallelism
   single: SMP

GHC supports running Haskell programs in parallel on an SMP (symmetric
multiprocessor).

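
As a minimal sketch (not taken from GHC's own examples), the program below
forks worker threads with ``forkIO`` from :base-ref:`Control.Concurrent.` and
collects their partial results through ``MVar``\ s. The names ``work``,
``spawn`` and ``parallelSumSquares`` are hypothetical. Built with
:ghc-flag:`-threaded` and run with ``+RTS -N``, the workers may run on
different cores; without those options they still run, but interleaved on a
single capability.

.. code-block:: haskell

    import Control.Concurrent (forkIO, getNumCapabilities)
    import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
    import Control.Exception (evaluate)

    -- Hypothetical CPU-bound task: the sum of squares of lo..hi.
    work :: Int -> Int -> Integer
    work lo hi = sum [y * y | x <- [lo .. hi], let y = toInteger x]

    -- Fork one worker for the range (lo, hi) and return the MVar that will
    -- receive its result. 'evaluate' forces the sum inside the worker thread,
    -- so the work is not left as a thunk for whoever reads the MVar.
    spawn :: (Int, Int) -> IO (MVar Integer)
    spawn (lo, hi) = do
      v <- newEmptyMVar
      _ <- forkIO (evaluate (work lo hi) >>= putMVar v)
      return v

    -- Split 1..n into k chunks (k >= 1), run one thread per chunk and add up
    -- the partial results; takeMVar blocks until each worker has finished.
    parallelSumSquares :: Int -> Int -> IO Integer
    parallelSumSquares k n = do
      let size   = n `div` k
          ranges = [ (i * size + 1, if i == k - 1 then n else (i + 1) * size)
                   | i <- [0 .. k - 1] ]
      vars <- mapM spawn ranges
      sum <$> mapM takeMVar vars

    main :: IO ()
    main = do
      caps  <- getNumCapabilities
      total <- parallelSumSquares caps 1000000
      putStrLn ("capabilities: " ++ show caps ++ ", result: " ++ show total)

Using one chunk per capability is only a starting point; the right split
depends on how evenly the work divides.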

There's a fine distinction between *concurrency* and *parallelism*:
parallelism is all about making your program run *faster* by making use
of multiple processors simultaneously. Concurrency, on the other hand,
is a means of abstraction: it is a convenient way to structure a program
that must respond to multiple asynchronous events.

However, the two terms are certainly related. By making use of multiple
CPUs it is possible to run concurrent threads in parallel, and this is
exactly what GHC's SMP parallelism support does. But it is also possible
to obtain performance improvements with parallelism on programs that do
not use concurrency. This section describes how to use GHC to compile
and run parallel programs; :ref:`lang-parallel` describes the
language features that affect parallelism.

.. _parallel-compile-options:

Compile-time options for SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In order to make use of multiple CPUs, your program must be linked with
the :ghc-flag:`-threaded` option (see :ref:`options-linker`). Additionally, the
following compiler option affects parallelism:

.. ghc-flag:: -feager-blackholing
    :shortdesc: Turn on :ref:`eager blackholing <parallel-compile-options>`
    :type: dynamic
    :category:
    :noindex:

    Blackholing is the act of marking a thunk (lazy computation) as
    being under evaluation. It is useful for three reasons: firstly, it
    lets us detect certain kinds of infinite loop (the
    ``NonTermination`` exception); secondly, it avoids certain kinds of
    space leak; and thirdly, it avoids repeating a computation in a
    parallel program, because we can tell when a computation is already
    in progress.

    The option :ghc-flag:`-feager-blackholing` causes each thunk to be
    blackholed as soon as evaluation begins. The default is "lazy
    blackholing", whereby thunks are only marked as being under
    evaluation when a thread is paused for some reason.
    Lazy blackholing is typically more efficient (by 1-2% or so),
    because most thunks don't need to be blackholed. However, eager
    blackholing can avoid more repeated computation in a parallel
    program, and this often turns out to be important for parallelism.

    We recommend compiling any code that is intended to be run in
    parallel with the :ghc-flag:`-feager-blackholing` flag.

.. _parallel-options:

RTS options for SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are two ways to run a program on multiple processors: call
:base-ref:`Control.Concurrent.setNumCapabilities` from your program, or
use the RTS :rts-flag:`-N ⟨x⟩` options.

.. rts-flag:: -N ⟨x⟩
              -maxN ⟨x⟩

    Use ⟨x⟩ simultaneous threads when running the program.

    The runtime manages a set of virtual processors, which we call
    *capabilities*, the number of which is determined by the ``-N``
    option. Each capability can run one Haskell thread at a time, so the
    number of capabilities is equal to the number of Haskell threads
    that can run physically in parallel. A capability is animated by one
    or more OS threads; the runtime manages a pool of OS threads for
    each capability, so that if a Haskell thread makes a foreign call
    (see :ref:`ffi-threads`) another OS thread can take over that
    capability.

    Normally ⟨x⟩ should be chosen to match the number of CPU cores on
    the machine [1]_. For example, on a dual-core machine we would
    probably use ``+RTS -N2 -RTS``.

    Omitting ⟨x⟩, i.e. ``+RTS -N -RTS``, lets the runtime choose the
    value of ⟨x⟩ itself based on how many processors are in your
    machine.

    Omitting ``-N⟨x⟩`` entirely means ``-N1``.

    With ``-maxN⟨x⟩``, e.g. ``+RTS -maxN3 -RTS``, the runtime will choose
    at most ⟨x⟩ capabilities, also limited by the number of processors on
    the system. Omitting ⟨x⟩ is an error; if you need a default, use
    ``-N`` instead.
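
    A quick way to check what ``-N`` or ``-maxN`` actually gave you is to
    print the capability count next to the processor count the runtime
    detected. This is a sketch assuming ``getNumProcessors`` from
    ``GHC.Conc`` (a GHC-specific module in ``base``):

    .. code-block:: haskell

        import Control.Concurrent (getNumCapabilities, rtsSupportsBoundThreads)
        import GHC.Conc (getNumProcessors)

        main :: IO ()
        main = do
          -- True only when the program was linked with -threaded
          putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
          caps  <- getNumCapabilities  -- set by -N / -maxN; 1 if neither is given
          procs <- getNumProcessors    -- processors the RTS detected
          putStrLn ("capabilities: " ++ show caps ++ ", processors: " ++ show procs)

    Assuming the binary is called ``prog``, running ``./prog +RTS -N2 -RTS``
    should report two capabilities under the threaded runtime.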

    Be careful when using all the processors in your machine: if some of
    your processors are in use by other programs, this can actually harm
    performance rather than improve it. Asking GHC to create more capabilities
    than your machine has hardware threads is almost always a bad idea.

    Setting ``-N`` also has the effect of enabling the parallel garbage
    collector (see :ref:`rts-options-gc`).

    The current value of the ``-N`` option is available to the Haskell
    program via ``Control.Concurrent.getNumCapabilities``, and it may be
    changed while the program is running by calling
    ``Control.Concurrent.setNumCapabilities``.

The following options affect the way the runtime schedules threads on
CPUs:

.. rts-flag:: -qa

    Use the OS's affinity facilities to try to pin OS threads to CPU
    cores.

    When this option is enabled, the OS threads for a capability :math:`i` are
    bound to the CPU core :math:`i` using the API provided by the OS for setting
    thread affinity. For example, on Linux GHC uses ``sched_setaffinity()``.

    Depending on your workload and the other activity on the machine,
    this may or may not result in a performance improvement. We
    recommend trying it out and measuring the difference.

.. rts-flag:: -qm

    Disable automatic migration for load balancing. Normally the runtime
    will automatically try to schedule threads across the available CPUs
    to make use of idle CPUs; this option disables that behaviour. Note
    that migration only applies to threads; sparks created by ``par``
    are load-balanced separately by work-stealing.

    This option is probably only of use for concurrent programs that
    explicitly schedule threads onto CPUs with
    :base-ref:`Control.Concurrent.forkOn`.
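
A minimal sketch of explicit placement with ``forkOn``: one thread is started
per capability, and each thread reports the capability it runs on, obtained
with ``threadCapability``. The function name ``pinnedWorkers`` is
hypothetical. Unlike ``forkIO`` threads, threads created with ``forkOn`` stay
on their capability, so the result is expected to be ``[0 .. n - 1]`` for
``n`` capabilities; whether :rts-flag:`-qm` or :rts-flag:`-qa` then helps is
something to measure for your own workload.

.. code-block:: haskell

    import Control.Concurrent
      (forkOn, getNumCapabilities, myThreadId, threadCapability)
    import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
    import Control.Monad (forM)

    -- Hypothetical helper: start one thread per capability with forkOn, so
    -- that thread i is placed on capability i, and collect the capability
    -- number each thread observes via threadCapability.
    pinnedWorkers :: IO [Int]
    pinnedWorkers = do
      n <- getNumCapabilities
      vars <- forM [0 .. n - 1] $ \i -> do
        v <- newEmptyMVar
        _ <- forkOn i $ do
          tid <- myThreadId
          (cap, _isPinned) <- threadCapability tid
          putMVar v cap
        return v
      mapM takeMVar vars

    main :: IO ()
    main = pinnedWorkers >>= print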

Hints for using SMP parallelism
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Add the :rts-flag:`-s [⟨file⟩]` RTS option when running the program to see
timing stats, which will help to tell you whether your program got faster by
using more CPUs or not. If the user time is greater than the elapsed time, then
the program used more than one CPU. You should also run the program without
:rts-flag:`-N ⟨x⟩` for comparison.

The output of ``+RTS -s`` tells you how many "sparks" were created and
executed during the run of the program (see :ref:`rts-options-gc`),
which will give you an idea of how well your ``par`` annotations are
working.

GHC's parallelism support was improved in GHC 6.12.1 as a result of much
experimentation and tuning in the runtime system. We'd still be
interested to hear how well it works for you, and we're also interested
in collecting parallel programs to add to our benchmarking suite.

.. [1] Whether hyperthreading cores should be counted or not is an open
   question; please feel free to experiment and let us know what results you
   find.
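
The spark counts in the ``+RTS -s`` output are easiest to interpret on a small
program whose parallelism comes only from ``par``. The sketch below is a
hypothetical example, not one from GHC's benchmark suite: it uses ``par`` and
``pseq`` from ``GHC.Conc`` (the ``parallel`` package's ``Control.Parallel``
exports the same functions) and stops sparking below a cut-off, since sparks
for very small computations tend to cost more than they save. Compile it with
:ghc-flag:`-threaded`, run it with ``+RTS -N -s``, and compare the spark and
timing figures with a run without :rts-flag:`-N ⟨x⟩`.

.. code-block:: haskell

    import GHC.Conc (par, pseq)

    -- Naive Fibonacci, deliberately expensive so that sparks have work to do.
    fib :: Int -> Integer
    fib n
      | n < 2     = toInteger n
      | otherwise = fib (n - 1) + fib (n - 2)

    -- Above the hypothetical cut-off, spark the left branch with `par`,
    -- evaluate the right branch with `pseq`, then combine. At or below the
    -- cut-off, fall back to plain fib so tiny computations are not sparked.
    parFib :: Int -> Int -> Integer
    parFib cutoff n
      | n <= cutoff = fib n
      | otherwise   = left `par` (right `pseq` (left + right))
      where
        left  = parFib cutoff (n - 1)
        right = parFib cutoff (n - 2)

    main :: IO ()
    main = print (parFib 20 30)  -- prints 832040

Raising or lowering the cut-off changes how many sparks are created, which
should show up in the spark counts reported by ``+RTS -s``.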