======================
Using Polly with Clang
======================

This document discusses how Polly can be used in Clang to automatically
optimize C/C++ code during compilation.

.. warning::

  Clang, LLVM, and Polly need to be in sync (compiled from the same source
  revision).

Make Polly available from Clang
===============================

Polly is available through clang, opt, and bugpoint if Polly was checked out
into tools/polly before compilation. No further configuration is needed.

Optimizing with Polly
=====================

Optimizing with Polly is as easy as adding -O3 -mllvm -polly to your compiler
flags. Polly is not available unless optimizations are enabled (-O1, -O2, or
-O3); optimizing for size with -Os or -Oz is not recommended.

.. code-block:: console

  clang -O3 -mllvm -polly file.c

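For illustration, a hypothetical file.c could contain a kernel like the
matrix multiplication below (the function name matmul and the size N are made
up for this sketch). Its loop bounds and array subscripts are affine functions
of the loop counters, which is the kind of loop nest Polly represents as a
SCoP; whether Polly actually transforms it is up to its analyses and
heuristics.

.. code-block:: c

  /* Hypothetical file.c: a loop nest with affine bounds and affine
     array subscripts, the kind of code Polly models as a SCoP. */
  #define N 64

  /* Computes C = A * B for N x N matrices. */
  void matmul(float A[N][N], float B[N][N], float C[N][N]) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++) {
        C[i][j] = 0.0f;
        for (int k = 0; k < N; k++)
          C[i][j] += A[i][k] * B[k][j];
      }
  }

Because the sketch has no main function, add -c to the command above to
compile it into an object file.
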
Automatic OpenMP code generation
================================

To automatically detect parallel loops and generate OpenMP code for them, you
also need to add -mllvm -polly-parallel -lgomp to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-parallel -lgomp file.c

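As a sketch of the kind of loop this targets, consider the hypothetical
function row_sums below (its name and sizes are illustrative only). Each
iteration of the outer i-loop writes only out[i], so that loop carries no
dependence and is a candidate for parallel execution, while the inner j-loop
is a sequential reduction. Whether Polly actually emits OpenMP code for it
depends on its analysis and heuristics.

.. code-block:: c

  /* Hypothetical example: iterations of the outer i-loop are independent
     (each one writes a different out[i]), so that loop can run in parallel. */
  #define ROWS 256
  #define COLS 256

  void row_sums(double in[ROWS][COLS], double out[ROWS]) {
    for (int i = 0; i < ROWS; i++) {
      out[i] = 0.0;
      /* The inner loop accumulates a reduction into out[i]. */
      for (int j = 0; j < COLS; j++)
        out[i] += in[i][j];
    }
  }

Since the sketch has no main function, add -c to the command above; the
-lgomp flag then only matters when the final executable is linked.
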
Switching the OpenMP backend
----------------------------

The following command-line switch selects Polly's OpenMP backend:

       -polly-omp-backend[=BACKEND]
              choose the OpenMP backend; BACKEND can be 'GNU' (the default) or 'LLVM'.

The OpenMP backends can be further tuned with the following command-line
switches:

       -polly-num-threads[=NUM]
              set the number of threads to use; NUM may be any positive integer (the default, 0, lets the OpenMP runtime decide automatically).

       -polly-scheduling[=SCHED]
              set the OpenMP scheduling type; SCHED can be 'static', 'dynamic', 'guided', or 'runtime' (the default).

       -polly-scheduling-chunksize[=CHUNK]
              set the chunk size for the selected scheduling type; CHUNK may be any strictly positive integer (otherwise it defaults to 1).

Note that, at the time of writing, the GNU backend supports only the
`polly-num-threads` and `polly-scheduling` switches, and the latter must be
set to "runtime".

Example: use the alternative (LLVM) backend with dynamic scheduling, four
threads, and a chunk size of one (these switches are added to the ones shown
above).

.. code-block:: console

  -mllvm -polly-omp-backend=LLVM -mllvm -polly-num-threads=4
  -mllvm -polly-scheduling=dynamic -mllvm -polly-scheduling-chunksize=1

Automatic Vector code generation
================================

Automatic vector code generation can be enabled by adding -mllvm
-polly-vectorizer=stripmine to your CFLAGS.

.. code-block:: console

  clang -O3 -mllvm -polly -mllvm -polly-vectorizer=stripmine file.c

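A hypothetical loop nest of the kind this option is aimed at is sketched
below (the function col_sums and its sizes are made up). Its innermost j-loop
walks in and out with unit stride and its iterations are independent, a
typical shape for vectorization; whether and how Polly strip-mines it is left
to the heuristics of Polly and LLVM.

.. code-block:: c

  /* Hypothetical example: the innermost j-loop has unit-stride accesses
     and no loop-carried dependence, a typical vectorization candidate. */
  #define ROWS 128
  #define COLS 128

  void col_sums(float in[ROWS][COLS], float out[COLS]) {
    for (int j = 0; j < COLS; j++)
      out[j] = 0.0f;
    for (int i = 0; i < ROWS; i++)
      for (int j = 0; j < COLS; j++)
        out[j] += in[i][j];
  }
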
Isolate the Polly passes
========================

Polly's analysis and transformation passes run alongside many other passes
in the pass manager's pipeline. Some of the passes that run before Polly are
essential for it to work, for instance loop canonicalization. Therefore,
Polly is unable to optimize code straight out of clang's -O0 output.

To get the LLVM-IR that Polly sees in the optimization pipeline, use the
command:

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-dump-before-file=before-polly.ll

This writes a file 'before-polly.ll' containing the LLVM-IR as passed to
Polly, after SSA transformation, loop canonicalization, inlining, and
other passes.

Thereafter, any Polly pass can be run over 'before-polly.ll' using the
'opt' tool. To find out which Polly passes are active in the standard
pipeline, see the output of

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -debug-pass=Arguments

Polly's passes are those between '-polly-detect' and '-polly-codegen'.
Analysis passes can be omitted. At the time of this writing, the default
Polly pass pipeline is:

.. code-block:: console

  opt before-polly.ll -polly-simplify -polly-optree -polly-delicm -polly-simplify -polly-prune-unprofitable -polly-opt-isl -polly-codegen

Note that this uses LLVM's old/legacy pass manager.

For completeness, here are some other methods that generate IR suitable for
processing with Polly from C/C++/Objective-C source code. The previous method
is the recommended one.

The following generates unoptimized LLVM-IR ('-O0', which is the default) and
runs the canonicalizing passes on it ('-polly-canonicalize'). This does *not*
include all the passes that run before Polly in the default pass pipeline.
The '-disable-O0-optnone' option is required because otherwise clang adds an
'optnone' attribute to all functions, so that they are skipped by most
optimization passes. This attribute is meant to prevent LTO builds from
optimizing these functions in the linking phase anyway.

.. code-block:: console

  clang file.c -c -O0 -Xclang -disable-O0-optnone -emit-llvm -S -o - | opt -polly-canonicalize -S

The option '-disable-llvm-passes' disables all LLVM passes, even those that
run at -O0. Passing -O1 (or any optimization level other than -O0) prevents
the 'optnone' attribute from being added.

.. code-block:: console

  clang file.c -c -O1 -Xclang -disable-llvm-passes -emit-llvm -S -o - | opt -polly-canonicalize -S

As another alternative, Polly can be moved to the front of the pass pipeline
and the IR it receives dumped. This implicitly runs the '-polly-canonicalize'
passes.

.. code-block:: console

  clang file.c -c -O3 -mllvm -polly -mllvm -polly-position=early -mllvm -polly-dump-before-file=before-polly.ll

Further options
===============
Polly supports further options that are mainly useful for the development or
analysis of Polly. The relevant options can be added to clang by appending
-mllvm -option-name to the CFLAGS or the clang command line.

Limit Polly to a single function
--------------------------------

To limit the execution of Polly to a single function, use the option
-polly-only-func=functionname.

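For example, given a hypothetical file.c with the two functions below (both
names are made up for this sketch), compiling it with
clang -O3 -mllvm -polly -mllvm -polly-only-func=kernel_a -c file.c is meant
to restrict Polly to kernel_a, while kernel_b still goes through the rest of
the -O3 pipeline.

.. code-block:: c

  /* Hypothetical file.c: with -polly-only-func=kernel_a, only kernel_a
     is processed by Polly; kernel_b is optimized by the regular
     pipeline alone. */
  #define N 100

  void kernel_a(int a[N][N]) {
    for (int i = 0; i < N; i++)
      for (int j = 0; j < N; j++)
        a[i][j] = i * j;
  }

  void kernel_b(int b[N]) {
    for (int i = 0; i < N; i++)
      b[i] = 2 * i;
  }
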
Disable LLVM-IR generation
--------------------------

Polly normally regenerates LLVM-IR from the polyhedral representation. To see
only the effects of the preparing transformations, with Polly's code
generation disabled, add the option -polly-no-codegen.

Graphical view of the SCoPs
---------------------------
Polly can use graphviz to show the SCoPs it detects in a program. The relevant
options are -polly-show, -polly-show-only, -polly-dot, and -polly-dot-only.
The 'show' options automatically run dotty or another graphviz viewer to show
the SCoPs graphically. The 'dot' options store a dot file for each function
that highlights the detected SCoPs. If 'only' is appended to the option name,
the basic blocks are shown without the statements they contain.

Change/Disable the Optimizer
----------------------------

By default, Polly uses the isl scheduling optimizer. The isl optimizer
optimizes for data locality and parallelism using the Pluto algorithm.
To disable the optimizer entirely, use the option -polly-optimizer=none.

Disable tiling in the optimizer
-------------------------------

By default, the optimizer performs tiling if possible. If this is not
wanted, the option -polly-tiling=false can be used to disable it.

Import / Export
---------------

The flags -polly-import and -polly-export allow the export and reimport of the
polyhedral representation. By exporting, modifying, and reimporting the
polyhedral representation, externally calculated transformations can be
applied. This enables external optimizers or the manual optimization of
specific SCoPs.

Viewing Polly Diagnostics with opt-viewer
-----------------------------------------

The flag -fsave-optimization-record generates .opt.yaml files when compiling
your program. These YAML files contain information about each emitted remark.
Ensure that you have Python 2.7 with the PyYAML and Pygments packages
installed. To run opt-viewer:

.. code-block:: console

   llvm/tools/opt-viewer/opt-viewer.py -source-dir /path/to/program/src/ \
      /path/to/program/src/foo.opt.yaml \
      /path/to/program/src/bar.opt.yaml \
      -o ./output

Include all YAML files (use \*.opt.yaml when specifying which YAML files to
view) to view all diagnostics from your program in opt-viewer. Compile with
`PGO <https://clang.llvm.org/docs/UsersManual.html#profiling-with-instrumentation>`_
to view hotness information in opt-viewer. The resulting HTML files can be
viewed in a web browser.
