README

Requirements:

- automake, autoconf, libtool
	(not needed when compiling a release)
- pkg-config (http://www.freedesktop.org/wiki/Software/pkg-config)
	(not needed when compiling a release using the included isl and pet)
- gmp (http://gmplib.org/)
- libyaml (http://pyyaml.org/wiki/LibYAML)
	(only needed if you want to compile the pet executable)
- LLVM/clang libraries, 2.9 or higher (http://clang.llvm.org/get_started.html)
	Unless you have some other reasons for wanting to use the svn version,
	it is best to install the latest release (3.9).
	For more details, see pet/README.

If you are installing on Ubuntu, then you can install the following packages:

automake autoconf libtool pkg-config libgmp3-dev libyaml-dev libclang-dev llvm
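
For example, on a Debian-based system these packages can be
installed as follows (an illustrative invocation):

	sudo apt-get install automake autoconf libtool pkg-config \
		libgmp3-dev libyaml-dev libclang-dev llvm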

Note that you need at least version 3.2 of libclang-dev (Ubuntu raring).
Older versions of this package did not include the required libraries.
If you are using an older version of Ubuntu, then you need to compile and
install LLVM/clang from source.


Preparing:

Grab the latest release and extract it or get the source from
the git repository as follows.  This process requires autoconf,
automake, libtool and pkg-config.

	git clone git://repo.or.cz/ppcg.git
	cd ppcg
	./get_submodules.sh
	./autogen.sh


Compilation:

	./configure
	make
	make check

If you have installed any of the required libraries in a non-standard
location, then you may need to use the --with-gmp-prefix,
--with-libyaml-prefix and/or --with-clang-prefix options
when calling "./configure".
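
For example (the installation prefixes shown are hypothetical):

	./configure --with-gmp-prefix=/opt/gmp --with-clang-prefix=/opt/llvm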


Using PPCG to generate CUDA or OpenCL code

To convert a fragment of a C program to CUDA, insert a line containing

	#pragma scop

before the fragment and add a line containing

	#pragma endscop

after the fragment.  To generate CUDA code run

	ppcg --target=cuda file.c

where file.c is the file containing the fragment.  The generated
code is stored in file_host.cu and file_kernel.cu.

To generate OpenCL code run

	ppcg --target=opencl file.c

where file.c is the file containing the fragment.  The generated code
is stored in file_host.c and file_kernel.cl.
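
For illustration, a fragment marked for PPCG might look as follows
(a minimal sketch; the function and array names are hypothetical):

	void add_arrays(int n, float A[n], float B[n])
	{
	#pragma scop
		for (int i = 0; i < n; ++i)
			A[i] += B[i];
	#pragma endscop
	}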


Specifying tile, grid and block sizes

The iteration space tile size, grid size and block size can
be specified using the --sizes option.  The argument is a union map
in isl notation mapping kernels identified by their sequence number
in a "kernel" space to singleton sets in the "tile", "grid" and "block"
spaces.  The sizes are specified outermost to innermost.

The dimension of the "tile" space indicates the (maximal) number of loop
dimensions to tile.  The elements of the single integer tuple
specify the tile sizes in each dimension.
In case of hybrid tiling, the first element is half the size of
the tile in the time (sequential) dimension.  The second element
specifies the number of elements in the base of the hexagon.
The remaining elements specify the tile sizes in the remaining space
dimensions.
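
For example, when hybrid tiling is enabled, a specification such as
(an illustrative reading of the format described above)

    { kernel[i] -> tile[8,128,32] }

would request a tile of size 2*8 in the time dimension, a hexagon
base of 128 elements and a tile size of 32 in the remaining space
dimension.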

The dimension of the "grid" space indicates the (maximal) number of block
dimensions in the grid.  The elements of the single integer tuple
specify the number of blocks in each dimension.

The dimension of the "block" space indicates the (maximal) number of thread
dimensions in the block.  The elements of the single integer tuple
specify the number of threads in each dimension.

For example,

    { kernel[0] -> tile[64,64]; kernel[i] -> block[16] : i != 4 }

specifies that in kernel 0, two loops should be tiled with a tile
size of 64 in both dimensions and that all kernels except kernel 4
should be run using a block of 16 threads.
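
On the command line, such a specification could be passed as follows
(the sizes shown are only an illustration):

	ppcg --target=cuda \
		--sizes='{ kernel[i] -> tile[32,32]; kernel[i] -> block[16,16] }' \
		file.c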

Since PPCG performs some scheduling, it can be difficult to predict
what exactly will end up in a kernel.  If you want to specify
tile, grid or block sizes, you may want to run PPCG first with the defaults,
examine the kernels and then run PPCG again with the desired sizes.
Instead of examining the kernels, you can also specify the option
--dump-sizes on the first run to obtain the effectively used default sizes.


Compiling the generated CUDA code with nvcc

To get optimal performance from nvcc, it is important to choose --arch
according to your target GPU.  Specifically, use the flag "--arch sm_20"
for Fermi, "--arch sm_30" for GK10x Kepler and "--arch sm_35" for
GK110 Kepler.  We discourage the use of older cards as we have seen
correctness issues with compilation for older architectures.
Note that in the absence of any --arch flag, nvcc defaults to
"--arch sm_13".  This will not only be slower, but can also cause
correctness issues.
If you want to obtain results that are identical to those obtained
by the original code, then you may need to disable some optimizations
by passing the "--fmad=false" option.
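
For example, a GK110 Kepler card could be targeted as follows
(a sketch; depending on your setup, the host and kernel files may
need to be compiled separately):

	nvcc --arch sm_35 file_host.cu file_kernel.cu -o file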


Compiling the generated OpenCL code with gcc

To compile the host code you need to link against the file
ocl_utilities.c which contains utility functions used by the generated
OpenCL host code.  To compile the host code with gcc, run

	gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL

Note that we have experienced the generated OpenCL code freezing
on some inputs (e.g., the PolyBench symm benchmark) when using
at least some version of the Nvidia OpenCL library, while the
corresponding CUDA code runs fine.
We have experienced no such freezes when using AMD, ARM or Intel
OpenCL libraries.

By default, the compiled executable will need the _kernel.cl file at
run time.  Alternatively, the option --opencl-embed-kernel-code may be
given to place the kernel code in a string literal.  The kernel code is
then compiled into the host binary, such that the _kernel.cl file is no
longer needed at run time.  Any kernel include files, in particular
those supplied using --opencl-include-file, will still be required at
run time.
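
For example (an illustrative sequence of commands):

	ppcg --target=opencl --opencl-embed-kernel-code file.c
	gcc -std=c99 file_host.c ocl_utilities.c -lOpenCL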


Function calls

Function calls inside the analyzed fragment are reproduced
in the CUDA or OpenCL code, but for now it is left to the user
to make sure that the functions that are being called are
available from the generated kernels.

In the case of OpenCL code, the --opencl-include-file option
may be used to specify one or more files to be #include'd
from the generated code.  These files may then contain
the definitions of the functions being called from the
program fragment.  If the pathnames of the included files
are relative to the current directory, then you may need
to additionally specify the --opencl-compiler-options=-I.
option to make sure that the files can be found by the OpenCL compiler.
The included files may contain definitions of types used by the
generated kernels.  By default, PPCG generates definitions for
types as needed, but these definitions may collide with those in
the included files, as PPCG does not consider the contents of the
included files.  The --no-opencl-print-kernel-types option will
prevent PPCG from generating type definitions.
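
For example, if the called functions are defined in a file
"functions.cl" in the current directory (a hypothetical file name),
one could run

	ppcg --target=opencl --opencl-include-file=functions.cl \
		--opencl-compiler-options=-I. file.c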


GNU extensions

By default, PPCG may print out macro definitions that involve
GNU extensions such as __typeof__ and statement expressions.
Some compilers may not support these extensions.
In particular, OpenCL 1.2 beignet 1.1.1 (git-6de6918)
has been reported not to support __typeof__.
The use of these extensions can be turned off with the
--no-allow-gnu-extensions option.
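
For example:

	ppcg --target=opencl --no-allow-gnu-extensions file.c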


Processing PolyBench

When processing a PolyBench/C 3.2 benchmark, you should always specify
-DPOLYBENCH_USE_C99_PROTO on the ppcg command line.  Otherwise, the source
files are inconsistent, having fixed size arrays but parametrically
bounded loops iterating over them.
However, you should not specify this define when compiling
the PPCG generated code using nvcc since CUDA does not support VLAs.
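
For example, a benchmark could be converted as follows (a sketch;
the include path and the path to the benchmark source depend on the
PolyBench layout):

	ppcg --target=cuda -DPOLYBENCH_USE_C99_PROTO \
		-I utilities linear-algebra/kernels/gemm/gemm.c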


CUDA and function overloading

While CUDA supports function overloading based on the argument types,
no such function overloading exists in the input language C.  Since PPCG
simply prints out the same function name as in the original code, this
may result in a different function being called based on the types
of the arguments.  For example, if the original code contains a call
to the function sqrt() with a float argument, then the argument will
be promoted to a double and the sqrt() function will be called.
In the transformed (CUDA) code, however, overloading will cause the
function sqrtf() to be called.  Until this issue has been resolved in PPCG,
we recommend that users either explicitly call the function sqrtf() or
explicitly cast the argument to double in the input code.
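
A minimal sketch of the issue and the two suggested workarounds:

	float f = 0.5f;
	double d;

	/* Original code: f is promoted to double and sqrt() is called.
	 * In the generated CUDA code, overloading selects sqrtf() instead.
	 */
	d = sqrt(f);

	/* Workaround 1: call sqrtf() explicitly. */
	d = sqrtf(f);

	/* Workaround 2: cast the argument to double. */
	d = sqrt((double) f);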


Contact

For bug reports, feature requests and questions,
contact http://groups.google.com/group/isl-development

Whenever you report a bug, please mention the exact version of PPCG
that you are using (output of "./ppcg --version").  If you are unable
to compile PPCG, then report the git version (output of "git describe")
or the version number included in the name of the tarball.


Citing PPCG

If you use PPCG for your research, you are invited to cite
the following paper.

@article{Verdoolaege2013PPCG,
    author = {Verdoolaege, Sven and Juega, Juan Carlos and Cohen, Albert and
		G\'{o}mez, Jos{\'e} Ignacio and Tenllado, Christian and
		Catthoor, Francky},
    title = {Polyhedral parallel code generation for CUDA},
    journal = {ACM Trans. Archit. Code Optim.},
    issue_date = {January 2013},
    volume = {9},
    number = {4},
    month = jan,
    year = {2013},
    issn = {1544-3566},
    pages = {54:1--54:23},
    doi = {10.1145/2400682.2400713},
    acmid = {2400713},
    publisher = {ACM},
    address = {New York, NY, USA},
}