**You may be interested in switching to [std-simd](https://github.com/VcDevel/std-simd).**
GCC 11 includes an experimental version of `std::simd` as part of libstdc++, which also works with clang.
Features present in Vc 1.4 and not present in *std-simd* will eventually turn into Vc 2.0, which then depends on *std-simd*.

# Vc: portable, zero-overhead C++ types for explicitly data-parallel programming

Recent generations of CPUs, and GPUs in particular, require data-parallel code
for full efficiency. Data parallelism requires that the same sequence of
operations is applied to different input data. CPUs and GPUs can thus reduce
the necessary hardware for instruction decoding and scheduling in favor of more
arithmetic and logic units, which execute the same instructions synchronously.
On CPU architectures this is implemented via SIMD registers and instructions.
A single SIMD register can store N values and a single SIMD instruction can
execute N operations on those values. On GPU architectures N threads run in
perfect sync, fed by a single instruction decoder/scheduler. Each thread has
local memory and a given index to calculate the offsets in memory for loads and
stores.

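As a rough illustration of N: the number of values per register follows from
the register width divided by the element size. The sketch below assumes 8-bit
bytes and a 32-bit `float`; `lanes` is a hypothetical helper written for this
README and is not part of Vc. The register widths are those of SSE (128 bit),
AVX (256 bit), and AVX-512 (512 bit).

```cpp
#include <cstddef>

// Hypothetical helper (not part of Vc): number of values of type T that fit
// into one SIMD register of the given width, assuming 8-bit bytes.
template <class T>
constexpr std::size_t lanes(std::size_t register_bits) {
  return register_bits / (8 * sizeof(T));
}

// Register widths of the x86 SIMD extensions, in bits.
constexpr std::size_t sse_bits = 128;
constexpr std::size_t avx_bits = 256;
constexpr std::size_t avx512_bits = 512;

// These checks assume sizeof(float) == 4 and sizeof(double) == 8.
static_assert(lanes<float>(sse_bits) == 4, "SSE holds 4 floats");
static_assert(lanes<float>(avx_bits) == 8, "AVX holds 8 floats");
static_assert(lanes<float>(avx512_bits) == 16, "AVX-512 holds 16 floats");
static_assert(lanes<double>(avx_bits) == 4, "AVX holds 4 doubles");
```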
Current C++ compilers can automatically transform scalar code into SIMD
instructions (auto-vectorization). However, the compiler must reconstruct an
intrinsic property of the algorithm that was lost when the developer wrote a
purely scalar implementation in C++. Consequently, C++ compilers cannot
vectorize arbitrary code to its most efficient data-parallel variant.
Larger data-parallel loops in particular, spanning multiple functions or even
translation units, will often not be transformed into efficient SIMD code.

The Vc library provides the missing link. Its types enable explicitly stating
data-parallel operations on multiple values. The parallelism is therefore added
via the type system. Competing approaches state the parallelism via new control
structures and consequently new semantics inside the body of these control
structures.

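To illustrate what stating parallelism via the type system can look like, here
is a minimal sketch in plain C++. `ToyVec`, `multiply_add`, and
`check_multiply_add` are hypothetical names invented for this example. `ToyVec`
is a loop-based stand-in, not how Vc implements its vector types, which map
such operators to SIMD instructions. The point is only that the algorithm is
written once and the argument type decides how many values it processes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a SIMD vector type; NOT Vc's implementation.
// Each operator applies the same operation to all N lanes.
template <class T, std::size_t N>
struct ToyVec {
  std::array<T, N> lane;

  friend ToyVec operator+(ToyVec a, const ToyVec& b) {
    for (std::size_t i = 0; i < N; ++i) a.lane[i] += b.lane[i];
    return a;
  }
  friend ToyVec operator*(ToyVec a, const ToyVec& b) {
    for (std::size_t i = 0; i < N; ++i) a.lane[i] *= b.lane[i];
    return a;
  }
};

// Written once; works for a single float as well as for a ToyVec.
template <class T>
T multiply_add(T a, T b, T c) {
  return a * b + c;
}

// Usage: one value with float, four values at once with ToyVec<float, 4>.
inline void check_multiply_add() {
  assert(multiply_add(2.0f, 3.0f, 1.0f) == 7.0f);

  ToyVec<float, 4> a{{{1.0f, 2.0f, 3.0f, 4.0f}}};
  ToyVec<float, 4> b{{{10.0f, 10.0f, 10.0f, 10.0f}}};
  ToyVec<float, 4> c{{{0.5f, 0.5f, 0.5f, 0.5f}}};
  ToyVec<float, 4> r = multiply_add(a, b, c);
  assert(r.lane[0] == 10.5f && r.lane[3] == 40.5f);
}
```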
Vc is a free software library to ease explicit vectorization of C++ code. It
has an intuitive API and provides portability between different compilers and
compiler versions as well as portability between different vector instruction
sets. Thus an application written with Vc can be compiled for:

* AVX and AVX2
* SSE2 up to SSE4.2 or SSE4a
* Scalar
* AVX-512 (Vc 2 development)
* NEON (in development)
* NVIDIA GPUs / CUDA (research)

After Intel dropped MIC support with ICC 18, Vc 1.4 also removes support for it.

## Examples

### Usage on Compiler Explorer

* [Simdize Example](https://godbolt.org/z/JVEM2j)
* [Total momentum and time stepping of `std::vector<Particle>`](https://godbolt.org/z/JNdkL9)
* [Matrix Example](https://godbolt.org/z/fFEkuX): This uses vertical
  vectorization, which does not scale to different vector sizes. However, the
  example is instructive for comparison with similar solutions in other
  languages or libraries.
* [N-vortex solver](https://godbolt.org/z/4o1cg_) showing `simdize`d iteration
  over many `std::vector<float>`. Note how [important the `-march` flag is, compared
  to plain `-mavx2 -mfma`](https://godbolt.org/z/hKiOjr).

### Scalar Product

Let's start from the code for calculating a 3D scalar product using builtin floats:
```cpp
#include <array>
using Vec3D = std::array<float, 3>;
float scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```
Using Vc, we can easily vectorize the code using the `float_v` type:
```cpp
#include <Vc/Vc>
#include <array>
using Vc::float_v;
using Vec3D = std::array<float_v, 3>;
float_v scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```
The above will scale to 1, 4, 8, 16, etc. scalar products calculated in parallel, depending
on the target hardware's capabilities.

For comparison, the same vectorization using Intel SSE intrinsics is more verbose and uses
prefix notation (i.e. function calls):
```cpp
#include <array>
#include <xmmintrin.h>
using Vec3D = std::array<__m128, 3>;
__m128 scalar_product(Vec3D a, Vec3D b) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a[0], b[0]), _mm_mul_ps(a[1], b[1])),
                    _mm_mul_ps(a[2], b[2]));
}
```
The above neither scales to AVX, AVX-512, etc., nor is it portable to other SIMD ISAs.

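To try the intrinsics variant, the following sketch wraps it with unaligned
loads and stores so that it can be called with plain floats. It assumes an x86
or x86-64 target with SSE. `Components`, `four_scalar_products`, and
`check_four_scalar_products` are hypothetical helpers added for illustration.
The inputs use a component-major layout: lane `i` of the x, y, and z arrays
together forms the `i`-th 3D vector, so four scalar products are computed at
once.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <xmmintrin.h>  // SSE intrinsics; x86 / x86-64 only

using Vec3D = std::array<__m128, 3>;

__m128 scalar_product(Vec3D a, Vec3D b) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a[0], b[0]), _mm_mul_ps(a[1], b[1])),
                    _mm_mul_ps(a[2], b[2]));
}

// Hypothetical input layout: [component][lane], i.e. all x values, then all y
// values, then all z values of four 3D vectors.
using Components = std::array<std::array<float, 4>, 3>;

// Loads four 3D vectors per operand, computes the four scalar products with
// one call, and stores the results back to plain floats.
std::array<float, 4> four_scalar_products(const Components& a,
                                          const Components& b) {
  Vec3D va, vb;
  for (std::size_t c = 0; c < 3; ++c) {
    va[c] = _mm_loadu_ps(a[c].data());  // unaligned load of 4 floats
    vb[c] = _mm_loadu_ps(b[c].data());
  }
  std::array<float, 4> out;
  _mm_storeu_ps(out.data(), scalar_product(va, vb));
  return out;
}

// Usage: lane i of the result is the scalar product of the i-th vector pair.
inline void check_four_scalar_products() {
  const Components a = {{{{1.0f, 2.0f, 3.0f, 4.0f}},
                         {{0.0f, 1.0f, 0.0f, 1.0f}},
                         {{2.0f, 2.0f, 2.0f, 2.0f}}}};
  const Components b = {{{{1.0f, 1.0f, 1.0f, 1.0f}},
                         {{5.0f, 5.0f, 5.0f, 5.0f}},
                         {{0.5f, 0.5f, 0.5f, 0.5f}}}};
  const std::array<float, 4> r = four_scalar_products(a, b);
  assert(r[0] == 2.0f && r[1] == 8.0f && r[2] == 4.0f && r[3] == 10.0f);
}
```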
## Build Requirements

CMake >= 3.0

C++11 Compiler:

* GCC >= 4.8.1
* clang >= 3.4
* ICC >= 18.0.5
* Visual Studio 2015 (64-bit target)

## Building and Installing Vc

* After cloning, you need to initialize Vc's git submodules:

```sh
$ git submodule update --init
```

* Create a build directory:

```sh
$ mkdir build
$ cd build
```

* Call cmake with the relevant options:

```sh
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/Vc -DBUILD_TESTING=OFF <srcdir>
```

* Build and install:

```sh
$ make -j16
$ make install
```

## Documentation

The documentation is generated via [doxygen](http://doxygen.org). You can build
the documentation by running `doxygen` in the `doc` subdirectory.
Alternatively, you can find nightly builds of the documentation at:

* [1.4 branch](https://vcdevel.github.io/Vc-1.4/)
* [1.4.2 release](https://vcdevel.github.io/Vc-1.4.2/)
* [1.4.1 release](https://vcdevel.github.io/Vc-1.4.1/)
* [1.4.0 release](https://vcdevel.github.io/Vc-1.4.0/)
* [1.3 branch](https://vcdevel.github.io/Vc-1.3/)
* [1.3.0 release](https://vcdevel.github.io/Vc-1.3.0/)
* [1.2.0 release](https://vcdevel.github.io/Vc-1.2.0/)
* [1.1.0 release](https://vcdevel.github.io/Vc-1.1.0/)
* [0.7 branch](https://vcdevel.github.io/Vc-0.7/)

## Publications

* [M. Kretz, "Extending C++ for Explicit Data-Parallel Programming via SIMD
  Vector Types", Goethe University Frankfurt, Dissertation,
  2015.](http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/38415)
* [M. Kretz and V. Lindenstruth, "Vc: A C++ library for explicit
  vectorization", Software: Practice and Experience,
  2011.](http://dx.doi.org/10.1002/spe.1149)
* [M. Kretz, "Efficient Use of Multi- and Many-Core Systems with Vectorization
  and Multithreading", University of Heidelberg,
  2009.](http://code.compeng.uni-frankfurt.de/attachments/13/Diplomarbeit.pdf)

[Work on integrating the functionality of Vc in the C++ standard library.](
https://github.com/VcDevel/Vc/wiki/ISO-Standardization-of-the-Vector-classes)

## License

Vc is released under the terms of the [3-clause BSD license](http://opensource.org/licenses/BSD-3-Clause).