**You may be interested in switching to [std-simd](https://github.com/VcDevel/std-simd).**
GCC 11 includes an experimental version of `std::simd` as part of libstdc++, which also works with clang.
Features present in Vc 1.4 but not in *std-simd* will eventually turn into Vc 2.0, which will then depend on *std-simd*.

# Vc: portable, zero-overhead C++ types for explicitly data-parallel programming

Recent generations of CPUs, and GPUs in particular, require data-parallel code
for full efficiency. Data parallelism requires that the same sequence of
operations be applied to different input data. CPUs and GPUs can thus reduce
the necessary hardware for instruction decoding and scheduling in favor of more
arithmetic and logic units, which execute the same instructions synchronously.
On CPU architectures this is implemented via SIMD registers and instructions.
A single SIMD register can store N values, and a single SIMD instruction can
execute N operations on those values. On GPU architectures, N threads run in
perfect sync, fed by a single instruction decoder/scheduler. Each thread has
local memory and a given index to calculate the offsets in memory for loads and
stores.

Current C++ compilers can automatically transform scalar code into SIMD
instructions (auto-vectorization). However, the compiler must reconstruct an
intrinsic property of the algorithm that was lost when the developer wrote a
purely scalar implementation in C++. Consequently, C++ compilers cannot
vectorize any given code to its most efficient data-parallel variant.
Larger data-parallel loops in particular, spanning multiple functions or even
translation units, will often not be transformed into efficient SIMD code.

The Vc library provides the missing link. Its types enable explicitly stating
data-parallel operations on multiple values. The parallelism is therefore added
via the type system. Competing approaches state the parallelism via new control
structures and consequently new semantics inside the body of these control
structures.

Vc is a free software library that eases explicit vectorization of C++ code. It
has an intuitive API and provides portability between different compilers and
compiler versions as well as portability between different vector instruction
sets. Thus an application written with Vc can be compiled for:

* AVX and AVX2
* SSE2 up to SSE4.2 or SSE4a
* Scalar
* AVX-512 (Vc 2 development)
* NEON (in development)
* NVIDIA GPUs / CUDA (research)

After Intel dropped MIC support with ICC 18, Vc 1.4 also removed support for it.

## Examples

### Usage on Compiler Explorer

* [Simdize Example](https://godbolt.org/z/JVEM2j)
* [Total momentum and time stepping of `std::vector<Particle>`](https://godbolt.org/z/JNdkL9)
* [Matrix Example](https://godbolt.org/z/fFEkuX): This uses vertical
  vectorization, which does not scale to different vector sizes. However, the
  example is instructive for comparison with similar solutions in other languages
  or libraries.
* [N-vortex solver](https://godbolt.org/z/4o1cg_) showing `simdize`d iteration
  over many `std::vector<float>`. Note how [important the `-march` flag is, compared
  to plain `-mavx2 -mfma`](https://godbolt.org/z/hKiOjr).

### Scalar Product

Let's start from the code for calculating a 3D scalar product using builtin floats:
```cpp
using Vec3D = std::array<float, 3>;
float scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```
Using Vc, we can easily vectorize the code using the `float_v` type:
```cpp
using Vc::float_v;
using Vec3D = std::array<float_v, 3>;
float_v scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```
The above will scale to 1, 4, 8, 16, etc. scalar products calculated in parallel, depending
on the target hardware's capabilities.
For comparison, the same vectorization using Intel SSE intrinsics is more verbose and uses
prefix notation (i.e. function calls):
```cpp
using Vec3D = std::array<__m128, 3>;
__m128 scalar_product(Vec3D a, Vec3D b) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a[0], b[0]), _mm_mul_ps(a[1], b[1])),
                    _mm_mul_ps(a[2], b[2]));
}
```
The above will neither scale to AVX, AVX-512, etc., nor is it portable to other SIMD ISAs.

## Build Requirements

cmake >= 3.0

C++11 compiler:

* GCC >= 4.8.1
* clang >= 3.4
* ICC >= 18.0.5
* Visual Studio 2015 (64-bit target)

## Building and Installing Vc

* After cloning, you need to initialize Vc's git submodules:

```sh
git submodule update --init
```

* Create a build directory:

```sh
$ mkdir build
$ cd build
```

* Call cmake with the relevant options:

```sh
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/Vc -DBUILD_TESTING=OFF <srcdir>
```

* Build and install:

```sh
$ make -j16
$ make install
```
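
Once installed, Vc can be consumed from another CMake project. The following consumer sketch is an assumption based on the CMake package files Vc installs (`find_package(Vc)` with the `Vc_INCLUDE_DIR` and `Vc_LIBRARIES` variables); check the `VcConfig.cmake` installed under your prefix for the authoritative names:

```cmake
# Hypothetical consumer CMakeLists.txt -- variable names are assumptions,
# consult the installed VcConfig.cmake.
cmake_minimum_required(VERSION 3.0)
project(my_vc_app CXX)

# Point CMake at the install prefix chosen above, e.g.
#   cmake -DCMAKE_PREFIX_PATH=/opt/Vc ..
find_package(Vc REQUIRED)

add_executable(my_vc_app main.cpp)
target_include_directories(my_vc_app PRIVATE ${Vc_INCLUDE_DIR})
target_link_libraries(my_vc_app ${Vc_LIBRARIES})
```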
## Documentation

The documentation is generated via [doxygen](http://doxygen.org). You can build
the documentation by running `doxygen` in the `doc` subdirectory.
Alternatively, you can find nightly builds of the documentation at:

* [1.4 branch](https://vcdevel.github.io/Vc-1.4/)
* [1.4.2 release](https://vcdevel.github.io/Vc-1.4.2/)
* [1.4.1 release](https://vcdevel.github.io/Vc-1.4.1/)
* [1.4.0 release](https://vcdevel.github.io/Vc-1.4.0/)
* [1.3 branch](https://vcdevel.github.io/Vc-1.3/)
* [1.3.0 release](https://vcdevel.github.io/Vc-1.3.0/)
* [1.2.0 release](https://vcdevel.github.io/Vc-1.2.0/)
* [1.1.0 release](https://vcdevel.github.io/Vc-1.1.0/)
* [0.7 branch](https://vcdevel.github.io/Vc-0.7/)

## Publications

* [M. Kretz, "Extending C++ for Explicit Data-Parallel Programming via SIMD
  Vector Types", Goethe University Frankfurt, Dissertation,
  2015.](http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/38415)
* [M. Kretz and V. Lindenstruth, "Vc: A C++ library for explicit
  vectorization", Software: Practice and Experience,
  2011.](http://dx.doi.org/10.1002/spe.1149)
* [M. Kretz, "Efficient Use of Multi- and Many-Core Systems with Vectorization
  and Multithreading", University of Heidelberg,
  2009.](http://code.compeng.uni-frankfurt.de/attachments/13/Diplomarbeit.pdf)

[Work on integrating the functionality of Vc into the C++ standard library.](
https://github.com/VcDevel/Vc/wiki/ISO-Standardization-of-the-Vector-classes)

## License

Vc is released under the terms of the [3-clause BSD license](http://opensource.org/licenses/BSD-3-Clause).
166