**You may be interested in switching to [std-simd](https://github.com/VcDevel/std-simd).**
GCC 11 includes an experimental version of `std::simd` as part of libstdc++, which also works with clang.
Features present in Vc 1.4 but not in *std-simd* will eventually turn into Vc 2.0, which will then depend on *std-simd*.

# Vc: portable, zero-overhead C++ types for explicitly data-parallel programming

Recent generations of CPUs, and GPUs in particular, require data-parallel code
for full efficiency. Data parallelism requires that the same sequence of
operations be applied to different input data. CPUs and GPUs can thus reduce
the necessary hardware for instruction decoding and scheduling in favor of more
arithmetic and logic units, which execute the same instructions synchronously.
On CPU architectures this is implemented via SIMD registers and instructions.
A single SIMD register can store N values, and a single SIMD instruction can
execute N operations on those values. On GPU architectures, N threads run in
perfect sync, fed by a single instruction decoder/scheduler. Each thread has
local memory and a given index to calculate the offsets in memory for loads and
stores.

Current C++ compilers can automatically transform scalar code into SIMD
instructions (auto-vectorization). However, the compiler must reconstruct an
intrinsic property of the algorithm that was lost when the developer wrote a
purely scalar implementation in C++. Consequently, C++ compilers cannot
vectorize any given code to its most efficient data-parallel variant.
Larger data-parallel loops in particular, spanning multiple functions or even
translation units, will often not be transformed into efficient SIMD code.

The Vc library provides the missing link. Its types enable explicitly stating
data-parallel operations on multiple values. The parallelism is therefore added
via the type system. Competing approaches state the parallelism via new control
structures and consequently new semantics inside the body of these control
structures.

Vc is a free software library that eases explicit vectorization of C++ code. It
has an intuitive API and provides portability between different compilers and
compiler versions as well as portability between different vector instruction
sets. Thus an application written with Vc can be compiled for:

* AVX and AVX2
* SSE2 up to SSE4.2 or SSE4a
* Scalar
* AVX-512 (Vc 2 development)
* NEON (in development)
* NVIDIA GPUs / CUDA (research)

After Intel dropped MIC support with ICC 18, Vc 1.4 also removed support for it.

## Examples

### Usage on Compiler Explorer

* [Simdize Example](https://godbolt.org/z/JVEM2j)
* [Total momentum and time stepping of `std::vector<Particle>`](https://godbolt.org/z/JNdkL9)
* [Matrix Example](https://godbolt.org/z/fFEkuX): This uses vertical
  vectorization, which does not scale to different vector sizes. However, the
  example is instructive for comparison with similar solutions in other languages
  or libraries.
* [N-vortex solver](https://godbolt.org/z/4o1cg_) showing `simdize`d iteration
  over many `std::vector<float>`. Note how [important the `-march` flag is, compared
  to plain `-mavx2 -mfma`](https://godbolt.org/z/hKiOjr).

### Scalar Product

Let's start from the code for calculating a 3D scalar product using builtin floats:
```cpp
using Vec3D = std::array<float, 3>;
float scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```
Using Vc, we can easily vectorize the code using the `float_v` type:
```cpp
using Vc::float_v;
using Vec3D = std::array<float_v, 3>;
float_v scalar_product(Vec3D a, Vec3D b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```
The above will scale to 1, 4, 8, 16, etc. scalar products calculated in parallel, depending
on the target hardware's capabilities.
For comparison, the same vectorization using Intel SSE intrinsics is more verbose and uses
prefix notation (i.e. function calls):
```cpp
using Vec3D = std::array<__m128, 3>;
__m128 scalar_product(Vec3D a, Vec3D b) {
  return _mm_add_ps(_mm_add_ps(_mm_mul_ps(a[0], b[0]), _mm_mul_ps(a[1], b[1])),
                    _mm_mul_ps(a[2], b[2]));
}
```
The above will neither scale to AVX, AVX-512, etc., nor is it portable to other SIMD ISAs.

## Build Requirements

cmake >= 3.0

C++11 compiler:

* GCC >= 4.8.1
* clang >= 3.4
* ICC >= 18.0.5
* Visual Studio 2015 (64-bit target)

## Building and Installing Vc

* After cloning, you need to initialize Vc's git submodules:

```sh
git submodule update --init
```

* Create a build directory:

```sh
$ mkdir build
$ cd build
```

* Call cmake with the relevant options:

```sh
$ cmake -DCMAKE_INSTALL_PREFIX=/opt/Vc -DBUILD_TESTING=OFF <srcdir>
```

* Build and install:

```sh
$ make -j16
$ make install
```
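
Once installed, Vc can be consumed from another CMake project. The following consumer sketch is an assumption based on the CMake package files Vc installs (`find_package(Vc)` with the `Vc_INCLUDE_DIR` and `Vc_LIBRARIES` variables); check the `VcConfig.cmake` installed under your prefix for the authoritative names:

```cmake
# Hypothetical consumer CMakeLists.txt -- variable names are assumptions,
# consult the installed VcConfig.cmake.
cmake_minimum_required(VERSION 3.0)
project(my_vc_app CXX)

# Point CMake at the install prefix chosen above, e.g.
#   cmake -DCMAKE_PREFIX_PATH=/opt/Vc ..
find_package(Vc REQUIRED)

add_executable(my_vc_app main.cpp)
target_include_directories(my_vc_app PRIVATE ${Vc_INCLUDE_DIR})
target_link_libraries(my_vc_app ${Vc_LIBRARIES})
```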
## Documentation

The documentation is generated via [doxygen](http://doxygen.org). You can build
the documentation by running `doxygen` in the `doc` subdirectory.
Alternatively, you can find nightly builds of the documentation at:

* [1.4 branch](https://vcdevel.github.io/Vc-1.4/)
* [1.4.2 release](https://vcdevel.github.io/Vc-1.4.2/)
* [1.4.1 release](https://vcdevel.github.io/Vc-1.4.1/)
* [1.4.0 release](https://vcdevel.github.io/Vc-1.4.0/)
* [1.3 branch](https://vcdevel.github.io/Vc-1.3/)
* [1.3.0 release](https://vcdevel.github.io/Vc-1.3.0/)
* [1.2.0 release](https://vcdevel.github.io/Vc-1.2.0/)
* [1.1.0 release](https://vcdevel.github.io/Vc-1.1.0/)
* [0.7 branch](https://vcdevel.github.io/Vc-0.7/)

## Publications

* [M. Kretz, "Extending C++ for Explicit Data-Parallel Programming via SIMD
  Vector Types", Goethe University Frankfurt, Dissertation,
  2015.](http://publikationen.ub.uni-frankfurt.de/frontdoor/index/index/docId/38415)
* [M. Kretz and V. Lindenstruth, "Vc: A C++ library for explicit
  vectorization", Software: Practice and Experience,
  2011.](http://dx.doi.org/10.1002/spe.1149)
* [M. Kretz, "Efficient Use of Multi- and Many-Core Systems with Vectorization
  and Multithreading", University of Heidelberg,
  2009.](http://code.compeng.uni-frankfurt.de/attachments/13/Diplomarbeit.pdf)

[Work on integrating the functionality of Vc into the C++ standard library.](
https://github.com/VcDevel/Vc/wiki/ISO-Standardization-of-the-Vector-classes)

## License

Vc is released under the terms of the [3-clause BSD license](http://opensource.org/licenses/BSD-3-Clause).
166