/*!
\mainpage Trilinos/Kokkos: Shared-memory programming interface and computational kernels

\section Kokkos_Intro Introduction

The %Kokkos package has two main components.  The first, sometimes
called "%Kokkos Array" or just "%Kokkos," implements a
performance-portable shared-memory parallel programming model and data
containers.  The second, called "%Kokkos Classic," consists of
computational kernels that support the %Tpetra package.

\section Kokkos_Kokkos The %Kokkos programming model

%Kokkos implements a performance-portable shared-memory parallel
programming model and data containers.  It lets you write an algorithm
once and then change only a template parameter to get the data layout
best suited to your hardware.  %Kokkos has back-ends for the following
parallel programming models:

- Kokkos::Threads: POSIX Threads (Pthreads)
- Kokkos::OpenMP: OpenMP
- Kokkos::Cuda: NVIDIA's CUDA programming model for graphics
  processing units (GPUs)
- Kokkos::Serial: No thread parallelism

%Kokkos also has optimizations for shared-memory parallel systems with
nonuniform memory access (NUMA).  Its containers can hold data of any
primitive ("plain old") data type (and some aggregate types).  %Kokkos
Array may be used as a stand-alone programming model.

%Kokkos' parallel operations include the following:

- parallel_for: a thread-parallel "for loop"
- parallel_reduce: a thread-parallel reduction
- parallel_scan: a thread-parallel prefix scan operation

%Kokkos also provides expert-level, platform-independent interfaces to
thread "teams," per-team "shared memory," synchronization, and atomic
update operations.
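
The following serial sketch shows what each of the three patterns
computes.  The helper names \c serial_for, \c serial_reduce,
\c serial_scan, and \c exclusive_prefix_sum are hypothetical and are not
part of %Kokkos; only the functor call shapes, f(i), f(i, update), and
f(i, update, final), follow the ones %Kokkos expects for parallel_for,
parallel_reduce, and parallel_scan.  A real back-end splits the index
range across threads and combines partial results, which this sketch
does not attempt.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical serial stand-ins for parallel_for, parallel_reduce, and
// parallel_scan.  They are NOT Kokkos functions; they only show what each
// pattern computes.  The functor call shapes mirror the ones Kokkos uses.

// parallel_for: call f(i) once for every index i in [0, n).
template <class Functor>
void serial_for(std::size_t n, const Functor& f) {
  for (std::size_t i = 0; i < n; ++i) f(i);
}

// parallel_reduce: f(i, partial) folds entry i into a partial result.
// Serially, one partial result covers the whole range.  Value() is used
// as the identity, which is correct for sums.
template <class Value, class Functor>
void serial_reduce(std::size_t n, const Functor& f, Value& result) {
  Value partial = Value();
  for (std::size_t i = 0; i < n; ++i) f(i, partial);
  result = partial;
}

// parallel_scan: f(i, update, final) adds entry i into the running value
// "update" and may write output only when final is true.  A parallel
// back-end may call f more than once per index (with final == false and
// then final == true); serially, a single final pass is enough.
template <class Value, class Functor>
void serial_scan(std::size_t n, const Functor& f) {
  Value update = Value();
  for (std::size_t i = 0; i < n; ++i) f(i, update, true);
}

// Example use of serial_scan: exclusive prefix sum,
// y[i] = x[0] + ... + x[i-1].
std::vector<long> exclusive_prefix_sum(const std::vector<long>& x) {
  std::vector<long> y(x.size());
  serial_scan<long>(x.size(), [&](std::size_t i, long& update, bool final) {
    if (final) y[i] = update;  // sum of the entries before i
    update += x[i];
  });
  return y;
}
```

For example, exclusive_prefix_sum({3, 1, 4, 1, 5}) yields
{0, 3, 4, 8, 9}: when \c final is true, entry \c i receives the sum of
all entries before it, and only afterwards is its own value added.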

%Kokkos' data containers include the following:

- Kokkos::View: A multidimensional array suitable for thread-parallel
  operations.  Its layout (e.g., row-major or column-major) is
  optimized by default for the particular thread-parallel device.
- Kokkos::Vector: A drop-in replacement for std::vector that eases
  porting from standard sequential C++ data structures to %Kokkos'
  parallel data structures.
- Kokkos::UnorderedMap: A parallel lookup table comparable in
  functionality to std::unordered_map.
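
To illustrate how a template parameter can choose the memory layout
without changing the algorithm, here is a much-simplified, hypothetical
two-dimensional view.  The types \c LayoutLeftSketch and
\c LayoutRightSketch mimic the index mappings of Kokkos::LayoutLeft
(column-major) and Kokkos::LayoutRight (row-major).  \c View2DSketch,
\c DefaultLayoutSketch, the device tags, and \c fill_ij are invented for
this example and are not %Kokkos API.  The device-to-layout mapping
follows the usual %Kokkos defaults: Kokkos::LayoutLeft for Kokkos::Cuda
and Kokkos::LayoutRight for the host back-ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for Kokkos::LayoutLeft and Kokkos::LayoutRight.
// Each maps index (i, j) of an n0 x n1 array to a 1-D storage offset.
struct LayoutLeftSketch {  // column-major: the left index i varies fastest
  static std::size_t offset(std::size_t i, std::size_t j, std::size_t n0,
                            std::size_t) {
    return i + n0 * j;
  }
};

struct LayoutRightSketch {  // row-major: the right index j varies fastest
  static std::size_t offset(std::size_t i, std::size_t j, std::size_t,
                            std::size_t n1) {
    return i * n1 + j;
  }
};

// Hypothetical, much-simplified 2-D view whose storage order is chosen
// entirely by the Layout template parameter.
template <class T, class Layout>
class View2DSketch {
 public:
  View2DSketch(std::size_t n0, std::size_t n1)
      : n0_(n0), n1_(n1), data_(n0 * n1) {}

  T& operator()(std::size_t i, std::size_t j) {
    assert(i < n0_ && j < n1_);  // bounds check in debug builds
    return data_[Layout::offset(i, j, n0_, n1_)];
  }

  std::size_t extent(int dim) const { return dim == 0 ? n0_ : n1_; }
  const T* data() const { return data_.data(); }  // raw storage order

 private:
  std::size_t n0_, n1_;
  std::vector<T> data_;
};

// Hypothetical device tags and a trait that picks each device's default
// layout, following the usual Kokkos defaults (GPU: left, host: right).
struct HostTagSketch {};
struct GpuTagSketch {};
template <class Device> struct DefaultLayoutSketch;
template <> struct DefaultLayoutSketch<HostTagSketch> { using type = LayoutRightSketch; };
template <> struct DefaultLayoutSketch<GpuTagSketch> { using type = LayoutLeftSketch; };

template <class T, class Device>
using DeviceView2DSketch =
    View2DSketch<T, typename DefaultLayoutSketch<Device>::type>;

// The algorithm is written once and works for any layout: a(i, j) = 10*i + j.
template <class View>
void fill_ij(View& a) {
  for (std::size_t i = 0; i < a.extent(0); ++i)
    for (std::size_t j = 0; j < a.extent(1); ++j)
      a(i, j) = static_cast<int>(10 * i + j);
}
```

With extents 2 x 3, \c fill_ij produces the same logical array for both
layouts, but the row-major view stores 0, 1, 2, 10, 11, 12 while the
column-major view stores 0, 10, 1, 11, 2, 12.  The usual rationale for
the defaults: on a GPU, consecutive threads typically take consecutive
values of the left index, so column-major storage lets neighboring
threads touch neighboring addresses (coalesced access); on a CPU, each
thread typically walks along a row, so row-major storage keeps its
accesses contiguous in cache.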

%Kokkos also uses the above basic containers to implement higher-level
data structures, like sparse graphs and matrices.

A good place to start learning about %Kokkos is <a href="http://trilinos.sandia.gov/events/trilinos_user_group_2013/presentations/2013-11-TUG-Kokkos-Tutorial.pdf">these tutorial slides</a> from the 2013 Trilinos Users' Group meeting.

\section Kokkos_Classic %Kokkos Classic

"%Kokkos Classic" consists of computational kernels that support the
%Tpetra package.  These kernels include sparse matrix-vector multiply,
sparse triangular solve, Gauss-Seidel, and dense vector operations.
They are templated on the type (\c Scalar) of the matrix and vector
entries on which they operate.  This component is not meant to be
visible to users; it is an implementation detail of the %Tpetra
distributed linear algebra package.
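
As an illustration of what "templated on \c Scalar" means for such a
kernel, the following sketch performs a sparse matrix-vector multiply,
y = A x, on a matrix stored in compressed sparse row (CRS) format.
\c CrsMatrixSketch and \c spmv_sketch are hypothetical names; this is
not the %Kokkos Classic or %Tpetra interface, and it runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compressed sparse row (CRS) matrix, templated on the entry
// type Scalar and an index type Ordinal.  Illustrative only; this is not
// the Tpetra or Kokkos Classic interface.
template <class Scalar, class Ordinal = int>
struct CrsMatrixSketch {
  std::size_t num_rows = 0;
  // Size num_rows + 1; the entries of row i are [row_map[i], row_map[i+1]).
  std::vector<std::size_t> row_map;
  std::vector<Ordinal> col_ind;  // column index of each stored entry
  std::vector<Scalar> values;    // value of each stored entry
};

// y := A * x.  Each row's dot product is independent of the others, which
// is why a shared-memory implementation can hand rows to different threads.
template <class Scalar, class Ordinal>
void spmv_sketch(const CrsMatrixSketch<Scalar, Ordinal>& A,
                 const std::vector<Scalar>& x, std::vector<Scalar>& y) {
  assert(A.row_map.size() == A.num_rows + 1);
  y.assign(A.num_rows, Scalar());
  for (std::size_t i = 0; i < A.num_rows; ++i) {
    Scalar sum = Scalar();
    for (std::size_t k = A.row_map[i]; k < A.row_map[i + 1]; ++k)
      sum += A.values[k] * x[static_cast<std::size_t>(A.col_ind[k])];
    y[i] = sum;
  }
}
```

For the 2 x 3 matrix with rows (2, 0, 1) and (0, 3, 0), the CRS arrays
are row_map = {0, 2, 3}, col_ind = {0, 2, 1}, and values = {2, 1, 3};
multiplying by x = (1, 2, 3) gives y = (5, 6).  The same source
instantiates unchanged for \c float or \c double entries.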

%Kokkos Classic also implements a shared-memory parallel programming
model.  That model preceded and inspired the %Kokkos programming model
described in the previous section.  Users should consider the %Kokkos
Classic programming model deprecated and should prefer the new %Kokkos
programming model.
*/