/*!
\mainpage Trilinos/Kokkos: Shared-memory programming interface and computational kernels

\section Kokkos_Intro Introduction

The %Kokkos package has two main components.  The first, sometimes
called "%Kokkos Array" or just "%Kokkos," implements a
performance-portable shared-memory parallel programming model and data
containers.  The second, called "%Kokkos Classic," consists of
computational kernels that support the %Tpetra package.

\section Kokkos_Kokkos The %Kokkos programming model

%Kokkos implements a performance-portable shared-memory parallel
programming model and data containers.  It lets you write an algorithm
once, and just change a template parameter to get the optimal data
layout for your hardware.  %Kokkos has back-ends for the following
parallel programming models:

- Kokkos::Threads: POSIX Threads (Pthreads)
- Kokkos::OpenMP: OpenMP
- Kokkos::Cuda: NVIDIA's CUDA programming model for graphics
  processing units (GPUs)
- Kokkos::Serial: No thread parallelism

%Kokkos also has optimizations for shared-memory parallel systems with
nonuniform memory access (NUMA).  Its containers can hold data of any
primitive ("plain old") data type (and some aggregate types).  %Kokkos
Array may be used as a stand-alone programming model.

%Kokkos' parallel operations include the following:

- parallel_for: a thread-parallel "for loop"
- parallel_reduce: a thread-parallel reduction
- parallel_scan: a thread-parallel prefix scan operation

as well as expert-level platform-independent interfaces to thread
"teams," per-team "shared memory," synchronization, and atomic update
operations.

%Kokkos' data containers include the following:

- Kokkos::View: A multidimensional array suitable for thread-parallel
  operations.  Its layout (e.g., row-major or column-major) is
  optimized by default for the particular thread-parallel device.
- Kokkos::Vector: A drop-in replacement for std::vector that eases
  porting from standard sequential C++ data structures to %Kokkos'
  parallel data structures.
- Kokkos::UnorderedMap: A parallel lookup table comparable in
  functionality to std::unordered_map.

%Kokkos also uses the above basic containers to implement higher-level
data structures, like sparse graphs and matrices.

A good place to start learning about %Kokkos would be <a href="http://trilinos.sandia.gov/events/trilinos_user_group_2013/presentations/2013-11-TUG-Kokkos-Tutorial.pdf">these tutorial slides</a> from the 2013 Trilinos Users' Group meeting.

\section Kokkos_Classic %Kokkos Classic

"%Kokkos Classic" consists of computational kernels that support the
%Tpetra package.  These kernels include sparse matrix-vector multiply,
sparse triangular solve, Gauss-Seidel, and dense vector operations.
They are templated on the type of objects (\c Scalar) on which they
operate.  This component was not meant to be visible to users; it is
an implementation detail of the %Tpetra distributed linear algebra
package.

%Kokkos Classic also implements a shared-memory parallel programming
model.  This inspired and preceded the %Kokkos programming model
described in the previous section.  Users should consider the %Kokkos
Classic programming model deprecated, and prefer the new %Kokkos
programming model.
*/
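/*!
\page Kokkos_Layout_Sketch Sketch: data layout as a template parameter

The description of the %Kokkos programming model says that an
algorithm can be written once, with the data layout (e.g., row-major
or column-major) selected by a template parameter.  The following is a
minimal sketch of that idea in standard C++ only.  It does not use
%Kokkos and does not reproduce the Kokkos::View interface; the names
\c RowMajor, \c ColMajor, \c Array2D, and \c fill are hypothetical and
exist only for this illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical layout tags (not Kokkos types).  Each one maps a 2-D
// index (i, j) to an offset into a flat array.
struct RowMajor {
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t, std::size_t ncols) {
    return i * ncols + j;  // consecutive j are adjacent in memory
  }
};

struct ColMajor {
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t nrows, std::size_t) {
    return i + j * nrows;  // consecutive i are adjacent in memory
  }
};

// Hypothetical 2-D array whose storage order is chosen at compile
// time by the Layout template parameter.
template <class Layout>
class Array2D {
public:
  Array2D(std::size_t nrows, std::size_t ncols)
    : nrows_(nrows), ncols_(ncols), data_(nrows * ncols, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) {
    return data_[Layout::offset(i, j, nrows_, ncols_)];
  }

  std::size_t nrows() const { return nrows_; }
  std::size_t ncols() const { return ncols_; }
  const double* raw() const { return data_.data(); }

private:
  std::size_t nrows_, ncols_;
  std::vector<double> data_;
};

// The algorithm is written once against operator() and never
// mentions the layout.
template <class Layout>
void fill(Array2D<Layout>& a) {
  for (std::size_t i = 0; i < a.nrows(); ++i) {
    for (std::size_t j = 0; j < a.ncols(); ++j) {
      a(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);
    }
  }
}
```

Calling \c fill on an \c Array2D<RowMajor> and on an
\c Array2D<ColMajor> of the same shape gives the same value at every
logical index (i, j); only the order of the entries in the underlying
flat storage differs.  In this sketch the layout must be named
explicitly, whereas the text above describes Kokkos::View as choosing
a default suited to the device.
*/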