README.md

<hr>
<h3>About CUB</h3>

CUB provides state-of-the-art, reusable software components for every layer
of the CUDA programming model:
- [<b><em>Device-wide primitives</em></b>](https://nvlabs.github.io/cub/group___device_module.html)
  - Sort, prefix scan, reduction, histogram, etc.
  - Compatible with CUDA dynamic parallelism
- [<b><em>Block-wide "collective" primitives</em></b>](https://nvlabs.github.io/cub/group___block_module.html)
  - I/O, sort, prefix scan, reduction, histogram, etc.
  - Compatible with arbitrary thread block sizes and types
- [<b><em>Warp-wide "collective" primitives</em></b>](https://nvlabs.github.io/cub/group___warp_module.html)
  - Warp-wide prefix scan, reduction, etc.
  - Safe and architecture-specific
- [<b><em>Thread and resource utilities</em></b>](https://nvlabs.github.io/cub/group___thread_module.html)
  - PTX intrinsics, device reflection, texture-caching iterators, caching memory allocators, etc.

![Orientation of collective primitives within the CUDA software stack](https://nvlabs.github.io/cub/cub_overview.png)

CUB is included in the NVIDIA HPC SDK and the CUDA Toolkit.

We recommend the [CUB Project Website](https://nvlabs.github.io/cub) for further information and examples.

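The primitives in the device-wide layer share a common calling convention: each is invoked twice, first with a null workspace pointer to query how much temporary storage it needs, then again to do the work. Below is a minimal sketch using `cub::DeviceReduce::Sum`; the wrapper name `SumExample` is illustrative, `d_in`/`d_out` are assumed to be pre-allocated device pointers, and error checking is omitted.

```cuda
#include <cub/cub.cuh>

// Sum num_items integers from d_in into d_out[0]
void SumExample(int *d_in, int *d_out, int num_items)
{
    void   *d_temp_storage     = nullptr;
    size_t  temp_storage_bytes = 0;

    // First call: d_temp_storage == nullptr, so nothing runs -- only the
    // required temporary storage size is written to temp_storage_bytes
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    // Allocate the workspace, then run the reduction for real
    cudaMalloc(&d_temp_storage, temp_storage_bytes);
    cub::DeviceReduce::Sum(d_temp_storage, temp_storage_bytes, d_in, d_out, num_items);

    cudaFree(d_temp_storage);
}
```
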
<br><hr>
<h3>A Simple Example</h3>

```C++
#include <cub/cub.cuh>

// Block-sorting CUDA kernel
__global__ void BlockSortKernel(int *d_in, int *d_out)
{
    using namespace cub;

    // Specialize BlockRadixSort, BlockLoad, and BlockStore for 128 threads
    // owning 16 integer items each
    typedef BlockRadixSort<int, 128, 16>                     BlockRadixSort;
    typedef BlockLoad<int, 128, 16, BLOCK_LOAD_TRANSPOSE>   BlockLoad;
    typedef BlockStore<int, 128, 16, BLOCK_STORE_TRANSPOSE> BlockStore;

    // Allocate shared memory
    __shared__ union {
        typename BlockRadixSort::TempStorage  sort;
        typename BlockLoad::TempStorage       load;
        typename BlockStore::TempStorage      store;
    } temp_storage;

    int block_offset = blockIdx.x * (128 * 16);  // Offset for this block's segment

    // Obtain a segment of 2048 consecutive keys that are blocked across threads
    int thread_keys[16];
    BlockLoad(temp_storage.load).Load(d_in + block_offset, thread_keys);
    __syncthreads();

    // Collectively sort the keys
    BlockRadixSort(temp_storage.sort).Sort(thread_keys);
    __syncthreads();

    // Store the sorted segment
    BlockStore(temp_storage.store).Store(d_out + block_offset, thread_keys);
}
```

Each thread block uses `cub::BlockRadixSort` to collectively sort
its own input segment.  The class is specialized by the
data type being sorted, by the number of threads per block, by the number of
keys per thread, and implicitly by the targeted compilation architecture.

The `cub::BlockLoad` and `cub::BlockStore` classes are similarly specialized.
Furthermore, to provide coalesced accesses to device memory, these primitives are
configured to access memory using a striped access pattern (where consecutive threads
simultaneously access consecutive items) and then <em>transpose</em> the keys into
a [<em>blocked arrangement</em>](index.html#sec4sec3) of elements across threads.

Once specialized, these classes expose opaque `TempStorage` member types.
The thread block uses these storage types to statically allocate the union of
shared memory needed by the thread block.  (Alternatively, these storage types
could be aliased to global memory allocations.)

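A full program would launch this kernel over the entire input; a host-side sketch (the name `num_tiles` is illustrative, error checking is omitted, and `d_in`/`d_out` must each hold `num_tiles * 2048` keys):

```cuda
// Launch one 128-thread block per 2048-key tile, matching the
// 128-thread / 16-items-per-thread specialization in the kernel
void SortTiles(int *d_in, int *d_out, int num_tiles)
{
    BlockSortKernel<<<num_tiles, 128>>>(d_in, d_out);
    cudaDeviceSynchronize();
}
```
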
<br><hr>
<h3>Releases</h3>

CUB is distributed on GitHub and is also included in the NVIDIA HPC SDK and the CUDA Toolkit.

See the [changelog](CHANGELOG.md) for details about specific releases.

| CUB Release               | Included In                             |
| ------------------------- | --------------------------------------- |
| 1.10.0                    | NVIDIA HPC SDK 20.9                     |
| 1.9.10-1                  | NVIDIA HPC SDK 20.7 & CUDA Toolkit 11.1 |
| 1.9.10                    | NVIDIA HPC SDK 20.5                     |
| 1.9.9                     | CUDA Toolkit 11.0                       |
| 1.9.8-1                   | NVIDIA HPC SDK 20.3                     |
| 1.9.8                     | CUDA Toolkit 11.0 Early Access          |
| 1.8.0                     |                                         |
| 1.7.5                     | Thrust 1.9.2                            |
| 1.7.4                     | Thrust 1.9.1-2                          |
| 1.7.3                     |                                         |
| 1.7.2                     |                                         |
| 1.7.1                     |                                         |
| 1.7.0                     | Thrust 1.9.0-5                          |
| 1.6.4                     |                                         |
| 1.6.3                     |                                         |
| 1.6.2 (previously 1.5.5)  |                                         |
| 1.6.1 (previously 1.5.4)  |                                         |
| 1.6.0 (previously 1.5.3)  |                                         |
| 1.5.2                     |                                         |
| 1.5.1                     |                                         |
| 1.5.0                     |                                         |
| 1.4.1                     |                                         |
| 1.4.0                     |                                         |
| 1.3.2                     |                                         |
| 1.3.1                     |                                         |
| 1.3.0                     |                                         |
| 1.2.3                     |                                         |
| 1.2.2                     |                                         |
| 1.2.0                     |                                         |
| 1.1.1                     |                                         |
| 1.0.2                     |                                         |
| 1.0.1                     |                                         |
| 0.9.4                     |                                         |
| 0.9.2                     |                                         |
| 0.9.1                     |                                         |
| 0.9.0                     |                                         |

<br><hr>
<h3>Development Process</h3>

CUB uses the [CMake build system](https://cmake.org/) to build unit tests,
examples, and header tests. To build CUB as a developer, follow this recipe:

```
# Clone the CUB repository from GitHub:
git clone https://github.com/NVIDIA/cub.git
cd cub

# Create a build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake ..   # Command-line interface.
ccmake ..  # ncurses GUI (Linux only).
cmake-gui  # Graphical UI; set the source/build directories in the app.

# Build:
cmake --build . -j <num jobs>   # Invokes make (or ninja, etc.)

# Run tests and examples:
ctest
```

By default, the C++14 standard is targeted, but this can be changed in CMake.
More information on configuring your CUB build and creating a pull request can
be found in [CONTRIBUTING.md](CONTRIBUTING.md).

<br><hr>
<h3>Open Source License</h3>

CUB is available under the "New BSD" open-source license:

```
Copyright (c) 2010-2011, Duane Merrill.  All rights reserved.
Copyright (c) 2011-2018, NVIDIA CORPORATION.  All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
   *  Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
   *  Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
   *  Neither the name of the NVIDIA CORPORATION nor the
      names of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```