# libsmm_acc: GPU-Accelerated Small Matrix Multiplications

`libsmm_acc` is a **lib**rary for **s**mall **m**atrix-**m**atrix multiplication on a GPU **acc**elerator. Stacks of matrix-matrix multiplication indices are passed from DBCSR to `libsmm_acc`, which performs the multiplications on the GPU.
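Conceptually, each stack entry tells a kernel where one small A, B, and C block lives inside flat data arrays, and the kernel accumulates `C += A * B` for every entry. The following CPU sketch illustrates this idea only; it is not the actual `libsmm_acc` interface, and the function name and exact stack layout are hypothetical:

```python
def process_stack_host(stack, a_data, b_data, c_data, m, n, k):
    # CPU reference of what a libsmm_acc GPU kernel computes: each stack
    # entry holds the offsets of one A, B and C block inside flat,
    # row-major arrays, and C += A * B is accumulated for every entry.
    for a_off, b_off, c_off in stack:
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for l in range(k):
                    acc += a_data[a_off + i * k + l] * b_data[b_off + l * n + j]
                c_data[c_off + i * n + j] += acc

# Two 2x2x2 multiplications accumulating into the same C block:
m = n = k = 2
a_data = [float(x) for x in range(8)]   # two A blocks, back to back
b_data = [1.0] * 8                      # two B blocks of ones
c_data = [0.0] * 4                      # one shared C block
stack = [(0, 0, 0), (4, 4, 0)]          # (A offset, B offset, C offset)
process_stack_host(stack, a_data, b_data, c_data, m, n, k)
print(c_data)  # -> [10.0, 10.0, 18.0, 18.0]
```

Batching many such small multiplications into one stack is what makes the GPU launch worthwhile: a single kernel launch processes the whole stack instead of one launch per tiny multiplication.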

For a description of the library (some details are outdated, but it nevertheless provides a very good introduction), see Chapter 8.4 of:

> Walker, R. C., & Goetz, A. W. (2016). [Electronic structure calculations on graphics processing units: from quantum chemistry to condensed matter physics](https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118670712).
8
## Compilation

`libsmm_acc` is compiled from within DBCSR; there is no separate compilation step.
12
## Directory Organization

- [`kernels/`](kernels/): GPU kernels (CUDA- and HIP-compatible) for matrix-matrix multiplication, and the Python interface to the autotuning and predictive code.
- [`notebooks/`](notebooks/): Jupyter notebooks for exploring data generated from autotuning and prediction.
- `generate_*.py`: utility scripts for `libsmm_acc` compilation.
- `libsmm_acc*`: `libsmm_acc` C++ and CUDA/HIP code.
- [`parameters/`](parameters/): contains `parameters_GPU.json` files. These are sets of matrix-matrix multiplication parameters for different (m, n, k)-triplets, optimized for a given GPU card. You can explore these parameters interactively using the [provided Jupyter notebook](notebooks/inspect_autotuned_parameters.ipynb).
- [`predict/`](predict/): scripts for predicting optimal parameter sets; see [predictive modeling of kernel parameters](predict/README.md).
- [`tune/`](tune/): scripts for autotuning optimal parameter sets; see [autotuning of kernel parameters](tune/README.md).
22
## Matrix-matrix Multiplication Kernels and Parameters

For a given matrix-matrix multiplication **triplet** characterized by the dimensions

- **m**
- **n**
- **k**,

`libsmm_acc` can run 5 different matrix-matrix multiplication **kernels**:

- [tiny](kernels/smm_acc_dnt_tiny.h)
- [small](kernels/smm_acc_dnt_small.h)
- [medium](kernels/smm_acc_dnt_medium.h)
- [largeDB1](kernels/smm_acc_dnt_largeDB1.h) ("large double-buffering 1")
- [largeDB2](kernels/smm_acc_dnt_largeDB2.h) ("large double-buffering 2")

which take between 3 and 7 **parameters** (see figure at the top):

- **threads**: number of threads per block in the execution configuration of the CUDA/HIP kernels
- **grouping**: how many stack entries are grouped together into a CUDA/HIP thread block (the larger `grouping` is, the fewer blocks are launched)
- **minblocks**: desired minimum number of resident blocks per multiprocessor
- **tile_m** (in the figure: **M**): `tile_m` * `tile_n` = dimensions of the result block `T`
- **tile_n** (in the figure: **N**)
- **w**: input slab width (width of slabs `P_A` and `P_B`)
- **v**: output slab width (width of slab `P_C`)

The performance of the matrix-matrix multiplication kernels depends strongly on the choice of algorithm and parameters. For this reason, `libsmm_acc` provides lists of optimal parameters for different GPU cards and different (m, n, k)-triplets. These sets of optimal parameters can be found either through *autotuning* or *predictive modeling*.
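At runtime, this amounts to a lookup from an (m, n, k)-triplet to a tuned parameter set. The sketch below illustrates the idea only: the variable names and field names are hypothetical and do not reproduce the exact schema of the `parameters_GPU.json` files.

```python
# Hypothetical, simplified view of tuned parameters for one GPU card:
# each (m, n, k)-triplet maps to an algorithm choice plus its parameters.
optimal_parameters = {
    (4, 4, 4): {"algorithm": "tiny", "threads": 64,
                "grouping": 16, "minblocks": 12},
    (23, 23, 23): {"algorithm": "medium", "threads": 96,
                   "grouping": 16, "minblocks": 8,
                   "tile_m": 2, "tile_n": 2},
}

def lookup_parameters(m, n, k):
    # Return the tuned parameter set for an (m, n, k)-triplet,
    # or None if this triplet has not been autotuned/predicted.
    return optimal_parameters.get((m, n, k))

print(lookup_parameters(4, 4, 4)["algorithm"])  # -> tiny
print(lookup_parameters(5, 5, 5))               # -> None
```

Note that simpler algorithms (here `tiny`) take only the threads/grouping/minblocks parameters, while larger ones add tiling and slab widths, which is why the kernels take between 3 and 7 parameters.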
50
## Contributing to libsmm_acc

#### Autotuning procedure

Follow the [autotuning procedure](tune/README.md).

#### Predictive modeling of kernel parameters

Follow the [predictive modeling procedure](predict/README.md).

#### Adding a new kernel

1. Choose a kernel `name`.

2. Add the kernel's code (it must compile with both `nvcc` and `hip`) in the file `kernels/smm_acc_dnt_name.h`.

3. Add a Python kernel class inheriting from the base class in `kernels/smm_acc_dnt_name.py`.

4. Add the new kernel to the `kernel_algorithm` data structure in [`kernels/smm_acc_predict.py`](kernels/smm_acc_predict.py).
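Steps 3 and 4 can be sketched as follows. This is a hypothetical outline only: the real base class and the `kernel_algorithm` structure live in `kernels/`, and the class and attribute names below are illustrative, not the actual interface.

```python
# Hypothetical base class standing in for the one in kernels/ (step 3):
class Kernel:
    def __init__(self, m, n, k, threads, grouping, minblocks):
        self.m, self.n, self.k = m, n, k
        self.threads = threads
        self.grouping = grouping
        self.minblocks = minblocks

class KernelName(Kernel):
    algorithm = "name"  # matches the new kernels/smm_acc_dnt_name.h

    def launch_parameters(self):
        # Parameters as they would feed the CUDA/HIP launch configuration.
        return {"threads": self.threads, "grouping": self.grouping,
                "minblocks": self.minblocks}

# Step 4 then registers the new class in the kernel_algorithm mapping:
kernel_algorithm = {"name": KernelName}
kern = kernel_algorithm["name"](4, 4, 4, threads=64, grouping=16, minblocks=12)
print(kern.launch_parameters()["threads"])  # -> 64
```

Registering the class in the mapping is what lets the autotuning and prediction scripts enumerate the new kernel alongside the existing five algorithms.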

#### Adding support for a new GPU card

1. Add the GPU's compute architecture properties to [`kernels/gpu_properties.json`](kernels/gpu_properties.json). For more information on where to find these properties, refer to the "info" field of [`kernels/gpu_properties.json`](kernels/gpu_properties.json).

2. Add the GPU to the `arch_number` data structure in [`kernels/smm_acc_predict.py`](kernels/smm_acc_predict.py).

3. Add the necessary code for setting `ARCH_NUMBER` correctly in the [`CMakeLists`](CMakeLists.txt). Also add this GPU to the list of `SUPPORTED_CUDA_ARCHITECTURES` or `SUPPORTED_HIP_ARCHITECTURES` in the [`CMakeLists`](CMakeLists.txt).

4. Add a minimal JSON file `parameters_GPU.json`, containing:

```json
{
}
```

Then add matrix-matrix multiplication parameters for this GPU using *autotuning* and *predictive modeling*.
87