libsmm: a library for small matrix multiplies.

To handle small matrix multiplies efficiently,
often involving 'special' matrix dimensions such as 5, 13, 17, 22,
a dedicated matrix library can be generated that outperforms (or matches)
general-purpose (optimized) BLAS libraries.

Generation requires extensive compilation and timing runs, and is machine specific,
i.e. the library should be built on the architecture it is supposed to run on.

Users can modify the values in the file config.in to choose which kind of library
to generate. They can also modify (or add) files in the config directory to set
the compiler options used to build the library, using the existing files as
templates.

There are several options for building the library. Run ./generate -h to see them.
Detailed instructions for some examples follow.

====================================================================================================================
a) How to generate the library running several jobs on a cluster, where each
   node allows for both execution and compilation.
   For this example we will use a CRAY system with the GNU compiler and SLURM.
   Run "./generate -h" to see the meaning of the options.

   1) Run: ./generate -c config/cray.gnu -j 100 -t 16 -w slurm tiny1
      This command submits 100 batch jobs. Wait until they complete.

   2) Run: ./generate -c config/cray.gnu tiny2
      This command collects all results produced in the tiny1 phase and
      generates the file tiny_gen_optimal_dnn_cray.gnu.out

   3) As done in 1) and 2), run: ./generate -c config/cray.gnu -j 20 -t 16 -w slurm small1
      This command submits 20 batch jobs. Wait until they complete.
      Then run: ./generate -c config/cray.gnu small2
      This command collects all results produced in the small1 phase and
      generates the file small_gen_optimal_dnn_cray.gnu.out

   4) Run: ./generate -c config/cray.gnu -t 16 -w slurm lib
      This command submits a single batch job that compiles the library.
      When it finishes, the library is in the directory lib/
      (libsmm_dnn_cray.gnu.a).

   5) It is highly recommended to run the final test to check the correctness of the library.
      Run: ./generate -c config/cray.gnu -j 20 -w slurm check1
      After the batch jobs complete, run: ./generate -c config/cray.gnu -j 20 check2
      It is important to use the same number of jobs (-j) as in the
      check1 phase. Finally, check test_smm_dnn_cray.gnu.out for performance and correctness.

   6) Intermediate files (but not some key output and the library itself)
      can be removed with ./generate clean
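
The phase order above, and the requirement that check1 and check2 use the
same job count, can be captured in a small helper. This is only a sketch
under stated assumptions: smm_cmd, SMM_CONFIG and SMM_CHECK_JOBS are
hypothetical names that are not part of libsmm, and the function only prints
the command line for a phase so that it can be inspected and then run by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of libsmm): print the ./generate command
# for one phase of example a). Job and thread counts are the ones used above.
SMM_CONFIG=${SMM_CONFIG:-config/cray.gnu}
SMM_CHECK_JOBS=${SMM_CHECK_JOBS:-20}   # shared by check1 and check2

smm_cmd() {
    case "$1" in
        tiny1)  echo "./generate -c $SMM_CONFIG -j 100 -t 16 -w slurm tiny1" ;;
        tiny2)  echo "./generate -c $SMM_CONFIG tiny2" ;;
        small1) echo "./generate -c $SMM_CONFIG -j 20 -t 16 -w slurm small1" ;;
        small2) echo "./generate -c $SMM_CONFIG small2" ;;
        lib)    echo "./generate -c $SMM_CONFIG -t 16 -w slurm lib" ;;
        check1) echo "./generate -c $SMM_CONFIG -j $SMM_CHECK_JOBS -w slurm check1" ;;
        check2) echo "./generate -c $SMM_CONFIG -j $SMM_CHECK_JOBS check2" ;;
        *)      echo "unknown phase: $1" >&2; return 1 ;;
    esac
}

# Print the whole sequence. Nothing is executed here: every batch phase
# must finish before the next command is run.
for phase in tiny1 tiny2 small1 small2 lib check1 check2; do
    smm_cmd "$phase"
done
```

Running the script prints the seven commands in order. Overriding
SMM_CHECK_JOBS changes check1 and check2 together, which keeps the two job
counts from drifting apart.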

====================================================================================================================
b) How to generate the library running a single job interactively.
   For this example we will use a Linux system with the GNU compiler.
   Run "./generate -h" to see the meaning of the options.

   1) Run: ./generate -c config/linux.gnu -j 10 -t 16 -w none tiny1
      This command generates, compiles and executes the tiny kernels
      in 10 groups. Increase the number of groups (-j <#> option)
      if you get the error "Argument list too long".

   2) Run: ./generate -c config/linux.gnu tiny2
      This command collects all results produced in the tiny1 phase and
      generates the file tiny_gen_optimal_dnn_linux.gnu.out

   3) Run: ./generate -c config/linux.gnu -j 0 -t 16 small1
      This command generates the file small_gen_optimal_dnn_linux.gnu.out

   4) Run: ./generate -c config/linux.gnu -j 0 -t 16 -w slurm lib
      This command produces the library inside the directory lib/
      (libsmm_dnn_linux.gnu.a).

   5) It is highly recommended to run the final test to check the correctness
      of the library.
      Run: ./generate -c config/linux.gnu -j 0 -w slurm check1
      Finally, check test_smm_dnn_linux.gnu.out for performance
      and correctness.

   6) Intermediate files (but not some key output and the library itself)
      can be removed with ./generate clean
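
The "Argument list too long" error mentioned in step 1 is reported when the
command line handed to a program exceeds the system limit on the combined
size of arguments and environment. As a rough aid when choosing the number
of groups, that limit can be queried with the standard getconf utility. The
variable name arg_max is only illustrative, and how the limit maps to a
suitable -j value is not specified here.

```shell
#!/bin/sh
# Query the limit (in bytes) on the combined size of the argument list and
# environment passed to a new program. Exceeding it produces the error
# "Argument list too long".
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"
```

A small value on a given machine suggests starting with a larger -j, so that
each group handles fewer kernels.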

====================================================================================================================
c) How to generate the library for the Intel Xeon Phi in batch mode.

   For this example we will use a cluster with SLURM, where each node has an
   Intel Xeon Phi card.
   Run "./generate -h" to see the meaning of the options.
   We use the config file mic.intel (inside the directory config).
   Check that all options are appropriate for your case, in particular:
    - the target_compile variable with the flag "-offload-attribute-target=mic".
    - the target_compile_offload variable with the flag "-offload=mandatory".
    - the MIC_OMP_NUM_THREADS variable, which should be set to the number of
      cores on the card.

   Note that the library is produced by offloading the kernels to the Xeon
   Phi. Performance output files are written to the directory from which the
   library is executed on the host, so this directory must be exported
   to the Xeon Phi with read/write permission.
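
Because the performance output is written to the working directory, a quick
read/write probe can catch a permission problem before any jobs are
submitted. The following is a minimal sketch: check_rw_dir is a hypothetical
helper, not part of libsmm, and it only tests the file system as seen from
where it is run. It would therefore need to be run against the card's view
of the directory as well, and how to log in to the card is site specific.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if directory $1 can be read and written.
check_rw_dir() {
    dir=$1
    probe="$dir/.smm_rw_probe.$$"
    [ -d "$dir" ] && [ -r "$dir" ] || return 1
    # Create and remove a real file; permission bits alone can be
    # misleading on exported (network) file systems.
    ( : > "$probe" ) 2>/dev/null || return 1
    rm -f "$probe"
}

if check_rw_dir .; then
    echo "working directory is readable and writable"
else
    echo "working directory is NOT readable and writable" >&2
fi
```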

   1) Run: ./generate -c config/mic.intel -j 100 -t 16 -w slurm tiny1
      This command submits 100 batch jobs. Each job offloads executions
      to the Intel Xeon Phi card (MIC_OMP_NUM_THREADS threads). Wait until
      all jobs complete.

   2) Run: ./generate -c config/mic.intel tiny2
      This command collects all results of the tiny1 phase and generates
      the file tiny_gen_optimal_dnn_mic.intel.out.

   3) As done in 1) and 2), run: ./generate -c config/mic.intel -j 100 -t 16 -w slurm small1
      This command submits 100 batch jobs, where each job offloads
      executions to the Intel Xeon Phi card (MIC_OMP_NUM_THREADS
      threads). Wait until they complete.
      Then run: ./generate -c config/mic.intel small2
      This command collects all results produced in the small1 phase and
      generates the file small_gen_optimal_dnn_mic.intel.out

   4) Run: ./generate -c config/mic.intel -t 16 -w slurm lib
      This command submits a single batch job that compiles the library.
      When it finishes, the library is in the directory lib/
      (libsmm_dnn_mic.intel.a).

   5) It is highly recommended to run the final test to check the correctness of the library.
      Run: ./generate -c config/mic.intel -j 200 -w slurm check1
      After the batch jobs complete, run: ./generate -c config/mic.intel -j 200 check2
      It is important to use the same number of jobs (-j) as in the
      check1 phase. Finally, check test_smm_dnn_mic.intel.out for performance and correctness.

   6) Intermediate files (but not some key output and the library itself)
      can be removed with ./generate clean


The following copyright covers the code and the generated library:
!====================================================================================================================
! * Copyright (c) 2015 Joost VandeVondele and Alfio Lazzaro
! * All rights reserved.
! *
! * Redistribution and use in source and binary forms, with or without
! * modification, are permitted provided that the following conditions are met:
! *     * Redistributions of source code must retain the above copyright
! *       notice, this list of conditions and the following disclaimer.
! *     * Redistributions in binary form must reproduce the above copyright
! *       notice, this list of conditions and the following disclaimer in the
! *       documentation and/or other materials provided with the distribution.
! *
! * THIS SOFTWARE IS PROVIDED BY Joost VandeVondele ''AS IS'' AND ANY
! * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
! * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
! * DISCLAIMED. IN NO EVENT SHALL Joost VandeVondele BE LIABLE FOR ANY
! * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
! * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
! * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
! * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
! * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
! * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
! *
!====================================================================================================================