libsmm: a library for small matrix multiplies.

In order to deal efficiently with small matrix multiplies,
often involving 'special' matrix dimensions such as 5,13,17,22,
a dedicated matrix library can be generated that outperforms (or matches)
general purpose (optimized) BLAS libraries.

Generation requires extensive compilation and timing runs, and is machine specific,
i.e. the library should be constructed on the architecture it is supposed to run on.

Users can modify the values inside the file config.in to set which kind of library
they want to generate. Furthermore, they can modify (or add) the files inside
the config directory to set the compiler options used to build the
library. The existing files can be used as templates.

There are several options for building the library. Run ./generate -h to see them.
Detailed instructions for some examples follow.

====================================================================================================================
a) How to generate the library running several jobs in a cluster, where each
   node allows for both execution and compilation.
   For this example we will use a CRAY system with GNU compiler and SLURM.
   Run "./generate -h" to see the meaning of the options.

   1) Run: ./generate -c config/cray.gnu -j 100 -t 16 -w slurm tiny1
      This command submits 100 jobs in batch. Wait until they complete.

   2) Run: ./generate -c config/cray.gnu tiny2
      This command collects all results produced in the tiny1 phase and
      generates a file tiny_gen_optimal_dnn_cray.gnu.out

   3) As done in 1) and 2), run: ./generate -c config/cray.gnu -j 20 -t 16 -w slurm small1
      This command submits 20 jobs in batch. Wait until they complete.
      Then run: ./generate -c config/cray.gnu small2
      This command collects all results produced in the small1 phase and
      generates a file small_gen_optimal_dnn_cray.gnu.out

   4) Run: ./generate -c config/cray.gnu -t 16 -w slurm lib
      This command submits in batch a single job that compiles the library.
      At the end the library is produced inside the directory lib/
      (libsmm_dnn_cray.gnu.a).

   5) It is highly recommended to run the final test to check the correctness of the library.
      Run: ./generate -c config/cray.gnu -j 20 -w slurm check1
      After the batch jobs complete, run: ./generate -c config/cray.gnu -j 20 check2
      Note that check2 must use the same number of jobs as specified in the
      check1 phase. Finally check test_smm_dnn_cray.gnu.out for performance and correctness.

   6) Intermediate files (but not some key output and the library itself)
      can be removed using ./generate clean


====================================================================================================================
b) How to generate the library running a single job interactively.
   For this example we will use a Linux system with GNU compiler.
   Run "./generate -h" to see the meaning of the options.

   1) Run: ./generate -c config/linux.gnu -j 10 -t 16 -w none tiny1
      This command generates, compiles and executes the tiny kernels
      in 10 groups. Increase the number of groups (-j <#> option)
      if you get the error "Argument list too long".

   2) Run: ./generate -c config/linux.gnu tiny2
      This command collects all results produced in the tiny1 phase and
      generates a file tiny_gen_optimal_dnn_linux.gnu.out

   3) Run: ./generate -c config/linux.gnu -j 0 -t 16 small1
      This command generates a file small_gen_optimal_dnn_linux.gnu.out

   4) Run: ./generate -c config/linux.gnu -j 0 -t 16 -w slurm lib
      This command produces the library inside the directory lib/
      (libsmm_dnn_linux.gnu.a).

   5) It is highly recommended to run the final test to check the correctness
      of the library.
      Run: ./generate -c config/linux.gnu -j 0 -w slurm check1
      Finally check test_smm_dnn_linux.gnu.out for performance
      and correctness.

   6) Intermediate files (but not some key output and the library itself)
      can be removed using ./generate clean

====================================================================================================================
c) How to generate the library for the Intel Xeon Phi in batch mode.

   For this example we will use a cluster with SLURM, where each node has an
   Intel Xeon Phi card.
   Run "./generate -h" to see the meaning of the options.
   We use the config file mic.intel (inside the directory config).
   Check that all options are appropriate for your case, in particular:
    - the target_compile variable with the flag "-offload-attribute-target=mic".
    - the target_compile_offload variable with the flag "-offload=mandatory".
    - the MIC_OMP_NUM_THREADS variable, which should be set to the number of
      cores on the card.

   Note that the library is produced by offloading the kernels to the Xeon
   Phi. Performance output files are written in the same directory where the
   library is executed on the host, therefore this directory must be exported
   to the Xeon Phi with read/write permissions.

   1) Run: ./generate -c config/mic.intel -j 100 -t 16 -w slurm tiny1
      This command submits 100 jobs in batch. Each job offloads executions
      to the Intel Xeon Phi card (MIC_OMP_NUM_THREADS threads). Wait until
      all jobs complete.

   2) Run: ./generate -c config/mic.intel tiny2
      This command collects all results of the tiny1 phase and generates
      the file tiny_gen_optimal_dnn_mic.intel.out.

   3) As done in 1) and 2), run: ./generate -c config/mic.intel -j 100 -t 16 -w slurm small1
      This command submits 100 jobs in batch, where each job offloads
      executions to the Intel Xeon Phi card (MIC_OMP_NUM_THREADS
      threads). Wait until they complete.
      Then run: ./generate -c config/mic.intel small2
      This command collects all results produced in the small1 phase and
      generates a file small_gen_optimal_dnn_mic.intel.out

   4) Run: ./generate -c config/mic.intel -t 16 -w slurm lib
      This command submits in batch a single job that compiles the library.
      At the end the library is produced inside the directory lib/
      (libsmm_dnn_mic.intel.a).

   5) It is highly recommended to run the final test to check the correctness of the library.
      Run: ./generate -c config/mic.intel -j 200 -w slurm check1
      After the batch jobs complete, run: ./generate -c config/mic.intel -j 200 check2
      Note that check2 must use the same number of jobs as specified in the
      check1 phase. Finally check test_smm_dnn_mic.intel.out for performance and correctness.

   6) Intermediate files (but not some key output and the library itself)
      can be removed using ./generate clean


The following copyright covers the code and the generated library
!====================================================================================================================
! * Copyright (c) 2015 Joost VandeVondele and Alfio Lazzaro
! * All rights reserved.
! *
! * Redistribution and use in source and binary forms, with or without
! * modification, are permitted provided that the following conditions are met:
! *     * Redistributions of source code must retain the above copyright
! *       notice, this list of conditions and the following disclaimer.
! *     * Redistributions in binary form must reproduce the above copyright
! *       notice, this list of conditions and the following disclaimer in the
! *       documentation and/or other materials provided with the distribution.
! *
! * THIS SOFTWARE IS PROVIDED BY Joost VandeVondele ''AS IS'' AND ANY
! * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
! * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
! * DISCLAIMED. IN NO EVENT SHALL Joost VandeVondele BE LIABLE FOR ANY
! * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
! * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
! * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
! * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
! * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
! * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
! *
!====================================================================================================================
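
Appendix (illustrative sketch, not part of libsmm): the interactive sequence of
example b) could be collected in a small helper that prints the commands for
review before anything is run. The function name libsmm_steps, its positional
arguments and their defaults are assumptions made for this sketch; the command
lines themselves are copied as given in example b), including their -w values.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with libsmm): prints the command
# sequence of example b). Nothing is executed by this function.
libsmm_steps() {
    config=${1:-config/linux.gnu}   # compiler config file passed to -c
    groups=${2:-10}                 # number of tiny-kernel groups passed to -j
    tasks=${3:-16}                  # value passed to -t, as in the example
    echo "./generate -c $config -j $groups -t $tasks -w none tiny1"
    echo "./generate -c $config tiny2"
    echo "./generate -c $config -j 0 -t $tasks small1"
    echo "./generate -c $config -j 0 -t $tasks -w slurm lib"
    echo "./generate -c $config -j 0 -w slurm check1"
}

# Dry run: only print the commands for the given (or default) arguments.
libsmm_steps "$@"
```

To execute the steps instead of printing them, the output could be piped to a
shell that stops at the first failing command, for example: libsmm_steps | sh -e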