# Notes on OpenBLAS usage
## Usage

#### Program is Terminated. Because you tried to allocate too many memory regions

OpenBLAS manages a pool of memory buffers. The number of buffers is fixed at
build time as follows:
```
#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)
```
This error indicates that the program exceeded the number of available buffers.

Please rebuild OpenBLAS with a larger `NUM_THREADS`, for example `make
NUM_THREADS=32` or `make NUM_THREADS=64`. `Makefile.system` sets
`MAX_CPU_NUMBER=NUM_THREADS`.

Despite its name, `NUM_THREADS` can matter even for a single-threaded build of
OpenBLAS: functions like SGEMM use these memory buffers, so the pool can be
exhausted when such functions are called from multiple threads of a program
that uses OpenBLAS. In some cases, the affected code may simply crash or throw
a segmentation fault without displaying the above message first.

The number of threads used at runtime may differ from the value `NUM_THREADS`
was set to at build time. At runtime, it can be set anywhere from 1 to the
build's `NUM_THREADS`. Note, however, that this does not change the number of
memory buffers that will be allocated, which is fixed at build time. The number
of threads for a process can be set using the mechanisms described below.


#### How can I use OpenBLAS in multi-threaded applications?

If your application is already multi-threaded, it will conflict with OpenBLAS
multi-threading. Thus, you must set OpenBLAS to use a single thread in any of
the following ways:

* Set `export OPENBLAS_NUM_THREADS=1` in the environment.
* Call `openblas_set_num_threads(1)` in the application at runtime.
* Build a single-threaded version of OpenBLAS, e.g.
`make USE_THREAD=0`

If the application is parallelized with OpenMP, please use OpenBLAS built with
`USE_OPENMP=1`.

#### How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH

The environment variable that controls kernel selection is
`OPENBLAS_CORETYPE` (see `driver/others/dynamic.c`), e.g. `export
OPENBLAS_CORETYPE=Haswell`. The function `char* openblas_get_corename()`
returns the target in use.

#### How could I disable OpenBLAS threading affinity at runtime?

You can define the `OPENBLAS_MAIN_FREE` or `GOTOBLAS_MAIN_FREE` environment
variable to disable threading affinity at runtime. For example, before running
your program:
```
export OPENBLAS_MAIN_FREE=1
```

Alternatively, you can disable the affinity feature by enabling `NO_AFFINITY=1`
in `Makefile.rule`.

## Linking with the library

* Link with shared library

`gcc -o test test.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas`

If the library is multithreaded, please add `-lpthread`. If the library
contains LAPACK functions, please add `-lgfortran` or other Fortran libs.

* Link with static library

`gcc -o test test.c /your/path/libopenblas.a`

You can download `test.c` from https://gist.github.com/xianyi/5780018

On Linux, if OpenBLAS was compiled with threading support (`USE_THREAD=1` by
default), custom programs statically linked against `libopenblas.a` should also
link with the pthread library, e.g.:

```
gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread
```

Failing to add the `-lpthread` flag will cause errors such as:

```
/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':
memory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'
...
```

## Code examples

#### Call CBLAS interface
This example shows calling cblas_dgemm in C. https://gist.github.com/xianyi/6930656
```
#include <cblas.h>
#include <stdio.h>

int main(void)
{
  int i = 0;
  /* A and B are 3x2 matrices stored in column-major order (leading dimension 3). */
  double A[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
  double B[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
  /* C is 3x3; it is both an input and the output. */
  double C[9] = {.5, .5, .5, .5, .5, .5, .5, .5, .5};

  /* C = 1 * A * B^T + 2 * C, with m = 3, n = 3, k = 2. */
  cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans, 3, 3, 2, 1, A, 3, B, 3, 2, C, 3);

  for (i = 0; i < 9; i++)
    printf("%lf ", C[i]);
  printf("\n");
  return 0;
}
```
`gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran`

#### Call BLAS Fortran interface

This example shows calling the dgemm Fortran interface in C. https://gist.github.com/xianyi/5780018

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>

/* Fortran BLAS symbol: every argument is passed by reference. */
extern void dgemm_(char*, char*, int*, int*, int*, double*, double*, int*, double*, int*, double*, double*, int*);

int main(int argc, char* argv[])
{
  int i;
  printf("test!\n");
  if (argc < 4) {
    printf("Input Error\n");
    return 1;
  }

  int m = atoi(argv[1]);
  int n = atoi(argv[2]);
  int k = atoi(argv[3]);
  int sizeofa = m * k;
  int sizeofb = k * n;
  int sizeofc = m * n;
  char ta = 'N';
  char tb = 'N';
  double alpha = 1.2;
  double beta = 0.001;

  struct timeval start, finish;
  double duration;

  double* A = (double*)malloc(sizeof(double) * sizeofa);
  double* B = (double*)malloc(sizeof(double) * sizeofb);
  double* C = (double*)malloc(sizeof(double) * sizeofc);
  if (A == NULL || B == NULL || C == NULL) {
    printf("Allocation Error\n");
    return 1;
  }

  /* Fill the matrices with a deterministic pattern of 1, 2, 3. */
  for (i = 0; i < sizeofa; i++)
    A[i] = i % 3 + 1;

  for (i = 0; i < sizeofb; i++)
    B[i] = i % 3 + 1;

  for (i = 0; i < sizeofc; i++)
    C[i] = i % 3 + 1;
  printf("m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\n", m, n, k, alpha, beta, sizeofc);

  /* Time a single DGEMM call: C = alpha * A * B + beta * C. */
  gettimeofday(&start, NULL);
  dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
  gettimeofday(&finish, NULL);

  /* Elapsed wall-clock time in seconds. */
  duration = ((double)(finish.tv_sec - start.tv_sec) * 1000000 + (double)(finish.tv_usec - start.tv_usec)) / 1000000;
  /* DGEMM performs about 2*m*n*k floating-point operations. */
  double mflops = 2.0 * m * n * k;
  mflops = mflops / duration * 1.0e-6;

  /* Append the result to a log file. */
  FILE *fp = fopen("timeDGEMM.txt", "a");
  if (fp != NULL) {
    fprintf(fp, "%dx%dx%d\t%lf s\t%lf MFLOPS\n", m, n, k, duration, mflops);
    fclose(fp);
  }

  free(A);
  free(B);
  free(C);
  return 0;
}
```

`gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a`

`./time_dgemm <m> <n> <k>`

## Troubleshooting
* Please read the [Faq](https://github.com/xianyi/OpenBLAS/wiki/Faq) first.
* Please use gcc version 4.6 or above to compile Sandy Bridge AVX kernels on Linux/MingW/BSD.
* Please use Clang version 3.1 or above to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates wrong AVX binary code.
* The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with `BIGNUMA=1`.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line `NO_AFFINITY=1` in `Makefile.rule`. But this may cause [a conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
* On Loongson 3A, `make test` fails with a `pthread_create` error (error code `EAGAIN`). However, the same test case runs fine when launched directly from the shell.

## BLAS reference manual
If you want to understand every BLAS function and definition, please read the
[Intel MKL reference manual](https://software.intel.com/sites/products/documentation/doclib/iss/2013/mkl/mklman/GUID-F7ED9FB8-6663-4F44-A62B-61B63C4F0491.htm)
or [netlib.org](http://netlib.org/blas/).

Here are the [OpenBLAS extension functions](https://github.com/xianyi/OpenBLAS/wiki/OpenBLAS-Extensions).

## How to reference OpenBLAS

You can reference our [papers](https://github.com/xianyi/OpenBLAS/wiki/publications).

Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net directly.