# Notes on OpenBLAS usage
## Usage

#### Program is Terminated. Because you tried to allocate too many memory regions

In OpenBLAS, we manage a pool of memory buffers and allocate the number of
buffers as follows.
```
#define NUM_BUFFERS (MAX_CPU_NUMBER * 2)
```
This error indicates that the program exceeded the number of buffers.

Please build OpenBLAS with a larger `NUM_THREADS`, for example
`make NUM_THREADS=32` or `make NUM_THREADS=64`. `Makefile.system` sets
`MAX_CPU_NUMBER=NUM_THREADS`.

Despite its name, and due to the use of memory buffers in functions like SGEMM,
the setting of `NUM_THREADS` can be relevant even for a single-threaded build
of OpenBLAS, if such functions get called by multiple threads of a program
that uses OpenBLAS. In some cases, the affected code may simply crash or throw
a segmentation fault without displaying the above warning first.

Note that the number of threads used at runtime can differ from the value
`NUM_THREADS` was set to at build time. At runtime, the actual number of
threads can be set anywhere from 1 to the build's `NUM_THREADS`. This does not
change the number of memory buffers that will be allocated, which is fixed at
build time. The number of threads for a process can be set by using the
mechanisms described below.
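As a small worked example of the arithmetic, the sketch below takes the `NUM_BUFFERS` formula quoted above at face value and assumes `MAX_CPU_NUMBER` equals the build-time `NUM_THREADS`. The function name `num_buffers` is made up for illustration and is not part of OpenBLAS.

```c
/* Illustrative only: mirrors the formula quoted above,
   NUM_BUFFERS = MAX_CPU_NUMBER * 2, under the assumption that
   MAX_CPU_NUMBER is set to the build-time NUM_THREADS. */
int num_buffers(int build_num_threads)
{
    return build_num_threads * 2;
}
```

Under these assumptions, a build with `make NUM_THREADS=32` would have 64 buffers and one with `make NUM_THREADS=64` would have 128.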

#### How can I use OpenBLAS in multi-threaded applications?

If your application is already multi-threaded, it will conflict with OpenBLAS
multi-threading. Thus, you must set OpenBLAS to use a single thread in any of
the following ways:

* Set `export OPENBLAS_NUM_THREADS=1` in the environment.
* Call `openblas_set_num_threads(1)` in the application at runtime.
* Build a single-threaded version of OpenBLAS, e.g. `make USE_THREAD=0`.

If the application is parallelized with OpenMP, please use OpenBLAS built with
`USE_OPENMP=1`.
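A multi-threaded application may want to confirm that `OPENBLAS_NUM_THREADS` is set as expected before it starts its own worker threads. The following is a minimal sketch of such a check. Both helpers are hypothetical and not part of OpenBLAS, and the way OpenBLAS itself parses the variable may differ.

```c
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a thread-count string.
   Returns the count, or -1 if the string is NULL, empty,
   not a plain decimal integer, or not positive. */
int parse_thread_count(const char *value)
{
    char *end = NULL;
    long n;

    if (value == NULL || *value == '\0')
        return -1;
    n = strtol(value, &end, 10);
    if (end == value || *end != '\0' || n <= 0 || n > INT_MAX)
        return -1;
    return (int)n;
}

/* Hypothetical helper: thread count requested through the
   OPENBLAS_NUM_THREADS environment variable, or -1 if it is
   unset or invalid. */
int requested_openblas_threads(void)
{
    return parse_thread_count(getenv("OPENBLAS_NUM_THREADS"));
}
```

An application could, for instance, print a warning at startup when `requested_openblas_threads()` does not return 1.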

#### How to choose TARGET manually at runtime when compiled with DYNAMIC_ARCH

The environment variable that controls the kernel selection is
`OPENBLAS_CORETYPE` (see `driver/others/dynamic.c`), e.g.
`export OPENBLAS_CORETYPE=Haswell`. The function
`char* openblas_get_corename()` returns the target in use.

#### How could I disable OpenBLAS threading affinity at runtime?

You can define the `OPENBLAS_MAIN_FREE` or `GOTOBLAS_MAIN_FREE` environment
variable to disable threading affinity at runtime. For example, before running
the program:
```
export OPENBLAS_MAIN_FREE=1
```

Alternatively, you can disable the affinity feature by enabling `NO_AFFINITY=1`
in `Makefile.rule`.

## Linking with the library

* Link with shared library

`gcc -o test test.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas`

If the library is multithreaded, please add `-lpthread`. If the library
contains LAPACK functions, please add `-lgfortran` or other Fortran libs.

* Link with static library

`gcc -o test test.c /your/path/libopenblas.a`

You can download `test.c` from https://gist.github.com/xianyi/5780018

On Linux, if OpenBLAS was compiled with threading support (`USE_THREAD=1` by
default), custom programs statically linked against `libopenblas.a` should also
link with the pthread library, e.g.:

```
gcc -static -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib -o my_program my_program.c -lopenblas -lpthread
```

Failing to add the `-lpthread` flag will cause errors such as:

```
/opt/OpenBLAS/libopenblas.a(memory.o): In function `_touch_memory':
memory.c:(.text+0x15): undefined reference to `pthread_mutex_lock'
memory.c:(.text+0x41): undefined reference to `pthread_mutex_unlock'
...
```

## Code examples

#### Call CBLAS interface
This example shows calling `cblas_dgemm` in C. https://gist.github.com/xianyi/6930656
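```
#include <cblas.h>
#include <stdio.h>

int main(void)
{
  int i = 0;
  /* A and B are 3x2 matrices in column-major order (leading dimension 3). */
  double A[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
  double B[6] = {1.0, 2.0, 1.0, -3.0, 4.0, -1.0};
  /* C is 3x3; it is both an input (scaled by beta) and the output. */
  double C[9] = {.5, .5, .5, .5, .5, .5, .5, .5, .5};

  /* C := 1.0 * A * B^T + 2.0 * C, with M = 3, N = 3, K = 2. */
  cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans, 3, 3, 2, 1.0, A, 3, B, 3, 2.0, C, 3);

  for (i = 0; i < 9; i++)
    printf("%lf ", C[i]);
  printf("\n");
  return 0;
}
```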
`gcc -o test_cblas_open test_cblas_dgemm.c -I /your_path/OpenBLAS/include/ -L/your_path/OpenBLAS/lib -lopenblas -lpthread -lgfortran`
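To check what the `cblas_dgemm` call above computes, a naive reference loop can help. The sketch below does not use OpenBLAS; the function name `ref_dgemm_nt` is made up for illustration, and it covers only the column-major, `CblasNoTrans`/`CblasTrans` case used in the example.

```c
/* Naive reference for C := alpha * A * B^T + beta * C, column-major.
   A is m x k (leading dimension lda), B is n x k (ldb), C is m x n (ldc).
   Illustrative only; not part of OpenBLAS. */
void ref_dgemm_nt(int m, int n, int k, double alpha,
                  const double *A, int lda,
                  const double *B, int ldb,
                  double beta, double *C, int ldc)
{
    int i, j, p;

    for (j = 0; j < n; j++) {
        for (i = 0; i < m; i++) {
            double sum = 0.0;
            /* Element (i, p) of a column-major matrix lives at [i + p * ld]. */
            for (p = 0; p < k; p++)
                sum += A[i + p * lda] * B[j + p * ldb];
            C[i + j * ldc] = alpha * sum + beta * C[i + j * ldc];
        }
    }
}
```

With the `A`, `B`, and `C` arrays from the example, `ref_dgemm_nt(3, 3, 2, 1.0, A, 3, B, 3, 2.0, C, 3)` leaves `11 -9 5 -9 21 -1 5 -1 3` in `C`, which is the result the `cblas_dgemm` program should print.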

#### Call BLAS Fortran interface

This example shows calling the `dgemm` Fortran interface in C. https://gist.github.com/xianyi/5780018

```
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

/* Fortran BLAS symbol: every argument is passed by reference. */
extern void dgemm_(char*, char*, int*, int*, int*, double*, double*, int*, double*, int*, double*, double*, int*);

int main(int argc, char* argv[])
{
  int i;
  printf("test!\n");
  if (argc < 4) {
    printf("Input Error\n");
    return 1;
  }

  int m = atoi(argv[1]);
  int n = atoi(argv[2]);
  int k = atoi(argv[3]);
  int sizeofa = m * k;
  int sizeofb = k * n;
  int sizeofc = m * n;
  char ta = 'N';
  char tb = 'N';
  double alpha = 1.2;
  double beta = 0.001;

  struct timeval start, finish;
  double duration;

  double* A = (double*)malloc(sizeof(double) * sizeofa);
  double* B = (double*)malloc(sizeof(double) * sizeofb);
  double* C = (double*)malloc(sizeof(double) * sizeofc);
  if (A == NULL || B == NULL || C == NULL) {
    printf("Out of memory\n");
    free(A);
    free(B);
    free(C);
    return 1;
  }

  srand((unsigned)time(NULL));

  for (i = 0; i < sizeofa; i++)
    A[i] = i % 3 + 1; //(rand()%100)/10.0;

  for (i = 0; i < sizeofb; i++)
    B[i] = i % 3 + 1; //(rand()%100)/10.0;

  for (i = 0; i < sizeofc; i++)
    C[i] = i % 3 + 1; //(rand()%100)/10.0;

  printf("m=%d,n=%d,k=%d,alpha=%lf,beta=%lf,sizeofc=%d\n", m, n, k, alpha, beta, sizeofc);
  gettimeofday(&start, NULL);
  /* C := alpha * A * B + beta * C, column-major, no transposes. */
  dgemm_(&ta, &tb, &m, &n, &k, &alpha, A, &m, B, &k, &beta, C, &m);
  gettimeofday(&finish, NULL);

  /* Elapsed wall-clock time in seconds. */
  duration = ((double)(finish.tv_sec - start.tv_sec) * 1000000 + (double)(finish.tv_usec - start.tv_usec)) / 1000000;
  /* DGEMM performs 2*m*n*k floating-point operations; report MFLOPS. */
  double mflops = 2.0 * m * n * k;
  mflops = mflops / duration * 1.0e-6;

  FILE *fp = fopen("timeDGEMM.txt", "a");
  if (fp != NULL) {
    fprintf(fp, "%dx%dx%d\t%lf s\t%lf MFLOPS\n", m, n, k, duration, mflops);
    fclose(fp);
  }

  free(A);
  free(B);
  free(C);
  return 0;
}
```

`gcc -o time_dgemm time_dgemm.c /your/path/libopenblas.a`

`./time_dgemm <m> <n> <k>`
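The throughput figure that the program appends to `timeDGEMM.txt` is reported in MFLOPS and is based on a count of 2·m·n·k floating-point operations per DGEMM call. The sketch below isolates that calculation; the function name `dgemm_mflops` is made up for illustration.

```c
/* Throughput of one m x n x k DGEMM in MFLOPS, counting
   2*m*n*k floating-point operations (one multiply and one add
   per inner-product term). Illustrative only. */
double dgemm_mflops(int m, int n, int k, double seconds)
{
    double flops = 2.0 * m * n * k;
    return flops / seconds * 1.0e-6;
}
```

For example, a 1000x1000x1000 run that takes 0.5 s corresponds to about 4000 MFLOPS, i.e. 4 GFLOPS.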

## Troubleshooting
* Please read the [FAQ](https://github.com/xianyi/OpenBLAS/wiki/Faq) first.
* Please use gcc version 4.6 or above to compile Sandy Bridge AVX kernels on Linux/MinGW/BSD.
* Please use Clang version 3.1 or above to compile the library on the Sandy Bridge microarchitecture. Clang 3.0 generates incorrect AVX binary code.
* The number of CPUs/cores should be less than or equal to 256. On Linux x86_64 (amd64), there is experimental support for up to 1024 CPUs/cores and 128 NUMA nodes if you build the library with `BIGNUMA=1`.
* OpenBLAS does not set processor affinity by default. On Linux, you can enable processor affinity by commenting out the line `NO_AFFINITY=1` in `Makefile.rule`. But this may cause [a conflict with R parallel](https://stat.ethz.ch/pipermail/r-sig-hpc/2012-April/001348.html).
* On Loongson 3A, `make test` may fail with a `pthread_create` error (error code `EAGAIN`). However, the same test case runs fine when started directly from the shell.

## BLAS reference manual
If you want to understand every BLAS function and definition, please read the
[Intel MKL reference manual](https://software.intel.com/sites/products/documentation/doclib/iss/2013/mkl/mklman/GUID-F7ED9FB8-6663-4F44-A62B-61B63C4F0491.htm)
or [netlib.org](http://netlib.org/blas/).

Here are the [OpenBLAS extension functions](https://github.com/xianyi/OpenBLAS/wiki/OpenBLAS-Extensions).

## How to reference OpenBLAS

You can reference our [papers](https://github.com/xianyi/OpenBLAS/wiki/publications).

Alternatively, you can cite the OpenBLAS homepage http://www.openblas.net directly.