## Build Status
| Build branch | master | develop |
|-----|-----|-----|
| GCC/Clang x64 | [![Build Status](https://travis-ci.org/clMathLibraries/clBLAS.svg?branch=master)](https://travis-ci.org/clMathLibraries/clBLAS/branches) | [![Build Status](https://travis-ci.org/clMathLibraries/clBLAS.svg?branch=develop)](https://travis-ci.org/clMathLibraries/clBLAS/branches) |
| Visual Studio x64 | [![Build status](https://ci.appveyor.com/api/projects/status/v384bi6e8xv8nxjm/branch/master?svg=true)](https://ci.appveyor.com/project/kknox/clblas-5ph9i/branch/master)|[![Build status](https://ci.appveyor.com/api/projects/status/v384bi6e8xv8nxjm/branch/develop?svg=true)](https://ci.appveyor.com/project/kknox/clblas-5ph9i/branch/develop) |

clBLAS
=====
This repository houses the code for the OpenCL™ BLAS portion of clMath.
The complete set of BLAS level 1, 2 & 3 routines is implemented. Please
see Netlib BLAS for the list of supported routines. In addition to GPU
devices, the library also supports running on CPU devices to facilitate
debugging and multicore programming. APPML 1.10 is the most recent
generally available pre-packaged binary version of the library; it can
be downloaded for both Linux and Windows platforms.

The primary goal of clBLAS is to make it easier for developers to
utilize the inherent performance and power efficiency benefits of
heterogeneous computing. clBLAS interfaces neither hide nor wrap OpenCL
interfaces, but rather leave OpenCL state management to the control of
the user to allow for maximum performance and flexibility. The clBLAS
library does generate and enqueue optimized OpenCL kernels, relieving
the user of the task of writing, optimizing and maintaining kernel
code.

## clBLAS update notes 09/2015

- Introducing [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm)
  - clBLAS's Gemm implementation has been comprehensively overhauled to use AutoGemm. AutoGemm is a suite of Python scripts which generate optimized kernels and kernel selection logic for all precisions, transposes, tile sizes and so on.
  - CMake is configured to use AutoGemm for clBLAS, so the build and usage experience of Gemm remains unchanged (only performance and maintainability have been improved). Kernel sources are generated at build time (not runtime) and can be configured within CMake to be pre-compiled at build time.
  - clBLAS users with unique Gemm requirements can customize AutoGemm to their needs (such as non-default tile sizes for very small or very skinny matrices); see the [AutoGemm](http://github.com/clMathLibraries/clBLAS/wiki/AutoGemm) documentation for details.

## clBLAS library user documentation

[Library and API documentation][] for developers is available online as
a GitHub Pages website.

## Google Groups

Two mailing lists have been created for the clMath projects:

-   [clmath@googlegroups.com][] - group for questions on using the
    library and for reporting issues

-   [clmath-developers@googlegroups.com][] - group for developers
    interested in contributing to the library code itself

## clBLAS Wiki

The [project wiki][] contains helpful documentation, including a
[build primer][].

## Contributing code

Please read the [Contributing][] document for guidelines on how to
contribute code to this open source project. The code in the
/master branch is considered to be stable, and all pull requests should
be made against the /develop branch.

## License
The source for clBLAS is licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

## Example
The simple example below shows how to use clBLAS to compute an OpenCL-accelerated SGEMM:

```c
#include <sys/types.h>
#include <stdio.h>

/* Include the clBLAS header. It includes the appropriate OpenCL headers. */
#include <clBLAS.h>

/* This example uses predefined matrices and their characteristics for
 * simplicity. Checking of the returned error codes is omitted for brevity.
 */

#define M  4
#define N  3
#define K  5

static const cl_float alpha = 10;

static const cl_float A[M*K] = {
    11, 12, 13, 14, 15,
    21, 22, 23, 24, 25,
    31, 32, 33, 34, 35,
    41, 42, 43, 44, 45,
};
static const size_t lda = K;        /* row-major M x K matrix: row stride is K */

static const cl_float B[K*N] = {
    11, 12, 13,
    21, 22, 23,
    31, 32, 33,
    41, 42, 43,
    51, 52, 53,
};
static const size_t ldb = N;        /* row-major K x N matrix: row stride is N */

static const cl_float beta = 20;

static cl_float C[M*N] = {
    11, 12, 13,
    21, 22, 23,
    31, 32, 33,
    41, 42, 43,
};
static const size_t ldc = N;        /* row-major M x N matrix: row stride is N */

static cl_float result[M*N];

int main( void )
{
    cl_int err;
    cl_platform_id platform = 0;
    cl_device_id device = 0;
    cl_context_properties props[3] = { CL_CONTEXT_PLATFORM, 0, 0 };
    cl_context ctx = 0;
    cl_command_queue queue = 0;
    cl_mem bufA, bufB, bufC;
    cl_event event = NULL;
    int ret = 0;

    /* Setup OpenCL environment. */
    err = clGetPlatformIDs( 1, &platform, NULL );
    err = clGetDeviceIDs( platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL );

    props[1] = (cl_context_properties)platform;
    ctx = clCreateContext( props, 1, &device, NULL, NULL, &err );
    queue = clCreateCommandQueue( ctx, device, 0, &err );

    /* Setup clBLAS */
    err = clblasSetup( );

    /* Prepare OpenCL memory objects and place matrices inside them. */
    bufA = clCreateBuffer( ctx, CL_MEM_READ_ONLY, M * K * sizeof(*A),
                           NULL, &err );
    bufB = clCreateBuffer( ctx, CL_MEM_READ_ONLY, K * N * sizeof(*B),
                           NULL, &err );
    bufC = clCreateBuffer( ctx, CL_MEM_READ_WRITE, M * N * sizeof(*C),
                           NULL, &err );

    err = clEnqueueWriteBuffer( queue, bufA, CL_TRUE, 0,
        M * K * sizeof( *A ), A, 0, NULL, NULL );
    err = clEnqueueWriteBuffer( queue, bufB, CL_TRUE, 0,
        K * N * sizeof( *B ), B, 0, NULL, NULL );
    err = clEnqueueWriteBuffer( queue, bufC, CL_TRUE, 0,
        M * N * sizeof( *C ), C, 0, NULL, NULL );

    /* Call the clBLAS SGEMM function: C = alpha * A * B + beta * C.
     * All buffer offsets are 0, so the full matrices are used. */
    err = clblasSgemm( clblasRowMajor, clblasNoTrans, clblasNoTrans,
                       M, N, K,
                       alpha, bufA, 0, lda,
                       bufB, 0, ldb, beta,
                       bufC, 0, ldc,
                       1, &queue, 0, NULL, &event );

    /* Wait for calculations to be finished. */
    err = clWaitForEvents( 1, &event );

    /* Fetch results of calculations from GPU memory. */
    err = clEnqueueReadBuffer( queue, bufC, CL_TRUE, 0,
                               M * N * sizeof(*result),
                               result, 0, NULL, NULL );

    /* Release the event and the OpenCL memory objects. */
    clReleaseEvent( event );
    clReleaseMemObject( bufC );
    clReleaseMemObject( bufB );
    clReleaseMemObject( bufA );

    /* Finalize work with clBLAS */
    clblasTeardown( );

    /* Release OpenCL working objects. */
    clReleaseCommandQueue( queue );
    clReleaseContext( ctx );

    return ret;
}
```
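
One way to sanity-check the values read back into `result` is to compare them against a plain CPU loop. The sketch below is not part of clBLAS: `sgemm_ref` and `max_abs_diff` are hypothetical helper names, and the code assumes row-major storage without transposes, as in the example above. Its offset and leading-dimension parameters follow the same order as the buffer arguments passed to `clblasSgemm`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference, not part of clBLAS.
 * Row-major, no transposes: C = alpha * A * B + beta * C, where A is m x k,
 * B is k x n and C is m x n. offA/offB/offC are element offsets to the first
 * entry of each matrix; lda/ldb/ldc are row strides in elements. */
void sgemm_ref(size_t m, size_t n, size_t k,
               float alpha, const float *a, size_t offA, size_t lda,
               const float *b, size_t offB, size_t ldb,
               float beta, float *c, size_t offC, size_t ldc)
{
    /* A row stride shorter than the row length would make rows overlap. */
    assert(lda >= k && ldb >= n && ldc >= n);

    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)
                acc += a[offA + i * lda + p] * b[offB + p * ldb + j];
            float *out = &c[offC + i * ldc + j];
            *out = alpha * acc + beta * *out;
        }
    }
}

/* Largest absolute element-wise difference between two arrays of `count` floats. */
float max_abs_diff(const float *x, const float *y, size_t count)
{
    float worst = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        float d = x[i] > y[i] ? x[i] - y[i] : y[i] - x[i];
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

With the matrices from the example, calling `sgemm_ref(M, N, K, alpha, A, 0, lda, B, 0, ldb, beta, expected, 0, ldc)` on a copy of `C` named `expected` and then `max_abs_diff(result, expected, M * N)` gives a single number to compare against a tolerance. For instance, the first element works out to 10 * 2115 + 20 * 11 = 21370.
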

## Build dependencies
### Library for Windows
*  Windows® 7/8
*  Visual Studio 2010 SP1, 2012
*  An OpenCL SDK, such as APP SDK 2.8
*  Latest CMake

### Library for Linux
*  GCC 4.6 or later
*  An OpenCL SDK, such as APP SDK 2.9
*  Latest CMake

### Library for Mac OSX
*  Generating Unix makefiles with CMake is recommended

### Test infrastructure
*  Googletest v1.6
*  ACML on Windows/Linux; Accelerate on Mac OSX
*  Latest Boost

### Performance infrastructure
* Python

  [Library and API documentation]: http://clmathlibraries.github.io/clBLAS/
  [clmath@googlegroups.com]: https://groups.google.com/forum/#!forum/clmath
  [clmath-developers@googlegroups.com]: https://groups.google.com/forum/#!forum/clmath-developers
  [project wiki]: https://github.com/clMathLibraries/clBLAS/wiki
  [build primer]: https://github.com/clMathLibraries/clBLAS/wiki/Build
  [Contributing]: CONTRIBUTING.md
  [Apache License, Version 2.0]: http://www.apache.org/licenses/LICENSE-2.0