##############
C API Tutorial
##############

In this tutorial, we are going to install the XGBoost library & configure the CMakeLists.txt file of our C/C++ application to link the XGBoost library with our application. Later on, we will see some useful tips for using the C API, along with code snippets showing how to use various functions available in the C API to perform basic tasks like loading data, training a model & predicting on a test dataset.

.. contents::
  :backlinks: none
  :local:

************
Requirements
************

- Install CMake - Follow the `cmake installation documentation <https://cmake.org/install/>`_ for instructions.
- Install Conda - Follow the `conda installation documentation <https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html>`_ for instructions.

**************************************
Install XGBoost in a Conda environment
**************************************

Run the following commands in your terminal. They clone the XGBoost repository, build it, and install it into your Conda environment.

.. code-block:: bash

    # clone the XGBoost repository & its submodules
    git clone --recursive https://github.com/dmlc/xgboost
    cd xgboost
    mkdir build
    cd build
    # Activate the Conda environment, into which we'll install XGBoost
    conda activate [env_name]
    # Configure the build, setting the install location to the Conda environment
    cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
    # Build & install XGBoost into your conda environment (usually under [your home directory]/miniconda3)
    make install

**********************************************************************
Configure CMakeLists.txt file of your application to link with XGBoost
**********************************************************************

Here, we assume that your C++ application is using CMake for builds.

Use ``find_package()`` and ``target_link_libraries()`` in your application's CMakeLists.txt to link with the XGBoost library:

.. code-block:: cmake

    cmake_minimum_required(VERSION 3.13)
    project(your_project_name LANGUAGES C CXX VERSION your_project_version)
    find_package(xgboost REQUIRED)
    add_executable(your_project_name /path/to/project_file.c)
    target_link_libraries(your_project_name xgboost::xgboost)

To ensure that CMake can locate the XGBoost library, supply the ``-DCMAKE_PREFIX_PATH=$CONDA_PREFIX`` argument when invoking CMake. This option instructs CMake to locate the XGBoost library in ``$CONDA_PREFIX``, which is where your Conda environment is located.

.. code-block:: bash

  # Navigate to the build directory for your application
  cd build
  # Activate the Conda environment where we previously installed XGBoost
  conda activate [env_name]
  # Invoke CMake with CMAKE_PREFIX_PATH
  cmake .. -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
  # Build your application
  make

************************
Useful Tips To Remember
************************

Below are some useful tips while using the C API:

1. Error handling: Always check the return value of the C API functions.

a. In a C application: Use the following macro to guard all calls to XGBoost's C API functions. The macro prints the error message whenever a call fails:

.. highlight:: c
   :linenothreshold: 5

.. code-block:: c

  #define safe_xgboost(call) {  \
    int err = (call); \
    if (err != 0) { \
      fprintf(stderr, "%s:%d: error in %s: %s\n", __FILE__, __LINE__, #call, XGBGetLastError());  \
      exit(1); \
    } \
  }

In your application, wrap all C API function calls with the macro as follows:

.. code-block:: c

  DMatrixHandle train;
  safe_xgboost(XGDMatrixCreateFromFile("/path/to/training/dataset/", silent, &train));

b. In a C++ application: modify the macro ``safe_xgboost`` to throw an exception upon an error:

.. highlight:: cpp
   :linenothreshold: 5

.. code-block:: cpp

  #include <stdexcept>
  #include <string>

  #define safe_xgboost(call) {  \
    int err = (call); \
    if (err != 0) { \
      throw std::runtime_error(std::string(__FILE__) + ":" + std::to_string(__LINE__) + \
                               ": error in " + #call + ": " + XGBGetLastError());  \
    } \
  }

c. Assertion technique: It works in both C and C++. If the expression evaluates to 0 (false), the expression, the source file name, and the line number are printed to standard error, and then the ``abort()`` function is called. It can be used to test assumptions made in the code.

.. code-block:: c

  DMatrixHandle dmat;
  assert( XGDMatrixCreateFromFile("training_data.libsvm", 0, &dmat) == 0);


2. Always remember to free the memory allocated for the ``BoosterHandle`` & ``DMatrixHandle`` objects appropriately:

.. code-block:: c

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <xgboost/c_api.h>

    int main(int argc, char** argv) {
      int silent = 0;

      BoosterHandle booster;

      // do something with booster

      // free the memory
      XGBoosterFree(booster);

      DMatrixHandle DMatrixHandle_param;

      // do something with DMatrixHandle_param

      // free the memory
      XGDMatrixFree(DMatrixHandle_param);

      return 0;
    }


3. For tree models, it is important to use consistent data formats during training and scoring/predicting, otherwise it will result in wrong outputs.
   For example, if your training data is in ``dense matrix`` format, then your prediction dataset should also be a ``dense matrix``; if you train with data in ``libsvm`` format, then the dataset for prediction should also be in ``libsvm`` format. A minimal sketch is shown below.

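The following sketch (with made-up ``train_data``/``test_data`` arrays) illustrates the point for the dense-matrix case: the training and the prediction ``DMatrix`` are both built with ``XGDMatrixCreateFromMat``, with the same number of columns and the same missing-value marker.

.. code-block:: c

  // Training data: 4 rows x 3 columns, -1 marks missing values
  const float train_data[4 * 3] = { 1, 2, 3,  4, 5, 6,  7, 8, 9,  1, -1, 1 };
  // Prediction data: same 3-column layout and the same missing-value marker
  const float test_data[2 * 3]  = { 2, 3, 4,  5, -1, 7 };

  DMatrixHandle dtrain, dtest;
  safe_xgboost(XGDMatrixCreateFromMat(train_data, 4, 3, -1, &dtrain));
  safe_xgboost(XGDMatrixCreateFromMat(test_data, 2, 3, -1, &dtest));
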

4. Always use strings for setting values to the parameters in the booster handle object. The parameter values can be of any data type (e.g. int, char, float, double, etc.), but they should always be encoded as strings.

.. code-block:: c

    BoosterHandle booster;
    XGBoosterSetParam(booster, "parameter_name", "0.1");


**************************************************
Sample code snippets for using the C API functions
**************************************************

1. If the dataset is available in a file, it can be loaded into a ``DMatrix`` object using the `XGDMatrixCreateFromFile <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a357c3654a1a4dcc05e6b5c50acd17105>`_ function.

.. code-block:: c

  DMatrixHandle data; // handle to DMatrix
  // Load the data from the file & store it in the data variable of DMatrixHandle datatype
  safe_xgboost(XGDMatrixCreateFromFile("/path/to/file/filename", silent, &data));


2. You can also create a ``DMatrix`` object from a 2D matrix using the `XGDMatrixCreateFromMat function <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a079f830cb972df70c7f50fb91678d62f>`_.

.. code-block:: c

  // 1D matrix
  const float data1[] = { 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 };

  // 2D matrix with 6 rows & 3 columns
  enum { ROWS = 6, COLS = 3 };
  const float data2[ROWS][COLS] = { {1, 2, 3}, {2, 4, 6}, {3, -1, 9}, {4, 8, -1}, {2, 5, 1}, {0, 1, 5} };
  DMatrixHandle dmatrix1, dmatrix2;
  // Pass the matrix along with the number of rows & columns it contains
  // here '0' represents the missing value in the matrix dataset
  // dmatrix1 will hold the DMatrix created from data1
  safe_xgboost(XGDMatrixCreateFromMat(data1, 1, 50, 0, &dmatrix1));
  // here -1 represents the missing value in the matrix dataset
  safe_xgboost(XGDMatrixCreateFromMat(&data2[0][0], ROWS, COLS, -1, &dmatrix2));


3. Create a ``Booster`` object for training & testing on the dataset using `XGBoosterCreate <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#ad9fe6f8c8c4901db1c7581a96a21f9ae>`_.

.. code-block:: c

  BoosterHandle booster;
  const int eval_dmats_size = 2;
  // We assume that training and test data have been loaded into 'train' and 'test'
  DMatrixHandle eval_dmats[2] = {train, test};
  safe_xgboost(XGBoosterCreate(eval_dmats, eval_dmats_size, &booster));


4. For each ``DMatrix`` object, set the labels using `XGDMatrixSetFloatInfo <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#aef75cda93db3ae9af89e465ae7e9cbe3>`_. Later you can access the labels using `XGDMatrixGetFloatInfo <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#ab0ee317539a1fb1ce2b5f249e8c768f6>`_.

.. code-block:: c

  enum { ROWS = 6, COLS = 3 };
  const float data[ROWS][COLS] = { {1, 2, 3}, {2, 4, 6}, {3, -1, 9}, {4, 8, -1}, {2, 5, 1}, {0, 1, 5} };
  DMatrixHandle dmatrix;

  safe_xgboost(XGDMatrixCreateFromMat(&data[0][0], ROWS, COLS, -1, &dmatrix));

  // variable to store labels for the dataset created from the above matrix
  float labels[ROWS];

  for (int i = 0; i < ROWS; i++) {
    labels[i] = i;
  }

  // Loading the labels
  safe_xgboost(XGDMatrixSetFloatInfo(dmatrix, "label", labels, ROWS));

  // read the labels back and store the length of the result
  bst_ulong result_len;

  // labels result
  const float *result;

  safe_xgboost(XGDMatrixGetFloatInfo(dmatrix, "label", &result_len, &result));

  for (unsigned int i = 0; i < result_len; i++) {
    printf("label[%u] = %f\n", i, result[i]);
  }


5. Set the parameters for the ``Booster`` object according to the requirement using `XGBoosterSetParam <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#af7378865b0c999d2d08a5b16483b8bcb>`_. Check out the full list of parameters available `here <https://xgboost.readthedocs.io/en/latest/parameter.html>`_.

.. code-block:: c

    BoosterHandle booster;
    // booster must first be created with XGBoosterCreate (see example 3)
    safe_xgboost(XGBoosterSetParam(booster, "booster", "gblinear"));
    // default max_depth = 6
    safe_xgboost(XGBoosterSetParam(booster, "max_depth", "3"));
    // default eta = 0.3
    safe_xgboost(XGBoosterSetParam(booster, "eta", "0.1"));


6. Train & evaluate the model using `XGBoosterUpdateOneIter <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a13594d68b27327db290ec5e0a0ac92ae>`_ and `XGBoosterEvalOneIter <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a201b53edb9cc52e9def1ccea951d18fe>`_ respectively.

.. code-block:: c

    int num_of_iterations = 20;
    const char* eval_names[2] = {"train", "test"};
    const char* eval_result = NULL;

    for (int i = 0; i < num_of_iterations; ++i) {
      // Update the model for the current iteration
      safe_xgboost(XGBoosterUpdateOneIter(booster, i, train));

      // Print the evaluation statistics (error) for the training & testing datasets after each iteration
      safe_xgboost(XGBoosterEvalOneIter(booster, i, eval_dmats, eval_names, eval_dmats_size, &eval_result));
      printf("%s\n", eval_result);
    }

.. note:: For a customized loss function, use the `XGBoosterBoostOneIter function <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#afd4a42c38cfb16d2cf2a9cf5daba4e83>`_ instead and manually specify the first and second order gradients, as sketched below.

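Below is a minimal sketch of such a custom-objective update, assuming squared error as the loss and assuming that the current predictions ``preds`` and the labels ``labels`` for the training ``DMatrix`` have already been obtained (for example via ``XGBoosterPredict`` and ``XGDMatrixGetFloatInfo``):

.. code-block:: c

    bst_ulong num_rows;
    safe_xgboost(XGDMatrixNumRow(train, &num_rows));

    float *grad = malloc(num_rows * sizeof(float));
    float *hess = malloc(num_rows * sizeof(float));

    for (bst_ulong j = 0; j < num_rows; j++) {
      // For squared error, the first order gradient is (prediction - label)
      grad[j] = preds[j] - labels[j];
      // and the second order gradient is constant
      hess[j] = 1.0f;
    }

    // Perform one boosting iteration with the custom gradients
    safe_xgboost(XGBoosterBoostOneIter(booster, train, grad, hess, num_rows));

    free(grad);
    free(hess);
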

7. Predict the result on a test set using `XGBoosterPredict <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#adc14afaedd5f1add105d18942a4de33c>`_.

.. code-block:: c

    bst_ulong output_length;

    const float *output_result;
    safe_xgboost(XGBoosterPredict(booster, test, 0, 0, &output_length, &output_result));

    for (unsigned int i = 0; i < output_length; i++) {
      printf("prediction[%u] = %f \n", i, output_result[i]);
    }


8. Free all the internal structures used in your code using `XGDMatrixFree <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#af06a15433b01e3b8297930a38155e05d>`_ and `XGBoosterFree <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a5d816936b005a103f0deabf287a6a5da>`_. This step is important to prevent memory leaks.

.. code-block:: c

  safe_xgboost(XGDMatrixFree(dmatrix));
  safe_xgboost(XGBoosterFree(booster));


9. Get the number of features in your dataset using `XGBoosterGetNumFeature <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#aa2c22f65cf2770c0e2e56cc7929a14af>`_.

.. code-block:: c

    bst_ulong num_of_features = 0;

    // Assuming the booster variable of type BoosterHandle has already been declared,
    // the dataset has been loaded, and the booster has been trained on it,
    // store the result in the num_of_features variable
    safe_xgboost(XGBoosterGetNumFeature(booster, &num_of_features));

    // Print the number of features, converting num_of_features from bst_ulong to unsigned long
    printf("num_feature: %lu\n", (unsigned long)(num_of_features));


10. Load the model using the `XGBoosterLoadModel function <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a054571e6364f9a1cbf6b6b4fd2f156d6>`_.

.. code-block:: c

    BoosterHandle booster;
    const char *model_path = "/path/of/model";

    // create the booster handle first
    safe_xgboost(XGBoosterCreate(NULL, 0, &booster));

    // set the model parameters here

    // load the model
    safe_xgboost(XGBoosterLoadModel(booster, model_path));

    // predict with the model here
