##############
C API Tutorial
##############

In this tutorial, we will install the XGBoost library, configure the CMakeLists.txt file of a C/C++ application to link it with XGBoost, and then go over some useful tips for the C API, along with code snippets showing how to use its functions to perform basic tasks such as loading data, training a model, and predicting on a test dataset.

.. contents::
   :backlinks: none
   :local:

************
Requirements
************

Install CMake - Follow the `cmake installation documentation <https://cmake.org/install/>`_ for instructions.
Install Conda - Follow the `conda installation documentation <https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html>`_ for instructions.

**************************************
Install XGBoost in a Conda environment
**************************************

Run the following commands in your terminal. They build XGBoost from the cloned repository and install it into your Conda environment.

.. code-block:: bash

   # clone the XGBoost repository & its submodules
   git clone --recursive https://github.com/dmlc/xgboost
   cd xgboost
   mkdir build
   cd build
   # Activate the Conda environment, into which we'll install XGBoost
   conda activate [env_name]
   # Build the compiled version of XGBoost inside the build folder
   cmake .. -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX
   # install XGBoost in your conda environment (usually under [your home directory]/miniconda3)
   make install

**********************************************************************
Configure CMakeLists.txt file of your application to link with XGBoost
**********************************************************************

Here, we assume that your C++ application uses CMake for builds.

Use ``find_package()`` and ``target_link_libraries()`` in your application's CMakeLists.txt to link with the XGBoost library:

.. code-block:: cmake

   cmake_minimum_required(VERSION 3.13)
   project(your_project_name LANGUAGES C CXX VERSION your_project_version)
   find_package(xgboost REQUIRED)
   add_executable(your_project_name /path/to/project_file.c)
   target_link_libraries(your_project_name xgboost::xgboost)

To ensure that CMake can locate the XGBoost library, supply the ``-DCMAKE_PREFIX_PATH=$CONDA_PREFIX`` argument when invoking CMake. This option instructs CMake to look for the XGBoost library under ``$CONDA_PREFIX``, which is where your Conda environment is located.

.. code-block:: bash

   # Navigate to the build directory for your application
   cd build
   # Activate the Conda environment where we previously installed XGBoost
   conda activate [env_name]
   # Invoke CMake with CMAKE_PREFIX_PATH
   cmake .. -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
   # Build your application
   make

***********************
Useful Tips To Remember
***********************

Below are some useful tips for using the C API:

1. Error handling: Always check the return value of C API functions.

a. In a C application: Use the following macro to guard all calls to XGBoost's C API functions. The macro prints the error/exception that occurred:

.. highlight:: c
   :linenothreshold: 5

.. code-block:: c

   #define safe_xgboost(call) {  \
     int err = (call); \
     if (err != 0) { \
       fprintf(stderr, "%s:%d: error in %s: %s\n", __FILE__, __LINE__, #call, XGBGetLastError()); \
       exit(1); \
     } \
   }

In your application, wrap all C API function calls with the macro as follows:

.. code-block:: c

   DMatrixHandle train;
   safe_xgboost(XGDMatrixCreateFromFile("/path/to/training/dataset/", silent, &train));

b. In a C++ application: modify the macro ``safe_xgboost`` to throw an exception upon an error.

.. highlight:: cpp
   :linenothreshold: 5
.. code-block:: cpp

   #define safe_xgboost(call) {  \
     int err = (call); \
     if (err != 0) { \
       throw std::runtime_error(std::string(__FILE__) + ":" + std::to_string(__LINE__) + \
                                ": error in " + #call + ": " + XGBGetLastError()); \
     } \
   }

c. Assertion technique: It works in both C and C++. If the expression evaluates to 0 (false), the expression, the source code filename, and the line number are sent to standard error, and then the ``abort()`` function is called. It can be used to test assumptions you have made in the code. Note that ``assert()`` is compiled out when ``NDEBUG`` is defined, so do not rely on it for error handling in release builds.

.. code-block:: c

   DMatrixHandle dmat;
   assert(XGDMatrixCreateFromFile("training_data.libsvm", 0, &dmat) == 0);


2. Always remember to free the space allocated for ``BoosterHandle`` and ``DMatrixHandle`` appropriately:

.. code-block:: c

   #include <assert.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <xgboost/c_api.h>

   int main(int argc, char** argv) {
     int silent = 0;

     BoosterHandle booster;

     // do something with booster

     // free the memory
     XGBoosterFree(booster);

     DMatrixHandle DMatrixHandle_param;

     // do something with DMatrixHandle_param

     // free the memory
     XGDMatrixFree(DMatrixHandle_param);

     return 0;
   }


3. For tree models, it is important to use consistent data formats during training and scoring/predicting; otherwise, you will get wrong outputs.
   For example, if your training data is in ``dense matrix`` format, then your prediction dataset should also be a ``dense matrix``; if you train with data in ``libsvm`` format, then your prediction dataset should also be in ``libsvm`` format.


4. Always use strings when setting parameter values on a booster handle object. The parameter value can be of any data type (e.g. int, char, float, double, etc.), but it should always be encoded as a string.

.. code-block:: c

   BoosterHandle booster;
   XGBoosterSetParam(booster, "parameter_name", "0.1");


***************************************************************
Sample examples along with code snippets to use C API functions
***************************************************************

1. If the dataset is available in a file, it can be loaded into a ``DMatrix`` object using `XGDMatrixCreateFromFile <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a357c3654a1a4dcc05e6b5c50acd17105>`_.

.. code-block:: c

   DMatrixHandle data; // handle to DMatrix
   // Load the data from file & store it in the data variable of DMatrixHandle type
   safe_xgboost(XGDMatrixCreateFromFile("/path/to/file/filename", silent, &data));


2. You can also create a ``DMatrix`` object from a 2D matrix using the `XGDMatrixCreateFromMat function <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a079f830cb972df70c7f50fb91678d62f>`_. Note that the function expects ``float`` data.

.. code-block:: c

   // 1D array, interpreted below as a 1x50 matrix
   const float data1[] = { 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 };

   // 2D matrix
   const int ROWS = 5, COLS = 3;
   const float data2[5][3] = { {1, 2, 3}, {2, 4, 6}, {3, -1, 9}, {4, 8, -1}, {2, 5, 1} };
   DMatrixHandle dmatrix1, dmatrix2;
   // Pass the matrix along with the number of rows & columns it contains
   // here '0' represents the missing value in the matrix dataset
   // dmatrix1 will hold the created DMatrix
   safe_xgboost(XGDMatrixCreateFromMat(data1, 1, 50, 0, &dmatrix1));
   // here -1 represents the missing value in the matrix dataset
   safe_xgboost(XGDMatrixCreateFromMat(&data2[0][0], ROWS, COLS, -1, &dmatrix2));
3. Create a Booster object for training & testing on the dataset using `XGBoosterCreate <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#ad9fe6f8c8c4901db1c7581a96a21f9ae>`_.

.. code-block:: c

   BoosterHandle booster;
   const int eval_dmats_size = 2;
   // We assume that training and test data have been loaded into 'train' and 'test'
   DMatrixHandle eval_dmats[2] = {train, test};
   safe_xgboost(XGBoosterCreate(eval_dmats, eval_dmats_size, &booster));


4. For each ``DMatrix`` object, set the labels using `XGDMatrixSetFloatInfo <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#aef75cda93db3ae9af89e465ae7e9cbe3>`_. Later you can access the labels using `XGDMatrixGetFloatInfo <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#ab0ee317539a1fb1ce2b5f249e8c768f6>`_.

.. code-block:: c

   const int ROWS = 5, COLS = 3;
   const float data[5][3] = { {1, 2, 3}, {2, 4, 6}, {3, -1, 9}, {4, 8, -1}, {2, 5, 1} };
   DMatrixHandle dmatrix;

   safe_xgboost(XGDMatrixCreateFromMat(&data[0][0], ROWS, COLS, -1, &dmatrix));

   // variable to store labels for the dataset created from the above matrix
   float labels[ROWS];

   for (int i = 0; i < ROWS; i++) {
     labels[i] = i;
   }

   // Loading the labels
   safe_xgboost(XGDMatrixSetFloatInfo(dmatrix, "label", labels, ROWS));

   // reading the labels; result_len stores the length of the result
   bst_ulong result_len;

   // labels result
   const float *result;

   safe_xgboost(XGDMatrixGetFloatInfo(dmatrix, "label", &result_len, &result));

   for (unsigned int i = 0; i < result_len; i++) {
     printf("label[%u] = %f\n", i, result[i]);
   }


5. Set the parameters for the ``Booster`` object as required using `XGBoosterSetParam <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#af7378865b0c999d2d08a5b16483b8bcb>`_.
Check out the full list of available parameters `here <https://xgboost.readthedocs.io/en/latest/parameter.html>`_.

.. code-block:: c

   BoosterHandle booster;
   safe_xgboost(XGBoosterSetParam(booster, "booster", "gblinear"));
   // default max_depth = 6
   safe_xgboost(XGBoosterSetParam(booster, "max_depth", "3"));
   // default eta = 0.3
   safe_xgboost(XGBoosterSetParam(booster, "eta", "0.1"));


6. Train & evaluate the model using `XGBoosterUpdateOneIter <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a13594d68b27327db290ec5e0a0ac92ae>`_ and `XGBoosterEvalOneIter <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a201b53edb9cc52e9def1ccea951d18fe>`_ respectively.

.. code-block:: c

   int num_of_iterations = 20;
   const char* eval_names[2] = {"train", "test"};
   const char* eval_result = NULL;

   for (int i = 0; i < num_of_iterations; ++i) {
     // Update the model for one boosting iteration
     safe_xgboost(XGBoosterUpdateOneIter(booster, i, train));

     // Report the learner's statistics on the training & testing datasets in terms of error after each iteration
     safe_xgboost(XGBoosterEvalOneIter(booster, i, eval_dmats, eval_names, eval_dmats_size, &eval_result));
     printf("%s\n", eval_result);
   }

.. note:: For a customized loss function, use the `XGBoosterBoostOneIter function <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#afd4a42c38cfb16d2cf2a9cf5daba4e83>`_ instead and manually specify the gradient and second order gradient.


7. Predict the result on a test set using `XGBoosterPredict <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#adc14afaedd5f1add105d18942a4de33c>`_.

.. code-block:: c

   bst_ulong output_length;

   const float *output_result;
   safe_xgboost(XGBoosterPredict(booster, test, 0, 0, &output_length, &output_result));

   for (unsigned int i = 0; i < output_length; i++) {
     printf("prediction[%u] = %f\n", i, output_result[i]);
   }


8. Free all the internal structures used in your code using `XGDMatrixFree <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#af06a15433b01e3b8297930a38155e05d>`_ and `XGBoosterFree <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a5d816936b005a103f0deabf287a6a5da>`_. This step is important to prevent memory leaks.

.. code-block:: c

   safe_xgboost(XGDMatrixFree(dmatrix));
   safe_xgboost(XGBoosterFree(booster));


9. Get the number of features in your dataset using `XGBoosterGetNumFeature <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#aa2c22f65cf2770c0e2e56cc7929a14af>`_.

.. code-block:: c

   bst_ulong num_of_features = 0;

   // Assuming a variable 'booster' of type BoosterHandle is already declared,
   // the dataset is loaded, and the booster has been trained on it,
   // the result is stored in the num_of_features variable
   safe_xgboost(XGBoosterGetNumFeature(booster, &num_of_features));

   // Print the number of features, converting num_of_features from bst_ulong to unsigned long
   printf("num_feature: %lu\n", (unsigned long)(num_of_features));


10. Load a model using the `XGBoosterLoadModel function <https://xgboost.readthedocs.io/en/stable/dev/c__api_8h.html#a054571e6364f9a1cbf6b6b4fd2f156d6>`_.

.. code-block:: c

   BoosterHandle booster;
   const char *model_path = "/path/of/model";

   // create the booster handle first
   safe_xgboost(XGBoosterCreate(NULL, 0, &booster));

   // set the model parameters here

   // load the model
   safe_xgboost(XGBoosterLoadModel(booster, model_path));

   // predict with the model here
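
Putting the snippets above together, here is a minimal end-to-end sketch of a complete program: it loads data, trains, evaluates, predicts, saves the model with ``XGBoosterSaveModel`` (the counterpart of ``XGBoosterLoadModel``), and frees all handles. The file names ``train.libsvm``, ``test.libsvm``, and ``model.json`` are placeholders; adjust them to your setup.

.. code-block:: c

   #include <stdio.h>
   #include <stdlib.h>
   #include <xgboost/c_api.h>

   #define safe_xgboost(call) {  \
     int err = (call); \
     if (err != 0) { \
       fprintf(stderr, "%s:%d: error in %s: %s\n", __FILE__, __LINE__, #call, XGBGetLastError()); \
       exit(1); \
     } \
   }

   int main(void) {
     int silent = 0;

     // 1. Load the training and test data (placeholder file names)
     DMatrixHandle train, test;
     safe_xgboost(XGDMatrixCreateFromFile("train.libsvm", silent, &train));
     safe_xgboost(XGDMatrixCreateFromFile("test.libsvm", silent, &test));

     // 2. Create the booster over both matrices so both can be evaluated
     DMatrixHandle eval_dmats[2] = {train, test};
     BoosterHandle booster;
     safe_xgboost(XGBoosterCreate(eval_dmats, 2, &booster));

     // 3. Set parameters (always encoded as strings)
     safe_xgboost(XGBoosterSetParam(booster, "max_depth", "3"));
     safe_xgboost(XGBoosterSetParam(booster, "eta", "0.1"));

     // 4. Train and evaluate for a few iterations
     const char* eval_names[2] = {"train", "test"};
     const char* eval_result = NULL;
     for (int i = 0; i < 10; ++i) {
       safe_xgboost(XGBoosterUpdateOneIter(booster, i, train));
       safe_xgboost(XGBoosterEvalOneIter(booster, i, eval_dmats, eval_names, 2, &eval_result));
       printf("%s\n", eval_result);
     }

     // 5. Predict on the test set
     bst_ulong output_length;
     const float *output_result;
     safe_xgboost(XGBoosterPredict(booster, test, 0, 0, &output_length, &output_result));
     for (unsigned int i = 0; i < output_length; i++) {
       printf("prediction[%u] = %f\n", i, output_result[i]);
     }

     // 6. Save the trained model so it can be reloaded later with XGBoosterLoadModel
     safe_xgboost(XGBoosterSaveModel(booster, "model.json"));

     // 7. Free all handles to prevent memory leaks
     safe_xgboost(XGDMatrixFree(train));
     safe_xgboost(XGDMatrixFree(test));
     safe_xgboost(XGBoosterFree(booster));
     return 0;
   }

Assuming XGBoost was installed into the active Conda environment as described above, this program can also be compiled outside CMake with something like ``gcc demo.c -I$CONDA_PREFIX/include -L$CONDA_PREFIX/lib -lxgboost -Wl,-rpath,$CONDA_PREFIX/lib -o demo``.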