############################
Distributed XGBoost with Ray
############################

`Ray <https://ray.io/>`_ is a general-purpose distributed execution framework.
Ray can be used to scale computations from a single node to a cluster of hundreds
of nodes without changing any code.

The Python bindings of Ray come with a collection of well-maintained
machine learning libraries for hyperparameter optimization and model serving.

The `XGBoost-Ray <https://github.com/ray-project/xgboost_ray>`_ project provides
an interface to run XGBoost training and prediction jobs on a Ray cluster. It allows
you to utilize distributed data representations, such as
`Modin <https://modin.readthedocs.io/en/latest/>`_ dataframes,
as well as distributed loading from cloud storage (e.g. Parquet files).

XGBoost-Ray integrates well with the hyperparameter optimization library Ray Tune, and
implements advanced fault tolerance handling mechanisms. With Ray you can scale
your training jobs to hundreds of nodes just by adding new
nodes to a cluster. You can also use Ray to leverage multi-GPU XGBoost training.

Installing and starting Ray
===========================
Ray can be installed from PyPI like this:

.. code-block:: bash

    pip install ray

If you're using Ray on a single machine, you don't need to do anything else:
XGBoost-Ray will automatically start a local Ray cluster when used.

If you want to use Ray on a cluster, you can use the
`Ray cluster launcher <https://docs.ray.io/en/master/cluster/cloud.html>`_.

Installing XGBoost-Ray
======================
XGBoost-Ray is also available via PyPI:

.. code-block:: bash

    pip install xgboost_ray

This will install all dependencies needed to run XGBoost on Ray, including
Ray itself if it hasn't been installed before.

Using XGBoost-Ray for training and prediction
=============================================
XGBoost-Ray uses the same API as core XGBoost. There are only two differences:

1. Instead of using an ``xgboost.DMatrix``, you'll use an ``xgboost_ray.RayDMatrix`` object.
2. There is an additional :class:`ray_params <xgboost_ray.RayParams>` parameter that you can use to configure distributed training.

Simple training example
-----------------------

To run this simple example, you'll need to install
`scikit-learn <https://scikit-learn.org/>`_ (with ``pip install scikit-learn``).

In this example, we will load the `breast cancer dataset <https://archive.ics.uci.edu/ml/datasets/breast+cancer>`_
and train a binary classifier using two actors.

.. code-block:: python

    from xgboost_ray import RayDMatrix, RayParams, train
    from sklearn.datasets import load_breast_cancer

    train_x, train_y = load_breast_cancer(return_X_y=True)
    train_set = RayDMatrix(train_x, train_y)

    evals_result = {}
    bst = train(
        {
            "objective": "binary:logistic",
            "eval_metric": ["logloss", "error"],
        },
        train_set,
        evals_result=evals_result,
        evals=[(train_set, "train")],
        verbose_eval=False,
        ray_params=RayParams(num_actors=2, cpus_per_actor=1))

    bst.save_model("model.xgb")
    print("Final training error: {:.4f}".format(
        evals_result["train"]["error"][-1]))

The only differences compared to the non-distributed API are
the import statement (``xgboost_ray`` instead of ``xgboost``), using the
``RayDMatrix`` instead of the ``DMatrix``, and passing a :class:`RayParams <xgboost_ray.RayParams>` object.

The returned object is a regular ``xgboost.Booster`` instance.
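
As noted in the introduction, a ``RayDMatrix`` can also be created from
distributed data sources instead of in-memory arrays. The following is a
minimal sketch of distributed loading from Parquet files; the bucket paths
and the ``target`` label column are hypothetical placeholders:

.. code-block:: python

    from xgboost_ray import RayDMatrix, RayFileType

    # Hypothetical paths: each file is a shard that can be
    # loaded by a different actor.
    path = [
        "s3://my-bucket/train-part-0.parquet",
        "s3://my-bucket/train-part-1.parquet",
    ]

    # For file-based sources, the label is referenced by column name.
    train_set = RayDMatrix(path, label="target", filetype=RayFileType.PARQUET)
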

Simple prediction example
-------------------------
.. code-block:: python

    from xgboost_ray import RayDMatrix, RayParams, predict
    from sklearn.datasets import load_breast_cancer
    import xgboost as xgb

    data, labels = load_breast_cancer(return_X_y=True)

    dpred = RayDMatrix(data, labels)

    bst = xgb.Booster(model_file="model.xgb")
    pred_ray = predict(bst, dpred, ray_params=RayParams(num_actors=2))

    print(pred_ray)

In this example, the data will be split across two actors. The resulting array
contains the predictions of both actors, combined in the correct order.

The RayParams object
====================
The ``RayParams`` object is used to configure various settings relating to
distributed training.

.. autoclass:: xgboost_ray.RayParams

Multi-GPU training
==================
Ray automatically detects GPUs on cluster nodes.
In order to start training on multiple GPUs, all you have to do is
set the ``gpus_per_actor`` parameter of the ``RayParams`` object and set
``num_actors`` to the number of GPUs you want to use:

.. code-block:: python

    ray_params = RayParams(
        num_actors=4,
        gpus_per_actor=1,
    )

This will train on four GPUs in parallel.

Note that it usually does not make sense to allocate more than one GPU per actor,
as XGBoost relies on distributed libraries such as Dask or Ray to utilize multi-GPU
training.

Setting the number of CPUs per actor
====================================
XGBoost natively utilizes multithreading to speed up computations. Thus if
you are training on CPUs only, there is likely no benefit in using more than
one actor per node. In that case, assuming you have a cluster of homogeneous nodes,
set the number of CPUs per actor to the number of CPUs available on each node,
and the number of actors to the number of nodes.

If you are using multi-GPU training on a single node, divide the number of
available CPUs evenly across all actors. For instance, if you have 16 CPUs and
4 GPUs available, each actor should access 1 GPU and 4 CPUs.

If you are using a cluster of heterogeneous nodes (with different numbers of CPUs),
you might just want to use the `greatest common divisor <https://en.wikipedia.org/wiki/Greatest_common_divisor>`_
for the number of CPUs per actor. E.g. if you have a cluster of three nodes with
4, 8, and 12 CPUs, respectively, you'd start 6 actors with 4 CPUs each for maximum
CPU utilization.

Fault tolerance
===============
XGBoost-Ray supports two fault tolerance modes. In **non-elastic training**, whenever
a training actor dies (e.g. because the node goes down), the training job will stop:
XGBoost-Ray will wait for the actor (or its resources) to become available again
(possibly on a different node), and then continue training once all actors are back.

In **elastic training**, whenever a training actor dies, the rest of the actors
continue training without the dead actor. If the actor comes back, it will be
re-integrated into training.

Please note that in elastic training you will train on less data for some time.
The benefit is that you can continue training even if a node goes away for the
remainder of the training run, and don't have to wait until it is back up again.
In practice this usually leads to a very minor decrease in accuracy but a much shorter
training time compared to non-elastic training.

Both training modes can be configured using the respective :class:`RayParams <xgboost_ray.RayParams>`
parameters.
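
As a minimal sketch, elastic training could be enabled like this. The values
are made up for illustration; refer to the :class:`RayParams <xgboost_ray.RayParams>`
reference above for the authoritative list of fields:

.. code-block:: python

    from xgboost_ray import RayParams

    ray_params = RayParams(
        num_actors=4,
        cpus_per_actor=4,
        # Continue training with the surviving actors if some die.
        elastic_training=True,
        # Tolerate at most one dead actor at any point in time ...
        max_failed_actors=1,
        # ... and restart each failed actor at most two times.
        max_actor_restarts=2,
    )

With ``elastic_training=False``, XGBoost-Ray instead waits for failed actors
to be restored, as described above for non-elastic training.
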

Hyperparameter optimization
===========================
XGBoost-Ray integrates well with the `hyperparameter optimization framework Ray Tune <http://tune.io>`_.
Ray Tune uses Ray to start multiple distributed trials with different hyperparameter configurations.
If used with XGBoost-Ray, these trials will then start their own distributed training
jobs.

XGBoost-Ray automatically reports evaluation results back to Ray Tune. There are only
a few things you need to do:

1. Put your XGBoost-Ray training call into a function accepting parameter configurations
   (``train_model`` in the example below).
2. Create a :class:`RayParams <xgboost_ray.RayParams>` object (``ray_params``
   in the example below).
3. Define the parameter search space (``config`` dict in the example below).
4. Call ``tune.run()``:

   * The ``metric`` parameter should contain the metric you'd like to optimize.
     Usually this consists of the prefix passed to the ``evals`` argument of
     ``xgboost_ray.train()``, and an ``eval_metric`` passed in the
     XGBoost parameters (``train-error`` in the example below).
   * The ``mode`` should either be ``min`` or ``max``, depending on whether
     you'd like to minimize or maximize the metric.
   * The ``resources_per_trial`` should be set using ``ray_params.get_tune_resources()``.
     This will make sure that each trial has the necessary resources available to
     start its distributed training jobs.

.. code-block:: python

    from xgboost_ray import RayDMatrix, RayParams, train
    from sklearn.datasets import load_breast_cancer

    num_actors = 4
    num_cpus_per_actor = 1

    ray_params = RayParams(
        num_actors=num_actors, cpus_per_actor=num_cpus_per_actor)

    def train_model(config):
        train_x, train_y = load_breast_cancer(return_X_y=True)
        train_set = RayDMatrix(train_x, train_y)

        evals_result = {}
        bst = train(
            params=config,
            dtrain=train_set,
            evals_result=evals_result,
            evals=[(train_set, "train")],
            verbose_eval=False,
            ray_params=ray_params)
        bst.save_model("model.xgb")

    from ray import tune

    # Specify the hyperparameter search space.
    config = {
        "tree_method": "approx",
        "objective": "binary:logistic",
        "eval_metric": ["logloss", "error"],
        "eta": tune.loguniform(1e-4, 1e-1),
        "subsample": tune.uniform(0.5, 1.0),
        "max_depth": tune.randint(1, 9)
    }

    # Make sure to use the `get_tune_resources` method to set the `resources_per_trial`
    analysis = tune.run(
        train_model,
        config=config,
        metric="train-error",
        mode="min",
        num_samples=4,
        resources_per_trial=ray_params.get_tune_resources())
    print("Best hyperparameters", analysis.best_config)

Ray Tune supports various
`search algorithms and libraries (e.g. BayesOpt, Tree-Parzen estimators) <https://docs.ray.io/en/latest/tune/key-concepts.html#search-algorithms>`_,
`smart schedulers like successive halving <https://docs.ray.io/en/latest/tune/key-concepts.html#trial-schedulers>`_,
and other features. Please refer to the `Ray Tune documentation <http://tune.io>`_
for more information.

Additional resources
====================
* `XGBoost-Ray repository <https://github.com/ray-project/xgboost_ray>`_
* `XGBoost-Ray documentation <https://docs.ray.io/en/master/xgboost-ray.html>`_
* `Ray core documentation <https://docs.ray.io/en/master/index.html>`_
* `Ray Tune documentation <http://tune.io>`_