1############################### 2Feature Interaction Constraints 3############################### 4 5The decision tree is a powerful tool to discover interaction among independent 6variables (features). Variables that appear together in a traversal path 7are interacting with one another, since the condition of a child node is 8predicated on the condition of the parent node. For example, the highlighted 9red path in the diagram below contains three variables: :math:`x_1`, :math:`x_7`, 10and :math:`x_{10}`, so the highlighted prediction (at the highlighted leaf node) 11is the product of interaction between :math:`x_1`, :math:`x_7`, and 12:math:`x_{10}`. 13 14.. plot:: 15 :nofigs: 16 17 from graphviz import Source 18 source = r""" 19 digraph feature_interaction_illustration1 { 20 graph [fontname = "helvetica"]; 21 node [fontname = "helvetica"]; 22 edge [fontname = "helvetica"]; 23 0 [label=<x<SUB><FONT POINT-SIZE="11">10</FONT></SUB> < -1.5 ?>, shape=box, color=red, fontcolor=red]; 24 1 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB> < 2 ?>, shape=box]; 25 2 [label=<x<SUB><FONT POINT-SIZE="11">7</FONT></SUB> < 0.3 ?>, shape=box, color=red, fontcolor=red]; 26 3 [label="...", shape=none]; 27 4 [label="...", shape=none]; 28 5 [label=<x<SUB><FONT POINT-SIZE="11">1</FONT></SUB> < 0.5 ?>, shape=box, color=red, fontcolor=red]; 29 6 [label="...", shape=none]; 30 7 [label="...", shape=none]; 31 8 [label="Predict +1.3", color=red, fontcolor=red]; 32 0 -> 1 [labeldistance=2.0, labelangle=45, headlabel="Yes/Missing "]; 33 0 -> 2 [labeldistance=2.0, labelangle=-45, 34 headlabel="No", color=red, fontcolor=red]; 35 1 -> 3 [labeldistance=2.0, labelangle=45, headlabel="Yes"]; 36 1 -> 4 [labeldistance=2.0, labelangle=-45, headlabel=" No/Missing"]; 37 2 -> 5 [labeldistance=2.0, labelangle=-45, headlabel="Yes", 38 color=red, fontcolor=red]; 39 2 -> 6 [labeldistance=2.0, labelangle=-45, headlabel=" No/Missing"]; 40 5 -> 7; 41 5 -> 8 [color=red]; 42 } 43 """ 44 Source(source, format='png').render('../_static/feature_interaction_illustration1', view=False) 45 Source(source, format='svg').render('../_static/feature_interaction_illustration1', view=False) 46 47.. figure:: ../_static/feature_interaction_illustration1.svg 48 :align: center 49 :figwidth: 80 % 50 51When the tree depth is larger than one, many variables interact on 52the sole basis of minimizing training loss, and the resulting decision tree may 53capture a spurious relationship (noise) rather than a legitimate relationship 54that generalizes across different datasets. **Feature interaction constraints** 55allow users to decide which variables are allowed to interact and which are not. 56 57Potential benefits include: 58 59* Better predictive performance from focusing on interactions that work -- 60 whether through domain specific knowledge or algorithms that rank interactions 61* Less noise in predictions; better generalization 62* More control to the user on what the model can fit. For example, the user may 63 want to exclude some interactions even if they perform well due to regulatory 64 constraints. 65 66**************** 67A Simple Example 68**************** 69 70Feature interaction constraints are expressed in terms of groups of variables 71that are allowed to interact. For example, the constraint 72``[0, 1]`` indicates that variables :math:`x_0` and :math:`x_1` are allowed to 73interact with each other but with no other variable. Similarly, ``[2, 3, 4]`` 74indicates that :math:`x_2`, :math:`x_3`, and :math:`x_4` are allowed to 75interact with one another but with no other variable. A set of feature 76interaction constraints is expressed as a nested list, e.g. 77``[[0, 1], [2, 3, 4]]``, where each inner list is a group of indices of features 78that are allowed to interact with each other. 79 80In the following diagram, the left decision tree is in violation of the first 81constraint (``[0, 1]``), whereas the right decision tree complies with both the 82first and second constraints (``[0, 1]``, ``[2, 3, 4]``). 83 84.. plot:: 85 :nofigs: 86 87 from graphviz import Source 88 source = r""" 89 digraph feature_interaction_illustration2 { 90 graph [fontname = "helvetica"]; 91 node [fontname = "helvetica"]; 92 edge [fontname = "helvetica"]; 93 0 [label=<x<SUB><FONT POINT-SIZE="11">0</FONT></SUB> < 5.0 ?>, shape=box]; 94 1 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB> < -3.0 ?>, shape=box]; 95 2 [label="+0.6"]; 96 3 [label="-0.4"]; 97 4 [label="+1.2"]; 98 0 -> 1 [labeldistance=2.0, labelangle=45, headlabel="Yes/Missing "]; 99 0 -> 2 [labeldistance=2.0, labelangle=-45, headlabel="No"]; 100 1 -> 3 [labeldistance=2.0, labelangle=45, headlabel="Yes"]; 101 1 -> 4 [labeldistance=2.0, labelangle=-45, headlabel=" No/Missing"]; 102 } 103 """ 104 Source(source, format='png').render('../_static/feature_interaction_illustration2', view=False) 105 Source(source, format='svg').render('../_static/feature_interaction_illustration2', view=False) 106 107.. plot:: 108 :nofigs: 109 110 from graphviz import Source 111 source = r""" 112 digraph feature_interaction_illustration3 { 113 graph [fontname = "helvetica"]; 114 node [fontname = "helvetica"]; 115 edge [fontname = "helvetica"]; 116 0 [label=<x<SUB><FONT POINT-SIZE="11">3</FONT></SUB> < 2.5 ?>, shape=box]; 117 1 [label="+1.6"]; 118 2 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB> < -1.2 ?>, shape=box]; 119 3 [label="+0.1"]; 120 4 [label="-0.3"]; 121 0 -> 1 [labeldistance=2.0, labelangle=45, headlabel="Yes"]; 122 0 -> 2 [labeldistance=2.0, labelangle=-45, headlabel=" No/Missing"]; 123 2 -> 3 [labeldistance=2.0, labelangle=45, headlabel="Yes/Missing "]; 124 2 -> 4 [labeldistance=2.0, labelangle=-45, headlabel="No"]; 125 } 126 """ 127 Source(source, format='png').render('../_static/feature_interaction_illustration3', view=False) 128 Source(source, format='svg').render('../_static/feature_interaction_illustration3', view=False) 129 130.. |fig1| image:: ../_static/feature_interaction_illustration2.svg 131 :scale: 7% 132 :align: middle 133 134.. |fig2| image:: ../_static/feature_interaction_illustration3.svg 135 :scale: 7% 136 :align: middle 137 138+-----------+---------+ 139| |fig1| | |fig2| | 140+-----------+---------+ 141| forbidden | allowed | 142+-----------+---------+ 143 144 145**************************************************** 146Enforcing Feature Interaction Constraints in XGBoost 147**************************************************** 148 149It is very simple to enforce feature interaction constraints in XGBoost. Here we will 150give an example using Python, but the same general idea generalizes to other 151platforms. 152 153Suppose the following code fits your model without feature interaction constraints: 154 155.. code-block:: python 156 157 model_no_constraints = xgb.train(params, dtrain, 158 num_boost_round = 1000, evals = evallist, 159 early_stopping_rounds = 10) 160 161Then fitting with feature interaction constraints only requires adding a single 162parameter: 163 164.. code-block:: python 165 166 params_constrained = params.copy() 167 # Use nested list to define feature interaction constraints 168 params_constrained['interaction_constraints'] = '[[0, 2], [1, 3, 4], [5, 6]]' 169 # Features 0 and 2 are allowed to interact with each other but with no other feature 170 # Features 1, 3, 4 are allowed to interact with one another but with no other feature 171 # Features 5 and 6 are allowed to interact with each other but with no other feature 172 173 model_with_constraints = xgb.train(params_constrained, dtrain, 174 num_boost_round = 1000, evals = evallist, 175 early_stopping_rounds = 10) 176 177**Choice of tree construction algorithm**. To use feature interaction constraints, be sure 178to set the ``tree_method`` parameter to one of the following: ``exact``, ``hist``, 179``approx`` or ``gpu_hist``. Support for ``gpu_hist`` and ``approx`` is added only in 1801.0.0. 181 182************** 183Advanced topic 184************** 185 186The intuition behind interaction constraints is simple. Users may have prior knowledge about 187relations between different features, and encode it as constraints during model 188construction. But there are also some subtleties around specifying constraints. Take 189the constraint ``[[1, 2], [2, 3, 4]]`` as an example. The second feature appears in two 190different interaction sets, ``[1, 2]`` and ``[2, 3, 4]``. So the union set of features 191allowed to interact with ``2`` is ``{1, 3, 4}``. In the following diagram, the root splits at 192feature ``2``. Because all its descendants should be able to interact with it, all 4 features 193are legitimate split candidates at the second layer. At first sight, this might look like 194disregarding the specified constraint sets, but it is not. 195 196.. plot:: 197 :nofigs: 198 199 from graphviz import Source 200 source = r""" 201 digraph feature_interaction_illustration4 { 202 graph [fontname = "helvetica"]; 203 node [fontname = "helvetica"]; 204 edge [fontname = "helvetica"]; 205 0 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB>>, shape=box, color=black, fontcolor=black]; 206 1 [label=<x<SUB><FONT POINT-SIZE="11">{1, 2, 3, 4}</FONT></SUB>>, shape=box]; 207 2 [label=<x<SUB><FONT POINT-SIZE="11">{1, 2, 3, 4}</FONT></SUB>>, shape=box, color=black, fontcolor=black]; 208 3 [label="...", shape=none]; 209 4 [label="...", shape=none]; 210 5 [label="...", shape=none]; 211 6 [label="...", shape=none]; 212 0 -> 1; 213 0 -> 2; 214 1 -> 3; 215 1 -> 4; 216 2 -> 5; 217 2 -> 6; 218 } 219 """ 220 Source(source, format='png').render('../_static/feature_interaction_illustration4', view=False) 221 Source(source, format='svg').render('../_static/feature_interaction_illustration5', view=False) 222 223.. figure:: ../_static/feature_interaction_illustration4.png 224 :align: center 225 :figwidth: 80 % 226 227 ``{1, 2, 3, 4}`` represents the sets of legitimate split features. 228 229This has lead to some interesting implications of feature interaction constraints. Take 230``[[0, 1], [0, 1, 2], [1, 2]]`` as another example. Assuming we have only 3 available 231features in our training datasets for presentation purpose, careful readers might have 232found out that the above constraint is the same as simply ``[[0, 1, 2]]``. Since no matter which 233feature is chosen for split in the root node, all its descendants are allowd to include every 234feature as legitimate split candidates without violating interaction constraints. 235 236For one last example, we use ``[[0, 1], [1, 3, 4]]`` and choose feature ``0`` as split for 237the root node. At the second layer of the built tree, ``1`` is the only legitimate split 238candidate except for ``0`` itself, since they belong to the same constraint set. 239Following the grow path of our example tree below, the node at the second layer splits at 240feature ``1``. But due to the fact that ``1`` also belongs to second constraint set ``[1, 2413, 4]``, at the third layer, we are allowed to include all features as split candidates and 242still comply with the interaction constraints of its ascendants. 243 244.. plot:: 245 :nofigs: 246 247 from graphviz import Source 248 source = r""" 249 digraph feature_interaction_illustration5 { 250 graph [fontname = "helvetica"]; 251 node [fontname = "helvetica"]; 252 edge [fontname = "helvetica"]; 253 0 [label=<x<SUB><FONT POINT-SIZE="11">0</FONT></SUB>>, shape=box, color=black, fontcolor=black]; 254 1 [label="...", shape=none]; 255 2 [label=<x<SUB><FONT POINT-SIZE="11">1</FONT></SUB>>, shape=box, color=black, fontcolor=black]; 256 3 [label=<x<SUB><FONT POINT-SIZE="11">{0, 1, 3, 4}</FONT></SUB>>, shape=box, color=black, fontcolor=black]; 257 4 [label=<x<SUB><FONT POINT-SIZE="11">{0, 1, 3, 4}</FONT></SUB>>, shape=box, color=black, fontcolor=black]; 258 5 [label="...", shape=none]; 259 6 [label="...", shape=none]; 260 7 [label="...", shape=none]; 261 8 [label="...", shape=none]; 262 0 -> 1; 263 0 -> 2; 264 2 -> 3; 265 2 -> 4; 266 3 -> 5; 267 3 -> 6; 268 4 -> 7; 269 4 -> 8; 270 } 271 """ 272 Source(source, format='png').render('../_static/feature_interaction_illustration6', view=False) 273 Source(source, format='svg').render('../_static/feature_interaction_illustration7', view=False) 274 275 276.. figure:: ../_static/feature_interaction_illustration6.png 277 :align: center 278 :figwidth: 80 % 279 280 ``{0, 1, 3, 4}`` represents the sets of legitimate split features. 281