1###############################
2Feature Interaction Constraints
3###############################
4
5The decision tree is a powerful tool to discover interaction among independent
6variables (features). Variables that appear together in a traversal path
7are interacting with one another, since the condition of a child node is
8predicated on the condition of the parent node. For example, the highlighted
9red path in the diagram below contains three variables: :math:`x_1`, :math:`x_7`,
10and :math:`x_{10}`, so the highlighted prediction (at the highlighted leaf node)
11is the product of interaction between :math:`x_1`, :math:`x_7`, and
12:math:`x_{10}`.
13
14.. plot::
15  :nofigs:
16
17  from graphviz import Source
18  source = r"""
19    digraph feature_interaction_illustration1 {
20      graph [fontname = "helvetica"];
21      node [fontname = "helvetica"];
22      edge [fontname = "helvetica"];
23      0 [label=<x<SUB><FONT POINT-SIZE="11">10</FONT></SUB> &lt; -1.5 ?>, shape=box, color=red, fontcolor=red];
24      1 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB> &lt; 2 ?>, shape=box];
25      2 [label=<x<SUB><FONT POINT-SIZE="11">7</FONT></SUB> &lt; 0.3 ?>, shape=box, color=red, fontcolor=red];
26      3 [label="...", shape=none];
27      4 [label="...", shape=none];
28      5 [label=<x<SUB><FONT POINT-SIZE="11">1</FONT></SUB> &lt; 0.5 ?>, shape=box, color=red, fontcolor=red];
29      6 [label="...", shape=none];
30      7 [label="...", shape=none];
31      8 [label="Predict +1.3", color=red, fontcolor=red];
32      0 -> 1 [labeldistance=2.0, labelangle=45, headlabel="Yes/Missing           "];
33      0 -> 2 [labeldistance=2.0, labelangle=-45,
34              headlabel="No", color=red, fontcolor=red];
35      1 -> 3 [labeldistance=2.0, labelangle=45, headlabel="Yes"];
36      1 -> 4 [labeldistance=2.0, labelangle=-45, headlabel="             No/Missing"];
37      2 -> 5 [labeldistance=2.0, labelangle=-45, headlabel="Yes",
38              color=red, fontcolor=red];
39      2 -> 6 [labeldistance=2.0, labelangle=-45, headlabel="           No/Missing"];
40      5 -> 7;
41      5 -> 8 [color=red];
42    }
43  """
44  Source(source, format='png').render('../_static/feature_interaction_illustration1', view=False)
45  Source(source, format='svg').render('../_static/feature_interaction_illustration1', view=False)
46
47.. figure:: ../_static/feature_interaction_illustration1.svg
48   :align: center
49   :figwidth: 80 %
50
51When the tree depth is larger than one, many variables interact on
52the sole basis of minimizing training loss, and the resulting decision tree may
53capture a spurious relationship (noise) rather than a legitimate relationship
54that generalizes across different datasets. **Feature interaction constraints**
55allow users to decide which variables are allowed to interact and which are not.
56
57Potential benefits include:
58
59* Better predictive performance from focusing on interactions that work --
60  whether through domain specific knowledge or algorithms that rank interactions
61* Less noise in predictions; better generalization
62* More control to the user on what the model can fit. For example, the user may
63  want to exclude some interactions even if they perform well due to regulatory
64  constraints.
65
66****************
67A Simple Example
68****************
69
70Feature interaction constraints are expressed in terms of groups of variables
71that are allowed to interact. For example, the constraint
72``[0, 1]`` indicates that variables :math:`x_0` and :math:`x_1` are allowed to
73interact with each other but with no other variable. Similarly, ``[2, 3, 4]``
74indicates that :math:`x_2`, :math:`x_3`, and :math:`x_4` are allowed to
75interact with one another but with no other variable. A set of feature
76interaction constraints is expressed as a nested list, e.g.
77``[[0, 1], [2, 3, 4]]``, where each inner list is a group of indices of features
78that are allowed to interact with each other.
79
80In the following diagram, the left decision tree is in violation of the first
81constraint (``[0, 1]``), whereas the right decision tree complies with both the
82first and second constraints (``[0, 1]``, ``[2, 3, 4]``).
83
84.. plot::
85  :nofigs:
86
87  from graphviz import Source
88  source = r"""
89    digraph feature_interaction_illustration2 {
90      graph [fontname = "helvetica"];
91      node [fontname = "helvetica"];
92      edge [fontname = "helvetica"];
93      0 [label=<x<SUB><FONT POINT-SIZE="11">0</FONT></SUB> &lt; 5.0 ?>, shape=box];
94      1 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB> &lt; -3.0 ?>, shape=box];
95      2 [label="+0.6"];
96      3 [label="-0.4"];
97      4 [label="+1.2"];
98      0 -> 1 [labeldistance=2.0, labelangle=45, headlabel="Yes/Missing           "];
99      0 -> 2 [labeldistance=2.0, labelangle=-45, headlabel="No"];
100      1 -> 3 [labeldistance=2.0, labelangle=45, headlabel="Yes"];
101      1 -> 4 [labeldistance=2.0, labelangle=-45, headlabel="           No/Missing"];
102    }
103  """
104  Source(source, format='png').render('../_static/feature_interaction_illustration2', view=False)
105  Source(source, format='svg').render('../_static/feature_interaction_illustration2', view=False)
106
107.. plot::
108  :nofigs:
109
110  from graphviz import Source
111  source = r"""
112    digraph feature_interaction_illustration3 {
113      graph [fontname = "helvetica"];
114      node [fontname = "helvetica"];
115      edge [fontname = "helvetica"];
116      0 [label=<x<SUB><FONT POINT-SIZE="11">3</FONT></SUB> &lt; 2.5 ?>, shape=box];
117      1 [label="+1.6"];
118      2 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB> &lt; -1.2 ?>, shape=box];
119      3 [label="+0.1"];
120      4 [label="-0.3"];
121      0 -> 1 [labeldistance=2.0, labelangle=45, headlabel="Yes"];
122      0 -> 2 [labeldistance=2.0, labelangle=-45, headlabel="           No/Missing"];
123      2 -> 3 [labeldistance=2.0, labelangle=45, headlabel="Yes/Missing           "];
124      2 -> 4 [labeldistance=2.0, labelangle=-45, headlabel="No"];
125    }
126  """
127  Source(source, format='png').render('../_static/feature_interaction_illustration3', view=False)
128  Source(source, format='svg').render('../_static/feature_interaction_illustration3', view=False)
129
130.. |fig1| image:: ../_static/feature_interaction_illustration2.svg
131   :scale: 7%
132   :align: middle
133
134.. |fig2| image:: ../_static/feature_interaction_illustration3.svg
135   :scale: 7%
136   :align: middle
137
138+-----------+---------+
139| |fig1|    | |fig2|  |
140+-----------+---------+
141| forbidden | allowed |
142+-----------+---------+
143
144
145****************************************************
146Enforcing Feature Interaction Constraints in XGBoost
147****************************************************
148
149It is very simple to enforce feature interaction constraints in XGBoost.  Here we will
150give an example using Python, but the same general idea generalizes to other
151platforms.
152
153Suppose the following code fits your model without feature interaction constraints:
154
155.. code-block:: python
156
157  model_no_constraints = xgb.train(params, dtrain,
158                                   num_boost_round = 1000, evals = evallist,
159                                   early_stopping_rounds = 10)
160
161Then fitting with feature interaction constraints only requires adding a single
162parameter:
163
164.. code-block:: python
165
166  params_constrained = params.copy()
167  # Use nested list to define feature interaction constraints
168  params_constrained['interaction_constraints'] = '[[0, 2], [1, 3, 4], [5, 6]]'
169  # Features 0 and 2 are allowed to interact with each other but with no other feature
170  # Features 1, 3, 4 are allowed to interact with one another but with no other feature
171  # Features 5 and 6 are allowed to interact with each other but with no other feature
172
173  model_with_constraints = xgb.train(params_constrained, dtrain,
174                                     num_boost_round = 1000, evals = evallist,
175                                     early_stopping_rounds = 10)
176
177**Choice of tree construction algorithm**. To use feature interaction constraints, be sure
178to set the ``tree_method`` parameter to one of the following: ``exact``, ``hist``,
179``approx`` or ``gpu_hist``.  Support for ``gpu_hist`` and ``approx`` is added only in
1801.0.0.
181
182**************
183Advanced topic
184**************
185
186The intuition behind interaction constraints is simple.  Users may have prior knowledge about
187relations between different features, and encode it as constraints during model
188construction.  But there are also some subtleties around specifying constraints.  Take
189the constraint ``[[1, 2], [2, 3, 4]]`` as an example.  The second feature appears in two
190different interaction sets, ``[1, 2]`` and ``[2, 3, 4]``.  So the union set of features
191allowed to interact with ``2`` is ``{1, 3, 4}``.  In the following diagram, the root splits at
192feature ``2``.  Because all its descendants should be able to interact with it, all 4 features
193are legitimate split candidates at the second layer. At first sight, this might look like
194disregarding the specified constraint sets, but it is not.
195
196.. plot::
197  :nofigs:
198
199  from graphviz import Source
200  source = r"""
201    digraph feature_interaction_illustration4 {
202      graph [fontname = "helvetica"];
203      node [fontname = "helvetica"];
204      edge [fontname = "helvetica"];
205      0 [label=<x<SUB><FONT POINT-SIZE="11">2</FONT></SUB>>, shape=box, color=black, fontcolor=black];
206      1 [label=<x<SUB><FONT POINT-SIZE="11">{1, 2, 3, 4}</FONT></SUB>>, shape=box];
207      2 [label=<x<SUB><FONT POINT-SIZE="11">{1, 2, 3, 4}</FONT></SUB>>, shape=box, color=black, fontcolor=black];
208      3 [label="...", shape=none];
209      4 [label="...", shape=none];
210      5 [label="...", shape=none];
211      6 [label="...", shape=none];
212      0 -> 1;
213      0 -> 2;
214      1 -> 3;
215      1 -> 4;
216      2 -> 5;
217      2 -> 6;
218    }
219  """
220  Source(source, format='png').render('../_static/feature_interaction_illustration4', view=False)
221  Source(source, format='svg').render('../_static/feature_interaction_illustration5', view=False)
222
223.. figure:: ../_static/feature_interaction_illustration4.png
224   :align: center
225   :figwidth: 80 %
226
227   ``{1, 2, 3, 4}`` represents the sets of legitimate split features.
228
229This has lead to some interesting implications of feature interaction constraints.  Take
230``[[0, 1], [0, 1, 2], [1, 2]]`` as another example.  Assuming we have only 3 available
231features in our training datasets for presentation purpose, careful readers might have
232found out that the above constraint is the same as simply ``[[0, 1, 2]]``.  Since no matter which
233feature is chosen for split in the root node, all its descendants are allowd to include every
234feature as legitimate split candidates without violating interaction constraints.
235
236For one last example, we use ``[[0, 1], [1, 3, 4]]`` and choose feature ``0`` as split for
237the root node.  At the second layer of the built tree, ``1`` is the only legitimate split
238candidate except for ``0`` itself, since they belong to the same constraint set.
239Following the grow path of our example tree below, the node at the second layer splits at
240feature ``1``.  But due to the fact that ``1`` also belongs to second constraint set ``[1,
2413, 4]``, at the third layer, we are allowed to include all features as split candidates and
242still comply with the interaction constraints of its ascendants.
243
244.. plot::
245  :nofigs:
246
247  from graphviz import Source
248  source = r"""
249    digraph feature_interaction_illustration5 {
250      graph [fontname = "helvetica"];
251      node [fontname = "helvetica"];
252      edge [fontname = "helvetica"];
253      0 [label=<x<SUB><FONT POINT-SIZE="11">0</FONT></SUB>>, shape=box, color=black, fontcolor=black];
254      1 [label="...", shape=none];
255      2 [label=<x<SUB><FONT POINT-SIZE="11">1</FONT></SUB>>, shape=box, color=black, fontcolor=black];
256      3 [label=<x<SUB><FONT POINT-SIZE="11">{0, 1, 3, 4}</FONT></SUB>>, shape=box, color=black, fontcolor=black];
257      4 [label=<x<SUB><FONT POINT-SIZE="11">{0, 1, 3, 4}</FONT></SUB>>, shape=box, color=black, fontcolor=black];
258      5 [label="...", shape=none];
259      6 [label="...", shape=none];
260      7 [label="...", shape=none];
261      8 [label="...", shape=none];
262      0 -> 1;
263      0 -> 2;
264      2 -> 3;
265      2 -> 4;
266      3 -> 5;
267      3 -> 6;
268      4 -> 7;
269      4 -> 8;
270    }
271  """
272  Source(source, format='png').render('../_static/feature_interaction_illustration6', view=False)
273  Source(source, format='svg').render('../_static/feature_interaction_illustration7', view=False)
274
275
276.. figure:: ../_static/feature_interaction_illustration6.png
277   :align: center
278   :figwidth: 80 %
279
280   ``{0, 1, 3, 4}`` represents the sets of legitimate split features.
281