.. _spearman_coefficient:

Spearman correlation coefficient
--------------------------------

This method deals with the parametric modelling of a probability
distribution for a random vector
:math:`\vect{X} = \left( X^1,\ldots,X^{n_X} \right)`. It aims to measure
a type of dependence (here a monotonic correlation) which may exist
between two components :math:`X^i` and :math:`X^j`.

Spearman’s correlation coefficient :math:`\rho^S_{U,V}` measures the
strength of a monotonic relationship between two random variables
:math:`U` and :math:`V`. It is equivalent to Pearson’s correlation
coefficient computed after transforming :math:`U` and :math:`V` so as to
linearize any monotonic relationship (recall that Pearson’s correlation
coefficient only measures the strength of linear relationships, see
:ref:`Pearson’s correlation coefficient <pearson_coefficient>`):

.. math::

   \begin{aligned}
       \rho^S_{U,V} = \rho_{F_U(U),F_V(V)}
     \end{aligned}

where :math:`F_U` and :math:`F_V` denote the cumulative distribution
functions of :math:`U` and :math:`V`.
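
As a side remark, the definition can be expanded under the additional
assumption (not stated above) that :math:`U` and :math:`V` are continuous.
In that case :math:`F_U(U)` and :math:`F_V(V)` are both uniform on
:math:`[0,1]`, with mean :math:`1/2` and variance :math:`1/12`, so that:

.. math::

   \begin{aligned}
       \rho^S_{U,V}
       &= \frac{\mathrm{Cov}\left( F_U(U), F_V(V) \right)}{\sqrt{\mathrm{Var}\left( F_U(U) \right) \mathrm{Var}\left( F_V(V) \right)}} \\
       &= 12 \, \mathrm{Cov}\left( F_U(U), F_V(V) \right) \\
       &= 12 \, \mathbb{E}\left[ F_U(U) F_V(V) \right] - 3
     \end{aligned}
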

Given a sample of :math:`N` pairs
:math:`\left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}`,
estimating Spearman’s correlation coefficient first requires ranking
each of the two samples :math:`(u_1,\ldots,u_N)` and
:math:`(v_1,\ldots,v_N)`. The rank :math:`u_{[i]}` of the observation
:math:`u_i` is defined as the position of :math:`u_i` in the sample
sorted in ascending order: if :math:`u_i` is the smallest value in
the sample :math:`(u_1,\ldots,u_N)`, its rank equals 1; if
:math:`u_i` is the second smallest value in the sample, its rank
equals 2, and so forth. The rank transformation takes the sample
:math:`(u_1,\ldots,u_N)` as input and produces the sample
:math:`(u_{[1]},\ldots,u_{[N]})` as output.

For example, let us consider the sample
:math:`(u_1,u_2,u_3,u_4) = (1.5,0.7,5.1,4.3)`. We therefore have
:math:`(u_{[1]},u_{[2]},u_{[3]},u_{[4]}) = (2,1,4,3)`: :math:`u_1 = 1.5`
is the second smallest value in the original sample, :math:`u_2 = 0.7`
the smallest, etc.
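
The rank transformation described above can be sketched in a few lines of
plain Python. This is only an illustration, not the library's
implementation: the helper name ``ranks`` is hypothetical, and the sketch
assumes that the sample contains no ties.

.. code-block:: python

    def ranks(sample):
        """Return the 1-based rank of each value (assumes no ties)."""
        # Indices of the observations, sorted by ascending value
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        result = [0] * len(sample)
        # The observation at sorted position p receives rank p
        for position, index in enumerate(order, start=1):
            result[index] = position
        return result

    u = [1.5, 0.7, 5.1, 4.3]
    u_ranks = ranks(u)
    print(u_ranks)  # [2, 1, 4, 3]

The output reproduces the worked example: 1.5 is the second smallest value,
0.7 the smallest, and so on.
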

The estimate of Spearman’s correlation coefficient is therefore equal
to Pearson’s coefficient estimated from the :math:`N` pairs of ranks
:math:`(u_{[1]},v_{[1]})`, :math:`(u_{[2]},v_{[2]})`, …,
:math:`(u_{[N]},v_{[N]})`:

.. math::

   \begin{aligned}
       \widehat{\rho}^S_{U,V} = \frac{ \displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right) \left( v_{[i]} - \overline{v}_{[]} \right) }{ \sqrt{\displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right)^2 \sum_{i=1}^N \left( v_{[i]} - \overline{v}_{[]} \right)^2} }
     \end{aligned}

where :math:`\overline{u}_{[]}` and :math:`\overline{v}_{[]}` represent
the empirical means of the samples :math:`(u_{[1]},\ldots,u_{[N]})` and
:math:`(v_{[1]},\ldots,v_{[N]})`.
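
The estimator can be illustrated with a short, self-contained Python
sketch: rank both samples, then apply Pearson's formula to the ranks. The
helper names (``ranks``, ``pearson``, ``spearman``) and the values of ``v``
are hypothetical, and the sketch assumes that there are no ties. As a
cross-check it also evaluates the classical shortcut
:math:`1 - 6 \sum_i d_i^2 / (N(N^2-1))`, where :math:`d_i = u_{[i]} - v_{[i]}`,
which is valid only in the absence of ties.

.. code-block:: python

    import math

    def ranks(sample):
        """Return the 1-based rank of each value (assumes no ties)."""
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        result = [0] * len(sample)
        for position, index in enumerate(order, start=1):
            result[index] = position
        return result

    def pearson(x, y):
        """Empirical Pearson correlation of two equally sized samples."""
        n = len(x)
        mean_x = sum(x) / n
        mean_y = sum(y) / n
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / math.sqrt(var_x * var_y)

    def spearman(u, v):
        """Spearman estimate: Pearson correlation of the ranks."""
        return pearson(ranks(u), ranks(v))

    u = [1.5, 0.7, 5.1, 4.3]
    v = [2.0, 0.1, 3.5, 9.0]  # hypothetical second sample
    rho = spearman(u, v)

    # Shortcut formula, valid only when there are no ties
    n = len(u)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(u), ranks(v)))
    rho_shortcut = 1 - 6 * d2 / (n * (n ** 2 - 1))

    print(rho, rho_shortcut)  # both equal 0.8 up to floating-point rounding

Here the ranks are (2, 1, 4, 3) and (2, 1, 3, 4), so only the last two
observations disagree and the estimate remains strongly positive.
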

Spearman’s correlation coefficient takes values between -1 and 1.
The closer its absolute value is to 1, the stronger the indication
that a monotonic relationship exists between variables :math:`U` and
:math:`V`. The sign of Spearman’s coefficient indicates whether the two
variables vary in the same direction (positive coefficient) or in
opposite directions (negative coefficient). Note that a correlation
coefficient equal to 0 does not necessarily imply the independence of
variables :math:`U` and :math:`V`. There are two possible situations
in the event of a zero Spearman’s correlation coefficient:

-  the variables :math:`U` and :math:`V` are in fact independent,

-  or a non-monotonic relationship exists between :math:`U` and
   :math:`V`.
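
These properties can be checked on small hand-made samples. The sketch
below is an illustration in plain Python with hypothetical helper names and
data, assuming no ties: a strictly increasing but nonlinear relationship
gives an estimate of 1, a strictly decreasing one gives -1, and a sample
that rises then falls can give exactly 0 although :math:`V` clearly depends
on :math:`U`.

.. code-block:: python

    import math

    def ranks(sample):
        """Return the 1-based rank of each value (assumes no ties)."""
        order = sorted(range(len(sample)), key=lambda i: sample[i])
        result = [0] * len(sample)
        for position, index in enumerate(order, start=1):
            result[index] = position
        return result

    def pearson(x, y):
        """Empirical Pearson correlation of two equally sized samples."""
        mean_x = sum(x) / len(x)
        mean_y = sum(y) / len(y)
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
        var_x = sum((a - mean_x) ** 2 for a in x)
        var_y = sum((b - mean_y) ** 2 for b in y)
        return cov / math.sqrt(var_x * var_y)

    u = [1.0, 2.0, 3.0, 4.0, 5.0]
    v_increasing = [math.exp(x) for x in u]      # monotonic, nonlinear
    v_decreasing = [-x ** 3 for x in u]          # monotonic, decreasing
    v_non_monotonic = [0.0, 4.0, 3.0, 2.0, 1.0]  # rises, then falls

    rho_inc = pearson(ranks(u), ranks(v_increasing))     # 1.0
    rho_dec = pearson(ranks(u), ranks(v_decreasing))     # -1.0
    rho_non = pearson(ranks(u), ranks(v_non_monotonic))  # 0.0

    # Pearson's coefficient on the raw data is below 1 for the
    # exponential trend, because the relationship is not linear
    pearson_raw = pearson(u, v_increasing)

    print(rho_inc, rho_dec, rho_non, pearson_raw)
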

.. plot::

    import openturns as ot
    from openturns.viewer import View

    # Noisy observations of a monotonic (quadratic) trend
    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['x^2'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)

    # Draw the trend and overlay the sample
    graph = f.draw(0.0, 10.0)
    graph.setTitle('There is a monotonic relationship between U and V:\nSpearman\'s coefficient is a relevant measure of dependency...')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    # Noisy observations of a linear trend
    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5*x+10'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)

    # Draw the trend and overlay the sample
    graph = f.draw(0.0, 10.0)
    graph.setTitle('... because the rank transformation turns any monotonic trend\ninto a linear relation for which Pearson\'s correlation is relevant')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    # U and V are sampled independently; f is only a constant reference line
    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5'])
    y = ot.Uniform(0.0, 10.0).getSample(N)

    # Draw the reference line and overlay the sample
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is close to zero\nbecause U and V are independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    # Noisy observations of a non-monotonic (sinusoidal) trend
    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['30*sin(x)'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)

    # Draw the trend and overlay the sample
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is quite close to zero\neven though U and V are not independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

Spearman’s coefficient is often referred to as the rank correlation
coefficient.


.. topic:: API:

    - See :class:`~openturns.CorrelationAnalysis_SpearmanCorrelation`
    - See :py:meth:`~openturns.Sample.computeSpearmanCorrelation`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_sample_correlation`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [bhattacharyya1997]_
    - [sprent2001]_
    - [burnham2002]_
