.. _spearman_coefficient:

Spearman correlation coefficient
--------------------------------

This method deals with the parametric modelling of a probability
distribution for a random vector
:math:`\vect{X} = \left( X^1,\ldots,X^{n_X} \right)`. It aims to measure
a type of dependence (here a monotonic correlation) which may exist
between two components :math:`X^i` and :math:`X^j`.

Spearman’s correlation coefficient :math:`\rho^S_{U,V}` measures the
strength of a monotonic relationship between two random variables
:math:`U` and :math:`V`. It equals Pearson’s correlation coefficient
computed after transforming :math:`U` and :math:`V` by their own
cumulative distribution functions, a transformation that linearizes any
monotonic relationship (recall that Pearson’s correlation coefficient
only measures the strength of linear relationships, see
:ref:`Pearson’s correlation coefficient <pearson_coefficient>`):

.. math::

    \begin{aligned}
      \rho^S_{U,V} = \rho_{F_U(U),F_V(V)}
    \end{aligned}

where :math:`F_U` and :math:`F_V` denote the cumulative distribution
functions of :math:`U` and :math:`V`.

Given a sample of :math:`N` pairs
:math:`\left\{ (u_1,v_1),(u_2,v_2),\ldots,(u_N,v_N) \right\}`, the
estimation of Spearman’s correlation coefficient first requires
splitting it into the two samples :math:`(u_1,\ldots,u_N)` and
:math:`(v_1,\ldots,v_N)` and ranking each of them. The rank
:math:`u_{[i]}` of the observation :math:`u_i` is its position in the
sample sorted in ascending order: if :math:`u_i` is the smallest value
in the sample :math:`(u_1,\ldots,u_N)`, its rank equals 1; if
:math:`u_i` is the second smallest value, its rank equals 2, and so
forth. The ranking transformation takes the sample
:math:`(u_1,\ldots,u_N)` as input and produces the sample
:math:`(u_{[1]},\ldots,u_{[N]})` as output.
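
The ranking transformation can be sketched in plain Python, together
with Pearson’s estimator applied to the resulting ranks, which is how
Spearman’s coefficient is estimated from a sample. This is a minimal
illustration under stated assumptions, not the OpenTURNS
implementation: it assumes the sample contains no tied values, and the
helper names ``rank``, ``pearson`` and ``spearman`` are hypothetical.

```python
from math import exp, sqrt


def rank(sample):
    """Return the 1-based rank of each observation (assumes no ties)."""
    # Indices of the observations, sorted by increasing value.
    order = sorted(range(len(sample)), key=lambda i: sample[i])
    ranks = [0] * len(sample)
    # The observation at sorted position p (starting at 1) gets rank p.
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def pearson(a, b):
    """Empirical Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(u, v):
    """Spearman estimate: Pearson's estimator applied to the ranks."""
    return pearson(rank(u), rank(v))


u = [3.2, -1.0, 7.5, 0.4, 2.2]
print(rank(u))  # [4, 1, 5, 2, 3]

# Increasing but non-linear relationship: the ranks of u and v coincide.
v = [exp(x) for x in u]
print(spearman(u, v))  # 1.0
# Pearson on the raw values stays strictly below 1 (the link is not linear).
print(pearson(u, v))

# Decreasing relationship: the ranks are reversed.
w = [-x ** 3 for x in u]
print(spearman(u, w))  # -1.0
```

With tied values, a common convention is to give each tied observation
the average of the ranks it occupies; this sketch does not handle that
case.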

For example, consider the sample
:math:`(u_1,u_2,u_3,u_4) = (1.5,0.7,5.1,4.3)`. Its ranks are
:math:`(u_{[1]},u_{[2]},u_{[3]},u_{[4]}) = (2,1,4,3)`: :math:`u_1 = 1.5`
is the second smallest value in the original sample, :math:`u_2 = 0.7`
the smallest, and so on.

Spearman’s correlation coefficient is then estimated by Pearson’s
coefficient computed on the :math:`N` pairs of ranks
:math:`(u_{[1]},v_{[1]})`, :math:`(u_{[2]},v_{[2]})`, …,
:math:`(u_{[N]},v_{[N]})`:

.. math::

    \begin{aligned}
      \widehat{\rho}^S_{U,V} = \frac{ \displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right) \left( v_{[i]} - \overline{v}_{[]} \right) }{ \sqrt{\displaystyle \sum_{i=1}^N \left( u_{[i]} - \overline{u}_{[]} \right)^2 \displaystyle \sum_{i=1}^N \left( v_{[i]} - \overline{v}_{[]} \right)^2} }
    \end{aligned}

where :math:`\overline{u}_{[]}` and :math:`\overline{v}_{[]}` are the
empirical means of the samples :math:`(u_{[1]},\ldots,u_{[N]})` and
:math:`(v_{[1]},\ldots,v_{[N]})`.

Spearman’s correlation coefficient takes values between -1 and 1. The
closer its absolute value is to 1, the stronger the evidence of a
monotonic relationship between :math:`U` and :math:`V`. Its sign
indicates whether the two variables vary in the same direction
(positive coefficient) or in opposite directions (negative
coefficient). Note that a zero correlation coefficient does not
necessarily imply that :math:`U` and :math:`V` are independent. A zero
Spearman’s correlation coefficient is consistent with two situations:

- the variables :math:`U` and :math:`V` are in fact independent,

- or a non-monotonic relationship exists between :math:`U` and
  :math:`V`.

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['x^2'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('There is a monotonic relationship between U and V:\nSpearman\'s coefficient is a relevant measure of dependency...')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5*x+10'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('... because the rank transformation turns any monotonic trend\ninto a linear relation for which Pearson\'s correlation is relevant')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['5'])
    y = ot.Uniform(0.0, 10.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is close to zero\nbecause U and V are independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

.. plot::

    import openturns as ot
    from openturns.viewer import View

    N = 20
    ot.RandomGenerator.SetSeed(10)
    x = ot.Uniform(0.0, 10.0).getSample(N)
    f = ot.SymbolicFunction(['x'], ['30*sin(x)'])
    y = f(x) + ot.Normal(0.0, 5.0).getSample(N)
    graph = f.draw(0.0, 10.0)
    graph.setTitle('Spearman\'s coefficient estimate is quite close to zero\neven though U and V are not independent')
    graph.setXTitle('u')
    graph.setYTitle('v')
    cloud = ot.Cloud(x, y)
    cloud.setPointStyle('circle')
    cloud.setColor('orange')
    graph.add(cloud)
    View(graph)

Spearman’s coefficient is often referred to as the rank correlation
coefficient.


.. topic:: API:

    - See :class:`~openturns.CorrelationAnalysis_SpearmanCorrelation`
    - See :py:meth:`~openturns.Sample.computeSpearmanCorrelation`

.. topic:: Examples:

    - See :doc:`/auto_data_analysis/manage_data_and_samples/plot_sample_correlation`

.. topic:: References:

    - [saporta1990]_
    - [dixon1983]_
    - [nisthandbook]_
    - [dagostino1986]_
    - [bhattacharyya1997]_
    - [sprent2001]_
    - [burnham2002]_