########################################################################
##
## Copyright (C) 2013-2021 The Octave Project Developers
##
## See the file COPYRIGHT.md in the top-level directory of this
## distribution or <https://octave.org/copyright/>.
##
## This file is part of Octave.
##
## Octave is free software: you can redistribute it and/or modify it
## under the terms of the GNU General Public License as published by
## the Free Software Foundation, either version 3 of the License, or
## (at your option) any later version.
##
## Octave is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
## GNU General Public License for more details.
##
## You should have received a copy of the GNU General Public License
## along with Octave; see the file COPYING.  If not, see
## <https://www.gnu.org/licenses/>.
##
########################################################################

## -*- texinfo -*-
## @deftypefn  {} {} stemleaf (@var{x}, @var{caption})
## @deftypefnx {} {} stemleaf (@var{x}, @var{caption}, @var{stem_sz})
## @deftypefnx {} {@var{plotstr} =} stemleaf (@dots{})
## Compute and display a stem and leaf plot of the vector @var{x}.
##
## The input @var{x} should be a vector of integers.  Any non-integer values
## will be converted to integer by @code{@var{x} = fix (@var{x})}.  By default
## each element of @var{x} will be plotted with the last digit of the element
## as a leaf value and the remaining digits as the stem.  For example, 123
## will be plotted with the stem @samp{12} and the leaf @samp{3}.  The second
## argument, @var{caption}, should be a character array which provides a
## description of the data.  It is included as a heading for the output.
##
## The optional input @var{stem_sz} sets the width of each stem.
## The stem width is determined by @code{10^(@var{stem_sz} + 1)}.
## The default stem width is 10.
##
## The output of @code{stemleaf} is composed of two parts: a
## "Fenced Letter Display," followed by the stem-and-leaf plot itself.
## The Fenced Letter Display is described in @cite{Exploratory Data Analysis}.
## Briefly, the entries are as shown:
##
## @example
## @group
##
##         Fenced Letter Display
## #% nx|___________________     nx = numel (x)
## M% mi|       md         |     mi median index, md median
## H% hi|hl              hu| hs  hi lower hinge index, hl,hu hinges,
## 1    |x(1)         x(nx)|     hs h_spread; x(1), x(nx) first
##             _______           and last data value.
##      ______|step   |_______   step 1.5*h_spread
##     f|ifl            ifh|     inner fence, lower and higher
##      |nfl            nfh|     no.\ of data points within fences
##     F|ofl            ofh|     outer fence, lower and higher
##      |nFl            nFh|     no.\ of data points outside outer
##                               fences
## @end group
## @end example
##
## The stem-and-leaf plot shows on each line the stem value followed by the
## string made up of the leaf digits.  If @var{stem_sz} is greater than 0
## (stem width greater than 10), successive leaf values are separated by ",".
##
## With no return argument, the plot is immediately displayed.  If an output
## argument is provided, the plot is returned as an array of strings.
##
## The leaf digits are not sorted.  If sorted leaf values are desired, use
## @code{@var{xs} = sort (@var{x})} before calling
## @code{stemleaf (@var{xs}, @var{caption})}.
##
## The stem and leaf plot and associated displays are described in:
## Chapter 3, @cite{Exploratory Data Analysis} by @nospell{J. W. Tukey},
## Addison-Wesley, 1977.
## @seealso{hist, printd}
## @end deftypefn

function plotstr = stemleaf (x, caption, stem_sz)
  ## Compute and display a stem and leaf plot of the vector x.  The x
  ## vector is converted to integer by x = fix(x).  If an output argument
  ## is provided, the plot is returned as an array of strings.
  ## The first element is the heading followed by an element for each stem.
  ##
  ## The default stem step is 10.  If stem_sz is provided the stem
  ## step is set to: 10^(stem_sz+1).  The x vector should be integers.
  ## It will be treated so that the last digit is the leaf value and the
  ## other digits are the stems.
  ##
  ## When we first implemented stem and leaf plots in the early 1960's
  ## there was some discussion about sorting vs. leaving the leaf
  ## entries in the original order in the data.  We decided in favor of
  ## sorting the leaves for most purposes.  This is the choice
  ## implemented in the SNAP/IEDA system that was written at that time.
  ##
  ## SNAP/IEDA, and particularly its stem and leaf plotting, were further
  ## developed by Hale Trotter, David Hoagland (at Princeton and MIT),
  ## and others.
  ##
  ## Tukey, in EDA, generally uses unsorted leaves.  In addition, he
  ## described a wide range of additional display formats.  This
  ## implementation does not sort the leaves, but if the x vector is
  ## sorted then the leaves come out sorted.  A simple display format is
  ## used.
  ##
  ## I doubt if providing other options is worthwhile.  The code can
  ## quite easily be modified to provide specific display results.  Or,
  ## the returned output string can be edited.  The returned output is an
  ## array of strings with each row containing a line of the plot
  ## preceded by the lines of header text as the first row.  This
  ## facilitates annotation.
  ##
  ## Note that the code has some added complexity due to the need to
  ## distinguish both + and - 0 stems.  The +- stem values are essential
  ## for all plots which span 0.  After dealing with +-0 stems, the added
  ## complexity of putting +- data values in the correct stem is minor,
  ## but the sign of 0 leaves must be checked.  And, the cases where the
  ## stems start or end at +- 0 must also be considered.
  ##
  ## The fact that IEEE floating point defines +- 0 helps make this
  ## easier.
  ##
  ## Michael D. Godfrey January 2013

  ## More could be implemented for better data scaling.  And, of course,
  ## other options for the kinds of plots described by Tukey could be
  ## provided.  This may best be left to users.

  if (nargin < 2 || nargin > 3)
    print_usage ();
  endif

  if (! isvector (x))
    error ("stemleaf: X must be a vector");
  endif

  if (isinteger (x))
    ## Avoid use of integers because rounding rules do not use fix():
    ## Example: floor (int32 (-44)/10) == -4, floor (int32 (-46)/10) = -5 !!!
    x = single (x);
  elseif (isfloat (x))
    xint = fix (x);
    if (any (x != xint))
      warning ("stemleaf: X truncated to integer values");
      x = xint;
    endif
  else
    error ("stemleaf: X must be a numeric vector");
  endif

  if (! ischar (caption))
    error ("stemleaf: CAPTION must be a character array");
  endif

  if (nargin == 2)
    stem_step = 10;
  else
    if (! (isscalar (stem_sz) && stem_sz >= 0 && isreal (stem_sz)))
      error ("stemleaf: STEM_SZ must be a real integer >= 0");
    endif
    stem_sz = fix (stem_sz);
    stem_step = 10^(stem_sz+1);
  endif

  ## Note that IEEE 754 states that -+ 0 should compare equal.  This has
  ## led to C sort (and therefore Octave) treating them as equal.  Thus,
  ## sort([-1 0 -0 1]) yields [-1 0 -0 1], and sort([-1 -0 0 1])
  ## yields: [-1 -0 0 1].  This means that stem-and-leaf plotting cannot
  ## rely on sort to order the data as needed for display.
  ## This also applies to min()/max() so these routines can't be relied
  ## upon if the max or min is -+ 0.

  ## Compute hinges and fences based on ref: EDA pgs. 33 and 44.
  ## Note that these outlier estimates are meant to be "distribution free".
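
  ## Worked example (illustrative only, not taken from EDA), tracing the
  ## hinge and fence formulas of this function by hand for
  ##   x  = [5 22 12 28 52 39 12 11 11 42 38 44 18 44]    (nx = 14)
  ##   xs = [5 11 11 12 12 18 22 28 38 39 42 44 44 52]    (sorted)
  ##
  ##   mdidx = fix ((14 + 1)/2) = 7     md = xs(7)  = 22
  ##   hlidx = fix ((7 + 1)/2)  = 4     hl = xs(4)  = 12
  ##   huidx = 14 + 1 - 4       = 11    hu = xs(11) = 42
  ##   h_spread = 42 - 12 = 30          step = fix (1.5*30) = 45
  ##   inner fences: 12 - 45 = -33  and  42 + 45 = 87
  ##   outer fences: 12 - 90 = -78  and  42 + 90 = 132
  ##
  ## No data value lies beyond either inner fence, so all four "out" and
  ## "far" counts are 0 for this input.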

  nx = numel (x);
  xs = sort (x);                  # Note that sort preserves -0
  mdidx = fix ((nx + 1)/2);       # median index
  hlidx = fix ((mdidx + 1)/2);    # lower hinge index
  huidx = fix (nx + 1 - hlidx);   # upper hinge index
  md = xs(mdidx);                 # median
  hl = xs(hlidx);                 # lower hinge
  hu = xs(huidx);                 # upper hinge
  h_spread = hu - hl;             # h_spread: difference between hinges
  step = fix (1.5*h_spread);      # step: 1.5 * h_spread
  i_fence_l = hl - step;          # inner fences: outside hinges + step
  o_fence_l = hl - 2*step;        # outer fences: outside hinges + 2*step
  i_fence_h = hu + step;
  o_fence_h = hu + 2*step;
  n_out_l = sum (x<i_fence_l) - sum (x<o_fence_l);
  n_out_h = sum (x>i_fence_h) - sum (x>o_fence_h);
  n_far_l = sum (x<o_fence_l);
  n_far_h = sum (x>o_fence_h);

  ## display table similar to that on pg. 33
  plot_out = sprintf (" Data: %s", caption);
  plot_out = [plot_out; sprintf(" ")];
  plot_out = [plot_out; sprintf(" Fenced Letter Display")];
  plot_out = [plot_out; sprintf(" ")];
  plot_out = [plot_out; sprintf(" #%3d|___________________", nx)];
  plot_out = [plot_out; sprintf(" M%3d| %5d |", mdidx, md)];
  plot_out = [plot_out; sprintf(" H%3d|%5d %5d| %d", hlidx, hl, hu, h_spread)];
  plot_out = [plot_out; sprintf(" 1 |%5d %5d|", xs(1), xs(nx))];
  plot_out = [plot_out; sprintf(" _______")];
  plot_out = [plot_out; sprintf(" ______|%5d|_______",step)];
  plot_out = [plot_out; sprintf(" f|%5d %5d|", i_fence_l, i_fence_h)];
  plot_out = [plot_out; sprintf(" |%5d %5d| out", n_out_l, n_out_h)];
  plot_out = [plot_out; sprintf(" F|%5d %5g|", o_fence_l, o_fence_h)];
  plot_out = [plot_out; sprintf(" |%5d %5d| far",n_far_l,n_far_h)];
  plot_out = [plot_out; " "];

  ## Determine stem values
  min_x = min (x);
  max_x = max (x);
  if (min_x > 0)      # all stems > 0
    stems = [fix(min(x)/stem_step) : (fix(max(x)/stem_step)+1)];
  elseif (max_x < 0)  # all stems < 0
    stems = [(fix(min_x/stem_step)-1) : fix(max_x/stem_step)];
  elseif (min_x < 0 && max_x > 0)  # range crosses 0
    stems = [(fix(min_x/stem_step)-1) : -0, 0 : fix(max_x/stem_step)+1 ];
  else   # one endpoint is a zero which may be +0 or -0
    if (min_x == 0)
      if (any (x == 0 & signbit (x)))
        min_x = -0;
      else
        min_x = +0;
      endif
    endif
    if (max_x == 0)
      if (any (x == 0 & ! signbit (x)))
        max_x = +0;
      else
        max_x = -0;
      endif
    endif
    stems = [];
    if (signbit (min_x))
      stems = [(fix(min_x/stem_step)-1) : -0];
    endif
    if (! signbit (max_x))
      stems = [stems, 0 : fix(max_x/stem_step)+1 ];
    endif
  endif

  ## Vectorized version provided by Rik Wehbring (rik@octave.org)
  ## Determine leaves for each stem:
  new_line = 1;
  for kx = 2 : numel (stems)

    stem_sign = signbit (stems(kx));
    if (stems(kx) <= 0)
      idx = ((x <= stems(kx)*stem_step) & (x > (stems(kx-1)*stem_step))
             & (signbit (x) == stem_sign));
      xlf = abs (x(idx) - stems(kx)*stem_step);
    else
      idx = ((x < stems(kx)*stem_step) & (x >= (stems(kx-1)*stem_step))
             & (signbit (x) == stem_sign));
      xlf = abs (x(idx) - stems(kx-1)*stem_step);
    endif
    ## Convert leaves to a string
    if (stem_step == 10)
      lf_str = sprintf ("%d", xlf);
    else
      lf_str = "";
      if (! isempty (xlf))
        lf_str = sprintf ("%d", xlf(1));
        if (numel (xlf) > 1)
          lf_str = [lf_str sprintf(",%d", xlf(2:end))];
        endif
      endif
    endif

    ## Set correct -0
    if (stems(kx) == 0 && signbit (stems(kx)))
      line = sprintf (" -0 | %s", lf_str);  # -0 stem.
    elseif (stems(kx) < 0)
      line = sprintf ("%4d | %s", stems(kx), lf_str);
    elseif (stems(kx) > 0)
      line = sprintf ("%4d | %s", stems(kx-1), lf_str);
    else
      line = "";
    endif

    if (! isempty (lf_str) || stems(kx) == 0 || stems(kx-1) == 0)
      plot_out = [plot_out; line];
      new_line = 1;
    else
      if (new_line == 1)
        plot_out = [plot_out; " :"];  # just print one : if no leaves
        new_line = 0;
      endif
    endif

  endfor    # kx = 2: numel (stems)

  if (nargout == 0)
    disp (plot_out);
  else
    plotstr = plot_out;
  endif

endfunction


%!demo
%! ## Unsorted plot:
%! x = [-22 12 -28 52 39 -2 12 10 11 11 42 38 44 18 44];
%! stemleaf (x, "Unsorted plot");

%!demo
%! ## Sorted leaves:
%! x = [-22 12 -28 52 39 -2 12 10 11 11 42 38 44 18 44];
%! y = sort (x);
%! stemleaf (y, "Sorted leaves");

%!demo
%! ## Sorted leaves (large dataset):
%! x = [-22 12 -28 52 39 -2 12 10 11 11 42 38 44 18 44 37 113 124 37 48 ...
%!      127 36 29 31 125 139 131 115 105 132 104 123 35 113 122 42 117 119 ...
%!      58 109 23 105 63 27 44 105 99 41 128 121 116 125 32 61 37 127 29 113 ...
%!      121 58 114 126 53 114 96 25 109 7 31 141 46 -13 71 43 117 116 27 7 ...
%!      68 40 31 115 124 42 128 52 71 118 117 38 27 106 33 117 116 111 40 ...
%!      119 47 105 57 122 109 124 115 43 120 43 27 27 18 28 48 125 107 114 ...
%!      34 133 45 120 30 127 31 116 146 21 23 30 10 20 21 30 0 100 110 1 20 ...
%!      0];
%! y = sort (x);
%! stemleaf (y, "Sorted leaves (large dataset)");

%!demo
%! ## Gaussian leaves:
%! x = fix (30 * randn (300,1));
%! stemleaf (x, "Gaussian leaves");

%!test
%! ## test minus to plus
%! x = [-22 12 -28 52 39 -2 12 10 11 11 42 38 44 18 44 37 113 124 37 48 127 ...
%!      36 29 31 125 139 131 115 105 132 104 123 35 113 122 42 117 119 58 109 ...
%!      23 105 63 27 44 105 99 41 128 121 116 125 32 61 37 127 29 113 121 58 ...
%!      114 126 53 114 96 25 109 7 31 141 46 -13 71 43 117 116 27 7 68 40 31 ...
%!      115 124 42 128 52 71 118 117 38 27 106 33 117 116 111 40 119 47 105 57 ...
%!      122 109 124 115 43 120 43 27 27 18 28 48 125 107 114 34 133 45 120 30 ...
%!      127 31 116 146 21 23 30 10 20 21 30 0 100 110 1 20 0];
%! x = sort (x);
%! rexp = char (
%!   " Data: test minus to plus",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " #138|___________________",
%!   " M 69| 52 |",
%!   " H 35| 30 116| 86",
%!   " 1 | -28 146|",
%!   " _______",
%!   " ______| 129|_______",
%!   " f| -99 245|",
%!   " | 0 0| out",
%!   " F| -228 374|",
%!   " | 0 0| far",
%!   " ",
%!   " -2 | 82",
%!   " -1 | 3",
%!   " -0 | 2",
%!   " 0 | 00177",
%!   " 1 | 00112288",
%!   " 2 | 001133577777899",
%!   " 3 | 000111123456777889",
%!   " 4 | 00122233344456788",
%!   " 5 | 223788",
%!   " 6 | 138",
%!   " 7 | 11",
%!   " : ",
%!   " 9 | 69",
%!   " 10 | 04555567999",
%!   " 11 | 0133344455566667777899",
%!   " 12 | 0011223444555677788",
%!   " 13 | 1239",
%!   " 14 | 16");
%! r = stemleaf (x, "test minus to plus", 0);
%! assert (r, rexp);

%!test
%! ## positive values above 0
%! x = [5 22 12 28 52 39 12 11 11 42 38 44 18 44];
%! rexp = char (
%!   " Data: positive values above 0",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 14|___________________",
%!   " M 7| 22 |",
%!   " H 4| 12 42| 30",
%!   " 1 | 5 52|",
%!   " _______",
%!   " ______| 45|_______",
%!   " f| -33 87|",
%!   " | 0 0| out",
%!   " F| -78 132|",
%!   " | 0 0| far",
%!   " ",
%!   " 0 | 5",
%!   " 1 | 22118",
%!   " 2 | 28",
%!   " 3 | 98",
%!   " 4 | 244",
%!   " 5 | 2");
%! r = stemleaf (x, "positive values above 0");
%! assert (r, rexp);

%!test
%! ## negative values below 0
%! x = [5 22 12 28 52 39 12 11 11 42 38 44 18 44];
%! x = -x;
%! rexp = char (
%!   " Data: negative values below 0",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 14|___________________",
%!   " M 7| -28 |",
%!   " H 4| -42 -12| 30",
%!   " 1 | -52 -5|",
%!   " _______",
%!   " ______| 45|_______",
%!   " f| -87 33|",
%!   " | 0 0| out",
%!   " F| -132 78|",
%!   " | 0 0| far",
%!   " ",
%!   " -5 | 2",
%!   " -4 | 244",
%!   " -3 | 98",
%!   " -2 | 28",
%!   " -1 | 22118",
%!   " -0 | 5");
%! r = stemleaf (x, "negative values below 0");
%! assert (r, rexp);

%!test
%! ## positive values from 0
%! x = [22 12 28 52 39 2 12 0 11 11 42 38 44 18 44];
%! rexp = char (
%!   " Data: positive values from 0",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 15|___________________",
%!   " M 8| 22 |",
%!   " H 4| 11 42| 31",
%!   " 1 | 0 52|",
%!   " _______",
%!   " ______| 46|_______",
%!   " f| -35 88|",
%!   " | 0 0| out",
%!   " F| -81 134|",
%!   " | 0 0| far",
%!   " ",
%!   " 0 | 20",
%!   " 1 | 22118",
%!   " 2 | 28",
%!   " 3 | 98",
%!   " 4 | 244",
%!   " 5 | 2");
%! r = stemleaf (x, "positive values from 0");
%! assert (r, rexp);

%!test
%! ## negative values from 0
%! x = [22 12 28 52 39 2 12 0 11 11 42 38 44 18 44];
%! x = -x;
%! rexp = char (
%!   " Data: negative values from 0",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 15|___________________",
%!   " M 8| -22 |",
%!   " H 4| -42 -11| 31",
%!   " 1 | -52 0|",
%!   " _______",
%!   " ______| 46|_______",
%!   " f| -88 35|",
%!   " | 0 0| out",
%!   " F| -134 81|",
%!   " | 0 0| far",
%!   " ",
%!   " -5 | 2",
%!   " -4 | 244",
%!   " -3 | 98",
%!   " -2 | 28",
%!   " -1 | 22118",
%!   " -0 | 20");
%! r = stemleaf (x, "negative values from 0");
%! assert (r, rexp);

%!test
%! ## both +0 and -0 present
%! x = [-9 -7 -0 0 -0];
%! rexp = char (
%!   " Data: both +0 and -0 present",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 5|___________________",
%!   " M 3| 0 |",
%!   " H 2| -7 0| 7",
%!   " 1 | -9 0|",
%!   " _______",
%!   " ______| 10|_______",
%!   " f| -17 10|",
%!   " | 0 0| out",
%!   " F| -27 20|",
%!   " | 0 0| far",
%!   " ",
%!   " -0 | 9700",
%!   " 0 | 0");
%! r = stemleaf (x, "both +0 and -0 present");
%! assert (r, rexp);

%!test
%! ## both <= 0 and -0 present
%! x = [-9 -7 0 -0];
%! rexp = char (
%!   " Data: both <= 0 and -0 present",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 4|___________________",
%!   " M 2| -7 |",
%!   " H 1| -9 0| 9",
%!   " 1 | -9 0|",
%!   " _______",
%!   " ______| 13|_______",
%!   " f| -22 13|",
%!   " | 0 0| out",
%!   " F| -35 26|",
%!   " | 0 0| far",
%!   " ",
%!   " -0 | 970",
%!   " 0 | 0");
%! r = stemleaf (x, "both <= 0 and -0 present");
%! assert (r, rexp);

%!test
%! ## Example from EDA: Chevrolet Prices pg. 30
%! x = [150 250 688 695 795 795 895 895 895 ...
%!      1099 1166 1333 1499 1693 1699 1775 1995];
%! rexp = char (
%!   " Data: Chevrolet Prices EDA pg.30",
%!   " ",
%!   " Fenced Letter Display",
%!   " ",
%!   " # 17|___________________",
%!   " M 9| 895 |",
%!   " H 5| 795 1499| 704",
%!   " 1 | 150 1995|",
%!   " _______",
%!   " ______| 1056|_______",
%!   " f| -261 2555|",
%!   " | 0 0| out",
%!   " F|-1317 3611|",
%!   " | 0 0| far",
%!   " ",
%!   " 1 | 50",
%!   " 2 | 50",
%!   " :",
%!   " 6 | 88,95",
%!   " 7 | 95,95",
%!   " 8 | 95,95,95",
%!   " :",
%!   " 10 | 99",
%!   " 11 | 66",
%!   " :",
%!   " 13 | 33",
%!   " 14 | 99",
%!   " :",
%!   " 16 | 93,99",
%!   " 17 | 75",
%!   " :",
%!   " 19 | 95");
%! r = stemleaf (x, "Chevrolet Prices EDA pg.30", 1);
%! assert (r, rexp);

## Test input validation
%!error stemleaf ()
%!error stemleaf (1, 2, 3, 4)
%!error <X must be a vector> stemleaf (ones (2,2), "")
%!warning <X truncated to integer values> tmp = stemleaf ([0 0.5 1],"");
%!error <X must be a numeric vector> stemleaf ("Hello World", "data")
%!error <CAPTION must be a character array> stemleaf (1, 2)
%!error <STEM_SZ must be a real integer> stemleaf (1, "", ones (2,2))
%!error <STEM_SZ must be a real integer> stemleaf (1, "", -1)
%!error <STEM_SZ must be a real integer> stemleaf (1, "", 1+i)
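
## The two blocks below are an illustrative sketch, not part of the original
## demos or test suite.  They rely only on the documented call forms; the
## names r and hdr_rows are introduced here purely for illustration.
%!demo
%! ## Two-digit leaves: stem_sz = 1 gives a stem width of 100, and
%! ## successive leaves on a stem are separated by commas.
%! x = [150 250 688 695 795 795 895 895 895 1099 1166 1333 1499];
%! stemleaf (x, "Two-digit leaves", 1);

%!test
%! ## With an output argument the plot is returned as a character array
%! ## holding one row per displayed line instead of being printed.
%! x = [5 22 12 28 52 39 12 11 11 42 38 44 18 44];
%! r = stemleaf (x, "captured output");
%! hdr_rows = 15;   # caption, Fenced Letter Display, and blank rows
%! assert (ischar (r));
%! assert (rows (r), hdr_rows + 6);   # stems 0 through 5, each with leaves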