1<!-- doc/src/sgml/seg.sgml --> 2 3<sect1 id="seg" xreflabel="seg"> 4 <title>seg</title> 5 6 <indexterm zone="seg"> 7 <primary>seg</primary> 8 </indexterm> 9 10 <para> 11 This module implements a data type <type>seg</type> for 12 representing line segments, or floating point intervals. 13 <type>seg</type> can represent uncertainty in the interval endpoints, 14 making it especially useful for representing laboratory measurements. 15 </para> 16 17 <sect2> 18 <title>Rationale</title> 19 20 <para> 21 The geometry of measurements is usually more complex than that of a 22 point in a numeric continuum. A measurement is usually a segment of 23 that continuum with somewhat fuzzy limits. The measurements come out 24 as intervals because of uncertainty and randomness, as well as because 25 the value being measured may naturally be an interval indicating some 26 condition, such as the temperature range of stability of a protein. 27 </para> 28 29 <para> 30 Using just common sense, it appears more convenient to store such data 31 as intervals, rather than pairs of numbers. In practice, it even turns 32 out more efficient in most applications. 33 </para> 34 35 <para> 36 Further along the line of common sense, the fuzziness of the limits 37 suggests that the use of traditional numeric data types leads to a 38 certain loss of information. Consider this: your instrument reads 39 6.50, and you input this reading into the database. What do you get 40 when you fetch it? Watch: 41 42<screen> 43test=> select 6.50 :: float8 as "pH"; 44 pH 45--- 466.5 47(1 row) 48</screen> 49 50 In the world of measurements, 6.50 is not the same as 6.5. It may 51 sometimes be critically different. The experimenters usually write 52 down (and publish) the digits they trust. 6.50 is actually a fuzzy 53 interval contained within a bigger and even fuzzier interval, 6.5, 54 with their center points being (probably) the only common feature they 55 share. We definitely do not want such different data items to appear the 56 same. 57 </para> 58 59 <para> 60 Conclusion? It is nice to have a special data type that can record the 61 limits of an interval with arbitrarily variable precision. Variable in 62 the sense that each data element records its own precision. 63 </para> 64 65 <para> 66 Check this out: 67 68<screen> 69test=> select '6.25 .. 6.50'::seg as "pH"; 70 pH 71------------ 726.25 .. 6.50 73(1 row) 74</screen> 75 </para> 76 </sect2> 77 78 <sect2> 79 <title>Syntax</title> 80 81 <para> 82 The external representation of an interval is formed using one or two 83 floating-point numbers joined by the range operator (<literal>..</literal> 84 or <literal>...</literal>). Alternatively, it can be specified as a 85 center point plus or minus a deviation. 86 Optional certainty indicators (<literal><</literal>, 87 <literal>></literal> or <literal>~</literal>) can be stored as well. 88 (Certainty indicators are ignored by all the built-in operators, however.) 89 <xref linkend="seg-repr-table"/> gives an overview of allowed 90 representations; <xref linkend="seg-input-examples"/> shows some 91 examples. 92 </para> 93 94 <para> 95 In <xref linkend="seg-repr-table"/>, <replaceable>x</replaceable>, <replaceable>y</replaceable>, and 96 <replaceable>delta</replaceable> denote 97 floating-point numbers. <replaceable>x</replaceable> and <replaceable>y</replaceable>, but 98 not <replaceable>delta</replaceable>, can be preceded by a certainty indicator. 99 </para> 100 101 <table id="seg-repr-table"> 102 <title><type>seg</type> External Representations</title> 103 <tgroup cols="2"> 104 <tbody> 105 <row> 106 <entry><literal><replaceable>x</replaceable></literal></entry> 107 <entry>Single value (zero-length interval) 108 </entry> 109 </row> 110 <row> 111 <entry><literal><replaceable>x</replaceable> .. <replaceable>y</replaceable></literal></entry> 112 <entry>Interval from <replaceable>x</replaceable> to <replaceable>y</replaceable> 113 </entry> 114 </row> 115 <row> 116 <entry><literal><replaceable>x</replaceable> (+-) <replaceable>delta</replaceable></literal></entry> 117 <entry>Interval from <replaceable>x</replaceable> - <replaceable>delta</replaceable> to 118 <replaceable>x</replaceable> + <replaceable>delta</replaceable> 119 </entry> 120 </row> 121 <row> 122 <entry><literal><replaceable>x</replaceable> ..</literal></entry> 123 <entry>Open interval with lower bound <replaceable>x</replaceable> 124 </entry> 125 </row> 126 <row> 127 <entry><literal>.. <replaceable>x</replaceable></literal></entry> 128 <entry>Open interval with upper bound <replaceable>x</replaceable> 129 </entry> 130 </row> 131 </tbody> 132 </tgroup> 133 </table> 134 135 <table id="seg-input-examples"> 136 <title>Examples of Valid <type>seg</type> Input</title> 137 <tgroup cols="2"> 138 <tbody> 139 <row> 140 <entry><literal>5.0</literal></entry> 141 <entry> 142 Creates a zero-length segment (a point, if you will) 143 </entry> 144 </row> 145 <row> 146 <entry><literal>~5.0</literal></entry> 147 <entry> 148 Creates a zero-length segment and records 149 <literal>~</literal> in the data. <literal>~</literal> is ignored 150 by <type>seg</type> operations, but 151 is preserved as a comment. 152 </entry> 153 </row> 154 <row> 155 <entry><literal><5.0</literal></entry> 156 <entry> 157 Creates a point at 5.0. <literal><</literal> is ignored but 158 is preserved as a comment. 159 </entry> 160 </row> 161 <row> 162 <entry><literal>>5.0</literal></entry> 163 <entry> 164 Creates a point at 5.0. <literal>></literal> is ignored but 165 is preserved as a comment. 166 </entry> 167 </row> 168 <row> 169 <entry><literal>5(+-)0.3</literal></entry> 170 <entry> 171 Creates an interval <literal>4.7 .. 5.3</literal>. 172 Note that the <literal>(+-)</literal> notation isn't preserved. 173 </entry> 174 </row> 175 <row> 176 <entry><literal>50 .. </literal></entry> 177 <entry>Everything that is greater than or equal to 50</entry> 178 </row> 179 <row> 180 <entry><literal>.. 0</literal></entry> 181 <entry>Everything that is less than or equal to 0</entry> 182 </row> 183 <row> 184 <entry><literal>1.5e-2 .. 2E-2 </literal></entry> 185 <entry>Creates an interval <literal>0.015 .. 0.02</literal></entry> 186 </row> 187 <row> 188 <entry><literal>1 ... 2</literal></entry> 189 <entry> 190 The same as <literal>1...2</literal>, or <literal>1 .. 2</literal>, 191 or <literal>1..2</literal> 192 (spaces around the range operator are ignored) 193 </entry> 194 </row> 195 </tbody> 196 </tgroup> 197 </table> 198 199 <para> 200 Because the <literal>...</literal> operator is widely used in data sources, it is allowed 201 as an alternative spelling of the <literal>..</literal> operator. Unfortunately, this 202 creates a parsing ambiguity: it is not clear whether the upper bound 203 in <literal>0...23</literal> is meant to be <literal>23</literal> or <literal>0.23</literal>. 204 This is resolved by requiring at least one digit before the decimal 205 point in all numbers in <type>seg</type> input. 206 </para> 207 208 <para> 209 As a sanity check, <type>seg</type> rejects intervals with the lower bound 210 greater than the upper, for example <literal>5 .. 2</literal>. 211 </para> 212 213 </sect2> 214 215 <sect2> 216 <title>Precision</title> 217 218 <para> 219 <type>seg</type> values are stored internally as pairs of 32-bit floating point 220 numbers. This means that numbers with more than 7 significant digits 221 will be truncated. 222 </para> 223 224 <para> 225 Numbers with 7 or fewer significant digits retain their 226 original precision. That is, if your query returns 0.00, you will be 227 sure that the trailing zeroes are not the artifacts of formatting: they 228 reflect the precision of the original data. The number of leading 229 zeroes does not affect precision: the value 0.0067 is considered to 230 have just 2 significant digits. 231 </para> 232 </sect2> 233 234 <sect2> 235 <title>Usage</title> 236 237 <para> 238 The <filename>seg</filename> module includes a GiST index operator class for 239 <type>seg</type> values. 240 The operators supported by the GiST operator class are shown in <xref linkend="seg-gist-operators"/>. 241 </para> 242 243 <table id="seg-gist-operators"> 244 <title>Seg GiST Operators</title> 245 <tgroup cols="2"> 246 <thead> 247 <row> 248 <entry>Operator</entry> 249 <entry>Description</entry> 250 </row> 251 </thead> 252 253 <tbody> 254 <row> 255 <entry><literal>[a, b] << [c, d]</literal></entry> 256 <entry>[a, b] is entirely to the left of [c, d]. That is, [a, 257 b] << [c, d] is true if b < c and false otherwise.</entry> 258 </row> 259 260 <row> 261 <entry><literal>[a, b] >> [c, d]</literal></entry> 262 <entry>[a, b] is entirely to the right of [c, d]. That is, [a, 263 b] >> [c, d] is true if a > d and false otherwise.</entry> 264 </row> 265 266 <row> 267 <entry><literal>[a, b] &< [c, d]</literal></entry> 268 <entry>Overlaps or is left of — This might be better read 269 as <quote>does not extend to right of</quote>. It is true when 270 b <= d.</entry> 271 </row> 272 273 <row> 274 <entry><literal>[a, b] &> [c, d]</literal></entry> 275 <entry>Overlaps or is right of — This might be better read 276 as <quote>does not extend to left of</quote>. It is true when 277 a >= c.</entry> 278 </row> 279 280 <row> 281 <entry><literal>[a, b] = [c, d]</literal></entry> 282 <entry>Same as — The segments [a, b] and [c, d] are 283 identical, that is, a = c and b = d.</entry> 284 </row> 285 286 <row> 287 <entry><literal>[a, b] && [c, d]</literal></entry> 288 <entry>The segments [a, b] and [c, d] overlap.</entry> 289 </row> 290 291 <row> 292 <entry><literal>[a, b] @> [c, d]</literal></entry> 293 <entry>The segment [a, b] contains the segment [c, d], that is, 294 a <= c and b >= d.</entry> 295 </row> 296 297 <row> 298 <entry><literal>[a, b] <@ [c, d]</literal></entry> 299 <entry>The segment [a, b] is contained in [c, d], that is, a 300 >= c and b <= d.</entry> 301 </row> 302 </tbody> 303 </tgroup> 304 </table> 305 306 <para> 307 (Before PostgreSQL 8.2, the containment operators <literal>@></literal> and <literal><@</literal> were 308 respectively called <literal>@</literal> and <literal>~</literal>. These names are still available, but are 309 deprecated and will eventually be retired. Notice that the old names 310 are reversed from the convention formerly followed by the core geometric 311 data types!) 312 </para> 313 314 <para> 315 The standard B-tree operators are also provided, for example 316 317 <informaltable> 318 <tgroup cols="2"> 319 <thead> 320 <row> 321 <entry>Operator</entry> 322 <entry>Description</entry> 323 </row> 324 </thead> 325 326 <tbody> 327 <row> 328 <entry><literal>[a, b] < [c, d]</literal></entry> 329 <entry>Less than</entry> 330 </row> 331 332 <row> 333 <entry><literal>[a, b] > [c, d]</literal></entry> 334 <entry>Greater than</entry> 335 </row> 336 </tbody> 337 </tgroup> 338 </informaltable> 339 340 These operators do not make a lot of sense for any practical 341 purpose but sorting. These operators first compare (a) to (c), 342 and if these are equal, compare (b) to (d). That results in 343 reasonably good sorting in most cases, which is useful if 344 you want to use ORDER BY with this type. 345 </para> 346 </sect2> 347 348 <sect2> 349 <title>Notes</title> 350 351 <para> 352 For examples of usage, see the regression test <filename>sql/seg.sql</filename>. 353 </para> 354 355 <para> 356 The mechanism that converts <literal>(+-)</literal> to regular ranges 357 isn't completely accurate in determining the number of significant digits 358 for the boundaries. For example, it adds an extra digit to the lower 359 boundary if the resulting interval includes a power of ten: 360 361<screen> 362postgres=> select '10(+-)1'::seg as seg; 363 seg 364--------- 3659.0 .. 11 -- should be: 9 .. 11 366</screen> 367 </para> 368 369 <para> 370 The performance of an R-tree index can largely depend on the initial 371 order of input values. It may be very helpful to sort the input table 372 on the <type>seg</type> column; see the script <filename>sort-segments.pl</filename> 373 for an example. 374 </para> 375 </sect2> 376 377 <sect2> 378 <title>Credits</title> 379 380 <para> 381 Original author: Gene Selkov, Jr. <email>selkovjr@mcs.anl.gov</email>, 382 Mathematics and Computer Science Division, Argonne National Laboratory. 383 </para> 384 385 <para> 386 My thanks are primarily to Prof. Joe Hellerstein 387 (<ulink url="https://dsf.berkeley.edu/jmh/"></ulink>) for elucidating the 388 gist of the GiST (<ulink url="http://gist.cs.berkeley.edu/"></ulink>). I am 389 also grateful to all Postgres developers, present and past, for enabling 390 myself to create my own world and live undisturbed in it. And I would like 391 to acknowledge my gratitude to Argonne Lab and to the U.S. Department of 392 Energy for the years of faithful support of my database research. 393 </para> 394 395 </sect2> 396 397</sect1> 398