1<!-- doc/src/sgml/xoper.sgml --> 2 3 <sect1 id="xoper"> 4 <title>User-Defined Operators</title> 5 6 <indexterm zone="xoper"> 7 <primary>operator</primary> 8 <secondary>user-defined</secondary> 9 </indexterm> 10 11 <para> 12 Every operator is <quote>syntactic sugar</quote> for a call to an 13 underlying function that does the real work; so you must 14 first create the underlying function before you can create 15 the operator. However, an operator is <emphasis>not merely</emphasis> 16 syntactic sugar, because it carries additional information 17 that helps the query planner optimize queries that use the 18 operator. The next section will be devoted to explaining 19 that additional information. 20 </para> 21 22 <para> 23 <productname>PostgreSQL</productname> supports left unary, right 24 unary, and binary operators. Operators can be 25 overloaded;<indexterm><primary>overloading</primary><secondary>operators</secondary></indexterm> 26 that is, the same operator name can be used for different operators 27 that have different numbers and types of operands. When a query is 28 executed, the system determines the operator to call from the 29 number and types of the provided operands. 30 </para> 31 32 <para> 33 Here is an example of creating an operator for adding two complex 34 numbers. We assume we've already created the definition of type 35 <type>complex</type> (see <xref linkend="xtypes"/>). First we need a 36 function that does the work, then we can define the operator: 37 38<programlisting> 39CREATE FUNCTION complex_add(complex, complex) 40 RETURNS complex 41 AS '<replaceable>filename</replaceable>', 'complex_add' 42 LANGUAGE C IMMUTABLE STRICT; 43 44CREATE OPERATOR + ( 45 leftarg = complex, 46 rightarg = complex, 47 function = complex_add, 48 commutator = + 49); 50</programlisting> 51 </para> 52 53 <para> 54 Now we could execute a query like this: 55 56<screen> 57SELECT (a + b) AS c FROM test_complex; 58 59 c 60----------------- 61 (5.2,6.05) 62 (133.42,144.95) 63</screen> 64 </para> 65 66 <para> 67 We've shown how to create a binary operator here. To create unary 68 operators, just omit one of <literal>leftarg</literal> (for left unary) or 69 <literal>rightarg</literal> (for right unary). The <literal>function</literal> 70 clause and the argument clauses are the only required items in 71 <command>CREATE OPERATOR</command>. The <literal>commutator</literal> 72 clause shown in the example is an optional hint to the query 73 optimizer. Further details about <literal>commutator</literal> and other 74 optimizer hints appear in the next section. 75 </para> 76 </sect1> 77 78 <sect1 id="xoper-optimization"> 79 <title>Operator Optimization Information</title> 80 81 <indexterm zone="xoper-optimization"> 82 <primary>optimization information</primary> 83 <secondary>for operators</secondary> 84 </indexterm> 85 86 <para> 87 A <productname>PostgreSQL</productname> operator definition can include 88 several optional clauses that tell the system useful things about how 89 the operator behaves. These clauses should be provided whenever 90 appropriate, because they can make for considerable speedups in execution 91 of queries that use the operator. But if you provide them, you must be 92 sure that they are right! Incorrect use of an optimization clause can 93 result in slow queries, subtly wrong output, or other Bad Things. 94 You can always leave out an optimization clause if you are not sure 95 about it; the only consequence is that queries might run slower than 96 they need to. 97 </para> 98 99 <para> 100 Additional optimization clauses might be added in future versions of 101 <productname>PostgreSQL</productname>. The ones described here are all 102 the ones that release &version; understands. 103 </para> 104 105 <para> 106 It is also possible to attach a planner support function to the function 107 that underlies an operator, providing another way of telling the system 108 about the behavior of the operator. 109 See <xref linkend="xfunc-optimization"/> for more information. 110 </para> 111 112 <sect2> 113 <title><literal>COMMUTATOR</literal></title> 114 115 <para> 116 The <literal>COMMUTATOR</literal> clause, if provided, names an operator that is the 117 commutator of the operator being defined. We say that operator A is the 118 commutator of operator B if (x A y) equals (y B x) for all possible input 119 values x, y. Notice that B is also the commutator of A. For example, 120 operators <literal><</literal> and <literal>></literal> for a particular data type are usually each others' 121 commutators, and operator <literal>+</literal> is usually commutative with itself. 122 But operator <literal>-</literal> is usually not commutative with anything. 123 </para> 124 125 <para> 126 The left operand type of a commutable operator is the same as the 127 right operand type of its commutator, and vice versa. So the name of 128 the commutator operator is all that <productname>PostgreSQL</productname> 129 needs to be given to look up the commutator, and that's all that needs to 130 be provided in the <literal>COMMUTATOR</literal> clause. 131 </para> 132 133 <para> 134 It's critical to provide commutator information for operators that 135 will be used in indexes and join clauses, because this allows the 136 query optimizer to <quote>flip around</quote> such a clause to the forms 137 needed for different plan types. For example, consider a query with 138 a WHERE clause like <literal>tab1.x = tab2.y</literal>, where <literal>tab1.x</literal> 139 and <literal>tab2.y</literal> are of a user-defined type, and suppose that 140 <literal>tab2.y</literal> is indexed. The optimizer cannot generate an 141 index scan unless it can determine how to flip the clause around to 142 <literal>tab2.y = tab1.x</literal>, because the index-scan machinery expects 143 to see the indexed column on the left of the operator it is given. 144 <productname>PostgreSQL</productname> will <emphasis>not</emphasis> simply 145 assume that this is a valid transformation — the creator of the 146 <literal>=</literal> operator must specify that it is valid, by marking the 147 operator with commutator information. 148 </para> 149 150 <para> 151 When you are defining a self-commutative operator, you just do it. 152 When you are defining a pair of commutative operators, things are 153 a little trickier: how can the first one to be defined refer to the 154 other one, which you haven't defined yet? There are two solutions 155 to this problem: 156 157 <itemizedlist> 158 <listitem> 159 <para> 160 One way is to omit the <literal>COMMUTATOR</literal> clause in the first operator that 161 you define, and then provide one in the second operator's definition. 162 Since <productname>PostgreSQL</productname> knows that commutative 163 operators come in pairs, when it sees the second definition it will 164 automatically go back and fill in the missing <literal>COMMUTATOR</literal> clause in 165 the first definition. 166 </para> 167 </listitem> 168 169 <listitem> 170 <para> 171 The other, more straightforward way is just to include <literal>COMMUTATOR</literal> clauses 172 in both definitions. When <productname>PostgreSQL</productname> processes 173 the first definition and realizes that <literal>COMMUTATOR</literal> refers to a nonexistent 174 operator, the system will make a dummy entry for that operator in the 175 system catalog. This dummy entry will have valid data only 176 for the operator name, left and right operand types, and result type, 177 since that's all that <productname>PostgreSQL</productname> can deduce 178 at this point. The first operator's catalog entry will link to this 179 dummy entry. Later, when you define the second operator, the system 180 updates the dummy entry with the additional information from the second 181 definition. If you try to use the dummy operator before it's been filled 182 in, you'll just get an error message. 183 </para> 184 </listitem> 185 </itemizedlist> 186 </para> 187 </sect2> 188 189 <sect2> 190 <title><literal>NEGATOR</literal></title> 191 192 <para> 193 The <literal>NEGATOR</literal> clause, if provided, names an operator that is the 194 negator of the operator being defined. We say that operator A 195 is the negator of operator B if both return Boolean results and 196 (x A y) equals NOT (x B y) for all possible inputs x, y. 197 Notice that B is also the negator of A. 198 For example, <literal><</literal> and <literal>>=</literal> are a negator pair for most data types. 199 An operator can never validly be its own negator. 200 </para> 201 202 <para> 203 Unlike commutators, a pair of unary operators could validly be marked 204 as each other's negators; that would mean (A x) equals NOT (B x) 205 for all x, or the equivalent for right unary operators. 206 </para> 207 208 <para> 209 An operator's negator must have the same left and/or right operand types 210 as the operator to be defined, so just as with <literal>COMMUTATOR</literal>, only the operator 211 name need be given in the <literal>NEGATOR</literal> clause. 212 </para> 213 214 <para> 215 Providing a negator is very helpful to the query optimizer since 216 it allows expressions like <literal>NOT (x = y)</literal> to be simplified into 217 <literal>x <> y</literal>. This comes up more often than you might think, because 218 <literal>NOT</literal> operations can be inserted as a consequence of other rearrangements. 219 </para> 220 221 <para> 222 Pairs of negator operators can be defined using the same methods 223 explained above for commutator pairs. 224 </para> 225 226 </sect2> 227 228 <sect2> 229 <title><literal>RESTRICT</literal></title> 230 231 <para> 232 The <literal>RESTRICT</literal> clause, if provided, names a restriction selectivity 233 estimation function for the operator. (Note that this is a function 234 name, not an operator name.) <literal>RESTRICT</literal> clauses only make sense for 235 binary operators that return <type>boolean</type>. The idea behind a restriction 236 selectivity estimator is to guess what fraction of the rows in a 237 table will satisfy a <literal>WHERE</literal>-clause condition of the form: 238<programlisting> 239column OP constant 240</programlisting> 241 for the current operator and a particular constant value. 242 This assists the optimizer by 243 giving it some idea of how many rows will be eliminated by <literal>WHERE</literal> 244 clauses that have this form. (What happens if the constant is on 245 the left, you might be wondering? Well, that's one of the things that 246 <literal>COMMUTATOR</literal> is for...) 247 </para> 248 249 <para> 250 Writing new restriction selectivity estimation functions is far beyond 251 the scope of this chapter, but fortunately you can usually just use 252 one of the system's standard estimators for many of your own operators. 253 These are the standard restriction estimators: 254 <simplelist> 255 <member><function>eqsel</function> for <literal>=</literal></member> 256 <member><function>neqsel</function> for <literal><></literal></member> 257 <member><function>scalarltsel</function> for <literal><</literal></member> 258 <member><function>scalarlesel</function> for <literal><=</literal></member> 259 <member><function>scalargtsel</function> for <literal>></literal></member> 260 <member><function>scalargesel</function> for <literal>>=</literal></member> 261 </simplelist> 262 </para> 263 264 <para> 265 You can frequently get away with using either <function>eqsel</function> or <function>neqsel</function> for 266 operators that have very high or very low selectivity, even if they 267 aren't really equality or inequality. For example, the 268 approximate-equality geometric operators use <function>eqsel</function> on the assumption that 269 they'll usually only match a small fraction of the entries in a table. 270 </para> 271 272 <para> 273 You can use <function>scalarltsel</function>, <function>scalarlesel</function>, 274 <function>scalargtsel</function> and <function>scalargesel</function> for comparisons on 275 data types that have some sensible means of being converted into numeric 276 scalars for range comparisons. If possible, add the data type to those 277 understood by the function <function>convert_to_scalar()</function> in 278 <filename>src/backend/utils/adt/selfuncs.c</filename>. 279 (Eventually, this function should be replaced by per-data-type functions 280 identified through a column of the <classname>pg_type</classname> system catalog; but that hasn't happened 281 yet.) If you do not do this, things will still work, but the optimizer's 282 estimates won't be as good as they could be. 283 </para> 284 285 <para> 286 Another useful built-in selectivity estimation function 287 is <function>matchingsel</function>, which will work for almost any 288 binary operator, if standard MCV and/or histogram statistics are 289 collected for the input data type(s). Its default estimate is set to 290 twice the default estimate used in <function>eqsel</function>, making 291 it most suitable for comparison operators that are somewhat less 292 strict than equality. (Or you could call the 293 underlying <function>generic_restriction_selectivity</function> 294 function, providing a different default estimate.) 295 </para> 296 297 <para> 298 There are additional selectivity estimation functions designed for geometric 299 operators in <filename>src/backend/utils/adt/geo_selfuncs.c</filename>: <function>areasel</function>, <function>positionsel</function>, 300 and <function>contsel</function>. At this writing these are just stubs, but you might want 301 to use them (or even better, improve them) anyway. 302 </para> 303 </sect2> 304 305 <sect2> 306 <title><literal>JOIN</literal></title> 307 308 <para> 309 The <literal>JOIN</literal> clause, if provided, names a join selectivity 310 estimation function for the operator. (Note that this is a function 311 name, not an operator name.) <literal>JOIN</literal> clauses only make sense for 312 binary operators that return <type>boolean</type>. The idea behind a join 313 selectivity estimator is to guess what fraction of the rows in a 314 pair of tables will satisfy a <literal>WHERE</literal>-clause condition of the form: 315<programlisting> 316table1.column1 OP table2.column2 317</programlisting> 318 for the current operator. As with the <literal>RESTRICT</literal> clause, this helps 319 the optimizer very substantially by letting it figure out which 320 of several possible join sequences is likely to take the least work. 321 </para> 322 323 <para> 324 As before, this chapter will make no attempt to explain how to write 325 a join selectivity estimator function, but will just suggest that 326 you use one of the standard estimators if one is applicable: 327 <simplelist> 328 <member><function>eqjoinsel</function> for <literal>=</literal></member> 329 <member><function>neqjoinsel</function> for <literal><></literal></member> 330 <member><function>scalarltjoinsel</function> for <literal><</literal></member> 331 <member><function>scalarlejoinsel</function> for <literal><=</literal></member> 332 <member><function>scalargtjoinsel</function> for <literal>></literal></member> 333 <member><function>scalargejoinsel</function> for <literal>>=</literal></member> 334 <member><function>matchingjoinsel</function> for generic matching operators</member> 335 <member><function>areajoinsel</function> for 2D area-based comparisons</member> 336 <member><function>positionjoinsel</function> for 2D position-based comparisons</member> 337 <member><function>contjoinsel</function> for 2D containment-based comparisons</member> 338 </simplelist> 339 </para> 340 </sect2> 341 342 <sect2> 343 <title><literal>HASHES</literal></title> 344 345 <para> 346 The <literal>HASHES</literal> clause, if present, tells the system that 347 it is permissible to use the hash join method for a join based on this 348 operator. <literal>HASHES</literal> only makes sense for a binary operator that 349 returns <literal>boolean</literal>, and in practice the operator must represent 350 equality for some data type or pair of data types. 351 </para> 352 353 <para> 354 The assumption underlying hash join is that the join operator can 355 only return true for pairs of left and right values that hash to the 356 same hash code. If two values get put in different hash buckets, the 357 join will never compare them at all, implicitly assuming that the 358 result of the join operator must be false. So it never makes sense 359 to specify <literal>HASHES</literal> for operators that do not represent 360 some form of equality. In most cases it is only practical to support 361 hashing for operators that take the same data type on both sides. 362 However, sometimes it is possible to design compatible hash functions 363 for two or more data types; that is, functions that will generate the 364 same hash codes for <quote>equal</quote> values, even though the values 365 have different representations. For example, it's fairly simple 366 to arrange this property when hashing integers of different widths. 367 </para> 368 369 <para> 370 To be marked <literal>HASHES</literal>, the join operator must appear 371 in a hash index operator family. This is not enforced when you create 372 the operator, since of course the referencing operator family couldn't 373 exist yet. But attempts to use the operator in hash joins will fail 374 at run time if no such operator family exists. The system needs the 375 operator family to find the data-type-specific hash function(s) for the 376 operator's input data type(s). Of course, you must also create suitable 377 hash functions before you can create the operator family. 378 </para> 379 380 <para> 381 Care should be exercised when preparing a hash function, because there 382 are machine-dependent ways in which it might fail to do the right thing. 383 For example, if your data type is a structure in which there might be 384 uninteresting pad bits, you cannot simply pass the whole structure to 385 <function>hash_any</function>. (Unless you write your other operators and 386 functions to ensure that the unused bits are always zero, which is the 387 recommended strategy.) 388 Another example is that on machines that meet the <acronym>IEEE</acronym> 389 floating-point standard, negative zero and positive zero are different 390 values (different bit patterns) but they are defined to compare equal. 391 If a float value might contain negative zero then extra steps are needed 392 to ensure it generates the same hash value as positive zero. 393 </para> 394 395 <para> 396 A hash-joinable operator must have a commutator (itself if the two 397 operand data types are the same, or a related equality operator 398 if they are different) that appears in the same operator family. 399 If this is not the case, planner errors might occur when the operator 400 is used. Also, it is a good idea (but not strictly required) for 401 a hash operator family that supports multiple data types to provide 402 equality operators for every combination of the data types; this 403 allows better optimization. 404 </para> 405 406 <note> 407 <para> 408 The function underlying a hash-joinable operator must be marked 409 immutable or stable. If it is volatile, the system will never 410 attempt to use the operator for a hash join. 411 </para> 412 </note> 413 414 <note> 415 <para> 416 If a hash-joinable operator has an underlying function that is marked 417 strict, the 418 function must also be complete: that is, it should return true or 419 false, never null, for any two nonnull inputs. If this rule is 420 not followed, hash-optimization of <literal>IN</literal> operations might 421 generate wrong results. (Specifically, <literal>IN</literal> might return 422 false where the correct answer according to the standard would be null; 423 or it might yield an error complaining that it wasn't prepared for a 424 null result.) 425 </para> 426 </note> 427 428 </sect2> 429 430 <sect2> 431 <title><literal>MERGES</literal></title> 432 433 <para> 434 The <literal>MERGES</literal> clause, if present, tells the system that 435 it is permissible to use the merge-join method for a join based on this 436 operator. <literal>MERGES</literal> only makes sense for a binary operator that 437 returns <literal>boolean</literal>, and in practice the operator must represent 438 equality for some data type or pair of data types. 439 </para> 440 441 <para> 442 Merge join is based on the idea of sorting the left- and right-hand tables 443 into order and then scanning them in parallel. So, both data types must 444 be capable of being fully ordered, and the join operator must be one 445 that can only succeed for pairs of values that fall at the 446 <quote>same place</quote> 447 in the sort order. In practice this means that the join operator must 448 behave like equality. But it is possible to merge-join two 449 distinct data types so long as they are logically compatible. For 450 example, the <type>smallint</type>-versus-<type>integer</type> 451 equality operator is merge-joinable. 452 We only need sorting operators that will bring both data types into a 453 logically compatible sequence. 454 </para> 455 456 <para> 457 To be marked <literal>MERGES</literal>, the join operator must appear 458 as an equality member of a <literal>btree</literal> index operator family. 459 This is not enforced when you create 460 the operator, since of course the referencing operator family couldn't 461 exist yet. But the operator will not actually be used for merge joins 462 unless a matching operator family can be found. The 463 <literal>MERGES</literal> flag thus acts as a hint to the planner that 464 it's worth looking for a matching operator family. 465 </para> 466 467 <para> 468 A merge-joinable operator must have a commutator (itself if the two 469 operand data types are the same, or a related equality operator 470 if they are different) that appears in the same operator family. 471 If this is not the case, planner errors might occur when the operator 472 is used. Also, it is a good idea (but not strictly required) for 473 a <literal>btree</literal> operator family that supports multiple data types to provide 474 equality operators for every combination of the data types; this 475 allows better optimization. 476 </para> 477 478 <note> 479 <para> 480 The function underlying a merge-joinable operator must be marked 481 immutable or stable. If it is volatile, the system will never 482 attempt to use the operator for a merge join. 483 </para> 484 </note> 485 </sect2> 486 </sect1> 487