1<!-- doc/src/sgml/tablesample-method.sgml --> 2 3<chapter id="tablesample-method"> 4 <title>Writing a Table Sampling Method</title> 5 6 <indexterm zone="tablesample-method"> 7 <primary>table sampling method</primary> 8 </indexterm> 9 10 <indexterm zone="tablesample-method"> 11 <primary><literal>TABLESAMPLE</literal> method</primary> 12 </indexterm> 13 14 <para> 15 <productname>PostgreSQL</productname>'s implementation of the <literal>TABLESAMPLE</literal> 16 clause supports custom table sampling methods, in addition to 17 the <literal>BERNOULLI</literal> and <literal>SYSTEM</literal> methods that are required 18 by the SQL standard. The sampling method determines which rows of the 19 table will be selected when the <literal>TABLESAMPLE</literal> clause is used. 20 </para> 21 22 <para> 23 At the SQL level, a table sampling method is represented by a single SQL 24 function, typically implemented in C, having the signature 25<programlisting> 26method_name(internal) RETURNS tsm_handler 27</programlisting> 28 The name of the function is the same method name appearing in the 29 <literal>TABLESAMPLE</literal> clause. The <type>internal</type> argument is a dummy 30 (always having value zero) that simply serves to prevent this function from 31 being called directly from a SQL command. 32 The result of the function must be a palloc'd struct of 33 type <type>TsmRoutine</type>, which contains pointers to support functions for 34 the sampling method. These support functions are plain C functions and 35 are not visible or callable at the SQL level. The support functions are 36 described in <xref linkend="tablesample-support-functions"/>. 37 </para> 38 39 <para> 40 In addition to function pointers, the <type>TsmRoutine</type> struct must 41 provide these additional fields: 42 </para> 43 44 <variablelist> 45 <varlistentry> 46 <term><literal>List *parameterTypes</literal></term> 47 <listitem> 48 <para> 49 This is an OID list containing the data type OIDs of the parameter(s) 50 that will be accepted by the <literal>TABLESAMPLE</literal> clause when this 51 sampling method is used. For example, for the built-in methods, this 52 list contains a single item with value <literal>FLOAT4OID</literal>, which 53 represents the sampling percentage. Custom sampling methods can have 54 more or different parameters. 55 </para> 56 </listitem> 57 </varlistentry> 58 59 <varlistentry> 60 <term><literal>bool repeatable_across_queries</literal></term> 61 <listitem> 62 <para> 63 If <literal>true</literal>, the sampling method can deliver identical samples 64 across successive queries, if the same parameters 65 and <literal>REPEATABLE</literal> seed value are supplied each time and the 66 table contents have not changed. When this is <literal>false</literal>, 67 the <literal>REPEATABLE</literal> clause is not accepted for use with the 68 sampling method. 69 </para> 70 </listitem> 71 </varlistentry> 72 73 <varlistentry> 74 <term><literal>bool repeatable_across_scans</literal></term> 75 <listitem> 76 <para> 77 If <literal>true</literal>, the sampling method can deliver identical samples 78 across successive scans in the same query (assuming unchanging 79 parameters, seed value, and snapshot). 80 When this is <literal>false</literal>, the planner will not select plans that 81 would require scanning the sampled table more than once, since that 82 might result in inconsistent query output. 83 </para> 84 </listitem> 85 </varlistentry> 86 </variablelist> 87 88 <para> 89 The <type>TsmRoutine</type> struct type is declared 90 in <filename>src/include/access/tsmapi.h</filename>, which see for additional 91 details. 92 </para> 93 94 <para> 95 The table sampling methods included in the standard distribution are good 96 references when trying to write your own. Look into 97 the <filename>src/backend/access/tablesample</filename> subdirectory of the source 98 tree for the built-in sampling methods, and into the <filename>contrib</filename> 99 subdirectory for add-on methods. 100 </para> 101 102 <sect1 id="tablesample-support-functions"> 103 <title>Sampling Method Support Functions</title> 104 105 <para> 106 The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct 107 containing pointers to the support functions described below. Most of 108 the functions are required, but some are optional, and those pointers can 109 be NULL. 110 </para> 111 112 <para> 113<programlisting> 114void 115SampleScanGetSampleSize (PlannerInfo *root, 116 RelOptInfo *baserel, 117 List *paramexprs, 118 BlockNumber *pages, 119 double *tuples); 120</programlisting> 121 122 This function is called during planning. It must estimate the number of 123 relation pages that will be read during a sample scan, and the number of 124 tuples that will be selected by the scan. (For example, these might be 125 determined by estimating the sampling fraction, and then multiplying 126 the <literal>baserel->pages</literal> and <literal>baserel->tuples</literal> 127 numbers by that, being sure to round the results to integral values.) 128 The <literal>paramexprs</literal> list holds the expression(s) that are 129 parameters to the <literal>TABLESAMPLE</literal> clause. It is recommended to 130 use <function>estimate_expression_value()</function> to try to reduce these 131 expressions to constants, if their values are needed for estimation 132 purposes; but the function must provide size estimates even if they cannot 133 be reduced, and it should not fail even if the values appear invalid 134 (remember that they're only estimates of what the run-time values will be). 135 The <literal>pages</literal> and <literal>tuples</literal> parameters are outputs. 136 </para> 137 138 <para> 139<programlisting> 140void 141InitSampleScan (SampleScanState *node, 142 int eflags); 143</programlisting> 144 145 Initialize for execution of a SampleScan plan node. 146 This is called during executor startup. 147 It should perform any initialization needed before processing can start. 148 The <structname>SampleScanState</structname> node has already been created, but 149 its <structfield>tsm_state</structfield> field is NULL. 150 The <function>InitSampleScan</function> function can palloc whatever internal 151 state data is needed by the sampling method, and store a pointer to 152 it in <literal>node->tsm_state</literal>. 153 Information about the table to scan is accessible through other fields 154 of the <structname>SampleScanState</structname> node (but note that the 155 <literal>node->ss.ss_currentScanDesc</literal> scan descriptor is not set 156 up yet). 157 <literal>eflags</literal> contains flag bits describing the executor's 158 operating mode for this plan node. 159 </para> 160 161 <para> 162 When <literal>(eflags & EXEC_FLAG_EXPLAIN_ONLY)</literal> is true, 163 the scan will not actually be performed, so this function should only do 164 the minimum required to make the node state valid for <command>EXPLAIN</command> 165 and <function>EndSampleScan</function>. 166 </para> 167 168 <para> 169 This function can be omitted (set the pointer to NULL), in which case 170 <function>BeginSampleScan</function> must perform all initialization needed 171 by the sampling method. 172 </para> 173 174 <para> 175<programlisting> 176void 177BeginSampleScan (SampleScanState *node, 178 Datum *params, 179 int nparams, 180 uint32 seed); 181</programlisting> 182 183 Begin execution of a sampling scan. 184 This is called just before the first attempt to fetch a tuple, and 185 may be called again if the scan needs to be restarted. 186 Information about the table to scan is accessible through fields 187 of the <structname>SampleScanState</structname> node (but note that the 188 <literal>node->ss.ss_currentScanDesc</literal> scan descriptor is not set 189 up yet). 190 The <literal>params</literal> array, of length <literal>nparams</literal>, contains the 191 values of the parameters supplied in the <literal>TABLESAMPLE</literal> clause. 192 These will have the number and types specified in the sampling 193 method's <literal>parameterTypes</literal> list, and have been checked 194 to not be null. 195 <literal>seed</literal> contains a seed to use for any random numbers generated 196 within the sampling method; it is either a hash derived from the 197 <literal>REPEATABLE</literal> value if one was given, or the result 198 of <literal>random()</literal> if not. 199 </para> 200 201 <para> 202 This function may adjust the fields <literal>node->use_bulkread</literal> 203 and <literal>node->use_pagemode</literal>. 204 If <literal>node->use_bulkread</literal> is <literal>true</literal>, which it is by 205 default, the scan will use a buffer access strategy that encourages 206 recycling buffers after use. It might be reasonable to set this 207 to <literal>false</literal> if the scan will visit only a small fraction of the 208 table's pages. 209 If <literal>node->use_pagemode</literal> is <literal>true</literal>, which it is by 210 default, the scan will perform visibility checking in a single pass for 211 all tuples on each visited page. It might be reasonable to set this 212 to <literal>false</literal> if the scan will select only a small fraction of the 213 tuples on each visited page. That will result in fewer tuple visibility 214 checks being performed, though each one will be more expensive because it 215 will require more locking. 216 </para> 217 218 <para> 219 If the sampling method is 220 marked <literal>repeatable_across_scans</literal>, it must be able to 221 select the same set of tuples during a rescan as it did originally, that is 222 a fresh call of <function>BeginSampleScan</function> must lead to selecting the 223 same tuples as before (if the <literal>TABLESAMPLE</literal> parameters 224 and seed don't change). 225 </para> 226 227 <para> 228<programlisting> 229BlockNumber 230NextSampleBlock (SampleScanState *node, BlockNumber nblocks); 231</programlisting> 232 233 Returns the block number of the next page to be scanned, or 234 <literal>InvalidBlockNumber</literal> if no pages remain to be scanned. 235 </para> 236 237 <para> 238 This function can be omitted (set the pointer to NULL), in which case 239 the core code will perform a sequential scan of the entire relation. 240 Such a scan can use synchronized scanning, so that the sampling method 241 cannot assume that the relation pages are visited in the same order on 242 each scan. 243 </para> 244 245 <para> 246<programlisting> 247OffsetNumber 248NextSampleTuple (SampleScanState *node, 249 BlockNumber blockno, 250 OffsetNumber maxoffset); 251</programlisting> 252 253 Returns the offset number of the next tuple to be sampled on the 254 specified page, or <literal>InvalidOffsetNumber</literal> if no tuples remain to 255 be sampled. <literal>maxoffset</literal> is the largest offset number in use 256 on the page. 257 </para> 258 259 <note> 260 <para> 261 <function>NextSampleTuple</function> is not explicitly told which of the offset 262 numbers in the range <literal>1 .. maxoffset</literal> actually contain valid 263 tuples. This is not normally a problem since the core code ignores 264 requests to sample missing or invisible tuples; that should not result in 265 any bias in the sample. However, if necessary, the function can use 266 <literal>node->donetuples</literal> to examine how many of the tuples 267 it returned were valid and visible. 268 </para> 269 </note> 270 271 <note> 272 <para> 273 <function>NextSampleTuple</function> must <emphasis>not</emphasis> assume 274 that <literal>blockno</literal> is the same page number returned by the most 275 recent <function>NextSampleBlock</function> call. It was returned by some 276 previous <function>NextSampleBlock</function> call, but the core code is allowed 277 to call <function>NextSampleBlock</function> in advance of actually scanning 278 pages, so as to support prefetching. It is OK to assume that once 279 sampling of a given page begins, successive <function>NextSampleTuple</function> 280 calls all refer to the same page until <literal>InvalidOffsetNumber</literal> is 281 returned. 282 </para> 283 </note> 284 285 <para> 286<programlisting> 287void 288EndSampleScan (SampleScanState *node); 289</programlisting> 290 291 End the scan and release resources. It is normally not important 292 to release palloc'd memory, but any externally-visible resources 293 should be cleaned up. 294 This function can be omitted (set the pointer to NULL) in the common 295 case where no such resources exist. 296 </para> 297 298 </sect1> 299 300</chapter> 301