1<!-- doc/src/sgml/tablesample-method.sgml -->
2
3<chapter id="tablesample-method">
4 <title>Writing a Table Sampling Method</title>
5
6 <indexterm zone="tablesample-method">
7  <primary>table sampling method</primary>
8 </indexterm>
9
10 <indexterm zone="tablesample-method">
11  <primary><literal>TABLESAMPLE</literal> method</primary>
12 </indexterm>
13
14 <para>
15  <productname>PostgreSQL</productname>'s implementation of the <literal>TABLESAMPLE</literal>
16  clause supports custom table sampling methods, in addition to
17  the <literal>BERNOULLI</literal> and <literal>SYSTEM</literal> methods that are required
18  by the SQL standard.  The sampling method determines which rows of the
19  table will be selected when the <literal>TABLESAMPLE</literal> clause is used.
20 </para>
21
22 <para>
23  At the SQL level, a table sampling method is represented by a single SQL
24  function, typically implemented in C, having the signature
25<programlisting>
26method_name(internal) RETURNS tsm_handler
27</programlisting>
28  The name of the function is the same method name appearing in the
29  <literal>TABLESAMPLE</literal> clause.  The <type>internal</type> argument is a dummy
30  (always having value zero) that simply serves to prevent this function from
31  being called directly from a SQL command.
32  The result of the function must be a palloc'd struct of
33  type <type>TsmRoutine</type>, which contains pointers to support functions for
34  the sampling method.  These support functions are plain C functions and
35  are not visible or callable at the SQL level.  The support functions are
36  described in <xref linkend="tablesample-support-functions"/>.
37 </para>
38
39 <para>
40  In addition to function pointers, the <type>TsmRoutine</type> struct must
41  provide these additional fields:
42 </para>
43
44 <variablelist>
45  <varlistentry>
46   <term><literal>List *parameterTypes</literal></term>
47   <listitem>
48    <para>
49     This is an OID list containing the data type OIDs of the parameter(s)
50     that will be accepted by the <literal>TABLESAMPLE</literal> clause when this
51     sampling method is used.  For example, for the built-in methods, this
52     list contains a single item with value <literal>FLOAT4OID</literal>, which
53     represents the sampling percentage.  Custom sampling methods can have
54     more or different parameters.
55    </para>
56   </listitem>
57  </varlistentry>
58
59  <varlistentry>
60   <term><literal>bool repeatable_across_queries</literal></term>
61   <listitem>
62    <para>
63     If <literal>true</literal>, the sampling method can deliver identical samples
64     across successive queries, if the same parameters
65     and <literal>REPEATABLE</literal> seed value are supplied each time and the
66     table contents have not changed.  When this is <literal>false</literal>,
67     the <literal>REPEATABLE</literal> clause is not accepted for use with the
68     sampling method.
69    </para>
70   </listitem>
71  </varlistentry>
72
73  <varlistentry>
74   <term><literal>bool repeatable_across_scans</literal></term>
75   <listitem>
76    <para>
77     If <literal>true</literal>, the sampling method can deliver identical samples
78     across successive scans in the same query (assuming unchanging
79     parameters, seed value, and snapshot).
80     When this is <literal>false</literal>, the planner will not select plans that
81     would require scanning the sampled table more than once, since that
82     might result in inconsistent query output.
83    </para>
84   </listitem>
85  </varlistentry>
86 </variablelist>
87
88 <para>
89  The <type>TsmRoutine</type> struct type is declared
90  in <filename>src/include/access/tsmapi.h</filename>, which see for additional
91  details.
92 </para>
93
94 <para>
95  The table sampling methods included in the standard distribution are good
96  references when trying to write your own.  Look into
97  the <filename>src/backend/access/tablesample</filename> subdirectory of the source
98  tree for the built-in sampling methods, and into the <filename>contrib</filename>
99  subdirectory for add-on methods.
100 </para>
101
102 <sect1 id="tablesample-support-functions">
103  <title>Sampling Method Support Functions</title>
104
105  <para>
106   The TSM handler function returns a palloc'd <type>TsmRoutine</type> struct
107   containing pointers to the support functions described below.  Most of
108   the functions are required, but some are optional, and those pointers can
109   be NULL.
110  </para>
111
112  <para>
113<programlisting>
114void
115SampleScanGetSampleSize (PlannerInfo *root,
116                         RelOptInfo *baserel,
117                         List *paramexprs,
118                         BlockNumber *pages,
119                         double *tuples);
120</programlisting>
121
122   This function is called during planning.  It must estimate the number of
123   relation pages that will be read during a sample scan, and the number of
124   tuples that will be selected by the scan.  (For example, these might be
125   determined by estimating the sampling fraction, and then multiplying
126   the <literal>baserel-&gt;pages</literal> and <literal>baserel-&gt;tuples</literal>
127   numbers by that, being sure to round the results to integral values.)
128   The <literal>paramexprs</literal> list holds the expression(s) that are
129   parameters to the <literal>TABLESAMPLE</literal> clause.  It is recommended to
130   use <function>estimate_expression_value()</function> to try to reduce these
131   expressions to constants, if their values are needed for estimation
132   purposes; but the function must provide size estimates even if they cannot
133   be reduced, and it should not fail even if the values appear invalid
134   (remember that they're only estimates of what the run-time values will be).
135   The <literal>pages</literal> and <literal>tuples</literal> parameters are outputs.
136  </para>
137
138  <para>
139<programlisting>
140void
141InitSampleScan (SampleScanState *node,
142                int eflags);
143</programlisting>
144
145   Initialize for execution of a SampleScan plan node.
146   This is called during executor startup.
147   It should perform any initialization needed before processing can start.
148   The <structname>SampleScanState</structname> node has already been created, but
149   its <structfield>tsm_state</structfield> field is NULL.
150   The <function>InitSampleScan</function> function can palloc whatever internal
151   state data is needed by the sampling method, and store a pointer to
152   it in <literal>node-&gt;tsm_state</literal>.
153   Information about the table to scan is accessible through other fields
154   of the <structname>SampleScanState</structname> node (but note that the
155   <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
156   up yet).
157   <literal>eflags</literal> contains flag bits describing the executor's
158   operating mode for this plan node.
159  </para>
160
161  <para>
162   When <literal>(eflags &amp; EXEC_FLAG_EXPLAIN_ONLY)</literal> is true,
163   the scan will not actually be performed, so this function should only do
164   the minimum required to make the node state valid for <command>EXPLAIN</command>
165   and <function>EndSampleScan</function>.
166  </para>
167
168  <para>
169   This function can be omitted (set the pointer to NULL), in which case
170   <function>BeginSampleScan</function> must perform all initialization needed
171   by the sampling method.
172  </para>
173
174  <para>
175<programlisting>
176void
177BeginSampleScan (SampleScanState *node,
178                 Datum *params,
179                 int nparams,
180                 uint32 seed);
181</programlisting>
182
183   Begin execution of a sampling scan.
184   This is called just before the first attempt to fetch a tuple, and
185   may be called again if the scan needs to be restarted.
186   Information about the table to scan is accessible through fields
187   of the <structname>SampleScanState</structname> node (but note that the
188   <literal>node-&gt;ss.ss_currentScanDesc</literal> scan descriptor is not set
189   up yet).
190   The <literal>params</literal> array, of length <literal>nparams</literal>, contains the
191   values of the parameters supplied in the <literal>TABLESAMPLE</literal> clause.
192   These will have the number and types specified in the sampling
193   method's <literal>parameterTypes</literal> list, and have been checked
194   to not be null.
195   <literal>seed</literal> contains a seed to use for any random numbers generated
196   within the sampling method; it is either a hash derived from the
197   <literal>REPEATABLE</literal> value if one was given, or the result
198   of <literal>random()</literal> if not.
199  </para>
200
201  <para>
202   This function may adjust the fields <literal>node-&gt;use_bulkread</literal>
203   and <literal>node-&gt;use_pagemode</literal>.
204   If <literal>node-&gt;use_bulkread</literal> is <literal>true</literal>, which it is by
205   default, the scan will use a buffer access strategy that encourages
206   recycling buffers after use.  It might be reasonable to set this
207   to <literal>false</literal> if the scan will visit only a small fraction of the
208   table's pages.
209   If <literal>node-&gt;use_pagemode</literal> is <literal>true</literal>, which it is by
210   default, the scan will perform visibility checking in a single pass for
211   all tuples on each visited page.  It might be reasonable to set this
212   to <literal>false</literal> if the scan will select only a small fraction of the
213   tuples on each visited page.  That will result in fewer tuple visibility
214   checks being performed, though each one will be more expensive because it
215   will require more locking.
216  </para>
217
218  <para>
219   If the sampling method is
220   marked <literal>repeatable_across_scans</literal>, it must be able to
221   select the same set of tuples during a rescan as it did originally, that is
222   a fresh call of <function>BeginSampleScan</function> must lead to selecting the
223   same tuples as before (if the <literal>TABLESAMPLE</literal> parameters
224   and seed don't change).
225  </para>
226
227  <para>
228<programlisting>
229BlockNumber
230NextSampleBlock (SampleScanState *node, BlockNumber nblocks);
231</programlisting>
232
233   Returns the block number of the next page to be scanned, or
234   <literal>InvalidBlockNumber</literal> if no pages remain to be scanned.
235  </para>
236
237  <para>
238   This function can be omitted (set the pointer to NULL), in which case
239   the core code will perform a sequential scan of the entire relation.
240   Such a scan can use synchronized scanning, so that the sampling method
241   cannot assume that the relation pages are visited in the same order on
242   each scan.
243  </para>
244
245  <para>
246<programlisting>
247OffsetNumber
248NextSampleTuple (SampleScanState *node,
249                 BlockNumber blockno,
250                 OffsetNumber maxoffset);
251</programlisting>
252
253   Returns the offset number of the next tuple to be sampled on the
254   specified page, or <literal>InvalidOffsetNumber</literal> if no tuples remain to
255   be sampled.  <literal>maxoffset</literal> is the largest offset number in use
256   on the page.
257  </para>
258
259  <note>
260   <para>
261    <function>NextSampleTuple</function> is not explicitly told which of the offset
262    numbers in the range <literal>1 .. maxoffset</literal> actually contain valid
263    tuples.  This is not normally a problem since the core code ignores
264    requests to sample missing or invisible tuples; that should not result in
265    any bias in the sample.  However, if necessary, the function can use
266    <literal>node-&gt;donetuples</literal> to examine how many of the tuples
267    it returned were valid and visible.
268   </para>
269  </note>
270
271  <note>
272   <para>
273    <function>NextSampleTuple</function> must <emphasis>not</emphasis> assume
274    that <literal>blockno</literal> is the same page number returned by the most
275    recent <function>NextSampleBlock</function> call.  It was returned by some
276    previous <function>NextSampleBlock</function> call, but the core code is allowed
277    to call <function>NextSampleBlock</function> in advance of actually scanning
278    pages, so as to support prefetching.  It is OK to assume that once
279    sampling of a given page begins, successive <function>NextSampleTuple</function>
280    calls all refer to the same page until <literal>InvalidOffsetNumber</literal> is
281    returned.
282   </para>
283  </note>
284
285  <para>
286<programlisting>
287void
288EndSampleScan (SampleScanState *node);
289</programlisting>
290
291   End the scan and release resources.  It is normally not important
292   to release palloc'd memory, but any externally-visible resources
293   should be cleaned up.
294   This function can be omitted (set the pointer to NULL) in the common
295   case where no such resources exist.
296  </para>
297
298 </sect1>
299
300</chapter>
301