src/backend/utils/fmgr/README

Function Manager
================

Proposal For Function-Manager Redesign			19-Nov-2000
--------------------------------------

We know that the existing mechanism for calling Postgres functions needs
to be redesigned.  It has portability problems because it makes
assumptions about parameter passing that violate ANSI C; it fails to
handle NULL arguments and results cleanly; and "function handlers" that
support a class of functions (such as fmgr_pl) can only be done via a
really ugly, non-reentrant kluge.  (Global variable set during every
function call, forsooth.)  Here is a proposal for fixing these problems.

In the past, the major objections to redoing the function-manager
interface have been (a) it'll be quite tedious to implement, since every
built-in function and every place that calls such functions will need to
be touched; (b) such wide-ranging changes will be difficult to make in
parallel with other development work; (c) it will break existing
user-written loadable modules that define "C language" functions.  While
I have no solution to the "tedium" aspect, I believe I see an answer to
the other problems: by use of function handlers, we can support both old
and new interfaces in parallel for both callers and callees, at some
small efficiency cost for the old styles.  That way, most of the changes
can be done on an incremental file-by-file basis --- we won't need a
"big bang" where everything changes at once.  Support for callees
written in the old style can be left in place indefinitely, to provide
backward compatibility for user-written C functions.


Changes In pg_proc (System Data About a Function)
-------------------------------------------------

A new column "proisstrict" will be added to the system pg_proc table.
This is a boolean value which will be TRUE if the function is "strict",
that is, it always returns NULL when any of its inputs are NULL.  The
function manager will check this field and skip calling the function when
it's TRUE and there are NULL inputs.  This allows us to remove explicit
NULL-value tests from many functions that currently need them (not to
mention fixing many more that need them but don't have them).  A function
that is not marked "strict" is responsible for checking whether its inputs
are NULL or not.  Most built-in functions will be marked "strict".

An optional WITH parameter will be added to CREATE FUNCTION to allow
specification of whether user-defined functions are strict or not.  I am
inclined to make the default be "not strict", since that seems to be the
more useful case for functions expressed in SQL or a PL language, but
am open to arguments for the other choice.


The New Function-Manager Interface
----------------------------------

The core of the new design is revised data structures for representing
the result of a function lookup and for representing the parameters
passed to a specific function invocation.  (We want to keep function
lookup separate from function call, since many parts of the system apply
the same function over and over; the lookup overhead should be paid once
per query, not once per tuple.)


When a function is looked up in pg_proc, the result is represented as

typedef struct
{
    PGFunction  fn_addr;    /* pointer to function or handler to be called */
    Oid         fn_oid;     /* OID of function (NOT of handler, if any) */
    short       fn_nargs;   /* number of input args (0..FUNC_MAX_ARGS) */
    bool        fn_strict;  /* function is "strict" (NULL in => NULL out) */
    bool        fn_retset;  /* function returns a set (over multiple calls) */
    unsigned char fn_stats; /* collect stats if track_functions > this */
    void       *fn_extra;   /* extra space for use by handler */
    MemoryContext fn_mcxt;  /* memory context to store fn_extra in */
    Node       *fn_expr;    /* expression parse tree for call, or NULL */
} FmgrInfo;

For an ordinary built-in function, fn_addr is just the address of the C
routine that implements the function.  Otherwise it is the address of a
handler for the class of functions that includes the target function.
The handler can use the function OID and perhaps also the fn_extra slot
to find the specific code to execute.  (fn_oid = InvalidOid can be used
to denote a not-yet-initialized FmgrInfo struct.  fn_extra will always
be NULL when an FmgrInfo is first filled by the function lookup code, but
a function handler could set it to avoid making repeated lookups of its
own when the same FmgrInfo is used repeatedly during a query.)  fn_nargs
is the number of arguments expected by the function, fn_strict is its
strictness flag, and fn_retset shows whether it returns a set; all of
these values come from the function's pg_proc entry.  fn_stats is also
set up to control whether or not to track runtime statistics for calling
this function.

If the function is being called as part of a SQL expression, fn_expr will
point to the expression parse tree for the function call; this can be used
to extract parse-time knowledge about the actual arguments.  Note that this
field really is information about the arguments rather than information
about the function, but it's proven to be more convenient to keep it in
FmgrInfo than in FunctionCallInfoData where it might more logically go.


During a call of a function, the following data structure is created
and passed to the function:

typedef struct
{
    FmgrInfo   *flinfo;         /* ptr to lookup info used for this call */
    Node       *context;        /* pass info about context of call */
    Node       *resultinfo;     /* pass or return extra info about result */
    Oid         fncollation;    /* collation for function to use */
    bool        isnull;         /* function must set true if result is NULL */
    short       nargs;          /* # arguments actually passed */
    Datum       arg[FUNC_MAX_ARGS];  /* Arguments passed to function */
    bool        argnull[FUNC_MAX_ARGS];  /* T if arg[i] is actually NULL */
} FunctionCallInfoData;
typedef FunctionCallInfoData* FunctionCallInfo;

flinfo points to the lookup info used to make the call.  Ordinary functions
will probably ignore this field, but function class handlers will need it
to find out the OID of the specific function being called.

context is NULL for an "ordinary" function call, but may point to additional
info when the function is called in certain contexts.  (For example, the
trigger manager will pass information about the current trigger event here.)
If context is used, it should point to some subtype of Node; the particular
kind of context is indicated by the node type field.  (A callee should
always check the node type before assuming it knows what kind of context is
being passed.)  fmgr itself puts no other restrictions on the use of this
field.

resultinfo is NULL when calling any function from which a simple Datum
result is expected.  It may point to some subtype of Node if the function
returns more than a Datum.  (For example, resultinfo is used when calling a
function that returns a set, as discussed below.)  Like the context field,
resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
of the field.

fncollation is the input collation derived by the parser, or InvalidOid
when there are no inputs of collatable types or they don't share a common
collation.  This is effectively a hidden additional argument, which
collation-sensitive functions can use to determine their behavior.

nargs, arg[], and argnull[] hold the arguments being passed to the function.
Notice that all the arguments passed to a function (as well as its result
value) will now uniformly be of type Datum.  As discussed below, callers
and callees should apply the standard Datum-to-and-from-whatever macros
to convert to the actual argument types of a particular function.  The
value in arg[i] is unspecified when argnull[i] is true.

It is generally the responsibility of the caller to ensure that the
number of arguments passed matches what the callee is expecting; except
for callees that take a variable number of arguments, the callee will
typically ignore the nargs field and just grab values from arg[].

The isnull field will be initialized to "false" before the call.  On
return from the function, isnull is the null flag for the function result:
if it is true the function's result is NULL, regardless of the actual
function return value.  Note that simple "strict" functions can ignore
both isnull and argnull[], since they won't even get called when there
are any TRUE values in argnull[].
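
To make the interplay of fn_strict, argnull[] and isnull concrete, here is
a minimal standalone sketch.  It is not the backend code: the structs are
trimmed-down stand-ins for FmgrInfo and FunctionCallInfoData, Datum is
approximated by uintptr_t, and call_with_strict_check(), add_ints() and
strict_add() are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FUNC_MAX_ARGS 4             /* small value chosen for this sketch */

typedef uintptr_t Datum;            /* stand-in for the real Datum */

struct FunctionCallInfoData;
typedef struct FunctionCallInfoData *FunctionCallInfo;
typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

/* Trimmed-down FmgrInfo: only the fields this sketch needs */
typedef struct
{
    PGFunction  fn_addr;
    bool        fn_strict;
} FmgrInfo;

/* Trimmed-down FunctionCallInfoData */
struct FunctionCallInfoData
{
    FmgrInfo   *flinfo;
    bool        isnull;
    short       nargs;
    Datum       arg[FUNC_MAX_ARGS];
    bool        argnull[FUNC_MAX_ARGS];
};

/* Hypothetical helper: invoke the function, honoring fn_strict */
static Datum
call_with_strict_check(FunctionCallInfo fcinfo)
{
    int         i;

    assert(fcinfo->nargs >= 0 && fcinfo->nargs <= FUNC_MAX_ARGS);
    fcinfo->isnull = false;         /* always false before the call */
    if (fcinfo->flinfo->fn_strict)
    {
        for (i = 0; i < fcinfo->nargs; i++)
        {
            if (fcinfo->argnull[i])
            {
                fcinfo->isnull = true;  /* NULL in => NULL out; callee skipped */
                return (Datum) 0;
            }
        }
    }
    return fcinfo->flinfo->fn_addr(fcinfo);
}

/* A strict callee: it never looks at argnull[] or isnull */
static Datum
add_ints(FunctionCallInfo fcinfo)
{
    return (Datum) ((int32_t) fcinfo->arg[0] + (int32_t) fcinfo->arg[1]);
}

/* Hypothetical driver: returns false if the result is NULL, else stores the sum */
bool
strict_add(int a, bool a_null, int b, bool b_null, int *sum)
{
    FmgrInfo    finfo = {add_ints, true};
    struct FunctionCallInfoData fcinfo = {0};
    Datum       result;

    fcinfo.flinfo = &finfo;
    fcinfo.nargs = 2;
    fcinfo.arg[0] = (Datum) a;
    fcinfo.argnull[0] = a_null;
    fcinfo.arg[1] = (Datum) b;
    fcinfo.argnull[1] = b_null;
    result = call_with_strict_check(&fcinfo);
    if (fcinfo.isnull)
        return false;
    *sum = (int) result;
    return true;
}
```

Calling strict_add() with both inputs non-null reaches add_ints(); marking
either input as null yields a NULL result without the callee ever running.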

FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter
conventions, global variables (fmgr_pl_finfo and CurrentTriggerData at
least), and other uglinesses.


Callees, whether they be individual functions or function handlers,
shall always have this signature:

Datum function (FunctionCallInfo fcinfo);

which is represented by the typedef

typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

The function is responsible for setting fcinfo->isnull appropriately
as well as returning a result represented as a Datum.  Note that since
all callees will now have exactly the same signature, and will be called
through a function pointer declared with exactly that signature, we
should have no portability or optimization problems.


Function Coding Conventions
---------------------------

As an example, int4 addition goes from old-style

int32
int4pl(int32 arg1, int32 arg2)
{
    return arg1 + arg2;
}

to new-style

Datum
int4pl(FunctionCallInfo fcinfo)
{
    /* we assume the function is marked "strict", so we can ignore
     * NULL-value handling */

    return Int32GetDatum(DatumGetInt32(fcinfo->arg[0]) +
                         DatumGetInt32(fcinfo->arg[1]));
}

This is, of course, much uglier than the old-style code, but we can
improve matters with some well-chosen macros for the boilerplate parts.
I propose below macros that would make the code look like

Datum
int4pl(PG_FUNCTION_ARGS)
{
    int32   arg1 = PG_GETARG_INT32(0);
    int32   arg2 = PG_GETARG_INT32(1);

    PG_RETURN_INT32( arg1 + arg2 );
}

This is still more code than before, but it's fairly readable, and it's
also amenable to machine processing --- for example, we could probably
write a script that scans code like this and extracts argument and result
type info for comparison to the pg_proc table.

For the standard data types float4, float8, and int8, these macros should hide
whether the types are pass-by-value or pass-by-reference, by incorporating
indirection and space allocation if needed.  This will offer a considerable
gain in readability, and it also opens up the opportunity to make these types
be pass-by-value on machines where it's feasible to do so.

Here are the proposed macros and coding conventions:

The definition of an fmgr-callable function will always look like

Datum
function_name(PG_FUNCTION_ARGS)
{
	...
}

"PG_FUNCTION_ARGS" just expands to "FunctionCallInfo fcinfo".  The main
reason for using this macro is to make it easy for scripts to spot function
definitions.  However, if we ever decide to change the calling convention
again, it might come in handy to have this macro in place.

A nonstrict function is responsible for checking whether each individual
argument is null or not, which it can do with PG_ARGISNULL(n) (which is
just "fcinfo->argnull[n]").  It should avoid trying to fetch the value
of any argument that is null.
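
As an illustration of the nonstrict case, the sketch below returns the
first non-null of two int4 arguments, or NULL if both are null.  It is a
standalone approximation: the macros are simplified re-creations of the
ones proposed here (PG_RETURN_NULL is wrapped in do/while so that it
behaves as a single statement), the struct is trimmed down, Datum is
approximated by uintptr_t, and int4_first_nonnull is a hypothetical
function name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FUNC_MAX_ARGS 4             /* small value chosen for this sketch */

typedef uintptr_t Datum;            /* stand-in for the real Datum */

/* Trimmed-down FunctionCallInfoData */
typedef struct FunctionCallInfoData
{
    bool        isnull;
    short       nargs;
    Datum       arg[FUNC_MAX_ARGS];
    bool        argnull[FUNC_MAX_ARGS];
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;

/* Simplified re-creations of the proposed macros */
#define PG_FUNCTION_ARGS    FunctionCallInfo fcinfo
#define PG_ARGISNULL(n)     (fcinfo->argnull[n])
#define PG_GETARG_INT32(n)  ((int32_t) fcinfo->arg[n])
#define PG_RETURN_INT32(x)  return (Datum) (x)
#define PG_RETURN_NULL() \
    do { fcinfo->isnull = true; return (Datum) 0; } while (0)

/* Hypothetical nonstrict function: first non-NULL argument, else NULL */
Datum
int4_first_nonnull(PG_FUNCTION_ARGS)
{
    assert(fcinfo->nargs == 2);

    /* test each argument for null before fetching its value */
    if (!PG_ARGISNULL(0))
        PG_RETURN_INT32(PG_GETARG_INT32(0));
    if (!PG_ARGISNULL(1))
        PG_RETURN_INT32(PG_GETARG_INT32(1));
    PG_RETURN_NULL();
}
```

Because such a function would not be marked strict, it is called even when
an input is null, and it alone decides what the result should be.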

Both strict and nonstrict functions can return NULL, if needed, with
	PG_RETURN_NULL();
which expands to
	{ fcinfo->isnull = true; return (Datum) 0; }

Argument values are ordinarily fetched using code like
	int32	name = PG_GETARG_INT32(number);

For float4, float8, and int8, the PG_GETARG macros will hide whether the
types are pass-by-value or pass-by-reference.  For example, if float8 is
pass-by-reference then PG_GETARG_FLOAT8 expands to
	(* (float8 *) DatumGetPointer(fcinfo->arg[number]))
and would typically be called like this:
	float8  arg = PG_GETARG_FLOAT8(0);
For what are now historical reasons, the float-related typedefs and macros
express the type width in bytes (4 or 8), whereas we prefer to label the
widths of integer types in bits.

Non-null values are returned with a PG_RETURN_XXX macro of the appropriate
type.  For example, PG_RETURN_INT32 expands to
	return Int32GetDatum(x)
PG_RETURN_FLOAT4, PG_RETURN_FLOAT8, and PG_RETURN_INT64 hide whether their
data types are pass-by-value or pass-by-reference, by doing a palloc if
needed.
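
The following standalone sketch shows how such macros can keep a function
body free of pointer handling.  It assumes float8 is pass-by-reference,
uses malloc where the backend would use palloc, approximates Datum by
uintptr_t, and trims the call struct to two argument slots;
float8_add_sketch and add_via_datums are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Datum;            /* stand-in for the real Datum */
typedef double float8;

typedef struct FunctionCallInfoData
{
    Datum       arg[2];             /* trimmed: just two argument slots */
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;

#define PG_FUNCTION_ARGS    FunctionCallInfo fcinfo
#define DatumGetPointer(d)  ((void *) (d))

/* Assumption for this sketch: float8 is pass-by-reference */
#define PG_GETARG_FLOAT8(n) (* (float8 *) DatumGetPointer(fcinfo->arg[n]))
#define PG_RETURN_FLOAT8(x) return Float8GetDatum(x)

/* Box a float8 in fresh storage; malloc stands in for palloc here */
static Datum
Float8GetDatum(float8 x)
{
    float8     *p = malloc(sizeof(float8));

    assert(p != NULL);
    *p = x;
    return (Datum) p;
}

/* The function body never mentions pointers or allocation */
static Datum
float8_add_sketch(PG_FUNCTION_ARGS)
{
    float8      arg1 = PG_GETARG_FLOAT8(0);
    float8      arg2 = PG_GETARG_FLOAT8(1);

    PG_RETURN_FLOAT8(arg1 + arg2);
}

/* Hypothetical caller that boxes the inputs and unboxes the result */
double
add_via_datums(double a, double b)
{
    FunctionCallInfoData fcinfo;
    Datum       result;
    double      sum;

    fcinfo.arg[0] = Float8GetDatum(a);
    fcinfo.arg[1] = Float8GetDatum(b);
    result = float8_add_sketch(&fcinfo);
    sum = * (float8 *) DatumGetPointer(result);

    /* the backend would rely on memory contexts instead of explicit frees */
    free(DatumGetPointer(fcinfo.arg[0]));
    free(DatumGetPointer(fcinfo.arg[1]));
    free(DatumGetPointer(result));
    return sum;
}
```

Switching float8 to pass-by-value would only require changing the two
macros and Float8GetDatum; float8_add_sketch itself would stay untouched.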

fmgr.h will provide PG_GETARG and PG_RETURN macros for all the basic data
types.  Modules or header files that define specialized SQL datatypes
(e.g., timestamp) should define appropriate macros for those types, so that
functions manipulating the types can be coded in the standard style.

For non-primitive data types (particularly variable-length types) it won't
be very practical to hide the pass-by-reference nature of the data type,
so the PG_GETARG and PG_RETURN macros for those types won't do much more
than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
TOAST discussion, below).  Functions returning such types will need to
palloc() their result space explicitly.  I recommend naming the GETARG and
RETURN macros for such types to end in "_P", as a reminder that they
produce or take a pointer.  For example, PG_GETARG_TEXT_P yields "text *".

When a function needs to access fcinfo->flinfo or one of the other auxiliary
fields of FunctionCallInfo, it should just do it.  I doubt that providing
syntactic-sugar macros for these cases is useful.


Call-Site Coding Conventions
----------------------------

There are many places in the system that call either a specific function
(for example, the parser invokes "textin" by name in places) or a
particular group of functions that have a common argument list (for
example, the optimizer invokes selectivity estimation functions with
a fixed argument list).  These places will need to change, but we should
try to avoid making them significantly uglier than before.

Places that invoke an arbitrary function with an arbitrary argument list
can simply be changed to fill a FunctionCallInfoData structure directly;
that'll be no worse and possibly cleaner than what they do now.

When invoking a specific built-in function by name, we have generally
just written something like
	result = textin ( ... args ... )
which will not work after textin() is converted to the new call style.
I suggest that code like this be converted to use "helper" functions
that will create and fill in a FunctionCallInfoData struct.  For
example, if textin is being called with one argument, it'd look
something like
	result = DirectFunctionCall1(textin, PointerGetDatum(argument));
These helper routines will have declarations like
	Datum DirectFunctionCall2(PGFunction func, Datum arg1, Datum arg2);
Note it will be the caller's responsibility to convert to and from
Datum; appropriate conversion macros should be used.

The DirectFunctionCallN routines will not bother to fill in
fcinfo->flinfo (indeed cannot, since they have no idea about an OID for
the target function); they will just set it NULL.  This is unlikely to
bother any built-in function that could be called this way.  Note also
that this style of coding cannot pass a NULL input value nor cope with
a NULL result (it couldn't before, either!).  We can make the helper
routines ereport an error if they see that the function returns a NULL.
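
A helper of this kind might look roughly like the standalone sketch below.
It is an approximation under stated assumptions, not the backend's
implementation: the call struct is trimmed down (flinfo is a plain void
pointer), Datum is approximated by uintptr_t, an fprintf plus exit stands
in for ereport, and the names DirectFunctionCall2_sketch and int4pl_sketch
are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define FUNC_MAX_ARGS 4             /* small value chosen for this sketch */

typedef uintptr_t Datum;            /* stand-in for the real Datum */

typedef struct FunctionCallInfoData
{
    void       *flinfo;             /* an FmgrInfo pointer in the real struct */
    bool        isnull;
    short       nargs;
    Datum       arg[FUNC_MAX_ARGS];
    bool        argnull[FUNC_MAX_ARGS];
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;
typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

#define Int32GetDatum(x)    ((Datum) (x))
#define DatumGetInt32(d)    ((int32_t) (d))

/* Build the call struct on the stack, call func, and reject a NULL result */
Datum
DirectFunctionCall2_sketch(PGFunction func, Datum arg1, Datum arg2)
{
    FunctionCallInfoData fcinfo;
    Datum       result;

    memset(&fcinfo, 0, sizeof(fcinfo));     /* isnull and argnull[] = false */
    fcinfo.flinfo = NULL;           /* no lookup info is available here */
    fcinfo.nargs = 2;
    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;

    result = func(&fcinfo);

    if (fcinfo.isnull)
    {
        /* stand-in for ereport(ERROR, ...) */
        fprintf(stderr, "directly called function returned NULL\n");
        exit(EXIT_FAILURE);
    }
    return result;
}

/* New-style callee, written out without the PG_GETARG macros */
Datum
int4pl_sketch(FunctionCallInfo fcinfo)
{
    assert(fcinfo->nargs == 2);
    return Int32GetDatum(DatumGetInt32(fcinfo->arg[0]) +
                         DatumGetInt32(fcinfo->arg[1]));
}
```

A call site then reads much like the old direct call, with the Datum
conversions made explicit:
DatumGetInt32(DirectFunctionCall2_sketch(int4pl_sketch, Int32GetDatum(2), Int32GetDatum(40))).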

When invoking a function that has a known argument signature, we have
usually written either
	result = fmgr(targetfuncOid, ... args ... );
or
	result = fmgr_ptr(FmgrInfo *finfo, ... args ... );
depending on whether an FmgrInfo lookup has been done yet or not.
This kind of code can be recast using helper routines, in the same
style as above:
	result = OidFunctionCall1(funcOid, PointerGetDatum(argument));
	result = FunctionCall2(funcCallInfo,
	                       PointerGetDatum(argument),
	                       Int32GetDatum(argument));
Again, this style of coding does not allow for expressing NULL inputs
or receiving a NULL result.

As with the callee-side situation, I propose adding argument conversion
macros that hide whether int8, float4, and float8 are pass-by-value or
pass-by-reference.

The existing helper functions fmgr(), fmgr_c(), etc. will be left in
place until all uses of them are gone.  Of course their internals will
have to change in the first step of implementation, but they can
continue to support the same external appearance.


Support for TOAST-Able Data Types
---------------------------------

For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
data value.  There might be a few cases where the still-toasted value is
wanted, but the vast majority of cases want the de-toasted result, so
that will be the default.  To get the argument value without causing
de-toasting, use PG_GETARG_RAW_VARLENA_P(n).

Some functions require a modifiable copy of their input values.  In these
cases, it's silly to do an extra copy step if we copied the data anyway
to de-TOAST it.  Therefore, each toastable datatype has an additional
fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
guaranteed-fresh copy, combining this with the detoasting step if possible.

There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
pointer if and only if it is different from the original value of the n'th
argument.  This can be used to free the de-toasted value of the n'th
argument, if it was actually de-toasted.  Currently, doing this is not
necessary for the majority of functions because the core backend code
releases temporary space periodically, so that memory leaked in function
execution isn't a big problem.  However, as of 7.1 memory leaks in
functions that are called by index searches will not be cleaned up until
end of transaction.  Therefore, functions that are listed in pg_amop or
pg_amproc should be careful not to leak detoasted copies, and so these
functions do need to use PG_FREE_IF_COPY() for toastable inputs.
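
The pointer comparison behind PG_FREE_IF_COPY can be shown with a toy
model.  Everything here is a stand-in: toy_text, toy_detoast, toy_pfree,
PG_GETARG_TOY_TEXT_P and toy_textlen are hypothetical, the "toasted" flag
merely marks a value that must be copied before use, malloc/free replace
palloc/pfree, and Datum is approximated by uintptr_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;            /* stand-in for the real Datum */

/* Toy varlena: "toasted" just marks a value that must be copied to be read */
typedef struct
{
    bool        toasted;
    char        data[16];
} toy_text;

typedef struct FunctionCallInfoData
{
    Datum       arg[1];             /* trimmed: a single argument slot */
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;

static int  frees_done = 0;         /* lets a caller observe the frees */

static void
toy_pfree(void *p)
{
    frees_done++;
    free(p);
}

/* Returns the input itself if it is not toasted, else a fresh plain copy */
static toy_text *
toy_detoast(toy_text *t)
{
    toy_text   *copy;

    if (!t->toasted)
        return t;
    copy = malloc(sizeof(toy_text));
    assert(copy != NULL);
    *copy = *t;
    copy->toasted = false;
    return copy;
}

#define DatumGetPointer(d)  ((void *) (d))
#define PG_GETARG_TOY_TEXT_P(n) \
    toy_detoast((toy_text *) DatumGetPointer(fcinfo->arg[n]))
#define PG_FREE_IF_COPY(ptr, n) \
    do { \
        if ((void *) (ptr) != DatumGetPointer(fcinfo->arg[n])) \
            toy_pfree(ptr); \
    } while (0)

/* Index-support style function: frees any detoasted copy before returning */
Datum
toy_textlen(FunctionCallInfo fcinfo)
{
    toy_text   *t = PG_GETARG_TOY_TEXT_P(0);
    size_t      len = strlen(t->data);

    PG_FREE_IF_COPY(t, 0);
    return (Datum) len;
}
```

When the argument was not toasted, the fetched pointer equals the original
Datum and nothing is freed; only a genuine detoasted copy is released.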

A function should never try to re-TOAST its result value; it should just
deliver an untoasted result that's been palloc'd in the current memory
context.  When and if the value is actually stored into a tuple, the
tuple toaster will decide whether toasting is needed.


Functions Accepting or Returning Sets
-------------------------------------

If a function is marked in pg_proc as returning a set, then it is called
with fcinfo->resultinfo pointing to a node of type ReturnSetInfo.  A
function that desires to return a set should raise an error "called in
context that does not accept a set result" if resultinfo is NULL or does
not point to a ReturnSetInfo node.

There are currently two modes in which a function can return a set result:
value-per-call, or materialize.  In value-per-call mode, the function returns
one value each time it is called, and finally reports "done" when it has no
more values to return.  In materialize mode, the function's output set is
instantiated in a Tuplestore object; all the values are returned in one call.
Additional modes might be added in future.

ReturnSetInfo contains a field "allowedModes" which is set (by the caller)
to a bitmask that's the OR of the modes the caller can support.  The actual
mode used by the function is returned in another field "returnMode".  For
backwards-compatibility reasons, returnMode is initialized to value-per-call
and need only be changed if the function wants to use a different mode.
The function should ereport() if it cannot use any of the modes the caller is
willing to support.

Value-per-call mode works like this: ReturnSetInfo contains a field
"isDone", which should be set to one of these values:

    ExprSingleResult             /* expression does not return a set */
    ExprMultipleResult           /* this result is an element of a set */
    ExprEndResult                /* there are no more elements in the set */

(the caller will initialize it to ExprSingleResult).  If the function simply
returns a Datum without touching ReturnSetInfo, then the call is over and a
single-item set has been returned.  To return a set, the function must set
isDone to ExprMultipleResult for each set element.  After all elements have
been returned, the next call should set isDone to ExprEndResult and return a
null result.  (Note it is possible to return an empty set by doing this on
the first call.)
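
The value-per-call protocol can be sketched as a small standalone program
fragment.  The structs are trimmed-down stand-ins (the real ReturnSetInfo
is a Node with more fields, and a real callee would check the node type
rather than only testing for NULL), Datum is approximated by uintptr_t,
per-call state is kept in fn_extra with malloc standing in for palloc, and
series_1_to_3 and collect_set are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef uintptr_t Datum;            /* stand-in for the real Datum */

typedef enum
{
    ExprSingleResult,
    ExprMultipleResult,
    ExprEndResult
} ExprDoneCond;

typedef struct { ExprDoneCond isDone; } ReturnSetInfo;  /* trimmed */
typedef struct { void *fn_extra; } FmgrInfo;            /* trimmed */

typedef struct FunctionCallInfoData
{
    FmgrInfo   *flinfo;
    ReturnSetInfo *resultinfo;
    bool        isnull;
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;
typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

/* Hypothetical set-returning function yielding 1, 2, 3 */
Datum
series_1_to_3(FunctionCallInfo fcinfo)
{
    ReturnSetInfo *rsi = fcinfo->resultinfo;
    int        *next;
    int         value;

    if (rsi == NULL)
    {
        /* stand-in for ereport(ERROR, ...) */
        fprintf(stderr, "called in context that does not accept a set result\n");
        exit(EXIT_FAILURE);
    }
    if (fcinfo->flinfo->fn_extra == NULL)
    {
        next = malloc(sizeof(int));     /* first call: set up the state */
        assert(next != NULL);
        *next = 1;
        fcinfo->flinfo->fn_extra = next;
    }
    next = fcinfo->flinfo->fn_extra;
    if (*next <= 3)
    {
        rsi->isDone = ExprMultipleResult;   /* one element of the set */
        value = *next;
        *next = value + 1;
        return (Datum) value;
    }
    rsi->isDone = ExprEndResult;            /* no more elements */
    fcinfo->isnull = true;
    return (Datum) 0;
}

/* Hypothetical caller: collects at most max elements, like a LIMIT */
int
collect_set(PGFunction fn, int *out, int max)
{
    FmgrInfo    finfo = {NULL};
    ReturnSetInfo rsi;
    FunctionCallInfoData fcinfo;
    int         n = 0;

    fcinfo.flinfo = &finfo;
    fcinfo.resultinfo = &rsi;
    while (n < max)
    {
        Datum       d;

        rsi.isDone = ExprSingleResult;  /* caller initializes isDone */
        fcinfo.isnull = false;
        d = fn(&fcinfo);
        if (rsi.isDone == ExprEndResult)
            break;
        out[n++] = (int) d;
        if (rsi.isDone == ExprSingleResult)
            break;                      /* plain result: a one-item set */
    }
    free(finfo.fn_extra);   /* the caller, not the function, reclaims state */
    return n;
}
```

With max larger than three, collect_set stops when the function reports
ExprEndResult; with a smaller max it simply stops calling, so the function
never sees a final call in which it could clean up after itself.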

Value-per-call functions MUST NOT assume that they will be run to completion;
the executor might simply stop calling them, for example because of a LIMIT.
Therefore, it's unsafe to attempt to perform any resource cleanup in the
final call.  It's usually not necessary to clean up memory, anyway.  If it's
necessary to clean up other types of resources, such as file descriptors,
one can register a shutdown callback function in the ExprContext pointed to
by the ReturnSetInfo node.  (But note that file descriptors are a limited
resource, so it's generally unwise to hold those open across calls; SRFs
that need file access are better written to do it in a single call using
Materialize mode.)

Materialize mode works like this: the function creates a Tuplestore holding
the (possibly empty) result set, and returns it.  There are no multiple calls.
The function must also return a TupleDesc that indicates the tuple structure.
The Tuplestore and TupleDesc should be created in the context
econtext->ecxt_per_query_memory (note this will *not* be the context the
function is called in).  The function stores pointers to the Tuplestore and
TupleDesc into ReturnSetInfo, sets returnMode to indicate materialize mode,
and returns null.  isDone is not used and should be left at ExprSingleResult.

The Tuplestore must be created with randomAccess = true if
SFRM_Materialize_Random is set in allowedModes, but it can (and preferably
should) be created with randomAccess = false if not.  Callers that can support
both ValuePerCall and Materialize mode will set SFRM_Materialize_Preferred,
or not, depending on which mode they prefer.

If available, the expected tuple descriptor is passed in ReturnSetInfo;
in other contexts the expectedDesc field will be NULL.  The function need
not pay attention to expectedDesc, but it may be useful in special cases.

There is no support for functions accepting sets; instead, the function will
be called multiple times, once for each element of the input set.


Notes About Function Handlers
-----------------------------

Handlers for classes of functions should find life much easier and
cleaner in this design.  The OID of the called function is directly
reachable from the passed parameters; we don't need the global variable
fmgr_pl_finfo anymore.  Also, by modifying fcinfo->flinfo->fn_extra,
the handler can cache lookup info to avoid repeat lookups when the same
function is invoked many times.  (fn_extra can only be used as a hint,
since callers are not required to re-use an FmgrInfo struct.
But in performance-critical paths they normally will do so.)

If the handler wants to allocate memory to hold fn_extra data, it should
NOT do so in CurrentMemoryContext, since the current context may well be
much shorter-lived than the context where the FmgrInfo is.  Instead,
allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
context.  fn_mcxt normally points at the context that was
CurrentMemoryContext at the time the FmgrInfo structure was created;
in any case it is required to be a context at least as long-lived as the
FmgrInfo itself.
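
The caching pattern can be sketched as follows.  This is a standalone toy,
not backend code: the memory context only counts allocations and delegates
to malloc, the structs are trimmed down, Datum is approximated by
uintptr_t, and toy_context_alloc, lookup_factor, HandlerCache and
scale_handler are hypothetical names standing in for a real handler and
its catalog lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Datum;            /* stand-in for the real Datum */
typedef unsigned int Oid;

/* Toy memory context: only counts allocations; malloc does the real work */
typedef struct MemoryContextData
{
    int         nallocs;
} MemoryContextData;
typedef MemoryContextData *MemoryContext;

static void *
toy_context_alloc(MemoryContext cxt, size_t size)
{
    void       *p = malloc(size);

    assert(p != NULL);
    cxt->nallocs++;
    return p;
}

typedef struct
{
    Oid         fn_oid;
    void       *fn_extra;
    MemoryContext fn_mcxt;
} FmgrInfo;                         /* trimmed */

typedef struct FunctionCallInfoData
{
    FmgrInfo   *flinfo;
    Datum       arg[1];
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;

/* What the handler caches per function: here just a scale factor */
typedef struct
{
    int         factor;
} HandlerCache;

int         catalog_lookups = 0;    /* counts the "expensive" lookups */

/* Hypothetical expensive lookup keyed by function OID */
static int
lookup_factor(Oid fn_oid)
{
    catalog_lookups++;
    return (int) (fn_oid % 10);
}

/* Handler: multiplies arg[0] by a per-function factor cached in fn_extra */
Datum
scale_handler(FunctionCallInfo fcinfo)
{
    FmgrInfo   *flinfo = fcinfo->flinfo;
    HandlerCache *cache = flinfo->fn_extra;

    if (cache == NULL)
    {
        /* allocate in fn_mcxt so the cache lives as long as the FmgrInfo */
        cache = toy_context_alloc(flinfo->fn_mcxt, sizeof(HandlerCache));
        cache->factor = lookup_factor(flinfo->fn_oid);
        flinfo->fn_extra = cache;
    }
    return (Datum) ((int) fcinfo->arg[0] * cache->factor);
}
```

Repeated calls through the same FmgrInfo perform the lookup once; a caller
that builds a fresh FmgrInfo each time simply pays for the lookup again,
which is why fn_extra can only be treated as a hint.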


Telling the Difference Between Old- and New-Style Functions
-----------------------------------------------------------

During the conversion process, we carried two different pg_language
entries, "internal" and "newinternal", for internal functions.  The
function manager used the language code to distinguish which calling
convention to use.  (Old-style internal functions were supported via
a function handler.)  As of Nov. 2000, no old-style internal functions
remain, so we can drop support for them.  We will remove the old "internal"
pg_language entry and rename "newinternal" to "internal".

The interim solution for dynamically-loaded compiled functions has been
similar: two pg_language entries "C" and "newC".  This naming convention
is not desirable for the long run, and yet we cannot stop supporting
old-style user functions.  Instead, it seems better to use just one
pg_language entry "C", and require the dynamically-loaded library to
provide additional information that identifies new-style functions.
This avoids compatibility problems --- for example, existing dump
scripts will identify PL language handlers as being in language "C",
which would be wrong under the "newC" convention.  Also, this approach
should generalize more conveniently for future extensions to the function
interface specification.

Given a dynamically loaded function named "foo" (note that the name being
considered here is the link-symbol name, not the SQL-level function name),
the function manager will look for another function in the same dynamically
loaded library named "pg_finfo_foo".  If this second function does not
exist, then foo is assumed to be called old-style, thus ensuring backwards
compatibility with existing libraries.  If the info function does exist,
it is expected to have the signature

	Pg_finfo_record * pg_finfo_foo (void);

The info function will be called by the fmgr, and must return a pointer
to a Pg_finfo_record struct.  (The returned struct will typically be a
statically allocated constant in the dynamic-link library.)  The current
definition of the struct is just

	typedef struct {
		int	api_version;
	} Pg_finfo_record;

where api_version is 0 to indicate old-style or 1 to indicate new-style
calling convention.  In future releases, additional fields may be defined
after api_version, but these additional fields will only be used if
api_version is greater than 1.

These details will be hidden from the author of a dynamically loaded
function by using a macro.  To define a new-style dynamically loaded
function named foo, write

	PG_FUNCTION_INFO_V1(foo);

	Datum
	foo(PG_FUNCTION_ARGS)
	{
		...
	}

The function itself is written using the same conventions as for new-style
internal functions; you just need to add the PG_FUNCTION_INFO_V1() macro.
Note that old-style and new-style functions can be intermixed in the same
library, depending on whether or not you write a PG_FUNCTION_INFO_V1() for
each one.
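
One plausible shape for such a macro, and for the decision the function
manager makes with its result, is sketched below.  The sketch is
standalone and makes several assumptions: the macro is named
PG_FUNCTION_INFO_V1_SKETCH to avoid suggesting it is the real definition,
the info function returns a pointer to const (a small deviation from the
signature quoted above), a NULL function pointer models "no pg_finfo_foo
symbol was found in the library" in place of a real dynamic-symbol lookup,
and InfoFunction and is_new_style are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;            /* stand-in for the real Datum */
typedef struct FunctionCallInfoData *FunctionCallInfo;  /* opaque here */
#define PG_FUNCTION_ARGS FunctionCallInfo fcinfo

typedef struct
{
    int         api_version;
} Pg_finfo_record;

typedef const Pg_finfo_record *(*InfoFunction) (void);  /* hypothetical */

/* Sketch of the macro: emits pg_finfo_<name>() returning a static record */
#define PG_FUNCTION_INFO_V1_SKETCH(funcname) \
    const Pg_finfo_record *pg_finfo_##funcname(void); \
    const Pg_finfo_record * \
    pg_finfo_##funcname(void) \
    { \
        static const Pg_finfo_record my_finfo = {1}; \
        return &my_finfo; \
    } \
    extern int sketch_semicolon_swallower

PG_FUNCTION_INFO_V1_SKETCH(foo);

Datum
foo(PG_FUNCTION_ARGS)
{
    (void) fcinfo;                  /* unused in this sketch */
    return (Datum) 0;
}

/*
 * Hypothetical decision step.  info_fn == NULL models "no pg_finfo_foo
 * symbol was found in the library", which means old-style.
 */
bool
is_new_style(InfoFunction info_fn)
{
    const Pg_finfo_record *rec;

    if (info_fn == NULL)
        return false;
    rec = info_fn();
    assert(rec != NULL);            /* the info function must return a record */
    return rec->api_version == 1;
}
```

The trailing extern declaration inside the macro only exists so that the
invocation can be followed by a semicolon, as in the usage shown above.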

The SQL declaration for a dynamically-loaded function is CREATE FUNCTION
foo ... LANGUAGE C regardless of whether it is old- or new-style.

New-style dynamic functions will be invoked directly by fmgr, and will
therefore have the same performance as internal functions after the initial
pg_proc lookup overhead.  Old-style dynamic functions will be invoked via
a handler, and will therefore have a small performance penalty.

To allow old-style dynamic functions to work safely on toastable datatypes,
the handler for old-style functions will automatically detoast toastable
arguments before passing them to the old-style function.  A new-style
function is expected to take care of toasted arguments by using the
standard argument access macros defined above.