src/backend/utils/fmgr/README

Function Manager
================

[This file originally explained the transition from the V0 to the V1
interface. Now it just explains some internals and rationale for the V1
interface, while the V0 interface has been removed.]

The V1 Function-Manager Interface
---------------------------------

The core of the design is data structures for representing the result of a
function lookup and for representing the parameters passed to a specific
function invocation. (We want to keep function lookup separate from
function call, since many parts of the system apply the same function over
and over; the lookup overhead should be paid once per query, not once per
tuple.)


When a function is looked up in pg_proc, the result is represented as

typedef struct
{
    PGFunction  fn_addr;    /* pointer to function or handler to be called */
    Oid         fn_oid;     /* OID of function (NOT of handler, if any) */
    short       fn_nargs;   /* number of input args (0..FUNC_MAX_ARGS) */
    bool        fn_strict;  /* function is "strict" (NULL in => NULL out) */
    bool        fn_retset;  /* function returns a set (over multiple calls) */
    unsigned char fn_stats; /* collect stats if track_functions > this */
    void       *fn_extra;   /* extra space for use by handler */
    MemoryContext fn_mcxt;  /* memory context to store fn_extra in */
    Node       *fn_expr;    /* expression parse tree for call, or NULL */
} FmgrInfo;

For an ordinary built-in function, fn_addr is just the address of the C
routine that implements the function. Otherwise it is the address of a
handler for the class of functions that includes the target function.
The handler can use the function OID and perhaps also the fn_extra slot
to find the specific code to execute. (fn_oid = InvalidOid can be used
to denote a not-yet-initialized FmgrInfo struct.
fn_extra will always
be NULL when an FmgrInfo is first filled by the function lookup code, but
a function handler could set it to avoid making repeated lookups of its
own when the same FmgrInfo is used repeatedly during a query.) fn_nargs
is the number of arguments expected by the function, fn_strict is its
strictness flag, and fn_retset shows whether it returns a set; all of
these values come from the function's pg_proc entry. fn_stats is also
set up to control whether or not to track runtime statistics for calling
this function.

If the function is being called as part of a SQL expression, fn_expr will
point to the expression parse tree for the function call; this can be used
to extract parse-time knowledge about the actual arguments. Note that this
field really is information about the arguments rather than information
about the function, but it's proven to be more convenient to keep it in
FmgrInfo than in FunctionCallInfoData where it might more logically go.


During a call of a function, the following data structure is created
and passed to the function:

typedef struct
{
    FmgrInfo   *flinfo;         /* ptr to lookup info used for this call */
    Node       *context;        /* pass info about context of call */
    Node       *resultinfo;     /* pass or return extra info about result */
    Oid         fncollation;    /* collation for function to use */
    bool        isnull;         /* function must set true if result is NULL */
    short       nargs;          /* # arguments actually passed */
    Datum       arg[FUNC_MAX_ARGS];  /* Arguments passed to function */
    bool        argnull[FUNC_MAX_ARGS];  /* T if arg[i] is actually NULL */
} FunctionCallInfoData;
typedef FunctionCallInfoData* FunctionCallInfo;

flinfo points to the lookup info used to make the call. Ordinary functions
will probably ignore this field, but function class handlers will need it
to find out the OID of the specific function being called.
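
The lookup-once, call-many split and the handler's use of flinfo can be
sketched in a self-contained toy model. This is only an illustration: the
structs are cut-down stand-ins for the real FmgrInfo and
FunctionCallInfoData, and toy_lookup, toy_call1, toy_handler and the two
OIDs are hypothetical names invented here, not backend APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types for this sketch; the real ones come from backend headers. */
typedef uintptr_t Datum;
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct FunctionCallInfoData *FunctionCallInfo;
typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

typedef struct
{
    PGFunction  fn_addr;    /* function or handler to be called */
    Oid         fn_oid;     /* OID of the target function, not the handler */
    short       fn_nargs;
    bool        fn_strict;
    void       *fn_extra;
} FmgrInfo;

typedef struct FunctionCallInfoData
{
    FmgrInfo   *flinfo;
    bool        isnull;
    short       nargs;
    Datum       arg[4];
    bool        argnull[4];
} FunctionCallInfoData;

/* Hypothetical OIDs for two functions served by the same handler. */
#define TOY_DOUBLE_OID 9001
#define TOY_SQUARE_OID 9002

/* Handler for a class of functions: picks the code to run from fn_oid. */
Datum
toy_handler(FunctionCallInfo fcinfo)
{
    Datum       x = fcinfo->arg[0];

    switch (fcinfo->flinfo->fn_oid)
    {
        case TOY_DOUBLE_OID:
            return x * 2;
        case TOY_SQUARE_OID:
            return x * x;
        default:
            fcinfo->isnull = true;
            return (Datum) 0;
    }
}

/* Hypothetical lookup step: fill the FmgrInfo once, reuse it per call. */
void
toy_lookup(Oid fn_oid, FmgrInfo *finfo)
{
    finfo->fn_addr = toy_handler;
    finfo->fn_oid = fn_oid;
    finfo->fn_nargs = 1;
    finfo->fn_strict = true;
    finfo->fn_extra = NULL;     /* always NULL right after lookup */
}

/* Call step: build the per-call struct and go through the pointer. */
Datum
toy_call1(FmgrInfo *finfo, Datum x)
{
    FunctionCallInfoData fcinfo = {0};

    assert(finfo->fn_oid != InvalidOid);    /* struct must be initialized */
    fcinfo.flinfo = finfo;
    fcinfo.nargs = 1;
    fcinfo.arg[0] = x;
    fcinfo.argnull[0] = false;
    return finfo->fn_addr(&fcinfo);
}
```

A caller would run toy_lookup once and then invoke toy_call1 for each
tuple with the same FmgrInfo.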

context is NULL for an "ordinary" function call, but may point to additional
info when the function is called in certain contexts. (For example, the
trigger manager will pass information about the current trigger event here.)
If context is used, it should point to some subtype of Node; the particular
kind of context is indicated by the node type field. (A callee should
always check the node type before assuming it knows what kind of context is
being passed.) fmgr itself puts no other restrictions on the use of this
field.

resultinfo is NULL when calling any function from which a simple Datum
result is expected. It may point to some subtype of Node if the function
returns more than a Datum. (For example, resultinfo is used when calling a
function that returns a set, as discussed below.) Like the context field,
resultinfo is a hook for expansion; fmgr itself doesn't constrain the use
of the field.

fncollation is the input collation derived by the parser, or InvalidOid
when there are no inputs of collatable types or they don't share a common
collation. This is effectively a hidden additional argument, which
collation-sensitive functions can use to determine their behavior.

nargs, arg[], and argnull[] hold the arguments being passed to the function.
Notice that all the arguments passed to a function (as well as its result
value) will now uniformly be of type Datum. As discussed below, callers
and callees should apply the standard Datum-to-and-from-whatever macros
to convert to the actual argument types of a particular function. The
value in arg[i] is unspecified when argnull[i] is true.

It is generally the responsibility of the caller to ensure that the
number of arguments passed matches what the callee is expecting; except
for callees that take a variable number of arguments, the callee will
typically ignore the nargs field and just grab values from arg[].
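
The rule that a callee checks the node type before trusting context can
be sketched as follows. This is a toy model under stated assumptions: the
node tags, ToyTriggerData, toy_node_tag, toy_callee and toy_call2 are
hypothetical names made up for the example, and the call struct carries
only the fields the sketch needs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Datum;

/* Toy node tags; the real backend has its own, much longer, enum. */
typedef enum
{
    T_Invalid = 0,
    T_ToyTriggerData,
    T_ToyOtherData
} NodeTag;

typedef struct Node
{
    NodeTag     type;
} Node;

/* A context subtype: the tag is the first field, as for every node. */
typedef struct ToyTriggerData
{
    NodeTag     type;
    int         event;      /* hypothetical trigger event code */
} ToyTriggerData;

#define TOY_MAX_ARGS 4

typedef struct FunctionCallInfoData
{
    Node       *context;
    bool        isnull;
    short       nargs;
    Datum       arg[TOY_MAX_ARGS];
    bool        argnull[TOY_MAX_ARGS];
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;

/* Read the tag through a pointer to the struct's first member. */
NodeTag
toy_node_tag(const void *nodeptr)
{
    return *(const NodeTag *) nodeptr;
}

/*
 * Callee: when called with a trigger context, return the event code;
 * for any other context (or none), treat it as an ordinary two-argument
 * call and return arg[0] + arg[1].
 */
Datum
toy_callee(FunctionCallInfo fcinfo)
{
    if (fcinfo->context != NULL &&
        toy_node_tag(fcinfo->context) == T_ToyTriggerData)
    {
        ToyTriggerData *td = (ToyTriggerData *) fcinfo->context;

        return (Datum) td->event;
    }
    return fcinfo->arg[0] + fcinfo->arg[1];
}

/* Caller: it is the caller's job to pass the expected argument count. */
Datum
toy_call2(Node *context, Datum a, Datum b)
{
    FunctionCallInfoData fcinfo = {0};

    assert(2 <= TOY_MAX_ARGS);
    fcinfo.context = context;
    fcinfo.nargs = 2;
    fcinfo.arg[0] = a;
    fcinfo.argnull[0] = false;
    fcinfo.arg[1] = b;
    fcinfo.argnull[1] = false;
    return toy_callee(&fcinfo);
}
```

Note that an unrecognized context node is simply ignored here; a real
callee might prefer to raise an error instead.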

The isnull field will be initialized to "false" before the call. On
return from the function, isnull is the null flag for the function result:
if it is true the function's result is NULL, regardless of the actual
function return value. Note that simple "strict" functions can ignore
both isnull and argnull[], since they won't even get called when there
are any TRUE values in argnull[].

FunctionCallInfo replaces FmgrValues plus a bunch of ad-hoc parameter
conventions, global variables (fmgr_pl_finfo and CurrentTriggerData at
least), and other uglinesses.


Callees, whether they be individual functions or function handlers,
shall always have this signature:

Datum function (FunctionCallInfo fcinfo);

which is represented by the typedef

typedef Datum (*PGFunction) (FunctionCallInfo fcinfo);

The function is responsible for setting fcinfo->isnull appropriately
as well as returning a result represented as a Datum. Note that since
all callees will now have exactly the same signature, and will be called
through a function pointer declared with exactly that signature, we
should have no portability or optimization problems.


Function Coding Conventions
---------------------------

Here are the proposed macros and coding conventions:

The definition of an fmgr-callable function will always look like

Datum
function_name(PG_FUNCTION_ARGS)
{
    ...
}

"PG_FUNCTION_ARGS" just expands to "FunctionCallInfo fcinfo". The main
reason for using this macro is to make it easy for scripts to spot function
definitions. However, if we ever decide to change the calling convention
again, it might come in handy to have this macro in place.

A nonstrict function is responsible for checking whether each individual
argument is null or not, which it can do with PG_ARGISNULL(n) (which is
just "fcinfo->argnull[n]").
It should avoid trying to fetch the value
of any argument that is null.

Both strict and nonstrict functions can return NULL, if needed, with
    PG_RETURN_NULL();
which expands to
    { fcinfo->isnull = true; return (Datum) 0; }

Argument values are ordinarily fetched using code like
    int32 name = PG_GETARG_INT32(number);

For float4, float8, and int8, the PG_GETARG macros will hide whether the
types are pass-by-value or pass-by-reference. For example, if float8 is
pass-by-reference then PG_GETARG_FLOAT8 expands to
    (* (float8 *) DatumGetPointer(fcinfo->arg[number]))
and would typically be called like this:
    float8 arg = PG_GETARG_FLOAT8(0);
For what are now historical reasons, the float-related typedefs and macros
express the type width in bytes (4 or 8), whereas we prefer to label the
widths of integer types in bits.

Non-null values are returned with a PG_RETURN_XXX macro of the appropriate
type. For example, PG_RETURN_INT32 expands to
    return Int32GetDatum(x)
PG_RETURN_FLOAT4, PG_RETURN_FLOAT8, and PG_RETURN_INT64 hide whether their
data types are pass-by-value or pass-by-reference, by doing a palloc if
needed.

fmgr.h will provide PG_GETARG and PG_RETURN macros for all the basic data
types. Modules or header files that define specialized SQL datatypes
(eg, timestamp) should define appropriate macros for those types, so that
functions manipulating the types can be coded in the standard style.

For non-primitive data types (particularly variable-length types) it won't
be very practical to hide the pass-by-reference nature of the data type,
so the PG_GETARG and PG_RETURN macros for those types won't do much more
than DatumGetPointer/PointerGetDatum plus the appropriate typecast (but see
TOAST discussion, below). Functions returning such types will need to
palloc() their result space explicitly.
I recommend naming the GETARG and
RETURN macros for such types to end in "_P", as a reminder that they
produce or take a pointer. For example, PG_GETARG_TEXT_P yields "text *".

When a function needs to access fcinfo->flinfo or one of the other auxiliary
fields of FunctionCallInfo, it should just do it. I doubt that providing
syntactic-sugar macros for these cases is useful.


Support for TOAST-Able Data Types
---------------------------------

For TOAST-able data types, the PG_GETARG macro will deliver a de-TOASTed
data value. There might be a few cases where the still-toasted value is
wanted, but the vast majority of cases want the de-toasted result, so
that will be the default. To get the argument value without causing
de-toasting, use PG_GETARG_RAW_VARLENA_P(n).

Some functions require a modifiable copy of their input values. In these
cases, it's silly to do an extra copy step if we copied the data anyway
to de-TOAST it. Therefore, each toastable datatype has an additional
fetch macro, for example PG_GETARG_TEXT_P_COPY(n), which delivers a
guaranteed-fresh copy, combining this with the detoasting step if possible.

There is also a PG_FREE_IF_COPY(ptr,n) macro, which pfree's the given
pointer if and only if it is different from the original value of the n'th
argument. This can be used to free the de-toasted value of the n'th
argument, if it was actually de-toasted. Currently, doing this is not
necessary for the majority of functions because the core backend code
releases temporary space periodically, so that memory leaked in function
execution isn't a big problem. However, as of 7.1 memory leaks in
functions that are called by index searches will not be cleaned up until
end of transaction.
Therefore, functions that are listed in pg_amop or
pg_amproc should be careful not to leak detoasted copies, and so these
functions do need to use PG_FREE_IF_COPY() for toastable inputs.

A function should never try to re-TOAST its result value; it should just
deliver an untoasted result that's been palloc'd in the current memory
context. When and if the value is actually stored into a tuple, the
tuple toaster will decide whether toasting is needed.


Functions Accepting or Returning Sets
-------------------------------------

If a function is marked in pg_proc as returning a set, then it is called
with fcinfo->resultinfo pointing to a node of type ReturnSetInfo. A
function that desires to return a set should raise an error "called in
context that does not accept a set result" if resultinfo is NULL or does
not point to a ReturnSetInfo node.

There are currently two modes in which a function can return a set result:
value-per-call, or materialize. In value-per-call mode, the function returns
one value each time it is called, and finally reports "done" when it has no
more values to return. In materialize mode, the function's output set is
instantiated in a Tuplestore object; all the values are returned in one call.
Additional modes might be added in future.

ReturnSetInfo contains a field "allowedModes" which is set (by the caller)
to a bitmask that's the OR of the modes the caller can support. The actual
mode used by the function is returned in another field "returnMode". For
backwards-compatibility reasons, returnMode is initialized to value-per-call
and need only be changed if the function wants to use a different mode.
The function should ereport() if it cannot use any of the modes the caller is
willing to support.
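
The allowedModes/returnMode handshake can be sketched like this. It is a
toy model, not the backend's code: ToyReturnSetInfo keeps only the two
fields discussed above, the bit values assigned to the mode names are
assumptions made for the sketch, and toy_init_rsinfo and
toy_choose_materialize are hypothetical helpers. Where the sketch returns
false, a real function would ereport().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mode bits; the values here are assumptions made for this sketch. */
typedef enum
{
    SFRM_ValuePerCall = 0x01,
    SFRM_Materialize = 0x02
} SetFunctionReturnMode;

/* Cut-down stand-in for ReturnSetInfo. */
typedef struct ToyReturnSetInfo
{
    int         allowedModes;           /* set by caller: OR of mode bits */
    SetFunctionReturnMode returnMode;   /* set by function if it changes */
} ToyReturnSetInfo;

/* Caller side: advertise supported modes, default to value-per-call. */
void
toy_init_rsinfo(ToyReturnSetInfo *rsinfo, int allowedModes)
{
    assert(rsinfo != NULL);
    rsinfo->allowedModes = allowedModes;
    rsinfo->returnMode = SFRM_ValuePerCall;
}

/*
 * Function side: a function that can only materialize its result.
 * Returns true after recording the chosen mode, or false when the
 * caller cannot accept a set at all or does not allow materialize mode.
 */
bool
toy_choose_materialize(ToyReturnSetInfo *rsinfo)
{
    if (rsinfo == NULL)
        return false;       /* context does not accept a set result */
    if ((rsinfo->allowedModes & SFRM_Materialize) == 0)
        return false;       /* no mode in common with the caller */
    rsinfo->returnMode = SFRM_Materialize;
    return true;
}
```

A function that is happy with value-per-call mode does not need to touch
returnMode at all, since the caller already initialized it that way.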

Value-per-call mode works like this: ReturnSetInfo contains a field
"isDone", which should be set to one of these values:

    ExprSingleResult    /* expression does not return a set */
    ExprMultipleResult  /* this result is an element of a set */
    ExprEndResult       /* there are no more elements in the set */

(the caller will initialize it to ExprSingleResult). If the function simply
returns a Datum without touching ReturnSetInfo, then the call is over and a
single-item set has been returned. To return a set, the function must set
isDone to ExprMultipleResult for each set element. After all elements have
been returned, the next call should set isDone to ExprEndResult and return a
null result. (Note it is possible to return an empty set by doing this on
the first call.)

Value-per-call functions MUST NOT assume that they will be run to completion;
the executor might simply stop calling them, for example because of a LIMIT.
Therefore, it's unsafe to attempt to perform any resource cleanup in the
final call. It's usually not necessary to clean up memory, anyway. If it's
necessary to clean up other types of resources, such as file descriptors,
one can register a shutdown callback function in the ExprContext pointed to
by the ReturnSetInfo node. (But note that file descriptors are a limited
resource, so it's generally unwise to hold those open across calls; SRFs
that need file access are better written to do it in a single call using
Materialize mode.)

Materialize mode works like this: the function creates a Tuplestore holding
the (possibly empty) result set, and returns it. There are no multiple calls.
The function must also return a TupleDesc that indicates the tuple structure.
The Tuplestore and TupleDesc should be created in the context
econtext->ecxt_per_query_memory (note this will *not* be the context the
function is called in).
The function stores pointers to the Tuplestore and
TupleDesc into ReturnSetInfo, sets returnMode to indicate materialize mode,
and returns null. isDone is not used and should be left at ExprSingleResult.

The Tuplestore must be created with randomAccess = true if
SFRM_Materialize_Random is set in allowedModes, but it can (and preferably
should) be created with randomAccess = false if not. Callers that can support
both ValuePerCall and Materialize mode will set SFRM_Materialize_Preferred,
or not, depending on which mode they prefer.

If available, the expected tuple descriptor is passed in ReturnSetInfo;
in other contexts the expectedDesc field will be NULL. The function need
not pay attention to expectedDesc, but it may be useful in special cases.

There is no support for functions accepting sets; instead, the function will
be called multiple times, once for each element of the input set.


Notes About Function Handlers
-----------------------------

Handlers for classes of functions should find life much easier and
cleaner in this design. The OID of the called function is directly
reachable from the passed parameters; we don't need the global variable
fmgr_pl_finfo anymore. Also, by modifying fcinfo->flinfo->fn_extra,
the handler can cache lookup info to avoid repeat lookups when the same
function is invoked many times. (fn_extra can only be used as a hint,
since callers are not required to re-use an FmgrInfo struct.
But in performance-critical paths they normally will do so.)

If the handler wants to allocate memory to hold fn_extra data, it should
NOT do so in CurrentMemoryContext, since the current context may well be
much shorter-lived than the context where the FmgrInfo is. Instead,
allocate the memory in context flinfo->fn_mcxt, or in a long-lived cache
context.
fn_mcxt normally points at the context that was
CurrentMemoryContext at the time the FmgrInfo structure was created;
in any case it is required to be a context at least as long-lived as the
FmgrInfo itself.
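
The fn_extra caching pattern described above can be sketched in a toy
model. Everything here is a stand-in made up for illustration:
ToyMemoryContext only counts allocations and toy_alloc wraps malloc,
whereas a real handler would allocate with MemoryContextAlloc() in
flinfo->fn_mcxt. ToyCache, toy_handler and toy_expensive_lookups are
likewise hypothetical names, and the "lookup" is faked with arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef uintptr_t Datum;
typedef unsigned int Oid;

/* Toy memory context: it only counts how many allocations were made. */
typedef struct ToyMemoryContext
{
    int         nallocs;
} ToyMemoryContext;

void *
toy_alloc(ToyMemoryContext *cxt, size_t size)
{
    void       *ptr = malloc(size);

    if (ptr == NULL)
        abort();
    cxt->nallocs++;
    return ptr;
}

/* Cut-down stand-ins for FmgrInfo and FunctionCallInfoData. */
typedef struct FmgrInfo
{
    Oid         fn_oid;
    void       *fn_extra;           /* NULL until the handler fills it */
    ToyMemoryContext *fn_mcxt;      /* lives at least as long as FmgrInfo */
} FmgrInfo;

typedef struct FunctionCallInfoData
{
    FmgrInfo   *flinfo;
    Datum       arg[1];
} FunctionCallInfoData;
typedef FunctionCallInfoData *FunctionCallInfo;

/* What the handler caches per looked-up function (hypothetical). */
typedef struct ToyCache
{
    Oid         fn_oid;
    Datum       multiplier;
} ToyCache;

int         toy_expensive_lookups = 0;  /* counts simulated lookups */

Datum
toy_handler(FunctionCallInfo fcinfo)
{
    FmgrInfo   *flinfo = fcinfo->flinfo;
    ToyCache   *cache = (ToyCache *) flinfo->fn_extra;

    if (cache == NULL)
    {
        /* First call with this FmgrInfo: build the cache in fn_mcxt. */
        assert(flinfo->fn_mcxt != NULL);
        cache = toy_alloc(flinfo->fn_mcxt, sizeof(ToyCache));
        toy_expensive_lookups++;
        cache->fn_oid = flinfo->fn_oid;
        cache->multiplier = (Datum) (flinfo->fn_oid % 10);
        flinfo->fn_extra = cache;
    }
    return fcinfo->arg[0] * cache->multiplier;
}
```

Because callers are not required to reuse an FmgrInfo, the handler must
still work when every call arrives with fn_extra set to NULL; the cache
only saves work when the struct is reused.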