1.. _rfc-28:
2
3================================================================================
4RFC 28: OGR SQL Generalized Expressions
5================================================================================
6
7Author: Frank Warmerdam
8
9Contact: warmerdam@pobox.com
10
11Status: Adopted, Implemented
12
13Summary
14-------
15
16The OGR SQL evaluation engine currently does not allow general purpose
17functions to be applied to columns in SELECT statements. Some special
18purpose functions are supported (ie. CAST, COUNT, AVG, MAX, MIN, and
19SUM), but not as part of more general expressions and generally in very
20constrained arrangements. It is the intent of this work item to extend
21the OGR SQL engine to support fairly general purpose expression
22evaluation in the output field list of OGR SQL SELECT statements and to
23implement a few preliminary processing functions in a fashion compatible
24with standard SQL. As well, expressions used in WHERE clauses will be
25generalized to support evaluation of non-logical operations, such as
26math and functions. For example, after implementation it is intended the
27following could be evaluated.
28
29::
30
31   SELECT CONCAT(first_name, ' ', last_name) AS full_name FROM customers
32   SELECT id, "Regional Road" AS roadtypename FROM roads where roadtype=3
33   SELECT (subtotal+salestax) as totalcost from invoice_info where 100 <= (subtotal+salestax)
34
35A prototype implementation is now available for review in
36`http://svn.osgeo.org/gdal/sandbox/warmerdam/gdal-rfc28 <http://svn.osgeo.org/gdal/sandbox/warmerdam/gdal-rfc28>`__
37
38Technical Approach
39------------------
40
41Currently logical expressions take a very constrained format with the
42base elements having to be of the form * <constant_value>*. As part of
43the generalization non-logical expressions will be supported and the
44left and right side of operators will be equally treated. The current
45OGR SQL parser is ad hoc and cannot be practically extended to this
46generalized form of expression. So at this point we will move to a
47yacc/bison based parser grammar for expressions.
48
49Since it is not really practical to continue to use the existing ad hoc
50SELECT parsing when parts of the SELECT statement are expressions, the
51yacc/bison based parser will also be used to parse the whole SELECT
52statement.
53
54The current expression node will be generalized to have 0-n children
55(for arguments to functions), and to treat field references and constant
56values as distinct leaf nodes rather than embedding this information in
57a node defining an operation.
58
59It should be noted that as a side effect WHERE clauses will also support
60more general expressions - not just logical comparisons. For instance:
61
62SELECT \* WHERE (subtotal+salestax) > 100.0
63
64New Functions
65-------------
66
67-  Math: +, -, \*, /, \*\*
68-  String: CONCAT, SUBSTR
69
70SELECT Rules
71------------
72
73::
74
75   SELECT <field-list> FROM <table_def>
76        [LEFT JOIN <table_def>
77         ON [<table_ref>.]<key_field> = [<table_ref>.].<key_field>]*
78        [WHERE <where-expr>]
79        [ORDER BY <sort specification list>]
80
81   <field-list> ::= <column-spec> [ { , <column-spec> }... ]
82
83   <column-spec> ::= <field-spec> [ <as clause> ]
84                    | CAST ( <field-spec> AS <data type> ) [ <as clause> ]
85
86   <field-spec> ::= [DISTINCT] <field_ref>
87                    | <cumm-field-func> ( [DISTINCT] <field-ref> )
88                    | <field-expr>
89                    | Count(*)
90
91   <field-expr> ::= <field_ref>
92                    | <constant-value>
93                    | <field-expr> <field-operator> <field-expr>
94                    | <field-func> ( <field-expr-list> )
95                    | ( <field-expr> )
96
97   <field-expr-list> ::= field-expr
98                    |  field-expr , field-expr-list
99                    |  <empty>
100
101   <as clause> ::= [ AS ] <column_name>
102
103   <data type> ::= character [ ( field_length ) ]
104                   | float [ ( field_length ) ]
105                   | numeric [ ( field_length [, field_precision ] ) ]
106                   | integer [ ( field_length ) ]
107                   | date [ ( field_length ) ]
108                   | time [ ( field_length ) ]
109                   | timestamp [ ( field_length ) ]
110
111   <cumm-field-func> ::= AVG | MAX | MIN | SUM | COUNT
112
113   <field-operator> ::= '+' | '-' | '/' | '*' | '||'
114
115   <field-func> ::= CONCAT | SUBSTR
116
117   <field_ref>  ::= [<table_ref>.]field_name
118
119Special Notes
120-------------
121
122The existing CAST, and column summary functions COUNT, AVG, MIN, MAX and
123SUM will be treated more-or-less as functions but constrained to be root
124operations on column definitions and treated as a special case (still).
125
126Compatibility Implications
127--------------------------
128
129Some identifiers that were previously allowed as unquoted field names
130will likely now have to be quoted as they will be keywords in the
131grammar. The keyword set is:
132
133-  IN
134-  LIKE
135-  NULL
136-  IS
137-  SELECT
138-  LEFT
139-  JOIN
140-  WHERE
141-  ON
142-  ORDER
143-  BY
144-  FROM
145-  AS
146-  ASC
147-  DESC
148-  DISTINCT
149-  CAST
150
151The previous implementation was written in C and avoided all use of
152GDAL/OGR services so that it could be easily used in other contexts,
153including as the where clause evaluator of the OGDI library. After this
154update the code is C++, and direct use of CPL error and other services
155has been directly incorporated. This means the implementation used by
156GDAL and OGDI will diverge.
157
158For the most part the change results in some OGR SQL statements to work
159that would previously have generated an error.
160
161Performance Implications
162------------------------
163
164I am hopeful that the evaluation speed will not be significantly
165different for simple selections, but each output field will need to be
166evaluated as an expression (with presumably one value-from-field node).
167
168Implementation Plan
169-------------------
170
171Frank Warmerdam will implement, test and document for the GDAL/OGR 1.8
172release.
173
174Testing
175-------
176
177All existing OGR SQL test suite tests should pass. A new
178autotest/ogr/ogr_sql_rfc28.py script will be introduced to test new
179functionality.
180
181Documentation
182-------------
183
184The :ref:`OGR SQL <ogr_sql_dialect>` document will be
185extended to describe the new capabilities.
186