1 #ifndef AGGREGATE_CHECK_INCLUDED
2 #define AGGREGATE_CHECK_INCLUDED
3 
4 /* Copyright (c) 2014, 2021, Oracle and/or its affiliates.
5 
6    This program is free software; you can redistribute it and/or modify
7    it under the terms of the GNU General Public License, version 2.0,
8    as published by the Free Software Foundation.
9 
10    This program is also distributed with certain software (including
11    but not limited to OpenSSL) that is licensed under separate terms,
12    as designated in a particular file or component or in included license
13    documentation.  The authors of MySQL hereby grant you an additional
14    permission to link the program and your derivative works with the
15    separately licensed software that they have included with MySQL.
16 
17    This program is distributed in the hope that it will be useful,
18    but WITHOUT ANY WARRANTY; without even the implied warranty of
19    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
20    GNU General Public License, version 2.0, for more details.
21 
22    You should have received a copy of the GNU General Public License
23    along with this program; if not, write to the Free Software
24    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA */
25 
26 /**
27    @file
28    Checks for some semantic constraints on queries using GROUP
29    BY, or aggregate functions, or DISTINCT. Enforced if
30    sql_mode contains 'only_full_group_by'.
31 */
32 
33 /**
34   @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY
35 
36 Checks for some semantic constraints on queries using GROUP BY, or aggregate functions, or DISTINCT.
37 
38 We call "aggregation" the operation of taking a group of rows and replacing
39 it with a single row. There are three types of aggregation: DISTINCT,
40 implicit grouping, explicit grouping.
41 
42 This text describes MySQL's checks (why certain queries are rejected) and the
43 rationale behind them.
44 
45 References:
46 - WL#2489 "better only_full_group_by".
47 - if you have access to the SQL standard, we recommend the following parts of
48 "SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax
49 rule 7 and Conformance rule 2; 4.19 functional dependencies.
50 
51 @section DISTINCT
52 
53 DISTINCT: all identical rows in the result of SELECT are "aggregated" to
54 one single row - duplicates are eliminated.
55 If the result of the SELECT without DISTINCT is
56 @verbatim
57 1 2
58 3 4
59 1 2
60 @endverbatim
61 then the result with DISTINCT is
62 @verbatim
63 1 2
64 3 4
65 @endverbatim
66 Here is a problematic query. Assume we have a table T which contains three
67 columns C1,C2,C3 and has the following rows:
68 @verbatim
69 C1 C2 C3
70 1  2  A
71 3  4  B
72 1  2  C
73 @endverbatim
74 and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3".
75 Because we want the result to be ordered, we have to first eliminate
76 duplicates then order the result.
77 When we eliminate duplicates, should we keep the first or the third row? This
78 arbitrary choice will influence the retained value of C3, which will influence
79 ordering. In the end, we have arbitrary ordering, which is problematic.
80 To prevent this, if, in a query block 'sl', an ORDER BY expression
81 is not the same expression as one in the SELECT list of 'sl'
82 and contains a column which:
83 - is of a table whose qualifying query block is 'sl'
84 - and is not in the SELECT list of 'sl'
85 
86 then 'sl' should not have DISTINCT.
87 This rule makes the query above invalid.
88 Class Check_distinct implements this rule.
89 
90 Class Check_distinct implements a second rule, specific of set functions in
91 ORDER BY (a non-standard feature).
92 Consider these queries labelled (1), (2), (3), with DISTINCT and a set
93 function in ORDER BY:
94 @verbatim
95 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
96 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2);
97 SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3);
98 @endverbatim
99 (1) is a random-order query, (2) and (3) are not.
100 MySQL has traditionally been permissive, we want to allow (2) and (3).
101 
102 So the second rule is that Check_distinct rejects a query if it has DISTINCT
103 and a set function in ORDER BY which is not the same expression as one in the
104 SELECT list. This rejects (1) and allows (2).
105 It would reject (3); luckily, before Check_distinct checks, DISTINCT is
106 removed (it is redundant with GROUP BY) and thus the query is not processed by
107 Check_distinct; the second rule is thus by-passed and (3) is correctly
108 accepted.
109 
110 The implementation of Check_distinct works like this: if the query has
111 DISTINCT, examine each element of the ORDER BY list: if the element is not the
112 same expression as one in the SELECT list, then examine parts of the element
113 (using Item::walk()), to spot offending set functions or columns.
114 
115 @section IMPLICIT_GROUPING Implicit grouping
116 
117 A query with set functions without GROUP BY can be seen as having GROUP BY()
118 i.e. the set of grouping expressions is empty, all rows are part of a
119 single group and are replaced with a single row. So it is just a sub-case of
120 the next section.
121 
122 @section EXPLICIT_GROUPING Explicit grouping (GROUP BY)
123 
124 MySQL is more permissive than the standard, it allows to group by any
125 expression; it does not require that every element of the GROUP BY clause
126 should be a column of one table of the FROM clause.
127 
128 Here is a problematic query, using a table T with columns C1, C2, C3:
129 @verbatim
130 C1 C2 C3
131 1  2  A
132 3  4  B
133 1  2  C
134 
135 SELECT C1, C3 FROM T GROUP BY C1;
136 @endverbatim
137 we first form groups, in each group the value of C1 must be the same for all
138 rows. Consider the group made of the first and third row. We have to produce
139 one row out of it, and this row must have a value for C3 because C3 is in the
140 SELECT list. Among those two rows, which value of C3 should we choose? This is
141 arbitrary, and will give a random result.
142 
143 To prevent this, in a query with GROUP BY or aggregates (known as "a grouped
144 query"), any column referenced by an expression in the SELECT list or HAVING
145 condition and belonging to one table of the FROM clause, should be one of the
146 group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if
147 functional dependencies are supported (as in MySQL 5.7), should be
148 functionally dependent on the group columns.
149 In the table T above, C3 is obviously not functionally dependent on {C1,C2}:
150 the values of C1 and C2 for a row do not uniquely determine the value of
151 C3. In other words, if two rows have the same value in C1 and the same value
152 in C2 they do not necessarily have the same value in C3.
153 So this rule correctly rejects the query above.
154 
155 Note that NULL is treated as one value: two NULLs are considered equal.
156 
157 In WL#2489, we have implemented the optional standard feature T301
158 "functional dependencies" almost entirely.
159 
160 Here are the functional dependencies which we recognize.
161 
162 @subsection KEYFD Key-based, in a base table
163 
164 A key in this text is a unique constraint made of only non-NULLable
165 columns. For example, a primary key.
166 Considering a base table T, if two rows have the same values of all columns of
167 a key of T they are actually one single row, so:
168 { all columns of key} -> { T.* } (notation: what is on the right of the arrow
169 is functionally dependent on what is on the left).
170 
171 @subsection GCOLFD Generated-column-based, in a base table
172 
173 Considering a base table T, a generated column is functionally dependent on
174 the set of columns it references (the so-called parametric
175 columns). Note that the SQL standard doesn't allow a parametric column
176 to be a generated column, but MySQL does.
177 
178 @subsection INNEREQ Equality-based, in the result of an inner join
179 
180 Considering two tables T1 and T2, a condition C, and the result R of an inner
181 join T1 JOIN T2 ON C.
182 Note that T1/T2 are not necessarily base tables. For example, they can also be
183 views, or derived tables, or the results of some joins; in the rest of text,
184 we use the vague term "table" for those various objects.
185 
186 Note that C may only refer to columns of T1 or T2 (outer references
187 are forbidden by MySQL in join conditions).
188 
189 Assuming that C is a conjunction (i.e. is made of one or more conditions,
190                                   "conjuncts", chained together with AND):
191 - If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the
192 value of A is "the constant" in all rows of R).
193 - If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice
194 versa) holds in R (the value of T2.B is that of T1.A in all rows of R).
195 
196 @subsection OUTEREQ Equality-based, in the result of an outer join
197 
198 Considering the result R of TS LEFT JOIN TW ON C.
199 Assuming that C is a conjunction. TS is said to be the strong table, TW is
200 said to be the weak table (the one which might be NULL-complemented in the
201 result of this join).
202 To make this really clear, note that, if we have
203 "t1 left join (t2 left join t3 on C) on D":
204 - in the t2-t3 join, t2 is strong and t3 is weak.
205 - in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak.
206 
207 If C is deterministic and one conjunct is of the form TW.A=constant or
208 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
209 of TS referenced by C.
210 Proof: consider in R two rows r1 and r2 which have the same values of DJS
211 columns. Consider r1. There are two possibilities when r1 was built from a row
212 of TS:
213 - no row in TW matched the row of TS (for no row of TW has C been true): so,
214 r1 is NULL-complemented for the columns of TW. Given that r2 has the same
215 values of DJS columns as r1, and given that C is deterministic, it is sure
216 that no row in TW matched when r2 was built. So r2 is also NULL-complemented
217 for the columns of TW. So r1 and r2 have the same value of TW.A (NULL).
218 - At least one row in TW matched: so, r1 contains real values from TW (not
219 NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant
220 or to TS.B. Following the same reasoning as above, it is sure that it is also
221 the case for r2.
222 - In conclusion, we can see that r1 and r2 always have the same value of
223 TW.A.
224 
225 If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R
226 Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two
227 cases are possible:
228 - this value is NULL. Then both rows are NULL-complemented (if not, the
229 value of TW.A in TW would be NULL, which cannot match in an equality, so C is
230 not true, which is absurd). Thus, in r1 and r2 TW.B is NULL.
231 - This value is not NULL. Then both rows are not NULL-complemented, C
232 matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2.
233 - In conclusion, r1 and r2 have the same value of TW.B.
234 
235 @subsection WHEREEQ Equality-based, in the result of a WHERE clause
236 
237 Same rule as the result of an inner join.
238 Additionally, the rule is extended to T1.A=outer_reference, because an outer
239 reference is a constant during one execution of this query.
240 
241 Below we examine how functional dependencies in a table propagate to its
242 containing join nest.
243 
244 @subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C.
245 
246 All functional dependencies in TS are also functional dependencies in R.
247 Proof: trivial.
248 The same is not necessarily true for TW.
249 Let's define a "NULL-friendly functional dependency" (NFFD) as a functional
250 dependency between two sets A and B, which has two properties:
251 - A is not empty
252 - if, in a row, all values of A are NULL, then all values of B are NULL.
253 
254 All NFFDs in TW are also NFFDs in R.
255 Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the
256 same values of A columns. Two cases are possible:
257 - In r1 and r2, at least one value of A is not NULL. Then r1 is not
258 NULL-complemented. Its values for A and B come from TW. By application of the
259 functional dependency in TW, because values in A are equal in TW, values in B
260 are equal in TW and thus in r1/r2.
261 - In r1 and r2, all values of A are NULL. Two cases are possible:
262 i) r1 is not NULL-complemented.  Its values for A and B come from TW. In the
263 row of TW values of A are all NULL. Because the functional dependency in
264 NULL-friendly, all values of B are NULL in the row of TW and thus in r1.
265 ii) r1 is NULL-complemented. Then all values of B in r1 are NULL.
266 iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies
267 to r2. So, values of B are identical (NULL) in r1 and r2.
268 - In conclusion, values of B are identical in r1/r2, we have proved that
269 this is a functional dependency in R, and a NULL-friendly one.
270 
271 The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to
272 have easy propagation of FDs from TW to R. It was preferred to the
273 alternative, simpler rule which says that a functional dependency A-> B in TW
274 is also a functional dependency in R if A contains a non-nullable
275 column. There are two points to note:
276 - the functional dependency of the simpler rule is an NFFD, so our rule is not
277 more restrictive than the simpler one
278 - this simpler rule prevents free propagation of functional dependencies
279 through join nests, which complicates implementation and leads to rejecting
280 queries which could be accepted. An example follows:
281 @verbatim
282 SELECT T3.A
283 FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON  TRUE) ON  TRUE
284 GROUP BY T3.PK;
285 @endverbatim
286 This is what the simpler rule says for this query:
287 * In T3, T3.PK->T3.A holds.
288 * Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A
289 holds, by application of: there is a functional dependency in the
290 weak side T3, and T3.PK is not nullable in T3.
291 * Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE",
292 in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the
293 weak side (weak side is R1 here), and T3.PK is nullable _when
294 seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3
295 is actually a NULL-complemented one).
296 
297 @subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C.
298 
299 All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly]
300 functional dependencies in R. the same is true for T2.
301 Proof: trivial.
302 
303 @subsection PROPAGSUMMARY Summary rules for propagating FDs
304 
305 All NULL-friendly functional dependencies propagate freely through join nests
306 all the way up to the result of the WHERE clause.
307 The same is true for ordinary functional dependencies except if there are weak
308 tables along the propagation path between the table where the dependency holds
309 and the result of the WHERE clause; in other words, except if the table where
310 the dependency holds belongs to some embedding join nest which is on the weak
311 side of an outer join.
312 
313 @subsection NFFDS Which functional dependencies are NULL-friendly
314 
315 A key-based functional dependency A -> B in the base table is NULL-friendly,
316 because, by definition, there can never be a NULL value in any column of A.
317 
318 A functional dependency A -> B in a base table, between parametric columns A
319 and a generated column B, is not NULL-friendly; for more details, see
320 @ref FDVIEW .
321 
322 A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on
323 equality of two columns, is NULL-friendly. Indeed, A is not empty and if there
324 was some NULL in A, it would not match the equality in C and thus it would not
325 exist in the result, absurd. If the equality is rather column=constant, A is
326 empty, the dependency is not NULL-friendly. However, in our implementation,
327 function @c simplify_joins() moves inner-join conditions to the embedding
328 outer-join nest's join condition, or to the WHERE clause. Because our analysis
329 of functional dependencies happens after simplify_joins(), when we analyze T1
330 JOIN T2 it is guaranteed to have no condition, and this paragraph is
331 irrelevant.
332 
333 A functional dependency in the result of TS LEFT JOIN TW ON C, if based on
334 equality of two columns, is NULL-friendly.
335 Proof: let's consider, in turn, the two types of equality-based functional
336 dependencies which exist in this result R. Let r1 be a row of R.
337 - If C is deterministic and one conjunct is of the form TW.A=constant or
338 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns
339 of TS referenced by C. For NULL-friendliness, we need DJS to not be
340 empty. Thus, we exclude the form TW.A= constant and consider only
341 TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B
342 then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in
343 r1.
344 - If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in
345 R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is
346 NULL in R1.
347 - In conclusion, this is NULL-friendly.
348 
349 A functional dependency in the result of a WHERE clause, if based on equality
350 of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it
351 has an empty set of source columns.
352 
353 Summary: all functional dependencies which we have seen so far are
354 NULL-friendly, except those inferred from TW.A=constant in an outer join
355 condition or in a WHERE clause, and those about generated columns.
356 
357 Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is
358 NULL-friendly and propagates, query is accepted.
359 
360 In our implementation, we take special care of TW.A=constant in an outer join
361 condition: we infer a functional dependency DJS->TW.A from such equality only
362 if one of these requirements are met:
363 - the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the
364 weak side of any embedding join nest - in that case, propagation will not meet
365 any weak tables so we do not need the dependency to be NULL-friendly, it will
366 propagate anyway.
367 - DJS contains at least one column from a strong-side table which, if NULL,
368 makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is
369 NULL-friendly.
370 
371 Either of these two conditions is satisfied in most practical cases. For
372 example, it's very common to have an equality between a strong-side column and
373 a weak-side column as a conjunct in the outer join condition (like, "ON
374 strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND
375 ..."); this satisfies the second condition. It's also common to have outer
376 joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..."
377 is left-deep); this satisfies the first condition.
378 Note that the dependency found from TW.A=TS.B in an outer join condition
379 always satisfies the second condition.
380 
381 T1.A=constant in a WHERE clause is exterior to any join nest so does not need
382 to propagate, so does not need to be NULL-friendly.
383 
384 @subsection FDVIEW Functional dependencies in a view or a derived table
385 
386 In the rest of this text, we will use the term "view" for "view or derived
387 table". A view can be merged or materialized, in MySQL.
388 Consider a view V defined by a query expression.
389 If the query expression contains UNION or ROLLUP (which is based on UNION)
390 there are no functional dependencies in this view.
391 So let's assume that the query expression is a query specification (let's note
392 it VS):
393 @verbatim
394 CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ... [GROUP BY ...] [HAVING ...] [ORDER BY ...]
395 @endverbatim
396 
397 If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1,
398 VE2} -> {VE3} has been deduced from rules in the previous sections [and is
399 NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is
400 NULL-friendly].
401 
402 If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a
403 deterministic expression depending only on VE1 and VE2, then {C1, C2} ->
404 { C3 } in the view.
405 It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3):
406 if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example:
407 VE3 could be a literal; {}->{C3}, the left set is empty.
408 The same examples apply to a generated column in a base table - it is like a
409 merged view's expression. For example, in a base table T which has a generated
410 column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not
411 NULL-friendly.
412 
413 If VS is a grouped query (which, in MySQL, implies that the view is
414 materialized), then in the result of the grouping there is a functional
415 dependency from the set of all group expressions to the set of all selected
416 expressions (otherwise, this query expression would not have passed its own
417 only_full_group_by validation - in the implementation we validate each query
418 expression separately). Thus, if all group expressions of VS are in the select
419 list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}.
420 It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the
421 WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL,
422 VE3 is not NULL.
423 
424 It's possible to infer functional dependencies from equality conditions in
425 HAVING, but we have not implemented it yet.
426 
427 Because some functional dependencies above are not NULL-friendly, they
428 exist in the view, but may not freely propagate in the result of join nests
429 containing the view. This includes examples just given in paragraphs above,
430 and the case of T1.A=constant in the WHERE clause of VS.
431 
432 Thus, when we know of a functional dependency A -> B in the query expression
433 of a view, we deduce from it a functional dependency in the view only if:
434 - this view is not on the weak side of any embedding join nest (so
435 NULL-friendliness is not required for propagation).
436 - or A contains at least one non-nullable expression, which makes A -> B
437 NULL-friendly.
438 
439 The above is fine for materialized views. For merged views, we cannot study the
440 query expression of the view, it has been merged (and scattered), so we use a
441 different rule:
442 - a merged view is similar to a join nest inserted in the parent query, so
443 for dependencies based on keys or join conditions, we simply follow
444 propagation rules of the non-view sections.
445 - For expression-based dependencies (VE3 depends on VE1 and VE2, VE3
446 belonging to the view SELECT list), which may not be NULL-friendly, we require
447 - the same non-weak-side criterion as above
448 - or that the left set of the dependency be non-empty and that if VE1 and
449 VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly.
450 - The same solution is used for generated columns in a base table.
451 
452   @{
453 
454 */
455 
456 #include "my_global.h"
457 #include "mem_root_array.h"
458 #include "opt_trace.h"
459 #include "item_cmpfunc.h"
460 #include "item_sum.h"
461 struct st_mem_root;
462 class st_select_lex;
463 struct TABLE_LIST;
464 
465 /**
466    Re-usable shortcut, when it does not make sense to do copy objects of a
467    class named "myclass"; add this to a private section of the class. The
468    implementations are intentionally not created, so if someone tries to use
469    them like in "myclass A= B" there will be a linker error.
470 */
471 #define FORBID_COPY_CTOR_AND_ASSIGN_OP(myclass) \
472   myclass(myclass const &);                     \
473   void operator=(myclass const &)
474 
475 
476 /**
477    Utility mixin class to be able to walk() only parts of item trees.
478 
479    Used with WALK_PREFIX+POSTFIX walk: in the prefix call of the Item
480    processor, we process the item X, may decide that its children should not
481    be processed (just like if they didn't exist): processor calls stop_at(X)
482    for that. Then walk() goes to a child Y; the processor tests is_stopped(Y)
483    which returns true, so processor sees that it must not do any processing
484    and returns immediately. Finally, the postfix call to the processor on X
485    tests is_stopped(X) which returns "true" and understands that the
486    not-to-be-processed children have been skipped so calls restart(). Thus,
487    any sibling of X, any part of the Item tree not under X, can then be
488    processed.
489 */
490 class Item_tree_walker
491 {
492 protected:
Item_tree_walker()493   Item_tree_walker() : stopped_at_item(NULL) {}
~Item_tree_walker()494   ~Item_tree_walker() { assert(!stopped_at_item); }
495 
496   /// Stops walking children of this item
stop_at(const Item * i)497   void stop_at(const Item *i)
498   { assert(!stopped_at_item); stopped_at_item= i; }
499 
500   /**
501      @returns if we are stopped. If item 'i' is where we stopped, restarts the
502      walk for next items.
503   */
is_stopped(const Item * i)504   bool is_stopped(const Item *i)
505   {
506     if (stopped_at_item)
507     {
508       /*
509         Walking was disabled for a tree part rooted a one ancestor of 'i' or
510         rooted at 'i'.
511       */
512       if (stopped_at_item == i)
513       {
514         /*
515           Walking was disabled for the tree part rooted at 'i'; we have now just
516           returned back to this root (WALK_POSTFIX call), left the tree part:
517           enable the walk again, for other tree parts.
518         */
519         stopped_at_item= NULL;
520       }
521       // No further processing to do for this item:
522       return true;
523     }
524     return false;
525   }
526 
527 private:
528   const Item *stopped_at_item;
529   FORBID_COPY_CTOR_AND_ASSIGN_OP(Item_tree_walker);
530 };
531 
532 
533 /**
534    Checks for queries which have DISTINCT.
535 */
536 class Distinct_check: public Item_tree_walker, public Sql_alloc
537 {
538 public:
539 
Distinct_check(st_select_lex * select_arg)540   Distinct_check(st_select_lex *select_arg)
541     : select(select_arg), failed_ident(NULL)
542   {}
543 
544   bool check_query(THD *thd);
545 
546 private:
547 
548   /// Query block which we are validating
549   st_select_lex *const select;
550   /// Identifier which triggered an error
551   Item_ident *failed_ident;
552 
553   /**
554      Just because we need to go through Item::walk() to reach all items to
555      validate, some work must be delegated to "Item processors" (members of
556      Item); this work conceptually belongs to Distinct_check, and needs
557      privileged access to it.
558   */
559   friend bool Item_sum::aggregate_check_distinct(uchar *arg);
560   friend bool Item_ident::aggregate_check_distinct(uchar *arg);
561   friend bool Item_func_any_value::aggregate_check_distinct(uchar *arg);
562 
563   FORBID_COPY_CTOR_AND_ASSIGN_OP(Distinct_check);
564 };
565 
566 
567 /**
568    Checks for queries which have GROUP BY or aggregate functions.
569 */
570 class Group_check: public Item_tree_walker, public Sql_alloc
571 {
572 public:
573 
Group_check(st_select_lex * select_arg,st_mem_root * root)574   Group_check(st_select_lex *select_arg, st_mem_root *root)
575     : select(select_arg), search_in_underlying(false),
576     non_null_in_source(false),
577     table(NULL), group_in_fd(~0ULL), m_root(root), fd(root),
578     whole_tables_fd(0), mat_tables(root), failed_ident(NULL)
579     {}
580 
~Group_check()581   ~Group_check()
582   {
583     for (uint j= 0; j < mat_tables.size(); ++j)
584       delete mat_tables.at(j);
585   }
586 
587   bool check_query(THD *thd);
588   void to_opt_trace(THD *thd);
589 
590 private:
591 
592   /// Query block which we are validating
593   st_select_lex *const select;
594 
595   /**
596      "Underlying" == expressions which are underlying in an identifier.
597      The identifier can be a column of a view or derived table, both merged or
598      materialized, or a generated column: all of those have an underlying
599      expression.
600      "Materialized table (mat table)" == view or derived table, materialized.
601      If this is true, is_in_fd() will look for FDs in underlying expressions
602      of columns.
603   */
604   bool search_in_underlying;
605 
606   /**
607      This member is readable only if this is a child Group_check. Is true if
608      one expression of the SELECT list of this->select is non-nullable.
609   */
610   bool non_null_in_source;
611 
612   /**
613      The Group_check employed to validate one query block, the one on which
614      check_query() runs, is named the "master".
615      If the query block references a materialized table, the master may create
616      a child Group_check, whose job is to discover FDs in the query expression
617      of the mat table (with the ultimate goal of deducing from them some FDs
618      in the mat table and thus in the parent Group_check). A child may have
619      children itself. If this Group_check is a child, 'table' points to the
620      materialized table, otherwise it is NULL.
621   */
622   TABLE_LIST *const table;
623 
624   /**
625      Bit N is set if the N-th expression of GROUP BY is functionally dependent
626      on source columns.
627      It serves to find FDs in the query expression of a materialized table
628      having GROUP BY.
629      For a non-child Group-check, all bits are turned on from the start.
630      Currently limited to 64 bits => max 64 GROUP expressions; should probably
631      be MY_BITMAP.
632   */
633   ulonglong group_in_fd;
634 
635   /// Memory for allocations (like of 'fd')
636   st_mem_root *const m_root;
637 
638   /**
639      Columns which are local to 'select' and functionally dependent on an
640      initial set of "source" columns defined like this:
641      - if !is_child(), the GROUP BY columns
642      - if is_child(), columns of the result of the query expression under
643      'table' which are themselves part of 'fd' of the parent Group_check.
644   */
645   Mem_root_array<Item_ident *, true> fd;
646   /// Map of tables for which all columns can be considered part of 'fd'.
647   table_map whole_tables_fd;
648   /// Children Group_checks of 'this'
649   Mem_root_array<Group_check *, true> mat_tables;
650   /// Identifier which triggered an error
651   Item_ident *failed_ident;
652 
653   bool is_fd_on_source(Item *item);
is_child()654   bool is_child() const { return table != NULL; }
655 
656   /// Private ctor, for a Group_check to build a child Group_check
Group_check(st_select_lex * select_arg,st_mem_root * root,TABLE_LIST * table_arg)657   Group_check(st_select_lex *select_arg, st_mem_root *root,
658               TABLE_LIST *table_arg)
659     : select(select_arg), search_in_underlying(false),
660     non_null_in_source(false), table(table_arg),
661     group_in_fd(0ULL), m_root(root), fd(root), whole_tables_fd(0),
662     mat_tables(root)
663     {
664       assert(table);
665     }
666   bool check_expression(THD *thd, Item *expr, bool in_select_list);
667   /// Shortcut for common use of Item::local_column()
local_column(Item * item)668   bool local_column(Item *item) const
669   { return item->local_column(select).is_true(); }
670   void add_to_fd(Item *item, bool local_column, bool add_to_mat_table= true);
add_to_fd(table_map m)671   void add_to_fd(table_map m) { whole_tables_fd|= m; find_group_in_fd(NULL); }
672   void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl);
673   bool is_in_fd(Item *item);
674   bool is_in_fd_of_underlying(Item_ident *item);
675   void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables,
676                         bool weak_side_upwards);
677   void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item,
678                          table_map weak_tables, bool weak_side_upwards);
679   void find_fd_in_cond(Item *cond, table_map weak_tables,
680                        bool weak_side_upwards);
681   void find_fd_in_joined_table(List<TABLE_LIST> *join_list);
682   void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent);
683   void find_group_in_fd(Item *item);
684   Item *select_expression(uint idx);
685 
686   /// Enum for argument of do_ident_check()
687   enum enum_ident_check
688   {
689     CHECK_GROUP,
690     CHECK_STRONG_SIDE_COLUMN,
691     CHECK_COLUMN
692   };
693   bool do_ident_check(Item_ident *i, table_map tm,
694                       enum enum_ident_check type);
695 
696   /**
697      Just because we need to go through Item::walk() to reach all items to
698      validate, some work must be delegated to "Item processors" (members of
699      Item); this work conceptually belongs to Group_check, and needs
700      privileged access to it.
701   */
702   friend bool Item_ident::aggregate_check_group(uchar *arg);
703   friend bool Item_sum::aggregate_check_group(uchar *arg);
704   friend bool Item_func_any_value::aggregate_check_group(uchar *arg);
705   friend bool Item_ident::is_strong_side_column_not_in_fd(uchar *arg);
706   friend bool Item_ident::is_column_not_in_fd(uchar *arg);
707 
708   FORBID_COPY_CTOR_AND_ASSIGN_OP(Group_check);
709 };
710 
711 #endif
712 /// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY)
713