1 #ifndef AGGREGATE_CHECK_INCLUDED 2 #define AGGREGATE_CHECK_INCLUDED 3 4 /* Copyright (c) 2014, 2021, Oracle and/or its affiliates. 5 6 This program is free software; you can redistribute it and/or modify 7 it under the terms of the GNU General Public License, version 2.0, 8 as published by the Free Software Foundation. 9 10 This program is also distributed with certain software (including 11 but not limited to OpenSSL) that is licensed under separate terms, 12 as designated in a particular file or component or in included license 13 documentation. The authors of MySQL hereby grant you an additional 14 permission to link the program and your derivative works with the 15 separately licensed software that they have included with MySQL. 16 17 This program is distributed in the hope that it will be useful, 18 but WITHOUT ANY WARRANTY; without even the implied warranty of 19 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 20 GNU General Public License, version 2.0, for more details. 21 22 You should have received a copy of the GNU General Public License 23 along with this program; if not, write to the Free Software 24 Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA */ 25 26 /** 27 @file 28 Checks for some semantic constraints on queries using GROUP 29 BY, or aggregate functions, or DISTINCT. Enforced if 30 sql_mode contains 'only_full_group_by'. 31 */ 32 33 /** 34 @defgroup AGGREGATE_CHECKS Aggregate checks of ONLY_FULL_GROUP_BY 35 36 Checks for some semantic constraints on queries using GROUP BY, or aggregate functions, or DISTINCT. 37 38 We call "aggregation" the operation of taking a group of rows and replacing 39 it with a single row. There are three types of aggregation: DISTINCT, 40 implicit grouping, explicit grouping. 41 42 This text describes MySQL's checks (why certain queries are rejected) and the 43 rationale behind them. 44 45 References: 46 - WL#2489 "better only_full_group_by". 47 - if you have access to the SQL standard, we recommend the following parts of 48 "SQL2011 Foundation": query expression Syntax rule 28; column reference Syntax 49 rule 7 and Conformance rule 2; 4.19 functional dependencies. 50 51 @section DISTINCT 52 53 DISTINCT: all identical rows in the result of SELECT are "aggregated" to 54 one single row - duplicates are eliminated. 55 If the result of the SELECT without DISTINCT is 56 @verbatim 57 1 2 58 3 4 59 1 2 60 @endverbatim 61 then the result with DISTINCT is 62 @verbatim 63 1 2 64 3 4 65 @endverbatim 66 Here is a problematic query. Assume we have a table T which contains three 67 columns C1,C2,C3 and has the following rows: 68 @verbatim 69 C1 C2 C3 70 1 2 A 71 3 4 B 72 1 2 C 73 @endverbatim 74 and we do "SELECT DISTINCT C1, C2 FROM T ORDER BY C3". 75 Because we want the result to be ordered, we have to first eliminate 76 duplicates then order the result. 77 When we eliminate duplicates, should we keep the first or the third row? This 78 arbitrary choice will influence the retained value of C3, which will influence 79 ordering. In the end, we have arbitrary ordering, which is problematic. 80 To prevent this, if, in a query block 'sl', an ORDER BY expression 81 is not the same expression as one in the SELECT list of 'sl' 82 and contains a column which: 83 - is of a table whose qualifying query block is 'sl' 84 - and is not in the SELECT list of 'sl' 85 86 then 'sl' should not have DISTINCT. 87 This rule makes the query above invalid. 88 Class Check_distinct implements this rule. 89 90 Class Check_distinct implements a second rule, specific of set functions in 91 ORDER BY (a non-standard feature). 92 Consider these queries labelled (1), (2), (3), with DISTINCT and a set 93 function in ORDER BY: 94 @verbatim 95 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3); 96 SELECT DISTINCT MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C2); 97 SELECT DISTINCT C1, MIN(C2) FROM T GROUP BY C1 ORDER BY MIN(C3); 98 @endverbatim 99 (1) is a random-order query, (2) and (3) are not. 100 MySQL has traditionally been permissive, we want to allow (2) and (3). 101 102 So the second rule is that Check_distinct rejects a query if it has DISTINCT 103 and a set function in ORDER BY which is not the same expression as one in the 104 SELECT list. This rejects (1) and allows (2). 105 It would reject (3); luckily, before Check_distinct checks, DISTINCT is 106 removed (it is redundant with GROUP BY) and thus the query is not processed by 107 Check_distinct; the second rule is thus by-passed and (3) is correctly 108 accepted. 109 110 The implementation of Check_distinct works like this: if the query has 111 DISTINCT, examine each element of the ORDER BY list: if the element is not the 112 same expression as one in the SELECT list, then examine parts of the element 113 (using Item::walk()), to spot offending set functions or columns. 114 115 @section IMPLICIT_GROUPING Implicit grouping 116 117 A query with set functions without GROUP BY can be seen as having GROUP BY() 118 i.e. the set of grouping expressions is empty, all rows are part of a 119 single group and are replaced with a single row. So it is just a sub-case of 120 the next section. 121 122 @section EXPLICIT_GROUPING Explicit grouping (GROUP BY) 123 124 MySQL is more permissive than the standard, it allows to group by any 125 expression; it does not require that every element of the GROUP BY clause 126 should be a column of one table of the FROM clause. 127 128 Here is a problematic query, using a table T with columns C1, C2, C3: 129 @verbatim 130 C1 C2 C3 131 1 2 A 132 3 4 B 133 1 2 C 134 135 SELECT C1, C3 FROM T GROUP BY C1; 136 @endverbatim 137 we first form groups, in each group the value of C1 must be the same for all 138 rows. Consider the group made of the first and third row. We have to produce 139 one row out of it, and this row must have a value for C3 because C3 is in the 140 SELECT list. Among those two rows, which value of C3 should we choose? This is 141 arbitrary, and will give a random result. 142 143 To prevent this, in a query with GROUP BY or aggregates (known as "a grouped 144 query"), any column referenced by an expression in the SELECT list or HAVING 145 condition and belonging to one table of the FROM clause, should be one of the 146 group columns (enforced by only_full_group_by in MySQL 5.6 and older) or, if 147 functional dependencies are supported (as in MySQL 5.7), should be 148 functionally dependent on the group columns. 149 In the table T above, C3 is obviously not functionally dependent on {C1,C2}: 150 the values of C1 and C2 for a row do not uniquely determine the value of 151 C3. In other words, if two rows have the same value in C1 and the same value 152 in C2 they do not necessarily have the same value in C3. 153 So this rule correctly rejects the query above. 154 155 Note that NULL is treated as one value: two NULLs are considered equal. 156 157 In WL#2489, we have implemented the optional standard feature T301 158 "functional dependencies" almost entirely. 159 160 Here are the functional dependencies which we recognize. 161 162 @subsection KEYFD Key-based, in a base table 163 164 A key in this text is a unique constraint made of only non-NULLable 165 columns. For example, a primary key. 166 Considering a base table T, if two rows have the same values of all columns of 167 a key of T they are actually one single row, so: 168 { all columns of key} -> { T.* } (notation: what is on the right of the arrow 169 is functionally dependent on what is on the left). 170 171 @subsection GCOLFD Generated-column-based, in a base table 172 173 Considering a base table T, a generated column is functionally dependent on 174 the set of columns it references (the so-called parametric 175 columns). Note that the SQL standard doesn't allow a parametric column 176 to be a generated column, but MySQL does. 177 178 @subsection INNEREQ Equality-based, in the result of an inner join 179 180 Considering two tables T1 and T2, a condition C, and the result R of an inner 181 join T1 JOIN T2 ON C. 182 Note that T1/T2 are not necessarily base tables. For example, they can also be 183 views, or derived tables, or the results of some joins; in the rest of text, 184 we use the vague term "table" for those various objects. 185 186 Note that C may only refer to columns of T1 or T2 (outer references 187 are forbidden by MySQL in join conditions). 188 189 Assuming that C is a conjunction (i.e. is made of one or more conditions, 190 "conjuncts", chained together with AND): 191 - If one conjunct is of the form T1.A=constant, then {} -> {A} holds in R (the 192 value of A is "the constant" in all rows of R). 193 - If one conjunct is of the form T1.A=T2.B, then {T1.A} -> {T2.B} (and vice 194 versa) holds in R (the value of T2.B is that of T1.A in all rows of R). 195 196 @subsection OUTEREQ Equality-based, in the result of an outer join 197 198 Considering the result R of TS LEFT JOIN TW ON C. 199 Assuming that C is a conjunction. TS is said to be the strong table, TW is 200 said to be the weak table (the one which might be NULL-complemented in the 201 result of this join). 202 To make this really clear, note that, if we have 203 "t1 left join (t2 left join t3 on C) on D": 204 - in the t2-t3 join, t2 is strong and t3 is weak. 205 - in the t1-(t2-t3) join, t1 is strong, t2 is weak, t3 is weak. 206 207 If C is deterministic and one conjunct is of the form TW.A=constant or 208 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns 209 of TS referenced by C. 210 Proof: consider in R two rows r1 and r2 which have the same values of DJS 211 columns. Consider r1. There are two possibilities when r1 was built from a row 212 of TS: 213 - no row in TW matched the row of TS (for no row of TW has C been true): so, 214 r1 is NULL-complemented for the columns of TW. Given that r2 has the same 215 values of DJS columns as r1, and given that C is deterministic, it is sure 216 that no row in TW matched when r2 was built. So r2 is also NULL-complemented 217 for the columns of TW. So r1 and r2 have the same value of TW.A (NULL). 218 - At least one row in TW matched: so, r1 contains real values from TW (not 219 NULL-complemented), matching C, and thus TW.A in r1 is equal to the constant 220 or to TS.B. Following the same reasoning as above, it is sure that it is also 221 the case for r2. 222 - In conclusion, we can see that r1 and r2 always have the same value of 223 TW.A. 224 225 If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in R 226 Proof: consider r1 and r2 two rows in R having the same value of TW.A. Two 227 cases are possible: 228 - this value is NULL. Then both rows are NULL-complemented (if not, the 229 value of TW.A in TW would be NULL, which cannot match in an equality, so C is 230 not true, which is absurd). Thus, in r1 and r2 TW.B is NULL. 231 - This value is not NULL. Then both rows are not NULL-complemented, C 232 matched for both, so TW.B in r1/r2 is equal to TW.A in r1/r2. 233 - In conclusion, r1 and r2 have the same value of TW.B. 234 235 @subsection WHEREEQ Equality-based, in the result of a WHERE clause 236 237 Same rule as the result of an inner join. 238 Additionally, the rule is extended to T1.A=outer_reference, because an outer 239 reference is a constant during one execution of this query. 240 241 Below we examine how functional dependencies in a table propagate to its 242 containing join nest. 243 244 @subsection PROPAGOUTER Considering the result R of TS LEFT JOIN TW ON C. 245 246 All functional dependencies in TS are also functional dependencies in R. 247 Proof: trivial. 248 The same is not necessarily true for TW. 249 Let's define a "NULL-friendly functional dependency" (NFFD) as a functional 250 dependency between two sets A and B, which has two properties: 251 - A is not empty 252 - if, in a row, all values of A are NULL, then all values of B are NULL. 253 254 All NFFDs in TW are also NFFDs in R. 255 Proof: consider an NFFD A -> B in TW, and r1 and r2 two rows in R having the 256 same values of A columns. Two cases are possible: 257 - In r1 and r2, at least one value of A is not NULL. Then r1 is not 258 NULL-complemented. Its values for A and B come from TW. By application of the 259 functional dependency in TW, because values in A are equal in TW, values in B 260 are equal in TW and thus in r1/r2. 261 - In r1 and r2, all values of A are NULL. Two cases are possible: 262 i) r1 is not NULL-complemented. Its values for A and B come from TW. In the 263 row of TW values of A are all NULL. Because the functional dependency in 264 NULL-friendly, all values of B are NULL in the row of TW and thus in r1. 265 ii) r1 is NULL-complemented. Then all values of B in r1 are NULL. 266 iii) In conclusion, all values of B in r1 are NULL. The same reasoning applies 267 to r2. So, values of B are identical (NULL) in r1 and r2. 268 - In conclusion, values of B are identical in r1/r2, we have proved that 269 this is a functional dependency in R, and a NULL-friendly one. 270 271 The concept of an NFFD is Guilhem's invention. It was felt it was necessary, to 272 have easy propagation of FDs from TW to R. It was preferred to the 273 alternative, simpler rule which says that a functional dependency A-> B in TW 274 is also a functional dependency in R if A contains a non-nullable 275 column. There are two points to note: 276 - the functional dependency of the simpler rule is an NFFD, so our rule is not 277 more restrictive than the simpler one 278 - this simpler rule prevents free propagation of functional dependencies 279 through join nests, which complicates implementation and leads to rejecting 280 queries which could be accepted. An example follows: 281 @verbatim 282 SELECT T3.A 283 FROM T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE 284 GROUP BY T3.PK; 285 @endverbatim 286 This is what the simpler rule says for this query: 287 * In T3, T3.PK->T3.A holds. 288 * Let R1 be the result of "(T2 LEFT JOIN T3 ON TRUE)", in R1 T3.PK->T3.A 289 holds, by application of: there is a functional dependency in the 290 weak side T3, and T3.PK is not nullable in T3. 291 * Let R2 be the result of "T1 LEFT JOIN (T2 LEFT JOIN T3 ON TRUE) ON TRUE", 292 in R2 T3.PK->T3.A doesn't hold anymore, because: it's a dependency in the 293 weak side (weak side is R1 here), and T3.PK is nullable _when 294 seen as a column of R1_ (in R1 T3.PK can be NULL, if the row of T3 295 is actually a NULL-complemented one). 296 297 @subsection PROPAGINNER Considering the result R of T1 JOIN T2 ON C. 298 299 All [NULL-friendly] functional dependencies in T1 are also [NULL-friendly] 300 functional dependencies in R. the same is true for T2. 301 Proof: trivial. 302 303 @subsection PROPAGSUMMARY Summary rules for propagating FDs 304 305 All NULL-friendly functional dependencies propagate freely through join nests 306 all the way up to the result of the WHERE clause. 307 The same is true for ordinary functional dependencies except if there are weak 308 tables along the propagation path between the table where the dependency holds 309 and the result of the WHERE clause; in other words, except if the table where 310 the dependency holds belongs to some embedding join nest which is on the weak 311 side of an outer join. 312 313 @subsection NFFDS Which functional dependencies are NULL-friendly 314 315 A key-based functional dependency A -> B in the base table is NULL-friendly, 316 because, by definition, there can never be a NULL value in any column of A. 317 318 A functional dependency A -> B in a base table, between parametric columns A 319 and a generated column B, is not NULL-friendly; for more details, see 320 @ref FDVIEW . 321 322 A functional dependency A->B in the result of T1 JOIN T2 ON C, if based on 323 equality of two columns, is NULL-friendly. Indeed, A is not empty and if there 324 was some NULL in A, it would not match the equality in C and thus it would not 325 exist in the result, absurd. If the equality is rather column=constant, A is 326 empty, the dependency is not NULL-friendly. However, in our implementation, 327 function @c simplify_joins() moves inner-join conditions to the embedding 328 outer-join nest's join condition, or to the WHERE clause. Because our analysis 329 of functional dependencies happens after simplify_joins(), when we analyze T1 330 JOIN T2 it is guaranteed to have no condition, and this paragraph is 331 irrelevant. 332 333 A functional dependency in the result of TS LEFT JOIN TW ON C, if based on 334 equality of two columns, is NULL-friendly. 335 Proof: let's consider, in turn, the two types of equality-based functional 336 dependencies which exist in this result R. Let r1 be a row of R. 337 - If C is deterministic and one conjunct is of the form TW.A=constant or 338 TW.A=TS.B, then DJS -> {TW.A} holds in R, where DJS is the set of all columns 339 of TS referenced by C. For NULL-friendliness, we need DJS to not be 340 empty. Thus, we exclude the form TW.A= constant and consider only 341 TW.A=TS.B. We suppose that in r1 DJS contains all NULLs. Conjunct is TW.A=TS.B 342 then this equality is not true, so r1 is NULL-complemented: TW.A is NULL in 343 r1. 344 - If one conjunct is of the form TW.A=TW.B then {TW.A} -> {TW.B} holds in 345 R. If in r1 TW.A is NULL, again the equality in C is not true, and TW.B is 346 NULL in R1. 347 - In conclusion, this is NULL-friendly. 348 349 A functional dependency in the result of a WHERE clause, if based on equality 350 of two columns, is NULL-friendly. If based on T1.A=constant, it is not, as it 351 has an empty set of source columns. 352 353 Summary: all functional dependencies which we have seen so far are 354 NULL-friendly, except those inferred from TW.A=constant in an outer join 355 condition or in a WHERE clause, and those about generated columns. 356 357 Thus, in the query with T1-T2-T3 previously seen, T3.PK->T3.A is 358 NULL-friendly and propagates, query is accepted. 359 360 In our implementation, we take special care of TW.A=constant in an outer join 361 condition: we infer a functional dependency DJS->TW.A from such equality only 362 if one of these requirements are met: 363 - the join nest "TS LEFT JOIN TW ON TW.A=constant [AND...]" is not on the 364 weak side of any embedding join nest - in that case, propagation will not meet 365 any weak tables so we do not need the dependency to be NULL-friendly, it will 366 propagate anyway. 367 - DJS contains at least one column from a strong-side table which, if NULL, 368 makes the join condition not evaluate to TRUE - in that case, DJS->TW.A is 369 NULL-friendly. 370 371 Either of these two conditions is satisfied in most practical cases. For 372 example, it's very common to have an equality between a strong-side column and 373 a weak-side column as a conjunct in the outer join condition (like, "ON 374 strong.pk = weak.foreign_key AND ..." or "ON strong.foreign_key = weak.pk AND 375 ..."); this satisfies the second condition. It's also common to have outer 376 joins only left-deep ("SELECT ... T1 LEFT JOIN T2 ON ... LEFT JOIN T3 ON ..." 377 is left-deep); this satisfies the first condition. 378 Note that the dependency found from TW.A=TS.B in an outer join condition 379 always satisfies the second condition. 380 381 T1.A=constant in a WHERE clause is exterior to any join nest so does not need 382 to propagate, so does not need to be NULL-friendly. 383 384 @subsection FDVIEW Functional dependencies in a view or a derived table 385 386 In the rest of this text, we will use the term "view" for "view or derived 387 table". A view can be merged or materialized, in MySQL. 388 Consider a view V defined by a query expression. 389 If the query expression contains UNION or ROLLUP (which is based on UNION) 390 there are no functional dependencies in this view. 391 So let's assume that the query expression is a query specification (let's note 392 it VS): 393 @verbatim 394 CREATE VIEW V AS SELECT [DISTINCT] VE1 AS C1, VE2 AS C2, ... FROM ... WHERE ... [GROUP BY ...] [HAVING ...] [ORDER BY ...] 395 @endverbatim 396 397 If {VE1, VE2, VE3} are columns of tables of the FROM clause, and {VE1, 398 VE2} -> {VE3} has been deduced from rules in the previous sections [and is 399 NULL-friendly], then {C1, C2} -> { C3 } holds in the view [and is 400 NULL-friendly]. 401 402 If {VE1, VE2} are columns of tables of the FROM clause, and VE3 is a 403 deterministic expression depending only on VE1 and VE2, then {C1, C2} -> 404 { C3 } in the view. 405 It is not always NULL-friendly, for example: VE3 could be COALESCE(VE1,VE2,3): 406 if VE1 (C1) and VE2 (C2) are NULL, VE3 (C3) is 3: not NULL. Another example: 407 VE3 could be a literal; {}->{C3}, the left set is empty. 408 The same examples apply to a generated column in a base table - it is like a 409 merged view's expression. For example, in a base table T which has a generated 410 column C3 AS COALESCE(C1,C2,3): {C1, C2} -> { C3 } holds in T but is not 411 NULL-friendly. 412 413 If VS is a grouped query (which, in MySQL, implies that the view is 414 materialized), then in the result of the grouping there is a functional 415 dependency from the set of all group expressions to the set of all selected 416 expressions (otherwise, this query expression would not have passed its own 417 only_full_group_by validation - in the implementation we validate each query 418 expression separately). Thus, if all group expressions of VS are in the select 419 list of VS, for example they are VE1 and VE2, then {C1, C2} -> {V.*}. 420 It is not NULL-friendly, for example: VE3 is COUNT(1): if the result of the 421 WHERE clause contains a row with group expressions VE1 and VE2 equal to NULL, 422 VE3 is not NULL. 423 424 It's possible to infer functional dependencies from equality conditions in 425 HAVING, but we have not implemented it yet. 426 427 Because some functional dependencies above are not NULL-friendly, they 428 exist in the view, but may not freely propagate in the result of join nests 429 containing the view. This includes examples just given in paragraphs above, 430 and the case of T1.A=constant in the WHERE clause of VS. 431 432 Thus, when we know of a functional dependency A -> B in the query expression 433 of a view, we deduce from it a functional dependency in the view only if: 434 - this view is not on the weak side of any embedding join nest (so 435 NULL-friendliness is not required for propagation). 436 - or A contains at least one non-nullable expression, which makes A -> B 437 NULL-friendly. 438 439 The above is fine for materialized views. For merged views, we cannot study the 440 query expression of the view, it has been merged (and scattered), so we use a 441 different rule: 442 - a merged view is similar to a join nest inserted in the parent query, so 443 for dependencies based on keys or join conditions, we simply follow 444 propagation rules of the non-view sections. 445 - For expression-based dependencies (VE3 depends on VE1 and VE2, VE3 446 belonging to the view SELECT list), which may not be NULL-friendly, we require 447 - the same non-weak-side criterion as above 448 - or that the left set of the dependency be non-empty and that if VE1 and 449 VE2 are NULL then VE3 must be NULL, which makes the dependency NULL-friendly. 450 - The same solution is used for generated columns in a base table. 451 452 @{ 453 454 */ 455 456 #include "my_global.h" 457 #include "mem_root_array.h" 458 #include "opt_trace.h" 459 #include "item_cmpfunc.h" 460 #include "item_sum.h" 461 struct st_mem_root; 462 class st_select_lex; 463 struct TABLE_LIST; 464 465 /** 466 Re-usable shortcut, when it does not make sense to do copy objects of a 467 class named "myclass"; add this to a private section of the class. The 468 implementations are intentionally not created, so if someone tries to use 469 them like in "myclass A= B" there will be a linker error. 470 */ 471 #define FORBID_COPY_CTOR_AND_ASSIGN_OP(myclass) \ 472 myclass(myclass const &); \ 473 void operator=(myclass const &) 474 475 476 /** 477 Utility mixin class to be able to walk() only parts of item trees. 478 479 Used with WALK_PREFIX+POSTFIX walk: in the prefix call of the Item 480 processor, we process the item X, may decide that its children should not 481 be processed (just like if they didn't exist): processor calls stop_at(X) 482 for that. Then walk() goes to a child Y; the processor tests is_stopped(Y) 483 which returns true, so processor sees that it must not do any processing 484 and returns immediately. Finally, the postfix call to the processor on X 485 tests is_stopped(X) which returns "true" and understands that the 486 not-to-be-processed children have been skipped so calls restart(). Thus, 487 any sibling of X, any part of the Item tree not under X, can then be 488 processed. 489 */ 490 class Item_tree_walker 491 { 492 protected: Item_tree_walker()493 Item_tree_walker() : stopped_at_item(NULL) {} ~Item_tree_walker()494 ~Item_tree_walker() { assert(!stopped_at_item); } 495 496 /// Stops walking children of this item stop_at(const Item * i)497 void stop_at(const Item *i) 498 { assert(!stopped_at_item); stopped_at_item= i; } 499 500 /** 501 @returns if we are stopped. If item 'i' is where we stopped, restarts the 502 walk for next items. 503 */ is_stopped(const Item * i)504 bool is_stopped(const Item *i) 505 { 506 if (stopped_at_item) 507 { 508 /* 509 Walking was disabled for a tree part rooted a one ancestor of 'i' or 510 rooted at 'i'. 511 */ 512 if (stopped_at_item == i) 513 { 514 /* 515 Walking was disabled for the tree part rooted at 'i'; we have now just 516 returned back to this root (WALK_POSTFIX call), left the tree part: 517 enable the walk again, for other tree parts. 518 */ 519 stopped_at_item= NULL; 520 } 521 // No further processing to do for this item: 522 return true; 523 } 524 return false; 525 } 526 527 private: 528 const Item *stopped_at_item; 529 FORBID_COPY_CTOR_AND_ASSIGN_OP(Item_tree_walker); 530 }; 531 532 533 /** 534 Checks for queries which have DISTINCT. 535 */ 536 class Distinct_check: public Item_tree_walker, public Sql_alloc 537 { 538 public: 539 Distinct_check(st_select_lex * select_arg)540 Distinct_check(st_select_lex *select_arg) 541 : select(select_arg), failed_ident(NULL) 542 {} 543 544 bool check_query(THD *thd); 545 546 private: 547 548 /// Query block which we are validating 549 st_select_lex *const select; 550 /// Identifier which triggered an error 551 Item_ident *failed_ident; 552 553 /** 554 Just because we need to go through Item::walk() to reach all items to 555 validate, some work must be delegated to "Item processors" (members of 556 Item); this work conceptually belongs to Distinct_check, and needs 557 privileged access to it. 558 */ 559 friend bool Item_sum::aggregate_check_distinct(uchar *arg); 560 friend bool Item_ident::aggregate_check_distinct(uchar *arg); 561 friend bool Item_func_any_value::aggregate_check_distinct(uchar *arg); 562 563 FORBID_COPY_CTOR_AND_ASSIGN_OP(Distinct_check); 564 }; 565 566 567 /** 568 Checks for queries which have GROUP BY or aggregate functions. 569 */ 570 class Group_check: public Item_tree_walker, public Sql_alloc 571 { 572 public: 573 Group_check(st_select_lex * select_arg,st_mem_root * root)574 Group_check(st_select_lex *select_arg, st_mem_root *root) 575 : select(select_arg), search_in_underlying(false), 576 non_null_in_source(false), 577 table(NULL), group_in_fd(~0ULL), m_root(root), fd(root), 578 whole_tables_fd(0), mat_tables(root), failed_ident(NULL) 579 {} 580 ~Group_check()581 ~Group_check() 582 { 583 for (uint j= 0; j < mat_tables.size(); ++j) 584 delete mat_tables.at(j); 585 } 586 587 bool check_query(THD *thd); 588 void to_opt_trace(THD *thd); 589 590 private: 591 592 /// Query block which we are validating 593 st_select_lex *const select; 594 595 /** 596 "Underlying" == expressions which are underlying in an identifier. 597 The identifier can be a column of a view or derived table, both merged or 598 materialized, or a generated column: all of those have an underlying 599 expression. 600 "Materialized table (mat table)" == view or derived table, materialized. 601 If this is true, is_in_fd() will look for FDs in underlying expressions 602 of columns. 603 */ 604 bool search_in_underlying; 605 606 /** 607 This member is readable only if this is a child Group_check. Is true if 608 one expression of the SELECT list of this->select is non-nullable. 609 */ 610 bool non_null_in_source; 611 612 /** 613 The Group_check employed to validate one query block, the one on which 614 check_query() runs, is named the "master". 615 If the query block references a materialized table, the master may create 616 a child Group_check, whose job is to discover FDs in the query expression 617 of the mat table (with the ultimate goal of deducing from them some FDs 618 in the mat table and thus in the parent Group_check). A child may have 619 children itself. If this Group_check is a child, 'table' points to the 620 materialized table, otherwise it is NULL. 621 */ 622 TABLE_LIST *const table; 623 624 /** 625 Bit N is set if the N-th expression of GROUP BY is functionally dependent 626 on source columns. 627 It serves to find FDs in the query expression of a materialized table 628 having GROUP BY. 629 For a non-child Group-check, all bits are turned on from the start. 630 Currently limited to 64 bits => max 64 GROUP expressions; should probably 631 be MY_BITMAP. 632 */ 633 ulonglong group_in_fd; 634 635 /// Memory for allocations (like of 'fd') 636 st_mem_root *const m_root; 637 638 /** 639 Columns which are local to 'select' and functionally dependent on an 640 initial set of "source" columns defined like this: 641 - if !is_child(), the GROUP BY columns 642 - if is_child(), columns of the result of the query expression under 643 'table' which are themselves part of 'fd' of the parent Group_check. 644 */ 645 Mem_root_array<Item_ident *, true> fd; 646 /// Map of tables for which all columns can be considered part of 'fd'. 647 table_map whole_tables_fd; 648 /// Children Group_checks of 'this' 649 Mem_root_array<Group_check *, true> mat_tables; 650 /// Identifier which triggered an error 651 Item_ident *failed_ident; 652 653 bool is_fd_on_source(Item *item); is_child()654 bool is_child() const { return table != NULL; } 655 656 /// Private ctor, for a Group_check to build a child Group_check Group_check(st_select_lex * select_arg,st_mem_root * root,TABLE_LIST * table_arg)657 Group_check(st_select_lex *select_arg, st_mem_root *root, 658 TABLE_LIST *table_arg) 659 : select(select_arg), search_in_underlying(false), 660 non_null_in_source(false), table(table_arg), 661 group_in_fd(0ULL), m_root(root), fd(root), whole_tables_fd(0), 662 mat_tables(root) 663 { 664 assert(table); 665 } 666 bool check_expression(THD *thd, Item *expr, bool in_select_list); 667 /// Shortcut for common use of Item::local_column() local_column(Item * item)668 bool local_column(Item *item) const 669 { return item->local_column(select).is_true(); } 670 void add_to_fd(Item *item, bool local_column, bool add_to_mat_table= true); add_to_fd(table_map m)671 void add_to_fd(table_map m) { whole_tables_fd|= m; find_group_in_fd(NULL); } 672 void add_to_source_of_mat_table(Item_field *item_field, TABLE_LIST *tl); 673 bool is_in_fd(Item *item); 674 bool is_in_fd_of_underlying(Item_ident *item); 675 void analyze_conjunct(Item *cond, Item *conjunct, table_map weak_tables, 676 bool weak_side_upwards); 677 void analyze_scalar_eq(Item *cond, Item *left_item, Item *right_item, 678 table_map weak_tables, bool weak_side_upwards); 679 void find_fd_in_cond(Item *cond, table_map weak_tables, 680 bool weak_side_upwards); 681 void find_fd_in_joined_table(List<TABLE_LIST> *join_list); 682 void to_opt_trace2(Opt_trace_context *ctx, Opt_trace_object *parent); 683 void find_group_in_fd(Item *item); 684 Item *select_expression(uint idx); 685 686 /// Enum for argument of do_ident_check() 687 enum enum_ident_check 688 { 689 CHECK_GROUP, 690 CHECK_STRONG_SIDE_COLUMN, 691 CHECK_COLUMN 692 }; 693 bool do_ident_check(Item_ident *i, table_map tm, 694 enum enum_ident_check type); 695 696 /** 697 Just because we need to go through Item::walk() to reach all items to 698 validate, some work must be delegated to "Item processors" (members of 699 Item); this work conceptually belongs to Group_check, and needs 700 privileged access to it. 701 */ 702 friend bool Item_ident::aggregate_check_group(uchar *arg); 703 friend bool Item_sum::aggregate_check_group(uchar *arg); 704 friend bool Item_func_any_value::aggregate_check_group(uchar *arg); 705 friend bool Item_ident::is_strong_side_column_not_in_fd(uchar *arg); 706 friend bool Item_ident::is_column_not_in_fd(uchar *arg); 707 708 FORBID_COPY_CTOR_AND_ASSIGN_OP(Group_check); 709 }; 710 711 #endif 712 /// @} (end of group AGGREGATE_CHECKS ONLY_FULL_GROUP_BY) 713