Parsing a Miller DSL (domain-specific language) expression goes through three representations:

* Source code, which is a string of characters
* Abstract syntax tree (AST)
* Concrete syntax tree (CST)

The job of the GOCC parser is to turn the DSL string into an AST.

The job of the CST builder is to turn the AST into a CST.

The job of the `put` and `filter` transformers is to execute the CST statements on each input record.

# Source-code representation

The source code is the DSL string itself: for example, the part between the single quotes in

`mlr put '$v = $i + $x * 4 + 100.7 * $y' myfile.dat`

# AST representation

Use `put -v` to display the AST:

```
mlr -n put -v '$v = $i + $x * 4 + 100.7 * $y'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "v"
        * Operator "+" "+"
            * Operator "+" "+"
                * DirectFieldName "md_token_field_name" "i"
                * Operator "*" "*"
                    * DirectFieldName "md_token_field_name" "x"
                    * IntLiteral "md_token_int_literal" "4"
            * Operator "*" "*"
                * FloatLiteral "md_token_float_literal" "100.7"
                * DirectFieldName "md_token_field_name" "y"
```

Note the following about the AST:

* Parentheses, commas, semicolons, line endings, and whitespace are all stripped away
* Variable names and literal values remain as leaf nodes of the AST
* Operators like `=` `+` `-` `*` `/` `**`, function names, and so on remain as non-leaf nodes of the AST
* Operator precedence is clear from the tree structure

Operator-precedence examples:

```
$ mlr -n put -v '$x = 1 + 2 * 3'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "x"
        * Operator "+" "+"
            * IntLiteral "md_token_int_literal" "1"
            * Operator "*" "*"
                * IntLiteral "md_token_int_literal" "2"
                * IntLiteral "md_token_int_literal" "3"
```

```
$ mlr -n put -v '$x = 1 * 2 + 3'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "x"
        * Operator "+" "+"
            * Operator "*" "*"
                * IntLiteral "md_token_int_literal" "1"
                * IntLiteral "md_token_int_literal" "2"
            * IntLiteral "md_token_int_literal" "3"
```

```
$ mlr -n put -v '$x = 1 * (2 + 3)'
RAW AST:
* StatementBlock
    * SrecDirectAssignment "=" "="
        * DirectFieldName "md_token_field_name" "x"
        * Operator "*" "*"
            * IntLiteral "md_token_int_literal" "1"
            * Operator "+" "+"
                * IntLiteral "md_token_int_literal" "2"
                * IntLiteral "md_token_int_literal" "3"
```

# CST representation

There's no `-v` display for the CST, but it's simply a reshaping of the AST
with pre-processed setup of function pointers to handle each type of statement
on a per-record basis.

The if/else and/or switch statements that decide what to do with each AST node
are evaluated at CST-build time, so they don't need to be re-done each time the
syntax tree is executed, which happens once for every data record.

# Source directories/files

* The AST logic is in `./ast*.go`. To avoid a Go package-dependency cycle, I didn't use a `src/dsl/ast` naming convention, although that would have been nice.
* The CST logic is in [`./cst`](./cst). Please see [cst/README.md](./cst/README.md) for more information.
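
# Sketch: building a CST from an AST

The build-time/run-time split described under "CST representation" can be illustrated with a minimal, self-contained sketch. This is not Miller's actual code. The names `ASTNode`, `Record`, `Evaluable`, `Statement`, `BuildEvaluable`, and `BuildAssignment` are hypothetical, and values are simplified to `float64` rather than Miller's typed values. The point is only that the `switch` on node type runs once per AST node at build time, and what runs per record is a tree of closures.

```go
package main

import (
	"fmt"
	"strconv"
)

// ASTNode is a hypothetical, simplified AST node (not Miller's real type).
type ASTNode struct {
	Type     string // e.g. "Operator", "IntLiteral", "DirectFieldName"
	Token    string // e.g. "+", "4", "x"
	Children []*ASTNode
}

// Record is a simplified stand-in for a data record.
type Record map[string]float64

// Evaluable is a CST expression node: a closure ready to run on a record.
type Evaluable func(rec Record) float64

// Statement is a CST statement node: a closure that updates a record.
type Statement func(rec Record)

// BuildEvaluable switches on the node type once, at CST-build time.
func BuildEvaluable(node *ASTNode) (Evaluable, error) {
	switch node.Type {
	case "IntLiteral", "FloatLiteral":
		value, err := strconv.ParseFloat(node.Token, 64)
		if err != nil {
			return nil, err
		}
		return func(Record) float64 { return value }, nil
	case "DirectFieldName":
		name := node.Token
		return func(rec Record) float64 { return rec[name] }, nil
	case "Operator":
		if len(node.Children) != 2 {
			return nil, fmt.Errorf("operator %q: expected 2 children", node.Token)
		}
		left, err := BuildEvaluable(node.Children[0])
		if err != nil {
			return nil, err
		}
		right, err := BuildEvaluable(node.Children[1])
		if err != nil {
			return nil, err
		}
		switch node.Token {
		case "+":
			return func(rec Record) float64 { return left(rec) + right(rec) }, nil
		case "*":
			return func(rec Record) float64 { return left(rec) * right(rec) }, nil
		}
		return nil, fmt.Errorf("unsupported operator %q", node.Token)
	}
	return nil, fmt.Errorf("unsupported node type %q", node.Type)
}

// BuildAssignment handles a node shaped like SrecDirectAssignment above.
func BuildAssignment(node *ASTNode) (Statement, error) {
	if node.Type != "SrecDirectAssignment" || len(node.Children) != 2 ||
		node.Children[0].Type != "DirectFieldName" {
		return nil, fmt.Errorf("malformed assignment node")
	}
	name := node.Children[0].Token
	rhs, err := BuildEvaluable(node.Children[1])
	if err != nil {
		return nil, err
	}
	return func(rec Record) { rec[name] = rhs(rec) }, nil
}

// leaf and op are small helpers for writing ASTs by hand.
func leaf(nodeType, token string) *ASTNode {
	return &ASTNode{Type: nodeType, Token: token}
}

func op(token string, left, right *ASTNode) *ASTNode {
	return &ASTNode{Type: "Operator", Token: token, Children: []*ASTNode{left, right}}
}

func main() {
	// Hand-built AST for '$v = $i + $x * 4 + 100.7 * $y'.
	ast := &ASTNode{Type: "SrecDirectAssignment", Token: "=", Children: []*ASTNode{
		leaf("DirectFieldName", "v"),
		op("+",
			op("+", leaf("DirectFieldName", "i"),
				op("*", leaf("DirectFieldName", "x"), leaf("IntLiteral", "4"))),
			op("*", leaf("FloatLiteral", "100.7"), leaf("DirectFieldName", "y"))),
	}}
	stmt, err := BuildAssignment(ast) // type switches happen here, once
	if err != nil {
		panic(err)
	}
	for _, rec := range []Record{{"i": 1, "x": 2, "y": 0}, {"i": 3, "x": 5, "y": 2}} {
		stmt(rec) // per-record work is only closure calls
		fmt.Println(rec["v"])
	}
}
```

In this sketch the per-record loop never inspects `Type` or `Token`. All of that dispatch is resolved while the closures are constructed, which is the same trade the CST builder makes.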