1Named Functions and Modules 2=========================== 3Store documentation and provenance metadata in function/module values. 4 5Named functions and modules are the components out of which library APIs 6are built. They are the modules and functors of the Modular Programming 7proposal. Named modules are not used as instances of abstract data types, 8but named functions are used as constructors of ADT instances. 9 10This is a transitional design, not the final Curv 1.0 design. 11 12This is a variant of the "branded values" proposal. 13 * Weaker, because it doesn't require the printed representation to be globally 14 unambiguous, or that value equality is isomorphic to printed rep equality. 15 (This is a transitional design, I'll worry about that other stuff later.) 16 * Named values may contain documentation metadata. 17 18Named Functions 19--------------- 20A function literal like `x->x+1` constructs a function value that prints 21as a lambda expression. If you bind this to a name using 22 incr = x->x+1 23then by the law of substitution, `incr` is just an alias for x->x+1, 24and `incr` also prints as a lambda expression. 25 26To define a named function, you can write: 27 func incr = x->x+1 28 func incr x = x + 1 29and `incr` prints as the name `incr`. The `func` keyword signals that we are 30constructing a named function value. The old definition syntax 31 incr x = x + 1 32is deprecated. 33 34In `func f = expr`, the value of `expr` must match the `is_func` predicate 35or an error is reported. 36 37If a *.curv source file occurs as an element of a directory record, and the 38contents of the source file are a function expression, then use `func` as the 39first token in the file to define a named function. 40 41How are anonymous recursive functions printed? 42* An earlier proposal suggests printing as a `let` expression: 43 let fac n = ...fac(n-1)... in fac 44* However, we could impose the restriction that recursive functions must 45 be named (the definition must use 'func'). 46 fac = n -> if (n <= 0) 1 else n * fac(n-1) 47 ERROR: recursive function 'fac' must be declared using 'func' keyword. 48 func fac = n -> if (n <= 0) 1 else n * fac(n-1) 49 func fac n = if (n <= 0) 1 else n * fac(n-1) 50 This simplifies the job of printing a recursive function. 51 It also simplifies the implementation of recursive functions. 52 (I can avoid the recursive refcounting proposal for now.) 53* This simplification would not prohibit an operation for stripping the name 54 from a named recursive function to expose the lambda expression. 55 56Builtin functions like `sin` are named functions. 57 58Function Documentation 59---------------------- 60Named functions can contain embedded documentation, which is printed 61by the `help` command in the REPL. 62 63The `func` keyword can be preceded by a multi-line comment, which is used 64as a docstring and embedded in the function value. 65 66Named Modules 67------------- 68A module is a named record, defined as 69 module name = record_expr 70The `module` keyword works like `func`: it can be preceded by a doc comment, 71and it can appear as the first token in a source file. 72 73We should support mutually recursive functions that exist in different 74named modules. There are good reasons to support this in modular programming. 75 76A module constructor is a set of definitions surrounded by {}. 77Some of the definitions are 'named' (prefixed with func or module), 78and some are not. We store a boolean flag with each field, in the module's 79metadata, indicating which fields are named. This is separate from the field 80values. There is a distinction between a named field, and an anonymous field 81that happens to be bound to a named value. 82 83A module is a record with extra metadata, including this field metadata. 84'Module' is a subtype of 'Record'. All the algebraic laws for records also 85apply to modules. 86 87`module m = r` constructs a new value 'm' which has the name 'm' but 88inherits all the other properties of record r, including its field metadata. 89`m.f` returns a named value, whose name is `m.f`, if `f` is a named field of m. 90 91Constructors 92------------ 93A constructor is a function that returns a named value. 94 95Anonymous constructor: 96 param ->func expr 97 param ->module expr 98 99func f x y = x+y (curried function) is equivalent to 100 func f = x ->func y -> x+y 101A partially applied named curried function is a constructor. 102And this might motivate a different syntax for curried lambdas: 103 x y -> x+y 104is short for 105 x ->func y -> x+y 106 107'module f x = mexpr' is equivalent to 108 func f x = module mexpr 109 110Related proposals 111----------------- 112But, I also intend to print function values as constructor expressions 113that reconstruct the original value. 114 * A builtin function prints as the function name. 115 * A function from a (nonrecursive) lambda expression prints as a lambda 116 expression. 117 * A function from a recursive lambda expression prints as 118 let fac n = ...fac(n-1)... in fac 119 120In addition, there is a proposal for branded record and function values. 121A branded value is printed as a constructor expression (global variable name, 122function call or selection), rather than as a record or function literal. 123Builtin functions are considered branded, and print as a function name. 124There are also user-defined branded functions and records. 125The limitation is that there exists a global context in which all such 126constructor expressions can be evaluated to reconstruct the value, which 127means that a function value from a local function definition can't be branded, 128even though we do label such function values in Curv 0.4. 129 130 This includes a definition syntax for named, branded record fields. 131 Something like: 132 @name = <rec-or-func>; 133 def name = <rec-or-func>; 134 term name = <rec-or-func>; 135 named name = <rec-or-func>; 136 module name = <rec-or-func>; 137 labelled name = <rec-or-func>; 138 Which is related to the 'func' labelled function definition syntax. 139 Notes: 140 * A 'module' is an abstract, labelled, record or module value. 141 142 In a *.curv source file referenced by a directory (directory syntax), 143 the source file begins with '@' in one version of the proposal. 144 145Finally, I want a 'help' function for displaying the documentation for a 146function value. 'help f' evaluates 'f' as a function expression, then returns 147a multi-line documentation string. Function name, parameter documentation, 148plus a general doc string. Probably this is REPL only. 149 150 Syntax is not designed yet. The metadata will include a function name, 151 even if the function is locally defined and not branded. It will include 152 a function comment, and parameter comments. A keyword like 'func' (or 153 'def' or 'term') will be needed to trigger the collection and storage of 154 this metadata in the case where a function is defined with a combinator. 155 156Combined Proposal 157----------------- 158Use same syntax for labelled and branded func/record definitions. 159Extend this syntax with main docstring and parameter/field docstrings. 160 161When this syntax is used, we collect and store metadata. 162When this syntax is not used, we don't collect and store metadata. 163We don't decide based on the syntax of the definiens expression (eg, is it 164a lambda expression), because this breaks the algebra of programs. 165You should be able to substitute a lambda expression for another expression 166that computes the same value without changing the meaning of the program. 167 168Docstring Syntax 169---------------- 170I don't want to overthink or overengineer the doc syntax. 171This is a transitional design, so I'm going for the simplest design that works, 172and I'll get experience with this before choosing a final design. 173 174I don't need any new syntax for doc strings. If I just do what pydoc does, 175it's fine. Pydoc doesn't require string-literal docstring syntax, it will 176collect a documentation string from a block comment preceding a definition. 177I'll just rely on block comment syntax. 178 179Quote from pydoc: 180 For modules, classes, functions and methods, the displayed documentation 181 is derived from the docstring (i.e. the __doc__ attribute) of the object, 182 and recursively of its documentable members. If there is no docstring, 183 pydoc tries to obtain a description from the block of comment lines just 184 above the definition of the class, function or method in the source file, 185 or at the top of the module. 186 187In Curv, a labelled value definition (function or module or constructor) can be 188preceded with a block comment, which provides the doc string. 189 190I considered collecting docstrings from formal parameter comments, 191like in doxygen. However, I won't do this. First, it doesn't work for 192functions defined using combinators. Second, there's no need (Python doesn't 193do this: the parameter documentation is embedded in the function's docstring). 194So this idea is overengineering. 195 196What about per-module-field docstrings? 197 1. Internal field documentation. 198 Maybe only labelled values in a module have their own documentation, 199 similar to how in Python, only "documentable members" of a class or module 200 have documentation. To collect documentation from a module, you get the 201 module's docstring, then you iterate over the fields, and for each element 202 that is a labelled value, you collect that labelled value's documentation. 203 (Note that a labelled value field can be defined as 'x = labelled_val_expr', 204 so in this case the documentation string is actually defined elsewhere.) 205 Fields that don't have internal documentation can be documented in the 206 module's central docstring. This design nicely explains how unlabelled 207 definitions like 208 x = expr 209 [a,b] = expr 210 include expr 211 can produce fields with docstrings. It's compositional. To display module 212 documentation, first display the module's docstring (which will end with 213 docs for otherwise undocumented fields), then display the docstrings for 214 each member value that has one. I should start with this design, because 215 anything else will be a superset. And hey, it's good enough for Python. 216 2. External field documentation. 217 Maybe I want all the members of a labelled module to have their own doc 218 strings, even definitions like 'pi' in stdlib. This gets complicated: 219 we have to define the precedence of external vs internal docstrings, 220 and define rules for multi-variable definitions. Later. 221 222Data Model 223---------- 224A brand or formula is: 225 <brand> ::= <identifier> 226 | <brand> . <identifier> 227 | <brand> <argument> 228 <formula> ::= <symbol> // variable formula 229 | <module> . <symbol> // field selection formula 230 | <function> <argvalue> // function call formula 231 232Previously, 233 Storing the original constructor function value in a call formula is useful 234 for "customizing" a parametric model (tweaking some of the parameters). 235But parametric shapes are not named modules in this design. 236 237Previously, 238 Storing the original module value in a field formula may also be useful, 239 but depending on implementation, it might result in a recursive object cycle 240 (requiring a tracing garbage collector or cyclic reference counts), or we can 241 avoid that by applying the label to the field value each time the '.' operator 242 is evaluated. Alternatively, instead of storing a module value, we store the 243 module's formula. 244But in this design, a formula only needs to be purely symbolic. I don't have 245a rationale for storing function and module values in a formula. 246 247 <formula> ::= <symbol> // variable formula 248 | <formula> . <symbol> // field selection formula 249 | <formula> <argvalue> // function call formula 250 251A label is a pair of [formula, docstring]. The docstring may be an empty string. 252 253A function has an optional label, and orthogonal to that, 254it has a constructor flag. If the constructor flag is true, the function 255is called a constructor. 256 257A record has an optional label, and orthogonal to that, each field 258has a constructor flag. 259 260Labelled Definition Syntax 261-------------------------- 262I don't have a syntax I like; I don't know what keywords or symbols to use. 263It is an esthetic and linguistic problem of choosing the right words. 264As a stopgap measure, here is a temporary syntax. 265 266<definition> ::= 267 labelled <singleton-pattern> = <function-or-module-expression> 268 labelled <id> <param>... = <expression> 269 same as labelled <id> = <param> -> labelled ... -> <expression> 270 constructor <id> <param>... = <function-or-module-expression> 271 same as labelled <id> = <param> -> labelled ... -> labelled <expression> 272 273In the case of directory syntax, the first token in a *.curv file appearing 274as a directory entry can begin with the 'labelled' keyword, 275and that results in a labelled definition. 276 277<program> ::= 278 labelled <expression> 279 280In all cases, the keyword 'labelled' or 'constructor' can be preceeded by a 281block comment, which will form the doc string for the labelled function or 282module. 283 284Syntax alternative: In the Branded proposal, the keyword 'labelled' is 285replaced by '@', and each variable in a definiendum pattern can be either 286prefixed or not prefixed by '@'. But this doesn't take doc strings into 287account. In the present syntax, the entire definition is prefixed by 288 // <doc comment> <nl> labelled <definition> 289 290Syntax alternative: In the Branded proposal, an anonymous constructor 291is 'a -> @b' and a labelled constructor definition is '@f a = @b'. 292(Instead of two special tokens, => and constructor, we only need one.) 293And also, we could put a docstring before the second @ in a constructor: 294 <doc> @f a = <doc>@b 295 <doc> @f = a -> <doc>@b 296 297A module constructor is a scoped record constructor containing one or more 298labelled field definitions. When an anonymous module value is bound as the 299definiens in a labelled definition, the label from that definition is combined 300with field names and applied to the labelled fields in the module. 301 302A constructor expression is like a lambda expression with -> replaced by =>. 303When an anonymous constructor value is bound as the definiens in a labelled 304definition, the label from that definition is applied to the result of calling 305the constructor function. 306 307We need anonymous constructor values as a first class concept, otherwise we 308can't use combinators to compute a constructor before binding it to a label. 309 310Labelled Value Semantics 311------------------------ 312An anonymous module value is printed as a sequence of singleton definitions 313within curly braces, where labelled definitions are prefixed with 'labelled'. 314 315An anonymous constructor value is printed as <pattern> ->labelled <expr>. 316 317A labelled record, module, function or constructor is printed as a label 318expression. 319 320The 'open' function takes a labelled value as an argument, strips the label, 321and returns the value component of the label/value pair. Labels can be stacked. 322If you apply 'open' enough times to strip all the labels, you get an anonymous 323record, module, function or constructor. The 'open' function is used to look 324inside a data abstraction, perhaps during debugging. Or it is applied in the 325body of a constructor to strip an existing label and apply a new one. 326 constructor cube n = open (box [n,n,n]); 327 328'include R' and '... R' ignore the label of R. 329 330'{include R, ***}' preserves the labelled status of the included fields. 331 Rationale: if you are including a library API into a superset library 332 module, you want to preserve API labels. 333'...R' preserves the labelled status of the included fields. 334 * For consistency with 'include', so as not to create a useless distinction 335 between "records" and "modules". 336 * Also, it is plausible to use a record comprehension to build an API. 337 When records are used purely as data structures, they don't need or use 338 field labels, but that's not a good reason to disallow the use of 339 comprehension syntax to build a module. 340So, a record has an extra bit of information per field: is it labelled? 341 342a==b 343 Two labelled values are equal if they have equal labels and equal payloads. 344 (So we are comparing documentation strings for equality?) 345 346Products, Sums and Subtypes 347--------------------------- 348Based on my requirements, it seems like I want the equivalents of nominally 349typed product types, sum types and subtypes. 350 351Problem: Having two ways to encode data: (1) the original Curv style, 352writing out structured data literals directly (without having to declare 353types first), and (2) first defining nominal types & type constructors, 354and then invoking those constructors. Which would suck: warring language 355constructs, which ones do I use? Can I unify these two approaches? 356 357A Haskell algebraic type such as 358 data T = Foo | Bar a 359could be represented by these constructors: 360 labelled Foo = {}; 361 labelled Bar x =labelled {}; 362 363Can labelled values created by these constructors be unified with tagged 364values? 365 366Criticism of Labelled Values for Performance 367-------------------------------------------- 368Labelled values as a performance hack for precise domains? 369At the cost of language complexity, and heading down the path towards 370reinventing the complex and inexpressive type systems found in mainstream 371languages? Maybe there is a better way. Try thinking differently about 372this problem. 373 374What if we start with a simpler design with the performance hit, then look for 375alternate ways to speed things up without so much language complexity. 376