1Named Functions and Modules
2===========================
3Store documentation and provenance metadata in function/module values.
4
5Named functions and modules are the components out of which library APIs
6are built. They are the modules and functors of the Modular Programming
7proposal. Named modules are not used as instances of abstract data types,
8but named functions are used as constructors of ADT instances.
9
10This is a transitional design, not the final Curv 1.0 design.
11
12This is a variant of the "branded values" proposal.
13 * Weaker, because it doesn't require the printed representation to be globally
14   unambiguous, or that value equality is isomorphic to printed rep equality.
15   (This is a transitional design, I'll worry about that other stuff later.)
16 * Named values may contain documentation metadata.
17
18Named Functions
19---------------
20A function literal like `x->x+1` constructs a function value that prints
21as a lambda expression. If you bind this to a name using
22    incr = x->x+1
23then by the law of substitution, `incr` is just an alias for x->x+1,
24and `incr` also prints as a lambda expression.
25
26To define a named function, you can write:
27    func incr = x->x+1
28    func incr x = x + 1
29and `incr` prints as the name `incr`. The `func` keyword signals that we are
30constructing a named function value. The old definition syntax
31    incr x = x + 1
32is deprecated.
33
34In `func f = expr`, the value of `expr` must match the `is_func` predicate
35or an error is reported.
36
37If a *.curv source file occurs as an element of a directory record, and the
38contents of the source file are a function expression, then use `func` as the
39first token in the file to define a named function.
40
41How are anonymous recursive functions printed?
42* An earlier proposal suggests printing as a `let` expression:
43      let fac n = ...fac(n-1)... in fac
44* However, we could impose the restriction that recursive functions must
45  be named (the definition must use 'func').
46    fac = n -> if (n <= 0) 1 else n * fac(n-1)
47        ERROR: recursive function 'fac' must be declared using 'func' keyword.
48    func fac = n -> if (n <= 0) 1 else n * fac(n-1)
49    func fac n = if (n <= 0) 1 else n * fac(n-1)
50  This simplifies the job of printing a recursive function.
51  It also simplifies the implementation of recursive functions.
52  (I can avoid the recursive refcounting proposal for now.)
53* This simplification would not prohibit an operation for stripping the name
54  from a named recursive function to expose the lambda expression.
55
56Builtin functions like `sin` are named functions.
57
58Function Documentation
59----------------------
60Named functions can contain embedded documentation, which is printed
61by the `help` command in the REPL.
62
63The `func` keyword can be preceded by a multi-line comment, which is used
64as a docstring and embedded in the function value.
65
66Named Modules
67-------------
68A module is a named record, defined as
69    module name = record_expr
70The `module` keyword works like `func`: it can be preceded by a doc comment,
71and it can appear as the first token in a source file.
72
73We should support mutually recursive functions that exist in different
74named modules. There are good reasons to support this in modular programming.
75
76A module constructor is a set of definitions surrounded by {}.
77Some of the definitions are 'named' (prefixed with func or module),
78and some are not. We store a boolean flag with each field, in the module's
79metadata, indicating which fields are named. This is separate from the field
80values. There is a distinction between a named field, and an anonymous field
81that happens to be bound to a named value.
82
83A module is a record with extra metadata, including this field metadata.
84'Module' is a subtype of 'Record'. All the algebraic laws for records also
85apply to modules.
86
87`module m = r` constructs a new value 'm' which has the name 'm' but
88inherits all the other properties of record r, including its field metadata.
89`m.f` returns a named value, whose name is `m.f`, if `f` is a named field of m.
90
91Constructors
92------------
93A constructor is a function that returns a named value.
94
95Anonymous constructor:
96    param ->func expr
97    param ->module expr
98
99func f x y = x+y (curried function) is equivalent to
100    func f = x ->func y -> x+y
101A partially applied named curried function is a constructor.
102And this might motivate a different syntax for curried lambdas:
103    x y -> x+y
104is short for
105    x ->func y -> x+y
106
107'module f x = mexpr' is equivalent to
108    func f x = module mexpr
109
110Related proposals
111-----------------
112But, I also intend to print function values as constructor expressions
113that reconstruct the original value.
114 * A builtin function prints as the function name.
115 * A function from a (nonrecursive) lambda expression prints as a lambda
116   expression.
117 * A function from a recursive lambda expression prints as
118      let fac n = ...fac(n-1)... in fac
119
120In addition, there is a proposal for branded record and function values.
121A branded value is printed as a constructor expression (global variable name,
122function call or selection), rather than as a record or function literal.
123Builtin functions are considered branded, and print as a function name.
124There are also user-defined branded functions and records.
125The limitation is that there exists a global context in which all such
126constructor expressions can be evaluated to reconstruct the value, which
127means that a function value from a local function definition can't be branded,
128even though we do label such function values in Curv 0.4.
129
130    This includes a definition syntax for named, branded record fields.
131    Something like:
132        @name = <rec-or-func>;
133        def name = <rec-or-func>;
134        term name = <rec-or-func>;
135        named name = <rec-or-func>;
136        module name = <rec-or-func>;
137        labelled name = <rec-or-func>;
138    Which is related to the 'func' labelled function definition syntax.
139    Notes:
140     * A 'module' is an abstract, labelled, record or module value.
141
142    In a *.curv source file referenced by a directory (directory syntax),
143    the source file begins with '@' in one version of the proposal.
144
145Finally, I want a 'help' function for displaying the documentation for a
146function value. 'help f' evaluates 'f' as a function expression, then returns
147a multi-line documentation string. Function name, parameter documentation,
148plus a general doc string. Probably this is REPL only.
149
150    Syntax is not designed yet. The metadata will include a function name,
151    even if the function is locally defined and not branded. It will include
152    a function comment, and parameter comments. A keyword like 'func' (or
153    'def' or 'term') will be needed to trigger the collection and storage of
154    this metadata in the case where a function is defined with a combinator.
155
156Combined Proposal
157-----------------
158Use same syntax for labelled and branded func/record definitions.
159Extend this syntax with main docstring and parameter/field docstrings.
160
161When this syntax is used, we collect and store metadata.
162When this syntax is not used, we don't collect and store metadata.
163We don't decide based on the syntax of the definiens expression (eg, is it
164a lambda expression), because this breaks the algebra of programs.
165You should be able to substitute a lambda expression for another expression
166that computes the same value without changing the meaning of the program.
167
168Docstring Syntax
169----------------
170I don't want to overthink or overengineer the doc syntax.
171This is a transitional design, so I'm going for the simplest design that works,
172and I'll get experience with this before choosing a final design.
173
174I don't need any new syntax for doc strings. If I just do what pydoc does,
175it's fine. Pydoc doesn't require string-literal docstring syntax, it will
176collect a documentation string from a block comment preceding a definition.
177I'll just rely on block comment syntax.
178
179Quote from pydoc:
180  For modules, classes, functions and methods, the displayed documentation
181  is derived from the docstring (i.e. the __doc__ attribute) of the object,
182  and recursively of its documentable members. If there is no docstring,
183  pydoc tries to obtain a description from the block of comment lines just
184  above the definition of the class, function or method in the source file,
185  or at the top of the module.
186
187In Curv, a labelled value definition (function or module or constructor) can be
188preceded with a block comment, which provides the doc string.
189
190I considered collecting docstrings from formal parameter comments,
191like in doxygen. However, I won't do this. First, it doesn't work for
192functions defined using combinators. Second, there's no need (Python doesn't
193do this: the parameter documentation is embedded in the function's docstring).
194So this idea is overengineering.
195
196What about per-module-field docstrings?
197 1. Internal field documentation.
198    Maybe only labelled values in a module have their own documentation,
199    similar to how in Python, only "documentable members" of a class or module
200    have documentation. To collect documentation from a module, you get the
201    module's docstring, then you iterate over the fields, and for each element
202    that is a labelled value, you collect that labelled value's documentation.
203    (Note that a labelled value field can be defined as 'x = labelled_val_expr',
204    so in this case the documentation string is actually defined elsewhere.)
205    Fields that don't have internal documentation can be documented in the
206    module's central docstring. This design nicely explains how unlabelled
207    definitions like
208        x = expr
209        [a,b] = expr
210        include expr
211    can produce fields with docstrings. It's compositional. To display module
212    documentation, first display the module's docstring (which will end with
213    docs for otherwise undocumented fields), then display the docstrings for
214    each member value that has one. I should start with this design, because
215    anything else will be a superset. And hey, it's good enough for Python.
216 2. External field documentation.
217    Maybe I want all the members of a labelled module to have their own doc
218    strings, even definitions like 'pi' in stdlib. This gets complicated:
219    we have to define the precedence of external vs internal docstrings,
220    and define rules for multi-variable definitions. Later.
221
222Data Model
223----------
224A brand or formula is:
225  <brand> ::= <identifier>
226            | <brand> . <identifier>
227            | <brand> <argument>
228  <formula> ::= <symbol>                // variable formula
229              | <module> . <symbol>     // field selection formula
230              | <function> <argvalue>   // function call formula
231
232Previously,
233  Storing the original constructor function value in a call formula is useful
234  for "customizing" a parametric model (tweaking some of the parameters).
235But parametric shapes are not named modules in this design.
236
237Previously,
238  Storing the original module value in a field formula may also be useful,
239  but depending on implementation, it might result in a recursive object cycle
240  (requiring a tracing garbage collector or cyclic reference counts), or we can
241  avoid that by applying the label to the field value each time the '.' operator
242  is evaluated. Alternatively, instead of storing a module value, we store the
243  module's formula.
244But in this design, a formula only needs to be purely symbolic. I don't have
245a rationale for storing function and module values in a formula.
246
247  <formula> ::= <symbol>               // variable formula
248              | <formula> . <symbol>   // field selection formula
249              | <formula> <argvalue>   // function call formula
250
251A label is a pair of [formula, docstring]. The docstring may be an empty string.
252
253A function has an optional label, and orthogonal to that,
254it has a constructor flag. If the constructor flag is true, the function
255is called a constructor.
256
257A record has an optional label, and orthogonal to that, each field
258has a constructor flag.
259
260Labelled Definition Syntax
261--------------------------
262I don't have a syntax I like; I don't know what keywords or symbols to use.
263It is an esthetic and linguistic problem of choosing the right words.
264As a stopgap measure, here is a temporary syntax.
265
266<definition> ::=
267  labelled <singleton-pattern> = <function-or-module-expression>
268  labelled <id> <param>... = <expression>
269      same as labelled <id> = <param> -> labelled ... -> <expression>
270  constructor <id> <param>... = <function-or-module-expression>
271      same as labelled <id> = <param> -> labelled ... -> labelled <expression>
272
273In the case of directory syntax, the first token in a *.curv file appearing
274as a directory entry can begin with the 'labelled' keyword,
275and that results in a labelled definition.
276
277<program> ::=
278  labelled <expression>
279
280In all cases, the keyword 'labelled' or 'constructor' can be preceeded by a
281block comment, which will form the doc string for the labelled function or
282module.
283
284Syntax alternative: In the Branded proposal, the keyword 'labelled' is
285replaced by '@', and each variable in a definiendum pattern can be either
286prefixed or not prefixed by '@'. But this doesn't take doc strings into
287account. In the present syntax, the entire definition is prefixed by
288    // <doc comment> <nl> labelled <definition>
289
290Syntax alternative: In the Branded proposal, an anonymous constructor
291is 'a -> @b' and a labelled constructor definition is '@f a = @b'.
292(Instead of two special tokens, => and constructor, we only need one.)
293And also, we could put a docstring before the second @ in a constructor:
294    <doc> @f a = <doc>@b
295    <doc> @f = a -> <doc>@b
296
297A module constructor is a scoped record constructor containing one or more
298labelled field definitions. When an anonymous module value is bound as the
299definiens in a labelled definition, the label from that definition is combined
300with field names and applied to the labelled fields in the module.
301
302A constructor expression is like a lambda expression with -> replaced by =>.
303When an anonymous constructor value is bound as the definiens in a labelled
304definition, the label from that definition is applied to the result of calling
305the constructor function.
306
307We need anonymous constructor values as a first class concept, otherwise we
308can't use combinators to compute a constructor before binding it to a label.
309
310Labelled Value Semantics
311------------------------
312An anonymous module value is printed as a sequence of singleton definitions
313within curly braces, where labelled definitions are prefixed with 'labelled'.
314
315An anonymous constructor value is printed as <pattern> ->labelled <expr>.
316
317A labelled record, module, function or constructor is printed as a label
318expression.
319
320The 'open' function takes a labelled value as an argument, strips the label,
321and returns the value component of the label/value pair. Labels can be stacked.
322If you apply 'open' enough times to strip all the labels, you get an anonymous
323record, module, function or constructor. The 'open' function is used to look
324inside a data abstraction, perhaps during debugging. Or it is applied in the
325body of a constructor to strip an existing label and apply a new one.
326    constructor cube n = open (box [n,n,n]);
327
328'include R' and '... R' ignore the label of R.
329
330'{include R, ***}' preserves the labelled status of the included fields.
331    Rationale: if you are including a library API into a superset library
332    module, you want to preserve API labels.
333'...R' preserves the labelled status of the included fields.
334    * For consistency with 'include', so as not to create a useless distinction
335      between "records" and "modules".
336    * Also, it is plausible to use a record comprehension to build an API.
337    When records are used purely as data structures, they don't need or use
338    field labels, but that's not a good reason to disallow the use of
339    comprehension syntax to build a module.
340So, a record has an extra bit of information per field: is it labelled?
341
342a==b
343    Two labelled values are equal if they have equal labels and equal payloads.
344    (So we are comparing documentation strings for equality?)
345
346Products, Sums and Subtypes
347---------------------------
348Based on my requirements, it seems like I want the equivalents of nominally
349typed product types, sum types and subtypes.
350
351Problem: Having two ways to encode data: (1) the original Curv style,
352writing out structured data literals directly (without having to declare
353types first), and (2) first defining nominal types & type constructors,
354and then invoking those constructors. Which would suck: warring language
355constructs, which ones do I use? Can I unify these two approaches?
356
357A Haskell algebraic type such as
358    data T = Foo | Bar a
359could be represented by these constructors:
360    labelled Foo = {};
361    labelled Bar x =labelled {};
362
363Can labelled values created by these constructors be unified with tagged
364values?
365
366Criticism of Labelled Values for Performance
367--------------------------------------------
368Labelled values as a performance hack for precise domains?
369At the cost of language complexity, and heading down the path towards
370reinventing the complex and inexpressive type systems found in mainstream
371languages? Maybe there is a better way. Try thinking differently about
372this problem.
373
374What if we start with a simpler design with the performance hit, then look for
375alternate ways to speed things up without so much language complexity.
376