1Rationale & Requirements
2------------------------
3Introducing labelled values (as opposed to using JSON style Plain Old Data
4to represent all data) is analogous to introducing nominal types (as opposed to
5using structural type equivalence exclusively). It's more language complexity,
6so why?
7  Modula 3/"How The Language Got Its Spots" provides a rationale for the latter
8  question (why branded types were added to Modula 3).
9   * Structural type equivalence is more elegant, because name equivalence
10     (in languages like Modula 2) violates referential transparency.
11   * Referential transparency is important because it allows you to reason
12     about the identity of types in data that crosses program boundaries.
13     (Something that JSON allows. An important consideration for Curv.)
14   * But sometimes you want two types with the same structure to be treated
15     as distinct types. The Modula-3 solution is to add a 'brand' string to
16     the type and use structural equivalence.
17   * Curv labels are like these brand strings. Referential transparency is
18     preserved. Taking this further, two labelled values are equal if the
19     labels are equal and the payloads are equal. It's possible for two
20     labelled values created by different libraries from different authors to
21     be accidently equal, but this problem is solved at a higher level with
22     culture and naming conventions.
23
24Tagged/Encapsulated/Abstract Data Types
25  The tag on a value guarantees that it conforms to a specified ADT:
26  the data structure isn't corrupted and the contained scalar values
27  are not out of range. The tag means that only constructors belonging
28  to the ADT can have constructed the value.
29
30Algebras
31  I want to use Algebra Directed Design when programming in Curv.
32  A single-sorted algebra is a record that packages a tagged data type with
33  operations and a predicate or type. If the type is scalar or a product,
34  then the primary constructor is 'call'. If the type is a sum, then there
35  are multiple primary constructors (not named 'call'). The primary constructors
36  return labelled values. There may be secondary constructors that are aliases
37  for calls to primary constructors. A primary constructor that doesn't carry
38  a payload is a labelled {}. The Algebra value itself is a labelled record.
39
40Values print as Expressions
41  Every value prints as an expression that, when evaluated in the correct
42  scope, reconstructs the original value. (This weakens the Branded Value
43  proposal.) What you see is all there is (no hidden state). Quite unlike
44  printing [1,2,3] in Python, for example.
45
46No Strict Encapsulation
47  Although ADTs provide a kind of "information hiding", encapsulation is not
48  strict. You have full access to a value's representation, which is important
49  for debugging and browsing a workspace. More Clojure than Haskell.
50
51Precise Domains
52  Each function F precisely defines its domain.
53  A call to F fails if the argument is not in the domain, giving a high
54  quality error message that tells the user they called the function
55  incorrectly. (As opposed to failing deep in the function's implementation
56  and giving a confusing stack trace.) ADT tags can be tested inexpensively
57  at runtime, unlike full data structure verification (of structural types).
58
59Managing Complexity: Self Identifying Data
60  This is an aid to human comprehension when reading a data dump.
61
62  We'd like to print function values in a format that provides provenance
63  and meaning, rather than just printing "<function>" or an anonymous lambda
64  expression. Just knowing the name of the function provides an important hint.
65
66  A 'cube' is represented by a complex data structure, and you can't tell what
67  it is just by looking at the raw data. If we print a cube value as a label,
68  eg as `cube 10`, then we can more easily understand what it is.
69
70Shapes
71  Every shape value has a Shape tag, which asserts the existence of the 5
72  shape fields, with the correct range for each field. There is a generic
73  shape constructor, currently 'make_shape': shapes that don't have a more
74  specific constructor print as 'make_shape' expressions. Each shape constructor
75  in the standard library (eg, 'cube') has a branded constructor that tags
76  shapes with this constructor. Cubes print as 'cube' expressions.
77  (Nested brands were not in the original Branded Value proposal.)
78
79Overloading
80  With tagged values, data can be more easily classified, so we can more
81  easily overload generic functions to take different actions on different
82  types of data.
83
84(1) Labelled Values and Algebras are the Curv equivalent of newtype/algebraic
85types in Haskell, classes in OOP, nominal structure and enum types in Algol
86descendents. (2) Theories are the Curv equivalent of type classes in Haskell,
87interfaces or traits or protocols in OOP. (3) The Tagless Final Interpreter
88papers describe a duality between algebraic types and type classes, and this
89finally made it seem inevitable that Curv should include features (1) and (2)
90in some form.
91
92Curv is intended to be as simple, elegant and expressive as possible (but no
93simpler). There is a high level subset that has a shallow learning curve and
94allows non-experts to quickly become productive. There are additional features
95needed by experts, which are used for designing library APIs that have high
96performance and are easy to use. While a simple language can be learned more
97quickly, the extra complexity added by labelled values is used by experts
98to construct APIs that are easier to use for non-experts. Values are self
99identifying and easy to decode, and contain documentation metadata. Functions
100produce higher level and more accurate error messages when you misuse them.
101
102Uniform notation for labelled function definitions
103--------------------------------------------------
104In Curv 0.4, a function definition like this:
105    f x = x + 1;
106results in a labelled function value where 'f' prints as '<function f>'.
107
108However, the name 'f' is not stored if the function is constructed by a
109combinator.
110    f = <function expression>;
111
112To make function labelling work uniformly regardless of whether a combinator
113is used, we will change the specialized function definition syntax to require
114a keyword prefix, like this:
115    func f x = x + 1;
116Most mainstream languages use a keyword prefix for function definitions,
117so it's cool. Popular choices for this keyword: function, def, func, fn.
118
119Then,
120    func f = <function expression>;
121is available for defining a labelled function using a combinator.
122
123(Note: 'func' panics if <function expression> doesn't return a function,
124as defined by 'is_func'. A record with a 'call' field containing a function
125will work.)
126
127Further rationale: In Curv 0.4, function literals are magic, because
128    f = x -> x + 1
129results in a function labelled 'f', but substituting the definiens for
130another expression returning the same value produces a different result.
131This breaks equational reasoning. We need a special keyword like 'func'
132to indicate the definition of a labelled value.
133
134But, what if the entire source file is a function expression, and the file
135is an element of a module using directory syntax? It seems we need to add a
136magic token or keyword to the start of the source file, especially when a
137combinator builds the function value.
138 * 'func' will work.
139 * The original Branded proposal used '@' for this.
140
141Related proposals
142-----------------
143But, I also intend to print function values as constructor expressions
144that reconstruct the original value.
145 * A builtin function prints as the function name.
146 * A function from a (nonrecursive) lambda expression prints as a lambda
147   expression.
148 * A function from a recursive lambda expression prints as
149      let fac n = ...fac(n-1)... in fac
150
151In addition, there is a proposal for branded record and function values.
152A branded value is printed as a constructor expression (global variable name,
153function call or selection), rather than as a record or function literal.
154Builtin functions are considered branded, and print as a function name.
155There are also user-defined branded functions and records.
156The limitation is that there exists a global context in which all such
157constructor expressions can be evaluated to reconstruct the value, which
158means that a function value from a local function definition can't be branded,
159even though we do label such function values in Curv 0.4.
160
161    This includes a definition syntax for named, branded record fields.
162    Something like:
163        @name = <rec-or-func>;
164        def name = <rec-or-func>;
165        term name = <rec-or-func>;
166        named name = <rec-or-func>;
167        module name = <rec-or-func>;
168        labelled name = <rec-or-func>;
169    Which is related to the 'func' labelled function definition syntax.
170    Notes:
171     * A 'module' is an abstract, labelled, record or module value.
172
173    In a *.curv source file referenced by a directory (directory syntax),
174    the source file begins with '@' in one version of the proposal.
175
176Finally, I want a 'help' function for displaying the documentation for a
177function value. 'help f' evaluates 'f' as a function expression, then returns
178a multi-line documentation string. Function name, parameter documentation,
179plus a general doc string. Probably this is REPL only.
180
181    Syntax is not designed yet. The metadata will include a function name,
182    even if the function is locally defined and not branded. It will include
183    a function comment, and parameter comments. A keyword like 'func' (or
184    'def' or 'term') will be needed to trigger the collection and storage of
185    this metadata in the case where a function is defined with a combinator.
186
187Combined Proposal
188-----------------
189Use same syntax for labelled and branded func/record definitions.
190Extend this syntax with main docstring and parameter/field docstrings.
191
192When this syntax is used, we collect and store metadata.
193When this syntax is not used, we don't collect and store metadata.
194We don't decide based on the syntax of the definiens expression (eg, is it
195a lambda expression), because this breaks the algebra of programs.
196You should be able to substitute a lambda expression for another expression
197that computes the same value without changing the meaning of the program.
198
199Docstring Syntax
200----------------
201I don't want to overthink or overengineer the doc syntax.
202This is a transitional design, so I'm going for the simplest design that works,
203and I'll get experience with this before choosing a final design.
204
205I don't need any new syntax for doc strings. If I just do what pydoc does,
206it's fine. Pydoc doesn't require string-literal docstring syntax, it will
207collect a documentation string from a block comment preceding a definition.
208I'll just rely on block comment syntax.
209
210Quote from pydoc:
211  For modules, classes, functions and methods, the displayed documentation
212  is derived from the docstring (i.e. the __doc__ attribute) of the object,
213  and recursively of its documentable members. If there is no docstring,
214  pydoc tries to obtain a description from the block of comment lines just
215  above the definition of the class, function or method in the source file,
216  or at the top of the module.
217
218In Curv, a labelled value definition (function or module or constructor) can be
219preceded with a block comment, which provides the doc string.
220
221I considered collecting docstrings from formal parameter comments,
222like in doxygen. However, I won't do this. First, it doesn't work for
223functions defined using combinators. Second, there's no need (Python doesn't
224do this: the parameter documentation is embedded in the function's docstring).
225So this idea is overengineering.
226
227What about per-module-field docstrings?
228 1. Internal field documentation.
229    Maybe only labelled values in a module have their own documentation,
230    similar to how in Python, only "documentable members" of a class or module
231    have documentation. To collect documentation from a module, you get the
232    module's docstring, then you iterate over the fields, and for each element
233    that is a labelled value, you collect that labelled value's documentation.
234    (Note that a labelled value field can be defined as 'x = labelled_val_expr',
235    so in this case the documentation string is actually defined elsewhere.)
236    Fields that don't have internal documentation can be documented in the
237    module's central docstring. This design nicely explains how unlabelled
238    definitions like
239        x = expr
240        [a,b] = expr
241        include expr
242    can produce fields with docstrings. It's compositional. To display module
243    documentation, first display the module's docstring (which will end with
244    docs for otherwise undocumented fields), then display the docstrings for
245    each member value that has one. I should start with this design, because
246    anything else will be a superset. And hey, it's good enough for Python.
247 2. External field documentation.
248    Maybe I want all the members of a labelled module to have their own doc
249    strings, even definitions like 'pi' in stdlib. This gets complicated:
250    we have to define the precedence of external vs internal docstrings,
251    and define rules for multi-variable definitions. Later.
252
253Data Model
254----------
255A brand or formula is:
256  <brand> ::= <identifier>
257            | <brand> . <identifier>
258            | <brand> <argument>
259  <formula> ::= <symbol>                // variable formula
260              | <module> . <symbol>     // field selection formula
261              | <function> <argvalue>   // function call formula
262Storing the original constructor function value in a call formula is useful
263for "customizing" a parametric model (tweaking some of the parameters).
264Storing the original module value in a field formula may also be useful,
265but depending on implementation, it might result in a recursive object cycle
266(requiring a tracing garbage collector or cyclic reference counts), or we can
267avoid that by applying the label to the field value each time the '.' operator
268is evaluated. Alternatively, instead of storing a module value, we store the
269module's formula.
270
271A label is a pair of [formula, docstring]. The docstring may be an empty string.
272
273A function has a stack of 0 or more labels, and orthogonal to that,
274it has a constructor flag. If the constructor flag is true, the function
275is called a constructor.
276
277A record has a stack of 0 or more labels, and orthogonal to that, each field
278has a constructor flag.
279
280Labelled Definition Syntax
281--------------------------
282I don't have a syntax I like; I don't know what keywords or symbols to use.
283It is an esthetic and linguistic problem of choosing the right words.
284As a stopgap measure, here is a temporary syntax.
285
286<definition> ::=
287  labelled <singleton-pattern> = <function-or-module-expression>
288  labelled <id> <param>... = <expression>
289      same as labelled <id> = <param> -> labelled ... -> <expression>
290  constructor <id> <param>... = <function-or-module-expression>
291      same as labelled <id> = <param> -> labelled ... -> labelled <expression>
292
293In the case of directory syntax, the first token in a *.curv file appearing
294as a directory entry can begin with the 'labelled' keyword,
295and that results in a labelled definition.
296
297<program> ::=
298  labelled <expression>
299
300In all cases, the keyword 'labelled' or 'constructor' can be preceeded by a
301block comment, which will form the doc string for the labelled function or
302module.
303
304Syntax alternative: In the Branded proposal, the keyword 'labelled' is
305replaced by '@', and each variable in a definiendum pattern can be either
306prefixed or not prefixed by '@'. But this doesn't take doc strings into
307account. In the present syntax, the entire definition is prefixed by
308    // <doc comment> <nl> labelled <definition>
309
310Syntax alternative: In the Branded proposal, an anonymous constructor
311is 'a -> @b' and a labelled constructor definition is '@f a = @b'.
312(Instead of two special tokens, => and constructor, we only need one.)
313And also, we could put a docstring before the second @ in a constructor:
314    <doc> @f a = <doc>@b
315    <doc> @f = a -> <doc>@b
316
317A module constructor is a scoped record constructor containing one or more
318labelled field definitions. When an anonymous module value is bound as the
319definiens in a labelled definition, the label from that definition is combined
320with field names and applied to the labelled fields in the module.
321
322A constructor expression is like a lambda expression with -> replaced by =>.
323When an anonymous constructor value is bound as the definiens in a labelled
324definition, the label from that definition is applied to the result of calling
325the constructor function.
326
327We need anonymous constructor values as a first class concept, otherwise we
328can't use combinators to compute a constructor before binding it to a label.
329
330Labelled Value Semantics
331------------------------
332An anonymous module value is printed as a sequence of singleton definitions
333within curly braces, where labelled definitions are prefixed with 'labelled'.
334
335An anonymous constructor value is printed as <pattern> ->labelled <expr>.
336
337A labelled record, module, function or constructor is printed as a label
338expression.
339
340The 'open' function takes a labelled value as an argument, strips the label,
341and returns the value component of the label/value pair. Labels can be stacked.
342If you apply 'open' enough times to strip all the labels, you get an anonymous
343record, module, function or constructor. The 'open' function is used to look
344inside a data abstraction, perhaps during debugging. Or it is applied in the
345body of a constructor to strip an existing label and apply a new one.
346    constructor cube n = open (box [n,n,n]);
347
348'include R' and '... R' ignore the label of R.
349
350'{include R, ***}' preserves the labelled status of the included fields.
351    Rationale: if you are including a library API into a superset library
352    module, you want to preserve API labels.
353'...R' preserves the labelled status of the included fields.
354    * For consistency with 'include', so as not to create a useless distinction
355      between "records" and "modules".
356    * Also, it is plausible to use a record comprehension to build an API.
357    When records are used purely as data structures, they don't need or use
358    field labels, but that's not a good reason to disallow the use of
359    comprehension syntax to build a module.
360So, a record has an extra bit of information per field: is it labelled?
361
362a==b
363    Two labelled values are equal if they have equal labels and equal payloads.
364    (So we are comparing documentation strings for equality?)
365
366Products, Sums and Subtypes
367---------------------------
368Based on my requirements, it seems like I want the equivalents of nominally
369typed product types, sum types and subtypes.
370
371Problem: Having two ways to encode data: (1) the original Curv style,
372writing out structured data literals directly (without having to declare
373types first), and (2) first defining nominal types & type constructors,
374and then invoking those constructors. Which would suck: warring language
375constructs, which ones do I use? Can I unify these two approaches?
376
377A Haskell algebraic type such as
378    data T = Foo | Bar a
379could be represented by these constructors:
380    labelled Foo = {};
381    labelled Bar x =labelled {};
382
383Can labelled values created by these constructors be unified with tagged
384values?
385
386Criticism of Labelled Values for Performance
387--------------------------------------------
388Labelled values as a performance hack for precise domains?
389At the cost of language complexity, and heading down the path towards
390reinventing the complex and inexpressive type systems found in mainstream
391languages? Maybe there is a better way. Try thinking differently about
392this problem.
393
394What if we start with a simpler design with the performance hit, then look for
395alternate ways to speed things up without so much language complexity.
396