1Rationale & Requirements 2------------------------ 3Introducing labelled values (as opposed to using JSON style Plain Old Data 4to represent all data) is analogous to introducing nominal types (as opposed to 5using structural type equivalence exclusively). It's more language complexity, 6so why? 7 Modula 3/"How The Language Got Its Spots" provides a rationale for the latter 8 question (why branded types were added to Modula 3). 9 * Structural type equivalence is more elegant, because name equivalence 10 (in languages like Modula 2) violates referential transparency. 11 * Referential transparency is important because it allows you to reason 12 about the identity of types in data that crosses program boundaries. 13 (Something that JSON allows. An important consideration for Curv.) 14 * But sometimes you want two types with the same structure to be treated 15 as distinct types. The Modula-3 solution is to add a 'brand' string to 16 the type and use structural equivalence. 17 * Curv labels are like these brand strings. Referential transparency is 18 preserved. Taking this further, two labelled values are equal if the 19 labels are equal and the payloads are equal. It's possible for two 20 labelled values created by different libraries from different authors to 21 be accidently equal, but this problem is solved at a higher level with 22 culture and naming conventions. 23 24Tagged/Encapsulated/Abstract Data Types 25 The tag on a value guarantees that it conforms to a specified ADT: 26 the data structure isn't corrupted and the contained scalar values 27 are not out of range. The tag means that only constructors belonging 28 to the ADT can have constructed the value. 29 30Algebras 31 I want to use Algebra Directed Design when programming in Curv. 32 A single-sorted algebra is a record that packages a tagged data type with 33 operations and a predicate or type. If the type is scalar or a product, 34 then the primary constructor is 'call'. If the type is a sum, then there 35 are multiple primary constructors (not named 'call'). The primary constructors 36 return labelled values. There may be secondary constructors that are aliases 37 for calls to primary constructors. A primary constructor that doesn't carry 38 a payload is a labelled {}. The Algebra value itself is a labelled record. 39 40Values print as Expressions 41 Every value prints as an expression that, when evaluated in the correct 42 scope, reconstructs the original value. (This weakens the Branded Value 43 proposal.) What you see is all there is (no hidden state). Quite unlike 44 printing [1,2,3] in Python, for example. 45 46No Strict Encapsulation 47 Although ADTs provide a kind of "information hiding", encapsulation is not 48 strict. You have full access to a value's representation, which is important 49 for debugging and browsing a workspace. More Clojure than Haskell. 50 51Precise Domains 52 Each function F precisely defines its domain. 53 A call to F fails if the argument is not in the domain, giving a high 54 quality error message that tells the user they called the function 55 incorrectly. (As opposed to failing deep in the function's implementation 56 and giving a confusing stack trace.) ADT tags can be tested inexpensively 57 at runtime, unlike full data structure verification (of structural types). 58 59Managing Complexity: Self Identifying Data 60 This is an aid to human comprehension when reading a data dump. 61 62 We'd like to print function values in a format that provides provenance 63 and meaning, rather than just printing "<function>" or an anonymous lambda 64 expression. Just knowing the name of the function provides an important hint. 65 66 A 'cube' is represented by a complex data structure, and you can't tell what 67 it is just by looking at the raw data. If we print a cube value as a label, 68 eg as `cube 10`, then we can more easily understand what it is. 69 70Shapes 71 Every shape value has a Shape tag, which asserts the existence of the 5 72 shape fields, with the correct range for each field. There is a generic 73 shape constructor, currently 'make_shape': shapes that don't have a more 74 specific constructor print as 'make_shape' expressions. Each shape constructor 75 in the standard library (eg, 'cube') has a branded constructor that tags 76 shapes with this constructor. Cubes print as 'cube' expressions. 77 (Nested brands were not in the original Branded Value proposal.) 78 79Overloading 80 With tagged values, data can be more easily classified, so we can more 81 easily overload generic functions to take different actions on different 82 types of data. 83 84(1) Labelled Values and Algebras are the Curv equivalent of newtype/algebraic 85types in Haskell, classes in OOP, nominal structure and enum types in Algol 86descendents. (2) Theories are the Curv equivalent of type classes in Haskell, 87interfaces or traits or protocols in OOP. (3) The Tagless Final Interpreter 88papers describe a duality between algebraic types and type classes, and this 89finally made it seem inevitable that Curv should include features (1) and (2) 90in some form. 91 92Curv is intended to be as simple, elegant and expressive as possible (but no 93simpler). There is a high level subset that has a shallow learning curve and 94allows non-experts to quickly become productive. There are additional features 95needed by experts, which are used for designing library APIs that have high 96performance and are easy to use. While a simple language can be learned more 97quickly, the extra complexity added by labelled values is used by experts 98to construct APIs that are easier to use for non-experts. Values are self 99identifying and easy to decode, and contain documentation metadata. Functions 100produce higher level and more accurate error messages when you misuse them. 101 102Uniform notation for labelled function definitions 103-------------------------------------------------- 104In Curv 0.4, a function definition like this: 105 f x = x + 1; 106results in a labelled function value where 'f' prints as '<function f>'. 107 108However, the name 'f' is not stored if the function is constructed by a 109combinator. 110 f = <function expression>; 111 112To make function labelling work uniformly regardless of whether a combinator 113is used, we will change the specialized function definition syntax to require 114a keyword prefix, like this: 115 func f x = x + 1; 116Most mainstream languages use a keyword prefix for function definitions, 117so it's cool. Popular choices for this keyword: function, def, func, fn. 118 119Then, 120 func f = <function expression>; 121is available for defining a labelled function using a combinator. 122 123(Note: 'func' panics if <function expression> doesn't return a function, 124as defined by 'is_func'. A record with a 'call' field containing a function 125will work.) 126 127Further rationale: In Curv 0.4, function literals are magic, because 128 f = x -> x + 1 129results in a function labelled 'f', but substituting the definiens for 130another expression returning the same value produces a different result. 131This breaks equational reasoning. We need a special keyword like 'func' 132to indicate the definition of a labelled value. 133 134But, what if the entire source file is a function expression, and the file 135is an element of a module using directory syntax? It seems we need to add a 136magic token or keyword to the start of the source file, especially when a 137combinator builds the function value. 138 * 'func' will work. 139 * The original Branded proposal used '@' for this. 140 141Related proposals 142----------------- 143But, I also intend to print function values as constructor expressions 144that reconstruct the original value. 145 * A builtin function prints as the function name. 146 * A function from a (nonrecursive) lambda expression prints as a lambda 147 expression. 148 * A function from a recursive lambda expression prints as 149 let fac n = ...fac(n-1)... in fac 150 151In addition, there is a proposal for branded record and function values. 152A branded value is printed as a constructor expression (global variable name, 153function call or selection), rather than as a record or function literal. 154Builtin functions are considered branded, and print as a function name. 155There are also user-defined branded functions and records. 156The limitation is that there exists a global context in which all such 157constructor expressions can be evaluated to reconstruct the value, which 158means that a function value from a local function definition can't be branded, 159even though we do label such function values in Curv 0.4. 160 161 This includes a definition syntax for named, branded record fields. 162 Something like: 163 @name = <rec-or-func>; 164 def name = <rec-or-func>; 165 term name = <rec-or-func>; 166 named name = <rec-or-func>; 167 module name = <rec-or-func>; 168 labelled name = <rec-or-func>; 169 Which is related to the 'func' labelled function definition syntax. 170 Notes: 171 * A 'module' is an abstract, labelled, record or module value. 172 173 In a *.curv source file referenced by a directory (directory syntax), 174 the source file begins with '@' in one version of the proposal. 175 176Finally, I want a 'help' function for displaying the documentation for a 177function value. 'help f' evaluates 'f' as a function expression, then returns 178a multi-line documentation string. Function name, parameter documentation, 179plus a general doc string. Probably this is REPL only. 180 181 Syntax is not designed yet. The metadata will include a function name, 182 even if the function is locally defined and not branded. It will include 183 a function comment, and parameter comments. A keyword like 'func' (or 184 'def' or 'term') will be needed to trigger the collection and storage of 185 this metadata in the case where a function is defined with a combinator. 186 187Combined Proposal 188----------------- 189Use same syntax for labelled and branded func/record definitions. 190Extend this syntax with main docstring and parameter/field docstrings. 191 192When this syntax is used, we collect and store metadata. 193When this syntax is not used, we don't collect and store metadata. 194We don't decide based on the syntax of the definiens expression (eg, is it 195a lambda expression), because this breaks the algebra of programs. 196You should be able to substitute a lambda expression for another expression 197that computes the same value without changing the meaning of the program. 198 199Docstring Syntax 200---------------- 201I don't want to overthink or overengineer the doc syntax. 202This is a transitional design, so I'm going for the simplest design that works, 203and I'll get experience with this before choosing a final design. 204 205I don't need any new syntax for doc strings. If I just do what pydoc does, 206it's fine. Pydoc doesn't require string-literal docstring syntax, it will 207collect a documentation string from a block comment preceding a definition. 208I'll just rely on block comment syntax. 209 210Quote from pydoc: 211 For modules, classes, functions and methods, the displayed documentation 212 is derived from the docstring (i.e. the __doc__ attribute) of the object, 213 and recursively of its documentable members. If there is no docstring, 214 pydoc tries to obtain a description from the block of comment lines just 215 above the definition of the class, function or method in the source file, 216 or at the top of the module. 217 218In Curv, a labelled value definition (function or module or constructor) can be 219preceded with a block comment, which provides the doc string. 220 221I considered collecting docstrings from formal parameter comments, 222like in doxygen. However, I won't do this. First, it doesn't work for 223functions defined using combinators. Second, there's no need (Python doesn't 224do this: the parameter documentation is embedded in the function's docstring). 225So this idea is overengineering. 226 227What about per-module-field docstrings? 228 1. Internal field documentation. 229 Maybe only labelled values in a module have their own documentation, 230 similar to how in Python, only "documentable members" of a class or module 231 have documentation. To collect documentation from a module, you get the 232 module's docstring, then you iterate over the fields, and for each element 233 that is a labelled value, you collect that labelled value's documentation. 234 (Note that a labelled value field can be defined as 'x = labelled_val_expr', 235 so in this case the documentation string is actually defined elsewhere.) 236 Fields that don't have internal documentation can be documented in the 237 module's central docstring. This design nicely explains how unlabelled 238 definitions like 239 x = expr 240 [a,b] = expr 241 include expr 242 can produce fields with docstrings. It's compositional. To display module 243 documentation, first display the module's docstring (which will end with 244 docs for otherwise undocumented fields), then display the docstrings for 245 each member value that has one. I should start with this design, because 246 anything else will be a superset. And hey, it's good enough for Python. 247 2. External field documentation. 248 Maybe I want all the members of a labelled module to have their own doc 249 strings, even definitions like 'pi' in stdlib. This gets complicated: 250 we have to define the precedence of external vs internal docstrings, 251 and define rules for multi-variable definitions. Later. 252 253Data Model 254---------- 255A brand or formula is: 256 <brand> ::= <identifier> 257 | <brand> . <identifier> 258 | <brand> <argument> 259 <formula> ::= <symbol> // variable formula 260 | <module> . <symbol> // field selection formula 261 | <function> <argvalue> // function call formula 262Storing the original constructor function value in a call formula is useful 263for "customizing" a parametric model (tweaking some of the parameters). 264Storing the original module value in a field formula may also be useful, 265but depending on implementation, it might result in a recursive object cycle 266(requiring a tracing garbage collector or cyclic reference counts), or we can 267avoid that by applying the label to the field value each time the '.' operator 268is evaluated. Alternatively, instead of storing a module value, we store the 269module's formula. 270 271A label is a pair of [formula, docstring]. The docstring may be an empty string. 272 273A function has a stack of 0 or more labels, and orthogonal to that, 274it has a constructor flag. If the constructor flag is true, the function 275is called a constructor. 276 277A record has a stack of 0 or more labels, and orthogonal to that, each field 278has a constructor flag. 279 280Labelled Definition Syntax 281-------------------------- 282I don't have a syntax I like; I don't know what keywords or symbols to use. 283It is an esthetic and linguistic problem of choosing the right words. 284As a stopgap measure, here is a temporary syntax. 285 286<definition> ::= 287 labelled <singleton-pattern> = <function-or-module-expression> 288 labelled <id> <param>... = <expression> 289 same as labelled <id> = <param> -> labelled ... -> <expression> 290 constructor <id> <param>... = <function-or-module-expression> 291 same as labelled <id> = <param> -> labelled ... -> labelled <expression> 292 293In the case of directory syntax, the first token in a *.curv file appearing 294as a directory entry can begin with the 'labelled' keyword, 295and that results in a labelled definition. 296 297<program> ::= 298 labelled <expression> 299 300In all cases, the keyword 'labelled' or 'constructor' can be preceeded by a 301block comment, which will form the doc string for the labelled function or 302module. 303 304Syntax alternative: In the Branded proposal, the keyword 'labelled' is 305replaced by '@', and each variable in a definiendum pattern can be either 306prefixed or not prefixed by '@'. But this doesn't take doc strings into 307account. In the present syntax, the entire definition is prefixed by 308 // <doc comment> <nl> labelled <definition> 309 310Syntax alternative: In the Branded proposal, an anonymous constructor 311is 'a -> @b' and a labelled constructor definition is '@f a = @b'. 312(Instead of two special tokens, => and constructor, we only need one.) 313And also, we could put a docstring before the second @ in a constructor: 314 <doc> @f a = <doc>@b 315 <doc> @f = a -> <doc>@b 316 317A module constructor is a scoped record constructor containing one or more 318labelled field definitions. When an anonymous module value is bound as the 319definiens in a labelled definition, the label from that definition is combined 320with field names and applied to the labelled fields in the module. 321 322A constructor expression is like a lambda expression with -> replaced by =>. 323When an anonymous constructor value is bound as the definiens in a labelled 324definition, the label from that definition is applied to the result of calling 325the constructor function. 326 327We need anonymous constructor values as a first class concept, otherwise we 328can't use combinators to compute a constructor before binding it to a label. 329 330Labelled Value Semantics 331------------------------ 332An anonymous module value is printed as a sequence of singleton definitions 333within curly braces, where labelled definitions are prefixed with 'labelled'. 334 335An anonymous constructor value is printed as <pattern> ->labelled <expr>. 336 337A labelled record, module, function or constructor is printed as a label 338expression. 339 340The 'open' function takes a labelled value as an argument, strips the label, 341and returns the value component of the label/value pair. Labels can be stacked. 342If you apply 'open' enough times to strip all the labels, you get an anonymous 343record, module, function or constructor. The 'open' function is used to look 344inside a data abstraction, perhaps during debugging. Or it is applied in the 345body of a constructor to strip an existing label and apply a new one. 346 constructor cube n = open (box [n,n,n]); 347 348'include R' and '... R' ignore the label of R. 349 350'{include R, ***}' preserves the labelled status of the included fields. 351 Rationale: if you are including a library API into a superset library 352 module, you want to preserve API labels. 353'...R' preserves the labelled status of the included fields. 354 * For consistency with 'include', so as not to create a useless distinction 355 between "records" and "modules". 356 * Also, it is plausible to use a record comprehension to build an API. 357 When records are used purely as data structures, they don't need or use 358 field labels, but that's not a good reason to disallow the use of 359 comprehension syntax to build a module. 360So, a record has an extra bit of information per field: is it labelled? 361 362a==b 363 Two labelled values are equal if they have equal labels and equal payloads. 364 (So we are comparing documentation strings for equality?) 365 366Products, Sums and Subtypes 367--------------------------- 368Based on my requirements, it seems like I want the equivalents of nominally 369typed product types, sum types and subtypes. 370 371Problem: Having two ways to encode data: (1) the original Curv style, 372writing out structured data literals directly (without having to declare 373types first), and (2) first defining nominal types & type constructors, 374and then invoking those constructors. Which would suck: warring language 375constructs, which ones do I use? Can I unify these two approaches? 376 377A Haskell algebraic type such as 378 data T = Foo | Bar a 379could be represented by these constructors: 380 labelled Foo = {}; 381 labelled Bar x =labelled {}; 382 383Can labelled values created by these constructors be unified with tagged 384values? 385 386Criticism of Labelled Values for Performance 387-------------------------------------------- 388Labelled values as a performance hack for precise domains? 389At the cost of language complexity, and heading down the path towards 390reinventing the complex and inexpressive type systems found in mainstream 391languages? Maybe there is a better way. Try thinking differently about 392this problem. 393 394What if we start with a simpler design with the performance hit, then look for 395alternate ways to speed things up without so much language complexity. 396