# Files, Trees and Packages

Files, Trees, Packages and Lib are four proposed mechanisms for Curv source
files to reference external resources.

Curv needs a package manager. We can define a package as an encapsulated module
composed of a number of files, and then focus on mechanisms for referencing
external packages.

These features support modular programming in Curv, wherein a large system is
partitioned into encapsulated modules. One conventional property of a module is
that its dependencies on external modules are all defined in one place. Within
the body of the module, simple names are used to refer to these dependencies.
This style should be *possible* in Curv, even if it isn't enforced.

## File Syntax

File Syntax is a set of rules for interpreting a regular file as a Curv value
(based on its extension), and for interpreting a directory as a Curv value.

Some file types that might be supported:
* `*.curv` -- a Curv expression, which evaluates to an arbitrary value.
* `*.cdef` -- a list of Curv definitions, which are textually included
  by the parent directory module. Recursive dependencies are allowed between
  `*.cdef` files. This file type can't be imported directly, because it is
  not an expression.
* `*.json`
* `*.toml`
* `*.rsdf` -- a Regularly Sampled Distance Field: a voxel grid of distance
  values, in binary.
* *directory* -- if a local filename names a directory, then Tree syntax is
  used to interpret the directory as a Curv value.
* `*.vstor` -- Value Store: a compressed binary file representing an arbitrary
  Curv value. It is a ZIP file containing a Tree (similar to `*.odt` or `*.3mf`).
  The primary use case is to represent a Curv shape as a single file,
  where we want to package Curv source code together with some binary files.
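
File Syntax amounts to a dispatch on the file extension. The following is a minimal Python sketch of that idea. The `READERS` table and the reader functions are hypothetical stand-ins, not Curv's implementation. Only the JSON reader really parses its input, `*.curv` evaluation is a placeholder, `*.cdef` is rejected because it is not an expression, and the `*.toml`, `*.rsdf` and `*.vstor` readers are omitted.

```python
import json
import os


def read_curv(path):
    # Placeholder: a real implementation would parse and evaluate the
    # Curv expression; here we only return its source text, tagged.
    with open(path, encoding="utf-8") as f:
        return ("curv-expression", f.read())


def read_json(path):
    # JSON objects, arrays, numbers, strings and booleans map naturally
    # onto Curv records, lists, numbers, strings and booleans.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def reject_cdef(path):
    # *.cdef holds definitions rather than an expression, so it is only
    # textually included by its parent directory, never imported directly.
    raise ValueError(f"{path}: *.cdef files can't be imported directly")


# Hypothetical dispatch table: file extension -> reader function.
READERS = {
    ".curv": read_curv,
    ".json": read_json,
    ".cdef": reject_cdef,
}


def read_file_syntax(path):
    """Interpret a regular file as a value, based on its extension."""
    if os.path.isdir(path):
        # Directories are interpreted by Tree syntax, not by this table.
        raise IsADirectoryError(f"{path}: use Tree syntax for directories")
    ext = os.path.splitext(path)[1].lower()
    reader = READERS.get(ext)
    if reader is None:
        raise ValueError(f"{path}: no File Syntax rule for {ext!r}")
    return reader(path)
```

Adding a file type then means adding one entry to the table.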

The shell command `curv filename` interprets `filename` using File Syntax,
reading and evaluating the file and then displaying the resulting Curv value.

MIME types:
* `*.curv` == text/curv
* `*.rsdf` == application/curv.rsdf
* `*.vstor` == application/curv.vstor

## The `file` Function

The `file` function will not be part of Curv. Here is a case analysis of the
argument forms it could have accepted:
* `file relative_pathname`: Replaced by `file.name`.
  This avoids tricky code that restricts the use of `..` to escape from a
  package boundary.
* `file absolute_pathname`: This is potentially useful in a local workspace,
  but you could also use a symlink, and reference the symlink with `file.name`.
  This feature is a security hole if used in a package or `*.curv` file
  downloaded from the internet.
* `file URL`: How important is this, when we have `package URL`?
  It is potentially more susceptible than `package` to being used as a
  backchannel for malware to "phone home".

This also means I won't have 'parameterized file readers'.

## Parameterized File Readers

Suppose that additional parameters must be supplied in order to interpret
the contents of a file. How are these parameters specified?

* The original plan was to provide type-specific file import functions with
  extra parameters beyond the pathname. Eg, `svg_file` or `dxf_file`.
  But I want to deprecate the `file` function.
* Put the parameters into an optional separate file, with the same basename
  as the file being imported, but with a `.opts` file extension.
  This file contains a JSON or Curv record literal.
  This is compatible with using the `file.identifier` syntax
  for referencing file-based components within a package.
* A file reader for something like an SVG or DXF file can return a subtype
  of Shape that provides rich access to the format-specific data.
* An external tool can convert one of these files into an alternate form
  that can be read by Curv without parameterization. For example,
  mesh files are not directly readable in Curv; you must instead convert
  the mesh to an RSDF file, and provide the mesh conversion parameters
  to this external tool.
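
The `.opts` option can be sketched in a few lines. This sketch rests on some assumptions. It assumes the sidecar holds a JSON object, since reading the Curv record literal form would need a Curv parser. It treats a missing sidecar as "no extra parameters". It lets sidecar values override caller-supplied defaults. The name `load_opts` and the defaults-merging behaviour are hypothetical.

```python
import json
import os


def load_opts(path, defaults=None):
    """Return reader parameters for `path` from an optional sidecar file.

    For `logo.svg` the sidecar is `logo.opts` (same basename, `.opts`
    extension). Values in the sidecar override the caller-supplied defaults.
    """
    params = dict(defaults or {})
    sidecar = os.path.splitext(path)[0] + ".opts"
    if os.path.exists(sidecar):
        with open(sidecar, encoding="utf-8") as f:
            opts = json.load(f)
        if not isinstance(opts, dict):
            raise ValueError(f"{sidecar}: expected a record (JSON object)")
        params.update(opts)
    return params
```

A reader for `logo.svg` would call `load_opts` before parsing. The import expression itself stays a plain `file.logo`.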

## Trees

Curv has a 'directory syntax', which interprets a directory tree as a Curv
value: by default as a nested record value. Directory entries are interpreted
as record members. Regular files named 'identifier.extension' are interpreted
using File Syntax. Subdirectories named 'identifier' (no extension) are
interpreted using Directory Syntax. Entries that don't match these patterns
are ignored.
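
The default directory rule can be sketched as a recursive function that builds a nested dict, which stands in for a Curv record. The sketch makes three assumptions. Curv identifiers are approximated by Python's `str.isidentifier`. `read_file` is a caller-supplied File Syntax reader. Two entries that map to the same member name are treated as an error, a case the text leaves open.

```python
import os


def is_identifier(name):
    # Assumption: Curv identifiers are close enough to Python's rules here.
    return name.isidentifier()


def read_directory(path, read_file):
    """Interpret a directory as a nested record (a dict in this sketch).

    read_file is a caller-supplied File Syntax reader for regular files.
    """
    record = {}

    def add(name, value):
        # Assumption: two entries mapping to the same member name is an error.
        if name in record:
            raise ValueError(f"{path}: duplicate member {name!r}")
        record[name] = value

    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        if os.path.isdir(full):
            # Subdirectory named 'identifier' (no extension): recurse.
            if is_identifier(entry):
                add(entry, read_directory(full, read_file))
        elif os.path.isfile(full):
            # Regular file named 'identifier.extension': use File Syntax.
            base, dot, ext = entry.partition(".")
            if dot and ext and is_identifier(base):
                add(base, read_file(full))
        # Anything else (e.g. `.curvroot`, `README`, `my-notes.txt`) is ignored.
    return record
```

With this rule, `.curvroot` never becomes a record member, because its basename is empty.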

The root of the directory tree is marked, possibly by an empty file `.curvroot`.

Trees are encapsulated. You must use the Package mechanism to reference
resources outside of the Tree. If `file` is used by a `*.curv` file in a Tree,
you can only use relative pathnames, and you can't use `..` to reference files
outside the Tree.
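
The encapsulation rule can be checked lexically, before touching the file system. This sketch reads the rule as "no `..` that escapes the Tree". A stricter reading would reject every `..`. The sketch assumes POSIX `/` separators and does not resolve symlinks, which a real implementation would also have to consider. The function name `resolve_in_tree` is hypothetical.

```python
import posixpath


def resolve_in_tree(current_dir, pathname):
    """Resolve a `file` pathname used in a Tree, refusing to leave the Tree.

    current_dir is the directory of the referencing *.curv file, relative to
    the Tree root ("." for the root itself). Uses POSIX `/` separators.
    """
    if posixpath.isabs(pathname):
        raise ValueError(f"{pathname!r}: absolute pathnames are not allowed in a Tree")
    # Normalize away `.` and `..` components relative to the Tree root.
    resolved = posixpath.normpath(posixpath.join(current_dir, pathname))
    if resolved == ".." or resolved.startswith("../"):
        raise ValueError(f"{pathname!r}: '..' escapes the Tree")
    return resolved
```

For example, `../foo.curv` from subdirectory `sub` resolves to `foo.curv` at the Tree root. `../../foo.curv` from the same place is rejected.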

The purpose of Tree syntax is to provide a local file system representation
of `*.cpkg` files and Packages. That's why Trees are encapsulated.

Within `*.curv` files in the Tree, other members of the tree can be referenced
using lexically scoped identifiers. A directory containing files 'foo', 'bar',
etc, is semantically equivalent to a record '{foo=..., bar=..., ...}'. The
parent scope of the root directory is 'std', the standard namespace. This
reference mechanism doesn't provide any additional expressive power over
`file`; it's just nicer and more convenient.

Trees may be nested: a directory tree with a `.curvroot` may be nested inside
another directory tree.
* This could be used for multi-package repos, or to ship a package with its
  dependencies.
* How does one subtree reference a sibling subtree as a dependency?
  Let's review the existing external reference mechanisms:
  * Lexical scoping. Nope.
  * `file`. Nope.
  * `package` + URL. Nope.
  * `lib`. Nope.

  What to do?
  * Maybe the `.curvroot` file contains definitions of dependencies,
    evaluated in the scope of the parent tree. Use lexically scoped variables
    to reference sibling packages, and `package` for Internet-scoped packages.

A Tree can evaluate to a Shape. That's a requirement for `*.cpkg` files.
We will extend the directory syntax with an optional file that contains a Curv
expression, which is evaluated to produce the directory's value. This file can
occur in any directory, not just the Tree root. Call it `.main.curv`.

To export only 'public' members of a directory, use a `.main.curv` file
that contains
```
// Export only foo and bar; other directory members stay private.
{
   foo : foo,
   bar : bar,
}
```

A possible extension: `.include.curv` evaluates to a record whose members are
added to the record denoted by the directory.
You can get the same effect using `.main.curv`, as shown above.

Many modern languages now have a standard tree/package/project manager
that will create a project tree for you, then perform operations on that
project tree, often with git integration. Examples:
* Rust, `cargo`
* Clojure, `lein`

## Trees (version 2)

Maybe it's too weird that an identifier `foo` not defined anywhere in a
`*.curv` file is implicitly defined by a sibling file `foo.curv`. So, files
aren't converted into Curv bindings unless they are explicitly declared in
a `*.curv` source file. Extraneous files and directories that aren't
explicitly referenced are ignored.
* The value of a directory `foo` is specified by `foo/main.curv`.
* An explicit file reference or declaration for a file `foo.*` is:
   1. `use foo;`. You can also write `use foo.bar.baz;`.
   2. The `use foo;` form makes it cumbersome to include a file into a scope
      (two definitions are needed). Instead, `file.foo` is an expression,
      and `use a.b.c` is equivalent to `c=a.b.c`.
      So now we have `use file.foo` or `include file.foo`. (Orthogonality.)

The benefit of an explicit gesture like `file.foo` is that you get an
explicit error message, "File not found".

`file.foo` is interpreted at compile time, because it is intended to behave
like an identifier. `file` is not a record, despite the use of dot notation;
it is a mechanism for doing lexically scoped, identifier-like lookups
in a Directory Syntax document.
* Mutually recursive references between two `*.curv` scripts are illegal,
  because of implementation restrictions (reference counting, not garbage
  collection). This is enforced at compile time.
* Using a fancier compiler, we could permit mutual recursion between files,
  with an implementation that still requires `file.foo` to be resolved at
  compile time.
* There is no immediate plan to implement `file."${foo}"`, `defined(file.foo)`,
  or `fields file`.
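
Enforcing the ban on mutually recursive files at compile time amounts to finding a cycle in the graph of `file.foo` references. This is a sketch using depth-first search with three colours. The `refs` mapping, from each file to the files it references, is a hypothetical stand-in for what the compiler would collect while resolving `file.foo`.

```python
def find_cycle(refs):
    """Return a list of files forming a reference cycle, or None if there is none.

    refs maps each file name to the file names it references via `file.foo`.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in refs.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: slice out the cycle.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in refs:
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

When a cycle is found, `" -> ".join(cycle)` gives a readable path for the compile-time error message. Note that a library file included by several others (a diamond, not a cycle) is accepted.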

Under this interface, a record field could be represented by two files with
the same basename and different extensions. Eg, one contains raw data in some
standard non-Curv format, the other contains Curv metadata. (This is an
alternative to "parameterized file readers".) Or, one contains geometry
and the other contains colour.
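
Under this interface a record field corresponds to a basename rather than to a single file. This small sketch groups directory entries by basename. Splitting at the first `.` is an assumption the design doesn't specify: it gives `foo.tar.gz` the basename `foo`.

```python
from collections import defaultdict


def group_by_basename(entries):
    """Map each basename to the sorted list of extensions present for it."""
    fields = defaultdict(list)
    for name in entries:
        base, dot, ext = name.partition(".")
        # Skip hidden files like `.curvroot` and names with no extension.
        if base and dot and ext:
            fields[base].append(ext)
    return {base: sorted(exts) for base, exts in fields.items()}
```

A field with several extensions, such as `logo` with `curv` and `svg`, could then be assembled from the raw data in one file and the Curv metadata in the other.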

We can relax the requirement that directories contain a `main.curv` file.
If a directory has none, construct a record from every suitable directory entry.

We can relax the requirement for a `.curvroot` file. `file.foo` means: search
for `foo.*` in the current directory, then in the parent, recursively until
either a `.curvroot` file is found or the filesystem root is reached.
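
The relaxed lookup rule can be sketched as an upward directory walk. The sketch makes these assumptions:
* A directory `foo` or any `foo.*` file counts as a match.
* All matches in the first matching directory are returned, since a field may be backed by several files that share a basename.
* The directory containing `.curvroot` is itself searched before the walk stops. The text does not settle this point.

The function name `lookup_file` is hypothetical.

```python
import os


def lookup_file(name, start_dir):
    """Resolve `file.<name>` by searching start_dir and then its ancestors.

    Returns every match in the first directory that has one (a field may be
    backed by several files sharing a basename). The search stops after the
    directory containing `.curvroot`, or at the filesystem root.
    """
    d = os.path.abspath(start_dir)
    while True:
        matches = sorted(
            entry for entry in os.listdir(d)
            if entry.startswith(name + ".")
            or (entry == name and os.path.isdir(os.path.join(d, entry)))
        )
        if matches:
            return [os.path.join(d, entry) for entry in matches]
        parent = os.path.dirname(d)
        if os.path.exists(os.path.join(d, ".curvroot")) or parent == d:
            # Explicit failure, matching the "File not found" message above.
            raise LookupError(f"file.{name}: File not found")
        d = parent
```

Because the walk is bounded by `.curvroot`, a Tree still cannot reach files above its root through `file.foo`.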

## Packages

A Package is a versioned collection of Curv source files that are distributed
over the internet as a unit. Packages explicitly declare their dependencies on
other packages. This is inspired by package management in Debian and many other
systems.

A Curv program can reference an external package using a URL and a version
number. Eg, `package{repo:"https://github.com/doug-moen/laser-curv",version:"1.0"}`.
Inspired by Rust and crates, it's distributed and decentralized.

The package mechanism is heavyweight. Should `file` be extended to accept a URL
argument, so that there is a simple way to reference remote resources?
(But I have a security concern: when and how often are these URLs fetched?)

When you evaluate a Curv program containing Package references, the UI
notifies you if you have unsatisfied dependencies, and asks you if they can
be downloaded. There is an 'upgrade' command for updating local copies of
packages. A security feature of Curv is that there is no internet access
without an explicit user action.

Questions:
* How do I nest one package inside another? That is one way to satisfy a
  package dependency.
* Can a package be a shape? Or are packages only meant to be libraries?
  How do I distribute a shape that consists of multiple files (eg, a Curv
  file and some 'assets' such as texture files)? A zip file is the
  best approach: you want shapes to be single files, and zip is the standard
  mechanism, eg OpenDocument `*.odf` or 3MF.
* How do I develop, test, and run a package on my local file system?

Package metadata:
* **In-value metadata**. If the value of a Tree is a record, then metadata
  can be incorporated into the record value, using a naming convention.
  Use cases? Control how a shape is rendered. BOM metadata in a shape.
  These are shape-specific use cases, and not 'package' metadata.
* **Out-of-value metadata**. The most obvious consumer of 'package' metadata is
  the package manager, which doesn't need in-value metadata. A full description
  of the package, with author, licence, description text, keywords, and an
  image, could be used to populate entries in a Curv package website
  (curvhub.org). Use a file `.metadata.json`.

## Local Packages

The Package mechanism uses URLs to name external packages.
What if you are disconnected from the internet and want to maintain a collection
of packages on a file system, old school?

You could use `package "file:/usr/local/curv/foo"`.
Or, use `file "/usr/local/curv/foo"`.
But those pathnames are not portable across systems maintained by different
administrators. That is an important consideration for portability across
heterogeneous systems with no internet access.

Alternatively, `CURVPATH` is an environment variable containing a list of
absolute pathnames of directories. Eg, `CURVPATH=/usr/local/curv`.

`lib.foo` searches for a file with basename `foo` in `CURVPATH`, as specified
by the `file` function. If found, the file is loaded and evaluated and the
resulting value is returned.
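
The following sketch shows how `lib.foo` resolution might work. It assumes `CURVPATH` uses the platform's path-list separator (`:` on POSIX, as with `PATH`). It also assumes that the first directory containing a `foo.*` file or a `foo` directory wins, and that several matches in one directory are an error. Loading and evaluating the file found is left to the caller. The function name `find_in_curvpath` is hypothetical.

```python
import os


def find_in_curvpath(name, curvpath=None):
    """Return the path that `lib.<name>` would load, searching CURVPATH in order."""
    if curvpath is None:
        curvpath = os.environ.get("CURVPATH", "")
    for directory in curvpath.split(os.pathsep):
        if not directory or not os.path.isdir(directory):
            continue  # skip empty or missing entries
        matches = sorted(
            entry for entry in os.listdir(directory)
            if entry.startswith(name + ".")
            or (entry == name and os.path.isdir(os.path.join(directory, entry)))
        )
        if len(matches) > 1:
            raise LookupError(f"lib.{name}: ambiguous in {directory}: {matches}")
        if matches:
            # The first directory in CURVPATH that has a match wins.
            return os.path.join(directory, matches[0])
    raise LookupError(f"lib.{name}: not found in CURVPATH")
```

As with `PATH`, earlier entries shadow later ones. A local administrator can therefore override a shared package by putting a directory earlier in `CURVPATH`.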

## Standard Packages

What makes sense is to have a small standard library (std), then put
the remaining library abstractions into a collection of standard packages.
The standard library forms the outer scope of all source files, while
standard packages must be referenced explicitly. The standard library is
harder to evolve than the standard packages, since a package can be deprecated
as a whole and replaced by a new package with a different name. So it makes
sense to keep the standard library small.

How are standard packages referenced? Should we use `lib` (they are installed
on the local file system, as part of the Curv installation process),
or should we use `package` (they are referenced using URLs)?

In the long term, standard packages should be referenced by URLs, because
if they are part of the default install, then it becomes hard to abandon them
or remove them from the default install (for backward compatibility reasons).
(Eg, the Python standard library is notoriously full of abandonware.)
But then, in the long term, we would want a stable URL for these packages.

I want standard packages now. What do I do?
* Put standard packages on GitHub.
* `noise = package "https://github.com/doug-moen/noise.curv"`
* The `package` function will initially use a simple package manager that
  caches packages in `~/.cache/curv/`.
* `curvpkg` subcommands: list, install, remove, upgrade
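
The text only says that packages are cached in `~/.cache/curv/`. Here is one possible way to map a `package{repo,version}` reference onto that cache. The layout (host, then repository path, then version) and the function name `cache_dir` are hypothetical. The check restricts URLs to https: and file:, as in the Synthesis section.

```python
import os
from urllib.parse import urlparse


def cache_dir(repo, version, cache_root="~/.cache/curv"):
    """Map a package{repo, version} reference to a local cache directory.

    Hypothetical layout: <cache_root>/<host>/<repo path>/<version>.
    """
    url = urlparse(repo)
    if url.scheme not in ("https", "file"):
        raise ValueError(f"{repo}: only https: and file: URLs are supported")
    parts = [p for p in url.path.split("/") if p]
    if not parts or ".." in parts or "." in parts:
        raise ValueError(f"{repo}: unusable repository path")
    if not version or "/" in version or version in (".", ".."):
        raise ValueError(f"{version!r}: unusable version string")
    host = url.netloc or "localhost"  # file: URLs usually have no host
    return os.path.join(os.path.expanduser(cache_root), host, *parts, version)
```

Under this layout, `package` would first check whether the directory exists. Only if it is missing would the UI ask permission to download, which keeps to the no-implicit-internet-access rule.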

## Mutual Recursion

Mutually recursive references between two Curv files within a package
are not supported: an error is reported. On the other hand, what does work is
defining a library in a file (exporting a record value), and including that
library in another file. That is a required feature.

Directory syntax is supposedly modelled as a record literal (w.r.t. scoping).
This suggests that mutual recursion could/should be supported. But that
creates technical difficulties, especially when we need to support including
another file. Curv does not let a record include a variable defined elsewhere
in the same file.

## Synthesis

Implement this combination of features:
* `file.foo` references a file `foo.*` or a directory `foo`,
  relative to the current directory, using "lexical scoping" lookup.
* File syntax: rules for converting files into Curv values, based on file
  extension.
* Directory syntax: rules for converting a directory into a Curv value.
  Optional `main.curv` entry. Optional `.curvroot` entry.
* `package{repo,version}`: Versioned, encapsulated packages, referenced using
  absolute https: or file: URLs, represented as git repositories.

How do you use these features, as a user?
* Create a single hierarchical workspace for all Curv projects: `~/curv`.
  Use `file.name` to reference external files.
* Lollipop tutorial:
  * `mkdir lollipop; cd lollipop`
  * Create `main.curv`, `param.curv` and `lib.curv`.
  * Use `include file.param` and `include file.lib`.
* The curv/examples directory will change: it gains a `.curvroot` file,
  and uses `file.lib.experimental` to reference the library.
