Symbols - OpenGrok cross reference for /dports/math/curv/curv-0.5/ideas/language/old/Symbols

`#foo` is a symbol value.
Symbol values are used when you need a C-like enumerated type.

Rationale: I like the syntax.

I want it for modelling the x3d -O colour=face|vertex option.
You will type `-Ocolour=#vertex`
instead of `-Ocolour='"vertex"'` or `-Ocolour={vertex:}`.
The latter syntax is too cumbersome for the Unix command line (or cmd.exe).

This syntax is also nice in the general case where you are working with
enum types or algebraic types. In that context, we want `#foo` to work as a
pattern.

Note, #foo is intentionally similar to a Twitter hash tag.
Also, # is reserved as a prefix for future syntax.
Eg, if I need more container types, #[a,b,c] is a set,
#{key1: value1, ...} is a map. Or, #! comments.

What is the type of a symbol? How do they print?
 0) No symbol syntax. Use strings, special syntax for string-valued options.
 1) #vertex abbreviates "vertex", which prints as "vertex".
 2) #vertex abbreviates {vertex:null}, which prints as #vertex.
 3) Symbols are a new (8th) data type.

But what about a command line option X with an algebraic type, where some
variants have values, and some variants do not? This can't be a string-valued
option.
 * Without symbol syntax: '-OX="foo"' and -OX={bar:a}
 * With symbol syntax: -OX=#foo and -OX={bar:a}
 * With multipart record definitions: -OX=#foo and -OX.bar=a

Judgement:
* Not 0, doesn't work for options with algebraic types.
* Not 3, don't want extra complexity of a new type.
* 1 or 2.
* I slightly prefer 2: #foo == {foo:null}, prints as #foo, is a pattern.
  #"foo bar" if name is not a C identifier.

Symbols are used to simulate 'enumerated types'.
Symbols are used to simulate niladic variants in an algebraic type.

Symbols are used to simulate 'enumerated types' in SubCurv.
Drop-down-list picker type. In a C-like language, the options in a drop-down
list would be enums, which would map to integers in GLSL. So these should be
symbols?
    parametric {s :: dropdown[#square,#circle,#triangle] = #square} ...
In SubCurv, symbols are normally illegal, and even if we get record values,
the null value in a symbol is illegal. But, if a dropdown picker is used,
then the symbols it uses become legal SubCurv values, represented as integers.
We could also use these semantics for interpreting an is_enum[#foo,#bar]
type predicate in SubCurv.

0. No Symbols
-------------
An alternate to this proposal is to parse enum-valued -O options differently
from other options. Enum-valued -O options are actually string valued options,
with only a fixed set of string values being legal. We just interpret the
option value as a string, without any quotation syntax required.

We parse pure string valued options the same way: the option string is
taken literally, you don't need to include double quote characters.
But, we do interpret '$' escape sequences, so you still have full access
to the Curv expression language in string valued options if you need it.

Or, the option parser treats `#identifier` as a special case,
so you can type -Oname=#foo and set name to the string "foo".

Enum values are string values. This means we want "foo" to work as a pattern.

1. Symbols as Strings
---------------------
Pros:
* If JSON export is important, strings are the most natural translation
  for symbols. It's what you would use if modelling data directly in JSON.
  * Counterexample: in JSON-API, a picker config is {"type":data}. If no
    data is needed, I use {"type":null} instead of "type".
* Strings as the representation for symbols is simple and easy for users
  to understand.
* When I first encountered Lisp/Scheme, the distinction between symbols and
  strings seemed unnecessary. Curv doesn't use different types for integers
  and non-integral reals. So why make this distinction? If the only technical
  benefit is "faster equality", that's pretty weak.
* This proposal is the simplest, and requires the least code to implement.

If symbols are strings, why provide an alternate syntax for string literals?
* Usability on the command line, for setting enum-valued options.
  -Omethod=#sharp instead of '-Omethod="sharp"'.
* At present, strings are rarely used in Curv. If you see a string literal
  in Curv, it's probably uninterpreted text, to be printed to the console
  or rendered using the future `text` primitive. Writing #foo instead of "foo"
  is a signal to the reader that this is a semantic tag, not uninterpreted
  natural language text. Symbols are used as enum values, as nilary constructors
  for an algebraic type, as field names for indexing a record.
* #foo is a pattern, "foo" is not a pattern due to field generator syntax.

Cons:
* Since symbols have semantics (semantic tags, not uninterpreted text),
  they ought to print as symbols, not as quoted strings.
* A string is a collection, but symbols are 'atomic' values.
  Conceptually, a non-string symbol type is a better match to the domain.
* Semantically, symbols are more closely related to records than they
  are to strings, due to the analogy with enum types and algebraic types.
* I don't want symbols to compare equal to strings.
  I want them to be disjoint from strings, so that overloaded
  functions can distinguish symbol arguments from string arguments.
* Symbol equality is traditionally faster than string equality.

JSON export: #foo -> "foo"

2. Symbols as Records
---------------------
Here's the sales pitch for this design alternative:
* Curv has 7 data types, there's no compelling reason to add an 8th.
  This preserves the ability to map all Curv non-function data to JSON.
* Symbols are closely related to records, when you think about how
  algebraic types are represented in Curv.
* Records are the general mechanism for representing new data types.
  So representing symbols as records fits this tendency.
* Record literals are patterns. Symbol literals need to be patterns.
* #foo prints as #foo, not as a string.
* Symbols and strings are distinct, which is good, they have different semantic
  roles.

Let's recall the doctrine for representing labelled options
and algebraic data types in Curv.

* A set of labelled arguments is a record.
  Examples from other languages:
  * Swift function calls
  * HTML attributes, within a tag
  * Unix command line arguments: options
  Some of these domains permit labelled arguments that have
  just a name, not a value. The underlying parameter is a
  boolean, defaulting to false, but if the name is specified,
  the parameter becomes true. Eg, --foo instead of --foo=value
  in Unix, or foo instead of foo=value as an HTML attribute.
  That's why `foo:` abbreviates `foo:true` in a Curv record.

A Haskell algebraic type is a set of named alternatives.
The constructor for an alternative is written as `Foo` or `Bar value`.
We can think of `Foo` as an abbreviation; you would otherwise need
to write `Foo ()` if all alternatives were required to have an argument.

In Curv, an algebraic type is a set of tagged values of the
form {name: value} -- tagged values are singleton records.
The abbreviation for tagged value that only needs a name is {name:}:
the value is implicitly `true`. In this case, the choice of the value
`true` is arbitrary: `null` would make as much sense.
* I'm no longer convinced that conflating enum values with singleton boolean
  options is a good idea. Let's define #foo == {foo:null}, which prints as #foo.
  By contrast, {foo:} prints as {foo:true}.

With the introduction of symbol syntax, the idiomatic syntax for
members of an algebraic type will be #foo and {bar: value}.

Possible reasons for identifying #foo with {foo:} in an algebraic type.
* Maybe there are benefits to uniformly representing all members
  of an algebraic type as singleton records?
  * Like, an algebraic value is isomorphic to a single field of a record,
    and can be interpolated into a set of labelled arguments using `...`.
  * A uniform API for extracting the name and value components of an
    algebraic value.

JSON export: #foo -> {"foo":null}

3. Symbols as an Abstract Type
------------------------------
Sales pitch:
* Symbols are fully abstract, simple, scalar values.
  The only Symbol operations are construction, equality, conversion to string
  (which are the generic operations supported by all values).
* #foo prints as #foo, is only equal to values that print as #foo.
  There's no aliasing with other types. Simple.
* (Why not use strings?) Symbols do not compare equal to strings.
  Overloaded functions can distinguish symbol arguments from string arguments.
* Symbols are more fundamental than strings or records, which are aggregates.
* Symbols are the natural representation for nilary enum constructors (instead
  of strings or integers).
* Symbols are the natural representation for field names in records
  (instead of strings; see Structure proposal).
* Define true=#true, false=#false, null=#null. Then, in conjunction with Maps,
  all data types have literals that can be used as patterns. (But, aliasing.)
* Symbols in SubCurv:
  * #foo is compiled to an enum value with an int representation.
  * `dropdown_menu[#Value_Noise,#Fractal_Noise]` is a picker value.
  * is_enum[#foo,#bar] is a type predicate supported by SubCurv?
* Symbols might be useful in the Term proposal.
* Symbols might have a use if Curv becomes homoiconic and supports macros?

Instances of algebraic types are notated as:
    #nilary
    {binary: (a, b)}
The field name `binary` is internally represented as #binary,
so in this sense the constructor name is always a symbol.

Construction:
    #foo
    #'hello world'

A conversion from String to Symbol? make_symbol "foo" == #foo.
This is in the same category as a conversion from String to Number.
It shouldn't normally be needed, given the role of strings in Curv.

Conversion to string:
    "${#foo}" becomes "foo"    or strcat[#foo]
    "$(#foo)" becomes "#foo"   or repr[#foo]
    `strcat[#'Hello world']` becomes "Hello world".

Field names are represented by symbols (Structure proposal).
* `fields` returns a list of symbols.

Cons:
* Explaining to users why symbols are different from strings.
  It's doable, esp. if #true and #false are the boolean values.
* Is there any context where we need a variable that is either a string
  or a symbol? Or are the use cases disjoint? (Because then why not unify them.)

JSON export: #foo -> {"\u0000":"#foo"}
                  or "\u0000foo"
or record keys are strings, no Curv value maps to JSON null, and
             #foo -> {"foo":null}

Symbols are not Strings
-----------------------
A Symbol is an abstract value whose only property is its name.
The symbol `#foo` prints as `#foo`, and is only equal to itself.
You can compare a symbol for equality to any other value, use it as a map key,
or convert it to a string. Those are the only operations.

Symbol constants look like Twitter hash tags, and that's not a coincidence.
Symbols are abstract names that have semantic meaning within a program.

In Curv, the Boolean values are called `#true` and `#false` (they are symbols),
and this is a good example of what symbols are used for. They are used to
distinguish between several different named alternatives.

Statically typed languages like C, Rust, Swift and Go do not have a generic
symbol type. Instead, they have user-defined `enum` types, which serve the
same purpose. Internally, `enum` values are represented efficiently by small
integers. When Curv programs are compiled into statically typed code (eg, into
C++ or into GLSL), symbol values are compiled to small integers or enum values.

The only other languages with a symbol type are dynamically typed languages:
* Lisp, Scheme and other languages from the Lisp family.
* Ruby.
* Erlang and Elixir (where symbols are called "atoms").
Javascript has a Symbol class, but it is an unrelated concept.

When users first encounter Symbols in a language like Scheme, Elixir or Ruby,
it can be unclear how Symbols differ from Strings. In Curv, the distinction
is very clear.

Strings are meant to represent uninterpreted text that is destined to form
part of the program's output.
* Documentation/help strings (in a future language version).
* A string of text that will be rendered into an image using the future `text`
  primitive.
* A string of text that will be printed as a debug message.
* A string of text that represents the final output of a program
  (in the case where you are using Curv to convert your data to some text
  based file format for further use outside of Curv).

You are not meant to parse strings. Curv has no way of opening and reading a
text file, so there's no input to parse. Curv isn't a text processing language,
and doesn't have regular expressions or parsing facilities.

You are not meant to use strings to encode meaning within your data structures.
* You shouldn't internally represent a compound data structure using Strings,
  because now your code has to parse that string to traverse the data structure.
  That's a code smell, because parsing is complex and error prone compared to
  just traversing a real data structure.
* You shouldn't use strings to encode semantically meaningful names, eg denoting
  one of several alternatives. That's what Symbols are for. If your code
  compares two strings for equality, or uses a string as a map key,
  then you should use Symbols instead.