[wg-camlp4] A new branch to experiment with extension points

Wed Mar 6 15:02:12 GMT 2013

I've started a document to describe the current state of the branch:

http://caml.inria.fr/cgi-bin/viewvc.cgi/ocaml/branches/extension_points/experimental/frisch/extension_points.txt?view=markup

A copy of the current version is attached.

Recent additions:

  - Attributes on labels in record declarations.
  - Attributes as standalone signature/structure items.
  - An alternative syntax for attributes on some kinds of expressions
    starting with a keyword, e.g.  "let[@id] p = e in e" is parsed
    as "(let p = e in e)[@id]" and as "[^id](let p = e in e)".
    Do people think this is useful?

    (I'm considering recognizing also "let[%id] p = e in e"
    as "(%id (let p = e in e))" and "let[%id e0] p = e in e"
    e.g. as "(%id (%id e0) (let p = e in e))".  Opinion?)

Don't hesitate to comment on the choice of syntaxes and representation 
in the Parsetree, etc!

Alain
-------------- next part --------------
This file describes the changes on the extension_points branch.

=== Attributes

Attributes are "decorations" of the syntax tree which are ignored by
the type-checker.  An attribute is made of an identifier (an "LIDENT", written
id below) and an optional expression (written expr below).

Four syntaxes exist for attributes, according to the kind of component
the attribute applies to, and whether the attribute is written before
or after the component.

On expressions, type expressions, module expressions, module type expressions,
patterns (TODO: class expressions, class type expressions):

  Prefix syntax:   [^id expr] ...
  Postfix syntax:  ... [@id expr]

The postfix syntax [@id expr] is also available to add attributes on
constructors and labels in type declarations:

  type t =
    | A [@id1]
    | B [@id2] of int [@id3]

Here, id1 (resp. id2) is attached to the constructor A (resp. B)
and id3 is attached to the int type expression.  Example on records:

 type t =
   {
      x [@id1]: int;
      mutable y [@id2] [@id3]: string [@id4];
   }  

On items:

  Prefix syntax: [^^id expr] ...
  Postfix syntax: ... [@@id expr]

  Items designate signature and structure items, and also individual
  components of multiple declaration (type declarations, recursive modules,
  class declarations, class type declarations). (TODO: class fields?)

  For multiple declarations:

    - prefix attributes written before the first keyword of the item
      are attached to the first declared component

    - postfix attributes written after the item are attached to the
      last declared component

    - more prefix attributes can be inserted after the opening keyword
      (for the first component) and after the "and" keyword (for other
      components)

    - more postfix attributes can be inserted before the "and" keyword

  For instance, consider:

    [^^id1] type [^^id2] t1 = ... [@@id3] and [^^id4] t2 = ... [@@id5]

  Here, the attributes on t1 are id1, id2 and id3; the attributes on
  t2 are id4 and id5.

  Note: item attributes are currently not supported on Pstr_eval
  and Pstr_value structure items.

A fifth syntax exists for attributes which stand as signature or
structure item on their own (i.e. they are not attached to any other
component):

   [*id expr]

([^^] and [@@] attributes cannot be attached to such a standalone attribute.)

=== Alternative syntax for attributes on specific kinds of nodes

Some constructions starting with a keyword KW supports an alternative
syntax for attributes:

  KW[@id expr]...[@id expr] ....
  ---->
  (KW ....)[@id expr]...[@id expr]

For instance:

let[@foo] x = 2 in x + 1

is equivalent to:

(let x = 2 in x + 1)[@foo]

and also to:

[^foo](let x = 2 in x + 1)

Supported constructions:

local let binding
function ...
fun (type t) ->  ...
match ... with ...
try ... with ...
begin ... end

=== Representation of attributes in the Parsetree

For attributes on expressions and similar categories, attributes are
just another possible constructor (one single attribute per node):

and expression_desc =
  ....
  | Pexp_attribute of (expression * attribute)

Note: we currently don't distinguish between prefix and postfix attributes.

Similarly, attributes as standalone signature/structure items are represented
by a new constructor:
  | Psig_attribute of attribute
  | Pstr_attribute of attribute

For "declarations"-like items (type declarations, module declarations,
..., but also constructors and record labels), all attributes are stored
in an extra field in their record (again, without making the distinction
between prefix and postfix attributes):

and type_declaration = {
   ...
   ptype_attributes: attribute list;
   ...

For other kinds of items (currently: open/include stataments,
exception rebind), the attributes are stored directly in the
constructor of the item:

  | Pstr_open of Longident.t loc * attribute list

=== Extension nodes

Extension nodes replace valid components in the syntax tree.  They are
normally interpreted and expanded by AST mapper.  The type-checker
fails when it encounters such an extension node.  An extension node is
made of an identifier (an "LIDENT", written id below) and an optional
expression (written expr below).

Two syntaxes exist for extension node:

As expressions, type expressions, module expressions, module type expressions,
patterns (TODO: class expressions, class type expressions):

  [%id expr]

As structure or signature item (TODO: class fields?):

  [%%id expr]

As other structure or signature items, attributes can be attached to a
[%%id expr] extension node.

=== Other changes to the parser and Parsetree

--- Relaxing the syntax for recursive modules.

Before:
   module X1 : MT1 = M1 and ... and Xn : MTn = Mn

Now:
   module X1 = M1 and ... and Xn = Mn
   (with the usual sugar that Xi = (Mi : MTi) can be written as Xi : MTi = Mi
   which gives the old syntax)

   The type-checker fails when a module expression is not of
   the form (M : MT)

Rationale:

1. More uniform representation in the Parsetree.

2. The type-checker can be made more clever in the future to support
   other forms of module expressions (e.g. functions with an explicit
   constraint on its result; or a structure with only type-level
   components).

--- Turning some tuple or n-ary constructors into records

Before:

  | Pstr_module of string loc * module_expr

After:

  | Pstr_module of module_binding
...
  and module_binding =
    {
     pmb_name: string loc;
     pmb_expr: module_expr;
     pmb_attributes: attribute list;
    }

Rationale:

More self-documented, more robust to future additions (such as
attributes), simplifies some code.

--- Keeping names inside value_description and type_declaration

Before:

  | Psig_type of (string loc * type_declaration) list

After:

  | Psig_type of type_declaration list

....
and type_declaration =
  { ptype_name: string loc;
    ...
  }

Rationale:

More self-documented, simplifies some code.

=== More TODOs

- Adapt pprintast.
- Adapt Camlp4 (both its parser(s) and its internal representation of OCaml ASTs).
- Propagate attributes to the Typedtree (so that they can be retrieved in .cmt/.cmti).
- Consider adding hooks to the type-checker so that custom extension expanders can be registered (a la OCaml Templates).
- Quotations (i.e. string literals with custom delimiters and without any interpretation of special characters in them), and a syntax which combines extension nodes and quotations.
- More cleanups to the Parsetree + documentation.