[wg-camlp4] My no-use of camlp4 syntax extensions for Otags

Fri Apr 12 10:26:58 BST 2013

Hi, 

I have suggestions, but they are a bit lengthy. In short: ocamlast
is great, but I don't believe it will satisfy all documentation
needs.

Alain Frisch <alain.frisch at lexifi.com> writes:

   On 02/22/2013 01:32 PM, Hendrik Tews wrote:
   > I would therefore strongly suggest to provide complete
   > documentation for the ast that preprocessors must process with
   > the _beta_ release of the next version.

   Do you have a suggestion for the form such a documentation should
   take? I can imagine describing how each fragment of concrete syntax is
   represented in the abstract syntax, but this would not be really
   different from paraphrasing the code of the parser.

You are right, but only from the insider's point of view. You
have to remember that potential preprocessor programmers (like
myself) have never seen the parser before and they don't want to
become a parsing/parser.mly expert before starting. For
instance, when I look at parser.mly, I see

  | pattern BAR pattern
      { mkpat(Ppat_or($1, $3)) }

and nothing is clear. When I look up mkpat, then all I know is
that an or-pattern is not a Ppat_or node, it is some record, for
which I don't even know the type. So I would have to search for
ppat_desc, and select the right one of the 100 matches that grep
reports. Note that I have been lucky here, because mkpat is very
simple and is defined in parser.mly. There are surely more
complicated functions that are defined elsewhere.

From my point of view the parser is not documentation.

I believe there are (at least) two documentation needs:

(1) getting the ast tree for some concrete syntax

(2) understanding which concrete syntax a given ast node
    represents

For (1) the ocamlast tool is really great! (I've been using my
own camlp4ast tool quite a lot.) I agree that with ocamlast a
description of how each piece of concrete syntax is represented
is not necessary. 

There might, however, be cases with the danger of wrong
generalization. What I mean by this is the following: The
ocamlast tool only works on correct OCaml code. I cannot feed
it with "assert _" in order to get the most general ast
tree for assertions. So instead I have to feed it with, for
instance, "assert true" and get Pexp_assert ... I might now
believe that "assert false" is also represented with Pexp_assert,
which is wrong.
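
A minimal sketch of this pitfall, using a toy expression type rather
than the real Parsetree. The constructors Bool, Assert and
Assert_false and the function mk_assert are hypothetical stand-ins
chosen for illustration; only the shape of the problem is meant to
carry over.

```ocaml
(* Toy AST, NOT the real Parsetree. All names here are hypothetical. *)
type expr =
  | Bool of bool      (* stand-in for a boolean constant *)
  | Assert of expr    (* plays the role of the general assertion node *)
  | Assert_false      (* plays the role of a dedicated "assert false" node *)

(* Hypothetical parser action: "assert false" gets its own node. *)
let mk_assert (arg : expr) : expr =
  match arg with
  | Bool false -> Assert_false
  | e -> Assert e

(* A naive preprocessor pass written after only looking at the tree
   for "assert true": it recognizes the general node and nothing else. *)
let is_assertion (e : expr) : bool =
  match e with
  | Assert _ -> true
  | _ -> false
```

With these definitions, is_assertion (mk_assert (Bool true)) holds,
while is_assertion (mk_assert (Bool false)) does not, which is exactly
the kind of case a tree dump of one example would not reveal.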

So, if there are cases where similar-looking concrete syntax is
represented differently, then they should be listed somewhere.

For documentation need (2), ocamlast is not really a help. For
(2) I would suggest annotating each constructor in parsetree.mli
with the concrete syntax. For instance

  | Pexp_while of expression * expression
     (** while <expression> do <expression> done *)

This annotation should of course include the types that occur in
the ast but are outside of parsetree.mli, such as Longident.t or
constant. 

There are probably a lot of side conditions. For
instance, "expr_1; expr_2; expr_3" is 
Pexp_sequence(expr_1, Pexp_sequence(expr_2, expr_3)) and not
Pexp_sequence(Pexp_sequence(expr_1, expr_2), expr_3) (or vice
versa).
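
The nesting question can be made concrete with a small sketch. It
uses a toy type (Var and Seq are hypothetical, not Parsetree
constructors) and assumes the right-nested form, i.e. the first of
the two alternatives above.

```ocaml
(* Toy AST, NOT the real Parsetree. Seq stands in for a sequence node. *)
type expr =
  | Var of string
  | Seq of expr * expr

(* Build the right-nested tree for "e1; e2; ...; en".
   Assumption: sequences nest to the right. *)
let rec seq_of_list (es : expr list) : expr =
  match es with
  | [] -> invalid_arg "seq_of_list: empty sequence"
  | [e] -> e
  | e :: rest -> Seq (e, seq_of_list rest)

(* Flatten a tree back into the list of its components, whichever
   way it happens to be nested. *)
let rec list_of_seq (e : expr) : expr list =
  match e with
  | Seq (l, r) -> list_of_seq l @ list_of_seq r
  | e -> [e]
```

Code that pattern matches on sequences has to know which of the two
shapes to expect, or else flatten first as list_of_seq does.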

Another kind of side condition is that certain nodes do not and
must not occur inside other nodes. For instance, the left
expression in Pexp_sequence is never a Pexp_sequence. (This is
probably wrong, but you get the idea.)
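
A side condition of this kind could be stated as an executable check.
The following sketch again uses a toy type with hypothetical names,
and the invariant it checks is the illustrative one from the
paragraph above, not a verified property of the real parser.

```ocaml
(* Toy AST, NOT the real Parsetree. *)
type expr =
  | Var of string
  | Seq of expr * expr

(* Check the illustrative invariant: the left component of a Seq
   is never itself a Seq. *)
let rec well_formed (e : expr) : bool =
  match e with
  | Var _ -> true
  | Seq (Seq _, _) -> false
  | Seq (l, r) -> well_formed l && well_formed r
```

Someone constructing trees by hand could run such a check before
handing the result to the compiler.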

Such side conditions should be described somehow for those who
need to construct ast's and those who need to pattern match on
ast's. 

You could write the documentation for (1) and (2) in comments or
ocamldoc comments in the source files. However, I would prefer
something like a wiki page, where we users can immediately
contribute. I, for instance, used the camlp4 wiki [1] a lot to
remember my own findings at a place where it could benefit
others.

[1] http://brion.inria.fr/gallium/index.php/Abstract_Syntax_Tree

Bye,

Hendrik