[wg-camlp4] Matching on concrete syntax (was: Re: Camlp4 uses)

Mon Jan 28 16:42:18 GMT 2013

On 01/28/2013 04:17 PM, Xavier Clerc wrote:
>    3- in Mascot (style tool), camlp4 is used to get either
>       a stream of lexical tokens or an AST, both being used
>       to detect code smells;
...
> Case 3 may appear as a perfect fit for ppx, but indeed
> it is quite pleasant to write some code smell in OCaml
> syntax (through quotations), rather than by matching
> against a bare AST. The latter becomes very verbose,
> and very hard to read/maintain to say the least.

My intuition is that matching ASTs with patterns written in concrete
syntax is a source of never-ending problems.  But it seems you disagree,
so I'm very much interested in your experience!

Let me explain my point.

First, parsing (concrete syntax -> AST) has some built-in ambiguities
and resolution mechanisms that the programmer normally does not even
need to be aware of.  For instance, I cannot tell how a pattern like
"p1 | p2 | p3" or an expression "e1 || e2 || e3" is parsed without
looking at the parser (or, more likely, playing with -dparsetree).  Is
the pattern parsed as "(p1 | p2) | p3" or as "p1 | (p2 | p3)"?  The
problem is that you may need to know this when you write AST
transformations/matching with concrete syntax.
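
To make the nesting issue concrete, here is a minimal sketch on a toy
pattern AST.  The type and all names are hypothetical and are not the
compiler's Parsetree; the sketch only shows that a rule written with one
nesting in mind silently misses the other one.

```ocaml
(* Toy pattern AST: a hypothetical stand-in, not the compiler's Parsetree. *)
type pat =
  | Var of string
  | Or of pat * pat

(* The two possible readings of "p1 | p2 | p3". *)
let left_nested = Or (Or (Var "p1", Var "p2"), Var "p3")
let right_nested = Or (Var "p1", Or (Var "p2", Var "p3"))

(* A rule written with the right-nested reading in mind: it only finds
   the first alternative when that alternative sits directly under the
   root Or node. *)
let first_alternative = function
  | Or (Var x, _) -> Some x
  | _ -> None

(* Flattening the or-pattern makes a rule independent of the nesting. *)
let rec alternatives = function
  | Or (a, b) -> alternatives a @ alternatives b
  | p -> [ p ]
```

Here first_alternative succeeds on right_nested but returns None on
left_nested, while alternatives gives the same list for both.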

Second, you need some extra "concrete" syntax to specify which features
you care about.  For instance, you need to be able to write a pattern
which matches a recursive let binding, or a non-recursive let binding,
or any kind of let binding (recursive or not).  In the last case, you
probably want to be able to extract the recursive flag for later use.
You need to introduce some syntax for that.  Similarly, you need some
expert knowledge and specific syntax to say exactly what your "free
variables" stand for: you need to be able to say that you want to match
an arbitrary constructor or a constructor with no module prefix.  This
is the source of a lot of complexity related to anti-quotations in
camlp4, in my opinion.
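
For comparison, on abstract syntax the three cases need no extra
notation.  The following is only a sketch on a toy AST (hypothetical
types and names, far simpler than the real Parsetree):

```ocaml
(* Toy expression AST: hypothetical, for illustration only. *)
type rec_flag = Nonrecursive | Recursive

type expr =
  | Ident of string
  | Let of rec_flag * string * expr * expr (* let [rec] x = e1 in e2 *)

(* Match recursive let bindings only. *)
let is_recursive_let = function
  | Let (Recursive, _, _, _) -> true
  | _ -> false

(* Match any let binding and extract the flag (and the bound name)
   for later use. *)
let let_info = function
  | Let (flag, name, _, _) -> Some (flag, name)
  | _ -> None
```

Ordinary pattern matching already offers the constant (Recursive), the
wildcard and the variable (flag); a quotation system has to reinvent
each of them as some form of anti-quotation.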

Third, working on concrete syntax gives the false impression that you
don't need to master the abstract syntax, but this is wrong.  You need
to know if "M.(e)" and "let open M in e" map to the same AST or not, if
record punning is implemented in the parser or if it is represented in
the AST (exercise: define a "code smell rule" to detect a missed punning
opportunity, i.e. an instance of a record field "l = l"), etc.
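
A sketch of why the exercise is harder than it looks, on a toy AST
(hypothetical types and names) and under the explicit assumption that
the parser expands punning, which may or may not be what the real
parser does:

```ocaml
(* Toy expression AST: hypothetical, for illustration only. *)
type expr =
  | Ident of string
  | Record of (string * expr) list

(* ASSUMPTION: a parser that expands punning would turn the punned
   field "l" into the same node as the explicit field "l = l". *)
let punned l = (l, Ident l)

(* Naive "missed punning" rule: report every field bound to an
   identifier with the same name as its label. *)
let self_bound_labels = function
  | Record fields ->
      fields
      |> List.filter (fun (l, e) -> e = Ident l)
      |> List.map fst
  | Ident _ -> []

(* Stands for the source record { x = x; y; z = w }. *)
let example = Record [ ("x", Ident "x"); punned "y"; ("z", Ident "w") ]
```

Under that assumption, self_bound_labels example reports both "x" and
"y", although "y" was already punned in the source.  Writing the rule
in concrete syntax would not help: you still have to know what the
parser does.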

Fourth, working with concrete syntax requires mastering a new
sub-language (with quotations and anti-quotations), which is far from
trivial, and which looks like the real OCaml syntax but is not really
the same.  There is a learning curve here, which should not be neglected.

At the end of the day, I find it both simpler and more robust to work on 
abstract syntax, even if this is marginally more verbose.  Some actions 
can reduce the syntactic overhead of working with abstract syntax: 
cleaning up the Parsetree (and removing prefixes) and improvements to 
our beloved host language itself (such as 
http://caml.inria.fr/mantis/view.php?id=5667 or even more ambitious 
extensions).  I'd rather see time invested in such improvements than in
providing support for working with (almost-)concrete syntax.  But YMMV,
and I'm very much interested to hear about the opinion and experience of
other people on this topic.  (And of course, once quotations are
supported, nothing prevents a -ppx rewriter from providing support for
concrete-syntax matching, to be used for compiling other -ppx rewriters...)

Alain