[wg-camlp4] Matching on concrete syntax (was: Re: Camlp4 uses)
Alain Frisch
alain at frisch.fr
Mon Jan 28 16:42:18 GMT 2013
On 01/28/2013 04:17 PM, Xavier Clerc wrote:
> 3- in Mascot (style tool), camlp4 is used to get either
> a stream of lexical tokens or an AST, both being used
> to detect code smells;
...
> Case 3 may appear as a perfect fit for ppx, but indeed
> it is quite pleasant to write some code smell in OCaml
> syntax (through quotations), rather than by matching
> against a bare AST. The latter becomes very verbose,
> and very hard to read/maintain to say the least.
I've the intuition that matching AST with patterns in a concrete syntax
is a source of never-ending problems. But it seems you disagree, so
I'm very much interested in your experience!
Let me explain my point.
First, the parsing (concrete syntax -> AST) has some built-in
ambiguities and resolution mechanisms that the programmer does not even
need to be aware of. For instance, I cannot tell how a pattern like "p1
| p2 | p3" or an expression "e1 || e2 || e3" is parsed without looking
at the parser (or most probably, playing with -dparsetree). Is the
pattern parsed like "(p1 | p2) | p3" or "p1 | (p2 | p3)"? The problem
is that you might need to know such information if you write AST
transformation/matching with concrete syntax.
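To make the first point concrete, here is a minimal sketch using a toy pattern type. The type `pat` and the function names are hypothetical stand-ins, not the compiler's Parsetree, and the sketch deliberately does not say which nesting the real parser picks. It only shows that a matcher written with one nesting in mind silently misses the other, while a matcher that flattens the alternatives first does not care.

```ocaml
(* Toy pattern AST: a hypothetical stand-in, NOT the real Parsetree. *)
type pat =
  | Var of string
  | Or of pat * pat

(* The two possible readings of "p1 | p2 | p3". *)
let left_nested = Or (Or (Var "p1", Var "p2"), Var "p3")
let right_nested = Or (Var "p1", Or (Var "p2", Var "p3"))

(* A matcher that assumes the left-nested shape. *)
let matches_left_nested = function
  | Or (Or (_, _), _) -> true
  | _ -> false

(* A nesting-independent alternative: collect the alternatives
   in left-to-right order, whatever the tree shape is. *)
let rec alternatives = function
  | Or (a, b) -> alternatives a @ alternatives b
  | p -> [ p ]
```

With the abstract syntax in front of you, the choice between the two matchers is explicit; with a concrete-syntax pattern, it is made for you by the parser.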
Second, you need some special extra "concrete" syntax to be able to
specify whether you care or not about some features. For instance, you
need to be able to write a pattern which matches a recursive let
binding, or non-recursive let binding, or any kind of let binding
(recursive or not). In the latter case, you probably want to be able to
extract the recursive flag for later use. You need to introduce some
syntax for that. Similarly, you need some expert knowledge and specific
syntax to know exactly what your "free variables" stand for: you need to
be able to say that you want to match an arbitrary constructor or a
constructor with no module prefix. This is the source for a lot of
complexity related to anti-quotations in camlp4, in my opinion.
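As an illustration of the second point, the three cases (recursive only, any let, any let with the flag extracted) need no extra syntax when matching on an abstract syntax directly. The types below are a toy model invented for this sketch, only loosely inspired by the idea of a recursive flag on let bindings; they are not the real Parsetree definitions.

```ocaml
(* Toy AST: hypothetical types, NOT the real Parsetree. *)
type rec_flag = Nonrecursive | Recursive

type expr =
  | Ident of string
  | Let of rec_flag * string * expr * expr  (* let [rec] x = e1 in e2 *)

(* Matches recursive let bindings only. *)
let is_recursive_let = function
  | Let (Recursive, _, _, _) -> true
  | _ -> false

(* Matches any let binding, recursive or not. *)
let is_any_let = function
  | Let (_, _, _, _) -> true
  | _ -> false

(* Matches any let binding and extracts the flag for later use. *)
let rec_flag_of = function
  | Let (flag, _, _, _) -> Some flag
  | _ -> None
```

In a quotation-based approach, each of these three variants would presumably need its own anti-quotation form for the flag position.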
Third, working on concrete syntax gives a false feeling that you don't
need to master the abstract syntax, but this is wrong. You need to know
if "M.(e)" and "let open M in e" map to the same AST or not, if record
punning is implemented in the parser or if it is represented in the AST
(exercise: define a "code smell rule" to detect missed punning
opportunity, i.e. an instance of a record field "l = l"), etc.
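A possible shape for the exercise, on a toy AST: the types and the function `missed_punning` are hypothetical and are not the real Parsetree. The sketch assumes a representation in which a record field is always stored as a label paired with an expression. Under that assumption a punned field and an explicit "l = l" would look identical, so the rule below could not tell a missed opportunity from punning already in use. That is exactly the kind of thing you need to know about the abstract syntax before writing the rule.

```ocaml
(* Toy AST: hypothetical types, NOT the real Parsetree.
   Assumption: every record field is stored as (label, expression),
   so punning, if any, has already been expanded. *)
type expr =
  | Ident of string
  | Const of int
  | Record of (string * expr) list

(* Returns the labels of fields of the form "l = l". *)
let missed_punning = function
  | Record fields ->
      List.filter_map
        (fun (label, e) ->
          match e with
          | Ident x when x = label -> Some label
          | _ -> None)
        fields
  | _ -> []
```

For example, on the toy value for `{ l = l; m = 1; n = x }` the rule reports only the label `l`.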
Fourth, working with concrete syntax requires mastering a new
sub-language (with quotations, anti-quotations), which is far from
trivial, and which looks like the real OCaml syntax but is not really
the same. There is a learning curve here, which should not be neglected.
At the end of the day, I find it both simpler and more robust to work on
abstract syntax, even if this is marginally more verbose. Some actions
can reduce the syntactic overhead of working with abstract syntax:
cleaning up the Parsetree (and removing prefixes) and improvements to
our beloved host language itself (such as
http://caml.inria.fr/mantis/view.php?id=5667 or even more ambitious
extensions). I'd rather see time invested in such improvements than on
providing support for working with (almost-)concrete syntax. But YMMV,
and I'm very much interested to hear about the opinion and experience of
other people on this topic. (And of course, once quotations are
supported, nothing prevents a -ppx rewriter from providing support for
concrete-syntax matching, to be used for compiling other -ppx rewriters...)
Alain