[wg-camlp4] Matching on concrete syntax (was: Re: Camlp4 uses)

Xavier Clerc xavier.clerc at inria.fr
Tue Jan 29 15:37:22 GMT 2013



----- Original message -----
> On 01/28/2013 04:17 PM, Xavier Clerc wrote:
> >    3- in Mascot (style tool), camlp4 is used to get either
> >       a stream of lexical tokens or an AST, both being used
> >       to detect code smells;
> ...
> > Case 3 may appear as a perfect fit for ppx, but indeed
> > it is quite pleasant to write some code smell in OCaml
> > syntax (through quotations), rather than by matching
> > against a bare AST. The latter becomes very verbose,
> > and very hard to read/maintain to say the least.
> 
> I have the intuition that matching an AST with patterns written in
> concrete syntax is a source of never-ending problems.  But it seems you
> disagree, so I'm very much interested in your experience!
> 
> Let me explain my point.
> 
> First, the parsing (concrete syntax -> AST) has some built-in
> ambiguities and resolution mechanism that the programmer does not even
> need to be aware of.  For instance, I cannot tell how a pattern like
> "p1 | p2 | p3" or an expression "e1 || e2 || e3" is parsed without
> looking at the parser (or most probably, playing with -dparsetree).  Is
> the pattern parsed like "(p1 | p2) | p3" or "p1 | (p2 | p3)"?  The
> problem is that you might need to know such information if you write AST
> transformation/matching with concrete syntax.
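
A minimal sketch of the associativity concern in the first point, using a
hypothetical toy pattern type (not the compiler's Parsetree; the names pat,
alternatives, left_nested and right_nested are invented for illustration).
The two candidate parses are different values, so a rule written against one
shape silently misses the other unless the or-pattern is flattened first:

```ocaml
(* Toy pattern AST; hypothetical, far smaller than the real Parsetree. *)
type pat =
  | Var of string
  | Or of pat * pat

(* The two possible parses of "p1 | p2 | p3". *)
let left_nested = Or (Or (Var "p1", Var "p2"), Var "p3")
let right_nested = Or (Var "p1", Or (Var "p2", Var "p3"))

(* Flatten nested or-patterns into the list of alternatives, so that a
   check does not depend on how the parser associates "|". *)
let rec alternatives = function
  | Or (l, r) -> alternatives l @ alternatives r
  | p -> [ p ]

let () =
  (* The two trees differ structurally... *)
  assert (left_nested <> right_nested);
  (* ...but they describe the same alternatives once flattened. *)
  assert (alternatives left_nested = alternatives right_nested)
```

Which of the two shapes the real parser produces is exactly what one would
still have to check, for instance with -dparsetree.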
> 
> Second, you need some special extra "concrete" syntax to be able to
> specify whether you care or not about some features.  For instance, you
> need to be able to write a pattern which matches a recursive let
> binding, or non-recursive let binding, or any kind of let binding
> (recursive or not).  In the latter case, you probably want to be able to
> extract the recursive flag for later use.  You need to introduce some
> syntax for that.  Similarly, we need some expert knowledge and specific
> syntax to know exactly what your "free variables" stand for: you need to
> be able to say that you want to match an arbitrary constructor or a
> constructor with no module prefix.  This is the source for a lot of
> complexity related to anti-quotations in camlp4, in my opinion.
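
For comparison, here is how the three cases of the second point look when
matching on abstract syntax directly. This is only a sketch over a
hypothetical toy expression type, loosely modelled on the way the compiler's
AST attaches a recursive flag to let bindings; the names expr, is_let_rec,
is_plain_let and let_flag are assumptions made for the example:

```ocaml
(* Toy AST; hypothetical and much simpler than the real Parsetree. *)
type rec_flag = Nonrecursive | Recursive

type expr =
  | Var of string
  | Let of rec_flag * string * expr * expr (* let [rec] x = e1 in e2 *)

(* Matches recursive let bindings only. *)
let is_let_rec = function
  | Let (Recursive, _, _, _) -> true
  | _ -> false

(* Matches non-recursive let bindings only. *)
let is_plain_let = function
  | Let (Nonrecursive, _, _, _) -> true
  | _ -> false

(* Matches any let binding and extracts the flag for later use. *)
let let_flag = function
  | Let (flag, _, _, _) -> Some flag
  | _ -> None

let () =
  let e = Let (Recursive, "f", Var "f", Var "f") in
  assert (is_let_rec e);
  assert (not (is_plain_let e));
  assert (let_flag e = Some Recursive)
```

With ordinary pattern matching, a wildcard or a variable in the flag position
covers the "any kind" case, without any dedicated anti-quotation syntax.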
> 
> Third, working on concrete syntax gives a false feeling that you don't
> need to master the abstract syntax, but this is wrong.  You need to know
> if "M.(e)" and "let open M in e" map to the same AST or not, if record
> punning is implemented in the parser or if it is represented in the AST
> (exercise: define a "code smell rule" to detect a missed punning
> opportunity, i.e. an instance of a record field "l = l"), etc.
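
One possible take on the punning exercise, as a sketch over a hypothetical
toy AST (the names expr and punnable_fields are invented for the example).
It assumes that a punned field "{ l }" and an explicit "{ l = l }" are
represented by the same node, which is an assumption of this toy type and
not a statement about the real parser:

```ocaml
(* Toy AST; hypothetical. A record expression is a list of
   (label, value) pairs, with no trace of whether punning was used. *)
type expr =
  | Var of string
  | Int of int
  | Record of (string * expr) list

(* Labels of the fields written as "l = l", i.e. those whose value is a
   variable with the same name as the label. *)
let punnable_fields = function
  | Record fields ->
      List.filter_map
        (fun (label, value) ->
          match value with
          | Var v when v = label -> Some label
          | _ -> None)
        fields
  | _ -> []

let () =
  let r = Record [ ("x", Var "x"); ("y", Int 1); ("z", Var "w") ] in
  assert (punnable_fields r = [ "x" ])
```

Under that assumption the rule cannot tell a field that is already punned
from one that could be, so it would report both. This is why one needs to
know how the real AST represents punning before writing such a check.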
> 
> Fourth, working with concrete syntax requires mastering a new
> sub-language (with quotations, anti-quotations), which is far from
> trivial, and which looks like the real OCaml syntax but is not really
> the same.  There is a learning curve here, which should not be
> neglected.
> 
> 
> At the end of the day, I find it both simpler and more robust to work on
> abstract syntax, even if this is marginally more verbose.  Some actions
> can reduce the syntactic overhead of working with abstract syntax:
> cleaning up the Parsetree (and removing prefixes) and improvements to
> our beloved host language itself (such as
> http://caml.inria.fr/mantis/view.php?id=5667 or even more ambitious
> extensions).  I'd rather see time invested in such improvements than on
> providing support for working with (almost-)concrete syntax.  But YMMV,
> and I'm very much interested to hear about the opinion and experience of
> other people on this topic.  (And of course, once quotations are
> supported, nothing prevents some -ppx rewriter from providing support for
> concrete-syntax matching, to be used for compiling other -ppx
> rewriters...)


Well Alain, I basically agree about the different points you raise.
They closely describe my experience in writing Mascot!
As of today, most checks are written by matching against AST bits,
because, as you pointed out, it is a tad safer. However, during development,
it was a time-saver to be able to easily express and experiment with new
ideas by manipulating "bits of the actual language". Camlp4 was kind
enough to translate them for me into AST code, which I then had to check.

When expressing a complex pattern, I think this turns out to be really
useful. I even keep some of these patterns in a (non-public) directory
for development. As you said, the release version has to be polished by
checking some ambiguities.

To sum up, I totally agree on "robustness" but not quite on "easiness".


Regards,

Xavier

