[wg-camlp4] Matching on concrete syntax (was: Re: Camlp4 uses)

Fri Mar 29 15:12:42 GMT 2013

On Fri, Mar 29, 2013 at 3:35 PM, Alain Frisch <alain.frisch at lexifi.com> wrote:

> I'd write it as:
>
> let add_register e body =
>   let_in [pany, app (evar "ExceptionHandling.register") [e]] body
>
> which does not look so bad.  This relies on the following definitions,
> which could go e.g. in a Ast_helper.Convenience module:
>
> let evar s = E.ident (mknoloc (Longident.parse s))
> let let_in l body = E.let_ Nonrecursive l body
>    (* maybe with an optional argument for the recursive case *)
> let pany = P.any ()
> let app f args = E.apply f (List.map (fun a -> "", a) args)
>

That is better indeed. I can only encourage you to add these kinds of
conveniences in the E submodule (or maybe somewhere else) as usage suggests
that they are useful. I originally planned to report on that through proper
(?) channels, but that got lost in the noise of things to do, sorry.
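
For concreteness, here is a minimal sketch of what such conveniences look
like, written against a toy AST so that it stands alone. The types and
constructor names below (Pany, Evar, Eapply, Elet) are made up for
illustration; they are not the real Parsetree or Ast_helper definitions.

```ocaml
(* Toy stand-in for a Parsetree fragment; NOT the real compiler-libs types. *)
type pattern =
  | Pany                          (* _ *)
  | Pvar of string                (* x *)
  | Palias of pattern * string    (* p as x *)

type expr =
  | Evar of string                          (* x, or M.x *)
  | Eapply of expr * expr list              (* f a b *)
  | Elet of (pattern * expr) list * expr    (* let p = e in body *)

(* Convenience constructors, in the spirit of the ones quoted above. *)
let evar s = Evar s
let pany = Pany
let app f args = Eapply (f, args)
let let_in bindings body = Elet (bindings, body)

(* Builds: let _ = ExceptionHandling.register e in body *)
let add_register e body =
  let_in [pany, app (evar "ExceptionHandling.register") [e]] body
```

The point is only that add_register reads almost like the code it
produces once the small helpers exist.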

I'm still not sure that quasi-quotations are not a better approach, because
the problem here is that the user has to learn a new interface to describe
code fragments instead of using the syntax he's already familiar with. You
have convincingly argued that the OCaml syntax cannot always be used to
describe AST fragments (even in Camlp4, if you use the classic syntax in
quotations some ambiguities force you to revert to plain AST constructors
from time to time), and I hear the argument about fitting lots of small
pieces together instead of inserting large chunks of code. I still have the
intuition that those arguments are more relevant to the expert extension
writer, and that for a large set of use cases that concern *simple*
extensions and beginner extension writers, quasiquotations are still
noticeably easier to use. That this feeling appears to be shared by
Xavier Clerc, who is one of the other early testers of -ppx, gives me a
hint that it may have some objective merit.

>
> The rest of the code is interesting as well:
>
> Camlp4:
>
>   value rec map_handler =
>     let patvar = "__exn" in
>     fun
>     [ <:match_case@_loc< $m1$ | $m2$ >> ->
>         <:match_case< $map_handler m1$ | $map_handler m2$ >>
>     | <:match_case@_loc< $p$ when $w$ -> $e$ >> ->
>         <:match_case@_loc<
>           ($p$ as $lid:patvar$) when $w$ -> $add_debug_expr _loc patvar e$ >>
>     | m -> m ];
>
>   value filter = object
>     inherit Ast.map as super;
>     method expr = fun
>     [ <:expr@_loc< try $e$ with [ $h$ ] >> ->
>       <:expr< try $e$ with [ $map_handler h$ ] >>
>     | x -> super#expr x ];
>   end;
>
>
> PPX:
>
>   method expr e =
>     let e = super#expr e in
>     { e with pexp_desc =
>         match e.pexp_desc with
>           | Pexp_try (body, handler) ->
>             let instrument_case (pat, body) =
>               let patvar_str =  "__exn" in
>               let patvar = Location.mknoloc (Longident.parse patvar_str) in
>               let pat = { pat with ppat_desc =
>                 Ppat_alias (pat, Location.mknoloc patvar_str) } in
>               (pat, add_register patvar body) in
>             Pexp_try (body, List.map instrument_case handler)
>           | other -> other
>     }
>
>
> This might be a matter of taste, but I prefer the PPX version, which I can
> read only by knowing about the Parsetree (which is required anyway to write
> any such code), a normal OCaml library.  The quotation and anti-quotations
> in the Camlp4 version look very noisy, and it relies on a syntax I'm not
> familiar with (revised syntax) and syntactic extensions
> (quotations/antiquotations), with their own conventions ("<:match_case<",
> $lid:$).  Since I'm not writing extensions every day, I really prefer
> having to learn how to use simple OCaml data types and libraries
> (Parsetree, Ast_helper) rather than to learn new syntax and new concepts.
>  Also, I'd write the code above as:
>
> (* --> in Ast_helper.Convenience *)
> let palias p x = P.alias p (mknoloc x)
> let evar s = E.ident (mknoloc (Longident.parse s))
> ...
>
>   method expr e =
>     let e = super#expr e in
>     match e.pexp_desc with
>     | Pexp_try (body, handler) ->
>         let instrument_case (pat, body) =
>           (palias pat "__exn", add_register (evar "__exn") body)
>         in
>         {e with pexp_desc =
>            Pexp_try (body, List.map instrument_case handler)}
>     | e -> e

I used this example specifically to discuss the quasiquotation feature, not
as a general comparison of -ppx and Camlp4's extension writing facilities.
I do agree that, in this example, the AST traversal framework of -ppx is in
fact better than Camlp4's. This is related to the fact that the AST
structure is simpler and more closely reflects the way I logically think
about pieces of OCaml code: match only takes a (pattern * expr) list
instead of being a recursive expression of nested branches. That's a plus
for ppx's design (and I like your approach of improving the upstream AST
description to make it even better, at least while we don't have extension
writers with code that breaks when we change it).
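
To illustrate the difference in shape, here is a rough sketch with two toy
representations of match cases. Both types are hypothetical simplifications
(patterns and expressions are plain strings), not the actual Camlp4 or
Parsetree definitions; "instrument" is an invented stand-in for the
rewriting done on each handler.

```ocaml
(* Hypothetical toy types, for illustration only. *)
type pat = string
type expr = string

(* Camlp4-like shape: the cases of a match form a tree of "or" nodes. *)
type nested_cases =
  | Nil
  | Case of pat * expr
  | Or of nested_cases * nested_cases

(* -ppx-like shape: the cases are an ordinary list. *)
type flat_cases = (pat * expr) list

(* Invented per-case rewriting: alias the pattern, prefix the body. *)
let instrument (p, e) = ("(" ^ p ^ " as __exn)", "register __exn; " ^ e)

(* Nested shape: the traversal has to be written by hand. *)
let rec map_nested f = function
  | Nil -> Nil
  | Case (p, e) -> let (p, e) = f (p, e) in Case (p, e)
  | Or (a, b) -> Or (map_nested f a, map_nested f b)

(* Flat shape: List.map is the whole traversal. *)
let map_flat f (cases : flat_cases) : flat_cases = List.map f cases

(* Converts the nested shape to the flat one, to compare results. *)
let rec flatten = function
  | Nil -> []
  | Case (p, e) -> [ (p, e) ]
  | Or (a, b) -> flatten a @ flatten b
```

With the flat shape, the recursive map_handler function of the Camlp4
version collapses into a single List.map, which is what the -ppx code
above does.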

That said, I think you're also a bit quick to jump to conclusions here. The
question of whether the extension uses the classic or the revised syntax is
largely orthogonal to the design of the extension itself (except
occasionally for quotation ambiguity concerns), and I could have used
the classic syntax to write the Camlp4 extension just as well. What
happened in practice is that I looked at the set of old Camlp4 extensions I
had lying around, copy-pasted the code of the one that looked most closely
like what I was looking for (traversing the "match" structure), and spent
the rest of the time wondering which output code to produce (the
"add_register" function I quoted, plus the pattern-alias stuff).

So the bottleneck in practice was in the code production part. It was just
as true for the -ppx version (the traversal part was easy to write), but my
experience producing code in the -ppx version was much less gratifying. Of
course, it was my first use of your Ast_mapper.E interface, so there was
some learning cost to take into account. But then I had type errors, I had
to look at the documentation again, and it was a bit painful. Finally, I
only handle Camlp4 extensions about once a year, so I have time to forget
most details about how they work, in particular I *always* look at the
documentation for the concrete AST constructor names in the (rare) cases
where I need them. I found that there was no such re-learning curve with
quasiquotations, they just work out of the box -- once you've been
rebrained, once and for all, to see those <:stuff< >> as structured code
rather than ASCII noise.

I think the small tool you just introduced (translating OCaml code into
AST-building OCaml code) may have made this "code production" part easier,
but I'm not fond of the idea of pasting auto-generated code in my own code.
In any case, it would be better if it produced code using the nice
high-level interface, rather than the hard-to-read AST definitions, but
that may be much more painful to implement so I'm not really asking for
that.

>> But maybe we
>> can have quasiquotations with the current extension mechanism?
>>
>> let add_register patvar body =
>>    [%quote
>>      let _ = ExceptionHandling.register [%anti patvar] in [%anti body]
>>    ]
>>
>
> Implementing this "quote" expander is not very difficult, just a little
> bit tedious (and this can be automated by parsing the definition of the
> Parsetree).

Isn't that essentially the same thing as the tool you implemented for
Xavier above? (Can you reuse code between both?)
If I understand correctly, this is also the "Meta" operation of Camlp4,
turning the AST for the expression <foo> into the AST for the OCaml
expression representing the AST for <foo>. When you say "automated by
parsing the definition of the Parsetree", do you have a realistic design in
mind for such boilerplate code generators, or do you plan in practice to
implement them by hand?
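
To make sure we are talking about the same operation, here is a small
sketch of such a "meta" function on a toy expression type. Everything in it
(the expr type, the Con and Anti constructors, lift_list) is an assumption
made up for this example; the real Parsetree is much larger, which is
exactly where the boilerplate question comes from.

```ocaml
(* Toy expression language; hypothetical, much smaller than the Parsetree. *)
type expr =
  | Var of string              (* identifier *)
  | Str of string              (* string literal *)
  | App of expr * expr list    (* function application *)
  | Con of string * expr list  (* constructor applied to arguments *)
  | Anti of string             (* [%anti x]: splice the value of x *)

(* Encode a list of object-level expressions as nested "::" / "[]". *)
let rec lift_list = function
  | [] -> Con ("[]", [])
  | x :: rest -> Con ("::", [ x; lift_list rest ])

(* meta e is the expression that, once evaluated, rebuilds the AST of e. *)
let rec meta = function
  | Var x -> Con ("Var", [ Str x ])
  | Str s -> Con ("Str", [ Str s ])
  | App (f, args) -> Con ("App", [ meta f; lift_list (List.map meta args) ])
  | Con (c, args) -> Con ("Con", [ Str c; lift_list (List.map meta args) ])
  | Anti x -> Var x  (* not quoted: refers to the expander's own variable *)
```

Each constructor needs one mechanical case, so a generator that reads the
type definition would only have to emit cases of this form; the Anti case
is the only one that is not mechanical.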

> Note that nothing forces us to use [%anti x] for antiquotations.  We could
> very well decide on a more lightweight convention, like prefixing
> identifiers, or using a dedicated operator:
>
>     [%quote
>       let _ = ExceptionHandling.register __patvar in __body
>     ]
>
>     [%quote
>       let _ = ExceptionHandling.register !!patvar in !!body
>     ]
>

Same old battle-horse: I dislike the idea that [%quote ] would change the
meaning of syntactically valid OCaml code such as __patvar or !!patvar. I
think I would stick with [%anti ] for now, or maybe just [%a ] if need be -- in
any case this is not part of the eventually-crowded extension namespace as
they really are only markers to be rewritten by the implementation of
%quote.
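
A small sketch of the objection, on a toy type: with a prefix convention, an
identifier that merely happens to start with "__" (like the "__exn"
variable the extension itself generates) is silently turned into an
antiquotation, whereas an explicit marker leaves it alone. The types and
the two resolve functions are hypothetical, written only for this example.

```ocaml
(* Hypothetical toy types, for illustration only. *)
type expr =
  | Var of string   (* ordinary identifier inside the quotation *)
  | Anti of string  (* explicit marker, as in [%anti x] *)

type resolved =
  | Quoted of string   (* stays an identifier in the generated code *)
  | Spliced of string  (* replaced by the value of this expander variable *)

let has_prefix s = String.length s > 2 && String.sub s 0 2 = "__"

(* Prefix convention: any identifier starting with "__" is an antiquotation. *)
let resolve_prefix = function
  | Var x when has_prefix x -> Spliced (String.sub x 2 (String.length x - 2))
  | Var x -> Quoted x
  | Anti x -> Spliced x

(* Marker convention: only explicit Anti nodes are antiquotations. *)
let resolve_marker = function
  | Var x -> Quoted x
  | Anti x -> Spliced x
```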

> But this approach would work nicely only for writing expressions or
> patterns on OCaml expressions, not on other syntactic categories (because
> the content of an extension node is an expression).  The problem is that
> AST-manipulating code tends to require working with many different
> categories (like "match cases") even to build expressions.
>

That's a good point.

> Quasi-quotations would be useful if the expanders had to generate big
> fragments of mostly static code, with only a few "dynamic" placeholders.
>  In my experience, this is rarely the case: you assemble the resulting
> OCaml code by combining many small fragments generated programmatically.
>  For these cases, a nice library of "AST constructors" seems better to me.
>

Maybe we need both. But assuming we eventually get nice constructor names
for the AST definitions (I know that doesn't depend on you), if we only had
time/energy/maintenance for two among (1) AST definitions, (2) an AST
combinator library and (3) a quasiquotation mechanism, I would suggest we
keep (1) and (3) rather than (1) and (2).

>
>
>
> -- Alain
>