[wg-camlp4] Meta Programming from the view of the implementation

Hongbo Zhang hongboz at seas.upenn.edu
Tue Jan 29 19:05:19 GMT 2013

Dear wg-camlp4 users,

    So far the discussion has been really interesting and quite helpful, but
I want to talk about meta-programming *from the point of view of the
implementation side*. Features are easy to propose, but maybe only the
compiler/library writers know how hard they are to implement. I would
appreciate it if you could sit down and read this long email. I am also
starting to blog (http://hongboz.wordpress.com/) about how to do syntactic
meta-programming (SMP) in the right way.
   I rewrote the whole of camlP4 from scratch (the result is named Fan),
building the quotation kit and throwing away the crappy grammar parser, so
please believe me *that I do understand the whole technology stack of
camlP4*. If we could reach some consensus, I would be happy to hand over the
maintenance of Fan. Fan does not lose any feature compared with camlP4; in
fact it has more interesting features.

   Let's begin with some easy, not too technical parts which nevertheless
have a significant effect on user experience:
   1. Performance
          Performance does matter. It's a shame that most of the time spent
compiling the OCaml compiler is dedicated to camlP4, but that is an
engineering problem. Currently compiling Fan takes less than 20s, and it
can be improved further.
   2. Building issues
        The design of having side effects from dynamic loading is generally
a bad idea. In Fan, *dynamic loading only registers some functionality that
Fan supports*; it *does not have any other side effect*. Each file stands
alone and says which (ppx, filters, or syntax) it wants to use, with a good
default option. So the build command is always something like '-pp fan
plugin1 plugin2 plugin3', and *the order of plugins does not matter*. Also,
*loading all the plugins you have does not have any side effect. Even
better, you can statically link all the plugins you have collected, so the
build process is simplified.*
   3. Grammar Extension (*Language namespace*)
        I concur that arbitrary grammar extension is a bad idea, and I agree
with Gabriel that so far only quotation is modular and composable (here
quotation means a delimited DSL, and quasi-quotation means Lisp-style
macros). I also agree with Gabriel that *-ppx should not be used to do
syntax overriding (this should not actually be called syntax extension).
Syntax overriding is a terrible idea*, since the user never understands what
is going on underneath without reading the Makefile. So my suggestion is
that some really convenient syntax extensions, e.g. (let try.. in), should
be merged into the built-in parser. Quotations do not bring too much heavy
syntax (imho). In Fan, we proposed the concept of a hierarchical language
namespace, since once quotations are heavily used it's really easy to
introduce conflicts. *Querying the language namespace works exactly like
Java package namespaces*: you can import, and close an import, to save some
typing.
    Here is a taste

     {:.Fan.Lang.Meta.expr| a + b |} ------>
      `App (_loc,
        (`App (_loc, (`Id (_loc, (`Lid (_loc, "+")))),
           (`Id (_loc, (`Lid (_loc, "a")))))),
        (`Id (_loc, (`Lid (_loc, "b")))))
     {:.Fan.Lang.Meta.N.expr| a + b |}  ----->
      `App ((`App ((`Id (`Lid "+")), (`Id (`Lid "a")))), (`Id (`Lid "b")))
 In .Fan.Lang.Meta.expr the first '.' means the name is looked up from the
absolute namespace; *N.expr shares exactly the same syntax, without
locations*.

   4. Portable to different compiler extensions (like LexiFi's fork of
       I am pretty sure this is easy to do in Fan: only the Ast2pt part
(dumping the intermediate Ast into Parsetree) needs to be changed to different

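Points 2 and 3 above can be illustrated with a small sketch. This is not
Fan's implementation: the registry, `register` and `resolve` are hypothetical
names, and the lookup rule (a leading '.' is absolute, anything else is
tried under each imported prefix) is an assumption based only on the
description in this mail.

```ocaml
(* Hypothetical registry: absolute language name -> quotation expander. *)
let registry : (string, string -> string) Hashtbl.t = Hashtbl.create 16

(* Loading a plugin only adds an entry to the table; it has no other
   effect, so the order in which plugins are loaded cannot matter. *)
let register (name : string) (expander : string -> string) : unit =
  Hashtbl.replace registry name expander

(* Resolve a quotation name to a registered absolute name.
   A leading '.' marks an absolute name; otherwise each imported
   prefix is tried in turn. *)
let resolve ~(imports : string list) (name : string) : string option =
  if String.length name > 0 && name.[0] = '.' then
    if Hashtbl.mem registry name then Some name else None
  else
    List.find_opt
      (fun full -> Hashtbl.mem registry full)
      (List.map (fun prefix -> prefix ^ "." ^ name) imports)
```

With `.Fan.Lang.Meta` imported, the short names `expr` and `N.expr` would
resolve to the two languages used in the example above, while the fully
qualified names keep working without any import.
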
Now let's talk about some internal parts of SMP.
Quasi-quotation is the essential part of SMP. I am surprised that so far
the discussion *silently ignores quasi-quotation*; Leo's answer of writing
three parsers is neither satisfying nor practical (imho).

Camlp4 is mainly composed of two parts: one is the extensible parser, and
*the other significant part is Ast Lifting*. Since we all agree that the
extensible parser increases the complexity too much, let's simply ignore
that part.

Ast Lifting is tightly coupled *with the design of the Abstract Syntax
Tree*. People complain that the Camlp4 Ast is hard to learn and that using
quasi-quotation to do pattern matching is a bad idea.

Let me explain the topic a bit:
    Camlp4Ast is hard to learn, I agree; it has some alien names whose
meaning nobody understands. Quasi-quotation *is definitely a great idea* for
boosting meta-programming, but my experience is that *for very, very small
Ast fragments you should use the Abstract Syntax Tree directly*; otherwise
quasi-quotation is a life saver for meta-programming.
   Luckily the quotation kit has nothing to do with the parser part. It is
simply several functions (I am simplifying a bit) which turn a normal value
into an Ast node generically. *Such functions are neither easy to write nor
easy to read*; *the ideal case is that they are generated once and for all,
and derived automatically for all the data types in normal OCaml* (some ADTs
containing functions cannot be derived). *I bet it would most likely be a
nightmare to maintain three parsers for the OCaml grammar, two of them
dumping to a meta-level.*
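
To make "turning a normal value into an Ast node" concrete, here is a
minimal hand-written sketch. The `expr` type is a toy, loosely modelled on
the constructors shown earlier in this mail; it is not Fan's actual Ast, and
the `lift_*` names are hypothetical. In the ideal case described above, such
functions would be generated rather than written by hand.

```ocaml
(* Toy expression AST using polymorphic variants (not Fan's real Ast). *)
type expr =
  [ `Int of int
  | `Str of string
  | `Id of string            (* identifier or constructor name *)
  | `App of expr * expr ]    (* application *)

(* Lifting: map a run-time value to the AST of an expression that
   would rebuild that value. *)
let lift_int (n : int) : expr = `Int n
let lift_string (s : string) : expr = `Str s

(* A list is lifted to nested applications of "::" ending in "[]". *)
let rec lift_list (lift_elt : 'a -> expr) (xs : 'a list) : expr =
  match xs with
  | [] -> `Id "[]"
  | x :: rest -> `App (`App (`Id "::", lift_elt x), lift_list lift_elt rest)

(* An option is lifted the same way, constructor by constructor. *)
let lift_option (lift_elt : 'a -> expr) (o : 'a option) : expr =
  match o with
  | None -> `Id "None"
  | Some x -> `App (`Id "Some", lift_elt x)
```

Every data type needs one such function, and each follows the shape of the
type definition mechanically, which is why deriving them automatically is
attractive.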

   So, how do we make Ast Lifting easier?
        The first guideline is *"Don't mix in records"*.
        Once you encode the AST with records, you have to encode the records
at the meta level as well, which increases the complexity without bringing
any new features; *it's simply not worthwhile.*
        The second guideline is "Don't do *any* syntax desugaring".
Syntax desugaring makes the semantics of syntactic meta-programming a bit
weird. Syntax desugaring happens everywhere in Parsetree: think about list
literals, which use syntax desugaring. If you don't use any syntax
desugaring and you want to match, for example, a bigarray access, you simply
need to match `Bigarray(..)' instead of

                     {txt= Ldot (Ldot (Lident "Bigarray", array),
                                 ("get"|"set" as gs)) ;_};_},
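
The difference can be sketched with two toy encodings. Both types below are
hypothetical and written only for illustration; neither is the real
Parsetree nor Fan's Ast.

```ocaml
(* Desugared encoding: a.{i} has already become a call to Bigarray.<M>.get. *)
type lid = Lident of string | Ldot of lid * string
type desugared =
  | Ident of lid
  | Apply of desugared * desugared list

(* Direct encoding: the surface syntax keeps a node of its own. *)
type direct =
  [ `Id of string
  | `App of direct * direct
  | `Bigarray of direct * direct list ]

(* In the desugared tree, recognising a bigarray read means matching on
   the shape of the identifier that desugaring produced. *)
let is_bigarray_get_desugared (e : desugared) : bool =
  match e with
  | Apply (Ident (Ldot (Ldot (Lident "Bigarray", _), "get")), _) -> true
  | _ -> false

(* In the direct tree it is a single constructor. *)
let is_bigarray_direct (e : direct) : bool =
  match e with
  | `Bigarray _ -> true
  | _ -> false
```

Note that in the desugared encoding a call to `get` written out by hand is
indistinguishable from the bigarray syntax, which is one way the semantics
become "a bit weird".
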
       The third guideline is to make it *as uniform as possible*.
       This not only helps the user, but *it also helps meta-programming
over types to derive some utility types.* Take a look at my Ast encoding in
Fan, https://github.com/bobzhang/Fan/blob/master/src/Ast.ml (it needs to be
polished; please don't panic when you see the variants I use there).
       The initial Ast has locations and ant support, but *here we derive
three other Asts thanks to my very regular design.* *AstN is the Ast without
locations* (locations are important, but they are simply not much help when
you only do code generation, and they complicate the expanded code a lot).
*AstA is the Ast without antiquotations (simply remove the ant branch)*; it
is a subtype of Ast (thanks to our choice of variants here). *AstNA is the
Ast with neither locations nor antiquotations*; it is a subtype of AstN. *In
practice, I found the Ast without locations particularly helpful when you
only do code generation; it simplifies this part significantly. The
beautiful part is that all four Asts share the same grammar and the same
quasi-quotation mechanism, as I showed with .Fan.Lang.N.expr and
.Fan.Lang.expr.*
    I don't know how many parsers you would have to maintain to reach such
a goal, or whether it would ever happen.
    Using variants to encode the intermediate Ast has lots of other
benefits, but I don't want to cover them in such a short mail.
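
The subtyping between the derived Asts relies on a standard property of
OCaml polymorphic variants. The sketch below uses two toy types standing in
for AstNA and AstN; the real definitions in Fan's Ast.ml are much larger, and
the names `expr_na`, `expr_n`, `embed` and `size` are hypothetical.

```ocaml
(* Toy stand-ins: expr_n has an antiquotation branch, expr_na does not. *)
type expr_na =
  [ `Id of string
  | `App of expr_na * expr_na ]

type expr_n =
  [ `Id of string
  | `App of expr_n * expr_n
  | `Ant of string ]

(* Every expr_na constructor also exists in expr_n, so the compiler
   accepts this coercion; it has no run-time cost. *)
let embed (e : expr_na) : expr_n = (e : expr_na :> expr_n)

(* A function written once for the larger type... *)
let rec size (e : expr_n) : int =
  match e with
  | `Id _ | `Ant _ -> 1
  | `App (f, x) -> 1 + size f + size x

(* ...can be reused on the smaller one through the same coercion. *)
let size_na (e : expr_na) : int = size (e : expr_na :> expr_n)
```

With ordinary (nominal) variants the two types would be unrelated, and both
`embed` and a second copy of `size` would have to be written by hand.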

   So, *my proposal is that the community design an Intermediate Ast
together, and write a built-in parser to that Intermediate Ast, which is
then dumped to Parsetree. I do think Parsetree still needs to be cleaned up
a bit, but without too much change.* I would appreciate it if you could take
something away from Fan. I think Parsetree is *not the ideal place* to do
SMP. HTH

-- Regards, Hongbo