[wg-camlp4] Meta Programming from the view of the implementation

Hongbo Zhang hongboz at seas.upenn.edu
Wed Jan 30 02:34:02 GMT 2013


Quotations (delimited DSLs) work reasonably well as an alternative to ppx.
Here is my version of ulex in Fan (an improvement over ulex or sedlex is
*that it works well in the toplevel*).

let rec token enc =  {:lex|
   "<utf8>" -> begin enc := Ulexing.Utf8; token enc lexbuf end
  | "<latin1>" -> begin enc := Ulexing.Latin1; token enc lexbuf end
  | xml_letter+ -> Printf.sprintf "word(%s)" (Ulexing.utf8_lexeme lexbuf)
  | number -> "number"
  | eof -> exit 0
  | [1234-1246] -> "bla"
  | "(" ->  begin
      Ulexing.rollback lexbuf; (* Puts the lexeme back into the buffer *)
      {| "(" [^ '(']* ")" -> Ulexing.utf8_lexeme lexbuf |} lexbuf
      (* Note the use of an inline lexer *)
  end
  | "(*" -> begin comment lexbuf; "comment" end
  | ' ' -> "whitespace"
  | _ -> "???" |}
and comment = {:lex|
   "*)" -> ()
  | eof -> failwith "comment"
  | _ -> let _lexeme = Ulexing.lexeme lexbuf in
    comment lexbuf |}
On Tue, Jan 29, 2013 at 2:05 PM, Hongbo Zhang <hongboz at seas.upenn.edu> wrote:

> Dear wg-camlp4 users,
>
>     So far the discussion has been really interesting and quite helpful,
> but I want to talk about meta-programming *from the point of view of the
> implementation*. Features are easy to propose, but maybe only the
> compiler/library writers know how hard they are to implement. I would
> appreciate it if you could sit down and read this long email. I am also
> starting to blog (http://hongboz.wordpress.com/) about how to do syntactic
> meta-programming (SMP) the right way.
>    I rewrote the whole of camlP4 from scratch (the result is named Fan),
> building the quotation kit and throwing away the crappy grammar parser, so
> please believe me *that I do understand the whole technology stack of
> camlP4*. If we could reach some consensus, I would be happy to hand over
> the maintenance of Fan. Fan does not lose any feature compared with
> camlP4; in fact it has more interesting features.
>
>    Let's begin with some easy, not too technical parts which nonetheless
> have a significant effect on user experience:
>    1. Performance
>           Performance does matter. It's a shame that most of the time
> spent compiling the OCaml compiler is dedicated to camlP4, but that is an
> engineering problem. Compiling Fan currently takes less than 20s, and
> it can be improved further.
>    2. Building issues
>         The design of having side effects through dynamic loading is
> generally a bad idea. In Fan, *dynamic loading only registers some
> functionality that Fan supports*; it *does not have any other side
> effect*. Each file stands alone and says which (ppx, filters, or syntax)
> it wants to use, with a good default option, so the build command is
> always something like '-pp fan plugin1 plugin2 plugin3'. *The order of
> plugins does not matter*. Also, *loading all the plugins you have does
> not have any side effect. Even better, you can statically link all the
> plugins you have collected, so the build process is simplified.*
>    3. Grammar Extension (*Language namespace*)
>        I concur that arbitrary grammar extension is a bad idea, and I
> agree with Gabriel that so far only quotation (here quotation means
> delimited DSL, quasi-quotation means Lisp-style macros) is modular and
> composable. I also agree with Gabriel that -ppx *should not be used to do
> syntax overriding (this should not actually be called syntax extension)*.
> Syntax overriding is a terrible idea, since the user never understands
> what's going on underneath without reading the Makefile. So my
> suggestion is that some really convenient syntax extensions, e.g.
> (let try .. in), should be merged into the built-in parser. Quotations do
> not bring too much heavy syntax (imho). In Fan, we proposed the concept of
> a hierarchical language namespace, since once quotations are heavily used
> it's really easy to introduce conflicts. *Language namespace lookup is
> exactly like Java package namespaces*: you can import and close imports
> to save some typing.
>     Here is a taste
>
>  -----------------------------------------------------------------------------------------------
>      {:.Fan.Lang.Meta.N.expr| a + b |} ------>
>       `App (`App ((`Id (`Lid "+")), (`Id (`Lid "a"))), (`Id (`Lid "b")))
>      {:.Fan.Lang.Meta.expr| a + b |}  ----->
>       `App
>     (_loc,
>       (`App
>          (_loc, (`Id (_loc, (`Lid (_loc, "+")))),
>            (`Id (_loc, (`Lid (_loc, "a")))))),
>       (`Id (_loc, (`Lid (_loc, "b")))))
>
>  -----------------------------------------------------------------------------------------------
>  In .Fan.Lang.Meta.expr, the first '.' means the name is resolved from
> the absolute namespace. *N.expr shares exactly the same syntax, but
> without locations*.
>
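To make the difference between the two expansions concrete, here is a minimal sketch in plain OCaml. It is not Fan's actual Ast: only the tag names `App, `Id and `Lid are taken from the output above, while the loc record and the strip/fill helpers are made up for illustration.

```ocaml
(* Hypothetical location type; a real one carries file names and ranges. *)
type loc = { line : int; col : int }

(* Expression AST with locations, using the tag names from the expansion. *)
type ident = [ `Lid of loc * string ]
type expr =
  [ `Id of loc * ident
  | `App of loc * expr * expr ]

(* The same shape with every location removed. *)
type ident_n = [ `Lid of string ]
type expr_n =
  [ `Id of ident_n
  | `App of expr_n * expr_n ]

(* Forget locations: each tag maps to its location-free twin. *)
let strip_ident : ident -> ident_n = function
  | `Lid (_, s) -> `Lid s

let rec strip : expr -> expr_n = function
  | `Id (_, i) -> `Id (strip_ident i)
  | `App (_, f, x) -> `App (strip f, strip x)

(* Put one location everywhere, as generated code typically does
   with a single _loc. *)
let fill_ident (l : loc) : ident_n -> ident = function
  | `Lid s -> `Lid (l, s)

let rec fill (l : loc) : expr_n -> expr = function
  | `Id i -> `Id (l, fill_ident l i)
  | `App (f, x) -> `App (l, fill l f, fill l x)

let _loc = { line = 1; col = 0 }

(* a + b, written by hand without locations *)
let a_plus_b : expr_n =
  `App (`App (`Id (`Lid "+"), `Id (`Lid "a")), `Id (`Lid "b"))
```

Because the two types differ only by the loc component, strip and fill are purely mechanical, which is why the location-free form is pleasant for code generation.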
>    4. Portability to different compiler extensions (like LexiFi's fork of
> OCaml)
>        I am pretty sure this is easy to do in Fan: only the Ast2pt part
> (dumping the intermediate Ast into Parsetree) needs to be changed for
> different compilers.
>
>
> ----------------------------------------------------------------------------------------------------------------
> Now let's talk about some internal parts of SMP.
> Quasi-quotation is the essential part of SMP. I am surprised that so far
> the discussion *silently ignores quasi-quotation*. Leo's answer of
> writing three parsers is neither satisfying nor practical (imho).
>
> Camlp4 is mainly composed of two parts: one is the extensible parser and
> *the other significant part is Ast Lifting*. Since we all agree that the
> extensible parser increases the complexity too much, let's simply ignore
> that part.
>
> Ast Lifting is tightly coupled *with the design of the Abstract
> Syntax Tree.*  People complain that the Camlp4 Ast is hard to learn and
> that using quasi-quotation to do pattern matching is a bad idea.
>
> Let me explain the topic a bit:
>     Camlp4Ast is hard to learn, I agree; it has some alien names whose
> meaning nobody understands. Quasi-quotation *is definitely a great
> idea* to boost meta-programming, but my experience here is: *for very,
> very small Ast fragments, use the Abstract Syntax Tree directly*;
> otherwise quasi-quotation is a life saver for meta-programming.
>    Luckily the quotation kit has nothing to do with the parser part. It is
> simply several functions (I am simplifying a bit) which turn a normal
> runtime value into an Ast node generically. *Such functions are neither
> easy to write nor easy to read*; *the ideal case is that they are
> generated once and for all, and derived automatically for all the data
> types in normal OCaml* (some ADTs containing functions cannot be
> derived). *I bet it would most likely be a nightmare to maintain 3
> parsers for the OCaml grammar, with two of them dumping to a meta-level.*
>
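As a rough illustration of what such lifting functions look like, here is a minimal sketch over a toy AST. The expr type and every lift_* name are hypothetical and far smaller than what Fan or Camlp4 actually generate; applying a tag to two curried arguments is also a simplification.

```ocaml
(* A toy object-level AST (hypothetical). *)
type expr =
  [ `Int of int
  | `Str of string
  | `Uid of string               (* a constructor or tag name *)
  | `App of expr * expr ]

(* Each lifting function takes a runtime value and returns the AST of an
   expression that would rebuild that value. *)
let lift_int (n : int) : expr = `Int n
let lift_string (s : string) : expr = `Str s

let lift_option (lift_a : 'a -> expr) : 'a option -> expr = function
  | None -> `Uid "None"
  | Some x -> `App (`Uid "Some", lift_a x)

let rec lift_list (lift_a : 'a -> expr) : 'a list -> expr = function
  | [] -> `Uid "[]"
  | x :: xs -> `App (`App (`Uid "::", lift_a x), lift_list lift_a xs)

(* Lifting the AST type itself yields the meta level that quasi-quotation
   needs: an expression which, once evaluated, rebuilds the original AST. *)
let rec lift_expr : expr -> expr = function
  | `Int n -> `App (`Uid "`Int", lift_int n)
  | `Str s -> `App (`Uid "`Str", lift_string s)
  | `Uid s -> `App (`Uid "`Uid", lift_string s)
  | `App (f, x) -> `App (`App (`Uid "`App", lift_expr f), lift_expr x)
```

Every case follows the shape of the type definition, which is why these functions are better derived from the type than written by hand.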
>    So, how do we make Ast Lifting easier?
>         The first guideline is *"Don't mix with records"*.
>          Once you encode the AST with records, you have to encode the
> records at the meta level, which increases the complexity without bringing
> any new features; *it's simply not worthwhile.*
>
>         The second guideline is "Don't do *any* syntax desugaring".
> Syntax desugaring makes the semantics of syntactic meta-programming a bit
> weird. Syntax desugaring happens everywhere in Parsetree: think about
> list literals, which are desugared. If you don't do any syntax
> desugaring and you want, for example, to match a bigarray access, you
> simply need to match `Bigarray(..)' instead of
>
> Pexp_apply
>         ({pexp_desc=Pexp_ident
>                      {txt= Ldot (Ldot (Lident "Bigarray", array),
>                                  ("get"|"set" as gs)) ;_};_},
>          label_exprs)
> ----------------------------
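The list-literal case can be sketched the same way. Both types below are hypothetical toy encodings, not Parsetree and not Fan's Ast; they only contrast a desugared representation with one that keeps a dedicated node.

```ocaml
(* Desugared encoding (in the spirit of Parsetree): a list literal is
   stored as nested uses of the "::" constructor ending in "[]". *)
type desugared =
  | Int of int
  | Construct of string * desugared list

(* Direct encoding: the literal keeps its own node. *)
type direct =
  [ `Int of int
  | `List of direct list ]

(* With desugaring, recognising a list literal means rediscovering it by
   walking the "::" spine down to "[]". *)
let rec as_list_literal (e : desugared) : desugared list option =
  match e with
  | Construct ("[]", []) -> Some []
  | Construct ("::", [ hd; tl ]) ->
      (match as_list_literal tl with
       | Some rest -> Some (hd :: rest)
       | None -> None)
  | _ -> None

(* Without desugaring, it is a single pattern. *)
let as_list_literal' (e : direct) : direct list option =
  match e with
  | `List items -> Some items
  | _ -> None
```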
>        The third guideline is to make it *as uniform as possible*.
>        This not only helps the user, but *it also helps meta-programming
> over types to derive some utility types.* Take a look at my Ast encoding
> in Fan https://github.com/bobzhang/Fan/blob/master/src/Ast.ml (it needs
> to be polished; please don't panic when you see the variants I use there).
>       The initial Ast has locations and antiquotation (ant) support, but
> *here we derive 3 other Asts thanks to my very regular design*. *AstN is
> the Ast without locations* (locations are important, but they are simply
> not much help when you only do code generation, and they complicate the
> expanded code a lot). *AstA is the Ast without antiquotations (simply
> remove the ant branch)*; it is a subtype of Ast (thanks to the choice to
> use variants here). *AstNA is the Ast with neither locations nor
> antiquotations*; it is a subtype of AstN. *In practice, I found the Ast
> without locations is particularly helpful when you only do code
> generation; it simplifies this part significantly. The beautiful part
> is that all four Asts share the same grammar with the same
> quasi-quotation mechanism, as I showed with .Fan.Lang.N.expr and
> .Fan.Lang.expr.*
>     I don't know how many parsers you would have to maintain to reach
> such a goal, or whether it would ever happen.
>     Using variants to encode the intermediate Ast has a lot of other
> benefits, but I don't want to cover them in such a short mail.
>
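The subtyping claim can be checked on a small scale with ordinary polymorphic variants. This is a sketch under assumptions: expr_na and expr_n are miniature stand-ins for AstNA and AstN, and the `Ant payload is simplified to a string.

```ocaml
(* expr_na: no antiquotations.  expr_n: the same tags plus an `Ant branch. *)
type expr_na =
  [ `Lid of string
  | `App of expr_na * expr_na ]

type expr_n =
  [ `Lid of string
  | `App of expr_n * expr_n
  | `Ant of string ]              (* antiquotation placeholder *)

(* expr_na has a subset of the tags, so the upcast is a coercion checked
   by the type checker: no traversal and no copy. *)
let embed (e : expr_na) : expr_n = (e : expr_na :> expr_n)

(* A function written once for the larger type ... *)
let rec size : expr_n -> int = function
  | `Lid _ | `Ant _ -> 1
  | `App (f, x) -> size f + size x

(* ... also serves the smaller one. *)
let size_na (e : expr_na) : int = size (e : expr_na :> expr_n)

(* The other direction can fail, so it has to be a real function. *)
let rec no_ant : expr_n -> expr_na option = function
  | `Lid s -> Some (`Lid s)
  | `Ant _ -> None
  | `App (f, x) ->
      (match no_ant f, no_ant x with
       | Some f', Some x' -> Some (`App (f', x'))
       | _ -> None)
```

With ordinary (nominal) variants the two types would need distinct constructors and an explicit conversion in both directions.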
>    So, *my proposal is that the community design an Intermediate Ast
> together, and write a built-in parser to that Intermediate Ast, which is
> then dumped to Parsetree. I also think Parsetree still needs to be cleaned
> up a bit, but without too much change.* I would appreciate it if you could
> take something away from Fan. I think Parsetree is *not the ideal place*
> to do SMP. HTH
>
> --
> -- Regards, Hongbo
>



-- 
-- Regards, Hongbo