[wg-camlp4] Time for a summary?

Thu Feb 7 10:09:29 GMT 2013

On 02/06/2013 07:48 PM, Hongbo Zhang wrote:
>     Some arguments why Fan is a better candidate than ppx
>     1. Fan is much much faster than P4 (10~20 times)

This is great, in particular because it illustrates a vastly simplified 
bootstrapping procedure of Fan compared to Camlp4.  But this does not 
bring much to the comparison with ppx.

"-ppx" on its own is even faster to build, since there is nothing to 
build (it is just a command-line flag, in the same way as "-pp" on which 
Fan relies, I guess).

>     2. Fan does not require any compiler change, easy to distribute, on
> the contrary the pervasive change to compiler is close to kill P4, Fan
> or any other advanced external tools

ppx is already part of the development version and required minimal 
changes.  We are also discussing the addition of a few syntactic 
constructs which will only impact the definition of Parsetree and the 
official parser.  This cannot really be considered as a "pervasive 
change" to the compiler.

Can you elaborate on why you think this would kill Camlp4 or Fan?  I 
know from experience that Camlp4 is quite tedious to update when the 
concrete syntax of OCaml changes, but I'm sure someone will manage to 
update the Camlp4/Fan definition of OCaml's AST and the associated parsers.

-ppx is compatible with pre-processors implemented with -pp 
(Camlp4/Fan), as long as those pre-processors can understand the new 
syntactic constructs and pass them back to OCaml.

 > unlike P4, it will
 > not inhibit OCaml's compiler's progress.

Your point above illustrates that it is not that simple: a simple 
addition to the OCaml syntax, which would normally require adapting only
parser.mly, parsetree.mli, ast_mapper.ml/mli and a few other modules in 
the compiler, also requires porting the same changes to Fan's 
definition of the OCaml AST and to its parsers.

>     3. It's easy to build in Fan

Again, this seems more like a comparison between Fan and Camlp4 than 
between Fan and the ppx approach.

 > Fan can recognize its syntax on its
 > own, without external tool's help

How does this argument apply to the comparison with the ppx approach?

>     4. It's easy to port P4's code base to Fan, it only takes me 2 hours
> to port Alain's ulex to Fan

It did not take that much longer to port ulex from Camlp4 to -ppx, and 
as an added bonus, it really gave me the feeling that I could 
finally *breathe* and understand exactly what I'm writing. Frankly, I'm 
more comfortable writing:

   E.let_ Recursive states
     (E.sequence
        (appfun "Sedlexing.start" [eid lexbuf])
        (E.match_ (appfun (state_fun 0) [eid lexbuf])
           (cases @ [P.any (), error])
        )
     )

than:

   <:expr< fun lexbuf ->
     let rec $list:Array.to_list states$ in
     do { Ulexing.start lexbuf;
          match __ulex_state_0 lexbuf with
          [ $list:Array.to_list cases$ | _ -> raise Ulexing.Error ] } >>

which requires me to learn two new "sub-languages" (the revised syntax 
and the notion of quotation / antiquotation).  Imagine that the "rec" 
flag above should be set only according to some condition to be checked; 
I know directly how to write that in regular OCaml but I would need to 
dig into Camlp4 documentation (or not) to see how to introduce an 
"anti-quotation on the rec flag".

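To make the "regular OCaml" point concrete, here is a minimal, 
self-contained sketch.  It does not use the real Parsetree or the 
E.let_ helper shown above: the types rec_flag and expr and the 
functions mentions and let_ are hypothetical stand-ins, invented only 
for this illustration.  The condition chosen here (a binding refers to 
one of the bound names) is also just an assumption; any ordinary 
boolean expression would do.

```ocaml
(* Toy stand-in for a fragment of an AST; these types are hypothetical
   and are NOT the compiler's Parsetree. *)
type rec_flag = Nonrecursive | Recursive

type expr =
  | Ident of string
  | Let of rec_flag * (string * expr) list * expr

(* Does the identifier [name] occur anywhere in [e]?
   Simplified on purpose: shadowing is ignored. *)
let rec mentions name = function
  | Ident x -> x = name
  | Let (_, bindings, body) ->
      List.exists (fun (_, e) -> mentions name e) bindings
      || mentions name body

(* Build a let-expression, setting the "rec" flag only when some
   right-hand side refers to one of the bound names. *)
let let_ bindings body =
  let bound = List.map fst bindings in
  let needs_rec =
    List.exists
      (fun (_, e) -> List.exists (fun n -> mentions n e) bound)
      bindings
  in
  Let ((if needs_rec then Recursive else Nonrecursive), bindings, body)
```

The flag is an ordinary value computed by an ordinary "if", so no 
dedicated anti-quotation form has to be looked up.
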
Is the code for the Fan version of ulex/sedlex available somewhere?

 > (I also make it available on the toplevel,
 > this feature is not available in sedlex or ulex)

sedlex works fine in the toplevel:

$ ocaml -ppx "ocamlfind sedlex/sedlex.exe"
         OCaml version 4.01.0+dev10-2012-10-16

# #use "topfind";;
- : unit = ()
Findlib has been successfully loaded. Additional directives:
...
# #require "sedlex";;
/home/afrisch/.opam/4.01.0dev+trunk/lib/sedlex: added to search path
/home/afrisch/.opam/4.01.0dev+trunk/lib/sedlex/sedlexing.cma: loaded
# let rec token buf =
   let ('a'..'z'|'A'..'Z') as letter = SEDLEX.regexp in
   let '0'..'9' as digit = SEDLEX.regexp in
   match SEDLEX buf with
   | Plus letter -> Printf.printf "Word %s\n" (Sedlexing.Latin1.lexeme buf); token buf
   | Plus digit -> Printf.printf "Number %s\n" (Sedlexing.Latin1.lexeme buf); token buf
   | _ -> failwith "Error"
               ;;
...
val token : Sedlexing.lexbuf -> 'a = <fun>
#

>     5. Global Ast Rewriter is available but discouraged
>     6. Local Ast Rewriter is provided(deriving and type_conv conflicts
> will never happen in Fan)
>         {:ocaml| type u = A of int |}
>         {:derive| (sexp,json) |}

Can you describe in more detail your position concerning the following 
points?

   - How do you deal with cases where the "expander" for such a local 
annotation needs to access the context of the AST fragment on which it 
applies, or where it needs to inject code non-locally?  For instance, 
several ulex/sedlex lexers specified in the same file will share the 
code of their "partition functions" (put at the beginning of the 
compilation unit in this case).

  - How would you support "ignorable" attributes, e.g. information 
targeted to specific tools like bisect or a variant ocamldoc based on 
structured comments?

  - How would you support a simple macro system?  For instance, I can 
imagine using -ppx to do something like:

    let(:macro) if_debug x =
      if !debug_mode then (print_endline "DEBUG:"; x)

    let debug_mode = ref false

    ....

    if_debug (print_endline "XXX")

    What would be the Fan equivalent of it?

  - Imagine that two type-conv-like extensions need to be applied on the 
same type declaration, and detect attributes put on type expressions in it:

     type foo =
           {
             quantity: float (@xml ~digits:2) (@bin single);
             code: string (@xml ~cdata) (@bin base64);
           }
          (@xml) (@bin)

  (here we have two annotations "xml" and "bin" on the type declaration 
"foo", understood by two independent tools which generate marshaling 
code based on the declaration, and interpret annotations on inner type 
expressions to customize the format)

  How do you write that in Fan?  My understanding is that one of the 
expanders will be applied first and it will need to parse the type 
declaration itself; but what will it do with attributes intended to be 
used by the other extension?

  - Can you confirm that the content .... of {:foo| .... |} is passed as 
an unparsed string to the expander "foo"?  If this is the case, what is 
your position concerning support from editors?  Should they treat .... 
like OCaml code (e.g. applying lexing rules for coloring and grammar 
rules for indentation)?  If so, this syntax will not work well with 
fragments of code in a foreign syntax, and could potentially lead to 
broken editor behavior.  If not, you lose all support from your editor 
as soon as you are in the scope of an expander.  Or maybe you 
envision a much more ambitious proposal where the editor would somehow 
need to run the expander "on the fly" (I don't believe this is possible 
at a reasonable cost).  Another question: how is the end marker |} 
detected?  Can you write e.g.:

    {:html|
      <p>I <em>really</em> like <b>Fan</b> syntax, and especially
      its terminator |}
    |}
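
To illustrate why the question matters, here is a minimal sketch of the 
most naive strategy: end the quotation at the first occurrence of "|}". 
This is purely an assumption for the sake of the example, not a 
description of what Fan actually does, and the functions 
first_terminator and body_of_quotation are hypothetical names.

```ocaml
(* Index of the first occurrence of "|}" in [s], if any. *)
let first_terminator s =
  let n = String.length s in
  let rec go i =
    if i + 1 >= n then None
    else if s.[i] = '|' && s.[i + 1] = '}' then Some i
    else go (i + 1)
  in
  go 0

(* Hypothetical rule: the quotation body is everything before the
   FIRST "|}".  [s] is the text following the opening "{:tag|". *)
let body_of_quotation s =
  match first_terminator s with
  | None -> None
  | Some i -> Some (String.sub s 0 i)
```

Under this rule the body of the html example above would stop right 
after "its terminator ", and the remaining "|}" would be left over as 
ordinary input.  Any other rule (escaping, nesting, custom delimiters) 
has to be known to editors as well.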

>     7. Fan's notation is uniform
>         only {:| |} is introduced, I  am a bit headache when seeing so
> many notations

I claim that this single notation corresponds to three different 
concepts which deserve to be understood separately:

   - Extension marker around a parsed fragment, to be interpreted by an 
expander (otherwise the type-checker fails).

   - Attributes attached on AST fragments, to be (optionally) 
interpreted by tools like Bisect, by expanders, or which can trigger 
themselves local rewritings (in that case, one might want a specific 
syntax to ensure they won't be silently ignored by the type-checker).

   - Quotations: a way to write fragments of code in a foreign syntax, by 
escaping from OCaml lexical conventions on string literals.  (I believe 
this is rather orthogonal to -ppx and might be useful for regular 
libraries, but I accept the idea of a combined notation: extension 
marker + quotation.)

>     8. Fan provides first class parser lexer(or in-line parser/lexer) as
> a library

I claim that most of the interesting uses of Camlp4 extensions can be 
implemented in a way which does not require any extra lexing/parsing 
technology in addition to what's included in the core compiler today 
(based on ocamllex/ocamlyacc).  (The only exceptions seem to be these 
extensions which need to parse foreign syntax, and I find it good that 
they can rely on an arbitrary parsing technology.)

Alain