[wg-camlp4] Structured comments, shallow embeddings and deep quasiquotations

Gabriel Scherer gabriel.scherer at gmail.com
Thu Jan 31 21:13:52 GMT 2013


I like Leo's idea of distinguishing three different forms of syntax
extensions (I'm not discussing concrete syntax):

1. structured comments, which as the name indicates do not directly
  influence the semantics of the OCaml source file

  (users that don't have the corresponding extension can still use the
  source file)

  example in Use Cases
  (https://github.com/gasche/ocaml-syntax-extension-discussion/wiki/Use-Cases),
  Bisect comments

  match List.map foo [x; a x; b x] with
  | [y1; y2; y3] -> tata
  | _ -> (*BISECT-VISIT*) assert false

  (Of course the concrete syntax does not need to be OCaml comments,
  and in fact it would be nice to parse them as OCaml code rather than
  raw strings; but this does not cover all "annotations", only those
  that can effectively be turned into comments and leave
  a semantically correct file.)
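
  To make the "users without the extension can still use the file"
  point concrete, here is a minimal self-contained sketch of the
  snippet above. The definitions of foo, a and b and the name
  sum_of_three are hypothetical stand-ins, not taken from Bisect or
  from the Use Cases page; the only point is that a plain compiler
  ignores the annotation.

```ocaml
(* Hypothetical stand-ins for the names used in the Bisect snippet. *)
let foo n = n + 1
let a n = n * 2
let b n = n * 3

(* The annotation is an ordinary comment: a compiler that knows nothing
   about the coverage tool accepts this file and gives it the usual
   semantics. *)
let sum_of_three x =
  match List.map foo [x; a x; b x] with
  | [y1; y2; y3] -> y1 + y2 + y3
  | _ -> (*BISECT-VISIT*) assert false
```

  Deleting the comment changes nothing for the compiler, which is
  what separates this form from the two below.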


2. shallow embeddings, which is the style of lightweight marking
  (in mostly valid OCaml syntax) of -ppx rewrites that Alain promotes

  (the writer of said extension expects an OCaml syntax tree as input)

  example in Use Cases : "modified sedlex" (modified by me following
  the spirit of the compromises suggested by Alain in the "not valid
  OCaml syntax" discussion)

   (:sedlex
     let:regexp letter = ('a'..'z'|'A'..'Z') in
     match:lexer buf with
     | number ->
       Printf.printf "Number %s\n" (Sedlexing.Latin1.lexeme buf); token buf
     | letter, Star ('A'..'Z' | 'a'..'z' | digit) ->
       Printf.printf "Ident %s\n" (Sedlexing.Latin1.lexeme buf); token buf
   )

  (or possibly {{$letter([A-Z]|[a-z]|$digit)*}} -> ...)

3. deep embeddings, which use a camlp4-like mechanism of
   quasiquotations to inject arbitrary syntax, and antiquotations to
   locally return to standard OCaml code with standard semantics
   (sorry Jeremy, I'm using the "quotation" name for this potentially
   improper usage, but I can't find a good matching pair for
   "antiquotations" otherwise)

  (the writer of said extension expects a string to parse, with
  library support to get OCaml syntax trees out of the antiquotations)

  example in Use Cases: Cass

  let button = <:css<
     .button {
       $Css.gradient ~low:color2 ~high:color1$;
       color: white;
       $Css.top_rounded$;
     }
   >>
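
  As a rough illustration of what "a string to parse" means for the
  extension writer, here is a minimal sketch that splits a quotation
  body into foreign text and antiquotations. The fragment type and
  the split_quotation function are hypothetical names of mine, not
  camlp4's or Cass's API, and the sketch assumes '$' is never
  escaped. A real implementation would hand each Anti payload to
  the OCaml parser instead of keeping it as a string.

```ocaml
type fragment =
  | Text of string  (* foreign syntax, left to the extension's own parser *)
  | Anti of string  (* OCaml source found between two '$' delimiters *)

(* Split a quotation body on '$'. Pieces at even positions are foreign
   text, pieces at odd positions are antiquotations. An even number of
   pieces means an odd number of '$', i.e. an unclosed antiquotation.
   Empty text fragments are dropped. *)
let split_quotation (s : string) : fragment list =
  let pieces = String.split_on_char '$' s in
  if List.length pieces mod 2 = 0 then
    invalid_arg "split_quotation: unbalanced '$'";
  List.mapi (fun i p -> if i mod 2 = 0 then Text p else Anti p) pieces
  |> List.filter (fun f -> f <> Text "")
```

  For instance, split_quotation "color: $c$;" yields
  [Text "color: "; Anti "c"; Text ";"].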

(It's not an exhaustive categorization: I'm not discussing arbitrary
extensions of the OCaml grammar such as "let open" here. Nobody on
wg-camlp4 seems to be discussing those anyway.)




I would like to follow up with some questions:


- What is the breadth of structured comments?

  Which use cases can this form cover?  It is clearly well suited to
  compiler pragmas (say local warning/error selection, which we
  have wanted for a long time in OCaml) and more generally to local
  configuration of analysis tools (mascot, find-bisect, whatever).

  Anil's remark on "tools that perform external code generations" can
  also be understood in this form:
  https://github.com/avsm/ocaml-github/blob/master/lib/github.atd
  could fairly easily be translated into this style.


- What can users assume about the semantics of shallow embeddings?

  I think having a global marker (here a "prefix annotation" in
  Leo's taxonomy) to denote a piece of OCaml syntax with non-standard
  semantics is important. But that does not resolve all
  questions. Will only the topmost "match" construction be
  interpreted as a lexer, or does this reinterpretation apply
  recursively in depth? (In my snippet I marked the concerned matches
  with a :lexer annotation.)

  I have the impression that some choices may lead to extensions that
  compose as well as the deep quasiquotations, but that some others
  may result in extensions as fragile as unrestricted grammar
  extensions.

  Besides, the concrete syntax is important here: a (match:lexer
  ... with ...) annotation is clearly local to this one match
  construct, while I would tend to understand (:sedlex match .. with
  ...) as being a modality over the expression as a whole.
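
  The two readings can be made precise on a toy syntax tree. This is
  only a sketch under my own assumptions: the expr type below is a
  made-up miniature, not the compiler's Parsetree, and
  rewrite_topmost / rewrite_deep are hypothetical names for the two
  possible behaviours of a "sedlex" rewriter.

```ocaml
(* A toy expression type, NOT the compiler's Parsetree. *)
type expr =
  | Var of string
  | Match of expr * expr list   (* scrutinee, branch bodies *)
  | Lexer of expr * expr list   (* a match reinterpreted as a lexer *)
  | Marked of string * expr     (* (:name e), a prefix annotation *)

(* Reading 1: the marker affects only the match directly under it. *)
let rewrite_topmost = function
  | Marked ("sedlex", Match (s, bs)) -> Lexer (s, bs)
  | e -> e

(* Helper for reading 2: turn every match, at any depth, into a lexer. *)
let rec lexify_all = function
  | Var _ as e -> e
  | Match (s, bs) | Lexer (s, bs) ->
    Lexer (lexify_all s, List.map lexify_all bs)
  | Marked (m, e) -> Marked (m, lexify_all e)

(* Reading 2: the marker is a modality over the whole expression. *)
let rewrite_deep = function
  | Marked ("sedlex", e) -> lexify_all e
  | e -> e

(* (:sedlex match buf with _ -> (match x with _ -> y)) *)
let nested =
  Marked ("sedlex", Match (Var "buf", [Match (Var "x", [Var "y"])]))
```

  On nested, rewrite_topmost leaves the inner match as an ordinary
  match, while rewrite_deep turns it into a lexer as well. A user
  reading the source cannot tell which one applies without knowing
  the extension, which is why per-construct markers such as
  match:lexer look attractive.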


- What are good concrete syntaxes for these forms?

  (So far: Leo's suggestions, Alain's snippets, and existing choices in
  existing syntax extension frameworks)

  It's not clear that shallow embeddings and foreign quasiquotations
  should have radically distinct syntaxes. It's important for the
  extension developer to specify the minimum amount of syntactic
  freedom needed, but does the user care which is used?


- Which forms could/should be *eventually* proposed for adoption by
  the compiler's OCaml definition, and which are best relegated to a
  more sophisticated external tool like Fan?

