[wg-camlp4] Structured comments, shallow embeddings and deep quasiquotations
Gabriel Scherer
gabriel.scherer at gmail.com
Thu Jan 31 21:13:52 GMT 2013
I like Leo's idea of distinguishing three different forms of syntax
extensions (I'm not discussing concrete syntax):
1. structured comments, which as the name indicates do not directly
influence the semantics of the OCaml source file
(users that don't have the corresponding extension can still use the
source file)
example in Use Cases
(https://github.com/gasche/ocaml-syntax-extension-discussion/wiki/Use-Cases):
Bisect comments
  match List.map foo [x; a x; b x] with
  | [y1; y2; y3] -> tata
  | _ -> (*BISECT-VISIT*) assert false
(Of course the concrete syntax does not need to be OCaml comments,
and in fact it would be nice to parse them as OCaml code rather than
raw strings; but this does not cover all "annotations", only those
that can effectively be turned into comments and leave
a semantically correct file.)
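A minimal sketch of how an external tool could consume such a
structured comment while the compiler ignores it. The helper names
(contains, marked_lines) are hypothetical and this is not how Bisect
itself works; it only assumes the marker appears verbatim in the
source text.

```ocaml
(* Hypothetical helper: does [s] contain [sub] as a substring? *)
let contains s sub =
  let n = String.length s and m = String.length sub in
  let rec go i = i + m <= n && (String.sub s i m = sub || go (i + 1)) in
  go 0

(* Return the 1-based numbers of the lines carrying [marker]. *)
let marked_lines marker source =
  String.split_on_char '\n' source
  |> List.mapi (fun i line -> (i + 1, line))
  |> List.filter (fun (_, line) -> contains line marker)
  |> List.map fst

(* The Bisect snippet, kept as raw text: the tool never parses it. *)
let source =
  "match List.map foo [x; a x; b x] with\n\
   | [y1; y2; y3] -> tata\n\
   | _ -> (*BISECT-VISIT*) assert false"

let visits = marked_lines "(*BISECT-VISIT*)" source
```

A user without the tool compiles the same file unchanged, which is
the defining property of this category.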
2. shallow embeddings, which is the style of lightweight marking
(in mostly valid OCaml syntax) of -ppx rewrites that Alain promotes
(the writer of said extension expects an OCaml syntax tree as input)
example in Use Cases : "modified sedlex" (modified by me following
the spirit of the compromises suggested by Alain in the "not valid
OCaml syntax" discussion)
  (:sedlex
    let:regexp letter = ('a'..'z'|'A'..'Z') in
    match:lexer buf with
    | number ->
      Printf.printf "Number %s\n" (Sedlexing.Latin1.lexeme buf); token buf
    | letter, Star ('A'..'Z' | 'a'..'z' | digit) ->
      Printf.printf "Ident %s\n" (Sedlexing.Latin1.lexeme buf); token buf
  )
(or possibly {{$letter([A-Z]|[a-z]|$digit)*}} -> ...)
3. deep embeddings, which use a camlp4-like mechanism of
quasiquotations to inject arbitrary syntax, and antiquotations to
locally return to standard OCaml code with standard semantics
(sorry Jeremy, I'm using the "quotation" name for this potentially
improper usage, but I can't find a good matching pair for
"antiquotations" otherwise)
(the writer of said extensions expects a string to parse, with
library support to get OCaml syntax trees out of the antiquotations)
example in Use Cases: Cass
  let button = <:css<
    .button {
      $Css.gradient ~low:color2 ~high:color1$;
      color: white;
      $Css.top_rounded$;
    }
  >>
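A minimal sketch of the first step such an extension writer faces,
assuming antiquotations are delimited by a pair of $ characters as in
the Cass example. The chunk type and split_quotation are hypothetical
names, not camlp4's API, and no escaping of $ is handled.

```ocaml
type chunk =
  | Text of string   (* foreign syntax, left to the extension parser *)
  | Anti of string   (* OCaml source found between two dollar signs *)

(* Split a quotation body on the dollar sign. Pieces at even positions
   are foreign text, pieces at odd positions are antiquotations.
   Empty text pieces are dropped. Delimiters are assumed balanced. *)
let split_quotation body =
  String.split_on_char '$' body
  |> List.mapi (fun i s -> if i mod 2 = 0 then Text s else Anti s)
  |> List.filter (fun c -> c <> Text "")

let chunks =
  split_quotation ".button { $Css.top_rounded$; color: white; }"
```

Each Anti payload would then be handed to an OCaml parser supplied by
the framework, while the Text pieces go to the CSS parser.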
(It's not an exhaustive categorization: I'm not discussing arbitrary
extensions of the OCaml grammar such as "let open" here. Nobody on
wg-camlp4 seems to be discussing those anyway.)
I would like to follow up with some questions:
- What is the breadth of structured comments?
Which use cases can this form cover? It is clearly well suited to
compiler pragmas (say local warning/error selection, which we
have wanted for a long time in OCaml) and more generally to local
configuration of analysis tools (mascot, find-bisect, whatever).
Anil's remark on "tools that perform external code generations" can
also be understood in this form:
https://github.com/avsm/ocaml-github/blob/master/lib/github.atd
could fairly easily be translated in this style.
- What can users assume of the semantics of shallow embeddings?
I think having a global marker (here a "prefix annotation" in
Leo's taxonomy) to denote a piece of OCaml syntax with non-standard
semantics is important. But that does not resolve all
questions. Will only the topmost "match" construction be
interpreted as a lexer, or does this reinterpretation apply
recursively in depth? (In my snippet I marked the concerned matches
with a :lexer annotation.)
I have the impression that some choices may lead to extensions that
compose as well as the deep quasiquotations, but that some others may
result in extensions as fragile as unrestricted grammar
extensions.
Besides, the concrete syntax is important here: a (match:lexer
... with ...) annotation is clearly local to this one match
construct, while I would tend to understand (:sedlex match .. with
...) as being a modality over the expression as a whole.
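A minimal sketch of the two readings, on a toy syntax tree rather
than the compiler's real one. The expr type, the Lexer constructor
and both rewrite functions are hypothetical, introduced only to make
the difference concrete.

```ocaml
(* Toy syntax tree, not the compiler Parsetree. *)
type expr =
  | Var of string
  | Match of string option * expr * expr list
    (* optional annotation, scrutinee, branch bodies *)
  | Lexer of expr * expr list
    (* result of the hypothetical rewrite *)

(* Local reading: only a match carrying the lexer annotation is
   rewritten. Subterms are still visited to find other annotated
   matches, but unannotated ones keep their standard meaning. *)
let rec rewrite_local = function
  | Var x -> Var x
  | Match (Some "lexer", e, bs) ->
    Lexer (rewrite_local e, List.map rewrite_local bs)
  | Match (ann, e, bs) ->
    Match (ann, rewrite_local e, List.map rewrite_local bs)
  | Lexer (e, bs) ->
    Lexer (rewrite_local e, List.map rewrite_local bs)

(* Modal reading: inside the marked block every match is
   reinterpreted, annotated or not. *)
let rec rewrite_modal = function
  | Var x -> Var x
  | Match (_, e, bs) | Lexer (e, bs) ->
    Lexer (rewrite_modal e, List.map rewrite_modal bs)

(* An annotated match whose branch contains a plain match. *)
let inner = Match (None, Var "tok", [ Var "a" ])
let prog = Match (Some "lexer", Var "buf", [ inner ])

let local = rewrite_local prog
let modal = rewrite_modal prog
```

Under the local reading the inner match keeps its ordinary
semantics; under the modal reading it silently becomes a lexer too,
which is the kind of surprise the question above is about.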
- What are good concrete syntaxes for these forms?
(So far: Leo's suggestions, Alain's snippets, and existing choices in
existing syntax extension frameworks)
It's not clear that shallow embeddings and foreign quasiquotations
should have radically distinct syntaxes. It's important for the
extension developer to specify the minimum amount of syntactic
freedom needed, but does the user care which is used?
- Which forms could/should be *eventually* proposed for adoption by
the compiler's OCaml definition, and which are best relegated to a
more sophisticated external tool like Fan?