[wg-camlp4] Against the use of syntactically-valid OCaml code for syntax extension purposes

Gabriel Scherer gabriel.scherer at gmail.com
Tue Jan 29 10:52:56 GMT 2013


Dear wg-camlp4 list,

I have been reading the list discussion and find it very interesting
so far. One point that I am worried about is the seemingly consensual
idea that the way to integrate syntax extensions in a post-camlp4
world is to massage them into syntactically correct OCaml code, to be
parsed by the existing parser and later processed by a -ppx filter.

I think this is a terrible idea. Code with a special semantics should
have a special syntax. Otherwise, how is the user supposed to guess
which syntactically correct code follows the expected OCaml semantics,
and which is in fact to be understood in the context of a specific
syntax extension?
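
To make the worry concrete, here is a minimal sketch of the "fake function
call" form discussed in Dario's message quoted below. The handle type and
this pgsql function are invented for illustration and are not PG'OCaml's
API. The point is only that the expression is ordinary OCaml with an
ordinary meaning, so nothing in the source tells the reader whether a -ppx
filter will give it a different one.

```ocaml
(* Hypothetical stand-in for a database handle; not PG'OCaml's type. *)
type handle = { name : string }

(* An ordinary function that merely happens to be called pgsql
   (hypothetical, for illustration only). *)
let pgsql (dbh : handle) (query : string) : string =
  Printf.sprintf "[%s] %s" dbh.name query

(* Plain OCaml semantics: pgsql applied to two arguments. A -ppx filter
   could rewrite this very same expression into something else entirely,
   and the source text would look exactly the same. *)
let fetch_users dbh =
  pgsql(dbh) "select id, name from users"

let () = print_endline (fetch_users { name = "db" })
```

Read without the build flags at hand, fetch_users could be either the plain
function call above or an embedded SQL statement checked at compile time.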

Camlp4 quotations are a generic way to embed foreign pieces of syntax
into an OCaml parsetree in a safe, explicit, modular and composable
way. I understand that it is annoying to be forced to "quote" code
that is "mostly valid OCaml code" (e.g. encoding type-conv with
quotations would give something like <:type-conv< type foo = ... with
... >>, while we would like to preserve the type declaration and
localize the extension to the "with" part). Sometimes, a full
quotation syntax is also too heavy to make sense in the precise
context (e.g. regexp patterns in ulex/xstr/whatever). I think we should
cluster the existing perversions into a small number of cases that we
add to the existing OCaml syntax (e.g. overloadable string literals for
unicode support, xstr and PG'OCaml), as a *different* syntax from what
is currently accepted and well-defined.

(When I say "syntactically valid" this is to be taken relative to
the future OCaml language: I would be ok with forbidding some
currently accepted corner cases to reuse them for extension purposes.)

This coincides with Fabrice's suggestion that we agree on a common syntax
for quotations/extensions across extensions. Discipline and robustness at
the cost of generality: that is (imho) the meaningful direction an
out-of-camlp4 effort should take. I understand the temptation to express
arbitrary things in the limited (but already useful) -ppx framework as
it stands now, but I think we should resist it.

So that would be my answer: *reject* reuse of syntactically valid code
for extension purposes, and isolate a *few* common cases of extensions
that deserve new extension points to be added to the standard parser
(arbitrary quotations being the most expressive, but also the
heaviest, tool in this framework, on which all extension needs not
otherwise handled would fall back). The question of tool support (Tuareg
indentation, etc.) is solved by updating tools to support those few
extension points, not by masquerading new semantics under old syntax.

What do you think of embedding syntax extensions into syntactically
valid code?

On Mon, Jan 28, 2013 at 4:10 PM, Dario Teixeira <darioteixeira at yahoo.com> wrote:
> Hi,
>
>
> Another syntax extension to consider is PG'OCaml's.  Though it seems like
> it can easily be adapted to the ppx system, it does raise an issue which I
> think ought to be discussed.
>
> Here's what code making use of embedded SQL in PG'OCaml currently looks like:
> (the dbh parameter is the database handle)
>
>   let fetch_users dbh =
>         PGSQL(dbh) "select id, name from users"
>
> This is syntactically incorrect, of course.  Therefore, any adaptation to the
> ppx system will necessarily be backwards-incompatible with existing code
> (not a big problem for me personally, but others may disagree).  As for
> the new syntax, several possibilities exist.  A straightforward one would
> be to lowercase the "PGSQL" token; embedded SQL statements would thus take
> the form of fake function calls:
>
>   let fetch_users dbh =
>         pgsql(dbh) "select id, name from users"
>
> Another possibility would be for the SQL statements to be arguments to a fake
> variant type constructor:
>
>   let fetch_users dbh =
>         PGSQL (dbh, "select id, name from users")
>
> I could go on.  Anyway, regardless of choice, one issue comes to mind.
> Presently, code using the PG'OCaml syntax extension is conspicuous by its
> syntactic incorrectness.  Therefore, someone unfamiliar with PG'OCaml who
> happened to be looking at code using the syntax extension would immediately
> suspect something camlp4ish was going on.  With ppx, this signaling would
> go away.  This could make code harder to read if several syntax extensions
> are simultaneously used, particularly if syntax extensions become more
> popular because ppx makes it so much easier to write them.
>
> One solution is for the community to adopt conventions that make syntax
> extensions stand out to the naked eye (syntax highlighters in Vim or Emacs
> could also take advantage of these conventions).  For example: the fake variant
> constructor could be instead a fake PV constructor ending in "'" (prime).
> It should not be hard for editors to highlight this code in a special way:
>
>   let fetch_users dbh =
>         `PGSQL' (dbh, "select id, name from users")
>
> What do you think?
>
> Best regards,
> Dario Teixeira
>
> _______________________________________________
> wg-camlp4 mailing list
> wg-camlp4 at lists.ocaml.org
> http://lists.ocaml.org/listinfo/wg-camlp4

