[wg-camlp4] Structured comments, shallow embeddings and deep quasiquotations
Alain Frisch
alain at frisch.fr
Tue Feb 5 06:53:03 GMT 2013
On 2/1/2013 4:38 PM, Gabriel Scherer wrote:
> Regarding concrete syntax, I would be tempted to suggest that (prefix
> or postfix) annotations be attached to non-terminals (including
> parentheses). I believe there is an important difference between
> "modulating this AST node" and "modulating the whole enclosed
> expression" that would be relevant for reasoning about the locality and
> composability of syntactic extensions. I'm not sure yet what the
> best way to go is (I'll try to see what Leo and you suggest).
I don't see any benefit in allowing annotations on parentheses (and why
not on comments?) or on individual tokens (the "for", "=", "to" or "do" in
"for i = 1 to 10 do"); I only see extra complexity in the definition of
the Parsetree.
Can you give an example where putting an annotation on, e.g., a pair of
parentheses would be the right thing to do?
> (For example by only passing to the
> extension writer the part(s) of the AST that have been annotated).
> Unfortunately, I don't see how Bisect would fit any such restriction.
> Maybe that's a problem best solved by socialization (writing
> documentation on good practices, and yelling at people).
I'm strongly in favor of solving that by socialization, if any problem
actually occurs, rather than technically restricting what an AST
rewriter can do. My arguments:
1. I don't think there will be any problem in practice. At the very
least, this discussion should be motivated by concrete problems to be
expected in the interaction of common extensions. For instance, the bad
interaction between type-conv and deriving as of today would disappear,
and I don't see how they could be written to introduce other problems
(and if they did, and some people wanted to use both, this should be
easy to fix).
2. Jeremy's proposal more or less requires all rewriters to run in the
same process. This requires either Dynlink (which is not available on
all platforms in native code; in the past there has been strong
resistance to making the compiler and basic tools depend on it) or
custom static linking (which is not very "build system" friendly).
3. Some very legitimate uses of ppx don't work nicely with the
constraint that they can only operate on a marked fragment. You cite
Bisect, but this is already the case for Sedlex, which injects some
shared declarations at the beginning of the unit. Yes, if two
"extensions" do that, the order in which they are applied will produce
different results, but I cannot imagine a case where this would be
problematic. I can see many more situations where injecting code
non-locally makes sense. Just as an example, consider an extension to
mark some methods as "memoized":
  object
    ...
    (@memoized) method foo = ...
    ...
  end

  ===>

  object
    ...
    val mutable foo_memo = None
    method foo =
      match foo_memo with
      | Some x -> x
      | None -> let r = ... in foo_memo <- Some r; r
    ...
  end
Here we need to inject code "just above". In other cases, one might
want to inject code "at the beginning" or "at the end" (of the whole
unit, of the current structure, etc.), or at a completely different
place (example: macro expansion).
4. Having "registered" expanders called when they are identified by a
specific marker during a top-down traversal hard-codes a top-down
rewriting strategy. There are cases where this is not optimal.
Consider for instance the interaction between a macro system (or
conditional compilation) and other extensions. I can easily imagine
cases where one really wants macros to be expanded before other
extensions operate on the AST (e.g. because the result of macro
expansion generates extension markers / attributes). Sedlex could, for
instance, benefit from a macro system on patterns (in order to replace
its hard-coded notion of declared regexps). One should at least allow
each expander to explicitly apply the "complete rewriting" to a
sub-tree and inspect the result (allowing bottom-up rewriting on
demand). But then we lose all the nice reasoning properties.
5. It is always possible to add a marker at the top of the compilation
unit and do all the rewriting there, so the restriction would not
actually prevent bad things.
6. The order in which ppx rewriters are applied matters, but I don't
believe it is difficult to reason about the resulting composition, or to
devise a "good" order for most cases. At least, the behavior of the
composition only depends on the behavior of each rewriter (which can be
observed with the -dsource compiler option). Users don't have to
understand how an extension is implemented, but only what it does (in
terms of AST rewriting), in order to understand how it will compose with
other extensions. I'm not sure that reasoning about nested rewriting,
especially if expanders are allowed to call other expanders on sub-terms
and post-process the result, will be any simpler. In practice, I suspect
that simple priority rules between ppx rewriters would be enough to
eliminate most problems (e.g. run all "macro expanders / conditional
compilation" first; most of the other extensions would interact nicely
in whichever order they are applied).
7. A nice property of "ppx" is that it is not a "system", which reduces
the risks of over-engineering and design mistakes, and eases the
learning curve. I suggest leaving to extra layers / libraries the role
of providing higher-level APIs with more restricted semantics but
stronger invariants. Nothing prevents people from proposing such
systems on top of the current -ppx flag and the syntactic extensions
being discussed, but those extensions should not be tightly coupled
with these systems.
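For concreteness, here is a hand-written OCaml sketch of what the hypothetical (@memoized) rewriter from point 3 might emit for a single method. The class name counter_demo, the calls counter and the stand-in body 6 * 7 are illustrative assumptions, not the output of any existing ppx; the point is only that the cache field has to be injected next to the method, outside the annotated node itself.

```ocaml
(* Hand-written approximation of the output of a HYPOTHETICAL
   (@memoized) rewriter; no real ppx is being reproduced here. *)
class counter_demo = object
  (* Illustrative counter: how many times the body really ran. *)
  val mutable calls = 0

  (* Field the rewriter would inject "just above" the method. *)
  val mutable foo_memo = None

  method calls = calls

  (* Original method body wrapped in a cache lookup. *)
  method foo =
    match foo_memo with
    | Some x -> x
    | None ->
      (* Stand-in for the original, possibly expensive, body. *)
      calls <- calls + 1;
      let r = 6 * 7 in
      foo_memo <- Some r;
      r
end

let () =
  let o = new counter_demo in
  let a = o#foo in
  let b = o#foo in
  Printf.printf "foo = %d, %d; body ran %d time(s)\n" a b o#calls
```

Calling foo twice runs the body only once; the type of foo_memo is inferred from its use inside foo, so the rewriter would not need to know the method's result type.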
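The ordering argument in points 4 and 6 can be illustrated with a toy model in which each rewriter is a function over the whole AST, as when several ppx rewriters are chained. Everything here (the expr type, the macro TWO, the double attribute and the priority field) is hypothetical and far simpler than the real Parsetree; it only shows that a macro expander producing attributes must run before the rewriter consuming them, and that a simple priority rule is enough to get that order.

```ocaml
(* Toy AST, far simpler than the real Parsetree (hypothetical). *)
type expr =
  | Int of int
  | Add of expr * expr
  | Macro of string         (* use of a named macro *)
  | Attr of string * expr   (* attribute attached to a node *)

(* Rebuild the tree bottom-up, applying [f] to each node after its children. *)
let rec map_bottom_up f e =
  let e =
    match e with
    | Int _ | Macro _ -> e
    | Add (a, b) -> Add (map_bottom_up f a, map_bottom_up f b)
    | Attr (s, a) -> Attr (s, map_bottom_up f a)
  in
  f e

(* A rewriter sees the whole tree, like a ppx sees the whole unit. *)
type rewriter = { priority : int; run : expr -> expr }

(* Hypothetical macro expander: TWO expands to a node carrying a
   [double] attribute that is meant for another rewriter. *)
let macros =
  { priority = 0;
    run = map_bottom_up (function
      | Macro "TWO" -> Attr ("double", Int 1)
      | e -> e) }

(* Hypothetical extension that consumes [double] attributes. *)
let double =
  { priority = 10;
    run = map_bottom_up (function
      | Attr ("double", e) -> Add (e, e)
      | e -> e) }

(* Apply rewriters exactly in the order given. *)
let apply_in_order rewriters e =
  List.fold_left (fun acc r -> r.run acc) e rewriters

(* Priority rule: lower numbers run first, so macro expansion comes first. *)
let apply_by_priority rewriters e =
  let sorted =
    List.stable_sort (fun a b -> compare a.priority b.priority) rewriters
  in
  apply_in_order sorted e

let () =
  let input = Add (Macro "TWO", Int 3) in
  (* Wrong order: the attribute produced by the macro is left unprocessed. *)
  assert (apply_in_order [double; macros] input
          = Add (Attr ("double", Int 1), Int 3));
  (* Same rewriters sorted by priority: the attribute is consumed. *)
  assert (apply_by_priority [double; macros] input
          = Add (Add (Int 1, Int 1), Int 3))
```

Each rewriter's behavior is observable on its own, and the composition depends only on those behaviors plus the chosen order.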
Alain