[wg-camlp4] Reconciling Merlin and ppx

Mon Nov 4 14:11:17 GMT 2013

On 04/11/2013 15:01, Yaron Minsky wrote:
> On Mon, Nov 4, 2013 at 8:02 AM, Frédéric Bour <defree at gmail.com> wrote:
>> On 02/11/2013 18:21, Yaron Minsky wrote:
>>> I've been using Merlin more and more for personal projects,
>>> and have started to wonder about how we're going to make Merlin work
>>> effectively in a ppx world.  In some ways, ppx simplifies Merlin's
>>> job, since there's now just one concrete syntax to parse.
>>>
>>> But there are still problems to solve.  In particular, you will still
>>> need to be able to apply AST-transformers to the partial ASTs that
>>> Merlin generates in order to run it through the type-inference stage,
>>> or to determine the environment.
>> Those partial ASTs are still represented with the same type as OCaml
>> "normal" ASTs.
>> There should be no difference for the transformers at this stage.
>>
>>
>>> Obviously, Merlin could just reimplement a bunch of common AST
>>> transformers, as it does now for some common camlp4 extensions.  But
>>> this seems like a situation that is too painful to maintain in the
>>> long term.
>> Indeed, ppx seems to be the opportunity to stop hardcoding extensions in
>> merlin.
>>
>>
>>> A better idea, perhaps, would be to have some libraries or standards
>>> about how to present an AST transformer so that it's suitable for
>>> running against a partial AST.  I don't really know the details of how
>>> Merlin works, but I understand it does a first pass to parse
>>> expressions into top-level units, the parsing of which is done
>>> independently.  AST transformers that operate on those units would
>>> naturally gain some level of incrementality.  Another, perhaps more
>>> intrusive approach would be to write PPX transformers against
>>> Merlin's version of the AST.
>> The work done by the transformer when invoked by merlin or by the toplevel
>> should be the same: the AST is given chunk-by-chunk, each chunk being passed
>> for transformation.
> Does this work for sub-modules as well?  It's not rare in my
> experience to do quite a bit of programming within a sub-module.  In a
> top-level, you'd have to do that as one top-level chunk, which seems
> like the wrong solution for merlin.

In merlin, yes.
This works for modules of the form "module A (P0 : S0) … (Pn : Sn) : S =
struct … end".
In this case, merlin adds (P0 : S0), …, (Pn : Sn) to the environment,
then works on the definitions inside "struct … end" as it would in the
top-level context.
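
To make that concrete, here is a toy model of the idea. It is only a
sketch: the types "item", "modul" and "env" and the functions "step" and
"enter" are made up for illustration and are not merlin's actual data
structures. The functor parameters are bound first, then each definition
of the body is handled the way a top-level phrase would be.

```ocaml
(* Toy model only: these types are hypothetical, not merlin's. *)
type item = Def of string            (* a definition binding one name *)

type modul = {
  params : (string * string) list;   (* (P0, S0) ... (Pn, Sn) *)
  body : item list;                  (* contents of struct ... end *)
}

type env = string list               (* names in scope, most recent first *)

(* Handle one chunk exactly as a top-level phrase would be handled. *)
let step (env : env) (Def name) : env = name :: env

(* Enter a functor: bind its parameters, then walk the body chunk by
   chunk. Returns the environment visible after each chunk, in source
   order, so any chunk can be revisited on its own. *)
let enter (env : env) (m : modul) : env list =
  let env = List.fold_left (fun e (p, _sig) -> p :: e) env m.params in
  let _, snapshots =
    List.fold_left
      (fun (e, acc) it ->
        let e' = step e it in
        (e', e' :: acc))
      (env, []) m.body
  in
  List.rev snapshots
```

Keeping one environment snapshot per chunk is what would let an editor
resume from the middle of the "struct … end" body.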

Not all cases are handled (for example, completing inside the argument of a
functor application, or constraining the signature after the structure
rather than during the binding).

>> Under the assumption that, for a given transformer t and without imposing an
>> order of transformation:
>> t (def1;; def2) ~= t(def1) ;; t(def2)
>> using this transformer in merlin should not require specific work.
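
The property above can be illustrated on a toy AST. This is a sketch under
assumptions: "phrase", "expand" and "t" are hypothetical names, and the
real transformers would work on OCaml's Parsetree rather than on this
simplified type. A transformer that only looks at the phrase it is given
commutes with splitting the input into chunks.

```ocaml
(* Toy AST: a phrase is either a plain definition or an extension node. *)
type phrase =
  | Def of string
  | Ext of string * string   (* extension name, payload *)

(* Hypothetical stateless expansion: rewrites each [Ext] node into a
   definition, looking only at the phrase it is given. *)
let expand = function
  | Def _ as d -> [d]
  | Ext (name, payload) -> [Def (payload ^ "_" ^ name)]

(* The transformer applied to a whole chunk. *)
let t (chunk : phrase list) : phrase list = List.concat_map expand chunk

(* t (def1;; def2) ~= t(def1) ;; t(def2), with list append playing ";;". *)
let commutes c1 c2 = t (c1 @ c2) = t c1 @ t c2
```
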
>>
>> That being said, I don't know much about ppx. Here are the cases I see that
>> could be problematic in general:
>> - the transformer relies on some global state that has to be threaded
>> across different invocations.
>> - the toplevel works only in a sequential way, and backtracks only if the
>> current definition fails to type. merlin can roll back to an arbitrary point
>> in the file, which could break assumptions in the transformer if it keeps state.
>> - merlin also splits definitions inside modules, which means that pieces of
>> the AST could be extracted and transformed either separately or not in the
>> expected order (e.g. the signature constraining a structure can be extracted
>> by merlin and transformed as if it were an independent declaration, then
>> reinjected into the AST).
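
The global-state case from the list above can be sketched as follows. The
names "counter", "fresh" and "t_stateful" are hypothetical, and definitions
are reduced to plain strings to keep the example short. The transformer
numbers what it generates with a global counter, so re-running a chunk
after a rollback gives a different result than a single sequential pass.

```ocaml
(* Hypothetical transformer state: a global counter for fresh names. *)
let counter = ref 0

let fresh prefix =
  incr counter;
  Printf.sprintf "%s_%d" prefix !counter

(* Stateful transformer: every definition gets a numbered name. *)
let t_stateful (defs : string list) : string list = List.map fresh defs

(* Toplevel-style run: one sequential pass over the chunks. *)
let sequential () =
  counter := 0;
  let a = t_stateful ["a"] in
  let b = t_stateful ["b"] in
  a @ b

(* Editor-style run: chunk "b" is transformed, the buffer is rolled back
   to just after "a", and "b" is transformed again. The counter is not
   rolled back with the buffer. *)
let with_rollback () =
  counter := 0;
  let a = t_stateful ["a"] in
  let _discarded = t_stateful ["b"] in
  let b = t_stateful ["b"] in
  a @ b
```

Here the sequential run names the second definition "b_2" while the run
with a rollback names it "b_3", so the output depends on the editing
history unless the state is saved and restored along with the buffer.
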
>>
>>
>>> I don't quite know what the right solution is, but when 4.02 comes out
>>> we're all going to be writing a bunch of new AST transformers, and if
>>> we really want Merlin to be a success, we should have a plan for how
>>> those transformers will integrate with it.
>>>
>>> y
>>