[Merlin-discuss] further integration of merlin at Jane Street

Sat Aug 24 00:48:23 BST 2013

Here are some thoughts on further integrating merlin into the Jane
Street environment.  I preface this by saying I don't understand the
current merlin/emacs integration, so feel free to enlighten me.

To state what I think is a goal:

  A programmer should be able to go to any position in any OCaml
  source file on their machine, press a button, and autocomplete at
  that point (or be told that autocompletion isn't possible because
  the necessary libraries haven't been compiled).

The first thing I think this implies is that we need to keep a merlin
installed for each ocaml that we install (i.e. /janelibs/ocaml-*/).
This is needed so that the merlin is compatible with the source and
object files for the project.  The merlin editor integration can
determine the correct merlin to run by looking at the .omake-ocaml-bin
at the root of the repo as generated by the build system.  Presumably
our deployment of merlin would put "ocamlmerlin" and related
executables in the same bin as ocamlc, ocamlopt, etc., or in some
fixed directory relative to that bin.
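
As a rough sketch of that lookup, here is how editor glue might find
the right merlin for a file.  This assumes, without my having checked,
that .omake-ocaml-bin holds a single line naming the compiler's bin
directory, and that ocamlmerlin sits directly in that directory.  The
function names find_marker and merlin_for are made up for
illustration.

```python
import os

MARKER = ".omake-ocaml-bin"  # file named in the text; its format is assumed below


def find_marker(start_dir, marker=MARKER):
    """Walk up from start_dir; return the nearest marker file path, or None."""
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, marker)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def merlin_for(source_file):
    """Return the assumed ocamlmerlin path for source_file, or None."""
    marker = find_marker(os.path.dirname(os.path.abspath(source_file)))
    if marker is None:
        return None
    with open(marker) as f:
        # Assumption: one line, the bin directory of the project's compiler.
        bin_dir = f.read().strip()
    return os.path.join(bin_dir, "ocamlmerlin")
```

If merlin ends up in a fixed directory relative to that bin instead,
only the final os.path.join would change.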

Next, we need merlin to understand the libraries that are in scope in
a given directory.  I think this is a fairly straightforward matter of
having jenga output a .merlin for each directory that contains OCaml
files.  That .merlin would be complete, i.e. not require any recursion
up the directory tree.  The contents of the .merlin would be the
OCAML_LIBRARIES plus the libraries as recursively required by
OCAML_INTERFACES (i.e. whatever the build system is already making
available when compiling source files in that directory).
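
To make the "complete .merlin" idea concrete, here is a minimal
sketch of what the build system would compute.  It assumes the build
system can supply a map from each library to the libraries it
requires, and that each library's sources and compiled interfaces
live in one directory, so the same path is written as both an S
(source) and a B (build) line.  The function names and the shape of
the dependency map are hypothetical.

```python
def transitive_libraries(direct, deps):
    """Return the direct libraries plus everything they require, first-seen order."""
    seen = []
    stack = list(reversed(direct))
    while stack:
        lib = stack.pop()
        if lib in seen:
            continue
        seen.append(lib)
        # Push dependencies so they are visited in their listed order.
        stack.extend(reversed(deps.get(lib, [])))
    return seen


def render_dot_merlin(library_dirs):
    """Return .merlin text listing each directory as a source and a build path."""
    lines = []
    for d in library_dirs:
        lines.append("S " + d)  # where merlin looks for source files
        lines.append("B " + d)  # where merlin looks for compiled interfaces
    return "\n".join(lines) + "\n"
```

Because the closure is computed up front, the resulting file needs no
lookup in parent directories.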

Next, I speculate that for performance reasons, we need a merlin
server to cache information so that it can quickly reconstruct the
environment at a given program point.  It might cache:

  * the environment for each library
  * the environment that is the union of libraries imported for a
    directory
  * even more, perhaps the imported libraries plus all of the files in
    a directory

A natural architecture for this is to have a merlin server process for
each project (i.e. per jengaroot).  When one first visits a project,
the merlin process would be created.  As one queries for
autocompletion in files in the project, that merlin process answers
the queries, using and updating its cache.
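
A minimal sketch of that lifecycle, under some assumptions: the
project root is found by walking up to a marker file whose name is
passed in (I am not asserting what the jengaroot file is actually
called), and spawn is a hypothetical callback that starts a merlin
process for a root and returns a handle to it.  ServerRegistry is an
illustrative name, not an existing component.

```python
import os


class ServerRegistry:
    """Keep at most one server handle per project root, created on first use."""

    def __init__(self, spawn, root_marker):
        self._spawn = spawn              # hypothetical: project root -> handle
        self._root_marker = root_marker  # file name that marks a project root
        self._servers = {}

    def project_root(self, path):
        """Return the nearest ancestor directory containing the marker, or None."""
        current = os.path.dirname(os.path.abspath(path))
        while True:
            if os.path.exists(os.path.join(current, self._root_marker)):
                return current
            parent = os.path.dirname(current)
            if parent == current:  # reached the filesystem root
                return None
            current = parent

    def server_for(self, path):
        """Return the server for the project containing path, starting it if needed."""
        root = self.project_root(path)
        if root is None:
            return None
        if root not in self._servers:
            self._servers[root] = self._spawn(root)
        return self._servers[root]
```

The same table could live either inside each editor or inside a
separate manager process; the sketch does not depend on which.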

To support multiple simultaneous projects, one needs to manage a set
of merlin processes (with possibly different merlin versions).  It is
unclear whether the process management should be written separately
for each editor, or whether we should write a "meta-merlin" in OCaml
that presents a unified command-line interface that could then be used
by any editor.  At Jane Street, we need to support emacs and vim, so
there is at least some advantage to a meta-merlin written in OCaml.
But if the process management is simple enough in each editor, then
perhaps an OCaml meta-merlin isn't worth it.

We already have an analog of meta-merlin at Jane Street -- it's called
omake-server, which manages a collection of per-project build
processes.  I chatted with Pete about the possibility of implementing
meta-merlin.  Roughly, we think that having omake-server has worked
out well, although there are a number of ways in which the
implementation is too complicated.  So, if we write meta-merlin, we'd
like to start with a new codebase, with heavy involvement from Pete to
avoid some of the design mistakes of omake-server.

That's probably enough for now.  Hopefully people who understand the
merlin architecture can speak up so we can all understand it clearly,
and we can then decide how to proceed to get a merlin that works
smoothly for all Jane Street devs.