[ocaml-platform] An alternative proposal for namespaces

Thu Mar 21 06:42:11 GMT 2013

On 3/20/2013 10:07 PM, Leo White wrote:
> This is the strategy I have referred to as "regular ocamldep with
> generated search path files", it works just as well with simple
> namespaces. The only difference is that the build system generates the
> search path file to give to ocamldep, rather than making the user write
> it by hand.

So the "good" mode for using ocamldep would be to have the build system 
generate a big search path file for each call to ocamldep?

   - How does the build system generate this search path file?  I guess
it has to know about the "simple namespaces" convention.  Does it?  And
does it know about the "-name" arguments scattered around in many
subdirectories?  Concretely, I don't see how this would work under
omake, for instance.

   - How would this work for non-namespaced modules?  Can you represent
them with search path files as in your proposal?  I thought that search
path files only defined namespaced names.

   - If you assume that the build system can generate a search path file 
to avoid calling ocamldep (and thus the compiler as well) with any -I 
directory, what's the point of supporting -I directories any more in the 
compiler and tools?

> As a side note, I think that "ocamldep -modules" should continue to be a
> purely syntactic version that ignores the search path. It is regular
> ocamldep that should be used for this purpose.

I propose that "ocamldep -modules" ignores the search path directories, 
but knows about the definition of namespaces.  Otherwise, you need to 
invent a new convention to report possible namespaces together with each 
module dependency.

>>> I'm not particularly worried about hypothetical build systems. If you
>>> want to implement such a build system then you should really add hooks
>>> into the OCaml compiler. This argument also assumes that catching
>>> "Sys.file_exists" is fine but catching "Sys.readdir" is impossible.
>>
>> No, this argument does not assume that.  But catching Sys.readdir is useless, since you don't know which files the
>> compiler is interested in. The tool would have to assume that the dependency is on the entire directory, which is of
>> course way too weak.
>
> The tool would only have to know what files it could produce, but it
> should already know that to answer Sys.file_exists queries.

That's not the way it works: Sys.file_exists returns true not only for
files that the build system can produce but also for files which are
already there.

The build system I was referring to worked like this:

  - The project is specified by a list of build commands, each of which
is annotated with a list of target files (assumed to be created by the
command).

  - The build is triggered by asking to build one target.

  - To build one target X, the system picks a command which lists X as a 
target.

  - If the exact same command has already been run previously (this 
information is kept in a persistent cache), the system checks that pre- 
and post-conditions attached to that run are still valid in the current 
state of the file system.  If yes, the command does not need to be run 
again.

  - The recorded conditions are:  the content (before execution) of any 
file opened for reading and the content (after execution) of any file 
opened for writing; the presence or absence (before execution) of any 
file checked for existence (stat) during the command.

  - When running a command, the tool records those conditions, and if
the command checks a file for existence or opens a file for reading, the
tool tries to build this file as a target, recursively.

The results of system calls are not modified; they are just intercepted
to allow recording and intermediate compilation of other files.  The
tool makes the assumption that the behavior of the build commands
depends only on file existence/absence and content, not on extra
meta-data (such as mtimes, environment variables, or the system date).
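To make the cache check concrete, here is a minimal sketch of the
recorded conditions and of the test that decides whether a command must
be run again.  The type and function names are made up for this
illustration, the persistent cache is modeled as an in-memory table, and
the interception of system calls is left out entirely.

```ocaml
(* Conditions recorded during one run of a command (hypothetical types). *)
type condition =
  | Read of string * Digest.t     (* opened for reading: content before the run *)
  | Written of string * Digest.t  (* opened for writing: content after the run *)
  | Existence of string * bool    (* checked with stat: present or absent before the run *)

(* A condition still holds if the current file system agrees with the record.
   Only regular files are considered in this sketch. *)
let holds = function
  | Read (path, digest) | Written (path, digest) ->
      Sys.file_exists path && Digest.equal (Digest.file path) digest
  | Existence (path, present) -> Sys.file_exists path = present

(* Stand-in for the persistent cache, keyed by the exact command line. *)
let cache : (string, condition list) Hashtbl.t = Hashtbl.create 16

(* A command is skipped only if it was run before and every recorded
   condition is still valid. *)
let needs_rerun command =
  match Hashtbl.find_opt cache command with
  | None -> true
  | Some conditions -> not (List.for_all holds conditions)
```

Note that only existence and content are compared, never mtimes, in
line with the assumption stated above.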

A simpler variant of the system was specified with an ordered set of 
commands to be executed in sequence, with the same cache behavior.  The 
benefit is that you don't need to tell the system about which files can 
be generated by each command.

And as said, even if you don't believe that this is a viable approach
for a robust build system, the same approach can be used to add extra
checks to an existing build system, verifying that it does not miss
dependencies.

> This assumes that a C compiler won't read a directory (say to cache its
> contents) in order to check for the existence of a file. It is not
> exactly the most robust basis for a build system, which is probably why
> it is only a hypothetical build system.

Well, it worked very well for ocaml + gcc.  I could build non-trivial
code bases (CDuce + all its library dependencies), with extremely
precise dynamic dependency analysis and without having to use ocamldep.

> This kind of behaviour already exists in OCaml. Consider this piece of code:
>
>      type t = Bar.t (* Bar only contains type definitions *)
>
> If you rename bar.mli to baz.mli but don't remove bar.cmi then it will
> continue to compile until you run "make clean".

Yes, and I see it as a problem.  I would actually prefer a system where
one must explicitly pass to the compiler the list of files it can use,
but this is not possible because of backward compatibility.  Since
namespaces change the way OCaml interacts with the file system anyway
and we have this nice notion of explicitly listing available units in
well-defined files (which can be used by other tools), I think it's a
good opportunity to fix the existing problems (partially).
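A minimal sketch of the difference, assuming a simplified model of
both lookup strategies (the function names are hypothetical and this is
not how the compiler is implemented):

```ocaml
(* Hypothetical model of the two lookup strategies; not the compiler's code. *)

(* Strategy 1: probe each include directory for <unit>.cmi, as -I does.
   [file_exists] is a parameter so the sketch can run without real files. *)
let resolve_by_search ~file_exists dirs unit_name =
  let base = String.uncapitalize_ascii unit_name ^ ".cmi" in
  dirs
  |> List.map (fun dir -> Filename.concat dir base)
  |> List.find_opt file_exists

(* Strategy 2: consult only an explicit table of available units. *)
let resolve_by_listing listing unit_name = List.assoc_opt unit_name listing
```

With bar.mli renamed to baz.mli and a stale bar.cmi left behind, the
first function still resolves "Bar" to the stale file, while the second
returns None as long as the listing only mentions Baz.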

Moreover, since the names of compiled units and of source files will be
less tightly coupled for namespaced files (because of "-name" or
"-namespace"), the chances of facing hard-to-track error messages will
become higher.

> I really don't think that preventing a very unlikely scenario, which can
> already happen anyway, is a good reason to make namespaces significantly
> less convenient for the average user.

I don't think it will.  The average user who creates a library to be 
used by others will need to pick a good namespace name and list which 
files constitute the library.  At this point, writing an explicit 
.mlpath file does not add any burden (the same source of information can 
be used to define the content of the library).

-- Alain