[ocaml-platform] Unique file names

Gabriel Scherer gabriel.scherer at gmail.com
Tue Mar 5 07:16:35 GMT 2013


On Sun, Mar 3, 2013 at 4:59 PM, Gabriel Scherer
<gabriel.scherer at gmail.com> wrote:
> Regarding aspect (1), we discussed enriching the compilation unit with
> information not coming from the filename itself (eg. the "provenance
> field" in spec.pdf). The most general situation is to consider that an
> internal prefix (or suffix) can be passed to the compiler at
> invocation time, as is currently done with -for-pack (you're then free to
> express directory-local policies, random choices, interface hashes or
> whatever on top of this very primitive idea).
>
> This however opens the door for more things to work out that Alain's
> minimalistic proposal avoided. [...]

On Mon, Mar 4, 2013 at 4:18 PM, Romain Bardou <romain.bardou at inria.fr> wrote:
> I'm sure the possibility of a "-unit-name-prefix video3d"
> compilation flag has been discussed already. I'll go read and see
> what the counter-arguments are...

On Tue, Mar 5, 2013 at 3:48 AM, Yaron Minsky <yminsky at janestreet.com> wrote:
> Could all of these benefits be obtained by having longer names for
> files generated as part of the compilation (.cmx, .cmo, .cmi, etc.),
> but keep the source file names short?  i.e., one could imagine that if
> you put "-put-in-namespace core" on the command-line to ocamlc, it
> would generate "core_list.ml" when given the file "list.ml".  The onus
> would be on the build system to provide said flags, but it would seem
> to simplify things thereupon.
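
To make the idea concrete, here is a minimal sketch. It assumes that the prefix simply becomes part of the internal module name and of the compiled artifact's file name, while the source file keeps its short name. The helpers `prefixed_unit_name` and `prefixed_artifact` are hypothetical. Neither `-put-in-namespace` nor `-unit-name-prefix` is an existing ocamlc flag, and under this scheme the build system would be the one supplying the prefix.

```ocaml
(* Hypothetical helpers modelling a "-put-in-namespace core" (or
   "-unit-name-prefix core") flag; no such flag exists in ocamlc. *)

(* "list.ml" with prefix "core" -> internal module name "Core_list". *)
let prefixed_unit_name ~prefix source_file =
  let base = Filename.remove_extension (Filename.basename source_file) in
  String.capitalize_ascii (String.lowercase_ascii prefix ^ "_" ^ base)

(* Same source with extension ".cmo" -> "core_list.cmo";
   the source file itself is never renamed. *)
let prefixed_artifact ~prefix ~ext source_file =
  let base = Filename.remove_extension (Filename.basename source_file) in
  String.lowercase_ascii prefix ^ "_" ^ base ^ ext
```

For example, `prefixed_unit_name ~prefix:"core" "src/list.ml"` evaluates to `"Core_list"`, and `prefixed_artifact ~prefix:"core" ~ext:".cmx" "src/list.ml"` evaluates to `"core_list.cmx"`.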

This idea of enriching internal module names with a suffix passed at
compilation time is discussed in the
http://gallium.inria.fr/~scherer/namespaces/spec.pdf document, in the
"Compilation unit information" section:

> The changes proposed so far essentially allow us to solve the problems
> evoked in the introduction, under the assumption that programmers
> were to use long, hopefully-unique module names. We are not
> satisfied with this as the only way to avoid name clashes for two
> different reasons:
>
>   - It does not coincide with the development techniques of most
>   OCaml users right now, who prefer small, readable source file
>   names. Helping users to refer to two different modules compiled
>   from the same filename doesn't actually do any good if the OCaml
>   linker rejects them.
>
>   - A naming strategy is not enough to ensure uniqueness in some
>   fairly realistic scenarios, for example if a user needs to use
>   two different versions of the same library in the same program. We
>   hinted at this use case with our Jenny#List and Jenny#ListDev
>   example: it is a very plausible need, and the "long filenames"
>   approach to avoid clashes would require renaming all the files of
>   one of the versions, which is impractical.
>
> We therefore suggest the addition of two different information
> fields to the meta-data of compiled units:
>
>   - An additional suffix field that would be part of the identity of
>   the compilation unit along with the original file name (in compiled
>   object code), and could be arbitrary data (for example a hash, or
>   randomly generated unique identifier) with strong uniqueness
>   guarantees.
>
>   - An additional provenance field that would provide provenance
>   information to help humans distinguish two compilation units
>   (coming from sources with the same filename).
>
> Our suggestion for the suffix field would be to let the user
> optionally force its value, and otherwise pick a reasonable strategy
> that tries to ensure that two independent developers choosing the
> same file name do not result in a module identity clash situation at
> link-time. For example, [..]
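
A minimal sketch of these two fields, assuming both are plain strings attached to each compiled unit. The record and function names are hypothetical and do not describe the real layout of .cmi/.cmo/.cmx files. The point is only that the suffix takes part in link-time identity, while the provenance is there for humans.

```ocaml
(* Hypothetical model of a compiled unit's identity; not the actual
   metadata stored in OCaml object files. *)
type unit_info = {
  name : string;               (* derived from the source file name *)
  suffix : string;             (* arbitrary data: hash, random id, ... *)
  provenance : string option;  (* human-readable origin, for messages only *)
}

(* Link-time identity: name and suffix only; provenance is ignored. *)
let same_identity a b = a.name = b.name && a.suffix = b.suffix

(* Provenance only helps humans tell two same-named units apart. *)
let describe u =
  match u.provenance with
  | Some p -> Printf.sprintf "%s (from %s)" u.name p
  | None -> u.name

(* Two versions of one library, as in the Jenny#List / Jenny#ListDev
   case; the suffix values are illustrative. *)
let stable = { name = "List"; suffix = "3f9a"; provenance = Some "jenny/list" }
let dev = { name = "List"; suffix = "b71c"; provenance = Some "jenny/list-dev" }
```

Here `same_identity stable dev` is `false`, so under this model the two `List` units would not clash at link time, and `describe dev` gives `"List (from jenny/list-dev)"` for error messages.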

Several ways to choose this suffix (or prefix) are discussed in the
document, notably using interface hashes and random seed generation --
Leo's original "in Foo.Bar" in-source construct would be another way
to do that. None of the proposals are convincing enough to be
specified as the "only way to do it", so manual specification of
suffixes still needs to be possible.
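
These strategies could sit behind a single choice, with manual specification always available. This is a hypothetical sketch. The constructor names are illustrative, and the interface text is hashed with the standard `Digest` module (MD5), whereas a real implementation would more likely hash the compiled interface.

```ocaml
(* Hypothetical ways of picking a compilation unit's suffix. *)
type suffix_strategy =
  | Manual of string          (* forced by the user, e.g. via the build system *)
  | Interface_hash of string  (* text of the unit's interface *)
  | Random_id                 (* fresh value at each compilation *)

(* One self-seeded generator shared by all Random_id requests. *)
let rng = lazy (Random.State.make_self_init ())

let choose_suffix = function
  | Manual s -> s
  | Interface_hash itf -> Digest.to_hex (Digest.string itf)
  | Random_id -> Printf.sprintf "%08x" (Random.State.bits (Lazy.force rng))
```

The trade-offs show up directly. An interface hash is deterministic, so rebuilding gives the same suffix, but two same-named units with identical interfaces would still collide. A random identifier avoids that but makes builds non-reproducible. That is one plausible reason why neither can be the only way to do it.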

The "provenance" field originally comes from a design of Fabrice Le
Fessant, where a java-like ownership URL, eg. janestreet.core.foo or
ocamlpro.opam.blah, would be used *both* as more-unique information in
the internal name, and as a directive of how to populate the initial
namespace (Janestreet#Core#Foo) to be respected by whatever default
namespace construction choice is made (eg. put it in the right
directory if we choose recursive directory scanning).
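
To illustrate the second use of such an ownership URL, here is a hypothetical helper that turns a provenance string into the namespace path it would populate. The dot-separated syntax and the capitalization rule are assumptions made for the sketch.

```ocaml
(* Hypothetical: "janestreet.core.foo" -> "Janestreet#Core#Foo". *)
let namespace_path_of_provenance prov =
  prov
  |> String.split_on_char '.'
  |> List.filter (fun s -> s <> "")   (* tolerate "a..b" or a trailing dot *)
  |> List.map String.capitalize_ascii
  |> String.concat "#"
```

For instance, `namespace_path_of_provenance "ocamlpro.opam.blah"` gives `"Ocamlpro#Opam#Blah"`.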

I personally suspect it's better in the long run to separate the
internal name and linking aspect on one hand, and the in-source
compilation unit name on the other, and this is reflected in the way
provenance is described in spec.pdf. But of course this has drawbacks
already mentioned here that justify Alain's insistence on
hopefully-unique filenames as the mechanism to avoid internal name
conflicts.
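
By contrast, with hopefully-unique filenames no extra metadata is needed to go from a module name to its compiled unit. The name determines the file, and only the search path decides where it is found. A rough sketch, assuming lowercase `.cmi` file names and a list of `-I` directories. `find_unit` is hypothetical and only approximates the compiler's actual load-path lookup.

```ocaml
(* Hypothetical filename-based lookup: the module name fixes the file
   name, and the -I directories are searched in order. [exists] is a
   parameter so the lookup can be exercised without real files. *)
let find_unit ?(exists = Sys.file_exists) search_path module_name =
  let file = String.uncapitalize_ascii module_name ^ ".cmi" in
  List.find_map
    (fun dir ->
      let candidate = Filename.concat dir file in
      if exists candidate then Some candidate else None)
    search_path
```

With unique names such as `Apropos_pricing_result`, the answer depends only on the search path, and it never requires opening a compiled file to read an internal name.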

On Tue, Mar 5, 2013 at 3:50 AM, Yaron Minsky <yminsky at janestreet.com> wrote:
> On Mon, Mar 4, 2013 at 9:48 PM, Yaron Minsky <yminsky at janestreet.com> wrote:
>> On Mon, Mar 4, 2013 at 7:19 AM, Alain Frisch <alain.frisch at lexifi.com> wrote:
>>> On 03/03/2013 04:59 PM, Gabriel Scherer wrote:
>>>>
>>>> Regarding aspect (2), having filename conflicts is not an issue if
>>>> your namespace language is rich enough to let you map two different
>>>> identifiers to two compunits of the same filename.
>>>
>>>
>>> I did not really specify this part.  As you mention, one can decide that .ns
>>> files map names to filenames (probably using relative path to the .ns
>>> file).
>>>
>>> That said, I see it as an advantage to have globally unique filenames, for
>>> various reasons.  I admit it is an annoyance for library writers, but I
>>> believe it is a tiny one compared to the benefits of this solution.  (As a
>>> minor point, my emacs is perfectly fine with autocompleting filename based
>>> on a suffix following an underscore character; e.g. if I type Ctrl-X-F,
>>> then "_pricing", then TAB, it autocompletes it to apropos_pricing_result.ml.
>>> I don't know if this is the default behavior or requires custom
>>> configuration, though.)
>>>
>>> Some benefits of globally unique filenames:
>>>
>>>  - Provide an unambiguous way to refer to a specific compilation unit, in a
>>> way which only depends on the search path.  While I agree that having
>>> shorter names "by default" in the documentation or error messages is nice,
>>> it is also good to support fully explicit outputs, and it is better if it
>>> does not depend too much on the environment.
>>>
>>>  - A corollary to the previous point is to simplify the interface between
>>> "ocamldep -modules" and the build system.
>>>
>>>  - Avoid the need to open compiled files to see "in which namespace" they
>>> are -- or what their internal name is -- and related problems, which we have
>>> discussed on this list.
>>>
>>>  - Support a deployment scenario where files from many libraries are
>>> installed in a single directory.
>>>
>>>  - Never force library users to use namespaces if they are fine with using
>>> long names (maybe with the help of a local manual renaming feature).
>>
>> Could all of these benefits be obtained by having longer names for
>> files generated as part of the compilation (.cmx, .cmo, .cmi, etc.),
>> but keep the source file names short?  i.e., one could imagine that if
>> you put "-put-in-namespace core" on the command-line to ocamlc, it
>> would generate "core_list.ml" when given the file "list.ml".  The onus
>> would be on the build system to provide said flags, but it would seem
>> to simplify things thereupon.
>
> Apologies: Catching up on the thread, this seems essentially the same
> as the -unit-name-prefix proposal that Romain mentions.
>
>>>> Note that Alain's proposal also maps the namespace names to .ns
>>>> filenames. My guess would be that if we today recognize that binding
>>>> compunit source-names to their filenames is a mistake, we may regret
>>>> making the exact same choice for the upper-level construct in the
>>>> future.
>>>
>>>
>>> An alternative could be to specify on the command-line a set of .ns files,
>>> each of them containing a definition of the "namespace" it defines (the name
>>> used in the source code).  This would also support defining several
>>> "namespaces" in a single .ns file.  There was a strong resistance to
>>> specifying .ns files to be "opened" on the command-line, but here we're not
>>> talking about opening namespaces, just specifying them to the compiler.
>>
>> As one of the people who objected to "opening" from the command line,
>> I have no objection to this.
>>
>>> In this variant of the proposal, we could almost get rid of -I flags for
>>> deployed libraries.
>>>
>>>
>>>
>>> Alain
>>>
>>> _______________________________________________
>>> Platform mailing list
>>> Platform at lists.ocaml.org
>>> http://lists.ocaml.org/listinfo/platform
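
Alain's alternative, where the command line lists a set of .ns files that each declare the namespace they define, can be sketched as follows. The record is a hypothetical in-memory form of an .ns file (its concrete syntax was never specified). Entries are resolved relative to the .ns file's own directory, as suggested earlier in the thread. Because two .ns files can map the same module name to two different `list.cmi` files, this also covers the case of two compilation units with the same filename.

```ocaml
(* Hypothetical in-memory form of an .ns file. *)
type ns_file = {
  path : string;                     (* location of the .ns file itself *)
  namespace : string;                (* name used in source code *)
  entries : (string * string) list;  (* module name -> path relative to [path] *)
}

(* Resolve Namespace#Module against one .ns file. *)
let resolve ns ~namespace ~module_name =
  if ns.namespace <> namespace then None
  else
    match List.assoc_opt module_name ns.entries with
    | Some rel -> Some (Filename.concat (Filename.dirname ns.path) rel)
    | None -> None

(* The .ns files given on the command line are tried in order;
   no -I flag is involved. *)
let resolve_all ns_files ~namespace ~module_name =
  List.find_map (fun ns -> resolve ns ~namespace ~module_name) ns_files
```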

