[opam-devel] [ANN] opam-fmt 1.0
David Allsopp
dra-news at metastack.com
Mon Nov 23 13:15:29 GMT 2015
Tempting as it would be to spend the next couple of hours hacking it, my 2 euro-cents… why not just make the lexer in opam itself lossless, and expose that in the API? So every single token emitted by the lexer becomes a pair including the “whitespace” (= all whitespace, newlines, comments, etc.) which follow that token and any tokens which may be ambiguously lexed (e.g. integers where arbitrary zero prefixes or where bases are permitted) have to be deferred. All that’s needed then is a function which transforms these lossless tokens to the original lexer’s tokens (so dropping the whitespace/comment parts, possibly performing a few int_of_string and other related functions – but all aided by the type system) for plugging into the parsing stage. It’s even relatively transparent to have the “debugging” version of opam verify that the stream is actually reversible to the original file.
A task like this then uses the full-parsed power of opam-lib to identify opam files which need updating (as you are) and then a fairly simple state machine over the lossless lexer stream to update it.
I think that having a rigid format used for opam-repository is a good idea, but given that presumably that wouldn’t want to become mandatory for all repositories, being able to do formatting (and even version) preserving updates seems useful. Having effectively a richer lexer also means no fork required, so better maintainability, and also puts the onus of supporting the updating of an older version opam file on the script author rather than the opam dev team (as presumably OPAM will always remain capable of reading older format files??). Making the update be an FSM over raw lexer tokens is also slightly nicer and less error-prone than a regex approach.
David
From: opam-devel [mailto:opam-devel-bounces at lists.ocaml.org] On Behalf Of Gabriel Scherer
Sent: 23 November 2015 12:19
To: Thomas Gazagnaire
Cc: opam-devel
Subject: Re: [opam-devel] [ANN] opam-fmt 1.0
One idea I just had when re-reading the scavenged comments is to just consider comment lines as independent items. If you add ordering information to the internal opam representation to keep the items in the order, then I suspect that all the examples above would be reformatted in a way that is acceptable to the original author. One final bit of sophistication would be to store (in the comment items) a boolean for whether they started on their own line, or are to be inserted at the end of the previous line.
Of course this fake structure is less true to the real document structure than properly placed attribute nodes would be. It seems easier to implement (and thus a decent compromise to make), but then maybe not: keeping an ordering may be non-trivial for data whose current internal representation is not a set of things (for example dependency formulas are actually full logic formulas; but then if we keep to Conjunctive Normal Forms they can have a list-like representation).
On Mon, Nov 23, 2015 at 1:11 PM, Thomas Gazagnaire <thomas at gazagnaire.org<mailto:thomas at gazagnaire.org>> wrote:
Thanks Gabriel, that's very useful!
For the various parsers, it might make sense to simply fork (or copy/paste) bits of opam parsers from various versions of opam and try to fix these parsers to:
- keep comments
- support some kind of alignments?
Thomas
> On 22 Nov 2015, at 20:55, Gabriel Scherer <gabriel.scherer at gmail.com<mailto:gabriel.scherer at gmail.com>> wrote:
>
>> - Any chance you could use cmdliner instead of Arg? Fairly minor, but
>> all the other plugins use it and it's nice to have the same behaviours
>> for OPAM plugins where possible.
>
> I'm fond of Daniel's design work, so I would gladly move to
> Cmdliner -- Arg was just what I could easily use "in anger" for
> a first try.
>
>> - How does this behave on pre-1.2 files? I think it's about time that
>> we deprecate pre-1.2 so that we can have clean metadata standards
>> about the new fields such as dev-repo.
>
> opam-fmt updates older opam files to its own support version -- and
> refuses to work on newer files. Two things:
>
> I think it would be a better design to have a family of scripts
> opam-fmt-1.0, opam-fmt-1.1, opam-fmt-1.2 etc. and a "mother script"
> opam-fmt that calls the right reformatter according to the file
> version¹. However, that requires changes in the packaging of opam-lib,
> so that the package for distinct versions can be installed
> simultaneously (they would be separate packages
> opam-lib-VER, and ocamlfind packages as well).
>
> ¹: another option would be for opam-lib to also support pairs of parsing
> and printing functions for older format version specifically, but that
> is an invasive choice to make in a codebase. Right now there is a tiny
> bit of logic to know which fields are 1.0 or 1.1-specific, but this
> would be much more ambitious.
>
> There are various warnings implemented in opam-lib that could be emitted
> during the processing of files by opam-fmt -- they may be already
> available depending on the OPAMDEBUG variable or something, but an
> explicit support in the interface could be nice. When I tested
> reformatting opam-repository, I observed that a large part of its opam
> files raise such warnings (so indeed there seems to be a metadata
> problem in the repository today).
>
>> - Having a bot regularly go through and reformat the entire repository
>> also seems quite legit. The alternative would be to reformat at the
>> merge point, but this would require a staging branch.
>
> I'm not sure what you call "merge point"; my idea was to put the burden
> of reformatting onto users submitting PRs against the
> repository. (Regular reformatting are a sensible idea, but they run in
> the problem of loss of information, whether distributed manual
> reformatting keeps humans closer in the loop)
>
>> - I would really like opam-fmt to be lossless, so preserving comments
>> seems like an extension that we should implement. Ideally everyone
>> can just call it on their packages without thinking about it.
>
> I have mixed feelings about trying to be lossless. At the very least,
> one should recognize that setting this as a design goal would impose
> a significant burden on the developers of the parsing/printing functions
> in opam-lib.
>
> Some human choices (alignment of string fields for example) are rather
> difficult and fragile to recognize -- and they could complexify the
> codebase. Even for comments, right now you cannot tell to which
> configuration item an element is attached. There are several ways around
> this, which are interesting to consider but also involve a fair amount
> of work:
>
> - You could use ocamldoc-like placement rules: "always after the
> relevant field" (a first comment would be a file-global comment), or
> "either before or after, but an empty line between the comment and
> a non-relevant field"; this seems painful and not-that-easy to
> implement.
>
> - You could move to a docstring-like (or attribute-like) syntax where
> comments are explicitly attached to an AST node; from a language
> design point of view this would be my preference, but it may require
> a change in concrete syntax.
>
> - Finally, the choices you can make in this design space depend a lot on
> whether reformatting will be performed by humans or by bots. If your
> comments-attachment rules are obscure, humans have the opportunity to
> reformat, see that they got them wrong, and reiterate. Bots will just
> put stuff at the wrong place.
>
> I think that the people that maintain this corner of opam today are
> those that will pay the greater cost if "lossless" becomes a design
> goal, so it should be their choice to make.
>
>
> In the meantime, it would be interesting to have a look at how opam
> files in the repository actually use comments. With
>
> find packages -name 'opam' | xargs grep --color=always " #"
>
> I see 75 occurences of comments, 38 of which are just "TODO fixme". The
> 37 others seem rather interesting, below are a few representative examples:
>
> packages/arakoon/arakoon.1.8.6/opam:
>> "lwt" { = "2.4.8" } # 2.4.9 had an incompatible API change
>
> Having textual exaplanations for choice of bound is a reasonable
> use-case for attributes.
>
> packages/camlp4/camlp4.4.01/opam:
>> build: [] # dummy package
>
> This could be replaced by a dedicated note/comment field.
>
> packages/frama-c/frama-c.20150201/opam:
>> "lablgtk" { >= "2.18.2" } #for ocaml >= 4.02.1
>
> I don't understand the semantics of this one.
>
> packages/git/git.1.6.0/opam:
>> depopts: [
>> # --enable-mirage
>> "mirage-types-lwt"
>> "mirage-http"
>> "mirage-flow"
>> "channel"
>> # --enable-unix
>> "cohttp"
>> "conduit"
>> "base-unix"
>> ]
>
> This usage is very interesting, it seems to call for a hierarchy inside
> the "depopts" list (and "dependencies" as well, I suppose), with
> annotations on sub-groups of dependencies.
>
> packages/gsl/gsl.1.18.2/opam:
>> depends: [
>> "base-bigarray"
>> "camlp4"
>> "ocamlfind" {>= "1.3.1"}
>> # Included from _opam file
>> "conf-gsl"
>> ]
>
> I don't know what this comment means.
>
> packages/lz4/lz4.1.0.0/opam:
>> depexts: [
>> [["debian"] ["liblz4-dev"]]
>> # [["ubuntu"] ["liblz4-dev"]] reenable when CI updates its Ubuntu
>> [["source"] ["https://.../install.sh"]]
>> ]
>
> Again, this would require annotations.
>
> packages/ppx_deriving/ppx_deriving.0.3/opam:
>> build: [
>> # If there is no native dynlink, we can't use native builds
>> "ocaml" "pkg/build.ml<http://build.ml>" "native=true"
>> "native-dynlink=true"
>> ]
>
> packages/frama-c-e-acsl/frama-c-e-acsl.0.5/opam:
>> build: [
>> ["ocaml" "run_autoconf_if_needed.ml<http://run_autoconf_if_needed.ml>"] #when used in pinned mode the configure *can* not yet be generated
>> ["./configure" "--prefix" prefix]
>> [make]
>> ]
>
> packages/clangml/clangml.0.5.2/opam:
>> depexts: [
>> [["debian"] ["libboost-dev" "llvm-3.4-dev" "clang-3.4" "libclang-3.4-dev" "binutils-dev"]]
>> [["ubuntu"] ["libboost-dev" "llvm-3.4-dev" "clang-3.4" "libclang-3.4-dev" "binutils-dev"]]
>> [["gentoo"] ["dev-libs/boost" "sys-devel/llvm-3.4.1-r2" "sys-devel/clang-3.4.0-r100" "sys-devel/binutils"]]
>> # archlinux has no package providing llvm and clang 3.4.1
>> [["archlinux"] ["boost" "binutils"]]
>> ]
>
> packages/mtime/mtime.0.8.1/opam:
>> depends: [ "ocamlfind"
>> "js_of_ocaml" # FIXME should become a deptopt
>> ]
>
> On Sun, Nov 22, 2015 at 8:09 PM, Anil Madhavapeddy <anil at recoil.org<mailto:anil at recoil.org>> wrote:
>> Thanks for this Gabriel! Quick notes:
>>
>> - I would really like opam-fmt to be lossless, so preserving comments
>> seems like an extension that we should implement. Ideally everyone
>> can just call it on their packages without thinking about it.
>>
>> - Having a bot regularly go through and reformat the entire repository
>> also seems quite legit. The alternative would be to reformat at the
>> merge point, but this would require a staging branch.
>>
>> - Any chance you could use cmdliner instead of Arg? Fairly minor, but
>> all the other plugins use it and it's nice to have the same behaviours
>> for OPAM plugins where possible.
>>
>> - How does this behave on pre-1.2 files? I think it's about time that
>> we deprecate pre-1.2 so that we can have clean metadata standards
>> about the new fields such as dev-repo.
>>
>> regards,
>> Anil
>>
>>> On 21 Nov 2015, at 21:53, Gabriel Scherer <gabriel.scherer at gmail.com<mailto:gabriel.scherer at gmail.com>> wrote:
>>>
>>> Hi opam-devel,
>>>
>>> As part of the discussion in
>>>
>>> bulk addition of 'ocamlbuild {build}' dependencies
>>> https://github.com/ocaml/opam-repository/pull/5140
>>>
>>> it became apparent that performing bulk updates on opam-repository is
>>> made harder by the fact that the parse-print roundtrip does not
>>> preserve human-formatted opam files. For my proposed patch I tried to
>>> separate the reformatting of opam file (to follow the opam-lib printer
>>> convention) from the actual changes in two separate commits, but that
>>> means more work for script authors, and also creates patches that are
>>> harder to review. (If (re)formatting was controlled by the maintainer
>>> of the OPAM packages instead of authors of bulk updates, we would be
>>> more confident in its correctness.)
>>>
>>> In order to move that discussion forward (how to maintain opam
>>> metadata in a way that is easy for both human and scripts to work
>>> with?), I propose the opam-fmt script that can be found at
>>> https://github.com/gasche/opam-fmt/
>>>
>>> I wrote it in the last few days and there are probably some rough
>>> edges. Once I feel that it should work, I may try to package it on the
>>> opam-repo (in the meantime, clone then pin).
>>>
>>> This suggests one possible way forward: publicize opam-fmt, encourage
>>> users to reformat their opam files using it, adapt the opam-repository
>>> QA to call `opam fmt --check` on opam files proposed in PR to enforce
>>> it, and after some time (once we trust it works as expected thanks to
>>> the human guinea pigs) reformat all older opam files to get a perfect
>>> round-trip on all files of the repository.
>>> It is not clear to me that this is the best way forward. (For example
>>> it means that, in the current state of the opam file parsing/printing
>>> code, comments in opam files would always be erased by reformatting.
>>> Should we remove comments from the valid syntax of opam files, or
>>> attach them to configuration lines to re-print them correctly later,
>>> or maybe refuse to work on files with comments?) Opam developers and
>>> repository maintainers may decide that the cost of caring about
>>> reformatting outweigh the benefits in terms of scriptability.
>>>
>>> What do you think?
>>> _______________________________________________
>>> opam-devel mailing list
>>> opam-devel at lists.ocaml.org<mailto:opam-devel at lists.ocaml.org>
>>> http://lists.ocaml.org/listinfo/opam-devel
>>>
>>
> _______________________________________________
> opam-devel mailing list
> opam-devel at lists.ocaml.org<mailto:opam-devel at lists.ocaml.org>
> http://lists.ocaml.org/listinfo/opam-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ocaml.org/pipermail/opam-devel/attachments/20151123/f18dd6af/attachment-0001.html>
More information about the opam-devel
mailing list