[opam-devel] [ANN] opam-fmt 1.0

Tue Nov 24 10:46:08 GMT 2015

David > I'm worried that the scheme you propose could make writing the
transformations sensibly harder. Having the semantical (post-parsing)
representation of the OPAM file is necessary to decide whether/how to
transform, and this could still be done under your scheme, but then
after making a choice you would have to go back to the token
stream. Currently I use the fairly convenient
semantical-representation-update functions provided by the library:

  https://github.com/gasche/opam/blob/2a08373b2fce9de4fad4c3263dc781f6f1c46b11/admin-scripts/add_ocamlbuild_dependency.ml#L289-L291

    let opam = with_depends opam0 (add_ocamlbuild (depends opam0)) in
    if opam = opam0 then opam0
    else with_opam_version opam (OpamVersion.of_string "1.2")

and I'm not eager to move to a FSM design to express these
transforms. It could be done and I would be curious to see the result,
but myself I would rather aim for a less precise but
easier-to-describe transform, at the risk of losing some token
information.

> Another option would be to implement a format-preserving reprinter
> for bulk updates, that would keep the original fields order, and
> check every field for semantic changes before reprinting (we have
> precise file positions already, so we could extract the raw string
> to reprint when there are no changes).

With this solution, you still lose comments on the modified fields
(we've seen in the examples above that for examples there are comments
inside the depext or depopt fields). The general guarantee you provide
to users is still "best effort, comments may disappear".

> Attaching comments to the AST would be nice and allow to keep them
> even in rewritten fields, but require much more dev time.

I would suggest the following road ahead: encourage people to use
`opam-fmt` starting now, with the recognition that it will erase all
their comments. (They are erased from the opam files that are uploaded
in the opam repository, people are welcome to keep comments in the
opam files used to pin their development repository, which are the
living documents that they actually care about). Then incrementally
work towards a more information-rich parsing of files, with the goal
of eventually being to attach comments to a specific AST node. If your
suggestion above to preserve comments for unchanged toplevel fields is
the first incremental step you can afford to make, let's do it.

> Note 1: Gabriel, just making sure you are using opam trunk ? I rewrote big
> parts of OpamFormat/OpamFile recently (merged like one or two months ago).

No, I started with opam trunk but I reverted to opam-lib 1.2, because
the printer for opam trunk appears to be incompatible with the parser
for opam 1.2. See for example this printing issue spotted by Thomas:

  https://github.com/ocaml/opam-repository/pull/5140#issuecomment-156615500

It was not very hard to port and I can do it again in the other
direction (in fact I was thinking of porting the opam-fmt script to
any opam-lib version which files we need to keep supporting).

> Note 2: I think we can now go ahead and push bulk changes to the
> repo. It should only break `opam init` on 1.1, and I seriously doubt
> anyone will complain about it anymore. If the rewriting script
> breaks, we'll just have to freeze the 1.1 updates and update the
> documentation pages accordingly.

I need to re-run my script to take into account changes in the
repository since I sent it. I could port the script to 1.1 in passing,
to not force you to do a 1.1->1.2 move at the same time. (If this was
my project, I would ask a contributor to do so.) What do you think?

Another question is whether I should add some form of linting warning
for future opam-repository uploads, checking that the "ocamlbuild"
dependency is present when it should (using the same heuristic my
bulk-update script uses). Do you think that would be a good idea? One
reason why I refrained doing so is that my test needs to unpack the
archive and look at its content, so it's substantially slower than
just linting the `opam` file. But maybe you are ready to pay this
cost, or you already download the archive to check availability in any
case?

On Tue, Nov 24, 2015 at 2:57 AM, Louis Gesbert
<louis.gesbert at ocamlpro.com> wrote:
> Hopping in a bit late... :)
>
> Indeed, my first approach for bulk-updating opam files was to first normalise
> everything, so that diff could help me see what happened thereafter. This
> loses a lot of history, and is difficult to maintain afterwards -- but opam-
> fmt could help there.
>
> Another option would be to implement a format-preserving reprinter for bulk
> updates, that would keep the original fields order, and check every field for
> semantic changes before reprinting (we have precise file positions already, so
> we could extract the raw string to reprint when there are no changes). It
> shouldn't be too difficult to implement, doesn't require changes to the parser
> or format, and using this for bulk updates would already make them much more
> tolerable. Attaching comments to the AST would be nice and allow to keep them
> even in rewritten fields, but require much more dev time.
>
> Note 1: Gabriel, just making sure you are using opam trunk ? I rewrote big
> parts of OpamFormat/OpamFile recently (merged like one or two months ago).
>
> Note 2: I think we can now go ahead and push bulk changes to the repo. It
> should only break `opam init` on 1.1, and I seriously doubt anyone will
> complain about it anymore. If the rewriting script breaks, we'll just have to
> freeze the 1.1 updates and update the documentation pages accordingly.
>
> Best,
> Louis
>
> Le lundi 23 novembre 2015, 13:15:29 David Allsopp a écrit :
>> Tempting as it would be to spend the next couple of hours hacking it, my 2
>> euro-cents… why not just make the lexer in opam itself lossless, and expose
>> that in the API? So every single token emitted by the lexer becomes a pair
>> including the “whitespace” (= all whitespace, newlines, comments, etc.)
>> which follow that token and any tokens which may be ambiguously lexed (e.g.
>> integers where arbitrary zero prefixes or where bases are permitted) have
>> to be deferred. All that’s needed then is a function which transforms these
>> lossless tokens to the original lexer’s tokens (so dropping the
>> whitespace/comment parts, possibly performing a few int_of_string and other
>> related functions – but all aided by the type system) for plugging into the
>> parsing stage. It’s even relatively transparent to have the “debugging”
>> version of opam verify that the stream is actually reversible to the
>> original file.
>
>> A task like this then uses the full-parsed power of opam-lib to identify
>> opam files which need updating (as you are) and then a fairly simple state
>> machine over the lossless lexer stream to update it.
>
>> I think that having a rigid format used for opam-repository is a good idea,
>> but given that presumably that wouldn’t want to become mandatory for all
>> repositories, being able to do formatting (and even version) preserving
>> updates seems useful. Having effectively a richer lexer also means no fork
>> required, so better maintainability, and also puts the onus of supporting
>> the updating of an older version opam file on the script author rather than
>> the opam dev team (as presumably OPAM will always remain capable of reading
>> older format files??). Making the update be an FSM over raw lexer tokens is
>> also slightly nicer and less error-prone than a regex approach.
>
>>
>> David
>>
>> From: opam-devel [mailto:opam-devel-bounces at lists.ocaml.org] On Behalf Of
>> Gabriel Scherer
>  Sent: 23 November 2015 12:19
>> To: Thomas Gazagnaire
>> Cc: opam-devel
>> Subject: Re: [opam-devel] [ANN] opam-fmt 1.0
>>
>> One idea I just had when re-reading the scavenged comments is to just
>> consider comment lines as independent items. If you add ordering
>> information to the internal opam representation to keep the items in the
>> order, then I suspect that all the examples above would be reformatted in a
>> way that is acceptable to the original author. One final bit of
>> sophistication would be to store (in the comment items) a boolean for
>> whether they started on their own line, or are to be inserted at the end of
>> the previous line.
>  Of course this fake structure is less true to the real
>> document structure than properly placed attribute nodes would be. It seems
>> easier to implement (and thus a decent compromise to make), but then maybe
>> not: keeping an ordering may be non-trivial for data whose current internal
>> representation is not a set of things (for example dependency formulas are
>> actually full logic formulas; but then if we keep to Conjunctive Normal
>> Forms they can have a list-like representation).
>> On Mon, Nov 23, 2015 at 1:11 PM, Thomas Gazagnaire
>> <thomas at gazagnaire.org<mailto:thomas at gazagnaire.org>> wrote:
>  Thanks
>> Gabriel, that's very useful!
>>
>> For the various parsers, it might make sense to simply fork (or copy/paste)
>> bits of opam parsers from various versions of opam and try to fix these
>> parsers to:
>
>> - keep comments
>> - support some kind of alignments?
>>
>> Thomas
>>
>>
>>
>> > On 22 Nov 2015, at 20:55, Gabriel Scherer
>> > <gabriel.scherer at gmail.com<mailto:gabriel.scherer at gmail.com>> wrote:
>>
>> >
>> >
>> >> - Any chance you could use cmdliner instead of Arg?  Fairly minor, but
>> >>
>> >>  all the other plugins use it and it's nice to have the same behaviours
>> >>  for OPAM plugins where possible.
>> >
>> >
>> >
>> > I'm fond of Daniel's design work, so I would gladly move to
>> > Cmdliner -- Arg was just what I could easily use "in anger" for
>> > a first try.
>> >
>> >
>> >
>> >> - How does this behave on pre-1.2 files?  I think it's about time that
>> >>
>> >>  we deprecate pre-1.2 so that we can have clean metadata standards
>> >>  about the new fields such as dev-repo.
>> >
>> >
>> >
>> > opam-fmt updates older opam files to its own support version -- and
>> > refuses to work on newer files. Two things:
>> >
>> >
>> >
>> > I think it would be a better design to have a family of scripts
>> > opam-fmt-1.0, opam-fmt-1.1, opam-fmt-1.2 etc. and a "mother script"
>> > opam-fmt that calls the right reformatter according to the file
>> > version¹. However, that requires changes in the packaging of opam-lib,
>> > so that the package for distinct versions can be installed
>> > simultaneously (they would be separate packages
>> > opam-lib-VER, and ocamlfind packages as well).
>> >
>> >
>> >
>> > ¹: another option would be for opam-lib to also support pairs of parsing
>> > and printing functions for older format version specifically, but that
>> > is an invasive choice to make in a codebase. Right now there is a tiny
>> > bit of logic to know which fields are 1.0 or 1.1-specific, but this
>> > would be much more ambitious.
>> >
>> >
>> >
>> > There are various warnings implemented in opam-lib that could be emitted
>> > during the processing of files by opam-fmt -- they may be already
>> > available depending on the OPAMDEBUG variable or something, but an
>> > explicit support in the interface could be nice. When I tested
>> > reformatting opam-repository, I observed that a large part of its opam
>> > files raise such warnings (so indeed there seems to be a metadata
>> > problem in the repository today).
>> >
>> >
>> >
>> >> - Having a bot regularly go through and reformat the entire repository
>> >>
>> >>  also seems quite legit.  The alternative would be to reformat at the
>> >>  merge point, but this would require a staging branch.
>> >
>> >
>> >
>> > I'm not sure what you call "merge point"; my idea was to put the burden
>> > of reformatting onto users submitting PRs against the
>> > repository. (Regular reformatting are a sensible idea, but they run in
>> > the problem of loss of information, whether distributed manual
>> > reformatting keeps humans closer in the loop)
>> >
>> >
>> >
>> >> - I would really like opam-fmt to be lossless, so preserving comments
>> >>
>> >>  seems like an extension that we should implement.  Ideally everyone
>> >>  can just call it on their packages without thinking about it.
>> >
>> >
>> >
>> > I have mixed feelings about trying to be lossless. At the very least,
>> > one should recognize that setting this as a design goal would impose
>> > a significant burden on the developers of the parsing/printing functions
>> > in opam-lib.
>> >
>> >
>> >
>> > Some human choices (alignment of string fields for example) are rather
>> > difficult and fragile to recognize -- and they could complexify the
>> > codebase. Even for comments, right now you cannot tell to which
>> > configuration item an element is attached. There are several ways around
>> > this, which are interesting to consider but also involve a fair amount
>> > of work:
>> >
>> >
>> >
>> > - You could use ocamldoc-like placement rules: "always after the
>> >
>> >  relevant field" (a first comment would be a file-global comment), or
>> >  "either before or after, but an empty line between the comment and
>> >  a non-relevant field"; this seems painful and not-that-easy to
>> >  implement.
>> >
>> >
>> >
>> > - You could move to a docstring-like (or attribute-like) syntax where
>> >
>> >  comments are explicitly attached to an AST node; from a language
>> >  design point of view this would be my preference, but it may require
>> >  a change in concrete syntax.
>> >
>> >
>> >
>> > - Finally, the choices you can make in this design space depend a lot on
>> >
>> >  whether reformatting will be performed by humans or by bots. If your
>> >  comments-attachment rules are obscure, humans have the opportunity to
>> >  reformat, see that they got them wrong, and reiterate. Bots will just
>> >  put stuff at the wrong place.
>> >
>> >
>> >
>> > I think that the people that maintain this corner of opam today are
>> > those that will pay the greater cost if "lossless" becomes a design
>> > goal, so it should be their choice to make.
>> >
>> >
>> >
>> >
>> > In the meantime, it would be interesting to have a look at how opam
>> > files in the repository actually use comments. With
>> >
>> >
>> >
>> >  find packages -name 'opam' | xargs grep --color=always " #"
>> >
>> >
>> >
>> > I see 75 occurences of comments, 38 of which are just "TODO fixme". The
>> > 37 others seem rather interesting, below are a few representative
>> > examples:
>>
>> >
>> >
>> > packages/arakoon/arakoon.1.8.6/opam:
>> >
>> >>  "lwt" { = "2.4.8" }  # 2.4.9 had an incompatible API change
>> >
>> >
>> >
>> > Having textual exaplanations for choice of bound is a reasonable
>> > use-case for attributes.
>> >
>> >
>> >
>> > packages/camlp4/camlp4.4.01/opam:
>> >
>> >>  build: [] # dummy package
>> >
>> >
>> >
>> > This could be replaced by a dedicated note/comment field.
>> >
>> >
>> >
>> > packages/frama-c/frama-c.20150201/opam:
>> >
>> >>  "lablgtk" { >= "2.18.2" } #for ocaml >= 4.02.1
>> >
>> >
>> >
>> > I don't understand the semantics of this one.
>> >
>> >
>> >
>> > packages/git/git.1.6.0/opam:
>> >
>> >>  depopts: [
>> >>
>> >>    # --enable-mirage
>> >>    "mirage-types-lwt"
>> >>    "mirage-http"
>> >>    "mirage-flow"
>> >>    "channel"
>> >>    # --enable-unix
>> >>    "cohttp"
>> >>    "conduit"
>> >>    "base-unix"
>> >>
>> >>  ]
>> >
>> >
>> >
>> > This usage is very interesting, it seems to call for a hierarchy inside
>> > the "depopts" list (and "dependencies" as well, I suppose), with
>> > annotations on sub-groups of dependencies.
>> >
>> >
>> >
>> > packages/gsl/gsl.1.18.2/opam:
>> >
>> >>  depends: [
>> >>
>> >>    "base-bigarray"
>> >>    "camlp4"
>> >>    "ocamlfind" {>= "1.3.1"}
>> >>    # Included from _opam file
>> >>    "conf-gsl"
>> >>
>> >>  ]
>> >
>> >
>> >
>> > I don't know what this comment means.
>> >
>> >
>> >
>> > packages/lz4/lz4.1.0.0/opam:
>> >
>> >> depexts: [
>> >>
>> >>  [["debian"] ["liblz4-dev"]]
>> >>  # [["ubuntu"] ["liblz4-dev"]] reenable when CI updates its Ubuntu
>> >>  [["source"] ["https://.../install.sh"]]
>> >>
>> >> ]
>> >
>> >
>> >
>> > Again, this would require annotations.
>> >
>> >
>> >
>> > packages/ppx_deriving/ppx_deriving.0.3/opam:
>> >
>> >> build: [
>> >>
>> >>  # If there is no native dynlink, we can't use native builds
>> >>  "ocaml" "pkg/build.ml<http://build.ml>" "native=true"
>> >>
>> >>                         "native-dynlink=true"
>> >>
>> >> ]
>> >
>> >
>> >
>> > packages/frama-c-e-acsl/frama-c-e-acsl.0.5/opam:
>> >
>> >> build: [
>> >>
>> >>  ["ocaml" "run_autoconf_if_needed.ml<http://run_autoconf_if_needed.ml>"]
>> >>  #when used in pinned mode the configure *can* not yet be generated
>> >>  ["./configure" "--prefix" prefix]
>> >>  [make]
>> >>
>> >> ]
>> >
>> >
>> >
>> > packages/clangml/clangml.0.5.2/opam:
>> >
>> >> depexts: [
>> >>
>> >>  [["debian"] ["libboost-dev" "llvm-3.4-dev" "clang-3.4"
>> >>  "libclang-3.4-dev" "binutils-dev"]]
>  [["ubuntu"] ["libboost-dev"
>> >>  "llvm-3.4-dev" "clang-3.4" "libclang-3.4-dev" "binutils-dev"]]
>> >>  [["gentoo"] ["dev-libs/boost" "sys-devel/llvm-3.4.1-r2"
>> >>  "sys-devel/clang-3.4.0-r100" "sys-devel/binutils"]] # archlinux has no
>> >>  package providing llvm and clang 3.4.1
>> >>  [["archlinux"] ["boost" "binutils"]]
>> >>
>> >> ]
>> >
>> >
>> >
>> > packages/mtime/mtime.0.8.1/opam:
>> >
>> >> depends: [ "ocamlfind"
>> >>
>> >>           "js_of_ocaml" # FIXME should become a deptopt
>> >>
>> >> ]
>> >
>> >
>> >
>> > On Sun, Nov 22, 2015 at 8:09 PM, Anil Madhavapeddy
>> > <anil at recoil.org<mailto:anil at recoil.org>> wrote:
>>
>> >> Thanks for this Gabriel! Quick notes:
>> >>
>> >>
>> >>
>> >> - I would really like opam-fmt to be lossless, so preserving comments
>> >>
>> >>  seems like an extension that we should implement.  Ideally everyone
>> >>  can just call it on their packages without thinking about it.
>> >>
>> >>
>> >>
>> >> - Having a bot regularly go through and reformat the entire repository
>> >>
>> >>  also seems quite legit.  The alternative would be to reformat at the
>> >>  merge point, but this would require a staging branch.
>> >>
>> >>
>> >>
>> >> - Any chance you could use cmdliner instead of Arg?  Fairly minor, but
>> >>
>> >>  all the other plugins use it and it's nice to have the same behaviours
>> >>  for OPAM plugins where possible.
>> >>
>> >>
>> >>
>> >> - How does this behave on pre-1.2 files?  I think it's about time that
>> >>
>> >>  we deprecate pre-1.2 so that we can have clean metadata standards
>> >>  about the new fields such as dev-repo.
>> >>
>> >>
>> >>
>> >> regards,
>> >> Anil
>> >>
>> >>
>> >>
>> >>> On 21 Nov 2015, at 21:53, Gabriel Scherer
>> >>> <gabriel.scherer at gmail.com<mailto:gabriel.scherer at gmail.com>> wrote:
>>>>
>> >>>
>> >>>
>> >>> Hi opam-devel,
>> >>>
>> >>>
>> >>>
>> >>> As part of the discussion in
>> >>>
>> >>>
>> >>>
>> >>> bulk addition of 'ocamlbuild {build}' dependencies
>> >>> https://github.com/ocaml/opam-repository/pull/5140
>> >>>
>> >>>
>> >>>
>> >>> it became apparent that performing bulk updates on opam-repository is
>> >>> made harder by the fact that the parse-print roundtrip does not
>> >>> preserve human-formatted opam files. For my proposed patch I tried to
>> >>> separate the reformatting of opam file (to follow the opam-lib printer
>> >>> convention) from the actual changes in two separate commits, but that
>> >>> means more work for script authors, and also creates patches that are
>> >>> harder to review. (If (re)formatting was controlled by the maintainer
>> >>> of the OPAM packages instead of authors of bulk updates, we would be
>> >>> more confident in its correctness.)
>> >>>
>> >>>
>> >>>
>> >>> In order to move that discussion forward (how to maintain opam
>> >>> metadata in a way that is easy for both human and scripts to work
>> >>> with?), I propose the opam-fmt script that can be found at
>> >>> https://github.com/gasche/opam-fmt/
>> >>>
>> >>>
>> >>>
>> >>> I wrote it in the last few days and there are probably some rough
>> >>> edges. Once I feel that it should work, I may try to package it on the
>> >>> opam-repo (in the meantime, clone then pin).
>> >>>
>> >>>
>> >>>
>> >>> This suggests one possible way forward: publicize opam-fmt, encourage
>> >>> users to reformat their opam files using it, adapt the opam-repository
>> >>> QA to call `opam fmt --check` on opam files proposed in PR to enforce
>> >>> it, and after some time (once we trust it works as expected thanks to
>> >>> the human guinea pigs) reformat all older opam files to get a perfect
>> >>> round-trip on all files of the repository.
>> >>> It is not clear to me that this is the best way forward. (For example
>> >>> it means that, in the current state of the opam file parsing/printing
>> >>> code, comments in opam files would always be erased by reformatting.
>> >>> Should we remove comments from the valid syntax of opam files, or
>> >>> attach them to configuration lines to re-print them correctly later,
>> >>> or maybe refuse to work on files with comments?) Opam developers and
>> >>> repository maintainers may decide that the cost of caring about
>> >>> reformatting outweigh the benefits in terms of scriptability.
>> >>>
>> >>>
>> >>>
>> >>> What do you think?
>> >>> _______________________________________________
>> >>> opam-devel mailing list
>> >>> opam-devel at lists.ocaml.org<mailto:opam-devel at lists.ocaml.org>
>> >>> http://lists.ocaml.org/listinfo/opam-devel
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> > _______________________________________________
>> > opam-devel mailing list
>> > opam-devel at lists.ocaml.org<mailto:opam-devel at lists.ocaml.org>
>> > http://lists.ocaml.org/listinfo/opam-devel
>>
>>