[opam-devel] [ANN] opam-fmt 1.0

Wed Nov 25 02:37:53 GMT 2015

I took a couple hours to implement the by-field format preserving printer in 
opam, and it works quite well in practice.

See a first PR here: https://github.com/ocaml/opam-repository/pull/5176/files

I think this makes bulk updates much more acceptable.
I'll push the new opam-admin.top code shortly.

Le mardi 24 novembre 2015, 10:57:00 Louis Gesbert a écrit :
> Hopping in a bit late... :)
> 
> Indeed, my first approach for bulk-updating opam files was to first
> normalise everything, so that diff could help me see what happened
> thereafter. This loses a lot of history, and is difficult to maintain
> afterwards -- but opam- fmt could help there.
> 
> Another option would be to implement a format-preserving reprinter for bulk
> updates, that would keep the original fields order, and check every field
> for semantic changes before reprinting (we have precise file positions
> already, so we could extract the raw string to reprint when there are no
> changes). It shouldn't be too difficult to implement, doesn't require
> changes to the parser or format, and using this for bulk updates would
> already make them much more tolerable. Attaching comments to the AST would
> be nice and allow to keep them even in rewritten fields, but require much
> more dev time.
> 
> Note 1: Gabriel, just making sure you are using opam trunk ? I rewrote big
> parts of OpamFormat/OpamFile recently (merged like one or two months ago).
> 
> Note 2: I think we can now go ahead and push bulk changes to the repo. It
> should only break `opam init` on 1.1, and I seriously doubt anyone will
> complain about it anymore. If the rewriting script breaks, we'll just have
> to freeze the 1.1 updates and update the documentation pages accordingly.
> 
> Best,
> Louis
> 
> Le lundi 23 novembre 2015, 13:15:29 David Allsopp a écrit :
> > Tempting as it would be to spend the next couple of hours hacking it, my 2
> > euro-cents… why not just make the lexer in opam itself lossless, and
> > expose
> > that in the API? So every single token emitted by the lexer becomes a pair
> > including the “whitespace” (= all whitespace, newlines, comments, etc.)
> > which follow that token and any tokens which may be ambiguously lexed
> > (e.g.
> > integers where arbitrary zero prefixes or where bases are permitted) have
> > to be deferred. All that’s needed then is a function which transforms
> > these
> > lossless tokens to the original lexer’s tokens (so dropping the
> > whitespace/comment parts, possibly performing a few int_of_string and
> > other
> > related functions – but all aided by the type system) for plugging into
> > the
> > parsing stage. It’s even relatively transparent to have the “debugging”
> > version of opam verify that the stream is actually reversible to the
> > original file.
> > 
> > A task like this then uses the full-parsed power of opam-lib to identify
> > opam files which need updating (as you are) and then a fairly simple state
> > machine over the lossless lexer stream to update it.
> > 
> > I think that having a rigid format used for opam-repository is a good
> > idea,
> > but given that presumably that wouldn’t want to become mandatory for all
> > repositories, being able to do formatting (and even version) preserving
> > updates seems useful. Having effectively a richer lexer also means no fork
> > required, so better maintainability, and also puts the onus of supporting
> > the updating of an older version opam file on the script author rather
> > than
> > the opam dev team (as presumably OPAM will always remain capable of
> > reading
> > older format files??). Making the update be an FSM over raw lexer tokens
> > is
> > also slightly nicer and less error-prone than a regex approach.
> > 
> > 
> > David
> > 
> > From: opam-devel [mailto:opam-devel-bounces at lists.ocaml.org] On Behalf Of
> > Gabriel Scherer
> 
>  Sent: 23 November 2015 12:19
> 
> > To: Thomas Gazagnaire
> > Cc: opam-devel
> > Subject: Re: [opam-devel] [ANN] opam-fmt 1.0
> > 
> > One idea I just had when re-reading the scavenged comments is to just
> > consider comment lines as independent items. If you add ordering
> > information to the internal opam representation to keep the items in the
> > order, then I suspect that all the examples above would be reformatted in
> > a
> > way that is acceptable to the original author. One final bit of
> > sophistication would be to store (in the comment items) a boolean for
> > whether they started on their own line, or are to be inserted at the end
> > of
> > the previous line.
> 
>  Of course this fake structure is less true to the real
> 
> > document structure than properly placed attribute nodes would be. It seems
> > easier to implement (and thus a decent compromise to make), but then maybe
> > not: keeping an ordering may be non-trivial for data whose current
> > internal
> > representation is not a set of things (for example dependency formulas are
> > actually full logic formulas; but then if we keep to Conjunctive Normal
> > Forms they can have a list-like representation).
> > On Mon, Nov 23, 2015 at 1:11 PM, Thomas Gazagnaire
> 
> > <thomas at gazagnaire.org<mailto:thomas at gazagnaire.org>> wrote:
>  Thanks
> 
> > Gabriel, that's very useful!
> > 
> > For the various parsers, it might make sense to simply fork (or
> > copy/paste)
> > bits of opam parsers from various versions of opam and try to fix these
> > parsers to:
> > 
> > - keep comments
> > - support some kind of alignments?
> > 
> > Thomas
> > 
> > > On 22 Nov 2015, at 20:55, Gabriel Scherer
> > > 
> > > <gabriel.scherer at gmail.com<mailto:gabriel.scherer at gmail.com>> wrote:
> > >> - Any chance you could use cmdliner instead of Arg?  Fairly minor, but
> > >> 
> > >>  all the other plugins use it and it's nice to have the same behaviours
> > >>  for OPAM plugins where possible.
> > > 
> > > I'm fond of Daniel's design work, so I would gladly move to
> > > Cmdliner -- Arg was just what I could easily use "in anger" for
> > > a first try.
> > > 
> > >> - How does this behave on pre-1.2 files?  I think it's about time that
> > >> 
> > >>  we deprecate pre-1.2 so that we can have clean metadata standards
> > >>  about the new fields such as dev-repo.
> > > 
> > > opam-fmt updates older opam files to its own support version -- and
> > > refuses to work on newer files. Two things:
> > > 
> > > 
> > > 
> > > I think it would be a better design to have a family of scripts
> > > opam-fmt-1.0, opam-fmt-1.1, opam-fmt-1.2 etc. and a "mother script"
> > > opam-fmt that calls the right reformatter according to the file
> > > version¹. However, that requires changes in the packaging of opam-lib,
> > > so that the package for distinct versions can be installed
> > > simultaneously (they would be separate packages
> > > opam-lib-VER, and ocamlfind packages as well).
> > > 
> > > 
> > > 
> > > ¹: another option would be for opam-lib to also support pairs of parsing
> > > and printing functions for older format version specifically, but that
> > > is an invasive choice to make in a codebase. Right now there is a tiny
> > > bit of logic to know which fields are 1.0 or 1.1-specific, but this
> > > would be much more ambitious.
> > > 
> > > 
> > > 
> > > There are various warnings implemented in opam-lib that could be emitted
> > > during the processing of files by opam-fmt -- they may be already
> > > available depending on the OPAMDEBUG variable or something, but an
> > > explicit support in the interface could be nice. When I tested
> > > reformatting opam-repository, I observed that a large part of its opam
> > > files raise such warnings (so indeed there seems to be a metadata
> > > problem in the repository today).
> > > 
> > >> - Having a bot regularly go through and reformat the entire repository
> > >> 
> > >>  also seems quite legit.  The alternative would be to reformat at the
> > >>  merge point, but this would require a staging branch.
> > > 
> > > I'm not sure what you call "merge point"; my idea was to put the burden
> > > of reformatting onto users submitting PRs against the
> > > repository. (Regular reformatting are a sensible idea, but they run in
> > > the problem of loss of information, whether distributed manual
> > > reformatting keeps humans closer in the loop)
> > > 
> > >> - I would really like opam-fmt to be lossless, so preserving comments
> > >> 
> > >>  seems like an extension that we should implement.  Ideally everyone
> > >>  can just call it on their packages without thinking about it.
> > > 
> > > I have mixed feelings about trying to be lossless. At the very least,
> > > one should recognize that setting this as a design goal would impose
> > > a significant burden on the developers of the parsing/printing functions
> > > in opam-lib.
> > > 
> > > 
> > > 
> > > Some human choices (alignment of string fields for example) are rather
> > > difficult and fragile to recognize -- and they could complexify the
> > > codebase. Even for comments, right now you cannot tell to which
> > > configuration item an element is attached. There are several ways around
> > > this, which are interesting to consider but also involve a fair amount
> > > of work:
> > > 
> > > 
> > > 
> > > - You could use ocamldoc-like placement rules: "always after the
> > > 
> > >  relevant field" (a first comment would be a file-global comment), or
> > >  "either before or after, but an empty line between the comment and
> > >  a non-relevant field"; this seems painful and not-that-easy to
> > >  implement.
> > > 
> > > - You could move to a docstring-like (or attribute-like) syntax where
> > > 
> > >  comments are explicitly attached to an AST node; from a language
> > >  design point of view this would be my preference, but it may require
> > >  a change in concrete syntax.
> > > 
> > > - Finally, the choices you can make in this design space depend a lot on
> > > 
> > >  whether reformatting will be performed by humans or by bots. If your
> > >  comments-attachment rules are obscure, humans have the opportunity to
> > >  reformat, see that they got them wrong, and reiterate. Bots will just
> > >  put stuff at the wrong place.
> > > 
> > > I think that the people that maintain this corner of opam today are
> > > those that will pay the greater cost if "lossless" becomes a design
> > > goal, so it should be their choice to make.
> > > 
> > > 
> > > 
> > > 
> > > In the meantime, it would be interesting to have a look at how opam
> > > files in the repository actually use comments. With
> > > 
> > >  find packages -name 'opam' | xargs grep --color=always " #"
> > > 
> > > I see 75 occurences of comments, 38 of which are just "TODO fixme". The
> > > 37 others seem rather interesting, below are a few representative
> > > examples:
> > > 
> > > packages/arakoon/arakoon.1.8.6/opam:
> > >>  "lwt" { = "2.4.8" }  # 2.4.9 had an incompatible API change
> > > 
> > > Having textual exaplanations for choice of bound is a reasonable
> > > use-case for attributes.
> > > 
> > > packages/camlp4/camlp4.4.01/opam:
> > >>  build: [] # dummy package
> > > 
> > > This could be replaced by a dedicated note/comment field.
> > > 
> > > packages/frama-c/frama-c.20150201/opam:
> > >>  "lablgtk" { >= "2.18.2" } #for ocaml >= 4.02.1
> > > 
> > > I don't understand the semantics of this one.
> > > 
> > > packages/git/git.1.6.0/opam:
> > >>  depopts: [
> > >>  
> > >>    # --enable-mirage
> > >>    "mirage-types-lwt"
> > >>    "mirage-http"
> > >>    "mirage-flow"
> > >>    "channel"
> > >>    # --enable-unix
> > >>    "cohttp"
> > >>    "conduit"
> > >>    "base-unix"
> > >>  
> > >>  ]
> > > 
> > > This usage is very interesting, it seems to call for a hierarchy inside
> > > the "depopts" list (and "dependencies" as well, I suppose), with
> > > annotations on sub-groups of dependencies.
> > > 
> > > packages/gsl/gsl.1.18.2/opam:
> > >>  depends: [
> > >>  
> > >>    "base-bigarray"
> > >>    "camlp4"
> > >>    "ocamlfind" {>= "1.3.1"}
> > >>    # Included from _opam file
> > >>    "conf-gsl"
> > >>  
> > >>  ]
> > > 
> > > I don't know what this comment means.
> > > 
> > > packages/lz4/lz4.1.0.0/opam:
> > >> depexts: [
> > >> 
> > >>  [["debian"] ["liblz4-dev"]]
> > >>  # [["ubuntu"] ["liblz4-dev"]] reenable when CI updates its Ubuntu
> > >>  [["source"] ["https://.../install.sh"]]
> > >> 
> > >> ]
> > > 
> > > Again, this would require annotations.
> > > 
> > > packages/ppx_deriving/ppx_deriving.0.3/opam:
> > >> build: [
> > >> 
> > >>  # If there is no native dynlink, we can't use native builds
> > >>  "ocaml" "pkg/build.ml<http://build.ml>" "native=true"
> > >>  
> > >>                         "native-dynlink=true"
> > >> 
> > >> ]
> > > 
> > > packages/frama-c-e-acsl/frama-c-e-acsl.0.5/opam:
> > >> build: [
> > >> 
> > >>  ["ocaml"
> > >>  "run_autoconf_if_needed.ml<http://run_autoconf_if_needed.ml>"]
> > >>  #when used in pinned mode the configure *can* not yet be generated
> > >>  ["./configure" "--prefix" prefix]
> > >>  [make]
> > >> 
> > >> ]
> > > 
> > > packages/clangml/clangml.0.5.2/opam:
> > >> depexts: [
> > >> 
> > >>  [["debian"] ["libboost-dev" "llvm-3.4-dev" "clang-3.4"
> > >>  "libclang-3.4-dev" "binutils-dev"]]
> 
>  [["ubuntu"] ["libboost-dev"
> 
> > >>  "llvm-3.4-dev" "clang-3.4" "libclang-3.4-dev" "binutils-dev"]]
> > >>  [["gentoo"] ["dev-libs/boost" "sys-devel/llvm-3.4.1-r2"
> > >>  "sys-devel/clang-3.4.0-r100" "sys-devel/binutils"]] # archlinux has no
> > >>  package providing llvm and clang 3.4.1
> > >>  [["archlinux"] ["boost" "binutils"]]
> > >> 
> > >> ]
> > > 
> > > packages/mtime/mtime.0.8.1/opam:
> > >> depends: [ "ocamlfind"
> > >> 
> > >>           "js_of_ocaml" # FIXME should become a deptopt
> > >> 
> > >> ]
> > > 
> > > On Sun, Nov 22, 2015 at 8:09 PM, Anil Madhavapeddy
> > > 
> > > <anil at recoil.org<mailto:anil at recoil.org>> wrote:
> > >> Thanks for this Gabriel! Quick notes:
> > >> 
> > >> 
> > >> 
> > >> - I would really like opam-fmt to be lossless, so preserving comments
> > >> 
> > >>  seems like an extension that we should implement.  Ideally everyone
> > >>  can just call it on their packages without thinking about it.
> > >> 
> > >> - Having a bot regularly go through and reformat the entire repository
> > >> 
> > >>  also seems quite legit.  The alternative would be to reformat at the
> > >>  merge point, but this would require a staging branch.
> > >> 
> > >> - Any chance you could use cmdliner instead of Arg?  Fairly minor, but
> > >> 
> > >>  all the other plugins use it and it's nice to have the same behaviours
> > >>  for OPAM plugins where possible.
> > >> 
> > >> - How does this behave on pre-1.2 files?  I think it's about time that
> > >> 
> > >>  we deprecate pre-1.2 so that we can have clean metadata standards
> > >>  about the new fields such as dev-repo.
> > >> 
> > >> regards,
> > >> Anil
> > >> 
> > >>> On 21 Nov 2015, at 21:53, Gabriel Scherer
> > >>> <gabriel.scherer at gmail.com<mailto:gabriel.scherer at gmail.com>> wrote:
> > >>> 
> > >>> 
> > >>> 
> > >>> Hi opam-devel,
> > >>> 
> > >>> 
> > >>> 
> > >>> As part of the discussion in
> > >>> 
> > >>> 
> > >>> 
> > >>> bulk addition of 'ocamlbuild {build}' dependencies
> > >>> https://github.com/ocaml/opam-repository/pull/5140
> > >>> 
> > >>> 
> > >>> 
> > >>> it became apparent that performing bulk updates on opam-repository is
> > >>> made harder by the fact that the parse-print roundtrip does not
> > >>> preserve human-formatted opam files. For my proposed patch I tried to
> > >>> separate the reformatting of opam file (to follow the opam-lib printer
> > >>> convention) from the actual changes in two separate commits, but that
> > >>> means more work for script authors, and also creates patches that are
> > >>> harder to review. (If (re)formatting was controlled by the maintainer
> > >>> of the OPAM packages instead of authors of bulk updates, we would be
> > >>> more confident in its correctness.)
> > >>> 
> > >>> 
> > >>> 
> > >>> In order to move that discussion forward (how to maintain opam
> > >>> metadata in a way that is easy for both human and scripts to work
> > >>> with?), I propose the opam-fmt script that can be found at
> > >>> https://github.com/gasche/opam-fmt/
> > >>> 
> > >>> 
> > >>> 
> > >>> I wrote it in the last few days and there are probably some rough
> > >>> edges. Once I feel that it should work, I may try to package it on the
> > >>> opam-repo (in the meantime, clone then pin).
> > >>> 
> > >>> 
> > >>> 
> > >>> This suggests one possible way forward: publicize opam-fmt, encourage
> > >>> users to reformat their opam files using it, adapt the opam-repository
> > >>> QA to call `opam fmt --check` on opam files proposed in PR to enforce
> > >>> it, and after some time (once we trust it works as expected thanks to
> > >>> the human guinea pigs) reformat all older opam files to get a perfect
> > >>> round-trip on all files of the repository.
> > >>> It is not clear to me that this is the best way forward. (For example
> > >>> it means that, in the current state of the opam file parsing/printing
> > >>> code, comments in opam files would always be erased by reformatting.
> > >>> Should we remove comments from the valid syntax of opam files, or
> > >>> attach them to configuration lines to re-print them correctly later,
> > >>> or maybe refuse to work on files with comments?) Opam developers and
> > >>> repository maintainers may decide that the cost of caring about
> > >>> reformatting outweigh the benefits in terms of scriptability.
> > >>> 
> > >>> 
> > >>> 
> > >>> What do you think?
> > >>> _______________________________________________
> > >>> opam-devel mailing list
> > >>> opam-devel at lists.ocaml.org<mailto:opam-devel at lists.ocaml.org>
> > >>> http://lists.ocaml.org/listinfo/opam-devel
> > > 
> > > _______________________________________________
> > > opam-devel mailing list
> > > opam-devel at lists.ocaml.org<mailto:opam-devel at lists.ocaml.org>
> > > http://lists.ocaml.org/listinfo/opam-devel
> 
> _______________________________________________
> opam-devel mailing list
> opam-devel at lists.ocaml.org
> http://lists.ocaml.org/listinfo/opam-devel