[opam-devel] How to know whether a package archive is already in the cache?

Gabriel Scherer gabriel.scherer at gmail.com
Sat Nov 14 02:17:58 GMT 2015

I finally went the "independent table that I manually (de)serialize
before and after download" route. The current state of the script is
available at


and I'm looking for feedback on the preliminary results at


On Thu, Nov 12, 2015 at 6:47 PM, Gabriel Scherer
<gabriel.scherer at gmail.com> wrote:
> Hi opam-devel,
> I'm currently hacking on a script to do a bulk update of OPAM
> metadata, adding "ocamlbuild" as an explicit dependency of all
> packages my killer heuristic decides certainly use ocamlbuild (right
> now: there is a _tags or myocamlbuild.ml somewhere, but I'm soon going
> to integrate the fact that an _oasis file explicitly lists ocamlbuild
> as the relied-upon build system).
> This is rather simple, with most of the time spent browsing through
> the rich opam-library API.
> - iterate over all packages in the repository (using the nice
> Opam_admin_top.iter_packages function)
> - for each package download the archive (I used
> OpamAction.download_package for this, although it requires an
> OpamState.t argument that I wasn't sure how to build¹)
> - extract each archive (OpamFilename.extract_generic_file, under some
> OpamFilename.with_tmp_dir call to get automatic cleanup)
> - walk the archive to test ocamlbuild usage
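
The last step, walking an extracted archive, can be sketched with the standard library alone. This is only an illustration of the heuristic described above, not the code of the actual script: the names `uses_ocamlbuild` and `ocamlbuild_markers` are made up for the example, and the _oasis check is left out.

```ocaml
(* File names whose presence is taken as evidence of ocamlbuild usage.
   This list mirrors the heuristic from the message; it is an assumption
   of this sketch, not an exhaustive test. *)
let ocamlbuild_markers = ["_tags"; "myocamlbuild.ml"]

(* Sys.is_directory raises Sys_error on dangling symlinks; treat those
   as non-directories. *)
let is_dir path = try Sys.is_directory path with Sys_error _ -> false

(* [uses_ocamlbuild dir] is true if an entry named like one of the
   markers exists anywhere under [dir]. Unreadable directories are
   treated as empty. *)
let rec uses_ocamlbuild dir =
  let entries = try Sys.readdir dir with Sys_error _ -> [||] in
  Array.exists
    (fun name ->
       let path = Filename.concat dir name in
       List.mem name ocamlbuild_markers
       || (is_dir path && uses_ocamlbuild path))
    entries
```

Calling `uses_ocamlbuild` on the directory where the archive was extracted gives the yes/no answer for that package. The sketch follows symlinks to directories, so it assumes the archive contains no symlink cycles.
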
> Caching downloaded archives works very well, so re-running the script
> (during my test-refine feedback loops) does not re-download them.
> Unfortunately, for a handful of packages the download fails, and it
> only fails after a rather long timeout has expired, so just
> re-iterating on those failed packages makes a process that should be
> instantaneous take several minutes.
> So here is my question: how can I test whether a package archive is
> already in the cache? Because I know now that all packages that won't
> time out have been cached by previous runs of my script, I could
> iterate only on those. But I didn't find a clear way to do that (this
> seems to be available internally in some OpamHTTP backend, but I
> haven't seen this exported).
> A way to cache not only the successfully downloaded archives, but also
> the "did not work" last time decision would also fit the bill. In the
> worst case I could store that information in an independent table that
> I would (de)serialize across invocations of my script.
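
A minimal sketch of such an independent table, assuming the standard `Marshal` module is acceptable for the on-disk format. Everything here is hypothetical: the names `load_failures`, `save_failures` and `fetch_cached` are invented for the example, and `download` stands in for whatever function actually fetches an archive (returning `None` on failure). It is not taken from the script itself.

```ocaml
(* Set of package names whose download failed on a previous run. *)
type failures = (string, unit) Hashtbl.t

(* Load the table from [file], or start empty if the file is missing. *)
let load_failures file : failures =
  if Sys.file_exists file then begin
    let ic = open_in_bin file in
    Fun.protect ~finally:(fun () -> close_in ic)
      (fun () -> (Marshal.from_channel ic : failures))
  end else Hashtbl.create 16

(* Write the table back to [file] at the end of the run. *)
let save_failures file (tbl : failures) =
  let oc = open_out_bin file in
  Fun.protect ~finally:(fun () -> close_out oc)
    (fun () -> Marshal.to_channel oc tbl [])

(* Skip packages recorded as failed; otherwise try [download] and
   remember the failure so that the timeout is only paid once. *)
let fetch_cached tbl ~download pkg =
  if Hashtbl.mem tbl pkg then None
  else
    match download pkg with
    | Some _ as archive -> archive
    | None -> Hashtbl.replace tbl pkg (); None
```

The table is loaded once before iterating over the packages and saved once afterwards. Marshalled data is not type-checked when read back, so the file should only ever be written by the same script.
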
> (Opam seems to have fancy download functions designed to download a
> lot of stuff in parallel, but that seems incompatible with the
> sequential workflow imposed by `iter_packages`. I could first iterate
> to build a list of URLs, then download everything in parallel, then
> re-iterate but then again I need to only access the archives whose
> download actually succeeded.)
> While we're at it: is there a simple way to get a pretty string from a
> OpamPackage.t value? I use
>           Printf.sprintf "%s.%s"
>             (OpamPackage.name_to_string package)
>             (OpamPackage.version_to_string package)
> but would expect this to be available already.
> The complete code of the current prototype script (it is not editing
> any metadata so far, just printing out the results that seem reasonable,
> except that the _oasis part of the heuristic needs to be implemented
> to get realistic results) is available at
>   https://github.com/gasche/opam/blob/2badfa0810e25ded1495b28b2ec8ff53f03a90cc/admin-scripts/add_ocamlbuild_dependency.ml
> Any comment or advice is warmly welcome. In particular there is a
> question in a comment about: what is the right way to build a
> OpamState.t value?
