[opam-devel] How to know whether a package archive is already in the cache?

Sat Nov 14 02:59:26 GMT 2015

The "canonical" way to get an OpamState.state is through
`OpamState.load_state`.

Another possibility, for the caching, would be to do it manually at the
repository layer:

1. clone the opam-repository
2. use `opam-admin make` to mirror all of the archives
3. select your local mirror with `opam repo add`

The interface of OpamHTTP may be a bit confusing at the moment, since it mixes 
mirroring and downloading of a full repository state with downloading of 
single files. I intend to split this into two layers for clarity (which would 
also help with the signed repository plans) but haven't got to it yet. I agree 
that it's currently quite difficult to figure out how caching works.
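
As a stopgap for the slow failing downloads, the independent table suggested in
the quoted message can be kept with the standard library alone. What follows is
only a minimal sketch, not something opam provides: `load_failures`,
`save_failures` and `download_unless_failed` are hypothetical names, package
names are stored as plain strings (one per line), and `download` is assumed to
be a wrapper written by the script that returns `true` on success.

```ocaml
(* Hypothetical helper, not part of opam-lib: remembers which packages
   failed to download so that later runs can skip them. *)
module StringSet = Set.Make (String)

(* Read one package name per line; a missing file means no known failures. *)
let load_failures file =
  if not (Sys.file_exists file) then StringSet.empty
  else begin
    let ic = open_in file in
    let rec loop acc =
      match input_line ic with
      | line -> loop (StringSet.add line acc)
      | exception End_of_file -> close_in ic; acc
    in
    loop StringSet.empty
  end

(* Overwrite the file with the current set of failures. *)
let save_failures file failures =
  let oc = open_out file in
  StringSet.iter
    (fun pkg -> output_string oc pkg; output_char oc '\n')
    failures;
  close_out oc

(* [download] is assumed to return [true] on success and [false] on failure.
   Packages already recorded as failed are skipped without calling it.
   Returns the updated failure set. *)
let download_unless_failed ~download failures pkg =
  if StringSet.mem pkg failures then failures
  else if download pkg then failures
  else StringSet.add pkg failures
```

A script would call `load_failures` once at startup, fold
`download_unless_failed` over the package names, and call `save_failures`
before exiting. Deleting the file forces every package to be retried.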

If you want to do the downloads in parallel, you should start from the 
Opam_admin_top.packages set, and do something like what is done in 
OpamSolution.parallel_apply:

    OpamParallel.map
      ~jobs:(OpamState.dl_jobs t)
      ~command:(OpamAction.download_package t)
      ~dry_run:OpamStateConfig.(!r.dryrun)
      <packages>

On your last question, there is an `OpamPackage.to_string`.

On a side note, ocp-index really helps when browsing the code of opam; it has 
saved me countless hours.

Hope this helps!

Best,
Louis

On Thursday 12 November 2015, 18:47:38, Gabriel Scherer wrote:
> Hi opam-devel,
> 
> I'm currently hacking on a script to do a bulk update of OPAM
> metadata, adding "ocamlbuild" as an explicit dependency of all
> packages my killer heuristic decides certainly use ocamlbuild (right
> now: there is a _tags or myocamlbuild.ml somewhere, but I'm soon going
> to integrate the fact that an _oasis file explicitly lists ocamlbuild
> as the relied-upon build system).
> 
> This is rather simple, with most of the time spent browsing through
> the rich opam-library API.
> - iterate over all packages in the repository (using the nice
> Opam_admin_top.iter_packages function)
> - for each package download the archive (I used
> OpamAction.download_package for this, although it requires an
> OpamState.t argument that I wasn't sure how to build¹)
> - extract each archive (OpamFilename.extract_generic_file, under some
> OpamFilename.with_tmp_dir call to get automatic cleanup)
> - walk the archive to test ocamlbuild usage
> 
> Caching downloaded archives works very well, so re-running the script
> (during my test-refine feedback loops) does not re-download them.
> Unfortunately, for a handful of packages, the download fails, and it
> only fails after a rather long timeout has expired, so just
> re-iterating on those failed packages makes a process that should be
> instantaneous take several minutes.
> 
> So here is my question: how can I test whether a package archive is
> already in the cache? Because I know now that all packages that won't
> time out have been cached by previous runs of my script, I could
> iterate only on those. But I didn't find a clear way to do that (this
> seems to be available internally in some OpamHTTP backend, but I
> haven't seen this exported).
> 
> A way to cache not only the successfully downloaded archives, but also
> the "did not work" last time decision would also fit the bill. In the
> worst case I could store that information in an independent table that
> I would (de)serialize across invocations of my script.
> 
> (Opam seems to have fancy download functions designed to download a
> lot of stuff in parallel, but that seems incompatible with the
> sequential workflow imposed by `iter_packages`. I could first iterate
> to build a list of URLs, then download everything in parallel, then
> re-iterate but then again I need to only access the archives whose
> download actually succeeded.)
> 
> While we're at it: is there a simple way to get a pretty string from a
> Package.t value? I use
>           Printf.sprintf "%s.%s"
>             (OpamPackage.name_to_string package)
>             (OpamPackage.version_to_string package)
> but would expect this to be available already.
> 
> The complete code of the current prototype script (it is not editing
> any metadata so far, just printing out the results that seem reasonable,
> except that the _oasis part of the heuristic needs to be implemented
> to get realistic results) is available at
> 
> https://github.com/gasche/opam/blob/2badfa0810e25ded1495b28b2ec8ff53f03a90cc/admin-scripts/add_ocamlbuild_dependency.ml
> 
> Any comment or advice is warmly welcome. In particular there is a
> question in a comment about the right way to build an
> OpamState.t value.