[opam-devel] Travis is broken

Wed Oct 29 22:00:57 GMT 2014

On 2014-10-26 14:55, Anil Madhavapeddy wrote:
> On 24 Oct 2014, at 11:59, Peter Zotov <whitequark at whitequark.org> 
> wrote:
>> 
>> On 2014-10-24 14:20, Anil Madhavapeddy wrote:
>>> One reason I haven't spent too much time on buildbot and bors is that 
>>> they
>>> all need some level of customisation to the specific deployment.
>> 
>> I'm actually almost done. (Bored, insomnia, etc.) The Buildbot 
>> configuration
>> is really simple in this case: it just runs a single Docker command, 
>> which
>> pulls from the repo and then runs a script derived from .travis-ci.sh:
>> 
>> https://github.com/whitequark/opam-repository/blob/master/.docker-ci.sh
>> https://gist.github.com/whitequark/516973336a55971e2507
>> 
>> A bigger problem is OS X workers, which don't have anything like 
>> Docker
>> for build isolation. But I think they have a sandboxing mechanism.
> 
> Thanks for setting this up!  It's good to see that it's a relatively 
> simple
> configuration setup.  I'm tempted to have something like this run on a 
> staging
> version of opam-repository, since we could eliminate the 50 minute 
> limit for
> build jobs (and hence rigorously test Core).
> 
>>> The OCamlot work that David Sheets did last year is ripe for a 
>>> refresh with
>>> all the new infrastructure that's been built in the last year.  For 
>>> example:
>>> - opamLib is now much easier to use as a library than it was in opam 
>>> 1.0
>>> - the ocaml-git bindings work, so all the shelling out to the cmdline 
>>> disappears
>>> - David has almost finished GitHub webhooks integration to ease that
>>> callback process
>>> - Irmin or Arakoon could be used as the k/v store for the logs now
>>> All in all, I'd be inclined to put time into putting together a 
>>> self-hosted
>>> one using this infrastructure.  The only real missing major piece is 
>>> the web
>>> UI.  I wonder if there is some js_of_ocaml-friendly UI layer that we 
>>> could drop
>>> in for log viewing purposes...
>> 
>> This sounds like it could take months.
> 
> More on the order of weeks if you discount the web UI (which could be 
> CLI
> driven).  The reason it's worth doing is the customisation that you 
> pointed
> out is hard to do on external platforms.  Some lessons learnt from the 
> previous
> deployment of OCamlot last year:
> 
> - having a single-OCaml-binary deployment makes multi-OS workers really
>   straightforward compared to using (e.g.) Jenkins.  Getting the JVM 
> working
>   on a Raspberry Pi was no fun.

Buildbot is not based on the JVM. The buildslave is a tiny Python
executable.
The LLVM project has a lot of buildbots running on weird architectures.

> 
> - OPAM-specific logic is required to stop overwhelming slower (ARM, 
> PowerPC,
>   MIPS) workers with unnecessary jobs.  We had a 'stage 1' gateway that 
> would
>   run only on x86_64 to quickly test for errors, and then spawn off 
> tasks
>   on increasingly obscure architectures, as well as on non-Linux 
> operating
>   systems.

Agreed; the current approach is suboptimal.
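
The staged gateway described above could be sketched roughly as follows.
This is only a minimal Python sketch under assumptions: the worker names,
the STAGES tiers, and the run_staged and build names are hypothetical and
are not taken from OCamlot or Buildbot.

```python
from typing import Callable, Dict, List

# Hypothetical worker tiers, fastest first. The names are illustrative only.
STAGES: List[List[str]] = [
    ["x86_64-linux"],                              # 'stage 1' gateway
    ["x86_64-freebsd", "x86_64-openbsd"],          # non-Linux operating systems
    ["arm-linux", "powerpc-linux", "mips-linux"],  # slower architectures
]


def run_staged(package: str, build: Callable[[str, str], bool]) -> Dict[str, bool]:
    """Build `package` tier by tier and stop after the first tier with a failure.

    `build(package, worker)` is an assumed callback that returns True on success.
    """
    results: Dict[str, bool] = {}
    for stage in STAGES:
        for worker in stage:
            results[worker] = build(package, worker)
        # Do not load the slower tiers if anything in this tier failed.
        if not all(results[worker] for worker in stage):
            break
    return results
```

With a build callback that fails on the first tier, the slower workers are
never asked to do anything.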

> 
> - There are a number of custom regexps in the ocamlot repo that do 
> autotriage
>   on the build logs for common OCaml-specific errors, such as ocamlfind 
> packages
>   not being found, or warnings-as-errors.  I do miss these in Travis 
> land...

This would be great!
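
As a rough illustration of that kind of autotriage, here is a minimal
Python sketch. The rule labels and the two patterns are my assumptions
about typical ocamlfind and compiler messages; they are not the regexps
from the ocamlot repo.

```python
import re
from typing import List, Pattern, Tuple

# Hypothetical triage rules: (label, pattern). The patterns are assumptions
# about common error text, not copies of the ocamlot regexps.
TRIAGE_RULES: List[Tuple[str, Pattern[str]]] = [
    ("missing-ocamlfind-package",
     re.compile(r"ocamlfind: Package `([^']+)' not found")),
    ("warnings-as-errors",
     re.compile(r"Some fatal warnings were triggered")),
]


def triage(log: str) -> List[str]:
    """Return the label of every rule whose pattern occurs in the build log."""
    return [label for label, pattern in TRIAGE_RULES if pattern.search(log)]
```

A log that matches no rule yields an empty list, which a caller could
treat as "needs manual inspection".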

> 
> - Supporting multiple operating systems requires treating workers as 
> heavyweight
>   VMs, with Docker and similar OS-specific mechanisms being a useful
> optimisation
>   to build times.  We can run *some* workers on Rackspace Cloud where 
> they have
>   been ported, but others (such as OpenBSD) need to run on hosted 
> infrastructure
>   somewhere (such as the Cambridge Computer Lab, which is fine by me).

Yes, but this is unrelated to the CI system used.

> 
>   Specific operating systems:
>   - Windows, we could use Azure, which is also what Appveyor uses
>   - FreeBSD is supported on Rackspace Cloud
>   - OpenBSD requires custom hosting, but has some stability issues 
> under Xen
>     that are on my debugging list (page table crashes on x86_32).
>   - MacOS X could use Vagrant with the VMware Fusion provider.  
> Sandboxing is
>     more of an app model there, and not suitable for whole-system 
> snapshots.
>   - Most common Linux variants can be handled via Docker.
> 
> It's interesting how there doesn't seem to be any out-of-the-box open 
> source
> solution for continuous integration on multiple operating systems and 
> weird
> architectures (where the JVM won't work too well).

Buildbot works.

Overall, I think that the buildscript itself definitely should be 
rewritten
in OCaml and handle all the OCaml-specific details. However, rewriting
the web UI, buildmaster/buildslave communication and so on is a waste of
time. There is no practical benefit gained from doing so, and the result
will certainly be inferior even to Buildbot, which is not perfect, but
has had much more time to get polished than any homegrown solution.

> 
> -anil

-- 
Peter Zotov