Hi opam-devel,

Here is a rather cool bachelor thesis that seems relevant to OPAM
repository management:

  Typosquatting in Programming Language Package Managers
  Nikolai Philipp Tschacher, March 2016

The described attack is to propose packages whose names are typo-close to
very popular packages. Instead of "opam install omake" I run "opam install
omaek", but "omaek" exists and is attacker-controlled, and its install
script wreaks havoc on my machine.

This is interesting because it is a way to subvert a specific package while
bypassing the common defenses against impersonation -- signing a package
with its maintainers' keys, etc. The author of the thesis suggests three
defense methods:

1. Make package installation sandboxed in such a way that just installing a
package is harmless as long as its code is not linked and run. (Of course
this code may be linked and run if a developer also makes a typo in its

2. Alert repository administrators when a typo-candidate is proposed for
integration. (This is especially relevant for repositories with no human
oversight on package addition, but even for OPAM one may consider that the
maintainers themselves may be fooled by the typo or not think of the
security consequences.)

3. Keep a log of the non-existing packages that users commonly try to
install (good candidates for typos) and alert administrators when a
matching package is proposed.

I'm sure that the systems experts in the room have plans for (1) already. I
suspect that opam's architecture does not let us do (3), but I was
interested in quickly hacking (2) this morning -- I suppose I like
typo-detection stuff.

My plan was: in `opam lint`, emit a warning if the linted package name is
at edit distance 2 or less (but not 0) from the name of an existing package
in the repository. But this does not quite work; I quickly looked at the
code and it seems that `opam lint` is meant to be run purely locally: it
does not have access to the set of packages available in the repository.
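
To make the intended check concrete, here is a minimal sketch of the
name comparison, assuming the list of existing package names is already
available as a string list. The function names `edit_distance` and
`typo_neighbours` are hypothetical and not part of opam; the distance used
is plain Levenshtein, which is one possible choice among others.

```ocaml
(* Levenshtein edit distance between two strings, using two rolling rows.
   Sketch only: not taken from the opam code base. *)
let edit_distance (a : string) (b : string) : int =
  let la = String.length a and lb = String.length b in
  (* prev.(j): distance between the first (i-1) chars of a and the first
     j chars of b; initially the distance from the empty prefix. *)
  let prev = Array.init (lb + 1) (fun j -> j) in
  let curr = Array.make (lb + 1) 0 in
  for i = 1 to la do
    curr.(0) <- i;
    for j = 1 to lb do
      let cost = if a.[i - 1] = b.[j - 1] then 0 else 1 in
      curr.(j) <-
        min
          (min (prev.(j) + 1) (curr.(j - 1) + 1)) (* deletion, insertion *)
          (prev.(j - 1) + cost)                   (* substitution or match *)
    done;
    Array.blit curr 0 prev 0 (lb + 1)
  done;
  prev.(lb)

(* Existing names at distance 1..max_dist from the candidate name.
   Distance 0 is excluded: that is the package itself, not a typo. *)
let typo_neighbours ?(max_dist = 2) (existing : string list)
    (candidate : string) : string list =
  List.filter
    (fun name ->
      let d = edit_distance candidate name in
      d > 0 && d <= max_dist)
    existing
```

With plain Levenshtein, swapping two adjacent letters counts as two
substitutions, so "omaek" is at distance 2 from "omake" and a threshold of 2
is what catches it. The same threshold also reports "lablgtk2" as a
neighbour of "lablgtk" (distance 1).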

So my question: where in the opam-repository QA process should I add a
script (preferably written in OCaml rather than shell) that gets the name
of the packages proposed for inclusion, also has access to the name of
existing packages in the repository, and can fail or warn if the proposed
one is typo-close to an existing one?

(This test can have false positives, e.g. flagging lablgtk2 because lablgtk
exists. It should still fail in a visible way in the UI, but not in a way
that prevents other, more advanced tests, such as package installability.)
