[wg-parallel] About Lwt and Async

Mon Apr 29 15:57:48 BST 2013

On 29 Apr 2013, at 11:54, Jeremie Dimino <jdimino at janestreet.com> wrote:

> On Mon, Apr 29, 2013 at 11:48 AM, Anil Madhavapeddy <anil at recoil.org> wrote:
>> If I understand this right, Lwt exposes `wakeup` and `wakeup_later`,
>> and the latter defers the wakeup until the scheduler is entered again.
>> Therefore, `wakeup_later` is most similar to Async's model.
> 
> Almost.  The idea is that Lwt.wakeup_later pushes pending jobs to a
> global queue and they are run at the end of the topmost
> Lwt.wakeup/Lwt.wakeup_later, to avoid a stack overflow in some cases.
> 
>> I'm not sure if there are other ways in Lwt to interrupt a running
>> thread, aside from Lwt_preemptive.  Is it sufficient to alias Lwt.wakeup
>> to Lwt.wakeup_later?
> 
> Yes, but that's not enough: we also need to remove the code that runs
> pending jobs from Lwt.wakeup/wakeup_later, put it in its own exported
> function, and call that function in the scheduler.  In a scheduler-free
> environment the latter may have to be called in some hook or at the
> end of callbacks.
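
To check my reading of that split, here is a toy model in plain OCaml.
It is not Lwt's actual code: `pending`, `run_pending`,
`wakeup_later_current` and `wakeup_later_deferred` are names made up
for illustration, and a "job" is just a closure.

```ocaml
(* Toy model of deferred wakeups.  This is NOT Lwt's real code; every
   name here is hypothetical and only illustrates the proposed split. *)

let pending : (unit -> unit) Queue.t = Queue.create ()

(* Proposed exported function: drain the queue of pending jobs.
   Jobs pushed while draining are picked up by the same loop. *)
let run_pending () =
  while not (Queue.is_empty pending) do
    let job = Queue.pop pending in
    job ()
  done

(* Model of the behaviour described above: the topmost call drains the
   queue, nested calls only enqueue, so the stack stays shallow. *)
let draining = ref false

let wakeup_later_current job =
  Queue.push job pending;
  if not !draining then begin
    draining := true;
    (try run_pending () with e -> draining := false; raise e);
    draining := false
  end

(* Model of the proposal: only enqueue.  The scheduler (or a hook, in a
   scheduler-free environment) is expected to call run_pending later. *)
let wakeup_later_deferred job = Queue.push job pending

(* Record the order in which things happen. *)
let log = ref []
let note s = log := s :: !log

let () =
  wakeup_later_deferred (fun () -> note "job");
  note "caller continues";
  run_pending ();  (* what the scheduler would do when it is entered *)
  List.iter print_endline (List.rev !log)
```

With the deferred variant the caller always runs to completion before
the woken job does; with the current-style variant the job can run
inside the call to wakeup_later itself.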

Got it.  I don't have a feel for the performance impact of such deferred
scheduling.  What I do know is that it matters a great deal whether the
scheduler keeps running jobs while requests are outstanding or drops into
select/kqueue/epoll more frequently.
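
As a rough illustration of why the number of trips into the kernel can
diverge, here is a toy model that only counts them.  `poll` is a
stand-in for select/kqueue/epoll, both loop policies are hypothetical,
and the count says nothing about what each real call would cost.

```ocaml
(* Toy model only: [poll] stands in for select/kqueue/epoll and just
   counts how often the loop would block.  All names are hypothetical. *)
let polls = ref 0
let poll () = incr polls

(* Policy A: run every ready job, then block once. *)
let drain_then_poll ready =
  while not (Queue.is_empty ready) do
    (Queue.pop ready) ()
  done;
  poll ()

(* Policy B: go back to the kernel after each job. *)
let poll_per_job ready =
  while not (Queue.is_empty ready) do
    (Queue.pop ready) ();
    poll ()
  done

(* Build a queue of [n] jobs that do nothing. *)
let make_ready n =
  let q = Queue.create () in
  for _i = 1 to n do
    Queue.push (fun () -> ()) q
  done;
  q

(* Number of simulated polls a policy makes for [n] ready jobs. *)
let count policy n =
  polls := 0;
  policy (make_ready n);
  !polls

let () =
  Printf.printf "drain then poll: %d polls\n" (count drain_then_poll 1000);
  Printf.printf "poll per job:    %d polls\n" (count poll_per_job 1000)
```

For n ready jobs, the first policy polls once and the second polls n
times; whether that gap shows up in real throughput is exactly what a
benchmark would have to tell us.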

For example, the Arakoon folks anecdotally reported a 20x performance loss
between a direct Unix implementation of their database layer and an Lwt
one.  A loss that large can only be explained by context switching or
pathological scheduling somewhere (given we know that Lwt doesn't result
in a lot more allocation on the major heap).
http://www.slideshare.net/eikke/arakoon (slide 14/15)
(I'm CCing Romain, who might have details).

Either way, this seems like a good time to establish a few simple
microbenchmarks such as a TCP iperf or HTTPbench, just to have a few
baseline numbers to evaluate such design decisions against.  I do believe
that the Async 'drop to scheduler' behaviour is far easier to understand
than Lwt's, but not if it costs an order of magnitude in I/O throughput.

-anil