[wg-parallel] About Lwt and Async

Romain Slootmaekers romain at incubaid.com
Mon Apr 29 16:55:06 BST 2013


On 04/29/2013 05:12 PM, Jeremie Dimino wrote:
> On Mon, Apr 29, 2013 at 3:57 PM, Anil Madhavapeddy <anil at recoil.org> wrote:
>> Got it. I don't have a feel for the performance impact of such deferred
>> scheduling, except that the difference between busy-spinning if there are
>> outstanding requests, vs dropping into select/kqueue/epoll more frequently
>> is very significant.
> As Stephen said there shouldn't be more select/kqueue/epoll calls.
> The idea is to run all jobs until there are none left (up to a limit)
> before doing the blocking select/kqueue/epoll call.
>
>> For example, the Arakoon folks anecdotally reported a 20x performance loss
>> between a direct Unix implementation of their database layer vs an Lwt
>> one.  A loss that large can only be explained by context switching or
>> pathological scheduling somewhere (given we know that Lwt doesn't result
>> in a lot more allocation on the major heap).
>> http://www.slideshare.net/eikke/arakoon (slide 14/15)
>> (I'm CCing Romain, who might have details).
> I believe this was about the disk IO.  Disk IO is done using
> preemptive threads since Unix doesn't support asynchronous disk IO.
> When the data is cached it is indeed much slower.  You can get about the
> same performance as direct IO after some tweaking: set the async_method to
> 'switch' and force the process to run on only one CPU. But the switch
> method doesn't work with the threaded runtime.
Yes, it was about disk IO, on Linux.

Off the top of my head, we did something like this:

     Lwt_unix.execute_job
       (pread_job (Lwt_unix.unix_file_descr ch) len offset)
       (fun job -> pread_result job buf pos)
       pread_free


which was dead slow.
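
For contrast, the "direct" variant is just a plain blocking read at an offset, done in the calling thread with no job dispatch or thread hand-off. The helper below is a hypothetical sketch (`read_at` is not Arakoon's code); it uses Stdlib channels (`seek_in`, `really_input`) only so that it needs no extra libraries, whereas a real database layer would more likely call pread(2).

```ocaml
(* Hypothetical helper, not Arakoon's code: a blocking positioned read
   performed directly in the calling thread. Stdlib channels keep the
   sketch dependency-free; a real implementation would call pread(2). *)
let read_at (ic : in_channel) (buf : Bytes.t) (pos : int) (len : int)
    (offset : int) : unit =
  (* Move to the absolute byte offset in the file. *)
  seek_in ic offset;
  (* Fill buf.[pos .. pos+len-1]; raises End_of_file if the file is
     shorter than offset + len. *)
  really_input ic buf pos len

let () =
  let path = Filename.temp_file "read_at_demo" ".dat" in
  let oc = open_out_bin path in
  output_string oc "hello world";
  close_out oc;
  let ic = open_in_bin path in
  let buf = Bytes.create 5 in
  read_at ic buf 0 5 6;
  close_in ic;
  Sys.remove path;
  print_endline (Bytes.to_string buf) (* prints "world" *)
```

When the data is already in the page cache, a read like this returns almost immediately, so presumably the fixed cost of handing every read to another thread is what dominates in the job-based version.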


At that point, we also experimented with Lwt_unix.Async_switch,
but that gave SEGVs.

Some more remarks:
We have seen scheduling differences between Lwt versions. In the past,
the scheduling was eager, while the current one is very round-robin-ish,
and for us it costs about 10% in performance.
(We have no problem with that; it's just an observation.)

POSIX is indeed broken for everything related to file IO:
* Not all file descriptors are equal: select on a file descriptor for a
  regular file always reports it as ready, while for a socket you get
  meaningful results.
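* Things can have effectively empty implementations. For example, I
  believe fsync on OS X is something like this:

      int fsync(int fd) {return 0;}

  (Strictly, Apple documents that fsync hands the data to the drive but
  does not force the drive to flush its write cache; that takes fcntl
  with F_FULLFSYNC.)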

* For a lot of behaviour, you depend on file system choices and sysadmin
  choices (mount options), and they have a talent for making the wrong ones.
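
The select point is easy to check. The hypothetical helper `ready_for_reading` below polls one descriptor with a zero timeout via OCaml's `Unix.select` (so the unix library must be linked). POSIX specifies that regular files always select as ready, so it returns true even for an empty file at end of file, where a read returns 0 bytes instead of blocking; an idle socket comes back not ready until data arrives.

```ocaml
(* Hypothetical helper: non-blocking readiness check for one descriptor.
   A timeout of 0.0 makes Unix.select return immediately. *)
let ready_for_reading (fd : Unix.file_descr) : bool =
  let readable, _, _ = Unix.select [fd] [] [] 0.0 in
  readable <> []

let () =
  (* A regular (here: empty) file is always reported readable. *)
  let path = Filename.temp_file "select_demo" ".dat" in
  let fd = Unix.openfile path [Unix.O_RDONLY] 0 in
  Printf.printf "regular file ready: %b\n" (ready_for_reading fd);
  Unix.close fd;
  Sys.remove path;
  (* An idle socket is not; it becomes readable once data arrives. *)
  let a, b = Unix.socketpair Unix.PF_UNIX Unix.SOCK_STREAM 0 in
  Printf.printf "idle socket ready: %b\n" (ready_for_reading a);
  ignore (Unix.write_substring b "x" 0 1);
  Printf.printf "socket with data ready: %b\n" (ready_for_reading a);
  Unix.close a;
  Unix.close b
```

This is why an event loop cannot use select to learn when a disk read will not block, and why disk IO ends up on preemptive threads.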


In essence: the situation is a mess.

have fun,

Romain.



