Pages

(+)

Wednesday, 27 January 2010

Libevent 2.0.x: Like Libevent 1.4.x, Only More So.

Recent Posts

Libevent 2.0.x is the new version of Libevent from the Tor Project. Libevent is a software library whose purpose is to provide consistent fast interfaces to various operating systems' mutually incompatible fast networking facilities. gives applications two basic interfaces to these networking layers: a low-level interface where the application is notified when an operation (like a network read or write) is ready to begin, and a higher-level interface where Libevent itself manages network operations and the application is notified when the network operations are completed.

It's amazing how complicated things can become when you set out to do a simple task pretty well.

Back in 2003, I found myself working on a project written in C that needed (among other things) to do fast nonblocking IO on a whole bunch of sockets at once. Our placeholder asynchronous IO implementation, based on the widely portable select() and poll() syscalls, wasn't working well on Windows and didn't scale at all.

What does it do?

I'll add a bit of background here, for readers who haven't tried to do their nonblocking IO from scratch before. When you write a program that needs to work with many active network connections at once, you have two basic choices. One choice is to call IO functions that wait (block) until they are done, and to give each connection its own thread or process so that one connection's blocking doesn't force all the connections to block. For many applications, this approach doesn't scale too well.

A better choice is to use so-called "nonblocking" IO functions that make as much progress as they can, and return control to your program once they can make no more immediate progress. With this approach, your program needs some way to tell which connections are ready to do more IO, so that you don't waste CPU time polling connections that aren't ready yet. There are many kernel-provided APIs to do this, but the portable ones all scale badly (like select() and poll(), which are both O(N) in performance), and the fast ones are all non-portable (like kqueue() and epoll() and /dev/poll). This is where Libevent comes in. Libevent provides a consistent interface to wrap the common features of all these interfaces that tell you when connections are ready for IO, so that you can write clean code using nonblocking IO without having to re-code for every possible backend.

So, we started using Niels Provos's Libevent library around 2005. It was around version 1.0 back then, and it had... a few bugs. To get our application working I started submitting patches, and before too long Niels asked me if I'd like to co-maintain it.

Flash-forward to the present day. Libevent had grown an "extras" library to support commonly needed protocols like HTTP and DNS. Libevent had gotten real users too, including Tor, Memcached, and Chromium. With version 1.4, Libevent's core had grown to maturity, and did a pretty good job on most Unixish platforms and a not-too-awful job on Windows. But there were still some issues that made us decide to try for a major development effort in 2.0. I'll touch on the big ones here, explain what progress we've made, and what we still have to do before we can call Libevent 2.0 finished.

Cleaner interfaces
Some of our older interfaces had grown crufty and difficult to use safely. For example, you sometimes needed to remember to follow up every call to event_set() with a corresponding call to event_base_set(), or you might get strange behavior. Some interfaces were a bad idea to begin with, like the ones that relied on global state.

We could have just changed all our APIs to be less error-prone, but that would have broken all the older programs written to use them. Still, leaving the old APIs exposed by default would make it hard for developers to make sure they were avoiding them. As a compromise, we decided to put the preferred APIs in new headers (such as event2/event.h), and the deprecated APIs in compatibility headers (such as event2/event_compat.h), while retaining the old headers (like the old event.h) as wrappers to include both the new and old APIs.


Nearly every non-Windows OS does nonblocking network IO best with the system I described above, where the program first asks the kernel which sockets are ready to read or write and then asks the kernel to perform as much IO on them as could succeed right now without blocking. On Windows, though, the "tell me which sockets are ready" part is not so efficient or scalable. Windows's select() function (which Libevent 1.4 uses) is O(N), its WSAWaitForMultipleEvents() function doesn't support more than 64 sockets at a time, and so on.

Windows does have a good networking API, however. It just follows a different paradigm. Instead of asking the OS to tell you when an operation would be able to make progress, you tell the OS to begin an operation asynchronously, and ask the OS later (via an , or IOCP) to tell you which operations have finished.

It's quite hard to implement a notify-on-readiness interface using a notify-on-completion system like Windows provides. Fortunately, however, it's pretty easy (and frequently desirable!) to implement a notify-on-completion interface using a Unix-style notify-on-readiness API, so that's what we decided to do.

Supporting IOCP on Windows has been the most far-reaching challenge in Libevent 2.0, and has led to improvements throughout the rest of the (non-Windows-specific) codebase. I'll say more when I talk about individual improvements below.

Though most of the groundwork for an IOCP implementation is laid, the code itself is fairly immature. If you fetch the latest code from subversion, you can enable data transfer through an existing socket with IOCP. IOCP-enhanced connect and accept operations are not yet supported, nor is UDP. We hope to get these done pretty soon.


Buffered-IO improvements

Earlier Libevent versions had a "bufferevent" abstraction to handle the common case where you want to queue data received and data to be sent over a TCP connection, and receive notification as more data arrives or is sent. The interface here was a good match for IOCP, but the existing implementation was too inflexible and inefficient for serious use. Among other problems, its storage format relied on large circular buffers, it allowed only a single implementation for the interface, it behaved poorly on Windows, and it was just generally slow.


In Libevent 2.0, we've rewritten bufferevent from the ground up. The data storage format has changed from circular buffers to a linked list of buffers, reminiscent of the classic mbuf interface. We've added support for sending files with sendfile() and friends, and fixed up Windows support. The interface now supports multiple backends, OO-style, to allow multiple implementations of the "buffered IO" API. So far, we have five such backends: the basic socket implementation, a filtering wrapper implementation, an SSL-based implementation (see below), a socketpair-style implementation, and an IOCP-based implementation.

Preliminary results around April 2009 indicated that the rewritten codebase had improved our large-payload HTTP benchmark performance by between 35 and 174 percent. We hope that as we profile and benchmark this code more aggressively, we can find more opportunities for improvement here.


Improved multithreading

Older versions of Libevent allowed one event loop structure per thread; adding or deleting events from another thread wasn't supported. This was not only an annoyance for writers of multithreaded applications; it made IOCP support nigh-impossible, since we needed to have the IOCP loop run in separate threads from the main event thread.

The basic work here is finally done; as of last week, there are no major non-locked parts of Libevent remaining. We may still need to do performance tuning here: different operating systems provide mutex implementations with wildly differing characteristics.


Our next big multithreading initiative will be better support for parallelizing event callbacks across multiple CPUs on demand. This will be trivial for Win32 IOCP callbacks, but will require actual coding for older Unix backends. To ease migration, we'll need to allow programs to enable this functionality one callback at a time, so that we don't force them to upgrade all of their code at once.


As noted above, there's now a bufferevent backend that uses the OpenSSL library to encrypt traffic using the SSL/TLS family of protocols. This wasn't as simple as writing a regular data filter, since the protocol's support for session renegotiation allows a single virtual read/write operation to be implemented as multiple reads and writes to the network.


The OpenSSL cryptography library has support for doing TLS encryption directly to the network or over a user-provided IO object. We've taken both approaches, so that an OpenSSL bufferevent can either deliver data to a network socket or to another bufferevent. This way, Unix programs can have a bufferevent that speaks SSL directly to the network, and Windows programs can wrap an SSL layer over an IOCP-based backend.

We'd eventually like to add support for other free SSL libraries, such as GnuTLS and NSS, though we're not planning to work on this for Libevent 2.0.x unless somebody wants to contribute the code.

As you might imagine, we've added and rewritten a lot of code between Libevent 1.4 and Libevent 2.0. Without good unit tests, we'd probably have introduced a fair number of bugs.


Unfortunately, our old unit tests weren't so good. Though the overall coverage was decent (about 72%), some parts of the code were hardly tested at all (the DNS code and the IO buffer code were both under 60% coverage). The test suite was fairly fragile, too: many tests relied on global state, which made it hard to write new ones, especially if they required inducing strange error conditions.


For Libevent 2.0, we put a major effort into our testing code. Tests now run in separate processes, so they can't affect one another no matter how badly they trash global state. This not only stopped us from adding new bugs, but also revealed a few longstanding ones. Thanks to this change, it's easy to test far more error conditions than we could test before. This has gotten us up to around 80% test coverage overall, with the least-tested module at around 72%.

Before we release Libevent 2.0, I want our test coverage to be at least 80% for every module. I'd also like to do a manual coverage audit to look for any untested sections that seem particularly error-prone.



Next steps
In the coming months, we'll want to get the Libevent 2.0.x series finished. This will mainly involve fixing known bugs in the Sourceforge trackers, finishing the IOCP code, and testing the heck out of everything.

In parallel, I've been working on a short book to serve as an introduction to nonblocking networking in general, and to Libevent in particular. The latest draft is at http://www.wangafu.net/~nickm/libevent-book/.

After Libevent 2.0.x is stable and released, we'll be able to start looking at missing features for Libevent 2.1.x. These aren't at all final yet, but I'm hoping to get improved thread-pool support for IOCP and non-IOCP users alike, and better support for UDP-based protocols. We'll probably have many other changes to make based on user feedback from Libevent 2.0.

Thanks to Google for their support of the project.


By Nick Mathewson, Tor Project
(+)