Wednesday, April 1, 2009

Note about performance in drizzle-jdbc

Ok, i don't really like to optimize early, but this made me learn using blocking queues and executor services. This blog post is very unscientific and does not provide any raw numbers, you can fetch them yourselves, look at the LoadTest test case in drizzle-jdbc. The MySQL stuff is commented out since I want the tests to pass in hudson

First, I ran a few comparisons between drizzle-jdbc against drizzle and Connector/J on MySQL, using the same test case, insert 1000 rows, read them back 1000 times. Connector/J was 5-10 times faster, then i remembered that tables in mysql have myisam as the default storage engine (doh). So, when i created the tables with InnoDB, the performance was still alot better with connector/j, but the difference was not as big (C/J was approx 50% faster).

The first optimization i did was to start fetching packets (communication between drizzle and client is done using packets, for example row packets, ok packets, error packets etc) in a background thread and putting them on a BlockingQueue. To manage the background thread I use an ExecutorService. So, whenever a packet is available on the stream, it is read back and put on the queue. The packet is a raw packet, i.e. the executor simply reads up length and stores the raw bytes in a "RawPacket" on the queue. The gain from this is that while the protocol figures out what kind of packet it is, the next packet is prepared on the queue. Look here for the code of the AsyncPacketFetcher. I also built a SyncPacketFetcher which does not use a background thread or queue, it simply creates the RawPackets directly when called.

By pushing the packet fetching to a background thread, another natural optimization would be to return the result set from the protocol with a reference to the packet fetcher before starting to read RowPackets, and only consume the queue when the user of drizzle-jdbc wants to read a row. Then the user could start processing result sets before all rows are read back from the server.

The second optimization was to start prefetching data when it is available on the inputstream. I.e. read as much as possible whenever reading, and use the local buffered bytes when read():ing. A similar technique is used in connector/j. Look here for the code.

All in all, these optimizations take drizzle-jdbc close to the performance of connector/j (not all the way though, something like 5-10% off). But I want to stay away from any more optimizing until the new protocol is implemented in drizzle. It is also likely that my configuration of my local drizzled is not as good as the configuration of the mysql server I use. All these changes takes the design of drizzle-jdbc closer to what I want when the async stuff is available in drizzle.

No comments: