Monday, April 27, 2009

jHeidi drizzle support

Bill Culp over at just released a version of jHeidi that supports Drizzle, using drizzle-jdbc, cool seeing the driver in action! Go try it out.

Tuesday, April 21, 2009

Initial MySQL support and the JDBC4 wrapper interface

It is now possible to use drizzle-jdbc to execute queries against a MySQL database. The changes needed to support this were very small, the only difference (right now) in the protocol is that the number of commands has been reduced in drizzle, and the server capabilities enum has been modified to mirror the capabilities in drizzle. In the near future drizzle will have a brand new protocol though, so I made the decision to separate the main protocol classes, and share whatever can be shared outside that class.

To use drizzle-jdbc against MySQL, check this example:

try {
} catch (ClassNotFoundException e) {
throw new SQLException("Could not load driver");
// lets hope the big O doesnt sue me for using "thin" in the connection string:
Connection connection =
Statement statement = connection.createStatement();
statement.execute("INSERT INTO a VALUES (1)");

JDBC4 introduced the wrapper interface, so that a user of the JDBC driver can access vendor-specific functionality in the driver. To test it, I implemented the COM_BINLOG_DUMP command against MySQL, it works like this:

if(connection.isWrapperFor(ReplicationConnection.class))) {
ReplicationConnection rc = connection.unwrap(ReplicationConnection.class);
List<RawPacket> rpList = rc.startBinlogDump(891,"mysqld-bin.000001");
for(RawPacket rp : rpList) {
for(byte b:rp.getRawBytes()) {
System.out.printf("0x%x ",b);

Of course, this is just a proof-of-concept, it gives you a list of the raw events in the binlog on the server. This is not something really usable, but it shows how to build vendor-specific stuff and access them through the JDBC4 wrapper interface. My plan is to parse the events from the binlog and make them available on a per-request basis so we don't need to store all events from the server binlog in-memory.

Wednesday, April 1, 2009

Note about performance in drizzle-jdbc

Ok, i don't really like to optimize early, but this made me learn using blocking queues and executor services. This blog post is very unscientific and does not provide any raw numbers, you can fetch them yourselves, look at the LoadTest test case in drizzle-jdbc. The MySQL stuff is commented out since I want the tests to pass in hudson

First, I ran a few comparisons between drizzle-jdbc against drizzle and Connector/J on MySQL, using the same test case, insert 1000 rows, read them back 1000 times. Connector/J was 5-10 times faster, then i remembered that tables in mysql have myisam as the default storage engine (doh). So, when i created the tables with InnoDB, the performance was still alot better with connector/j, but the difference was not as big (C/J was approx 50% faster).

The first optimization i did was to start fetching packets (communication between drizzle and client is done using packets, for example row packets, ok packets, error packets etc) in a background thread and putting them on a BlockingQueue. To manage the background thread I use an ExecutorService. So, whenever a packet is available on the stream, it is read back and put on the queue. The packet is a raw packet, i.e. the executor simply reads up length and stores the raw bytes in a "RawPacket" on the queue. The gain from this is that while the protocol figures out what kind of packet it is, the next packet is prepared on the queue. Look here for the code of the AsyncPacketFetcher. I also built a SyncPacketFetcher which does not use a background thread or queue, it simply creates the RawPackets directly when called.

By pushing the packet fetching to a background thread, another natural optimization would be to return the result set from the protocol with a reference to the packet fetcher before starting to read RowPackets, and only consume the queue when the user of drizzle-jdbc wants to read a row. Then the user could start processing result sets before all rows are read back from the server.

The second optimization was to start prefetching data when it is available on the inputstream. I.e. read as much as possible whenever reading, and use the local buffered bytes when read():ing. A similar technique is used in connector/j. Look here for the code.

All in all, these optimizations take drizzle-jdbc close to the performance of connector/j (not all the way though, something like 5-10% off). But I want to stay away from any more optimizing until the new protocol is implemented in drizzle. It is also likely that my configuration of my local drizzled is not as good as the configuration of the mysql server I use. All these changes takes the design of drizzle-jdbc closer to what I want when the async stuff is available in drizzle.