Thursday, June 4, 2009

Pluggable Batch Update Handlers

Reading the awesome blog post about batch insert performance by Mark Matthews last week got me thinking: why has this not been done before? Connector/J must be the most deployed JDBC driver in the world, and batch inserts are a common use case, so why hasn't the community stepped up and implemented the query rewrite feature before? Most likely because it is a complex issue that requires deep knowledge of the rest of the driver. I have been a happy Connector/J user myself for several years and never considered doing something like this.

To handle this complexity in drizzle-jdbc, the batch query functionality is pluggable, i.e. you can implement a small interface and tell the connection to use that implementation. So, if anyone out there has some crazy ideas about how to improve the performance of batch inserts/updates, it should be fairly easy to try them out.

First you need to implement the ParameterizedBatchHandler interface, which has two methods:

void addToBatch(ParameterizedQuery query);
int[] executeBatch(Protocol protocol) throws QueryException;

  • addToBatch is called when addBatch() is called on the PreparedStatement, i.e. when someone wants to add the current set of parameters in a prepared statement to the current batch. The query parameter contains all the information you need to do something smart.

  • executeBatch is called when executeBatch() is called on the PreparedStatement. The protocol passed to this method should be used to send the queries to the server (though you could also open new connections to the server, spin up a few threads and send queries in parallel).
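To make the two methods concrete, here is a minimal sketch of a handler that queues each query and sends the queue one statement at a time. The ParameterizedQuery, Protocol and QueryException types below are hypothetical stand-ins so the example compiles on its own; in particular toSql() and executeUpdate(String) are assumptions, not the real drizzle-jdbc methods.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the drizzle-jdbc types (assumed shapes, not the real API).
interface ParameterizedQuery {
    String toSql(); // assumption: renders the query with its parameters bound
}

class QueryException extends Exception {
    QueryException(String message) { super(message); }
}

interface Protocol {
    int executeUpdate(String sql) throws QueryException; // assumption: returns the update count
}

// The two-method interface described above.
interface ParameterizedBatchHandler {
    void addToBatch(ParameterizedQuery query);
    int[] executeBatch(Protocol protocol) throws QueryException;
}

// Queues every query and sends them individually when the batch is executed.
class ListBatchHandler implements ParameterizedBatchHandler {
    private final List<ParameterizedQuery> queries = new ArrayList<>();

    @Override
    public void addToBatch(ParameterizedQuery query) {
        queries.add(query);
    }

    @Override
    public int[] executeBatch(Protocol protocol) throws QueryException {
        int[] counts = new int[queries.size()];
        try {
            for (int i = 0; i < counts.length; i++) {
                counts[i] = protocol.executeUpdate(queries.get(i).toSql());
            }
        } finally {
            queries.clear(); // the batch is consumed even if a query fails
        }
        return counts;
    }
}
```

Clearing the queue in a finally block is a design choice for the sketch: it keeps a failed batch from being resent by accident on the next executeBatch() call.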

Then, to make the connection use your handler:

Connection connection = DriverManager.getConnection("jdbc:drizzle://localhost:4427/test_units_jdbc");
if (connection.isWrapperFor(DrizzleConnection.class)) {
    DrizzleConnection dc = connection.unwrap(DrizzleConnection.class);
    // Tell dc to use your ParameterizedBatchHandler implementation here.
}
PreparedStatement ps = connection.prepareStatement("insert into asdf (somecol) values (?)");

The current implementation in drizzle-jdbc simply stores all queries in a list; when executeBatch is called, the queries are sent to the server one by one. I'm planning to write a rewrite handler in the near future.
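As a rough idea of what a rewrite handler could look like, the sketch below folds a batch of single-row inserts into one multi-row INSERT. It is not the drizzle-jdbc implementation: the types are hypothetical stand-ins, template() and literals() are assumed accessors, the parameters are assumed to be already escaped SQL literals, and the naive string handling assumes that "values" and "?" do not appear inside identifiers or string literals.

```java
import java.sql.Statement;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-ins for the drizzle-jdbc types (assumed shapes, not the real API).
class QueryException extends Exception {
    QueryException(String message) { super(message); }
}

interface Protocol {
    int executeUpdate(String sql) throws QueryException; // assumption
}

interface ParameterizedQuery {
    String template();       // assumption: e.g. "insert into asdf (somecol) values (?)"
    List<String> literals(); // assumption: parameters already rendered as escaped SQL literals

    static ParameterizedQuery of(String sql, String... params) {
        List<String> values = Arrays.asList(params);
        return new ParameterizedQuery() {
            public String template() { return sql; }
            public List<String> literals() { return values; }
        };
    }
}

interface ParameterizedBatchHandler {
    void addToBatch(ParameterizedQuery query);
    int[] executeBatch(Protocol protocol) throws QueryException;
}

class RewriteBatchHandler implements ParameterizedBatchHandler {
    private final List<ParameterizedQuery> queries = new ArrayList<>();

    @Override
    public void addToBatch(ParameterizedQuery query) {
        queries.add(query);
    }

    @Override
    public int[] executeBatch(Protocol protocol) throws QueryException {
        if (queries.isEmpty()) {
            return new int[0];
        }
        try {
            protocol.executeUpdate(rewrite(queries)); // one round trip for the whole batch
            int[] result = new int[queries.size()];
            Arrays.fill(result, Statement.SUCCESS_NO_INFO); // per-row counts are unknown
            return result;
        } finally {
            queries.clear();
        }
    }

    // Builds "<prefix> values (..), (..), .." from the first query's template.
    static String rewrite(List<ParameterizedQuery> batch) {
        String template = batch.get(0).template();
        int idx = template.toLowerCase(Locale.ROOT).indexOf("values");
        if (idx < 0) {
            throw new IllegalArgumentException("not an INSERT ... VALUES statement");
        }
        int split = idx + "values".length();
        String tuple = template.substring(split).trim(); // e.g. "(?)"
        StringBuilder sql = new StringBuilder(template.substring(0, split)).append(' ');
        for (int i = 0; i < batch.size(); i++) {
            if (i > 0) {
                sql.append(", ");
            }
            sql.append(bind(tuple, batch.get(i).literals()));
        }
        return sql.toString();
    }

    // Replaces each '?' in the tuple with the next literal, in order.
    static String bind(String tuple, List<String> literals) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : tuple.toCharArray()) {
            if (c == '?') {
                out.append(literals.get(next++));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Because the server only reports a total for the combined statement, the sketch returns Statement.SUCCESS_NO_INFO for every entry, which JDBC allows for batch results. A real handler would also need to fall back to one-by-one execution for statements that are not plain inserts.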

Look at these files for more information:
