Monday, December 28, 2009

Drizzle replication to WebSockets

I just pushed a WebSocket applier to RabbitReplication, and yes, it is as crazy as it sounds. It works pretty much like all appliers: it consumes drizzle transactions from RabbitMQ, converts them into objects by inspecting annotations, marshals the objects to JSON, and then stores the JSON strings. In this case it stores them to a set of WebSockets. RabbitReplication is deployed as a war file to Jetty 7.0.1, which supports WebSockets.


I set the demo up on my server at home (in Sweden) on a DSL line, so it might be slow, but it should show the idea; all operations are instant when latency is low. (If anyone wants to host it in a better place, please let me know.) Of course, it requires a WebSocket-capable browser, and the only one I know of is Google Chrome.

It works like this (a sketch of the applier's broadcast step follows the list):

  1. An INSERT is executed from the "drizzle client" webapp - a totally separate webapp that uses drizzle-jdbc to insert/update/delete data.
  2. Drizzle stores the transaction in the database and in the transaction log.
  3. The master extractor extracts the transaction and publishes it to RabbitMQ.
  4. The slave applier consumes the transaction from RabbitMQ.
  5. The applier transforms the transaction to JSON.
  6. The applier writes the JSON to a set of WebSockets.
  7. Javascript voodoo is performed to make it visible.
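
The broadcast step (6) is conceptually tiny. Below is a minimal sketch of the idea - not the actual RabbitReplication code, and the WebSocketConnection interface is a made-up stand-in for Jetty's real WebSocket API:

import java.io.IOException;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical abstraction - Jetty 7's actual WebSocket API looks different.
interface WebSocketConnection {
    void sendMessage(String message) throws IOException;
}

public class WebSocketApplier {
    // browsers register here when they connect; a CopyOnWriteArraySet makes
    // iteration safe while clients come and go
    private final Set<WebSocketConnection> sockets = new CopyOnWriteArraySet<WebSocketConnection>();

    public void register(WebSocketConnection socket) {
        sockets.add(socket);
    }

    // called once per transaction, after it has been marshalled to JSON
    public void apply(String json) {
        for (WebSocketConnection socket : sockets) {
            try {
                socket.sendMessage(json);
            } catch (IOException e) {
                sockets.remove(socket); // drop dead connections
            }
        }
    }
}
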
Possible real use cases
The demo app just shows what is possible, but a real use case could be someone with a drizzle-backed forum who wants to add real-time post updates to a front page somewhere. This would be really easy: simply start a new slave configured for WebSocket application (of course, RabbitReplication is already being used for other replication needs :) ), convert the JSON to something that makes sense, and they are set! If someone has a cool use case, please let me know and I'll build a more realistic demo app!

Wednesday, December 16, 2009

Better replication from drizzle to cassandra



Introduction
This article describes how one of the replication appliers in RabbitReplication works, namely the HashTransformer, which transforms each INSERT/UPDATE into a hashmap that is then stored in a column-family based storage, currently Cassandra. For a better overview of RabbitReplication, go check out earlier posts on the subject here: http://developian.blogspot.com


Configuration
This example replicates changes done to a table called test1 in the schema called unittests. RabbitReplication is configured to only replicate the columns id and test (yes, good example, I know...). The column id is used as a key. The following slave configuration is used for this use case:


replication_role = hashslave


rabbitmq.host = 10.100.100.50
rabbitmq.queuename = ReplicationQueue
rabbitmq.exchangename = ReplicationExchange
rabbitmq.routingkey = ReplicationRoutingKey
rabbitmq.password =
rabbitmq.username =
rabbitmq.virtualhost =
hashstore.host = localhost:9160
hashstore.type = cassandra


hashreplicator.replicate.unittests.test1 = id,test
hashreplicator.key.unittests.test1 = id
hashreplicator.keycolseparator.unittests.test1 = .


The hashreplicator rows are the interesting ones: they describe which columns to replicate, which columns form the primary key, and what separator to use between the columns when the key spans multiple columns.
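
To make that concrete, this is roughly what the transformation step has to do. A simplified sketch with hypothetical names, not the real HashTransformer:

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashTransformerSketch {
    private final List<String> replicatedColumns; // from hashreplicator.replicate...
    private final List<String> keyColumns;        // from hashreplicator.key...
    private final String keyColumnSeparator;      // from hashreplicator.keycolseparator...

    public HashTransformerSketch(List<String> replicatedColumns,
                                 List<String> keyColumns,
                                 String keyColumnSeparator) {
        this.replicatedColumns = replicatedColumns;
        this.keyColumns = keyColumns;
        this.keyColumnSeparator = keyColumnSeparator;
    }

    // row maps column name -> value for one INSERT/UPDATE in the transaction;
    // only the configured columns end up in the hash that is stored
    public Map<String, String> toHash(Map<String, String> row) {
        Map<String, String> hash = new LinkedHashMap<String, String>();
        for (String column : replicatedColumns) {
            if (row.containsKey(column)) {
                hash.put(column, row.get(column));
            }
        }
        return hash;
    }

    // builds the row key, joining multi-column keys with the configured separator
    public String toKey(Map<String, String> row) {
        StringBuilder key = new StringBuilder();
        for (String column : keyColumns) {
            if (key.length() > 0) {
                key.append(keyColumnSeparator);
            }
            key.append(row.get(column));
        }
        return key.toString();
    }
}

With the configuration above, a row {id=300, test=firstins, ignored=x} becomes the hash {id=300, test=firstins}, stored under the key "300".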


Example
Replicating an insert:
drizzle> use unittests
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Database changed
drizzle> desc test1;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| id      | int         | NO   | PRI | NULL    |       | 
| test    | varchar(10) | YES  |     | NULL    |       | 
| ignored | varchar(10) | YES  |     | NULL    |       | 
+---------+-------------+------+-----+---------+-------+
3 rows in set (0.02 sec)


drizzle> insert into test1 (id, test) values (300, "firstins");
Query OK, 1 row affected (0 sec)


Results in the following on the Cassandra side:
cassandra> get unittests.test1['300']
  (column=test, value=firstins; timestamp=1260985298391)
  (column=id, value=300; timestamp=1260985298385)
Returned 2 rows.


Update:
drizzle> update test1 set test = "updated" where id = 300;
Query OK, 1 row affected (0 sec)
Rows matched: 1  Changed: 1  Warnings: 0


Gives this in cassandra:
cassandra> get unittests.test1['300']
  (column=test, value=updated; timestamp=1260985526210)
  (column=id, value=300; timestamp=1260985298385)
Returned 2 rows.


Note that the timestamp for the id column is not updated (only the changed columns are written, not entire rows).


Delete:
drizzle> delete from test1 where id = 300;
Query OK, 1 row affected (0 sec)
And in Cassandra:
cassandra> get unittests.test1['300']
Returned 0 rows.


That's it! Go to http://launchpad.net/rabbitreplication to check out the code, report bugs or suggest features!

Thursday, December 10, 2009

Cassandra support in rabbitreplication

Just pushed support for replicating into cassandra to http://launchpad.net/rabbitreplication

The following format is used:
KeySpace = schema name from the transaction
ColumnFamily = table name from the transaction
Column name = "object" since we only store objects
Key = the key generated from the object to store, either by using the @Id annotation or by implementing the KeyAware interface
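
The KeyAware route could look something like the sketch below; the interface shown here is hypothetical (the real one in RabbitReplication may have a different signature):

// hypothetical sketch - check the RabbitReplication sources for the real interface
public interface KeyAware {
    String getKey();
}

// a managed class that decides its own row key instead of relying on @Id
class User implements KeyAware {
    private int id;

    public String getKey() {
        return String.valueOf(id); // row ends up under <schema>.<table>['<id>']
    }
}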

In the CLI you would type something like this to get the data (drizzle schema name is unittests and table is test1):


cassandra> get unittests.test1['1']['object']
==> (name=object, value={"name":"updated","ssn":1}; timestamp=1260472768425)

Monday, December 7, 2009

Drizzle persistence in Project Voldemort

I just built drizzle support into Project Voldemort:
http://github.com/krummas/voldemort - just add drizzle in your stores.xml and it should work

It is basically a cut-and-paste of the MySQL code; it uses drizzle-jdbc (which is included in the git repo).

To try it out, check out the code from github, execute "ant release" in the base dir and you get binaries in the dist directory.

Sunday, December 6, 2009

Replication from drizzle to memcached / project voldemort

For the last few days I've been working on a way to replicate changes from drizzle into a key-value store, currently Project Voldemort and memcached. It is built in my rabbit replication project, which means that the transactions are transferred over a message bus (currently rabbitmq). The picture below describes an example of how the involved components could be set up (it's not likely that you want both memcached and Project Voldemort though):




Current feature list of rabbitreplication:

  • Replication from drizzle into drizzle (or any database with a JDBC driver) / memcached / Project Voldemort.
  • Map inserts and updates onto Java objects using annotated classes (see below for an example).
  • Two different ways of marshalling objects: JSON and Java object serialization.
  • Full control over how the key is generated (just implement the KeyAware interface in your target object).
  • Simple interface to build new marshallers.
  • Simple interface to build new object stores.
  • Simple interface to build new transports. (I'll blog about these extension points later; a rough sketch of one follows below.)
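
Until those posts are written, here is the rough shape of one such extension point. This is a hypothetical sketch of a marshaller, not the actual RabbitReplication API:

// hypothetical sketch - the real interface may differ
public interface Marshaller {
    byte[] marshall(Object object);               // object -> bytes to store
    <T> T unmarshall(byte[] data, Class<T> type); // stored bytes -> object
}

A JSON implementation and a Java-serialization implementation of something like this are what would back the two marshalling options above.
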
Example:
The class below will catch any statements on the table unittests.test1, take the column "id" and set it on the ssn field, and take the "test" column and set it on the name field. It will use the field annotated with @Id as the key in the store and use the JSONMarshaller to marshal the object.




@Entity(schema = "unittests", table = "test1", marshaller = JSONMarshaller.class)
public class ExampleRepl {
    @Id
    @Column("id")
    private int ssn;

    @Column("test")
    private String name;

    /*...*/
}



Add this to your config to use it:
managed_classes = org.drizzle.managedclasses.ExampleRepl, ...


Then you just start your slave like this: 
java -jar replication.jar objectslave.properties


You need to put your managed classes on the classpath (drop them in the lib dir).


(See earlier posts about rabbitreplication on how to get started)


Todo:
  • Clean up the configuration, it's quite messy right now
  • Write blog posts about how to roll your own transport/marshalling/key-value store implementations
  • Expand the test suite and set up hudson for continuous integration
  • Write proper usage documentation
  • Build more backends, marshallers and transports; evolve the APIs
  • Write a MySQL binlog master (needs to transform the mysql binlog into drizzle's protobuf-based log, not even sure it is possible)
  • Create a way to avoid having to write code on the slave (pin tables to a hash and store it)
  • ...
Getting involved
  • Get the code, bzr branch lp:rabbitreplication
  • Use it, give me feedback (krummas@gmail.com) <- most important!
Download
http://marcus.no-ip.biz/rabbitrepl.zip (yes, I will soon set up a proper download page).




Monday, November 30, 2009

Update of RabbitMQ Replicator

The little Drizzle RabbitMQ replicator I built last week got quite popular (and even users!), so I decided to work a bit more on it. These are the changes:

  • Moved it to a real project on launchpad (http://launchpad.net/rabbitreplication)
  • Store the transaction log position on the master so that a transaction is never sent twice over the wire.
  • Major internal refactorings; started using Google Guice for IoC.
  • Made the transport pluggable; it should now be easy to write your own transport and swap out rabbitmq (JMS, anyone?).
  • A lot more configuration options for rabbitmq, see the example config.
User Guide:
  1. Download binaries here (or check out code: bzr branch lp:rabbitreplication)
  2. Unzip on master
  3. Edit master.properties to reflect your environment (hope the config vars are self-explanatory, let me know if there are any problems)
  4. Start the master: java -jar replication.jar master.properties
  5. Unzip on slave
  6. Edit slave.properties
  7. Start the slave: java -jar replication.jar slave.properties
  8. Watch changes replicate, report issues to me (krummas@gmail.com)
Todo/Ideas:
I'm thinking about using this code to build a framework for replicating changes from Drizzle into other storage forms, for example Hadoop/HBase or Cassandra. My thinking is that it could be useful for moving very hot data into faster storage without changing your application too much. This might require business logic on the "slave" (saying, for example, that this column should be stored and that one should be ignored), and that is something I think could be built quite nicely as a framework (a DSL could be handy).

Please let me know if this has been done somewhere or if it is a stupid idea!

Monday, November 23, 2009

Drizzle Replication using RabbitMQ as a transport

Having spent a bit of time learning how the transaction log in drizzle works (it is incredibly easy to work with), I got the idea to use RabbitMQ as a transport. RabbitMQ is an implementation of the AMQP standard in Erlang, so it must be awesome.

It works like this: on the master, a Java app simply tails the transaction log, sending all new transactions, in raw format, to a RabbitMQ server.

The slave(s) are connected to the messaging server and are guaranteed to get the raw messages. When a slave gets a message, it transforms it into JDBC prepared statements and executes the statements as a batch operation. The reason I use prepared statements and batch operations is that I get a lot for free from the JDBC driver, for example correct escaping of strings, and I can also enable the rewrite batch handler feature to get a great performance boost.

Another great thing we get for free by using RabbitMQ is fault tolerance: if an exception is thrown in the slave, the message is not acknowledged and it will be redelivered later.
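
The slave's consume/apply/ack loop boils down to something like the sketch below. It uses the RabbitMQ Java client (whose API has varied between versions), the queue name is just a placeholder, and applyAsJdbcBatch is a hypothetical stand-in for the prepared-statement logic described above:

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.QueueingConsumer;

public class SlaveLoop {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();

        QueueingConsumer consumer = new QueueingConsumer(channel);
        channel.basicConsume("ReplicationQueue", false, consumer); // autoAck=false

        while (true) {
            QueueingConsumer.Delivery delivery = consumer.nextDelivery();
            try {
                applyAsJdbcBatch(delivery.getBody());
                // ack only after the transaction has been applied successfully
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (Exception e) {
                // no ack -> the broker will redeliver the transaction later
            }
        }
    }

    // hypothetical: parse the raw transaction and execute it as a JDBC batch
    private static void applyAsJdbcBatch(byte[] rawTransaction) throws Exception {
    }
}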

Being able to write something like this in a few hours really shows how powerful the drizzle replication system is. It will be one of the killer features.

There is one issue I really need to fix, namely that the master does not keep track of which transactions it has sent over the wire, so it will resend all transactions in the log every time it is restarted. Of course, since this was written in a short time, there are probably lots of other issues as well. If you want a real replication solution, go check out Tungsten Replicator.

If anyone wants to contribute, the code is on launchpad: https://code.launchpad.net/~krummas/+junk/rabbitmq-replication

If you simply want to try it out, get the binaries here: http://marcus.no-ip.biz/rabbitrepl.zip. To start it, run java -jar replication.jar master.properties - just make sure you edit the .properties files before starting.

Tuesday, November 17, 2009

Tungsten Replicator and Drizzle howto

Last week I got a few hours to spend on making an extractor for Tungsten Replicator which works against the Drizzle transaction log. This post aims to explain how you use it to replicate changes between drizzle instances.

To get a better understanding of the drizzle replication system, please go read the article on Jay Pipes' blog, here.

Get the code
First, you need to check the code out from the tungsten sourceforge repo, like this:
svn co https://tungsten.svn.sourceforge.net/svnroot/tungsten/trunk

Then you need to download my patch, here. Unzip it in the trunk/replicator directory and apply it like this:
patch -p0 < drizzle_support.patch

This patch includes all dependencies and the applier I wrote about in the last post.

Build it
Now you need to build tungsten: change the working dir to the replicator directory and run
ant

If you get test errors, you can run ant allExceptJunit to skip the tests (some environment configuration is needed to get the test suite running).

The artifacts end up in the build/ directory.

Get and build Drizzle
Follow the instructions on the drizzle wiki, http://drizzle.org/wiki/Compiling, just make sure that you pass --with-transaction-log-plugin when you ./configure drizzle.

Start Drizzle
Follow the instructions on the drizzle wiki: http://drizzle.org/wiki/Starting_drizzled and add the parameters --transaction-log-enable --default-replicator-enable when starting drizzled, otherwise you won't get a transaction log.

Set up Tungsten Replicator
Using the binaries built before, read the instructions for MySQL on the Continuent page: http://www.continuent.com/community/tungsten-replicator/documentation

Then we need to make some changes to the configuration files. For the master (extractor), use this configuration file as a template; note that you must change the path to the transaction log. On the slave (applier), use this config file.

Now you should be good to go, start your drizzle and tungsten instances and watch the changes replicate.

Note that almost all the involved components (drizzle, drizzle-jdbc, the drizzle extractor and applier) are not recommended for production use yet.

As always, if you have any questions, shoot me an email (krummas@gmail.com) or ping me on #drizzle @ freenode

Thursday, October 29, 2009

Replication from MySQL to Drizzle using Tungsten Replicator

Last week I sat down and created a Drizzle Applier for Tungsten Replicator. This makes it possible to replicate changes from MySQL (for example) to Drizzle. An applier is used to apply changes on a slave. This blog post aims to explain how you could try it out.


  1. Download this patch
  2. Check out the latest tungsten code:
    svn co https://tungsten.svn.sourceforge.net/svnroot/tungsten/trunk/
  3. Apply the patch:
    patch -p0 < drizzle_applier.diff
    in the replicator directory
  4. Build it:
    ant
    in the replicator directory. This creates binaries in the replicator/build/ directory.
  5. Follow the instructions in the Tungsten Replicator documentation to set up the master and the slave (extractor and applier).
  6. Use this properties file as an example for your properties file on the slave (applier).
  7. Download this version of drizzle jdbc, and put it in the lib/ directory (of the extracted distribution)
  8. Start it up!
Until there is a mysqldump-converter (or similar) that can create mysql dumps that can be consumed by drizzle, it will be difficult to get a consistent starting point.

Note that this is a very early version of both the applier and the jdbc driver, so please report any bugs to me, krummas@gmail.com, and attach the log file from the log directory.

The code has been submitted to Continuent and will hopefully end up in a future official version.

I'm also currently working on an extractor for drizzle - i.e. making it possible to use Drizzle as a master. Stay tuned for that within a week or so.

Monday, September 28, 2009

Drizzle JDBC 0.5

I just pushed Drizzle-JDBC 0.5 to the maven repos, changelog:
  • fixes
    • setBytes in Blob not working
    • getTypeName in ResultSetMetaData
    • create timestamp based on a date field
    • isBeforeFirst() in ResultSets should return false if it is an empty result set
    • setObject should support java.util.Date
    • timestamp bug
    • if asking for generated keys and no query was executed - return empty result set
  • Support "ON DUPLICATE KEY UPDATE" in rewrite insert batch handler
  • Handle prepared statement placeholders within quotes and comments
  • Automatically run test suite for mysql as well as drizzle
  • Strip queries from comments in prepared statements
  • Use java.nio.ByteBuffer - better performance and cleaner code
  • More reports in Hudson (history lost though, apparently no way to convert a maven-project to a freestyle-project): http://marcus.no-ip.biz/hudson/job/drizzle-jdbc-freestyle/
It is available in the public maven repository: http://repo2.maven.org/maven2/org/drizzle/jdbc/drizzle-jdbc/0.5/

Tuesday, July 14, 2009

Drizzle JDBC 0.4 - Rewrite batch handler

I just pushed up Drizzle JDBC 0.4, which, most importantly, contains a rewrite batch handler (I guess much like the one Mark Matthews announced for Connector/J a couple of months ago). It gives the expected 10x+ batch insert improvement; on my laptop I can insert 1M "simple" rows in under 20 seconds.
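
The core trick is simple: instead of sending one INSERT per row, the handler rewrites the whole batch into a single multi-value INSERT. A stripped-down illustration of the rewrite, not the actual drizzle-jdbc code:

import java.util.List;

public class RewriteSketch {
    // turns "insert into t (a) values " + ["('x')", "('y')", "('z')"] into
    // "insert into t (a) values ('x'),('y'),('z')" - one round trip instead
    // of three. Assumes each value group is already escaped and quoted,
    // which the real handler takes care of.
    public static String rewrite(String queryPrefix, List<String> valueGroups) {
        StringBuilder sb = new StringBuilder(queryPrefix);
        for (int i = 0; i < valueGroups.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(valueGroups.get(i));
        }
        return sb.toString();
    }
}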

It is written as a plugin, so you need to do this to your Connection to enable it:
if (connection.isWrapperFor(DrizzleConnection.class)) {
    DrizzleConnection dc = connection.unwrap(DrizzleConnection.class);
    dc.setBatchQueryHandlerFactory(new RewriteParameterizedBatchHandlerFactory());
}

(Yes, it will eventually be possible to configure plugins via the connection string and by setting properties.) Please try it out and report any bugs you find!

I also wrote a Scala version of the plugin; it is not bundled with the 0.4 release though - if you are interested, it is on Launchpad. Scala is awesome, and I'm hoping to do some more Scala development in the near future.

This is the changelog for 0.4:
  • Findbug fixes
  • Internal refactorings
  • Make sure all tests pass against both MySQL and Drizzle
  • Batch handler API cleanups, details here.
  • Fix read of 16bit word (by Trond Norbye)
  • A query rewrite batch handler


Download 0.4 here.

Thursday, June 4, 2009

Pluggable Batch Update Handlers

Reading the awesome batch insert performance blog post by Mark Matthews last week (http://www.jroller.com/mmatthews/entry/speeding_up_batch_inserts_for) got me thinking: why has this not been done before? Connector/J must be the most deployed JDBC driver in the world and batch inserts are a common use case, so why hasn't the community stepped up and implemented the query rewrite feature before? Most likely because it is a complex issue that requires deep knowledge of the rest of the driver. I have been a happy Connector/J user myself for several years and never considered doing something like this.


To handle this complexity, the batch query functionality in drizzle-jdbc is pluggable, i.e. you can implement a small interface and tell the connection to use that implementation. So, if anyone out there has some crazy ideas about how to improve the performance of batch inserts/updates, it should be fairly easy to try them.


First you need to implement the ParameterizedBatchHandler interface, which has two methods:

void addToBatch(ParameterizedQuery query);
int[] executeBatch(Protocol protocol) throws QueryException;


  • addToBatch is called when addBatch() is called on the PreparedStatement, i.e. when someone wants to add the current set of parameters in a prepared statement to the current batch. The query parameter contains all the information you need to do something smart.

  • executeBatch is called when executeBatch() is called on the PreparedStatement. The protocol passed to this method should be used to send the queries to the server (though you could also open new connections to the server, fork a few threads and send queries in parallel).

Then, to make the connection use your handler:

Connection connection = DriverManager.getConnection("jdbc:drizzle://localhost:4427/test_units_jdbc");
if (connection.isWrapperFor(DrizzleConnection.class)) {
    DrizzleConnection dc = connection.unwrap(DrizzleConnection.class);
    dc.setBatchQueryHandler(VerrrryFastBatchHandler.class);
}
PreparedStatement ps = connection.prepareStatement("insert into asdf (somecol) values (?)");
ps.setString(1, "aa");
ps.addBatch();
ps.executeBatch();



The current implementation in drizzle-jdbc simply stores all queries in a list, and when executeBatch is called, the queries are sent, one by one, to the server. I'm planning on doing a rewrite handler in the near future.
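
As a reference point, that naive one-by-one handler could look roughly like this. ParameterizedBatchHandler is the interface shown above; the executeQuery method on Protocol is a placeholder, so check the actual sources for the real calls:

import java.util.ArrayList;
import java.util.List;

public class OneByOneBatchHandler implements ParameterizedBatchHandler {
    private final List<ParameterizedQuery> queries = new ArrayList<ParameterizedQuery>();

    public void addToBatch(ParameterizedQuery query) {
        queries.add(query);
    }

    public int[] executeBatch(Protocol protocol) throws QueryException {
        int[] updateCounts = new int[queries.size()];
        for (int i = 0; i < queries.size(); i++) {
            // placeholder call - sends each query to the server in turn
            updateCounts[i] = protocol.executeQuery(queries.get(i));
        }
        queries.clear();
        return updateCounts;
    }
}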

Look at these files for more information:

Monday, June 1, 2009

Drizzle-JDBC 0.3

I just pushed up 0.3 of drizzle-jdbc to the maven repository, go here to download. It will soon be synced to the official maven repository.


Changes from 0.2 to 0.3:

  • Add the services file to make the driver autoload.
  • Throw proper JDBC4 SQLExceptions.
  • Make blobs + getObject() work against new versions of drizzled.
  • Fix bug with generated keys and prepared statements.
  • Fix bug with prepared statements and adding parameters several times (fix by Trond Norbye).
  • Name the packet fetcher thread.
  • Fix bug with getSchemas() - returned columns in wrong order.
  • Rework of the packet fetching by Trond.
  • Pluggable batch query handlers (I'll blog about this soon)

Friday, May 29, 2009

JavaOne sessions

I'm attending JavaOne this year; it starts on Tuesday. These are the sessions I'm looking forward to the most (in no particular order):


Return of the Puzzlers: Schlock and Awe (TS-5186)
Josh Bloch and Neal Gafter present a couple of interesting programming puzzles; it will be fun.

Effective Java: Still effective after all these years (TS-5217)
Again, Joshua Bloch. His book "Effective Java" is one of the best programming books I've read.

Java Platform Concurrency Gotchas (TS-4863)
Concurrency is a difficult subject; this session presents common pitfalls with code examples.

Ghost in the virtual machine: A reference to references (TS-5245)
Bob is the author of Google Guice (see below) and, from what I've heard, he is also a great speaker.

Defective Java Code: Mistakes that matter (TS-5335)
About bugs in java code, static analysis etc.

Drizzle: A new database for the cloud (TS-5410)
Drizzle is a great project with positive, helpful people.

JDBC? We don't need no stinkin' JDBC: How LinkedIn scaled with memcached, SOA and a bit of SQL. (TS-4696)
Always interesting to hear how big companies handle scalability issues; it's also really relevant to my job.

Introduction to Google Guice: The Java Programming language is fun again (TS-5434)
Google Guice really opened my eyes to what you can do with Java. The subject of the talk is so true: it shows what Java is capable of and that Java can be fun. This is how modern Java development should be done.

Sunday, May 10, 2009

Drizzle-JDBC in central maven repository

Releases of Drizzle-JDBC are now in the central maven repository, which makes it very easy to use if you are building your projects with maven (or ant + ivy).

Just add the following dependency to your pom.xml:
<dependency>
    <groupId>org.drizzle.jdbc</groupId>
    <artifactId>drizzle-jdbc</artifactId>
    <version>0.1</version>
</dependency>

And you are good to go; maven will download the dependencies for you!

Note: drizzle-jdbc uses SLF4J for logging, so if you don't already use it you'll need to add a dependency on one of the logging implementations as well:

<dependency>
    <groupId>org.slf4j</groupId>
    <artifactId>slf4j-nop</artifactId>
    <version>1.5.6</version>
</dependency>

Monday, April 27, 2009

jHeidi drizzle support

Bill Culp over at jheidi.com just released a version of jHeidi that supports Drizzle, using drizzle-jdbc - it's cool to see the driver in action! Go try it out.

Tuesday, April 21, 2009

Initial MySQL support and the JDBC4 wrapper interface

It is now possible to use drizzle-jdbc to execute queries against a MySQL database. The changes needed to support this were very small; the only differences (right now) in the protocol are that the number of commands has been reduced in drizzle, and that the server capabilities enum has been modified to mirror the capabilities in drizzle. In the near future drizzle will get a brand new protocol though, so I made the decision to separate the main protocol classes and share whatever can be shared outside them.

To use drizzle-jdbc against MySQL, check this example:

try {
    Class.forName("org.drizzle.jdbc.Driver");
} catch (ClassNotFoundException e) {
    throw new SQLException("Could not load driver");
}
// let's hope the big O doesn't sue me for using "thin" in the connection string:
Connection connection =
    DriverManager.getConnection("jdbc:mysql:thin://slaveusr@localhost:3306/test_units_jdbc");
Statement statement = connection.createStatement();
statement.execute("INSERT INTO a VALUES (1)");


JDBC4 introduced the wrapper interface so that a user of a JDBC driver can access vendor-specific functionality in the driver. To test it, I implemented the COM_BINLOG_DUMP command against MySQL; it works like this:

if (connection.isWrapperFor(ReplicationConnection.class)) {
    ReplicationConnection rc = connection.unwrap(ReplicationConnection.class);
    List<RawPacket> rpList = rc.startBinlogDump(891, "mysqld-bin.000001");
    for (RawPacket rp : rpList) {
        for (byte b : rp.getRawBytes()) {
            System.out.printf("0x%x ", b);
        }
        System.out.printf("\n");
    }
}

Of course, this is just a proof of concept; it gives you a list of the raw events in the binlog on the server. That is not really usable by itself, but it shows how to build vendor-specific features and access them through the JDBC4 wrapper interface. My plan is to parse the events from the binlog and make them available on a per-request basis so we don't need to store all events from the server binlog in memory.

Wednesday, April 1, 2009

Note about performance in drizzle-jdbc

OK, I don't really like optimizing early, but this work made me learn to use blocking queues and executor services. This blog post is very unscientific and does not provide any raw numbers; you can generate them yourself - look at the LoadTest test case in drizzle-jdbc. The MySQL parts are commented out since I want the tests to pass in hudson.

First, I ran a few comparisons between drizzle-jdbc against drizzle and Connector/J against MySQL, using the same test case: insert 1000 rows, then read them back 1000 times. Connector/J was 5-10 times faster - then I remembered that tables in mysql get MyISAM as the default storage engine (doh). When I created the tables with InnoDB, performance was still a lot better with Connector/J, but the difference was not as big (C/J was approx 50% faster).

The first optimization I did was to start fetching packets (communication between drizzle and the client is done using packets, for example row packets, ok packets, error packets etc.) in a background thread and putting them on a BlockingQueue. To manage the background thread I use an ExecutorService. Whenever a packet is available on the stream, it is read and put on the queue. The packet is a raw packet, i.e. the executor simply reads the length and stores the raw bytes in a "RawPacket" on the queue. The gain from this is that while the protocol figures out what kind of packet it is, the next packet is already being prepared on the queue. Look here for the code of the AsyncPacketFetcher. I also built a SyncPacketFetcher, which does not use a background thread or queue; it simply creates the RawPackets directly when called.
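
In code, the pattern is the classic producer/consumer setup. A condensed sketch of the idea, not the actual AsyncPacketFetcher (RawPacket's length handling is simplified here):

import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class AsyncFetcherSketch {
    private final BlockingQueue<RawPacket> packets = new LinkedBlockingQueue<RawPacket>();
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    public AsyncFetcherSketch(final InputStream stream) {
        executor.submit(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        // producer: read raw packets off the wire and queue them
                        packets.put(RawPacket.read(stream));
                    }
                } catch (IOException e) {
                    // stream closed - stop fetching
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
    }

    // consumer: the protocol layer blocks here while the next packet is
    // already being prepared in the background
    public RawPacket getRawPacket() throws InterruptedException {
        return packets.take();
    }
}

class RawPacket {
    final byte[] bytes;

    RawPacket(byte[] bytes) {
        this.bytes = bytes;
    }

    // simplified: the real driver reads a multi-byte length header first
    static RawPacket read(InputStream stream) throws IOException {
        int length = stream.read();
        if (length < 0) {
            throw new IOException("connection closed");
        }
        byte[] buf = new byte[length];
        int off = 0;
        while (off < length) {
            int read = stream.read(buf, off, length - off);
            if (read < 0) {
                throw new IOException("unexpected end of stream");
            }
            off += read;
        }
        return new RawPacket(buf);
    }
}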

With the packet fetching pushed to a background thread, another natural optimization would be to return the result set from the protocol with a reference to the packet fetcher before starting to read RowPackets, and only consume the queue when the user of drizzle-jdbc wants to read a row. Then the user could start processing result sets before all rows have been read back from the server.

The second optimization was to start prefetching data when it is available on the input stream, i.e. read as much as possible whenever reading, and use the locally buffered bytes on subsequent read() calls. A similar technique is used in Connector/J. Look here for the code.

All in all, these optimizations take drizzle-jdbc close to the performance of Connector/J (not all the way though, something like 5-10% off). But I want to stay away from any more optimizing until the new protocol is implemented in drizzle. It is also likely that the configuration of my local drizzled is not as good as the configuration of the mysql server I use. All these changes take the design of drizzle-jdbc closer to what I want for when the async stuff is available in drizzle.

Saturday, March 21, 2009

Using EnumSet when communicating over the network

Many applications use bitmasks to send flags over the wire since they are a very efficient way of representing settings; the problem is that the code to handle them is quite hard to read and maintain, on both the server and the client side. To overcome this, EnumSet in Java can be used: if the flags that are masked into the value sent are represented as enums, the bitmasked value can be seen as a set of enums. This concept is used in drizzle-jdbc.

For example, to represent server capabilities the following enum is used:

public enum ServerCapabilities {
    LONG_PASSWORD((short) 1),         /* new more secure passwords */
    FOUND_ROWS((short) 2),            /* Found instead of affected rows */
    LONG_FLAG((short) 4),             /* Get all column flags */
    CONNECT_WITH_DB((short) 8),       /* One can specify db on connect */
    NO_SCHEMA((short) 16),            /* Don't allow database.table.column */
    COMPRESS((short) 32),             /* Can use compression protocol */
    ODBC((short) 64),                 /* Odbc client */
    LOCAL_FILES((short) 128),         /* Can use LOAD DATA LOCAL */
    IGNORE_SPACE((short) 256),        /* Ignore spaces before '(' */
    CLIENT_PROTOCOL_41((short) 512),  /* New 4.1 protocol */
    INTERACTIVE((short) 1024),        /* This is an interactive client */
    SSL((short) 2048),                /* Switch to SSL after handshake */
    IGNORE_SIGPIPE((short) 4096),     /* IGNORE sigpipes */
    TRANSACTIONS((short) 8192),       /* Client knows about transactions */
    RESERVED((short) 16384),          /* Old flag for 4.1 protocol */
    SECURE_CONNECTION((short) 32768), /* New 4.1 authentication */
    ...

    private final short bitmapFlag; // holds the value

    ServerCapabilities(short i) {
        this.bitmapFlag = i;
    }

    public static short fromSet(Set<ServerCapabilities> capabilities) {
        short retVal = 0;
        for (ServerCapabilities cap : capabilities) {
            retVal = (short) (retVal | cap.getBitmapFlag());
        }
        return retVal;
    }

    public static Set<ServerCapabilities> getServerCapabilitiesSet(short i) {
        Set<ServerCapabilities> statusSet = EnumSet.noneOf(ServerCapabilities.class);
        for (ServerCapabilities value : ServerCapabilities.values()) {
            if ((i & value.getBitmapFlag()) == value.getBitmapFlag()) {
                statusSet.add(value);
            }
        }
        return statusSet;
    }
}


This enum contains all the capabilities of the drizzle server we are talking to. The value in the enum constructor is the flag's bit in the value sent over the network. Two convenience methods are implemented in the enum: one takes the actual bitmasked value and creates an enum set, and one takes an enum set and creates a bitmasked value. fromSet creates a short containing the flags set, and getServerCapabilitiesSet creates an EnumSet containing the enums represented by i.

Now it is possible to handle bitmasked values like this:

short bitmask = readShortFromNetwork(); // yeah, not really, but hey
Set<ServerCapabilities> serverCapabilities = ServerCapabilities.getServerCapabilitiesSet(bitmask);
if (serverCapabilities.containsAll(EnumSet.of(ServerCapabilities.TRANSACTIONS, ServerCapabilities.SSL))) {
    doSomethingSecure();
}


Also, when sending something to the server:


Set<ServerCapabilities> capabilities = EnumSet.of(ServerCapabilities.TRANSACTIONS, ServerCapabilities.SSL);
writeShortToNetwork(ServerCapabilities.fromSet(capabilities));


So, what we've got is a type-safe way of sending and receiving flags to/from a server. Internally, the enum set is stored in a long, so performance-wise it is close to equivalent to handling the bit operations yourself.

Look at these files for usage examples:

DrizzleProtocol.java

ServerCapabilities.java

Thursday, March 19, 2009

Introducing Drizzle JDBC

For the last couple of months I've been working on a JDBC driver for Drizzle; the driver is available here. It is likely that Connector/J will support drizzle in the future, so the aim for this driver is to be a lightweight, BSD-licensed alternative that is easy to develop.

To build the driver, do the following (java 6 is required):

$ bzr branch lp:drizzle-jdbc
$ mvn compile

If you want to package the driver into something usable, you need to have a drizzle server on your localhost with a database called test_units_jdbc, then you can do:

$ mvn package

And you will get a .jar file in the target directory. There are also nightly builds available in hudson.

The only current dependency is slf4j, so you need to put the api jar and one of the implementations on the classpath; check here for more help. The reason I use slf4j is that I don't want to impose a logging mechanism on the users, and I don't want to build something that scans the classpath for known logging implementations etc.

Below is an example of how to use it:

Connection connection = DriverManager.getConnection("jdbc:drizzle://localhost:4427/test_units_jdbc");
Statement statement = connection.createStatement();
statement.executeUpdate("INSERT INTO t1 (id) values (1)");

The connection string is of the form:
jdbc:drizzle://<username>:<password>@<host>:<port>/<database>

Notes about using this driver:
  • It is under heavy development and many features expected from a JDBC driver are not yet implemented. Of course, since it is very early in the development, there will be bugs!
  • Currently only Java6 is supported (JDBC4 was released with java6)
  • The internal protocols and APIs will change when the new asynchronous server API is available. This should improve performance in the driver a lot.
  • Authentication is not supported; drizzle does not do authentication out of the box, and the way of doing it is likely to change when the new protocol is in place.
  • The driver reads back the full result set into memory and there is currently no way of not doing this.
  • Prepared statements are
  • It does work with hibernate and hibernate-jpa
  • It can be set up as a datasource in JBoss
  • It could, in theory, be used against MySQL as well
  • Bug reports / feature requests are needed! Also, if anyone is interested in contributing, please do so - contact me on IRC for example: freenode, #drizzle, my nick is marcuse