Monday, November 30, 2009

Update of RabbitMQ Replicator

The little Drizzle RabbitMQ replicator I built last week got quite popular (and even users!), so I decided to work a bit more on it. These are the changes:

  • Moved it to a real project on Launchpad (http://launchpad.net/rabbitreplication)
  • Store transaction log position on master so that a transaction is never sent twice over the wire.
  • Major internal refactoring; started using Google Guice for IoC
  • Made the transport pluggable. It should now be easy to write your own transport and swap out RabbitMQ. JMS, anyone?
  • A lot more configuration options for RabbitMQ; see the example config
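
To illustrate the idea of storing the transaction log position on the master, here is a minimal sketch. It is not the code from the project: the class name LogPositionStore, the plain-text file format, and the use of a byte offset as the "position" are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper (not from the project): remembers how far into the
// transaction log the master has already sent.
class LogPositionStore {
    private final Path file;

    LogPositionStore(Path file) {
        this.file = file;
    }

    // Returns the last saved offset, or 0 if nothing has been saved yet.
    long load() throws IOException {
        if (!Files.exists(file)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 0L : Long.parseLong(text);
    }

    // Writes the offset to a temporary file and then renames it over the old one,
    // so a crash in the middle of a write does not leave a half-written position.
    // (On POSIX file systems the rename replaces an existing target.)
    void save(long offset) throws IOException {
        Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
        Files.write(tmp, Long.toString(offset).getBytes(StandardCharsets.UTF_8));
        Files.move(tmp, file, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

On startup the master would call load() and resume reading the log from that offset, then call save() after each transaction has been handed to the transport.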
User Guide:
  1. Download binaries here (or check out code: bzr branch lp:rabbitreplication)
  2. Unzip on master
  3. Edit master.properties to reflect your environment (hope the config vars are self-explanatory, let me know if there are any problems)
  4. Start the master: java -jar replication.jar master.properties
  5. Unzip on slave
  6. Edit slave.properties
  7. Start the slave: java -jar replication.jar slave.properties
  8. Watch changes replicate, report issues to me (krummas@gmail.com)
Todo/Ideas:
I'm thinking about using this code to build a framework for replicating changes from Drizzle into other storage forms, for example Hadoop/HBase or Cassandra. My thinking is that it could be useful for moving very hot data into faster storage without changing your application too much. This might need business logic implemented on the "slave" (saying, for example, that this column should be stored and that one should be ignored), and that is what I think could be built quite nicely using a framework (a DSL could be handy).
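
As a rough illustration of what such slave-side business logic could look like, here is a sketch of a rule that keeps some columns and drops the rest. Nothing like this exists in the project yet; the ColumnFilter class and the representation of a row as a map from column name to value are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical rule: keep only the listed columns of a replicated row.
class ColumnFilter {
    private final Set<String> keep;

    ColumnFilter(Set<String> keep) {
        this.keep = keep;
    }

    // Returns a copy of the row containing only the columns that should be
    // stored, in their original order. The input row is left untouched.
    Map<String, Object> apply(Map<String, Object> row) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : row.entrySet()) {
            if (keep.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

A DSL would essentially be a nicer way of declaring rules like this one per table.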

Please let me know if this has been done somewhere or if it is a stupid idea!

Monday, November 23, 2009

Drizzle Replication using RabbitMQ as a transport

Having spent a bit of time learning how the transaction log in Drizzle works (and it is incredibly easy to work with), I got the idea to use RabbitMQ as a transport. RabbitMQ is an implementation of the AMQP standard in Erlang, so it must be awesome.

It works like this: on the master, a Java app simply tails the transaction log and sends all new transactions, in raw format, to a RabbitMQ server.
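
The tailing step can be sketched roughly as below. This is not the project's code and it does not parse the real Drizzle transaction log: the framing (a 4-byte big-endian length followed by the payload) is an assumption made only to keep the example self-contained, and the transport is a plain Consumer standing in for the RabbitMQ publisher.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.function.Consumer;

class LogTailer {
    // Reads every complete record found at or after startOffset, hands each raw
    // payload to the transport, and returns the offset just past the last
    // complete record. Call it again later with that offset to pick up new data.
    static long readNewRecords(String path, long startOffset, Consumer<byte[]> transport)
            throws IOException {
        try (RandomAccessFile log = new RandomAccessFile(path, "r")) {
            long offset = startOffset;
            long length = log.length();
            while (length - offset >= 4) {
                log.seek(offset);
                int size = log.readInt();      // ASSUMED 4-byte length prefix
                if (size < 0) {
                    throw new IOException("Corrupt record length at offset " + offset);
                }
                if (length - offset - 4 < size) {
                    break;                     // record is still being written
                }
                byte[] payload = new byte[size];
                log.readFully(payload);
                transport.accept(payload);     // e.g. publish to the message broker
                offset += 4 + size;
            }
            return offset;
        }
    }
}
```

Calling readNewRecords in a loop with a short sleep between calls gives the "tail" behaviour; a partially written record is simply left for the next pass.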

The slave(s) are connected to the messaging server and are guaranteed to get the raw messages. When the slave gets the message, it transforms it to a JDBC prepared statement and executes every statement as a batch operation. The reason I use prepared statements and batch operations is that I get a lot for free from the JDBC driver, for example correct escaping of strings etc, and I can also enable the rewrite batch handler feature to get a great performance boost.
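
The prepared-statement-plus-batch approach looks roughly like the sketch below. It uses only the standard java.sql API; the BatchApplier class, its method names, and the way rows are passed in as lists of values are assumptions made for this example, not the project's actual code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

class BatchApplier {
    // Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?)". Table and column names
    // are assumed to come from the replicated message, not from user input.
    static String insertSql(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns)
                + ") VALUES (" + placeholders + ")";
    }

    // Binds each row to the same prepared statement and sends all rows as one
    // batch. The driver takes care of escaping the bound values.
    static int[] applyInserts(Connection conn, String table, List<String> columns,
                              List<List<Object>> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(insertSql(table, columns))) {
            for (List<Object> row : rows) {
                for (int i = 0; i < row.size(); i++) {
                    ps.setObject(i + 1, row.get(i));   // JDBC parameters are 1-based
                }
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

Because values are bound rather than concatenated into the SQL string, a value such as O'Brien needs no manual escaping.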

Another great thing we get for free by using RabbitMQ is fail-safety: if an exception is thrown in the slave, the message is not acknowledged and it will be retried later.
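
The acknowledge-only-on-success pattern can be sketched as follows. To keep the example self-contained, the Acker and MessageHandler interfaces are hypothetical stand-ins; with the RabbitMQ Java client the acknowledgement would go through Channel.basicAck instead.

```java
// Hypothetical stand-in for the messaging client's acknowledge call.
interface Acker {
    void ack(long deliveryTag);
}

// Hypothetical stand-in for "turn the message into statements and execute them".
interface MessageHandler {
    void handle(byte[] body) throws Exception;
}

class SlaveConsumer {
    // Applies the message and acknowledges it only if applying succeeded.
    // Returns true if the message was acknowledged. On failure no ack is sent,
    // so the broker still considers the message unacknowledged and can redeliver it.
    static boolean process(long deliveryTag, byte[] body,
                           MessageHandler handler, Acker acker) {
        try {
            handler.handle(body);
        } catch (Exception e) {
            return false;
        }
        acker.ack(deliveryTag);
        return true;
    }
}
```

The important detail is the ordering: the ack happens after the database work, never before it.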

Being able to write something like this in a few hours really shows how powerful the Drizzle replication system is. It will be one of the killer features.

There is one issue I really need to fix, namely that the master does not keep track of which transactions it has sent over the wire, so it will resend all transactions in the log every time it is restarted. Of course, since this was written in a short time, there are probably lots of other issues as well. If you want a real replication solution, go check out Tungsten Replicator.

If anyone wants to contribute, the code is on launchpad: https://code.launchpad.net/~krummas/+junk/rabbitmq-replication

If you simply want to try it out, get the binaries here: http://marcus.no-ip.biz/rabbitrepl.zip To start it, you do java -jar replication.jar master.properties - just make sure you edit the .properties files before starting.

Tuesday, November 17, 2009

Tungsten Replicator and Drizzle howto

Last week I got a few hours to spend on making an extractor for Tungsten Replicator that works against the Drizzle transaction log. This post aims to explain how to use it to replicate changes between Drizzle instances.

To get a better understanding of the Drizzle replication system, please go read the article on Jay Pipes' blog, here.

Get the code
First, you need to check the code out from the tungsten sourceforge repo, like this:
svn co https://tungsten.svn.sourceforge.net/svnroot/tungsten/trunk

Then you need to download my patch, here. Unzip it in the trunk/replicator directory and apply it like this:
patch -p0 < drizzle_support.patch

This patch includes all dependencies and the applier I wrote about in the last post.

Build it
Now you need to build Tungsten. Change the working directory to the replicator directory and run:
ant

If you get test errors, you can run ant allExceptJunit to skip the tests (some environment configuration is needed to get the test suite running).

The artifacts end up in the build/ directory.

Get and build Drizzle
Follow the instructions on the Drizzle wiki, http://drizzle.org/wiki/Compiling, and make sure that you pass --with-transaction-log-plugin when you run ./configure for Drizzle.

Start Drizzle
Follow the instructions on the Drizzle wiki: http://drizzle.org/wiki/Starting_drizzled and add the parameters --transaction-log-enable --default-replicator-enable when starting drizzled, otherwise you won't get a transaction log.

Set up Tungsten Replicator
Using the binaries built before, read the instructions for MySQL on the Continuent page: http://www.continuent.com/community/tungsten-replicator/documentation

Then we need to make some changes to the configuration files. For the master (extractor), use this configuration file as a template. Note that you must change the path to the transaction log. On the slave (applier), use this config file.

Now you should be good to go. Start your Drizzle and Tungsten instances and watch the changes replicate.

Note that almost all of the components involved (Drizzle, drizzle-jdbc, the Drizzle extractor and applier) are not yet recommended for production use.

As always, if you have any questions, shoot me an email (krummas@gmail.com) or ping me on #drizzle @ freenode