Monday, December 28, 2009

Drizzle replication to WebSockets

I just pushed a WebSocket applier to RabbitReplication, and yes, it is as crazy as it sounds. It works pretty much like all the other appliers: it consumes drizzle transactions from RabbitMQ, converts them into objects by inspecting annotations, marshals each object to JSON, and then stores the JSON string. In this case it "stores" it to a set of websockets. RabbitReplication is deployed as a war file to Jetty 7.0.1, which supports websockets.


I set the demo up on my server at home (in Sweden) on a DSL line, so it might be slow, but it should show the idea; all operations are instant when latency is low. (If anyone wants to host it at a better place, please let me know.) Of course, it requires a WebSocket-capable browser, and the only one I know of is Google Chrome.

It works like this:

  1. INSERT is executed from the "drizzle client" webapp, a totally separate webapp that uses drizzle jdbc to insert/update/delete data.
  2. Drizzle stores the transaction in the database and in the transaction log.
  3. Master extractor extracts the transaction and publishes it to RabbitMQ
  4. Slave applier consumes the transaction from RabbitMQ
  5. Applier transforms the transaction to JSON
  6. Applier writes the JSON to a set of websockets
  7. Javascript voodoo is performed to make it visible
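
Steps 5 and 6 can be sketched roughly as follows. This is a minimal illustration and not RabbitReplication's actual applier: the class name JsonBroadcaster, the use of Consumer<String> as a stand-in for an open websocket connection, and the tiny hand-rolled JSON encoder (flat objects only, minimal escaping) are all assumptions made to keep the example self-contained.

```java
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.concurrent.CopyOnWriteArraySet;
import java.util.function.Consumer;

// Hypothetical sketch: turns a row into JSON and fans it out to every socket.
final class JsonBroadcaster {
    // Each Consumer<String> stands in for one open websocket connection.
    private final Set<Consumer<String>> sockets = new CopyOnWriteArraySet<>();

    void register(Consumer<String> socket) { sockets.add(socket); }

    void unregister(Consumer<String> socket) { sockets.remove(socket); }

    // Step 5: encode a flat column/value map as a JSON object.
    static String toJson(Map<String, Object> row) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : row.entrySet()) {
            Object v = e.getValue();
            String value = (v == null || v instanceof Number || v instanceof Boolean)
                    ? String.valueOf(v)
                    : quote(String.valueOf(v));
            json.add(quote(e.getKey()) + ":" + value);
        }
        return json.toString();
    }

    // Escapes only backslashes and double quotes; enough for a sketch.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Step 6: write the same JSON string to every registered socket.
    void apply(Map<String, Object> row) {
        String json = toJson(row);
        for (Consumer<String> socket : sockets) {
            socket.accept(json);
        }
    }
}
```

A copy-on-write set is used here so that browsers can connect and disconnect while a transaction is being broadcast; a real applier would also have to handle sockets that fail mid-write.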

Possible real use cases
The demo app just shows what is possible, but a real use case could be that someone has a drizzle-backed forum and wants to add real-time post updates to a front page somewhere. This would be really easy: simply start a new slave configured for WebSocket application (of course RabbitReplication is already used for other replication needs :) ), convert the JSON to something that makes sense, and you are set! If someone has a cool use case, please let me know and I'll build a more realistic demo app!

Wednesday, December 16, 2009

Better replication from drizzle to cassandra



Introduction
This article describes how one of the replication appliers in rabbitreplication works, namely the HashTransformer, which transforms each INSERT/UPDATE into a hashmap that is then stored in a column-family based storage, currently Cassandra. For a better overview of RabbitReplication, go check out earlier posts on the subject here: http://developian.blogspot.com


Configuration
This example replicates changes done to a table called test1 in the schema called unittests. RabbitReplication is configured to only replicate the columns id and test (yes, good example, I know...). The column id is used as a key. The following slave configuration is used for this use case:


replication_role = hashslave


rabbitmq.host = 10.100.100.50
rabbitmq.queuename = ReplicationQueue
rabbitmq.exchangename = ReplicationExchange
rabbitmq.routingkey = ReplicationRoutingKey
rabbitmq.password =
rabbitmq.username =
rabbitmq.virtualhost =
hashstore.host = localhost:9160
hashstore.type = cassandra


hashreplicator.replicate.unittests.test1 = id,test
hashreplicator.key.unittests.test1 = id
hashreplicator.keycolseparator.unittests.test1 = .


The hashreplicator rows are the interesting ones. They describe which columns to replicate, which columns make up the primary key, and what separator to use between the column values when the key spans multiple columns.
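
To illustrate how the key and keycolseparator settings could combine, here is a minimal sketch. It is not taken from the RabbitReplication source: the class HashKeyBuilder and its method names are hypothetical, and it assumes the key is simply the key column values joined with the separator, in the order they are listed in the configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a row key from the configured key columns.
final class HashKeyBuilder {

    // Splits a config value such as "id,test" into column names.
    static List<String> parseColumns(String configValue) {
        List<String> columns = new ArrayList<>();
        for (String part : configValue.split(",")) {
            String name = part.trim();
            if (!name.isEmpty()) {
                columns.add(name);
            }
        }
        return columns;
    }

    // Joins the values of the key columns with the configured separator.
    static String buildKey(Map<String, Object> row, List<String> keyColumns, String separator) {
        StringJoiner key = new StringJoiner(separator);
        for (String column : keyColumns) {
            Object value = row.get(column);
            if (value == null) {
                throw new IllegalArgumentException("missing key column: " + column);
            }
            key.add(String.valueOf(value));
        }
        return key.toString();
    }
}
```

With the configuration above (key column id, separator "."), a row with id 300 would get the key "300"; a hypothetical two-column key of id and test would yield something like "300.firstins".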


Example
Replicating an insert:
drizzle> use unittests
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A


Database changed
drizzle> desc test1;
+---------+-------------+------+-----+---------+-------+
| Field   | Type        | Null | Key | Default | Extra |
+---------+-------------+------+-----+---------+-------+
| id      | int         | NO   | PRI | NULL    |       | 
| test    | varchar(10) | YES  |     | NULL    |       | 
| ignored | varchar(10) | YES  |     | NULL    |       | 
+---------+-------------+------+-----+---------+-------+
3 rows in set (0.02 sec)


drizzle> insert into test1 (id, test) values (300, "firstins");
Query OK, 1 row affected (0 sec)


Results in the following on the Cassandra side:
cassandra> get unittests.test1['300']
  (column=test, value=firstins; timestamp=1260985298391)
  (column=id, value=300; timestamp=1260985298385)
Returned 2 rows.


Update:
drizzle> update test1 set test = "updated" where id = 300;
Query OK, 1 row affected (0 sec)
Rows matched: 1  Changed: 1  Warnings: 0


Gives this in cassandra:
cassandra> get unittests.test1['300']
  (column=test, value=updated; timestamp=1260985526210)
  (column=id, value=300; timestamp=1260985298385)
Returned 2 rows.


Note that the timestamp for the id column is not updated: only the columns that actually changed are written, not the entire row.
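
The per-column timestamp behaviour can be modelled with a small in-memory stand-in. This is only a sketch to show the idea; the class TimestampedColumns is hypothetical, it does not talk to Cassandra, and it assumes the applier passes in just the columns contained in the replicated statement.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of a column family with per-column timestamps.
final class TimestampedColumns {

    static final class Cell {
        final Object value;
        final long timestamp;

        Cell(Object value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // row key -> (column name -> cell)
    private final Map<String, Map<String, Cell>> rows = new HashMap<>();

    // Writes only the supplied columns; untouched columns keep their old timestamp.
    void write(String key, Map<String, Object> changedColumns, long timestamp) {
        Map<String, Cell> columns = rows.computeIfAbsent(key, k -> new HashMap<>());
        for (Map.Entry<String, Object> e : changedColumns.entrySet()) {
            columns.put(e.getKey(), new Cell(e.getValue(), timestamp));
        }
    }

    // Removes the whole row, as the DELETE case does.
    void delete(String key) {
        rows.remove(key);
    }

    // Returns the cell, or null if the row or column does not exist.
    Cell get(String key, String column) {
        Map<String, Cell> columns = rows.get(key);
        return columns == null ? null : columns.get(column);
    }
}
```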


Delete:
drizzle> delete from test1 where id = 300;
Query OK, 1 row affected (0 sec)


And in Cassandra:
cassandra> get unittests.test1['300']
Returned 0 rows.


That is it, go to http://launchpad.net/rabbitreplication to check out the code, report bugs or suggest features!

Thursday, December 10, 2009

Cassandra support in rabbitreplication

Just pushed support for replicating into cassandra to http://launchpad.net/rabbitreplication

The following format is used:
KeySpace = schema name from the transaction
ColumnFamily = table name from the transaction
Column name = "object" since we only store objects
Key = the key generated from the object to store, either by using the @Id annotation or by implementing the KeyAware interface

In the CLI you would type something like this to get the data (drizzle schema name is unittests and table is test1):


cassandra> get unittests.test1['1']['object']
==> (name=object, value={"name":"updated","ssn":1}; timestamp=1260472768425)
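
As a rough sketch of the mapping described above, the snippet below turns a schema name, table name and key into the corresponding CLI lookup. The class CassandraAddress is purely illustrative and not part of RabbitReplication; only the fixed column name "object" and the shape of the CLI command come from this post.

```java
// Hypothetical value class describing where a replicated object ends up.
final class CassandraAddress {
    // Objects are always stored under this single column name.
    static final String OBJECT_COLUMN = "object";

    final String keyspace;      // schema name from the transaction
    final String columnFamily;  // table name from the transaction
    final String key;           // key generated from the stored object

    CassandraAddress(String schema, String table, String key) {
        this.keyspace = schema;
        this.columnFamily = table;
        this.key = key;
    }

    // Builds the CLI command that fetches the stored object.
    String toCliGet() {
        return "get " + keyspace + "." + columnFamily
                + "['" + key + "']['" + OBJECT_COLUMN + "']";
    }
}
```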

Monday, December 7, 2009

Drizzle persistence in Project Voldemort

I just built drizzle support into Project Voldemort:
http://github.com/krummas/voldemort - just add drizzle in your stores.xml and it should work

It is basically just a cut-and-paste of the mysql code, and it uses drizzle-jdbc (which is included in the git repo).

To try it out, check out the code from github, execute "ant release" in the base dir and you get binaries in the dist directory.

Sunday, December 6, 2009

Replication from drizzle to memcached / project voldemort

The last few days I've been working on a way to replicate changes from drizzle into a key value store, currently project voldemort and memcached. It is built into my rabbit replication project, which means that the transactions are transferred over a message bus (rabbitmq currently). The picture below describes an example of how the involved components could be set up (not likely that you want both memcached and project voldemort though):




Current feature list of rabbitreplication:

  • Replication from drizzle into drizzle (or any database with a JDBC driver) / memcached / project voldemort.
  • Map inserts and updates onto java objects using annotated classes (see below for example).
  • Two different ways of marshalling objects, JSON and Java object serialization.
  • Full control over how the key is generated (just implement the KeyAware interface in your target object)
  • Simple interface to build new marshallers. 
  • Simple interface to build new object stores.
  • Simple interface to build new transports. (Will blog these extension points later)

Example:
The class below will catch any statements on the table unittests.test1 and take the column "id" and set it on the ssn field, and it will take the "test" column and set it on the name field. It will use the field annotated with @Id as key in the store and use the JSONMarshaller to marshal the object.




@Entity(schema = "unittests", table = "test1", marshaller = JSONMarshaller.class)
public class ExampleRepl {
    @Id
    @Column("id")
    private int ssn;

    @Column("test")
    private String name;

    /* ... */
}



Add this to your config to use it:
managed_classes = org.drizzle.managedclasses.ExampleRepl, ...


Then you just start your slave like this: 
java -jar replication.jar objectslave.properties


You need to put your managed classes on the classpath (drop them in the lib dir).


(See earlier posts about rabbitreplication on how to get started)
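
For readers curious about what "map onto java objects using annotated classes" could look like under the hood, here is a minimal reflection-based sketch. It is an assumption about the general technique, not RabbitReplication's code: the Column and Id annotations are redefined locally as stand-ins, and ExampleRow and AnnotationMapper are hypothetical names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;

// Local stand-ins for the real annotations (hypothetical definitions).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Column { String value(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Id { }

class ExampleRow {
    @Id
    @Column("id")
    private int ssn;

    @Column("test")
    private String name;

    int getSsn() { return ssn; }
    String getName() { return name; }
}

final class AnnotationMapper {

    // Creates an instance and copies each column value onto the matching field.
    static <T> T map(Class<T> type, Map<String, Object> row) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Column column = field.getAnnotation(Column.class);
                if (column == null || !row.containsKey(column.value())) {
                    continue; // unannotated field, or column not in this statement
                }
                field.setAccessible(true);
                field.set(target, row.get(column.value()));
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot map row onto " + type.getName(), e);
        }
    }

    // Returns the value of the field annotated with @Id, used as the store key.
    static Object keyOf(Object entity) {
        try {
            for (Field field : entity.getClass().getDeclaredFields()) {
                if (field.isAnnotationPresent(Id.class)) {
                    field.setAccessible(true);
                    return field.get(entity);
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no @Id field on " + entity.getClass().getName());
    }
}
```

The sketch ignores type conversion, so a column value must already have a type compatible with the field it is set on.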


Todo:
  • Clean up configuration, quite messy right now
  • Write blogposts about how to roll your own transport/marshalling/key-value store implementations
  • Increase test suite and set up hudson for continuous integration
  • Write proper usage documentation
  • Build more backends, marshallers and transports, evolve apis
  • Write a MySQL binlog master (needs to transform mysql binlog into drizzle's protobuf based log, not even sure it is possible)
  • Create a way to not have to write code on the slave (pin tables to a hash and store it)
  • ...

Getting involved
  • Get the code, bzr branch lp:rabbitreplication
  • Use it, give me feedback (krummas@gmail.com) <- most important!

Download
http://marcus.no-ip.biz/rabbitrepl.zip (yes, I will soon set up a proper download page).