Recently I was taking a look at RabbitMQ to do some of the queuing I need for a system I work on at Fidgt. When I looked at it, AMQP seemed obtuse and unnecessarily complex, so I whipped up my own simple queue implementation in Erlang instead. This, as it turns out, is very easy, though I'm sure my design is naive. But this post isn't about that, and in fact I'm pretty sure there's a good use case for building very simple, context-specific queues in Erlang simply because of the reduction in overhead, both conceptually and in terms of performance. This post is about a series of events that have led up to the release of a neat server and XMPP bot by the folks at RabbitMQ.
Not long after I abandoned RabbitMQ (because I've consistently found the complexity of enterprise queues to be beyond the scope of my projects), my attention was drawn back to it by an interesting release that flitted through the XMPP community. They had just released an XMPP component that allowed access to their queue through XMPP. That was even more interesting to me (it's like they were tailoring these releases to sell this to me!). I read a bit about what they were doing, thought it was pretty awesome, decided to file it away for a future project, and then promptly forgot about RabbitMQ again. Then last week they did it again, which brings us to the point of this post.
The folks at RabbitMQ have now released an interesting side project, built using those two products, called Rabbiter. Rabbiter is a small Jabber bot / component that works on top of RabbitMQ, the XMPP Queue Transport and ejabberd to build a Twitter-like XMPP service. Pretty quickly a bunch of my friends and I were using it to do MUC-style chat and to see how it all worked. Though again, what's more interesting than the service itself is the way they built it.
Not too long ago I did some consulting for a company in LA building a Twitter-like service. They had two primary concerns. First, how do you build a service like this so that it scales: what are the pain points, and what problems does Twitter have and why? Second, how do we use XMPP to make this better: what options do we have in terms of real-time communication, and what are the tradeoffs? I won't even begin to claim I had all those answers, and really I think I learned more from all the folks there than I taught them. But there are some key things to take away.
The first is to recognize that what Twitter is doing is pubsub: people publish content, and people subscribe to receive updates about that content. Publish, subscribe. Pubsub isn't a content problem, it's a communication problem, one that's already well understood and has good solutions. The second is the obvious conclusion from the first: relational databases will never scale for these kinds of queries, because every read has to join a user's subscriptions against everyone's updates. Pubsub dictates instead that each user has an inbox of updates that exists entirely for them. Queries to retrieve a user's updates can thus be sequential reads, and those are extremely fast.
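To make the inbox idea concrete, here is a minimal in-memory sketch in Python. The `InboxStore` class and its method names are made up for illustration and are not taken from Rabbiter or any real service. The point is only that the work happens at publish time, so a read is a slice off the end of one user's own list.

```python
from collections import defaultdict


class InboxStore:
    """Hypothetical in-memory model of per-user inboxes (fan-out on write)."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of subscribers
        self.inboxes = defaultdict(list)   # user -> append-only list of updates

    def subscribe(self, subscriber, author):
        self.followers[author].add(subscriber)

    def publish(self, author, text):
        # The expensive part happens once, at write time:
        # copy the update into every subscriber's inbox.
        for user in self.followers[author]:
            self.inboxes[user].append((author, text))

    def read(self, user, limit=20):
        # A read touches only this user's inbox: no joins, just the tail
        # of a single sequential list.
        return self.inboxes.get(user, [])[-limit:]


store = InboxStore()
store.subscribe("bob", "alice")
store.subscribe("carol", "alice")
store.publish("alice", "hello world")
print(store.read("bob"))
```

The tradeoff is visible even in something this small: one publish turns into one write per subscriber, in exchange for reads that never look at anyone else's data.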
So where do you start? You start with a message queue. Message queues already have very efficient routing mechanisms and often support one-to-many publication right off the bat. Built to scale, to distribute their workload, and to pass messages quickly and efficiently, these systems will help you handle the incoming load while giving you the framework to parse and store the messages in ways that allow for the second bit: inbox delivery. For inbox delivery you want a storage solution that will scale across disks, be cacheable, and have extremely efficient read access.
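Here is a rough sketch of that shape, using Python's standard `queue` module as an in-process stand-in for a real broker. This is not the RabbitMQ or AMQP API; `FanoutExchange` and `drain` are hypothetical names I'm using to show one-to-many routing feeding a worker that does inbox delivery.

```python
import queue


class FanoutExchange:
    """Toy one-to-many router: every bound queue gets a copy of each message."""

    def __init__(self):
        self.bound = []

    def bind(self, q):
        self.bound.append(q)

    def publish(self, message):
        for q in self.bound:
            q.put(message)


def drain(q, handler):
    """Run handler on every message currently waiting in q; return the count."""
    handled = 0
    while True:
        try:
            message = q.get_nowait()
        except queue.Empty:
            return handled
        handler(message)
        handled += 1


exchange = FanoutExchange()
delivery_queue = queue.Queue()  # consumed by an inbox-delivery worker
archive_queue = queue.Queue()   # consumed by some other, independent worker
exchange.bind(delivery_queue)
exchange.bind(archive_queue)

exchange.publish({"author": "alice", "text": "hello"})

inbox = []  # stand-in for one subscriber's inbox storage
drain(delivery_queue, inbox.append)
print(inbox)
```

The publisher only knows about the exchange. Delivery, archiving and anything else you bolt on later each get their own queue and can fall behind or be scaled out independently.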
There are of course flaws in this style of architecture as well. Rebuilding the inboxes is extremely costly. There's extra work to be done when you want to attach semantic value to messages or run queries across the whole system. And there are probably plenty more that go beyond what I'm here to talk about. But the result is an extremely efficient message passing and parsing architecture, capable of scaling to extreme influxes of messages while maintaining high availability.
So why am I excited about Rabbiter? It should be clear by now: Rabbiter wraps up, in a single easy-to-install package, many of the pieces necessary to create large decentralized pubsub systems in a scalable, efficient architecture. There are only a couple more pieces to fill in. For inbox storage there's a good case to be made for Scalaris or CouchDB, especially since write speed is much less critical than efficient partitioning and read speed. Add memcached to cache those results, and then anything could be used to produce the front end, simply relying on the APIs exposed by the backend storage.
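As a sketch of how the caching piece might sit in front of inbox storage, here is a small read-through cache in Python. Plain dicts stand in for both the cache and the backend store, and `CachedInboxes` is a hypothetical name; a real setup would talk to memcached and to whatever storage you picked, through their own client libraries.

```python
class CachedInboxes:
    """Hypothetical read-through cache over per-user inbox storage."""

    def __init__(self, backend):
        self.backend = backend   # user -> list of updates (stand-in for the store)
        self.cache = {}          # user -> cached copy (stand-in for memcached)
        self.backend_reads = 0   # counts how often we had to hit the store

    def read(self, user):
        if user in self.cache:
            return self.cache[user]
        # Cache miss: read from storage once, then remember the result.
        self.backend_reads += 1
        updates = list(self.backend.get(user, []))
        self.cache[user] = updates
        return updates

    def deliver(self, user, update):
        # Write to storage and drop the stale cache entry so the next
        # read picks up the new update.
        self.backend.setdefault(user, []).append(update)
        self.cache.pop(user, None)


inboxes = CachedInboxes({"bob": ["hello"]})
inboxes.read("bob")   # miss, goes to the backend
inboxes.read("bob")   # hit, served from the cache
inboxes.deliver("bob", "second update")
print(inboxes.read("bob"), inboxes.backend_reads)
```

Invalidating on delivery is the simplest policy I could pick here. For very busy inboxes you might prefer to append to the cached copy instead, but that is a design choice this sketch doesn't try to settle.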
There are also some clear ways forward for federating Rabbiter through XMPP nodes, and since that system works through push and not polling, there's a real chance you could build very large federated systems capable of publishing large amounts of content all around the world while staying up.