journal :: ejabberd

Building Ejabberd Modules - Part 4 - XMPP Bots

Aug 06, 2008 - 11:40 p.m.

Update: Now with a link to code! echo_bot.erl

Continued from Building Ejabberd Modules - Part 3 - HTTP Modules

So for this part we're going to step it up a lot; I was getting bored, so here we are. I'm going to show you how to build a fast and efficient XMPP bot that lives inside ejabberd. To be fair, this bot is one part bot, one part component, and mostly tricky (thanks to the folks at RabbitMQ for giving me a lot of the ideas on how to do this). Since we don't want to get too crazy yet, we're going to build the simplest bot possible: Echo bot! This bot will simply message back to you everything you send it.

To discuss basic strategies, we're going to use the register_route function in ejabberd to build what is essentially a new virtual host. This will take all traffic heading to x.example.com and pass it through this module. Anyone who has written or looked at components in other languages should be familiar with this. This is the internal function used to register components.

Doing this gives us a lot of power that regular bots don't have. Component-style bots aren't required to have rosters and don't have to deal with restrictions on identity, but they come at a cost: we'll be managing our own presence and subscription state, as well as a slew of other niggly details. That being said, the total comes out to 130 lines, so it can't be that bad.

The first step will be to pull out our trusty tools from the previous articles: gen_mod and gen_server.


-module(echo_bot).
-behavior(gen_server).
-behavior(gen_mod).

-export([start_link/2]).

-export([start/2,
         stop/1,
         init/1,
         handle_call/3,
         handle_cast/2,
         handle_info/2,
         terminate/2,
         code_change/3]).

-export([route/3]).

-include("ejabberd.hrl").
-include("jlib.hrl").

-define(PROCNAME, ejabberd_mod_bot).
-define(BOTNAME, echo_bot).

The only new part here is that we're going to expose one extra function, route/3. This is the function that will be passed to ejabberd to handle traffic coming to our bot. I also defined a couple of macros that will save us some headache later.

Next we'll define the gen_server and gen_mod callbacks.


start_link(Host, Opts) ->
    Proc = gen_mod:get_module_proc(Host, ?PROCNAME),
    gen_server:start_link({local, Proc}, ?MODULE, [Host, Opts], []).

start(Host, Opts) ->
    Proc = gen_mod:get_module_proc(Host, ?PROCNAME),
    ChildSpec = {Proc,
        {?MODULE, start_link, [Host, Opts]},
        temporary,
        1000,
        worker,
        [?MODULE]},
    supervisor:start_child(ejabberd_sup, ChildSpec).

stop(Host) ->
    Proc = gen_mod:get_module_proc(Host, ?PROCNAME),
    gen_server:call(Proc, stop),
    supervisor:terminate_child(ejabberd_sup, Proc),
    supervisor:delete_child(ejabberd_sup, Proc).

init([Host, Opts]) ->
    ?DEBUG("ECHO_BOT: Starting echo_bot", []),
    % add a new virtual host / subdomain "echo".example.com
    MyHost = gen_mod:get_opt_host(Host, Opts, "echo.@HOST@"),
    ejabberd_router:register_route(MyHost, {apply, ?MODULE, route}),
    % keep the routed host as the server state, so that terminate/2
    % unregisters the route we actually registered
    {ok, MyHost}.

So here's where I apologize to anyone who hasn't played with Erlang/OTP before, because there's some voodoo magic going on here. So let's take it slow.

start/2 - we define a childspec. A childspec is the tuple format that an OTP supervisor expects in order to start new children. What we're doing here is taking our bot and placing it under ejabberd's supervisor. One caveat: with the restart type temporary used above, the supervisor will not restart the bot if it crashes; use transient instead if you want crashes to trigger a restart.

stop/1 - we reverse the process in start/2: we stop our process and remove it from supervision.

init/1 - here we first build a new host name (used for virtual hosting in ejabberd). This will be the domain that data passes through, so in this case echo.yourdomain.com. Then we register a new route with ejabberd, giving it the host to route, and the module and function to call with each incoming packet.
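
If the "echo.@HOST@" bit looks mysterious, here is a minimal sketch of the substitution I'd expect gen_mod:get_opt_host/3 to perform on that default. The module host_demo and the function expand_host/2 are made up for illustration and are not part of ejabberd; the sketch also ignores the lookup of a host option in Opts.

```erlang
-module(host_demo).
-export([expand_host/2]).

%% Hypothetical helper (not an ejabberd function): replace every
%% occurrence of "@HOST@" in Template with the given Host.
expand_host(Template, Host) ->
    re:replace(Template, "@HOST@", Host, [global, {return, list}]).
```

So for a server running example.com, host_demo:expand_host("echo.@HOST@", "example.com") gives the domain our bot will answer on.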

We finish the callbacks with a bunch of boring. And then we get to the meat of the problem: the routing.


handle_call(stop, _From, Host) ->
    {stop, normal, ok, Host}.

handle_cast(_Msg, Host) ->
    {noreply, Host}.

handle_info(_Msg, Host) ->
    {noreply, Host}.

terminate(_Reason, Host) ->
    ejabberd_router:unregister_route(Host),
    ok.

code_change(_OldVsn, Host, _Extra) ->
    {ok, Host}.

Our first step is to handle presence and subscriptions. Since we don't really care to block anyone we'll completely skip the whole building a roster bit, and simply send back a "subscribed" response. And since we only want other clients to know we're online we are always available.


route(From, To, {xmlelement, "presence", _, _} = Packet) ->
    case xml:get_tag_attr_s("type", Packet) of
        "subscribe" ->
            send_presence(To, From, "subscribe");
        "subscribed" ->
            send_presence(To, From, "subscribed"),
            send_presence(To, From, "");
        "unsubscribe" ->
            send_presence(To, From, "unsubscribed"),
            send_presence(To, From, "unsubscribe");
        "unsubscribed" ->
            send_presence(To, From, "unsubscribed");
        "" ->
            send_presence(To, From, "");
        "unavailable" ->
            ok;
        "probe" ->
            send_presence(To, From, "");
        _Other ->
            ?INFO_MSG("Other kind of presence~n~p", [Packet])
    end,
    ok;
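
The case expression above is easier to reason about if you read it as a lookup table from incoming presence type to the presence types we send back. Here's a small sketch of that table as a pure function; presence_demo and replies/1 are hypothetical names used only for illustration, and the function returns the replies instead of sending them.

```erlang
-module(presence_demo).
-export([replies/1]).

%% For an incoming presence type, return the list of presence types
%% the bot sends back, in order. "" stands for plain available presence.
replies("subscribe")    -> ["subscribe"];
replies("subscribed")   -> ["subscribed", ""];
replies("unsubscribe")  -> ["unsubscribed", "unsubscribe"];
replies("unsubscribed") -> ["unsubscribed"];
replies("")             -> [""];
replies("probe")        -> [""];
replies("unavailable")  -> [];
replies(_Other)         -> [].
```

Keeping the table separate from the sending code like this would also make it easy to test the subscription handshake without a running server.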

So as far as routing goes, handling presence is pretty easy. As you can see, we only use one helper function, send_presence/3, and even that's very straightforward.


send_presence(From, To, "") ->
    ejabberd_router:route(From, To, {xmlelement, "presence", [], []});

send_presence(From, To, TypeStr) ->
    ejabberd_router:route(From, To, {xmlelement, "presence", [{"type", TypeStr}], []}).

We reuse ejabberd's routing tools to simply push presence elements back at the user as our bot. But simply handling presence wouldn't be very much fun, so we might also want to handle messages.

route(From, To, {xmlelement, "message", _, _} = Packet) ->
    case xml:get_subtag_cdata(Packet, "body") of
        "" ->
            % ignore messages without a body
            ok;
        Body ->
            case xml:get_tag_attr_s("type", Packet) of
                "error" ->
                    ?ERROR_MSG("Received error message~n~p -> ~p~n~p", [From, To, Packet]);
                _ ->
                    % swap sender and recipient and send the body back
                    echo(To, From, strip_bom(Body))
            end
    end,
    ok.

echo(From, To, Body) ->
    send_message(From, To, "chat", Body).

send_message(From, To, TypeStr, BodyStr) ->
    XmlBody = {xmlelement, "message",
           [{"type", TypeStr},
        {"from", jlib:jid_to_string(From)},
        {"to", jlib:jid_to_string(To)}],
           [{xmlelement, "body", [],
         [{xmlcdata, BodyStr}]}]},
    ejabberd_router:route(From, To, XmlBody).

To handle messages, we ignore all messages with empty bodies, log an error for those that are error messages, and call our function echo on everything else. Once again, echo simply takes advantage of ejabberd's route function and some simple XML construction.

We have one last helper function:
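
To see the XML construction on its own, here's a sketch that builds the same tuple as send_message/4 without needing ejabberd loaded. It assumes the JIDs are already plain strings (the real code converts them with jlib:jid_to_string/1), and message_demo, build_message/4 and echo_reply/3 are hypothetical names.

```erlang
-module(message_demo).
-export([build_message/4, echo_reply/3]).

%% Build the {xmlelement, Name, Attrs, Children} tuple for a message.
%% From and To are assumed to be JID strings here.
build_message(From, To, Type, Body) ->
    {xmlelement, "message",
     [{"type", Type}, {"from", From}, {"to", To}],
     [{xmlelement, "body", [], [{xmlcdata, Body}]}]}.

%% An echo is the same body sent back with sender and recipient swapped.
echo_reply(From, To, Body) ->
    build_message(To, From, "chat", Body).
```

Calling message_demo:echo_reply("alice@example.com", "echo.example.com", "hi") produces a chat message from the bot's address back to alice.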


strip_bom([239,187,191|C]) -> C;
strip_bom(C) -> C.

Which is used to strip the UTF-8 BOM (Byte Order Mark) from the beginning of the body.

There we go: install the echo_bot just like the modules described in the previous parts, and you have a fast, lightweight, customizable XMPP bot. And hopefully in the next week I'll have a post that details how you can use that to make extremely powerful XMPP bots using tools like RabbitMQ.
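
The three magic numbers are the bytes EF BB BF, which is how the BOM is encoded in UTF-8. A quick standalone sketch of the same function with the bytes written in hex (bom_demo is a made-up module name):

```erlang
-module(bom_demo).
-export([strip_bom/1]).

%% 16#EF, 16#BB, 16#BF are the same values as 239, 187, 191.
strip_bom([16#EF, 16#BB, 16#BF | Rest]) -> Rest;
strip_bom(Body) -> Body.
```

A body that starts with the BOM loses those three bytes, and anything else passes through untouched.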


Building Ejabberd Modules - Part 3 - HTTP Modules

Jul 31, 2008 - 12:44 p.m.

Update: Now with a link to code! mod_http_hello_world

Continued from Building Ejabberd Modules - Part 1 - Compiling Erlang and Building Ejabberd Modules - Part 2 - Generic Modules

One of the easy ways to get access to data hiding within ejabberd is to use an HTTP module. These modules are easy to write and, with a little bit of poking around ejabberd internals, an easy way to get much of the data you want out. For our purposes today we'll be building an HTTP module to expose a list of all the users registered on the server.

Before getting too complex we'll want to start with a simple example, simply being able to display something at a given URL. We'll begin by building off of the example from Part 2, by creating a basic gen_mod template.


-module(mod_http_hello_world).
-author('Anders Conbere').
-vsn('1.0').

-define(EJABBERD_DEBUG, true).

-behavior(gen_mod).

-export([
    start/2,
    stop/1,
    process/2
    ]).

-include("ejabberd.hrl").
-include("jlib.hrl").
-include("ejabberd_http.hrl").

start(_Host, _Opts) ->
    ok.

stop(_Host) ->
    ok.

process(_Path, _Request) ->
    "Hello World".

As you can see, we've added the function process/2 to the example from Part 2. process/2 is the function that will handle HTTP calls from the ejabberd HTTP server. Its first argument is a list that holds the path of the URL being handled. For example, http://example.com/this/cool/article would be matched with process(["this", "cool", "article"], Request), or if you wanted to capture data from the URL http://example.com/articles/puppies/2 (the second page of the puppies article) you can define process(["articles", Article, Page], Request), which will result in the Article and Page variables being populated. For our purposes, however, we want the simplest definition: the catch-all.

Next you'll want to compile your new module:
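
Here's a small sketch of what that path matching looks like when the clauses are stacked together. It's a standalone module (path_demo is a made-up name) that doesn't depend on ejabberd, and the Request argument is simply ignored. Keep in mind that how much of the URL ends up in the path list depends on how the handler is registered, so treat the exact lists as an assumption.

```erlang
-module(path_demo).
-export([process/2]).

%% Exact match on every path segment.
process(["this", "cool", "article"], _Request) ->
    "The cool article";
%% Article and Page are bound to whatever appears in those segments.
process(["articles", Article, Page], _Request) ->
    "Article " ++ Article ++ ", page " ++ Page;
%% Catch-all: any other path.
process(_Path, _Request) ->
    "Hello World".
```

Clauses are tried top to bottom, so the catch-all has to come last.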


erlc -I /var/lib/ejabberd/include -pa /path/to/ejabberd/src mod_http_hello_world.erl
# note: the -I specifies an include path, necessary for finding .hrl files
# the -pa adds a new lookup path, necessary for finding gen_mod.erl
# note2: the current lib path for ejabberd is changing; in newer versions
# it can be found at /lib/ejabberd
mv mod_http_hello_world.beam /var/lib/ejabberd/ebin

Finally you'll want to configure ejabberd to start your module. So opening up ejabberd.cfg, you're looking for the {listen, []} configuration, and within it the entry for port 5280. This is currently the default port that ejabberd listens for HTTP requests on, and is used to serve the Admin tool. By default it should look something like this:


{5280, ejabberd_http, [http_poll, web_admin]}

We'll be updating that to include our little module by making it look like this:


{5280, ejabberd_http, [http_poll, web_admin,
                      {request_handlers, [{["hello_world"], mod_http_hello_world}
                                         ]}]}

Which of course looks like a bunch of gobbledygook. But what we've done is register a new request handler, specify a root location to listen on (so in this case our module will respond on http://example.com:5280/hello_world), and specify a handler for that request. Restarting the server now and opening that URL should return a page that says "Hello World".

That's all well and good, but it's not very interesting. We want to be able to get some useful data out of our system. So copy mod_http_hello_world.erl to mod_http_registered_users.erl and modify the -module() directive accordingly. Then we'll update the process/2 function. As is the trick with most things in ejabberd, this sort of stuff is easy once you know how. As it turns out, ejabberd_auth has a handy function, dirty_get_registered_users/0, which returns a list of registered users on your server in the format [{Username, Server}]. For those wondering, I didn't just know about that function; I did a little bit of grepping around the source code to dig it up.

The only thing left to do is to reformat the data returned from dirty_get_registered_users and put it into your process function.


process(_Path, _Request) ->
    [Username ++ "@" ++ Server || {Username, Server} <- ejabberd_auth:dirty_get_registered_users()].
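
The comprehension above hands back a bare list of strings. If you'd rather serve something a browser script can parse, here's a rough sketch that turns the same [{Username, Server}] list into a JSON array by hand. users_json_demo and to_json/1 are hypothetical names, and the sketch assumes usernames and servers contain no quotes or backslashes, since it does no escaping.

```erlang
-module(users_json_demo).
-export([to_json/1]).

%% Turn [{Username, Server}] into a JSON array of "user@server" strings.
%% No escaping is done, so this is only safe for plain ASCII names.
to_json(Users) ->
    Jids = ["\"" ++ User ++ "@" ++ Server ++ "\"" || {User, Server} <- Users],
    "[" ++ string:join(Jids, ",") ++ "]".
```

Inside the module you would then call something like to_json(ejabberd_auth:dirty_get_registered_users()) from process/2.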

Finally we can repeat the steps of compiling the module and configuring it, and now you should be able to expose a list of users to the web (okay, so maybe that's a bad idea). But more importantly, these same techniques can be used to build REST APIs for creating new users, to build OpenID/OAuth endpoints, to create interactive web pages that talk to ejabberd, and more.


Rabbiter - Open Federated Pubsub Server

Jul 25, 2008 - 1:14 p.m.

Recently I was taking a look at RabbitMQ to do some of the queuing I need for a system I work on at Fidgt. At the time when I looked at it, AMQP seemed obtuse and unnecessarily complex, and I went with whipping up my own simple queue implementation in Erlang. This, as it turns out, is very easy, though I'm sure my design is naive. But this post isn't about that, and in fact I'm pretty sure there's a good use case for building very simple, context-specific queues in Erlang simply because of the reduction in overhead, both conceptually and in terms of performance. This post is about a series of events that have led up to the release of a neat server and XMPP bot by the folks at RabbitMQ.

Not long after I abandoned RabbitMQ (because I've consistently found the complexity of enterprise queues to be beyond the scopes of my projects), my attention was drawn back to it because of an interesting release they made that flitted through the XMPP community. They had just released an XMPP component that allowed access to their queue through xmpp. That was even more interesting to me (it's like they were tailoring these releases to sell this to me!). I read a bit about what they were doing, thought it was pretty awesome and decided to file it away for a future project and then promptly forgot about RabbitMQ again. Now last week they've done it again, and we reach the point of this post.

The folks there at RabbitMQ have now worked to release an interesting side project they built using those two products, called Rabbiter. Rabbiter is a small Jabber bot / component that works on top of RabbitMQ, the XMPP Queue Transport and ejabberd to build a Twitter-like XMPP service. Pretty quickly a bunch of my friends and I were using it to do MUC-style chat and to see how it all worked. Though again, what's more interesting than this service is the way they built it.

Not too long ago I did some consulting for a company in LA building a Twitter-like service. They had two primary concerns. First, how do you build a service like this such that it scales: what are the pain points, what problems does Twitter have and why? And second, how do we use XMPP to make this better: what options do we have in terms of real-time communication, and what are the tradeoffs? I won't even begin to claim I had all those answers, and really I think I learned more from all the folks there than I taught them. But there are some key things to take away.

First is to recognize that what Twitter is doing is pubsub: people publish content, people subscribe to receive updates about that content. Publish - Subscribe. Pubsub isn't a content problem, it's a communication problem, one that's already well understood and has good solutions. Second is recognizing the obvious conclusion from the previous point: relational databases will never scale for these kinds of queries. Pubsub would dictate rather that each user has an inbox of updates that exists entirely for them. Queries to retrieve the updates for a user can thus be sequential reads and are extremely fast.

So where do you start? You start with a message queue. Message queues, already have very efficient routing mechanisms and often support one to many publication right off the bat. Built to scale, to distribute their work load, and to quickly and efficiently pass messages, these systems will help you handle the incoming load, while giving you the framework to parse and store the messages in ways that allow for the second bit - Inbox Delivery. For inbox delivery you want a storage solution that will scale across disks, be cacheable, and have extremely efficient read access.
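
To make the inbox idea concrete, here's a toy sketch of fan-out on write: when someone publishes, the message is copied into every follower's inbox, so reading an inbox is a single lookup. Everything here (inbox_demo and its functions) is made up for illustration, it keeps all state in plain in-memory maps, and a real system would put a message queue and durable storage where the maps are.

```erlang
-module(inbox_demo).
-export([new/0, subscribe/3, publish/3, read_inbox/2]).

%% State is {Subscribers, Inboxes}:
%%   Subscribers maps a publisher to the list of its followers,
%%   Inboxes maps a user to the list of {Publisher, Msg} it has received.
new() -> {#{}, #{}}.

%% Follower starts receiving everything Publisher publishes.
subscribe(Follower, Publisher, {Subs, Inboxes}) ->
    Followers = maps:get(Publisher, Subs, []),
    {Subs#{Publisher => [Follower | Followers]}, Inboxes}.

%% Copy the message into each follower's inbox (newest first).
publish(Publisher, Msg, {Subs, Inboxes}) ->
    Followers = maps:get(Publisher, Subs, []),
    NewInboxes = lists:foldl(
                   fun(Follower, Acc) ->
                           Inbox = maps:get(Follower, Acc, []),
                           Acc#{Follower => [{Publisher, Msg} | Inbox]}
                   end, Inboxes, Followers),
    {Subs, NewInboxes}.

%% Reading is one lookup, with no joins across users.
read_inbox(User, {_Subs, Inboxes}) ->
    maps:get(User, Inboxes, []).
```

The cost moves to publish time, which is exactly the tradeoff discussed next: writes get more expensive so that reads stay cheap.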

There are of course flaws in this style of architecture as well. Rebuilding the inboxes is extremely costly. There's extra work to be done when looking to attach semantic value to messages and running queries across the whole system. And probably plenty more that exceed what I'm here to talk about. But the result is an extremely efficient message passing and parsing architecture, capable of scaling to extreme influxes of messages while maintaining high availability.

So why am I excited about Rabbiter? (It should be clear by now.) Because Rabbiter wraps up in a single, easy-to-install package many of the pieces necessary to create large decentralized pubsub systems in a scalable, efficient architecture. There are only a couple more pieces to fill in. For inbox storage there's a good case to be made for Scalaris or CouchDB, especially since write speed is much less critical than efficient partitioning and read speed. Memcache can cache those results, and then anything could be used to produce the front end, simply relying on APIs exposed by the backend storage.

Also there are some clear ways forward for federating Rabbiter through XMPP nodes, and since that system works through push and not polling, there's a real chance you could build very large federated systems capable of publishing large amounts of content all around the world while staying up.


Building Ejabberd Modules - Part 2 - Generic Modules

Jul 17, 2008 - 5:34 p.m.

Continued from Building Ejabberd Modules - Part 1 - Compiling Erlang

Ejabberd uses a plugin system based on Erlang modules. All Ejabberd plugins implement the "gen_mod" behavior (or the gen_mod interface, if not explicitly invoking the behavior). Behaviors are a simple mechanism within Erlang for specifying an interface, which can be used (as is the case with many of the OTP behaviors) to encapsulate a certain set of functionality.

The gen_mod behavior required by all Ejabberd modules is exceptionally simple, and is defined in the ejabberd module documentation:


   start(Host, Opts) -> ok
   stop(Host) -> ok
     * Host = string()
     * Opts = [{Name, Value}]
     * Name = Value = string()
   

Note: The module name must match the filename. (Thanks Damon!)
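
If you haven't seen how a behavior is declared, here's a minimal sketch of the classic way to do it: the behavior module exports behaviour_info/1 and lists the callbacks it expects. my_gen_mod is a made-up name, this is not ejabberd's actual gen_mod source, and newer Erlang code tends to use -callback attributes instead.

```erlang
-module(my_gen_mod).
-export([behaviour_info/1]).

%% The compiler consults this list when another module declares
%% -behaviour(my_gen_mod) and warns about any missing callback.
behaviour_info(callbacks) ->
    [{start, 2}, {stop, 1}];
behaviour_info(_Other) ->
    undefined.
```

A module that declares this behavior and forgets to export stop/1 will get a compiler warning, which is about all the enforcement a behavior gives you.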

A sample ejabberd module would then look something like this:


   -module(my_module).
   -author("Anders Conbere").
   -behavior(gen_mod).
   -include("ejabberd.hrl").

   -export([start/2, stop/1]).

   start(_Host, _Opts) -> ok.

   stop(_Host) -> ok.

Most of the time however you'll be implementing a set of particular callbacks to match up with the requirements of internal processes. As an example when making HTTP modules, you'll need to export the process function which captures an HTTP request, and returns an HTTP response. The set of callbacks for a custom auth module are slightly more complex, but are still rather straight forward.

So with a little bit of editing we could turn the example above into a completely useless but installable module. To do that let's first edit the start function to output a debug message.


   start(_Host, _Opt) -> 
       ?DEBUG("EXAMPLE MODULE LOADING", []).
   

We'll now want to save and compile our file (let's call it mod_first_module.erl, and change the -module() directive to mod_first_module to match). To compile the module, run this command:


   $ erlc -I ~/path/to/ejabberd/src mod_first_module.erl
   

Or if you're running ejabberd trunk:


   $ erlc -I ~/path/to/ejabberd/include mod_first_module.erl
   

This will build the file mod_first_module.beam, a byte-compiled Erlang file capable of being run on the Erlang BEAM virtual machine. And finally we'll want to put this file on a path that can be found by ejabberd. For most people that will mean linking it into the ejabberd ebin directory.


   /var/lib/ejabberd/ebin $ ln -s /path/to/mod_first_module.beam
   

And last but not least, you need to tell Ejabberd to start your module when it begins. So we'll edit the ejabberd.cfg file and add our module to the modules config:


   {modules,
    [
     {mod_adhoc,    []},
     ...
     {mod_first_module, []},% to send options to your module populate the [] list
     ...
    ]
   }
   

Now when we restart our ejabberd server (making sure that the loglevel is set to 5), we should see


   =INFO REPORT==== 2008-07-17 15:33:27 ===
   D(<0.37.0>:ejabberd_auth_my_auth:44) : EXAMPLE MODULE LOADING
   

in the module loading phase.

And there you have it, pretty much the most basic ejabberd module you could possibly install.


Building Ejabberd Modules - Part 1 - Compiling Erlang

Jul 16, 2008 - 10:33 p.m.

These posts presume some basic familiarity with Erlang/OTP. If you're not familiar with them, this might be an interesting peek into how very simple systems built in Erlang look and behave, but you might be best served by learning Erlang first. To do that I heartily recommend Joe Armstrong's book Programming Erlang: Software for a Concurrent World.

To start with the basics I should say something about what Ejabberd is and does. Ejabberd is an XMPP server written in Erlang, designed for high availability, with great hooks and plugins for extending the basic server. My first step into extending the behavior of Ejabberd began with writing packet filters and has since had me mucking about in a few other places. Most recently I designed a custom authentication module for my friends at Geni, and I wanted to talk a little about the process because I feel it illustrates some important concepts and tools to use when writing extensions to Ejabberd.

So to start off you'll need to check out and compile ejabberd on your own machine, as well as gather up any prerequisites that haven't been installed. On Ubuntu you'll need to install build-essential and possibly some other packages like libexpat.


$ svn co http://svn.process-one.net/ejabberd/trunk ejabberd

Then we'll want to enter into the src directory and run


$ ./configure
$ make
$ sudo make install

On OS X look out for libexpat not being on the search path (you can add it there by using ./configure --with-expat=/opt/local if you're using MacPorts).

This will install the ejabberdctl and ejabberd links into your /sbin directory as well as install the code into /var/lib/ejabberd. That will be important later for a quick shortcut to success.


Distribution Erlang and XMPP

Jul 15, 2008 - 2:30 a.m.

These last few months I've found myself in an increasingly interesting position. The work I've been doing in XMPP has landed me in jobs I enjoy, building tools that are increasingly complex and more and more often, delve into the guts of what makes XMPP tick.

The end result has been spending most of the last half year doing some pretty serious programming in Erlang: from distributing Python worker processes, to ejabberd HTTP modules that do OpenID, to routing packets to databases, a memcached client, and custom auth modules.

There's been a lot of talk about Erlang lately, some of it has been good some of it's been bad, and a lot of it's been stupid. I'm not going to pretend to know everything about all the different concurrency systems, or the granular differences between scala, Erlang, gambit scheme.

I know that I've always had difficulty with the Java platform (this is a personal failing that maybe in the coming years I can fix). I can't keep track of what version I need (J2EE, JRE, JDK, SDK, JKMLZ-what?), and things never seem to work like I expect. So that left Scala out, not to mention that a number of much smarter people than I am have expressed some concerns about the purity of the concurrency model in Scala. It seems that the biggest argument in favor of solutions on the JVM is the large set of libraries available for it, which is probably reasonable given some of my issues trying to get Erlang's gd library to work, but seems to be much less of an issue depending on the domain you're working in.

I also know that I've spent a lot of time programming Python, and I love Python for what it does, but Twisted is devil spawn. It's not the callback style of programming that bothers me, or the great deferred object, it's all the other crap that's been piled in. It's the lack of Python style, and the way it inches its way into the rest of the code using it and turns it into unreadable mush. I much prefer the model that Kamaelia has chosen, which is actually quite similar to the message passing style of Erlang.

Unfortunately... both of these Python solutions are still single threaded, don't have great support for going across machines or networks, and provide none of the tools that Erlang/OTP does for managing systems of multiple processes (like apps and supervisors).

Now Erlang is not without its warts. I do have to spend some time fixing syntax errors from time to time, and it could probably do with better library support (in particular with database drivers, since this is actually in its domain space). OTP is really complex, and learning how all the pieces work and fit together is taxing, hard, and at least for me bent my mind. And lastly I find that immutable variables reduce some of the easy code reuse you can do in Ruby or Python. But even these have caveats: the syntax might be overly verbose and require line endings, but it's very consistent and can be picked up quickly, especially by anyone who has done any functional programming. OTP might be complex and difficult to learn initially, but the functionality that it provides is amazing, the applications fit together smoothly, and they 'just work'; it's truly an incredible piece of engineering. And lastly code reuse: I don't have a good answer for this, but I suspect it has to do with meta-programming.

So I'm hoping to have some time in the next couple weeks to write a couple little pieces about how to make some new modules for ejabberd and use Erlang to make the best of it.
