Tenerife Skunkworks

Boldly going where few have gone before

How to set up an ejabberd cluster on Amazon EC2 in 6 easy steps

1) Edit /etc/init.d/ejabberd

You need node=`hostname -f` since `hostname -s` does not work here.

2) Edit /etc/init.d/ejabberd

Use -name ejabberd@$node instead of -sname ejabberd everywhere. This applies to -sname ejabberdctl as well.

3) Edit /etc/init.d/ejabberd to add -mnesia extra_db_nodes

See the start() function, find the line that says -detached and add the following right above

-mnesia extra_db_nodes \"[' ... hostname -f of a running node ... ']\" 

4) Remove the Mnesia db tables

cd /var/lib/ejabberd/spool && rm -f *

5) Edit /etc/ejabberd/ejabberdctl.cfg

Make sure you have this at the very end

ERLANG_NODE=ejabberd@`hostname -f`

6) Make sure your .erlang.cookie files are the same on all nodes

This will work with MySQL. Enjoy!
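Once the nodes are up, a quick sanity check from an Erlang shell can confirm that a peer uses a fully qualified name and is reachable. This is a minimal sketch, not part of ejabberd: the module name cluster_check and its functions are made up for illustration, and a pang from net_adm:ping/1 may mean either a network problem or mismatched cookies.

```erlang
-module(cluster_check).
-export([has_long_name/1, check_peer/1]).

%% True when the host part of Node contains a dot, i.e. it looks like
%% the fully qualified name that `hostname -f` produces.
has_long_name(Node) when is_atom(Node) ->
    case string:split(atom_to_list(Node), "@") of
        [_Name, Host] -> lists:member($., Host);
        _ -> false
    end.

%% Check the name first, then try to connect to the peer.
check_peer(Peer) ->
    case has_long_name(Peer) of
        false ->
            {error, short_name};
        true ->
            case net_adm:ping(Peer) of
                pong -> {ok, nodes()};          % connected; list known nodes
                pang -> {error, unreachable}    % network or cookie problem
            end
    end.
```

For example, cluster_check:check_peer('ejabberd@host') returns {error, short_name} because the host part is not fully qualified.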

Filed under  //   ec2   ejabberd   erlang  
Posted February 7, 2009

Upgrading your Erlang cluster on Amazon EC2

This article describes how to upgrade an Erlang cluster in one fell swoop once you have deployed it on Amazon EC2.

Why not the Erlang/OTP upgrade procedure

The standard and sanctioned way of deploying and upgrading Erlang applications is described in chapters 10-12 of the OTP Design Principles. Calling the upgrade procedure complex is an understatement. 

Bowing to the OTP application packaging procedure, I wanted to have a way of upgrading applications with a "push of a button". More precisely, I wanted to be able to type make:all() to rebuild my application and then type sync:all() to push updated modules to all nodes in my cluster. These nodes were previously set up as diskless Amazon EC2 nodes that fetch their code from the boot server since I didn't want to reinvent the application packaging wheel.

The sync application

The principal application deployed in the cluster is the sync app. This is a gen_server set up according to chapter 2 of the OTP Design Principles. The gen_server handles requests to restart the Erlang node without shutting down and to set environment variables, as well as requests to upgrade the code by application or by process. Each sync gen_server joins the 'SYNC' distributed named process group and this is what enables upgrade of the whole cluster in one fell swoop.

The sync server will invoke init:restart/0 to restart the node without shutting down upon receiving a RESTART request. This is incredibly handy since the restart sequence takes the contents of the Erlang VM to the trash can and then repeats the same steps taken by the Erlang VM when it is started from the command line. Which is to say that the VM loads the boot file from the boot server, parses the boot file, downloads the applications and runs them. If we have upgraded the code on the boot server then the Erlang VM will run new code after a restart. 
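The original sync code is proprietary, so the following is only a rough sketch of the shape such a server could take. It assumes a recent OTP release and uses the pg module for the process group (the original most likely used a different group module); the names sync_sketch, members/0 and restart_all/0 are invented for illustration.

```erlang
-module(sync_sketch).
-behaviour(gen_server).

-export([start_link/0, members/0, restart_all/0]).
-export([init/1, handle_call/3, handle_cast/2]).

-define(GROUP, 'SYNC').

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

%% All sync servers known to the process group, on any connected node.
members() ->
    pg:get_members(?GROUP).

%% Ask every sync server in the group to restart its node.
restart_all() ->
    [gen_server:cast(Pid, restart) || Pid <- members()],
    ok.

init([]) ->
    %% Make sure the default pg scope is running, then join the group.
    case pg:start(pg) of
        {ok, _} -> ok;
        {error, {already_started, _}} -> ok
    end,
    ok = pg:join(?GROUP, self()),
    {ok, #{}}.

handle_call(_Request, _From, State) ->
    {reply, {error, unknown_request}, State}.

handle_cast(restart, State) ->
    %% Restart the node without exiting the VM.
    init:restart(),
    {noreply, State};
handle_cast(_Msg, State) ->
    {noreply, State}.
```

A cast is used for the restart request so that the caller does not wait on a node that is about to tear down all of its processes.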

Upgrading by application or by process

The above procedure is quite intrusive since all apps running in the Erlang VM are killed. Any Erlang node will normally be running a number of apps and you may want to upgrade just one or two of them. This is where the "upgrade by application" procedure comes in. 

application:get_application/1 will give you the name of the application that a module belongs to. I build a unique list of applications that my changed modules belong to and then stop each application with application:stop/1, re-load changed modules and start the application with application:start/1.
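The upgrade-by-application steps could be sketched roughly as follows. This is an illustration, not the original code: the module name by_app is hypothetical, only applications that are currently running are stopped and started, and dependencies between applications are ignored.

```erlang
-module(by_app).
-export([apps_of/1, upgrade/1]).

%% Unique, sorted list of applications owning the given modules.
%% Modules that belong to no application are skipped.
apps_of(Mods) ->
    lists:usort([App || Mod <- Mods,
                        {ok, App} <- [application:get_application(Mod)]]).

%% Stop the affected running applications, reload the changed modules,
%% then start the applications again.
upgrade(Mods) ->
    Running = [App || {App, _Desc, _Vsn} <- application:which_applications()],
    Apps = [App || App <- apps_of(Mods), lists:member(App, Running)],
    [ok = application:stop(App) || App <- Apps],
    [reload(Mod) || Mod <- Mods],
    [ok = application:start(App) || App <- Apps],
    {ok, Apps}.

reload(Mod) ->
    code:purge(Mod),
    {module, Mod} = code:load_file(Mod).
```

For instance, by_app:apps_of([lists, application]) maps the two modules to the stdlib and kernel applications.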

The upgrade-by-process procedure first grabs a list of all processes running in the same node as the sync gen_server. It does this by calling processes(). I check whether each process is running the code in one of the modified modules using erlang:check_process_code/2. Next, I suspend affected processes with erlang:suspend_process/1, re-load changed modules, resume the processes with erlang:resume_process/1 and I'm done.
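A rough sketch of the suspend, reload, resume sequence follows. The module name by_process and both function names are made up for illustration. One caveat worth keeping in mind: erlang:check_process_code/2 reports whether a process executes the old version of a module, so affected/1 only finds processes once an older version is present on the node.

```erlang
-module(by_process).
-export([affected/1, with_suspended/2]).

%% Processes (other than the caller) that execute old code
%% for any of the given modules.
affected(Mods) ->
    [Pid || Pid <- processes(),
            Pid =/= self(),
            lists:any(fun(Mod) ->
                              erlang:check_process_code(Pid, Mod) =:= true
                      end, Mods)].

%% Suspend the given processes, run Fun (for example a module reload),
%% then resume them even if Fun fails.
with_suspended(Pids, Fun) ->
    Suspended = [Pid || Pid <- Pids, suspend(Pid)],
    try
        Fun()
    after
        [catch erlang:resume_process(Pid) || Pid <- Suspended]
    end.

%% Returns true if the process was suspended, false if it had already exited.
suspend(Pid) ->
    try
        erlang:suspend_process(Pid)
    catch
        error:badarg -> false
    end.
```

A caller would combine the two, for example by_process:with_suspended(by_process:affected(Mods), ReloadFun), where ReloadFun is whatever function reloads the changed modules.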

Reloading modules for fun and profit

I'm still not absolutely sure if I got reloading of changed modules right but it looks like this

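    %% Reload each module in the list from the code path.
    load_modules([]) ->
        ok;

    load_modules([Mod|T]) ->
        %% Remove any old version of Mod, killing processes that still run it.
        code:purge(Mod),
        %% Empirically needed after code:purge/1.
        code:soft_purge(Mod),
        %% Load the new object code; crash if loading fails.
        {module, Mod} = code:load_file(Mod),
        load_modules(T).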

The need to call code:soft_purge/1 after code:purge/1 was determined empirically.

Everything I have described thus far is small bits of code.  The biggest chunk of code in the sync server figures out what modules were modified. 

What to reload: Inspecting module versions

Remember my original intent to run make:all/0 followed by sync:all/0 to upgrade all nodes in the cluster at the same time? It's only possible because 1) it's possible to grab the module version from a module loaded into memory, 2) it's possible to grab the same from a module on disk and, crucially, 3) modules are not reloaded when make:all/0 is run.

The module version defaults to the MD5 checksum of the module if no -vsn(Vsn) attribute is given. For the life of me I can't remember where Module:module_info() is documented but this is what you use to grab the attributes of the module. It's a property list so you can use proplists:get_value/2 to grab the vsn property and thus the module version.
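As a small illustration, the version of a loaded module can be read like this. The module name vsn_sketch is hypothetical; note that the vsn attribute holds a list, typically with a single element.

```erlang
-module(vsn_sketch).
-export([loaded_vsn/1]).

%% Version of the module as currently loaded in memory,
%% e.g. a one-element list holding the MD5-based default.
loaded_vsn(Mod) ->
    proplists:get_value(vsn, Mod:module_info(attributes)).
```

Calling vsn_sketch:loaded_vsn(lists) in a shell returns the version list for the loaded lists module.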

To take advantage of local processing power, the API initiating the upgrade request does no work apart from inspecting the SYNC distributed named process group and telling each sync gen_server in the group to initiate the upgrade procedure. This means that each module loaded into the Erlang node hosting the sync server needs to be checked for changes.

Grabbing the version of the BEAM file holding the code for a given module is done using  beam_lib:version/1.  This is complicated by the fact that all of the Erlang EC2 nodes in the cluster download their code from the boot server.  Normally, beam_lib:version/1 takes either a module name, a file name or a binary. 

I haven't documented why I'm not using a module name or a file name in the boot server scenario but I must have found them not to work. I had to resort to fetching the module BEAM file from the boot server and inspecting that. Fortunately, traffic between EC2 instances is free and fast and the same applies to your LAN.

To find out if a module is modified I grab the list of loaded modules with code:all_loaded/0 and inspect each module with code:is_loaded/1. I skip preloaded modules (see documentation for code:is_loaded) and use the path returned otherwise to instruct erl_prim_loader:get_file/1 to fetch the BEAM file. I then pass the file contents to beam_lib:version/1 and I have my disk version. After that it's a simple matter of comparing the two versions and reloading the module if they differ.
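Put together, the check described above might look roughly like the sketch below. It is an approximation rather than the original code: the module name changed_sketch is invented, and any module whose BEAM file cannot be fetched or parsed is simply treated as unmodified.

```erlang
-module(changed_sketch).
-export([modified/0, is_modified/1, disk_vsn/1]).

%% All loaded modules whose on-disk version differs from the loaded one.
modified() ->
    [Mod || {Mod, _Loaded} <- code:all_loaded(), is_modified(Mod)].

is_modified(Mod) ->
    case code:is_loaded(Mod) of
        {file, Path} when is_list(Path), Path =/= [] ->
            case disk_vsn(Path) of
                {ok, DiskVsn} -> DiskVsn =/= loaded_vsn(Mod);
                error -> false
            end;
        _ ->
            false   % not loaded, preloaded or cover_compiled
    end.

%% Fetch the BEAM file through the primitive loader (which goes to the
%% boot server when the node runs with -loader inet) and read its version.
disk_vsn(Path) ->
    case erl_prim_loader:get_file(Path) of
        {ok, Bin, _FullName} ->
            case beam_lib:version(Bin) of
                {ok, {_Mod, Vsn}} -> {ok, Vsn};
                {error, beam_lib, _Reason} -> error
            end;
        error ->
            error
    end.

loaded_vsn(Mod) ->
    proplists:get_value(vsn, Mod:module_info(attributes)).
```

The list returned by changed_sketch:modified() is what would then be handed to the reload function.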

Filed under  //   ec2   erlang  
Posted October 13, 2007

Setting up Erlang on Amazon EC2

This article describes a project that I recently completed for a startup company. The code is proprietary and cannot be published but the company has graciously allowed me to write about my experience.

Why Erlang and Amazon EC2

There's no need to introduce the Amazon Elastic Compute Cloud (EC2) since everyone knows about it by now. In essence, EC2 allows you to rent computing power by the hour. That hour is just $0.10 which works out to about $70 per month. The virtual server that Amazon provides is called an instance. The important bit is that you are completely in control of the operating system that the instance runs and the software installed on it.

Amazon lets you run scores of instances at any given time. Major benefits are realized when EC2 instances work as a cluster, though. Think of GoogleBot, a page crawler that indexes your site's content. Such a crawler would surely benefit from being run on as many machines as possible, all indexing different pages and working in parallel. Once the crawler is finished, you can shut the machines down until next time.

Amazon does not provide tools to cluster your instances or replicate data among them. This is a task that Erlang copes with extremely well so Amazon EC2 and Erlang are a match made in heaven!

How to set up Erlang on Amazon EC2

How do you start with Erlang and EC2? You need to build a Linux image that runs Erlang upon startup and automatically starts a new Erlang node. This node should then contact an existing Erlang node to join your Erlang cluster.

CEAN is a great way to set up the necessary components of Erlang on your new instance. Set up CEAN and have it install just the Erlang applications that you need. Create a script that will run Erlang when Linux starts. Make sure to adjust $HOME in this script and set $PROGNAME to start.sh in cean/start.sh. Use cean:install/1 to pull in the inets and sasl packages. You will likely need the compiler package as well.

The EC2 API lets you pass arguments to your newly started instance and these arguments can be retrieved by your Erlang code. One of the arguments you absolutely must pass is the name of an existing instance that is already part of your Erlang cluster. By connecting to Erlang running on the existing instance your new node will automatically become aware of the rest of the cluster.
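One way to retrieve such an argument is the instance user-data endpoint, read here with httpc from the inets package. This is a sketch under several assumptions: the user data is taken to contain nothing but the name of an existing node, the module name join_sketch is made up, and instances that require a metadata session token are not handled.

```erlang
-module(join_sketch).
-export([seed_from_user_data/0, parse_seed/1, join/1]).

%% Fetch the user data passed to this instance at launch.
seed_from_user_data() ->
    {ok, _} = application:ensure_all_started(inets),
    Url = "http://169.254.169.254/latest/user-data",
    case httpc:request(get, {Url, []}, [{timeout, 5000}], []) of
        {ok, {{_Version, 200, _Reason}, _Headers, Body}} ->
            {ok, parse_seed(Body)};
        Other ->
            {error, Other}
    end.

%% Assumes the user data is just a node name such as "name@host".
parse_seed(Body) ->
    list_to_atom(string:trim(Body)).

%% Connect to the existing node; the rest of the cluster follows.
join(Seed) ->
    case net_kernel:connect_node(Seed) of
        true -> {ok, nodes()};
        _ -> {error, {unreachable, Seed}}
    end.
```

On a freshly booted instance the startup code would call seed_from_user_data/0 and pass the result to join/1.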

Upgrading your Erlang code

The software available to your instance is normally part of your instance image. It's quite cumbersome to rebuild an image every time you deploy a software update, though. It's much better to push software updates to your instances whenever an update is available. Note that these updates need to be pushed to every instance in your Erlang cluster and reloaded every time an instance restarts. Fortunately, Erlang makes all this easy.

The boot server facility is probably one of the least documented and appreciated pieces of the Erlang infrastructure but one that comes in most handy here. A boot server enables Erlang nodes to fetch their configuration files and code from a central location, over the network. This neatly sidesteps the issue of pushing upgrades to your Erlang cluster. All you need to do is restart your instances one by one and have them fetch new software.

Note that you don't need to physically restart the EC2 instances themselves. All you need to do is tell your Erlang nodes to reboot without exiting the VM. This is done using init:restart/0.

The boot server

The Erlang boot server lives in the erl_boot_server module and keeps a list of slave hosts authorized to connect to it. You can use man erl_boot_server to read up on the boot server API.

The boot server will not have any hosts authorized to connect to it upon startup. A new EC2 instance that you are starting up needs to be added to the boot server slave list BEFORE you attempt to start a new Erlang node. This is easily accomplished by starting a controller node that will issue an RPC call to the boot server and add its own IP address to the boot server's slave list.

Once the controller node adds the internal Amazon instance address to the boot server authorized slave list, it can start the worker node and safely exit. Now that the boot server knows about the new slave it will allow connection and the worker node will successfully fetch its software from the boot server.
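The controller's RPC call might be sketched like this. The module name boot_register and the function add_self/2 are hypothetical; erl_boot_server:add_slave/1 is the real call that extends the slave list. How the controller learns its own internal address is left to the caller.

```erlang
-module(boot_register).
-export([add_self/2]).

%% Ask the boot server node to authorize this instance.
%% BootNode is the node running erl_boot_server, Ip is this
%% instance's internal address, e.g. {10,0,0,1}.
add_self(BootNode, Ip) ->
    case rpc:call(BootNode, erl_boot_server, add_slave, [Ip]) of
        ok -> ok;
        {error, Reason} -> {error, Reason};
        {badrpc, Reason} -> {error, {badrpc, Reason}}
    end.
```

The controller should start the worker node only after add_self/2 returns ok; any error means the boot server would still refuse the connection.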

The boot server and all the slave nodes must share the same Erlang cookie. The cookie is stored in ~/.erlang.cookie. All nodes must also share the same OTP version.

So long as all the nodes are part of the same EC2 security group we can be reasonably sure that no node outside of our group will be able to make use of our boot server. This security is also aided by the requirement that all Erlang nodes in the cluster must use the same cookie to talk to each other. It's convenient to assign one of the existing instances as a boot server since it will then be within the EC2 security group.

Setting up

I use a script like this to start slave nodes. The path specified is on the boot server.

#!/bin/sh

COOKIE=RRFJBVGLSOUPFLWVEYJP
BOOT=/Users/joelr/work/erlang/sync/ebin/diskless
HOST=192.168.1.33
ID=diskless

erl -name $ID -boot $BOOT -setcookie $COOKIE -id $ID -loader inet -hosts $HOST -mode embedded ${1+"$@"}

You will also need a boot file, which must be created with full paths inside (the local option to systools:make_script/2).

To build a boot file I use these two lines of Erlang:

code:add_path("./ebin"). 
systools:make_script("diskless", [local, {outdir, "./ebin"}]).

Script files are made from rel and app files. Boot files are made from script files.
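The rel file itself is a single Erlang term naming the release, the ERTS version and the applications with their versions. The sketch below generates one from the versions installed on the build machine. The module name rel_sketch is hypothetical and the application list is an assumption; a real release would list the author's own applications as well.

```erlang
-module(rel_sketch).
-export([rel_term/3, write_rel/3]).

%% Build the release term, e.g. for Name = "diskless".
rel_term(Name, Vsn, Apps) ->
    {release, {Name, Vsn},
     {erts, erlang:system_info(version)},
     [{App, app_vsn(App)} || App <- Apps]}.

%% Version string from the application's app file.
app_vsn(App) ->
    _ = application:load(App),   % ok, or already loaded
    {ok, Vsn} = application:get_key(App, vsn),
    Vsn.

%% Write Name.rel into the current directory.
write_rel(Name, Vsn, Apps) ->
    Term = rel_term(Name, Vsn, Apps),
    file:write_file(Name ++ ".rel", io_lib:format("~p.~n", [Term])).
```

Calling rel_sketch:write_rel("diskless", "1.0", [kernel, stdlib, sasl]) would produce a diskless.rel that systools:make_script/2 can turn into script and boot files.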

Note that diskless is the same boot file name that is used in the shell script above. I'm assuming that it lives in ./ebin and so I add it to the code path for make_script to find it.

If everything is done correctly your new EC2 instances will now fetch their code from the boot server upon startup and whenever you restart Erlang nodes running on them with init:restart/0.

It may not always be convenient to pull updates from the boot server. I will describe a push facility that I implemented in another post.

Filed under  //   ec2   erlang  
Posted October 12, 2007