Tenerife Skunkworks

Boldly going where few have gone before

How to set up an ejabberd cluster on Amazon EC2 in 6 easy steps

1) Edit /etc/init.d/ejabberd

You need node=`hostname -f` since `hostname -s` does not work here.

2) Edit /etc/init.d/ejabberd

Use -name ejabberd@$node instead of -sname ejabberd everywhere. This applies to -sname ejabberdctl as well.

3) Edit /etc/init.d/ejabberd add mnesia_extra_db_nodes

See the start() function, find the line that says -detached and add the following right above

-mnesia extra_db_nodes \"[' ... hostname -f of a running node ... ']\" 

4) Remove the Mnesia db tables

cd /var/lib/ejabberd/spool && rm -f *

5) Edit /etc/ejabberd/ejabberdctl.cfg

Make sure you have this at the very end

ERLANG_NODE=ejabberd@`hostname -f`

6) Make sure your .erlang.cookie files are the same on all nodes

This will work with MySQL. Enjoy!

Filed under  //   ec2   ejabberd   erlang  

Optimizing Erlang: Array vs Tuple

This is the first in my series of posts on optimizing Erlang. I plan to tackle optimizing Mnesia, profiling and scalability.

You need to use arrays of up to 10,000 elements. Erlang offers you tuples as well as fixed-size and extensible pseudo-arrays. What is the fastest option?

Let us start the death match by pitting arrays against tuples in a death match. Trees were an option before the array module became available, so lets throw in trees just for laughs.

I'm running Mac OSX Leopard 10.5.4 on a Mac Pro 2x2.8Ghz Quad-Core Intel Xeon with 14Gb 800Mhz DDR2 FB-DIMM.

Erlang (BEAM) emulator version 5.6.3 [source] [64-bit] [smp:8] [async-threads:0] [kernel-poll:false]

27> arr:test().
Fixed-size array: get:     2921us, set:     5902us
Extensible array: get:     3336us, set:     8144us
Tuple:            get:      632us, set:   107467us
Tree:             get:     4321us, set:    45256us
ok

30> arr:test(100000).
Fixed-size array: get:    35314us, set:    74653us
Extensible array: get:    35349us, set:    74059us
Tuple:            get:     6411us, set: 24304490us
Tree:             get:    53681us, set:   632795us
ok

Note that timer:tc returns time in microseconds. I ran each test 3 times and the results above are from the third iteration.

Trees in Erlang (gb_trees) are built on top of regular tuples and so is the array module. The array module is much more efficient about using tuples than a regular tree, though, and this is the reason why it's so much faster.

The tuple test pre-allocates a tuple of 10k or 100k elements. There's no destructive assignment in Erlang and so the same large tuple needs to be allocated and discarded on every set operation. It's very inefficient to allocate and discard a large tuple on every set operation, thus naive tuple set is very slow.

The array module uses an efficient tree-like internal representation:

%% A tree is either a leaf, with LEAFSIZE elements (the "base"), an
%% internal node with LEAFSIZE+1 elements, or an unexpanded tree,
%% represented by a single integer: the number of elements that may be
%% stored in the tree when it is expanded. The last element of an
%% internal node caches the number of elements that may be stored in
%% each of its subtrees.
%%
%% Note that to update an entry in a tree of height h = log[b] n, the
%% total number of written words is (b+1)+(h-1)*(b+2), since tuples use
%% a header word on the heap. 4 is the optimal base for minimizing the
%% number of words written, but causes higher trees, which takes time.
%% The best compromise between speed and memory usage seems to lie
%% around 8-10. Measurements indicate that the optimum base for speed is
%% 24 - above that, it gets slower again due to the high memory usage.
%% Base 10 is a good choice, giving 2/3 of the possible speedup from
%% base 4, but only using 1/3 more memory. (Base 24 uses 65% more memory
%% per write than base 10, but the speedup is only 21%.)

It's far more efficient to allocate small tuples on every set and this is why the array module wins hands down. 

Use the code below to replicate my results on your hardware.

    

Filed under  //   erlang   performance  

Mnesia Unlimited

Mnesia is the Erlang embbedded distributed DBMS, that supports high scalability and fault tolerance through replication. Mnesia has been used to great success in all kinds of applications but it's not without limitations. These limitations mainly stem from the underlying ETS and DETS mechanisms used to implement Mnesia tables. 

There have been attempts to hack around Mnesia limitations before, e.g I wrote a Mnesia backend for Amazon S3 just last summer. All these attempts to add external table functionality to Mnesia suffered from being hard to extend, or were closed source implementations. 

Thanks to the Dukes of Erl, I finally got a chance to do a proper Mnesia extension, the one to rule them all! The result is a series of about 50 careful git patches that support all the features of regular Mnesia tables such as indexing, replication, distribution, fragmentation and backup, as well as set or bag semantics. Best of all, this functionality is available to anyone using OTP release R11B5. 

The external table API is supplied as an Erlang behavior so all you need to do is supply a callback module that conforms to it. 

You can now use Mnesia as a front-end to disk-based hash tables like SleepyCat/BerkeleyDB or Tokyo Cabinet, a memory-mapped table or MySQL.

You are no longer constrained by the scalability or size limitations of ETS and DETS so go wild and let me know what you come up with. Oh, and please petition the Erlang/OTP team to include this work in the official Erlang distribution!

Mnesia 4.3.5 with the extension mechanism is available from Google Code and so is the Tokyo Cabinet external table.

Filed under  //   erlang   hacks   mnesia  

Parsing text and binary files with Erlang

Erlang originated in the telecommunications industry where one of the major tasks is conversion of text and binary data from one format to another. This is a task that Erlang excels at! 

Parsing text and binary data is something that you will be doing very often in the course of writing your super-scalable internet servers so lets take a look at some efficient approaches to parsing text and binary data.

Strings vs binaries

Erlang does not have a built-in string data type. Strings are simulated on top of lists of integers. In a 32-bit Erlang virtual machine (VM) an integer is 4 bytes and we need 4 more bytes for a pointer to the next element of the list, for a total of 8 bytes per "character". In 64-bit VM this number doubles.

Why should you care?

Each network connection to your servers will require certain amount of memory to send and receive data. A lot of protocols used on the internet, such as XML, are text and quite verbose at that. Imagine receiving a 10 kilobyte XML message, for example. Converting this message to a string for processing will inflate its size to 80K or 160K respectively.

When network connections to your server number in the thousands, it becomes necessary to minimize the amount of memory each connection requires for processing data that is sent and received. Any received message will also become garbage once it's converted to an Erlang data structure and this garbage will need to be collected. The less garbage we generate, the less work the garbage collector has to do and the more responsive our server will become.

Lets keep binary data we receive from the network as binary data and avoid converting it to strings. Parsing of binary data is specially fast and convenient with special syntax for constructing binaries and matching binary patterns as well as bit strings and binary comprehensions. String processing enjoys no such advantage.

Remember that all Erlang input and output functions can deal with binary data. Any program that sticks to binary data processing will work much faster than a similar program that converts binary data to strings for processing!

That said, lets hammer a final nail into the Erlang string coffin and look at how we can process text as binary data.

Processing text files as binaries

Suppose we have a comma-delimited text file to parse. We need to split each line into a list of fields and collect our lines into a list. The file is not too large so we can afford to load it into memory in one fell swoop. This is our code in a nutshell.

    -module(act).
    -compile([export_all]).

    parse(Filename) when is_list(Filename) ->
        {ok, Bin} = file:read_file(Filename),
        parse(Bin).

    parse(Bin) when is_binary(Bin) ->
        parse(Bin, [], [], []).

Note that the empty lists are the initial values for our accumulators. Field is the list of characters that represents the current field we are processing. Line is the list of fields we have gathered while processing the current line. Finally, Acc is the list of lines we have gathered while processing our file.

We also have two functions named parse here: one that takes a Filename string (a list of integers) and another that takes a binary. Keeping the function name the same  and using guards is strictly a matter of taste. I could have just as well called the second parse function parse1 or do_parse

Erlang functions are distinguished based on their number of arguments and there's a special fun_name/num_args notation to reflect that. Our parse function above would be written as parse/1. When the number of arguments is the same, the functions are distinguished based on guards such as is_list or is_binary above.

It's good coding practice to present a neat interface to the outside world. parse/1 above would serve that purpose and *parse/4* below would not. parse/1 takes just one argument whereas parse/4 clutters the interface with three extra arguments. As a user of the parsing module I would not know the purpose of the Field, Line and Acc arguments to parse/4 below.

    parse(<<$\,, Rest/binary>>, Field, Line, Acc) ->
        parse(Rest, [], [lists:reverse(Field)|Line], Acc);

$\, matches the tab character and Rest matches the rest of the binary. It's quite fast to add to the beginning of the list but the accumulated list needs to be reversed when we are done. Also, we almost certainly have accumulated a field by the time we hit the tab. This is why we reverse the field accumulator, prepend it to the accumulated list of fields and start our next iteration (recurse) with an empty field accumulator.

    parse(<<$\r, Rest/binary>>, Field, Line, Acc) ->
        parse(Rest, Field, Line, Acc);

Just in case we have been given a file produced on Windows, we skip the carriage return character ($\r) and start on our next recursive iteration.

    parse(<<$\n, Rest/binary>>, Field, Line, Acc) ->
        Field1 = lists:reverse(Field),
        FieldList = [Field1|Line],
        parse(Rest, [], [], [FieldList|Acc]);

A new line ($\n) means that we have hit the end of our current line and need to start processing a new one. We start this processing with empty field and line accumulators and prepend the list of fields to the lines accumulator Acc.

    parse(<<Char, Rest/binary>>, Field, Line, Acc) ->
        parse(Rest, [Char|Field], Line, Acc);

We prepend any character not matching our field or line delimiters to the field accumulator and keep going.

    parse(<<>>, [], [], Acc) ->
        {ok, lists:reverse(Acc)};

We do run out of binary data to process at some point in time and detect this fact by matching the empty binary <<>>. If our field and line accumulators are empty then we are done. We do need to reverse the list of lines that we have accumulated to return them in their original order.

    parse(<<>>, Field, Line, Acc) ->
        parse(<<$\n>>, Field, Line, Acc).

What do we do if our accumulators are not empty by the time we are done processing? We can add custom processing code, of course, but wouldn't it be better to leverage code that we have already written? We already go through the proper motions when we find a new line so to make our job easier we make it look like we found one. We continue parsing by creating a small fake binary with a single new line character.

Processing binary data the hard way

    You did not misread it! Yes, the customary way of processing Erlang binaries is the hard way. It's low level and involves lots of typing and a good deal of code duplication. It's also the fastest and most efficient way.

I will show you a less efficient but more structured way to process binary data later in this chapter. We need to learn to walk before we learn to run so lets take a look at how you normally process binary data in Erlang.

Here's a chunk of the binary protocol that my OpenPoker server uses. 

Packet format:

    0  1   2        N
    +--+---+--- ... +
    | Size | Body   |
    +------+--- ... +

Body:

    0      1           N
    +------+--- ... ---+
    | Type | Arguments |
    +------+--- ... ---+

Each packet starts with a 2-byte packet size then a 1-byte packet type and the data payload. The body of a NOTIFY_JOIN command will then look like this:

    0    1     5     9      10    12
    +----+-----+-----+-------+-----+
    | 21 | GID | PID | Seat# | Seq |
    +----+-----+-----+-------+-----+

GID and PID are 4-byte integers, the seat number is a byte and the sequence number a 2-byte integer. We need one function to read this command from a binary packet and return something easy to deal with, e.g. a tuple.

    read(<<?PP_NOTIFY_JOIN, GID:32, PID:32, SeatNum, Seq:16>>) ->
        {21, GID, PID, SeatNum, Seq};

To send the command out through the socket, we first need to convert the tuple to a binary.

    write({21, GID, Player, SeatNum, Seq})
    when is_number(GID),
         is_pid(Player),
         is_number(SeatNum),
         is_number(Seq) ->
        PID = gen_server:call(Player, 'ID'),
        <<21, GID:32, PID:32, SeatNum, Seq:16>>;

There are scores of commands in the OpenPoker protocol and the number will grow as new functionality is added. I did write code like the above for each and every command in the OpenPoker protocol and I wish I knew a way that enabled more code reuse.

What you should walk away with here is that reading and writing binary data in Erlang is simple and straightforward. 

Pickler combinators

The OpenPoker protocol handling code is quite verbose. The protocol is also very much flat as it does not involve nested data structures. There is a way to describe reading and writing of structured data and generally save ourselves time and typing.

Andrew Kennedy coined the term pickler combinator in his 2004 'Functional Pearl of the same name. He wrote that 

The tedium of writing pickling and unpickling functions by hand is relieved using a combinator library similar in spirit to the well-known parser combinators. Picklers for primitive types are combined to support tupling, alternation, recursion, and structure sharing.

Andrew Kennedy's implementation used SML and is an Erlang book. We are still still functional programmers, though, so let's see what we can do...

    -module(pickle).

    -export([pickle/2, unpickle/2, test/0]).
    -export([byte/0, short/0, sshort/0, 
       int/0, sint/0, long/0, slong/0]).
    -export([list/2, choice/2, optional/1, wrap/2,
       tuple/1, record/2, binary/1, wstring/0]).

    -compile([export_all]).

Let's design and implement a pickling module that will save us a lot of typing down the road. The goal is to save us a lot of typing and enable us to describe our packet formats in terms of bytes, words and strings, as opposed to bits and binaries. 

    %%% Pickle and unpickle. We accumulate into a list.

    pickle({Pickler, _}, Value) ->
        lists:reverse(Pickler([], Value)).

    unpickle({_, Pickler}, Bin) ->
        element(1, Pickler(Bin)).

pickle and unpickle are responsible for doing the work for us and like a good manager they delegate the bulk of the work to their underlings. Note that the pickler combinator is represented by a two-element tuple where the first element is the function used for pickling and the second element for unpickling.

To pickle any Erlang term, we give pickle the tuple representing the pickler combinator as well as the value. Binaries, while convenient, don't lend themselves to accumulating values so we accumulate the pickled data into a list. Fortunately for us, the Erlang input/output system can take lists and convert them to binaries for us.

To unpickle a binary we invoke the un-pickler on it and take the first element of the resulting tuple. Why does the unpickler return a tuple? We may have data left over from processing and it needs to be stored somewhere. The first element of the tuple stores the result of the unpickle operation and the second stores the remainder of the data.

Without further ado lets implement a pickler combinator for serializing byte data.

    %%% Byte

    byte() -> 
        {fun write_byte/2, fun read_byte/1}.

    write_byte(Acc, Byte) -> 
        [<<Byte:8>>|Acc].

    read_byte(Bin) -> 
        <<Byte:8, Rest/binary>> = Bin,
        {Byte, Rest}.

    byte is the name of the combinator and byte/0 simply returns a tuple of the pickler and unpickler functions. This is the pattern that we will be using over and over again. Also, picklers take the list serving as accumulator as their fist argument and the Erlang value as their second argument. 

To pickle a byte we simply tell Erlang to prepend the binary representation of the byte <<Byte:8>> to the accumulator list. To unpickle a byte, read_byte/1 splits the binary into the byte itself and the remainder of the data and returns both as a tuple.

Simple, isn't it?

A pickler combinator for unsigned short values stored in little-endian format looks like this.

    %%% Unsigned short

    short() -> 
        {fun write_short/2, fun read_short/1}.
     
    write_short(Acc, Word) -> 
        [<<Word:16/little>>|Acc].

    read_short(Bin) -> 
        <<Word:16/little, Rest/binary>> = Bin,
        {Word, Rest}.

  Signed short is implemented similarly, the only difference is that we use little-signed instead of little.

    %%% Signed short

    sshort() -> 
        {fun write_sshort/2, fun read_sshort/1}.

    write_sshort(Acc, Word) -> 
        [<<Word:16/little-signed>>|Acc].

    read_sshort(Bin) -> 
        <<Word:16/little-signed, Rest/binary>> = Bin,
        {Word, Rest}.

  I will skip the implementation of signed and unsigned integers and long integers. You can find the full code at the end of this book. It's much more interesting to ponder the design of a list combinator.

    %%% List. We supply a pickler for list length 
    %%% as well as a pickler for list elements.

    list(Len, Elem) ->
        {fun(Acc, List) -> write_list(Len, Elem, Acc, List) end, 
         fun(Bin) -> read_list(Len, Elem, Bin) end }.

  To pickle and unpickle a list we need to pickle the length of the list and then pickle each element of the list in sequence. The only requirement for the picker and unpickler functions is to take certain arguments and return values in the format that the library expects.

list/2 returns a tuple of anonymous functions that follow this convention. The functions that are ultimately invoked look a bit different.    

    write_list({Len, _}, {Elem, _}, Acc, List) ->
        Acc1 = Len(Acc, length(List)),
        Fun = fun(A, Acc2) -> Elem(Acc2, A) end,
        lists:foldr(Fun, Acc1, List).

  write_list/4 takes both the pickler for the length of the list and the pickler for each element. Remember that each pickler returns a list of pickled binary chunks. The length pickler itself is primed with the list of previously pickled chunks given to write_list/4 (Acc). Acc1 is the list of binary chunks resulting from the pickling of the list length (Len) together with the list of chunks accumulated before the call to write_list/4

I won't be explaining lists:foldr/3 but we are creating an anonymous function (Fun) that will be invoked for each element of the to-be-pickled list we are given (List). This anonymous function will then invoke the list element pickler and return the list of binary chunks we have pickled so far, to prime list:foldr/3 on its next iteration.

Reading a pickled list aka unpickling is, again, a two step operation.

    read_list({_, Len}, {_, Elem}, Bin) ->
        {N, Bin1} = Len(Bin),
        read_list(N, [], Elem, Bin1).

  First we unpickle the list length. This gives us both the number of elements to read and the remainder of our binary data. We then proceed to unpickle the list itself using the data that was left over.

    read_list(0, Acc, _, Bin) -> {Acc, Bin};
    read_list(N, Acc, Elem, Bin) ->
        {E, Bin1} = Elem(Bin),
        read_list(N - 1, [E|Acc], Elem, Bin1).

  There are no for loops in Erlang but they are easily emulated with recursion. We unpickle each element of the list, accumulate it and proceed to unpickle the next element after decrementing the number of elements read. We finish once we have reached 0.

Often, we will need to store different data depending on some flag. This requires us to both store the value of the flag and store the data. The choice combinator implements this alternative selection.

    %%% Alternative selection. This could probably use some
    %%% deeper thinking. Otherwise, we take a pickler for the tag
    %%% as well as a tuple of two functions. The first one
    %%% returns the tag value and a pickler based on the supplied
    %%% value. The second one selects a pickler based on a tag value.

    choice(Tag, Choice) ->
        {fun(Acc, Value) -> write_choice(Tag, Choice, Acc, Value) end,
         fun(Bin) -> read_choice(Tag, Choice, Bin) end }.

  There's not much to say about choice itself. It looks a lot like the list combinator that we have just implemented.

    write_choice({Tag, _}, {Choice, _}, Acc, Value) 
      when is_function(Tag), 
           is_function(Choice) ->
        {T, {Pickler, _}} = Choice(Value),
        Acc1 = Tag(Acc, T),
        Pickler(Acc1, Value).
  
The choice helper function (selector) analyzes the data we are pickling and returns both a tag to store and the combinator to use for the data. We serialize the tag and the value right after it.

    read_choice({_, Tag}, {_, Choice}, Bin) 
      when is_function(Tag), 
           is_function(Choice) ->
        {T, Bin1} = Tag(Bin),
        {_, Pickler} = Choice(T),
        Pickler(Bin1).

  We work in the opposite direction to unpickle. We first unpickle the tag and let the choice function (selector) give us the combinator to use for the data. We use this combinator to unpickle the data.

    %%% Optional value. Use 'none' to indicate no value.

    optional(Pickler) ->
        {fun(Acc, Value) -> write_optional(Pickler, Acc, Value) end,
         fun(Bin) -> read_optional(Pickler, Bin) end}.
        
    write_optional(_, Acc, none) ->
        [<<0>>|Acc];

    write_optional({Pickler, _}, Acc, Value) ->
        Pickler([<<1>>|Acc], Value).

    read_optional({_, Pickler}, Bin) ->
        <<Opt:8, Bin1/binary>> = Bin,
        case Opt of 
          0 -> {none, Bin1};
          _ -> Pickler(Bin1)
        end.

  To serialize optional values (i.e. value or nothing) we pickle 0 when there's no value and 1 otherwise.

The value of the wrapper combinator may not be readily apparent but bear with me, it will come in handy to implement serialization of enumerations!

    %%% Wrapper. Take a pickler and a wrapper tuple of two functions
    %%% where the first one is used to convert the value before 
    %%% pickling and the second one after unpickling.

    wrap(Wrap, Pickler) ->
        {fun(Acc, Value) -> write_wrap(Wrap, Pickler, Acc, Value) end,
         fun(Bin) -> read_wrap(Wrap, Pickler, Bin) end}.

    write_wrap({Wrap, _}, {Pickler, _}, Acc, Value) ->
        Pickler(Acc, Wrap(Value)).

    read_wrap({_, Wrap}, {_, Pickler}, Bin) ->
        {Value, Bin1} = Pickler(Bin),
        {Wrap(Value), Bin1}.
  
This combinator resembles the list combinator we implemented previously but instead of taking a combinator used to serialize the list length, it takes a helper function used to transform the data before pickling and after unpickling.

Erlang does not support enumerations but I want to have enumerations where values increase sequentially such as {cow, sheep, horse}. I also want to be able to explicitly assign a value to each element of the enumeration, e.g. [{cow, 10}, {sheep, 100}]. Finally, the whole point of the exercise is to be able to marshal these enumerations back and forth. 

We will assume that enumerated values given as a tuple start from 1. 

    enum(Enum, Pickler) ->
        wrap(wrap_enum(Enum), Pickler).

  The enum/2 combinator takes both the format of the enumeration (list or tuple) as well as the pickler for the enumeration value. It's clear in its simplicity but it's also clear that it's hiding something!

What is wrap_enum/1 and why do we need it?

    wrap_enum(Enum) 
      when is_tuple(Enum) ->
        wrap_enum_1(prep_enum_tuple(Enum));

    wrap_enum(Enum) 
      when is_list(Enum) ->
        wrap_enum_1(prep_enum_list(Enum)).

  Recall that the format of our enumeration can be given both as a tuple and a list. We will pre-process the enumeration format to convert it to a list when it's given to us as a tuple.

    prep_enum_tuple(Enum)
      when is_tuple(Enum) ->
        prep_enum_tuple(Enum, size(Enum), [], []).

    prep_enum_tuple(_, 0, Acc1, Acc2) ->
        {Acc1, Acc2};

    prep_enum_tuple(Enum, N, Acc1, Acc2) ->
        prep_enum_tuple(Enum, N - 1, 
            [{element(N, Enum), N}|Acc1],
            [{N, element(N, Enum)}|Acc2]).

  The above will convert {cow, sheep, horse} into a pair (tuple of two elements) of lists [{cow, 1}, {sheep, 2}, {horse, 3}] and [{1, cow}, {2, sheep}, {3, horse}]. We need the regular list to convert cow into 1 for pickling and the inverted list to convert 1 back into cow.

    prep_enum_list(Enum) 
      when is_list(Enum) ->
        % expect a list of {tag, #value} pairs
        Inv = fun({Key, Val}) -> {Val, Key} end,
        InvEnum = lists:map(Inv, Enum),
        {Enum, InvEnum}.

  The only thing we need to do when the format of the enumeration is a list is to create the inverted list of pairs. The anonymous function Inv swaps {cow, 100} for {100, cow}. We use lists:map/2 to apply our anonymous function to each element of the enumeration specification list, thus inverting each element. 

And here's the ultimate wrapper, the function that we spent so much time gearing up to!

    wrap_enum_1({List1, List2}) ->
        F = fun(A, B) -> A < B end,
        %% gb_trees needs an ordered list
        Dict1 = lists:sort(F, List1),
        Dict2 = lists:sort(F, List2),
        Tree1 = gb_trees:from_orddict(Dict1),
        Tree2 = gb_trees:from_orddict(Dict2),
        {fun(Key) -> gb_trees:get(Key, Tree1) end,
         fun(Key) -> gb_trees:get(Key, Tree2) end}.

The gb_trees module implements a lightweight hash table using Erlang tuples. We order our enumeration lists and convert them to gb_trees to make lookups simple. We save two trees, one to look up cow using one and another to look one up using cow. 

This completes our processing of enumerations. There's one last hill to climb and then I promise you that we will be in combinator nirvana! But first we need to handle serialization of tuples and Erlang records. 

    %%% Tuple. Uses a tuple of picklers of the same size.

    tuple(Picklers) 
      when is_tuple(Picklers) ->
        wrap({fun tuple_to_list/1, 
        fun list_to_tuple/1}, 
       tuple_0(tuple_to_list(Picklers))).

  We reuse the wraping code we just develop to ensure that each tuple is pickled as a list and that each list is converted back to a tuple after we deserialize it.

    record(Tag, Picklers) 
      when is_tuple(Picklers) ->
        wrap({fun(Record) -> record_to_list(Tag, Record) end,
        fun(List) -> list_to_record(Tag, List) end}, 
       tuple_0(tuple_to_list(Picklers))).

  We rely on Erlang records being tuples and just add the record tag as the first element when unpickling the record.

    write_tuple_0([], Acc, _) ->
        Acc;

    write_tuple_0([{Pickler, _}|Rest], Acc, [Value|Tuple]) ->
        write_tuple_0(Rest, Pickler(Acc, Value), Tuple).

    read_tuple_0(Picklers, Bin) ->
        read_tuple_0(Picklers, Bin, []).

    read_tuple_0([], Bin, Acc) ->
        {lists:reverse(Acc), Bin};

    read_tuple_0([{_, Pickler}|Rest], Bin, Acc) ->
        {Value, Bin1} = Pickler(Bin),
        read_tuple_0(Rest, Bin1, [Value|Acc]).

  We serialize a tuple by requiring a tuple of combinators of the same length.We then convert the tuple of combinators to a list to simplify processing. We pickle and unpickle recursively, using the accumulator idiom.

    %%% It's convenient to be able to convert the tuple
    %%% to a list first as there's no erlang:prepend_element/2.

    tuple_0(Picklers) 
      when is_list(Picklers) ->
        {fun(Acc, Value) -> write_tuple_0(Picklers, Acc, Value) end,
         fun(Bin) -> read_tuple_0(Picklers, Bin) end}.

    record_to_list(Tag, Record) 
      when is_atom(Tag) ->
        lists:nthtail(1, tuple_to_list(Record)).

    list_to_record(Tag, List) 
      when is_atom(Tag), 
           is_list(List) ->
        list_to_tuple([Tag|List]).

    Erlang records are regular tuples where the first element of the tuple is the record tag. When pickling a record we convert the record to a list and drop the first element (tag). When unpickling we prepend the tag. We never store the tag itself since the record combinator knows it.

We'll skip the implementation of the binary combinator since it's nothing fancy. You can find the full code at the end of the book.

Now is a good time to tackle testing.

Erlang macro facilities could definitely be better but they are also nothing to sneeze at! Lets define a couple of macros to help us with writing our unit tests.

    -define(error1(Expr, Expected, Actual),
      io:format("~s is ~w instead of ~w at ~w:~w~n",
          [??Expr, Actual, Expected, ?MODULE, ?LINE])).

    -define(match(Expected, Expr),
            fun() ->
        Actual = (catch (Expr)),
        case Actual of
            Expected ->
          {success, Actual};
            _ ->
          ?error1(Expr, Expected, Actual),
          erlang:error("match failed", Actual)
        end
      end()).

  We will be using match/2 a lot to make sure that whatever we pickle matches whatever we get after unpickling.

    check(Pickler, Value) ->
        X = pickle(Pickler, Value),
        Bin = list_to_binary(X),
        unpickle(Pickler, Bin).

  This little function is our testing workhorse. It takes a pickler combinator and a value and proceed to pickle a value and unpicle it using the same combinator. 

    test1() ->
        X = 16#ff,
        ?match(X, check(byte(), X)).

    test3() ->
        X = -1,
        ?match(X, check(sshort(), X)).

    This is what our tests look like. We give check a combinator and a value and use our match macro to check results. I'll skip a few non-essential tests since you have the full source code at the end of the book. I do want to spend time on other tests, though, since these tests server as the manual and documentation for our pickler combinator library.

    test8() ->
        X = "Wazzup!",
        ?match(X, check(list(int(), byte()), X)).

  We serialize the length of the list as an integer and each elment as a byte. Erlang strings are nothing but lists of integers but ASCII character values fit within a byte. There's nothing preventing you from using the int combinator to serialize elements of the list, though.

    value2tag(Action) 
      when is_list(Action) ->
        {0, list(byte(), byte())};

  We want to serialize either a list of bytes where the length of the list is also stored as a byte or a long value. We use 0 as a tag for the list.

    value2tag(_) ->
        {1, long()}.

  And store 1 when the following value is a long.

    tag2value(0) ->
        list(byte(), byte());

    tag2value(1) ->
        long().

  We reverse things when unpickling. If a tag value of 0 is found then we return the list combinator and otherwise return the long one.

    selector() ->
        {fun value2tag/1, fun tag2value/1}.

  selector simply combines the above tag convertors together.

    test9() ->
        X1 = "Just testing",
        X2 = 16#ffff,
        ?match(X1, check(choice(byte(), selector()), X1)),
        ?match(X2, check(choice(byte(), selector()), X2)).

  We need to make sure that pickling and unpickling both the string (list) and the long value works. And it does!

    test10() ->
        X1 = none,
        X2 = 55,
        ?match(X1, check(optional(byte()), X1)),
        ?match(X2, check(optional(byte()), X2)).

  We use none to stand for no value. 

    test11() ->
        %% tuple enum
        Enum1 = {cow, sheep, horse},
        {FROM1, TO1} = wrap_enum(Enum1),
        ?match(1, FROM1(cow)),
        ?match(2, FROM1(sheep)),
        ?match(3, FROM1(horse)),
        ?match(cow, TO1(1)),
        ?match(sheep, TO1(2)),
        ?match(horse, TO1(3)),
        %% list enum
        Enum2 = [{cow, 20}, {sheep, 30}, {horse, 40}],
        {FROM2, TO2} = wrap_enum(Enum2),
        ?match(20, FROM2(cow)),
        ?match(30, FROM2(sheep)),
        ?match(40, FROM2(horse)),
        ?match(cow, TO2(20)),
        ?match(sheep, TO2(30)),
        ?match(horse, TO2(40)).

  Personally, I found the enumeration combinator the most tedious to describe, probably because there's so much supporting code. That supporting code needs to be tested and we do it here. wrap_enum works like a charm!

    test12() ->
        Enum1 = {cow, sheep, horse},
        Enum2 = [{cow, 20}, {sheep, 30}, {horse, 40}],
        ?match(cow, check(enum(Enum1, byte()), cow)),
        ?match(sheep, check(enum(Enum2, byte()), sheep)).

    Once we know our supporting code works, we can proceed with the rest. Here we test enumerations given as a tuple and a list of key/value pairs. Farm animals galore!

    test13() ->
        Tuple = {"Joel", 16#ff00, none},
        Spec = {list(byte(),byte()), short(), optional(byte())},
        ?match(Tuple, check(tuple(Spec), Tuple)).

  Tuple combinators, remember those? We use a tuple of combinators to pickle a tuple of values. The size of both tuples must be the same but the values can be anything. Here we use a string, a short and an optional byte.

    -record(foo, { a, b }).
    -record(bar, { c, d }).
    -record(baz, { e, f }).

  Records are important to an Erlang programmer, even though they are syntactic sugar on top of tuples. I would be quickly left without hair to tear out, if I always had to use element/2 to access tuple elements by number.

We define a we records for the purposes of testing.

    test14() ->
        R = #baz{ e = 30, f = "Enough nesting!" },
        R1 = #foo{ a = 10, b = #bar{ c = 20, d = R }},
        Pickler = record(foo, {
               byte(),
               record(bar, {
            int(),
            record(baz, {
               sshort(),
               list(byte(), byte())
              })
                 })
              }),
        ?match(R1, check(Pickler, R1)).

  Here's the mother of all tests where we pickle a bunch of nested records with mixed values and restore them back to their original glory. This concludes our presentation of pickler combinators. Please feel free to use them as you see fit and don't forget to send me any ideas for improvement!

The last thing to note is that the pickler combinator approach is definitely less efficient than processing binaries directly. Still, it is suitable for lots of applications where programming productivity is more important than ultimate efficiency.

Full pickler combinator source code is available.

Filed under  //   erlang  

Upgrading your Erlang cluster on Amazon EC2

This article describes how to upgrade an Erlang cluster in one fell swoop once you have deployed it on Amazon EC2.

Why not the Erlang/OTP upgrade procedure

The standard and sanctioned way of deploying and upgrading Erlang applications is described in chapters 10-12 of the OTP Design Principles. Calling the upgrade procedure complex is an understatement. 

Bowing to the OTP application packaging procedure, I wanted to have a way of upgrading applications with a "push of a button". More precisely, I wanted to be able to type make:all() to rebuild my application and then type sync:all() to push updated modules to all nodes in my cluster. These nodes were previously set up as diskless Amazon EC2 nodes that fetch their code from the boot server since I didn't want to reinvent the application packaging wheel.

The sync application

The principal application deployed in the cluster is the sync app. This is a gen_server set up according to chapter 2 of the OTP Design Principles. The gen_server handles requests to restart the Erlang node without shutting down and set environment variables, as well as requests upgrade the code by application or by process. Each sync gen_server joins the 'SYNC' distributed named process group and this is what enables upgrade of the whole cluster in one fell swoop. 

The sync server will invoke init:restart/0 to restart the node without shutting down upon receiving a RESTART request. This is incredibly handy since the restart sequence takes the contents of the Erlang VM to the trash can and then repeats the same steps taken by the Erlang VM when it is started from the command line. Which is to say that the VM loads the boot file from the boot server, parses the boot file, downloads the applications and runs them. If we have upgraded the code on the boot server then the Erlang VM will run new code after a restart. 

Upgrading by application or by process

The above procedure is quite intrusive since all apps running in the Erlang VM are killed. Any Erlang node will normally be running a number of apps and you may want to upgrade just one or two of them. This is where the "upgrade by application" procedure comes in. 

application:get_application/1 will give you the name of the application that a module belongs to. I build a unique list of applications that my changed modules belong to and then stop each application with application:stop/1, re-load changed modules and start the application with application:start/1

The upgrade process by process procedure first grabs a list of all processes running in the same node as the sync gen_server. It does this by calling processes(). I check whether each process is running the code in one of the modified modules using erlang:check_process_code/2. Next, I suspend affected processes with erlang:suspend_process/1, re-load changed modules with erlang:resume_process/1 and I'm done.

Reloading modules for fun and profit

I'm still not absolutely sure if I got reloading of changed modules right but it looks like this

    load_modules([]) ->
        ok;

    load_modules([Mod|T]) ->
        code:purge(Mod),
        code:soft_purge(Mod),
        {module, Mod} =  code:load_file(Mod),
        load_modules(T).

The need to call code:soft_purge/1 after code:purge/1 was determined empirically.

Everything I have described thus far is small bits of code.  The biggest chunk of code in the sync server figures out what modules were modified. 

What to reload: Inspecting module versions

Remember my original intent to run make:all/0 followed by sync:all/0 to upgrade all nodes in the cluster at the same time? It's only possible because 1) it's possible to grab the module version from a module loaded into memory, 2) it's possible to grab the same from a module on disk and, crucially, modules are not reloaded when make:all/0 is run.

The module version defaults to the MD5 checksum of the module if no -vsn(Vsn) attribute is given. For the life of me I can't remember where Module:module_info() is documented but this is what you use to grab the attributes of the module. It's a property list so you can use proplists:get_value/2 to grab the vsn property and thus the module version.

To take advantage of local processing power, the API initiating the upgrade request does no work apart from inspecting the SYNC distributed named process group and telling each sync gen_server in the group to initiate the upgrade procedure. This means that each module loaded into the Erlang node hosting the sync server needs to be checked for changes.

Grabbing the version of the BEAM file holding the code for a given module is done using  beam_lib:version/1.  This is complicated by the fact that all of the Erlang EC2 nodes in the cluster download their code from the boot server.  Normally, beam_lib:version/1 takes either a module name, a file name or a binary. 

I haven't documented why I'm not using a module name or a file name in the boot server scenario but I must have found them not to work. I had to resort to fetching the module BEAM file from the boot server and inspecting that. Fortunately, traffic between EC2 instances is free and fast and the same applies to your LAN.

To find out if a module is modified I grab the list of loaded modules with code:all_loaded/0 and inspect each module with code:is_loaded/1. I skip preloaded modules (see documentation for code:is_loaded) and use the path returned otherwise to instruct erl_prim_loader:get_file/1 to fetch the BEAM file. I then pass the file contents to beam_lib:version/1 and I have my disk version. After that it's a simple matter of comparing the two versions and reloading the module if they differ.

Filed under  //   ec2   erlang  

Setting up Erlang on Amazon EC2

This article describes a project that I recently completed for a startup company. The code is proprietary and cannot be published but the company has graciously allowed me to write about my experience.

Why Erlang and Amazon EC2

There's no need to introduce the Amazon Elastic Computing Cloud (EC2) since everyone knows about it by now. In essence, EC2 allows you to rent computing power by the hour. That hour is just $0.10 which works out to about $70 per month. The virtual server that Amazon provides is called an instance. The important bit is that you are completely in control of the operating system that the instance runs and the software installed on it.

Amazon lets you run scores of instances at any given time. Major benefits are realized when EC2 instances work as a cluster, though. Think of GoogleBot, a page crawler that indexes your site's content. Such a crawler would surely benefit from being run on as many machines as possible, all indexing different pages and working in parallel. Once the crawler is finished, you can shut the machines down until next time.

Amazon does not provide tools to cluster your instances or replicate data among them. This is a task that Erlang copes with extremely well so Amazon EC2 and Erlang are a match made in haven!

How to set up Erlang on Amazon EC2

How do you start with Erlang and EC2? You need to build a Linux image that runs Erlang upon startup and automatically starts a new Erlang node. This node should then contact an existing Erlang node to join your Erlang cluster.

CEAN is a great way to set up the necessary components of Erlang on your new instance. Set up CEAN and have it install just the Erlang applications that you need. Create a script that will run Erlang when Linux starts. Make sure to adjust $HOME in this script and set $PROGNAME to start.sh in cean/start.sh. Use cean:install/1 to pull in the inets and sasl packages. You will likely need the compiler package as well.

The EC2 API lets you pass arguments to your newly started instance and these arguments can be retrieved by your Erlang code. One of the arguments you absolutely must pass is the name of an existing instance that is already part of your Erlang cluster. By connecting to Erlang running on the existing instance your new node will automatically become aware of the rest of the cluster.

Upgrading your Erlang code

The software available to your instance is normally part of your instance image. It's quite cumbersome to rebuild an image every time you deploy a software update, though. It's much better to push software updates to your instances whenever an update is available. Note that these updates need to be pushed to every instance in your Erlang cluster and reloaded every time an instance restarts. Fortunately, Erlang makes all this easy.

The boot server facility is probably one of the least documented and appreciated pieces of the Erlang infrastructure but one that comes in most handy here. A boot server enables Erlang nodes to fetch their configuration files and code from a central location, over the network. This neatly sidesteps the issue of pushing upgrades to your Erlang cluster. All you need to do is restart your instances one by one and have them fetch new software.

Note that you don't need to physically restart the EC2 instances themselves. All you need to do is tell our Erlang nodes to reboot without exiting the VM. This is done using init:restart/0.

The boot server

The Erlang boot server lives in the erlbootserver module and keeps a list of slave hosts authorized to connect to it. You can use man erlbootserver to read up on the boot server API.

The boot server will not have any hosts authorized to connect to it upon startup. A new EC2 instance that you are starting up needs to be added to the boot server slave list BEFORE you attempt to start an new Erlang node. This is easily accomplished by starting a controller node that will issue an RPC call to the boot server and add its own IP address to the boot server's slave list.

Once the controller node adds the internal Amazon instance address to the boot server authorized slave list, it can start the worker node and safely exit. Now that the boot server knows about the new slave it will allow connection and the worker node will successfully fetch its software from the boot server.

The boot server and all the slave nodes must share the same Erlang cookie. The cookie is stored in ~/.erlang.cookie. All nodes must also share the same OTP version.

So long as all the nodes are part of the same EC2 security group we should be reasonably secure that no node outside of our group will be able to make use of our boot server. This security is also aided by the requirement that all Erlang nodes in the cluster must use the same cookie to talk to each other. It's convenient to assign one of the existing instances as a boot server since it will then be within the EC2 security group.

Setting up

I use a script like this to start slave nodes. The path specified is on the boot server.

#!/bin/sh

COOKIE=RRFJBVGLSOUPFLWVEYJP
BOOT=/Users/joelr/work/erlang/sync/ebin/diskless
HOST=192.168.1.33
ID=diskless

erl -name $ID -boot $BOOT -setcookie $COOKIE -id $ID -loader inet -hosts $HOST -mode embedded ${1+"$@"}

You will also need to create a boot file which must be created with full paths inside (local option to systools:make_script/2).

To build a boot file I use these two lines of Erlang:

code:add_path("./ebin"). 
systools:make_script("diskless", [local, {outdir, "./ebin"}]).

script files are made from rel and app files. boot files are made from script files.

Note that diskless is the same boot file name that is used in the shell script above. I'm assuming that it lives in ./ebin and so I add it to the code path for make_script to find it.

If everything is done correctly your new EC2 instances will now fetch their code from the boot sever upon startup and whenever you restart Erlang nodes running on them with init:restart/0.

I may not always be convenient to pull updates from the boot server. I will describe a push facility that I implemented in another post.

Filed under  //   ec2   erlang  

HiPE for Mac x86

It started innocently enough with my wondering why High-performance Erlang (HiPE) was not available on my shiny new MacBook Pro. This turned into a long but awesome email exchange and culminated in victory, and my port of HiPE to Mac Intel, less than two weeks later. 

I learned a fair number of things about the Mac OSX kernel, the FPU, floating-point exceptions, Intel 32-bit architecture and SSE2 along the way. I also had an awesome time with assembler code.

I could not have done it without the mach_override library and numerous tips from Mikael Pettersson as well as other folks on the Erlang Questions mailing list. I will be shipping my patches to the HiPE team this week.

Filed under  //   erlang   hipe   mac