
Tenerife Skunkworks

Boldly going where few have gone before

AlgoScript

I'm on my third iteration of a translator from EasyLanguage. The first two versions were written in Haskell and OCaml and I'm using Lisp now. My goal is to produce code for a trading engine that runs in a shared library or DLL and can be embedded in other products such as NinjaTrader or TradeStation.

The original translator produced C# code, but I found this approach untenable. Every trading platform I looked at has a different set of trading functions. Generating C# code would have required me to write a library of supporting functions for every target platform to plug the holes.

I would have had to write and test my libraries over and over again. It would also have required me to become an expert in every trading platform I wanted to translate for, and would have made expanding my market rather tedious. Last but not least, anyone could grok my logic by looking at a translation or two and then write the code themselves using the libraries I had painstakingly produced.

It struck me that I could translate into an intermediate language and build an embeddable execution engine that could run in every trading product I would target with my translator. All the trading products I looked at support DLLs. So long as I supplied my engine as a DLL and exported a set of functions, I could take in price quotes and return buy or sell instructions. 
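To make the "quotes in, orders out" boundary concrete, here is a minimal sketch of what the engine side could look like, assuming the host calls the engine once per quote. The names `quote`, `order`, `create` and `on_quote`, and the moving-average rule, are hypothetical illustrations rather than the actual AlgoScript engine or any platform's DLL interface; a real DLL build would wrap these functions in C-callable exports.

```ocaml
(* Hypothetical engine boundary: the host pushes quotes in and gets
   orders back. Not a real platform API; a DLL build would wrap these
   functions in C-callable exports. *)

type quote = { symbol : string; price : float }

type order = Buy of string * int | Sell of string * int

(* State kept by the engine between calls from the host. *)
type engine = {
  mutable recent : float list;  (* last [window] prices, newest first *)
  mutable long : bool;          (* true while holding one unit *)
  window : int;                 (* moving-average length (illustrative) *)
}

let create window = { recent = []; long = false; window }

(* First n elements of a list. *)
let rec take n = function
  | x :: rest when n > 0 -> x :: take (n - 1) rest
  | _ -> []

(* Called by the host for every new quote; returns the orders to send.
   Placeholder rule: go long above the moving average, flat below it. *)
let on_quote e q =
  e.recent <- take e.window (q.price :: e.recent);
  if List.length e.recent < e.window then []
  else begin
    let ma = List.fold_left ( +. ) 0.0 e.recent /. float_of_int e.window in
    if q.price > ma && not e.long then begin
      e.long <- true;
      [ Buy (q.symbol, 1) ]
    end else if q.price < ma && e.long then begin
      e.long <- false;
      [ Sell (q.symbol, 1) ]
    end else []
  end
```

The host only needs to know two calls, `create` and `on_quote`; everything strategy-specific stays inside the engine, which is the point of targeting one's own engine instead of each platform's function library.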

Targeting my own trading engine simplifies development and testing and lets me focus on adding value to my own products instead of the products of others. I can focus on producing the best embeddable trading engine ever. I will depend on the host platform for price quotes and sending orders to the exchange, at least initially, but will add market data and execution interfaces over time.

Most of the trading products that I'm aware of run on the Windows desktop and are either written in .NET or are migrating to .NET as we speak. These trading products use C# as their trading language, and the differences between them are becoming less and less pronounced.

I have no intention of slugging it out in the extremely crowded desktop trading space. The embeddable cross-platform trading space, on the other hand, is a great niche. Think unattended execution of trading strategies, grid-based analysis of massive volumes of market data and other mouth-watering goodies. 

The main issue to consider is the choice of trading language for the embeddable engine. Just as with C# on .NET, it's a choice determined by the implementation language. A Haskell-like DSL would have been nice, but I shudder at the thought of Haskell as a DLL. I'm sorry, but I could not resist the poke!

The OCaml syntax is quite rigid, although the LexiFi folks have hacked it to suit their needs. I could use Camlp4, but I had a very unpleasant experience with it. I mean, do you dig the <:expr<, $lid:tbl$ and $lid:x$ syntax? I do not!

I would like to present a translation of the EKam Scalper in AlgoScript. A Lisp by any other name would smell as sweet?

Filed under: compilers, lisp, trading
Posted July 31, 2008

What grownups do

A good data storage backend is the cornerstone of a real-time trading platform and KDB is the leader of the pack. There's quite a bit of information about KDB available on the net. 

Kdb+ provides a full relational database management system with time-series analysis that handles data in memory as well as stored data on disk. For advanced applications such as backtesting of auto trading strategies or operational risk management, it is essential to be able to compare streaming data against history.

According to Dennis Shasha:

In Sigmod 2001, Arthur Whitney and I have a paper describing time series databases for finance that he built: Lots o' Ticks: real-time high performance time series queries on billions of trades and quotes.

The paper can be summarized as follows:

  1. KDB is fully vertically partitioned: data is stored by column instead of by row, and tables are replaced with ordered arrays, so moving averages and similar calculations can be done directly in the database.
  2. K shares the namespace of the database system. I'm not sure what this means; it could be that strategies run in the same memory space as the database, but maybe not.
  3. Each function can apply to one or more columns rather than a single row. 
  4. Tick data is approximately 40 bytes wide.
  5. A single machine with Ultra SCSI II disks and a 500 MHz to 1 GHz CPU can do time-series and multi-dimensional aggregation on about 100 MB per second. This translates to 1 to 25 million rows per second, depending on the number of columns used in the analysis.
  6. Data is naturally date- and time-oriented: historical data is partitioned by date, and within each date it's sorted by symbol and time.
  7. The incoming data stream (up to 100,000 ticks per second) is simply appended to a delta file in time order. At night it's sorted by symbol and time and put into a partition, which takes less than a minute.
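Items 6 and 7 describe a simple ingest path: append during the day, sort once at night. Here is a toy sketch of that idea, assuming an in-memory list stands in for the delta file and a map keyed by date stands in for the on-disk partitions. The `tick` record, `append` and `roll_over` are made-up names for illustration, not how KDB itself implements it.

```ocaml
(* Toy model of the ingest path: ticks are appended to a delta log as
   they arrive, and a nightly job sorts them by (symbol, time) into a
   per-date partition. In-memory stand-ins for files, for illustration. *)

type tick = { date : int; time : int; sym : string; px : float }

module IntMap = Map.Make (Int)

(* Daytime: appending is just consing, O(1) per tick. The nightly sort
   re-establishes time order within each symbol. *)
let append delta t = t :: delta

(* Order by symbol first, then by time within a symbol. *)
let by_sym_time a b =
  match String.compare a.sym b.sym with
  | 0 -> Int.compare a.time b.time
  | c -> c

(* Nightly: sort the day's ticks and store them under their date.
   Assumes every tick in the delta belongs to the same date.
   Returns the updated partitions and an emptied delta. *)
let roll_over partitions delta =
  match List.sort by_sym_time delta with
  | [] -> (partitions, [])
  | (t :: _) as sorted ->
      (IntMap.add t.date (Array.of_list sorted) partitions, [])
```

Keeping the daytime path to a bare append is what lets the system absorb bursts of ticks; the expensive ordering work is deferred to a single batch sort when the market is closed.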

The K language is a vector-based language and appending to the end of a vector should be very fast indeed. Dennis Shasha elaborates on KDB in "Time Series in Finance: The Array Database Approach".
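To make the column-versus-row point concrete, here is a sketch of a column-oriented table in which each field is its own array, so a moving average over prices scans one contiguous float array and never touches the other columns. The `columns` record and `moving_average` function are illustrative assumptions, not KDB's or K's API.

```ocaml
(* Column-oriented table: one array per field instead of one record per row. *)
type columns = {
  times : int array;
  prices : float array;
  sizes : int array;
}

(* Simple moving average of length n over the price column only.
   out.(j) is the average of prices.(j) .. prices.(j + n - 1); a running
   sum keeps the whole pass O(length). Returns [||] if n is out of range. *)
let moving_average n (t : columns) =
  let p = t.prices in
  let len = Array.length p in
  if n <= 0 || n > len then [||]
  else begin
    let out = Array.make (len - n + 1) 0.0 in
    let sum = ref 0.0 in
    for i = 0 to len - 1 do
      sum := !sum +. p.(i);
      if i >= n then sum := !sum -. p.(i - n);
      if i >= n - 1 then out.(i - n + 1) <- !sum /. float_of_int n
    done;
    out
  end
```

Because the computation reads only `prices`, the bytes scanned scale with the columns used in the analysis rather than with the full row width, which is why throughput in rows per second depends on how many columns a query touches.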

Some people have had negative experiences with the support provided by KX (see the bottom of page 1), but I believe the biggest issue is cost.

I inquired with sales and was told the following: 

To answer your question, Kdb+ is priced, on a per cpu basis. Our standard pricing is $40k per cpu and the minimum purchase is 4 cpu's. The +tick module is a further $40k. These are one time fees and there is 20% annual maintenance fee on the final purchase price. This type of 4cpu setup would allow you to capture all North American equities & futures using one 64bit machine.
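Working through the quoted figures, assuming the 20% maintenance is charged on the combined price of the four-CPU minimum plus the +tick module:

```ocaml
(* Figures from the sales quote above, in US dollars. *)
let per_cpu = 40_000
let min_cpus = 4
let tick_module = 40_000
let maintenance_pct = 20  (* annual, on the final purchase price *)

(* One-time purchase: four CPUs plus the +tick module. *)
let purchase = (per_cpu * min_cpus) + tick_module          (* 200_000 *)

(* Yearly maintenance, assuming "final purchase price" means the
   combined total above. *)
let annual_maintenance = purchase * maintenance_pct / 100  (* 40_000 *)
```

Under that assumption, the minimum configuration comes to $200k up front plus about $40k a year.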

I doubt anyone can match KX on performance, although a few have tried: Vhayu and Xenomorph, for example.

It makes no sense for me to attack such a stalwart head-on. A flanking move is called for and it should be possible to build KDB for the rest of us.

Filed under: kdb, trading
Posted January 21, 2008

Who let the dogs out?

I've been quiet for the past couple of months and I can finally explain why.

I've been writing Topdog, a compiler from EasyLanguage to C#.

I realized back in February that I had enough money in the bank to last a few months and went through a brief period of soul-searching. Trading did not turn out so well for me, but I still wanted to be involved. I had always wanted a product of my own, and on February 23 I set out to write Topdog.

I found the One-Day Compilers presentation fascinating, so I started with OCaml. Three months have gone by and I now have two fully developed versions of the translator, one written in OCaml and another in Haskell. After briefly deploying the Haskell version, I decided to throw it away and soldier on with OCaml.

Why OCaml? Why not Haskell? I'll list the pros and cons below, in no particular order, but first a summary for those of you not interested in reading the whole post.

Summary:

There's an elephant in the room. It's there, it's huge, it's something that nobody talks about: OCaml is the practical Haskell. It's functional, statically typed and blazingly fast, and the performance of OCaml code is predictable. With OCaml you stop asking why your code is spending 70% of its time collecting garbage and start actually polishing your code.

Compilation speed matters just as much, and I cannot emphasize this enough. GHC takes 3-4 minutes to rebuild my project whenever I touch the parser; OCaml takes 2-3 seconds. I use a 2 GHz Core Duo MacBook Pro, and Topdog is decidedly not a large project, which makes the difference all the more glaring. With Haskell I was loath to touch the parser or my syntax tree definition; with OCaml I look forward to tweaking things to my heart's content. Normal recompilation time of Topdog is so short as to be almost unnoticeable.

Then there's F#, the OCaml for .NET.

Transforming Abstract Syntax Trees (ASTs)

Polymorphic variants in OCaml make creating "macros" easy. For example, the following works for two different ASTs:

let var = `Var 
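A small illustration of why this works: polymorphic variant constructors are not tied to a single type declaration, so one helper can build nodes for two unrelated trees that both contain `` `Var ``. The types `easy_expr` and `cs_expr` and the helper below are hypothetical examples, not Topdog's actual ASTs.

```ocaml
(* Two unrelated ASTs that share the `Var constructor. *)
type easy_expr = [ `Var of string | `Num of float ]
type cs_expr = [ `Var of string | `Call of string * cs_expr list ]

(* One "macro" usable in both trees: its inferred type is
   string -> [> `Var of string ], which unifies with either AST. *)
let var name = `Var name

let e : easy_expr = var "Close"
let c : cs_expr = `Call ("Print", [ var "close" ])
```

With ordinary variants, each AST would need its own `Var` constructor and its own helper.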

You can share chunks of syntax trees with polymorphic variants and type coercions:

http://gist.github.com/266868#file_gistfile8.ml

This is not possible in Haskell without boxing your AST chunks. To the best of my knowledge, the Haskell equivalent would look like this:

data X = X Int

data Y = Y String

data Z = X1 X | Y1 Y

The above introduces an extra level of indirection that is not necessary in OCaml. It may not look like a big deal until you realize that in OCaml a value of type X or Y can be coerced to Z without any wrapping, and a function over Z can hand its X and Y cases straight to functions written for X or Y. This is simply not possible in Haskell, which requires extra code to unbox constructors X1 and Y1 in Z before the functions that work on types X and Y can be applied.
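For comparison, here is a sketch of the OCaml counterpart of the X/Y/Z example, with naming of my own choosing: `z` is simply the union of `x` and `y`, an `x` value becomes a `z` through a coercion that changes no data, and a function over `z` passes its `x` cases directly to a function over `x` using a `#x` pattern, with no X1/Y1 wrappers to strip. This is an illustration, not code taken from Topdog.

```ocaml
type x = [ `X of int ]
type y = [ `Y of string ]
type z = [ x | y ]   (* union of the two, no wrapper constructors *)

let show_x (v : x) = match v with `X n -> "X " ^ string_of_int n
let show_y (v : y) = match v with `Y s -> "Y " ^ s

(* #x matches any constructor of x and rebinds v at that narrower
   type, so the x-specific function can be applied directly. *)
let show_z (v : z) =
  match v with
  | #x as v -> show_x v
  | #y as v -> show_y v

(* An x value is used as a z via a coercion; the runtime
   representation is unchanged, so there is nothing to box or unbox. *)
let from_x (v : x) : z = (v :> z)
```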

I also find that transforming ASTs with monads and type classes clutters up my code:

http://gist.github.com/266868#file_gistfile1.hs

        
Yes, you can get used to all the returns and liftMs, but I find they obscure the original intent of the code. Compare the OCaml version:

http://gist.github.com/266868#file_gistfile2.ml
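One place where monads typically creep into an AST transformation is a pass that needs state, such as generating fresh temporary names while flattening nested expressions. In OCaml a local `ref` can carry that state, so the traversal keeps its plain recursive shape; in Haskell the same pass would usually run in a State monad. The AST and the `flatten` pass below are a hypothetical example, not the code behind the gists.

```ocaml
type expr =
  | Num of float
  | Var of string
  | Add of expr * expr

(* Three-address style: each statement assigns one simple addition to a temp. *)
type stmt = Assign of string * expr

(* Flatten nested additions into assignments to fresh temporaries.
   The counter and the statement list are ordinary mutable state. *)
let flatten e =
  let counter = ref 0 in
  let stmts = ref [] in
  let fresh () = incr counter; Printf.sprintf "t%d" !counter in
  let rec go = function
    | (Num _ | Var _) as leaf -> leaf
    | Add (a, b) ->
        (* Explicit lets fix left-to-right evaluation order. *)
        let a' = go a in
        let b' = go b in
        let t = fresh () in
        stmts := Assign (t, Add (a', b')) :: !stmts;
        Var t
  in
  let result = go e in
  (List.rev !stmts, result)
```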

Scrapping Your Boilerplate Code

On the Haskell side there's Scrap Your Boilerplate (SYB). The compiler has a type-checking pass, and I embed token locations in the syntax tree to help with error reporting.

http://gist.github.com/266868#file_gistfile3.hs


I need to strip token locations for testing, and the syntax tree is quite large. Traversing the tree to dig down to each TokenPos and extract its Expr takes pages of boilerplate code. This is where SYB comes to the rescue, reducing those pages to the following:

http://gist.github.com/266868#file_gistfile4.hs
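To show what the hand-written alternative looks like, here is a minimal OCaml sketch of an AST that wraps nodes in source positions, plus a `strip` pass that removes them. The types are hypothetical, modeled loosely on the TokenPos idea; every constructor in a real grammar adds another case to `strip`, which is exactly the boilerplate that SYB's generic traversal avoids.

```ocaml
(* Source position: line and column of the token. *)
type pos = { line : int; col : int }

type expr =
  | At of pos * expr          (* position wrapper used for error messages *)
  | Num of int
  | Ident of string
  | Binop of string * expr * expr
  | Call of string * expr list

(* Remove every position wrapper so trees can be compared in tests.
   Each constructor needs its own case: this is the boilerplate. *)
let rec strip = function
  | At (_, e) -> strip e
  | (Num _ | Ident _) as leaf -> leaf
  | Binop (op, a, b) -> Binop (op, strip a, strip b)
  | Call (f, args) -> Call (f, List.map strip args)
```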


Not everything is peachy, though. SYB works perfectly when GHC can automatically derive Data and Typeable for you (see Expr above). My previous approach was to embed the position returned by Parsec and to start processing the AST from the parser result. This required me to derive Data and Typeable by hand, since the constructors of Parsec's SourcePos and ParseError are hidden in the Parsec library. That meant silly code like this

http://gist.github.com/266868#file_gistfile5.hs

and

http://gist.github.com/266868#file_gistfile6.hs


It's less code than I would have had to write without SYB but annoying nonetheless.

What about OCaml? I'm glad you asked!

The OCaml version uses the Camlp4 preprocessor, which I vastly prefer to Template Haskell.

http://gist.github.com/266868#file_gistfile7.ml


Other bits

OCaml has built-in marshalling, which lets you easily dump data to disk and load it back. There's a big push behind binary serialization in GHC nowadays, and it's not hard to marshal your data in Haskell, except that it has to be done manually. Tools like Data.Derive or DrIFT can help with this task, but marshalling with OCaml is still easier.
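For instance, the standard library's `Marshal` module serializes any value without per-type code. The sketch below round-trips a small, made-up AST through a string; `Marshal.to_channel` and `Marshal.from_channel` work the same way with files. Note that unmarshalling is not type-safe: the result takes whatever type you annotate it with, so data must be read back at the type it was written with.

```ocaml
type expr =
  | Num of float
  | Var of string
  | Add of expr * expr

(* Serialize to bytes and back; no serializer has to be written for expr. *)
let save (e : expr) : string = Marshal.to_string e []

(* The annotation fixes the result type; Marshal itself cannot check it. *)
let load (s : string) : expr = Marshal.from_string s 0
```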

The Haskell version of Topdog uses Parsec for parsing and PPrint for pretty printing. The OCaml version uses Emmanuel Onzon's dypgen GLR parser generator and Christian Lindig's pretty printer, part of his Quest Tester project.

Filed under: compilers, haskell, ocaml, trading
Posted May 26, 2007