Tenerife Skunkworks

Boldly going where few have gone before

The perils of benchmarking Q

Suppose we had a list of 10000000 phone numbers and wanted to take just the first 8 digits of each. 

We could create the list like this: 

q)l:10000000 10#99?"0123456789" 

Here, 99?... creates a vector of 99 random characters drawn from the set [0-9]. 

10000000 10#... then reshapes that vector into 10000000 rows of 10 characters each, recycling the 99 characters as often as needed. Each row is one "phone number". 
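
A smaller reshape makes the recycling easier to see. This is only an illustration with a fixed 10-character source instead of a random one: 

q)3 4#"0123456789" 
"0123" 
"4567" 
"8901" 

The third row wraps around to the start of the source vector once the 10 characters run out. 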

A simple 8#l will just give us the first 8 elements of the list, which is not what we want. 

q)f1:{x@\:til 8} 
q)f2:{8#/:x} 
q)f3:{8#'x} 

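The above 3 solutions will give us the first 8 digits of every phone number in the list. f1 indexes each number with til 8 (each-left), f2 applies 8# to each item on its right (each-right), and f3 applies 8# item by item (each). 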
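
As a quick sanity check, the three functions can be compared on a tiny list before timing them on the big one. The name s and its two sample numbers are made up for this sketch: 

q)s:("0123456789";"9876543210") 
q)f1 s 
"01234567" 
"98765432" 
q)(f1 s)~f2 s 
1b 
q)(f2 s)~f3 s 
1b 

~ is match, so 1b means the results are identical. 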

Which solution is the fastest one, though? 

The list is deliberately very long to make benchmarking easier. 

q)\t f1 l 
906 

\t here gives us execution time for f1 in milliseconds. We proceed to time f2 and f3:

q)\t f2 l 
738 
q)\t f3 l 
738 

Looks like f1 is much slower, but is it really? Let's run the benchmark several times... 

q)\t f1 l 
624 
q)\t f2 l 
735
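
The two readings for f1 differ by almost 300 ms, so a single \t run says little on its own. One common way to steady the numbers, sketched here rather than taken from the original session, is to time several evaluations at once with do and divide the result by the repeat count. The count of 10 is arbitrary, and the timings are left out because they depend on the machine: 

q)\t do[10;f1 l] 
q)\t do[10;f2 l] 
q)\t do[10;f3 l] 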

See this thread in the non-commercial q/kdb forum for an additional note from Stevan Apter.

Filed under  //   kdb   q  

What grownups do

A good data storage backend is the cornerstone of a real-time trading platform and KDB is the leader of the pack. There's quite a bit of information about KDB available on the net. 

Kdb+ provides a full relational database management system with time-series analysis that handles data in memory as well as stored data on disk. For advanced applications such as backtesting of auto trading strategies or operational risk management, it is essential to be able to compare streaming data against history.

According to Dennis Shasha:

In Sigmod 2001, Arthur Whitney and I have a paper describing time series databases for finance that he built: Lots o' Ticks: real-time high performance time series queries on billions of trades and quotes.

The paper can be summarized as follows: 

  1. KDB is fully vertically partitioned and replaces tables with ordered arrays so that moving averages, etc. can be done directly in the database. They store data by column instead of by row. 
  2. K shares the namespace of the database system. I'm not sure what this means. It could be that strategies run in the same memory space as the database but maybe not.
  3. Each function can apply to one or more columns rather than a single row. 
  4. Tick data is approximately 40 bytes wide.
  5. A single machine with Ultra SCSI II and a 500 MHz to 1 GHz CPU can do time series and multi-dimensional aggregation on about 100Mb per second. This translates to 1-25 million rows per second, depending on the number of columns used in the analysis.
  6. Data is naturally date and time oriented: historical data is partitioned by date, and within each date it is sorted by symbol and time. 
  7. The incoming data stream (up to 100,000 ticks per second) is simply appended to a delta file in time order. At night it is sorted by symbol and time and written to a partition, which takes less than a minute.
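
A few of these points can be tried out on a toy in-memory table in q. This is only a sketch: the table name trade and its values are invented, and it shows neither the on-disk date partitions nor the delta file. 

q)trade:([]sym:`ibm`msft`ibm`msft;time:09:30:00 09:30:01 09:30:02 09:30:03;price:100 30 101 31f) 
q)trade.price 
100 30 101 31f 
q)`trade insert (`ibm;09:30:04;102f) 
,4 
q)trade:`sym`time xasc trade 
q)exec price from trade where sym=`ibm 
100 101 102f 
q)2 mavg exec price from trade where sym=`ibm 
100 100.5 101.5 

Each column is an ordinary vector, a new tick is appended to the end of the table, the table is then sorted by symbol and time, and the moving average runs directly over the price column. 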

K is a vector-based language, so appending to the end of a vector should be very fast indeed. Dennis Shasha elaborates on KDB in "Time Series in Finance: The Array Database Approach".

Some people have had negative experiences with the support provided by KX (see bottom of page 1), but I believe the biggest issue is cost. 

I inquired with sales and was told the following: 

To answer your question, Kdb+ is priced, on a per cpu basis. Our standard pricing is $40k per cpu and the minimum purchase is 4 cpu's. The +tick module is a further $40k. These are one time fees and there is 20% annual maintenance fee on the final purchase price. This type of 4cpu setup would allow you to capture all North American equities & futures using one 64bit machine.
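
Worked through for the minimum 4 cpu setup with the +tick module, the quote adds up as follows. The variable names are mine, and I am assuming the 20% maintenance fee applies to the licences and the +tick module together: 

q)licence:4*40000 
q)total:licence+40000 
q)total 
200000 
q)total div 5 
40000 

That is $200k up front and $40k per year in maintenance. 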

I doubt anyone can match KX on performance, although a few have tried: Vhayu and Xenomorph, for example.

It makes no sense for me to attack such a stalwart head-on. A flanking move is called for and it should be possible to build KDB for the rest of us.

Filed under  //   kdb   trading