Who let the dogs out?
I've been quiet for the past couple of months and I can finally explain why.
I've been writing Topdog, a compiler from EasyLanguage to C#.

I realized back in February that I had enough money in the bank to last a few months and went through a brief period of soul-searching. Trading did not turn out so well for me but I still wanted to be involved. I had always wanted a product of my own, and on February 23 I set out to write Topdog.

I found the One-Day Compilers presentation quite fascinating and so I started with OCaml. Three months have gone by and I now have two fully developed versions of the translator, one written in OCaml and another in Haskell. After briefly deploying the Haskell version I decided to throw it away and soldier on with OCaml.

Why OCaml? Why not Haskell? I'll list the pros and cons below, in no particular order. I'll write up a summary first, for those of you not interested in reading the whole post.

Summary:

There's an elephant in the room. It's there, it's huge, it's something that nobody talks about: OCaml is the practical Haskell. It's functional, statically typed and blazingly fast. Performance of OCaml code is well defined. With OCaml you stop asking why your code is spending 70% of its time collecting garbage and start actually trying to polish your code.

I cannot emphasize this enough. GHC takes 3-4 minutes to rebuild my project whenever I touch the parser. OCaml takes 2-3 seconds. I use a 2GHz Core Duo MacBook Pro and Topdog is decidedly not a large project, which makes the difference all the more glaring. With Haskell I was loath to touch the parser or my syntax tree definition; with OCaml I look forward to tweaking things to my heart's content. Normal recompilation time of Topdog is so fast as to be almost unnoticeable.

Then there's F#, the OCaml for .NET.

Transforming Abstract Syntax Trees (ASTs)

Polymorphic variants in OCaml make creating "macros" easy, e.g. the following would work for two different ASTs.

let var = `Var

You can share chunks of syntax trees with poly variants and type conversions:

http://gist.github.com/266868#file_gistfile8.ml
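
Since the gist is not reproduced here, the following is a minimal sketch of the idea rather than Topdog's actual code. The types x, y and z and the functions show_x, show_y, show_z and widen_x are hypothetical names chosen for illustration.

```ocaml
(* Hypothetical AST fragments; the names are illustrative, not from Topdog. *)
type x = [ `X of int ]
type y = [ `Y of string ]

(* z reuses both fragments directly, with no wrapper constructors. *)
type z = [ x | y ]

let show_x : x -> string = function `X n -> "X " ^ string_of_int n
let show_y : y -> string = function `Y s -> "Y " ^ s

(* The #x and #y patterns narrow a z back to the fragment types,
   so the fragment functions are reused as they are. *)
let show_z : z -> string = function
  | #x as v -> show_x v
  | #y as v -> show_y v

(* A value of a fragment type is widened to z with an explicit coercion. *)
let widen_x (v : x) : z = (v :> z)

let () =
  let a : x = `X 1 in
  print_endline (show_z (widen_x a));
  print_endline (show_z (`Y "two"))
```

The same `X and `Y tags appear in both the fragment types and the combined type, which is what lets the pieces be shared without an extra layer of constructors.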
This is not possible in Haskell without boxing your AST chunks. To the best of my knowledge the above would look like this:

data X = X Int
data Y = Y String
data Z = X1 X | Y1 Y

The above introduces an extra level of indirection that is not necessary in OCaml. It may not look like a big deal until you realize that any OCaml function that works on arguments of type X or Y can also work on Z. This is simply not possible in Haskell and requires extra code to unbox constructors X1 and Y1 in Z before the functions that work on types X and Y can be applied.

I also find that transforming ASTs with monads and type classes clutters up my code:

http://gist.github.com/266868#file_gistfile1.hs
Yes, you can get used to the return-s and liftM-s but I find they take away from the original intent of the code. Compare to OCaml code:
Scrapping Your Boilerplate Code

On the Haskell side there's Scrap your boilerplate code (SYB). The compiler has a type checking pass and I embed token location in the syntax tree to help with error reporting.

http://gist.github.com/266868#file_gistfile3.hs
I need to strip token locations for testing and the syntax tree is quite large. Traversing the tree to dig down to TokenPos and extract Expr is pages of boilerplate code. This is where SYB comes to the rescue and reduces pages of code to the following:

http://gist.github.com/266868#file_gistfile4.hs
Not everything is peachy, though. SYB works perfectly when GHC can automatically derive Data and Typeable for you (see Expr above). My previous approach was to embed the position returned by Parsec and to start processing the AST from the parser result. This required me to derive Data and Typeable by hand since Parsec's SourcePos and ParseError are hidden in the Parsec library. The result was silly code like this:

http://gist.github.com/266868#file_gistfile5.hs
It's less code than I would have had to write without SYB but annoying nonetheless.

What about OCaml? I'm glad you asked!

The OCaml version uses the Camlp4 preprocessor which I vastly prefer over Template Haskell.

http://gist.github.com/266868#file_gistfile7.ml
Other bits

OCaml has built-in marshalling which lets you easily dump data to disk and load it back. There's a big push behind binary serialization with GHC nowadays and it's not hard to marshal your data with Haskell, except it has to be done manually. Libraries like Data.Derive or DrIFT can help you with this task but marshalling with OCaml is still easier.

The Haskell version of Topdog uses Parsec for parsing and PPrint for pretty printing. The OCaml version uses Emmanuel Onzon's dypgen GLR parser generator and Christian Lindig's pretty printer, part of his Quest Tester project.
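
To give a feel for the built-in marshalling mentioned above, here is a minimal sketch using the standard Marshal module. The expr type and the save and load helpers are hypothetical stand-ins, not Topdog's actual syntax tree or code.

```ocaml
(* Hypothetical mini-AST standing in for a real syntax tree. *)
type expr =
  | Int of int
  | Add of expr * expr

(* Dump a value to disk; no serialization code is written per type. *)
let save (path : string) (e : expr) : unit =
  let oc = open_out_bin path in
  Marshal.to_channel oc e [];
  close_out oc

(* Load it back. Marshal is not type-safe: the annotation below is a
   promise by the caller that the file really holds an expr. *)
let load (path : string) : expr =
  let ic = open_in_bin path in
  let e : expr = Marshal.from_channel ic in
  close_in ic;
  e

let () =
  let path = Filename.temp_file "ast" ".bin" in
  let e = Add (Int 1, Int 2) in
  save path e;
  assert (load path = e);
  Sys.remove path
```

The convenience comes at a price: nothing checks the type on the way back in, so this suits caches and test fixtures written by the same program better than data from untrusted sources.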



