I’m in love with Forth, but there are no commercial Forth environments for Mac OS X. GForth is a free, fast and portable implementation of ANS Forth, but it requires GCC and does not allow binary distribution of code that uses foreign functions.
There are two excellent commercial implementations of ANS Forth and both run on Linux. I asked one of the companies if I could port their Forth to the Mac and promptly ended up with a tarball in my lap. There were no C or assembler files; it was all Forth source code.
The proper bootstrapping approach turned out to be to generate a Mac kernel on Linux, copy it over to the Mac and use it to compile the rest of the Forth environment. It’s called cross-compiling!
This required me to investigate how Mac binaries are laid out and how I could generate them without using gcc or a linker.
I would like to explain how I did it. Let’s start with a simple C program; feel free to browse the full source code as we go.
The IMPORT section is where gcc allocates stubs for external functions. The dynamic linker will replace these with a jump to the real printf once libc is loaded.
What the code above does not include is proper alignment of the stack before the calls to printf and exit. This is required by the Mac OS X ABI IA-32 Function Calling Conventions. It’s a sleight of hand on the part of gcc, which inserts a prolog before invoking our main function.
This prolog sets up the stack and gets hold of our program arguments, i.e. argc, argv and envp.
The Mach-O header is normally generated by the compiler and the linker (GCC & LD), but I’m using neither, so I have to generate the header by hand. It’s doable as long as NASM is instructed to simply dump a flat binary image to disk (-f bin), and it actually works!
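To give a feel for what "by hand" means, here is a rough C model of the fixed part of a 32-bit Mach-O header. It is only a sketch and not the NASM source: the struct is redeclared locally so it compiles on any platform, and `make_header` is a hypothetical helper whose load-command count, size and flags are supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit Mach-O header, redeclared so the sketch builds anywhere
   (on a Mac the real definition lives in <mach-o/loader.h>). */
typedef struct {
    uint32_t magic;       /* MH_MAGIC */
    uint32_t cputype;     /* CPU_TYPE_X86 */
    uint32_t cpusubtype;  /* CPU_SUBTYPE_I386_ALL */
    uint32_t filetype;    /* MH_EXECUTE */
    uint32_t ncmds;       /* number of load commands that follow */
    uint32_t sizeofcmds;  /* total size of those load commands, in bytes */
    uint32_t flags;
} mach_header32;

#define MACHO_MAGIC_32      0xfeedfaceu
#define MACHO_CPU_X86       7u
#define MACHO_CPU_I386_ALL  3u
#define MACHO_TYPE_EXECUTE  2u

/* Hypothetical helper: fills in the fixed fields; the caller supplies
   the load-command count, their total size and the flags word. */
static mach_header32 make_header(uint32_t ncmds, uint32_t sizeofcmds,
                                 uint32_t flags)
{
    mach_header32 h = { MACHO_MAGIC_32, MACHO_CPU_X86, MACHO_CPU_I386_ALL,
                        MACHO_TYPE_EXECUTE, ncmds, sizeofcmds, flags };
    assert(sizeof h == 28);  /* seven 32-bit fields, no padding */
    return h;
}
```

In NASM the same seven fields are simply seven `dd` directives at the very start of the file.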
Note that this can be done on any platform NASM runs on. I did it on Linux but assume it will work just as well on Windows.
Now, let’s take a good look at the code…
We need to tell NASM we are in 32-bit mode and that program code starts on the second VM page (0x1000 or 4096). The first page (PAGEZERO) is there to catch null pointer references.
PAGEZERO is where you end up when dereferencing a 0 pointer. This page is protected from reading and writing so any access to it causes a page fault and a memory access violation. This segment does not take any space in the file so its filesize is set to 0.
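As an illustration, a 32-bit segment load command for such a page could be modeled in C as below. This is a sketch under assumptions: the struct mirrors the standard `segment_command` layout, and `make_pagezero` is a hypothetical helper that hard-codes a single 4096-byte page at address 0.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit segment load command, redeclared so the sketch builds anywhere. */
typedef struct {
    uint32_t cmd;          /* LC_SEGMENT = 1 */
    uint32_t cmdsize;      /* size of this command plus any section headers */
    char     segname[16];  /* zero-padded segment name */
    uint32_t vmaddr;       /* where the segment is mapped */
    uint32_t vmsize;       /* how much address space it covers */
    uint32_t fileoff;      /* where its bytes start in the file */
    uint32_t filesize;     /* how many bytes the file provides */
    uint32_t maxprot;
    uint32_t initprot;
    uint32_t nsects;
    uint32_t flags;
} segment_command32;

/* Hypothetical helper: one inaccessible page at address 0 that
   occupies no space in the file. */
static segment_command32 make_pagezero(void)
{
    segment_command32 s;
    memset(&s, 0, sizeof s);            /* protections, fileoff, filesize = 0 */
    s.cmd     = 1;                      /* LC_SEGMENT */
    s.cmdsize = (uint32_t)sizeof s;     /* no sections follow */
    strncpy(s.segname, "__PAGEZERO", sizeof s.segname);
    s.vmaddr  = 0;
    s.vmsize  = 0x1000;                 /* one 4096-byte page */
    assert(s.cmdsize == 56);
    return s;
}
```

The mismatch between `vmsize` (one page) and `filesize` (zero) is the whole trick: the page exists in the address space but has nothing behind it in the file.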
The text segment is where our code lives. It’s readable and executable (initprot). The load commands that form part of the Mach-O header itself need to be loaded somewhere. Here, they are part of the text segment which is why the segment starts at the beginning of the file (fileoff 0).
The IMPORT segment holds our jump table, the stubs for printf and exit. The dynamic linker will fill in the stubs for us with a jump to printf and exit in libc. This segment needs to be readable, writable and executable (initprot).
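The initprot values are just bit masks. A small C sketch of the two combinations described above, using the standard Mach protection bits (redefined locally; `prot_to_string` is a hypothetical helper added only for readability):

```c
#include <assert.h>
#include <stdint.h>

/* Mach VM protection bits, redefined locally for portability. */
#define VM_PROT_READ    0x1u
#define VM_PROT_WRITE   0x2u
#define VM_PROT_EXECUTE 0x4u

/* Text segment: readable and executable. */
static uint32_t text_initprot(void)
{
    return VM_PROT_READ | VM_PROT_EXECUTE;
}

/* IMPORT segment: the dynamic linker must be able to write the stubs,
   and the CPU must be able to execute them afterwards. */
static uint32_t import_initprot(void)
{
    return VM_PROT_READ | VM_PROT_WRITE | VM_PROT_EXECUTE;
}

/* Hypothetical helper: renders a protection mask as "rwx"-style text. */
static void prot_to_string(uint32_t prot, char out[4])
{
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
    assert((prot & ~7u) == 0);  /* only the three low bits are meaningful here */
}
```

So the text segment ends up with initprot 5 and the IMPORT segment with initprot 7.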
This load command describes our symbol table, including where the symbols and the strings naming them are located. I believe it’s mostly for the benefit of the debugger.
My guess is as good as yours here. I’m not ready to use a dynamic linker of my own but this is a distinct possibility! This load command clearly provides for it.
This load command specifies the contents of the registers at startup. I haven’t seen anything other than EIP populated, though. The program will not run unless this load command is present!
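For reference, here is how such a command might be laid out in C, assuming it is the standard LC_UNIXTHREAD command carrying a 32-bit x86 thread state. The struct is a local redeclaration and `make_unixthread` is a hypothetical helper; every register except EIP is left at zero, matching what I have seen in practice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* LC_UNIXTHREAD with an i386 thread state: a 16-byte command header
   followed by sixteen 32-bit registers. */
typedef struct {
    uint32_t cmd;      /* LC_UNIXTHREAD = 5 */
    uint32_t cmdsize;  /* 80 bytes in total */
    uint32_t flavor;   /* x86_THREAD_STATE32 = 1 */
    uint32_t count;    /* number of 32-bit words in the state = 16 */
    uint32_t eax, ebx, ecx, edx, edi, esi, ebp, esp;
    uint32_t ss, eflags, eip, cs, ds, es, fs, gs;
} unixthread_command32;

/* Hypothetical helper: everything is zero except EIP, which holds
   the address of the first instruction to run. */
static unixthread_command32 make_unixthread(uint32_t entry)
{
    unixthread_command32 t;
    memset(&t, 0, sizeof t);
    t.cmd     = 5;
    t.cmdsize = (uint32_t)sizeof t;
    t.flavor  = 1;
    t.count   = 16;
    t.eip     = entry;
    assert(t.cmdsize == 80);
    return t;
}
```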
We can have as many dylib load commands as dynamic libraries we would like to use. I’m only using libc since that’s where printf and exit live. I could have created stubs for dlopen, dlclose, dlsym and dlerror and used them to load libc and pull out printf and exit. Why bother, though, when the dynamic linker can do it for us?
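One detail worth spelling out is the size of a dylib load command: the library path is stored right after the fixed fields, and the whole command has to be padded to a multiple of 4 bytes. A sketch in C, with the struct redeclared locally and `dylib_cmdsize` as a hypothetical helper. The path in the comment is an assumption on my part; on Mac OS X the libc functions live in /usr/lib/libSystem.B.dylib.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fixed part of a dylib load command; the zero-terminated library
   path follows immediately after it. */
typedef struct {
    uint32_t cmd;                    /* LC_LOAD_DYLIB = 0xc */
    uint32_t cmdsize;                /* fixed part + path, padded to 4 bytes */
    uint32_t name_offset;            /* path offset from the command start = 24 */
    uint32_t timestamp;
    uint32_t current_version;
    uint32_t compatibility_version;
} dylib_command32;

/* Hypothetical helper: total command size for a given library path,
   e.g. "/usr/lib/libSystem.B.dylib" (assumed path, see above). */
static uint32_t dylib_cmdsize(const char *path)
{
    uint32_t raw = (uint32_t)(sizeof(dylib_command32) + strlen(path) + 1);
    assert(sizeof(dylib_command32) == 24);
    return (raw + 3u) & ~3u;         /* round up to a multiple of 4 */
}
```

In NASM the padding is what an `align 4` after the path string gives you.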
It was a long road through the Mach-O header but we can finally relax and get some work done. There isn’t much to do apart from printing hello world and exiting but note the alignment of the stack on a 16-byte boundary, before each function call.
I’m taking the easy way out and aligning the stack one extra time, at the beginning of the program. This makes the rest of the alignment work much easier!
All values on the stack are 32-bit values. We are passing a single argument, which takes 4 bytes, so we pad the stack with 12 more bytes to fill a whole 16-byte block (sub esp, 0x10). We pop arguments and padding right after the call to printf.
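The arithmetic behind the padding is easy to get wrong, so here it is as a small C sketch. The function names are hypothetical; the only assumptions are that every argument is 4 bytes wide and that the stack pointer was already 16-byte aligned at program start.

```c
#include <assert.h>
#include <stdint.h>

/* Round the stack pointer down to a 16-byte boundary,
   the equivalent of "and esp, 0xfffffff0". */
static uint32_t align_stack(uint32_t esp)
{
    return esp & ~15u;
}

/* Bytes to reserve before a call taking nargs 32-bit arguments,
   so that esp is still 16-byte aligned at the call instruction. */
static uint32_t call_frame_size(uint32_t nargs)
{
    return (nargs * 4u + 15u) & ~15u;
}

/* How much of that reservation is pure padding. */
static uint32_t call_padding(uint32_t nargs)
{
    uint32_t frame = call_frame_size(nargs);
    assert(frame % 16u == 0);
    return frame - nargs * 4u;
}
```

For one argument this gives a 16-byte frame with 12 bytes of padding, which is the case described above; four arguments need no padding at all, and five need 12 bytes again.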
Data and stubs are easy. Note the alignment to a page boundary. A jump to a 32-bit address takes 5 bytes, so each stub is filled with five hlt instructions as a placeholder.
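To make the 5-byte figure concrete, here is a C sketch of a stub before and after it is filled in. It assumes the near relative form of the jump (opcode 0xE9 followed by a 32-bit displacement); the function names and the addresses in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STUB_SIZE 5u
#define PAGE_SIZE 0x1000u

/* Round a file offset or address up to the next page boundary. */
static uint32_t page_align(uint32_t value)
{
    return (value + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
}

/* An unresolved stub: five one-byte hlt instructions (0xF4). */
static void fill_stub_hlt(uint8_t out[STUB_SIZE])
{
    memset(out, 0xF4, STUB_SIZE);
}

/* A resolved stub: "jmp rel32". The displacement is relative to the
   address of the instruction that follows the jump. */
static void encode_jmp_rel32(uint8_t out[STUB_SIZE], uint32_t stub_addr,
                             uint32_t target)
{
    uint32_t rel = target - (stub_addr + STUB_SIZE);
    out[0] = 0xE9;
    out[1] = (uint8_t)(rel & 0xFFu);          /* little-endian displacement */
    out[2] = (uint8_t)((rel >> 8) & 0xFFu);
    out[3] = (uint8_t)((rel >> 16) & 0xFFu);
    out[4] = (uint8_t)((rel >> 24) & 0xFFu);
    assert(stub_addr + STUB_SIZE + rel == target);  /* wraps modulo 2^32 */
}
```

For example, a stub at the hypothetical address 0x3000 jumping to 0x90001000 would be encoded as E9 FB DF FF 8F.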