Tenerife Skunkworks

Trading and technology

Creating Mac Binaries on Any Platform

I’m in love with Forth but there are no commercial Forth environments for Mac OS X. GForth is a free, fast and portable implementation of ANS Forth, but it requires GCC and does not allow for binary distribution of code that uses foreign functions.

There are two excellent commercial implementations of ANS Forth and both run on Linux. I asked one of the companies if I could port their Forth to the Mac and promptly ended up with a tarball in my lap. There were no C or assembler files; it was all Forth source code.

The proper bootstrapping approach turned out to be to generate a Mac kernel on Linux, copy it over to the Mac and use it to compile the rest of the Forth environment. It’s called cross-compiling!

This required me to investigate how Mac binaries are laid out and how I could generate them without using gcc or a linker.

I would like to explain how I did it. Let’s start with a simple C program. Feel free to browse the full source code.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  printf("Hello world!\n");
  exit(0);
}

It can’t get any simpler!

gcc hello.c -o hello
./hello
Hello world!

What does it look like in assembler, though?

.cstring
LC0:
.ascii "Hello world!\0"
.text
.globl _main
_main:
pushl  %ebp
movl   %esp, %ebp
pushl  %ebx
subl   $20, %esp
call   L3
"L00000000001$pb":
L3:
popl   %ebx
leal   LC0-"L00000000001$pb"(%ebx), %eax
movl   %eax, (%esp)
call   L_puts$stub
movl   $0, (%esp)
call   L_exit$stub
.section __IMPORT,__jump_table,symbol_stubs,self_modifying_code+pure_instructions,5
L_exit$stub:
.indirect_symbol _exit
hlt ; hlt ; hlt ; hlt ; hlt
L_puts$stub:
.indirect_symbol _puts
hlt ; hlt ; hlt ; hlt ; hlt
.subsections_via_symbols

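The __IMPORT segment is where gcc allocates stubs for external functions. Note that gcc has quietly replaced our printf call with puts, which is why the string no longer ends in a newline. The dynamic linker will overwrite each stub with a jump to the real function once libSystem (the Mac’s libc) is loaded.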

What the code above does not show is the initial alignment of the stack, which main takes for granted before its calls to puts and exit. The Mac OS X ABI (IA-32 Function Calling Conventions) requires the stack to be aligned on a 16-byte boundary at each call. It’s a sleight of hand on the part of gcc, which links in a prolog that runs before our main function.

This prolog sets up the stack and gets hold of our program arguments, i.e. argc, argv and envp.

Breakpoint 1, 0x00001f6c in start ()
(gdb) disas
Dump of assembler code for function start:
0x00001f68 <start+00>:  push   $0x0
0x00001f6a <start+02>:  mov    %esp,%ebp
0x00001f6c <start+04>:  and    $0xfffffff0,%esp ; <-- stack alignment
0x00001f6f <start+07>:  sub    $0x10,%esp  ; <-- and here too!
0x00001f72 <start+10>:  mov    0x4(%ebp),%ebx
0x00001f75 <start+13>:  mov    %ebx,0x0(%esp)
0x00001f79 <start+17>:  lea    0x8(%ebp),%ecx
0x00001f7c <start+20>:  mov    %ecx,0x4(%esp)
0x00001f80 <start+24>:  add    $0x1,%ebx
0x00001f83 <start+27>:  shl    $0x2,%ebx
0x00001f86 <start+30>:  add    %ecx,%ebx
0x00001f88 <start+32>:  mov    %ebx,0x8(%esp)
0x00001f8c <start+36>:  mov    (%ebx),%eax
0x00001f8e <start+38>:  add    $0x4,%ebx
0x00001f91 <start+41>:  test   %eax,%eax
0x00001f93 <start+43>:  jne    0x1f8c <start+36>
0x00001f95 <start+45>:  mov    %ebx,0xc(%esp)
0x00001f99 <start+49>:  call   0x1fca <main>
0x00001f9e <start+54>:  mov    %eax,0x0(%esp)
0x00001fa2 <start+58>:  call   0x3000 <dyld_stub_exit>
0x00001fa7 <start+63>:  hlt
End of assembler dump.

Let’s tidy things up into a single NASM file. It’s less verbose than GAS and I much prefer it.

bits  32

section .text

GLOBAL start
extern _printf, _exit

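start:
  and esp, 0xFFFFFFF0         ; align the stack on a 16-byte boundary
  sub esp, 0x10               ; one 4-byte argument plus 12 bytes of padding
  mov dword [esp], hello.msg
  call _printf
  add esp, 0x10               ; pop argument and padding
  sub esp, 0x10               ; same frame again for exit's single argument
  mov dword [esp], 0          ; exit status is passed on the stack, not in eax
  call _exit
  hlt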

section .data

hello.msg db 'Hello, World!', 0x0a, 0x00

NASM in Mach-O mode (-f macho below) records _printf and _exit as external symbols and the linker takes care of the stubs. The code still works.

nasm -f macho hello.asm -o hello.o
ld hello.o -o hello -lc

./hello
Hello, World!

otool is indispensable for any sort of involved Mac forensics and the Mach-O file format is very well explained by Apple.

otool -l hello
hello:
Load command 0
      cmd LC_SEGMENT
  cmdsize 56
  segname __PAGEZERO
   vmaddr 0x00000000
   vmsize 0x00001000
  fileoff 0
 filesize 0
  maxprot 0x00000000
 initprot 0x00000000
   nsects 0
    flags 0x0
...
Load command 8
     cmd LC_UUID
 cmdsize 24
   uuid 0xce 0x2c 0xd0 0xae 0xbb 0x29 0xb4 0xc5
        0xba 0x70 0x39 0x06 0x18 0x30 0x42 0x7b
Load command 9
        cmd LC_UNIXTHREAD
    cmdsize 80
     flavor i386_THREAD_STATE
      count i386_THREAD_STATE_COUNT
      eax 0x00000000 ebx    0x00000000 ecx 0x00000000 edx 0x00000000
      edi 0x00000000 esi    0x00000000 ebp 0x00000000 esp 0x00000000
      ss  0x00000000 eflags 0x00000000 eip 0x00001fd0 cs  0x00000000
      ds  0x00000000 es     0x00000000 fs  0x00000000 gs  0x00000000
Load command 10
          cmd LC_LOAD_DYLIB
      cmdsize 52
         name /usr/lib/libSystem.B.dylib (offset 24)
   time stamp 2 Thu Jan  1 01:00:02 1970
      current version 111.1.3
compatibility version 1.0.0

The Mach-O header is normally generated by the compiler and the linker (GCC and ld) but I’m using neither, so I have to generate the header by hand. It’s doable as long as NASM is instructed to simply dump a flat binary image to disk (-f bin), and it actually works!

nasm -f bin hello1.asm -o hello1
chmod +x hello1
./hello1
Hello, World!

Note that this can be done on any platform NASM runs on. I did it on Linux but assume it will work just as well on Windows.

Now, let’s take a good look at the code…

We need to tell NASM we are in 32-bit mode and that program code starts on the second VM page (0x1000 or 4096). The first page (PAGEZERO) is there to catch null pointer references.

;;; File: hello1.asm
;;; Build: nasm -f bin -o hello1 hello1.asm && chmod +x hello1

bits  32
org   0x1000

The header specifies that this is an x86-32 binary (cputype 7), that it is a full-fledged executable file (filetype 2) and that 10 load commands follow the header.

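mhdr:
   dd 0xFEEDFACE  ; magic, MH_MAGIC (32-bit Mach-O)
   dd 7           ; cputype, CPU_TYPE_X86
   dd 3           ; cpusubtype, CPU_SUBTYPE_I386_ALL
   dd 2           ; filetype, MH_EXECUTE
   dd 10          ; ncmds, number of load commands
   dd sizeofcmds  ; sizeofcmds, total size of the load commands
   dd 0x85        ; flags, MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL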
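
For readers who would rather poke at these fields from a script, here is a minimal Python sketch that packs the same seven little-endian 32-bit words. The constant names are the ones I recall from Apple’s mach-o/loader.h; the helper name pack_mach_header and the sizeofcmds value in the usage line are placeholders of mine, not part of the original build.

```python
import struct

MH_MAGIC = 0xFEEDFACE            # 32-bit Mach-O magic
CPU_TYPE_I386 = 7
CPU_SUBTYPE_I386_ALL = 3
MH_EXECUTE = 2
MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL = 0x1, 0x4, 0x80


def pack_mach_header(ncmds, sizeofcmds):
    """Pack a 32-bit mach_header: seven little-endian uint32 fields."""
    flags = MH_NOUNDEFS | MH_DYLDLINK | MH_TWOLEVEL   # 0x85, as in mhdr
    return struct.pack("<7I", MH_MAGIC, CPU_TYPE_I386, CPU_SUBTYPE_I386_ALL,
                       MH_EXECUTE, ncmds, sizeofcmds, flags)


if __name__ == "__main__":
    # sizeofcmds=0 is a placeholder; the real value is the summed command sizes.
    header = pack_mach_header(ncmds=10, sizeofcmds=0)
    print(len(header), header[:4].hex())
```

The header is always 28 bytes, so the first load command starts at file offset 28.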

PAGEZERO is where you end up when dereferencing a 0 pointer. This page is protected from reading and writing so any access to it causes a page fault and a memory access violation. This segment does not take any space in the file so its filesize is set to 0.

;;; Load command #0

pagezero:
   dd 1              ; LC_SEGMENT
   dd _pagezero      ; size
   db '__PAGEZERO'   ; segname
   times 6 db 0      ; padding to 16 chars
   dd 0              ; vmaddr
   dd 0x1000         ; vmsize
   dd 0              ; fileoff
   dd 0              ; filesize
   dd 0              ; maxprot
   dd 0              ; initprot
   dd 0              ; nsects
   dd 0              ; flags
_pagezero equ $-pagezero
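
The 56 that otool reports as cmdsize for this command falls straight out of the layout: two words, a 16-byte name and eight more words. A small Python sketch of the same packing, where pack_segment is a hypothetical helper of mine rather than anything from the original sources:

```python
import struct

LC_SEGMENT = 1
SEGMENT_FMT = "<II16s8I"   # cmd, cmdsize, segname[16], then 8 uint32 fields


def pack_segment(segname, vmaddr, vmsize, fileoff, filesize,
                 maxprot, initprot, nsects=0, flags=0, sections=b""):
    """Pack an LC_SEGMENT command followed by its already-packed sections."""
    cmdsize = struct.calcsize(SEGMENT_FMT) + len(sections)
    # struct pads the 16-byte name with NULs, just like `times 6 db 0`.
    return struct.pack(SEGMENT_FMT, LC_SEGMENT, cmdsize,
                       segname.encode("ascii"),
                       vmaddr, vmsize, fileoff, filesize,
                       maxprot, initprot, nsects, flags) + sections


if __name__ == "__main__":
    pagezero = pack_segment("__PAGEZERO", 0, 0x1000, 0, 0, 0, 0)
    print(len(pagezero))
```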

The text segment is where our code lives. It’s readable and executable (initprot). The load commands that form part of the Mach-O header itself need to be loaded somewhere. Here, they are part of the text segment which is why the segment starts at the beginning of the file (fileoff 0).

;;; Load command #1

code:
   dd 1              ; LC_SEGMENT
   dd _code          ; size
   db '__TEXT'       ; segname
   times 10 db 0     ; padding to 16 chars
   dd 0x1000         ; vmaddr
   dd 0x1000         ; vmsize
   dd 0              ; fileoff
   dd 0x1000         ; filesize
   dd 7              ; maxprot
   dd 5              ; initprot
   dd 1              ; nsects
   dd 0              ; flags

sect1:               ; section 0
   db '__text'       ; sectname
   times 10 db 0     ; padding to 16 chars
   db '__TEXT'       ; segname
   times 10 db 0     ; padding to 16 chars
   dd start          ; addr
   dd codesize       ; size
   dd start-$$       ; offset
   dd 0              ; align on 2^0
   dd 0              ; reloff
   dd 0              ; nreloc
   dd 0x80000400     ; flags
   dd 0              ; reserved1
   dd 0              ; reserved2
_code equ $-code
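
The maxprot and initprot values are bit masks. As a quick illustration, this Python sketch decodes them the way ls prints file modes; the bit values are the usual VM_PROT_READ, VM_PROT_WRITE and VM_PROT_EXECUTE, while describe_prot is a name I made up for the example.

```python
VM_PROT_READ, VM_PROT_WRITE, VM_PROT_EXECUTE = 0x1, 0x2, 0x4


def describe_prot(prot):
    """Render a VM protection mask as an 'rwx'-style string."""
    bits = (("r", VM_PROT_READ), ("w", VM_PROT_WRITE), ("x", VM_PROT_EXECUTE))
    return "".join(ch if prot & bit else "-" for ch, bit in bits)


if __name__ == "__main__":
    for segment, initprot in (("__PAGEZERO", 0), ("__TEXT", 5),
                              ("__DATA", 3), ("__IMPORT", 7)):
        print(segment, describe_prot(initprot))
```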

The data segment holds our “Hello world!” string.

;;; Load command #2

data:
   dd 1              ; LC_SEGMENT
   dd _data          ; size
   db '__DATA'       ; segname
   times 10 db 0     ; padding to 16 chars
   dd 0x2000         ; vmaddr
   dd 0x1000         ; vmsize
   dd 0x1000         ; fileoff
   dd 0x1000         ; filesize
   dd 7              ; maxprot
   dd 3              ; initprot
   dd 1              ; nsects
   dd 0              ; flags

sect2:               ; section 0
   db '__const'      ; sectname
   times 9 db 0      ; padding to 16 chars
   db '__DATA'       ; segname
   times 10 db 0     ; padding to 16 chars
   dd 0x2000         ; addr
   dd 15             ; size, our string
   dd 4096           ; offset
   dd 0              ; align on 2^0
   dd 0              ; reloff
   dd 0              ; nreloc
   dd 0              ; flags
   dd 0              ; reserved1
   dd 0              ; reserved2
_data equ $-data
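
Each section header that follows a segment command has a fixed layout too: two 16-byte names and nine words, 68 bytes in all. The Python sketch below packs the __const section from above and double-checks the string length; pack_section is an illustrative helper, not something the original build uses.

```python
import struct

SECTION_FMT = "<16s16s9I"   # sectname, segname, addr, size, offset, align,
                            # reloff, nreloc, flags, reserved1, reserved2
SEGMENT_HEADER_SIZE = 56    # an LC_SEGMENT command without its sections


def pack_section(sectname, segname, addr, size, offset,
                 align=0, flags=0, reserved1=0, reserved2=0):
    """Pack one 32-bit section header; relocation fields are left at zero."""
    return struct.pack(SECTION_FMT, sectname.encode("ascii"),
                       segname.encode("ascii"), addr, size, offset,
                       align, 0, 0, flags, reserved1, reserved2)


if __name__ == "__main__":
    message = b"Hello, World!\n\0"
    const = pack_section("__const", "__DATA", 0x2000, len(message), 0x1000)
    # Section header size, and cmdsize of a segment holding one section.
    print(len(const), SEGMENT_HEADER_SIZE + len(const))
```

A segment with a single section therefore has a cmdsize of 56 + 68 = 124, which is what _code, _data and _stubs evaluate to.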

The IMPORT segment holds our jump table, the stubs for printf and exit. The dynamic linker will fill in the stubs for us with a jump to printf and exit in libc. This segment needs to be readable, writable and executable (initprot).

;;; Load command #3

stubs:
   dd 1              ; LC_SEGMENT
   dd _stubs         ; size
   db '__IMPORT'     ; segname
   times 8 db 0      ; padding to 16 chars
   dd 0x3000         ; vmaddr
   dd 0x1000         ; vmsize
   dd 0x2000         ; fileoff
   dd 0x1000         ; filesize
   dd 7              ; maxprot
   dd 7              ; initprot
   dd 1              ; nsects
   dd 0              ; flags

sect3:               ; section 0
   db '__jump_table' ; sectname
   times 4 db 0      ; padding to 16 chars
   db '__IMPORT'     ; segname
   times 8 db 0      ; padding to 16 chars
   dd 0x3000         ; addr
   dd 10             ; size, two stubs
   dd 0x2000         ; offset
   dd 6              ; align on 2^6
   dd 0              ; reloff
   dd 0              ; nreloc
   dd 0x04000008     ; flags
   dd 0              ; reserved1
   dd 5              ; reserved2, stub size
_stubs equ $-stubs

The LINKEDIT segment holds the symbol table.

;;; Load command #4

linkage:
   dd 1              ; LC_SEGMENT
   dd _linkage       ; size
   db '__LINKEDIT'   ; link table
   times 6 db 0      ; padding
   dd 0x4000         ; vmaddr
   dd 0x1000         ; vmsize
   dd symbols-$$     ; fileoff
   dd _symbols       ; filesize
   dd 7              ; maxprot
   dd 1              ; initprot
   dd 0              ; nsects
   dd 0              ; flags
_linkage equ $-linkage

This load command describes our symbol table, including where in the file the symbols and the strings naming them are located. The debugger benefits from it, and the dynamic linker needs the names of the undefined symbols to look them up.

;;; Load command #5

symtab:
   dd 2              ; LC_SYMTAB
   dd _symtab        ; size
   dd symbols-$$     ; symoff
   dd 4              ; nsyms
   dd strings-$$     ; stroff
   dd _strings       ; strsize
_symtab equ $-symtab

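This load command describes the dynamic symbol table. It splits the symbol table into local, externally defined and undefined symbols and points at the indirect symbol table (indirect), which is how the dynamic linker knows which stubs to plug.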

;;; Load command #6

dysymtab:
   dd 0x0b           ; LC_DYSYMTAB
   dd _dysymtab      ; size
   dd 0              ; ilocalsym
   dd 1              ; nlocalsym
   dd 1              ; iextdefsym
   dd 1              ; nextdefsym, only start is externally defined
   dd 2              ; iundefsym
   dd 2              ; nundefsym
   dd 0              ; tocoff
   dd 0              ; ntoc
   dd 0              ; modtaboff
   dd 0              ; nmodtab
   dd 0              ; extrefsymoff
   dd 0              ; nextrefsyms
   dd indirect-$$    ; indirectsymoff
   dd 2              ; nindirectsyms
   dd 0              ; extreloff
   dd 0              ; nextrel
   dd 0              ; locreloff
   dd 0              ; nlocrel
_dysymtab equ $-dysymtab
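
As I understand it, the three index/count pairs are meant to tile the symbol table without gaps or overlap: locals first, then externally defined symbols, then undefined ones. Here is a minimal Python sketch of that sanity check, assuming one local symbol (hello.msg), one externally defined symbol (start) and two undefined ones (_printf and _exit). The function name is mine.

```python
def ranges_tile_table(nsyms, ranges):
    """Return True if the (start, count) ranges cover 0..nsyms-1 in order,
    with no gaps and no overlap."""
    expected_start = 0
    for start, count in ranges:
        if start != expected_start or count < 0:
            return False
        expected_start = start + count
    return expected_start == nsyms


if __name__ == "__main__":
    layout = [(0, 1),   # ilocalsym,  nlocalsym
              (1, 1),   # iextdefsym, nextdefsym
              (2, 2)]   # iundefsym,  nundefsym
    print(ranges_tile_table(4, layout))
```

A count that is off by one makes two ranges overlap, and the check fails.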

This load command names the dynamic linker that should be loaded alongside our program, normally /usr/lib/dyld. I’m not ready to use a dynamic linker of my own but this is a distinct possibility! This load command clearly provides for it.

;;; Load command #7

dylinker:
   dd 0x0e           ; LC_LOAD_DYLINKER
   dd _dylinker      ; size
   dd 12             ; nameoff
   db '/usr/lib/dyld', 0
   align 4
_dylinker equ $-dylinker

This load command specifies the contents of the registers at startup. I haven’t seen anything other than EIP populated, though. The program will not run unless this load command is present!

;;; Load command #8

thrstate:
   dd 0x5            ; LC_UNIXTHREAD
   dd _thrstate      ; size
   dd 0x01           ; i386_THREAD_STATE
   dd 0x10           ; i386_THREAD_STATE_COUNT
   times 10 dd 0x00  ; cpu thread state
   dd start          ; eip
   times 05 dd 0x00  ;
_thrstate equ $-thrstate
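
The register order is the one otool printed earlier, which puts eip in the eleventh slot. A short Python sketch of the same command, with pack_unixthread being a hypothetical helper name:

```python
import struct

LC_UNIXTHREAD = 0x5
I386_THREAD_STATE = 1
# Register order as shown in the otool -l output.
I386_REGISTERS = ("eax", "ebx", "ecx", "edx", "edi", "esi", "ebp", "esp",
                  "ss", "eflags", "eip", "cs", "ds", "es", "fs", "gs")


def pack_unixthread(eip):
    """Pack LC_UNIXTHREAD with every register zeroed except eip."""
    regs = [eip if name == "eip" else 0 for name in I386_REGISTERS]
    count = len(regs)              # 16, i.e. i386_THREAD_STATE_COUNT
    cmdsize = 16 + 4 * count       # four header words plus the registers
    return struct.pack("<4I16I", LC_UNIXTHREAD, cmdsize,
                       I386_THREAD_STATE, count, *regs)


if __name__ == "__main__":
    command = pack_unixthread(eip=0x1FD0)   # entry point from the otool dump
    print(len(command))
```

The result is 80 bytes long, matching the cmdsize that otool shows for LC_UNIXTHREAD.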

We can have as many LC_LOAD_DYLIB load commands as dynamic libraries we would like to use. I’m only using libSystem, the Mac’s libc, since that’s where printf and exit live. I could have created stubs for dlopen, dlclose, dlsym and dlerror and used them to load libc and pull out printf and exit. Why bother, though, when the dynamic linker can do it for us?

;;; Load command #9

dylib:
   dd 0x0c           ; LC_LOAD_DYLIB
   dd _dylib         ; size
   dd 0x18           ; nameoff
   dd 0x02           ; timestamp
   dd 0x006F0103     ; currentver
   dd 0x00010000     ; compatver
   db '/usr/lib/libSystem.B.dylib', 0
   align 4
_dylib equ $-dylib
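
The version words pack major.minor.patch into 16, 8 and 8 bits, so 0x006F0103 is the 111.1.3 that otool reports. A minimal Python sketch of the encoding and of the padded command size follows; the helper names are mine.

```python
import struct

LC_LOAD_DYLIB = 0x0C


def pack_version(major, minor, patch):
    """Encode X.Y.Z as a 32-bit word: 16 bits major, 8 minor, 8 patch."""
    return (major << 16) | (minor << 8) | patch


def pack_load_dylib(path, timestamp, current, compat):
    """Pack LC_LOAD_DYLIB; the NUL-terminated path is padded to 4 bytes."""
    name = path.encode("ascii") + b"\0"
    name += b"\0" * (-len(name) % 4)          # same effect as `align 4`
    cmdsize = 24 + len(name)                  # six header words, then the name
    return struct.pack("<6I", LC_LOAD_DYLIB, cmdsize, 24,
                       timestamp, current, compat) + name


if __name__ == "__main__":
    command = pack_load_dylib("/usr/lib/libSystem.B.dylib", 2,
                              pack_version(111, 1, 3), pack_version(1, 0, 0))
    print(hex(pack_version(111, 1, 3)), len(command))
```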

It was a long road through the Mach-O header but we can finally relax and get some work done. There isn’t much to do apart from printing hello world and exiting but note the alignment of the stack on a 16-byte boundary, before each function call.

I’m taking the easy way out and aligning the stack one extra time, at the beginning of the program. This makes the rest of the alignment work much easier!

All values on the stack are 32-bit values. We are passing a single 4-byte argument, which requires us to pad the stack with 12 more bytes (sub esp, 0x10). We pop arguments and padding right after the call to printf.

GLOBAL start

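start:

  and esp, 0xFFFFFFF0         ; align the stack on a 16-byte boundary
  sub esp, 0x10               ; one 4-byte argument plus 12 bytes of padding
  mov dword [esp], hello.msg
  call _printf
  add esp, 0x10               ; pop argument and padding
  sub esp, 0x10               ; same frame again for exit's single argument
  mov dword [esp], 0          ; exit status is passed on the stack, not in eax
  call _exit
  hlt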

codesize equ $-start
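
The alignment arithmetic is easy to get wrong, so here is the same calculation as a small Python sketch. The function names and the sample stack pointer are made up for illustration.

```python
def align_down_16(esp):
    """Mirror `and esp, 0xFFFFFFF0`: clear the low four bits."""
    return esp & 0xFFFFFFF0


def frame_size(nargs, word=4, alignment=16):
    """Bytes to subtract from an aligned esp so that it stays aligned
    while leaving room for nargs word-sized arguments."""
    needed = nargs * word
    return (needed + alignment - 1) // alignment * alignment


if __name__ == "__main__":
    esp = align_down_16(0xBFFFF9EC)      # sample value, not from a real run
    size = frame_size(1)                 # printf takes a single argument here
    print(hex(esp), size, size - 4)      # aligned esp, frame size, padding
```

One argument needs a 16-byte frame, that is 4 bytes of argument and 12 bytes of padding. Five arguments would need 32 bytes.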

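Data and stubs are easy. Note the alignment to a page boundary, which matches the file offsets in the segment commands. A near jump with a 32-bit displacement takes 5 bytes (one opcode byte plus four displacement bytes), so each stub is reserved as 5 hlt instructions.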

;;; Data

align 4096

hello.msg db 'Hello, World!', 0x0a, 0x00

;;; Stubs

align 4096

_printf:
  times 5 hlt

_exit:
  times 5 hlt
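
To see why 5 bytes is enough, here is a Python sketch that encodes a near relative jump. It assumes the dynamic linker patches each stub with the E9 rel32 form, which I have not verified against dyld itself. The target addresses are invented.

```python
import struct

STUB_SIZE = 5   # matches reserved2 in the __jump_table section header


def encode_jmp_rel32(stub_addr, target_addr):
    """Encode `jmp target` placed at stub_addr: opcode E9 followed by a
    32-bit displacement relative to the end of the instruction."""
    displacement = (target_addr - (stub_addr + STUB_SIZE)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", displacement)


if __name__ == "__main__":
    # 0x3000 and 0x3005 are where _printf and _exit land in this layout;
    # the targets are hypothetical library addresses.
    print(encode_jmp_rel32(0x3000, 0x90001000).hex())
    print(encode_jmp_rel32(0x3005, 0x90002000).hex())
```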

The symbol table has a well-defined format and each symbol needs to be described in excruciating detail!

;;; Linkage

align 4096

symbols:           ; symbol table

; hello.msg

dd str01off    ; nstrx
db 0x0e        ; type
db 0x02        ; sect
dw 0x00        ; desc
dd hello.msg   ; value

; start

dd str02off    ; nstrx
db 0x0f        ; type
db 0x01        ; sect
dw 0x00        ; desc
dd start       ; value

; _printf

dd str03off    ; nstrx
db 0x01        ; type N_EXT
db 0x00        ; sect
dw 0x0101      ; desc
dd _printf     ; value

; _exit

dd str04off    ; nstrx
db 0x01        ; type N_EXT
db 0           ; sect
dw 0x0101      ; desc
dd _exit       ; value
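
Each entry above is a 12-byte nlist record. The following Python sketch packs one and decodes the type and desc bytes. The bit meanings are the ones I know from mach-o/nlist.h (N_EXT, the N_SECT type, and the library ordinal in the high byte of desc); the function names are illustrative.

```python
import struct

N_EXT = 0x01        # external symbol bit
N_TYPE = 0x0E       # mask for the type bits
N_SECT = 0x0E       # type: defined in the section numbered by `sect`


def pack_nlist(strx, n_type, sect, desc, value):
    """Pack a 32-bit nlist entry: uint32, two bytes, 16-bit desc, uint32."""
    return struct.pack("<IBBHI", strx, n_type, sect, desc, value)


def describe_type(n_type):
    scope = "external" if n_type & N_EXT else "local"
    kind = "defined in section" if (n_type & N_TYPE) == N_SECT else "undefined"
    return scope + ", " + kind


def library_ordinal(desc):
    """High byte of desc: which LC_LOAD_DYLIB (counting from 1) to search."""
    return (desc >> 8) & 0xFF


if __name__ == "__main__":
    for name, n_type in (("hello.msg", 0x0E), ("start", 0x0F), ("_printf", 0x01)):
        print(name, "->", describe_type(n_type))
    print(library_ordinal(0x0101))
```

With desc set to 0x0101, the library ordinal is 1, which points at the only LC_LOAD_DYLIB command we have.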

The indirect symbol table tells the dynamic linker that elements 2 and 3 of the symbol table need to be looked up and their stubs plugged.

indirect:         ; indirect symbol table

   dd 0x02        ; _printf
   dd 0x03        ; _exit

The string table names the symbols above.

strings:          ; string table

      db 0x20, 0x00

str01 db 'hello.msg', 0x00
str02 db 'start', 0x00
str03 db '_printf', 0x00
str04 db '_exit', 0x00

str01off equ str01 - strings
str02off equ str02 - strings
str03off equ str03 - strings
str04off equ str04 - strings

_strings equ $-strings
_symbols equ $-symbols
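
The strNNoff constants are just byte offsets from the start of the string table. As a cross-check, this Python sketch rebuilds the table and computes the same offsets; build_string_table is a hypothetical helper and the two-byte prefix mirrors the db 0x20, 0x00 line.

```python
def build_string_table(names, prefix=b"\x20\x00"):
    """Concatenate NUL-terminated names after the prefix and record
    the offset of each name from the start of the table."""
    table = bytearray(prefix)
    offsets = {}
    for name in names:
        offsets[name] = len(table)
        table += name.encode("ascii") + b"\0"
    return bytes(table), offsets


if __name__ == "__main__":
    table, offsets = build_string_table(["hello.msg", "start", "_printf", "_exit"])
    print(offsets, len(table))
```

The offsets come out as 2, 12, 18 and 26, and the whole table is 32 bytes long, which is the value of _strings.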

I don’t expect you to generate Mac binaries by hand on Linux or Windows but I hope this tutorial will be of help if you ever decide to try!
