I thought of a cute problem: what is the smallest ./a.out binary, in bytes, that I can create?

Here are some rules the program should follow:

  • ./a.out must run successfully.
  • $? must deterministically be 0.
  • The binary must be produced by GCC only; no post-processing with objcopy, hex editors, or manual patching.

We begin with the simplest program possible:

// compiled with gcc empty.c
int main() {
    return 0;
}

This gives us a file size of 15816 bytes (from stat). Not too shabby, but that is nearly four times the RAM of the Apollo Guidance Computer, just to hold a binary that does nothing.

Looking at file:

❯ file a.out
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/jms7zxzm7w1whczwny5m3gkgdjghmi2r-glibc-2.42-51/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped

"not stripped" looks suspicious: it means the binary still carries its symbol table. Whatever that is, surely it is better if we can strip it out of our binary. It turns out that gcc provides a -s flag that tells the linker to remove the symbol table (and any debugging information) from the output. With -s, we are now at 14352 bytes.

Between running ./a.out and hitting int main(), a lot of sorcery happens behind the scenes, so much that Matt Godbolt gave an hour-long talk about it at CppCon. Let's skip all of it: we replace main with our own _start entry point and compile with -nostartfiles, so GCC no longer links in the C runtime startup files (crt1.o and friends) whose job is to set things up and eventually call main().

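// compiled with gcc empty.cpp -s -nostartfiles (C++: note extern "C" and <cstdlib>)
#include <cstdlib>

// With -nostartfiles nothing calls main(), so we supply the entry point _start.
// It must never return, because there is no caller to return to. Caveat: at _start
// the stack is 8 bytes off normal call alignment, so calling libc is not strictly ABI-safe.
extern "C" __attribute__((noreturn)) void _start() { exit(0); }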

This only gives us a measly improvement to 13632 bytes. Given how much Matt shows happening before int main, surely there is still code in our binary that we never run!

Checking objdump -x a.out, we can see that the binary is still dynamically linked against three shared libraries (the NEEDED entries):

❯ objdump -x a.out

a.out:     file format elf64-x86-64
a.out
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001020

Program Header:
    PHDR off    0x0000000000000040 vaddr 0x0000000000000040 paddr 0x0000000000000040 align 2**3
         filesz 0x00000000000002a0 memsz 0x00000000000002a0 flags r--
  INTERP off    0x00000000000002e0 vaddr 0x00000000000002e0 paddr 0x00000000000002e0 align 2**0
         filesz 0x0000000000000053 memsz 0x0000000000000053 flags r--
  ...

Dynamic Section:
  NEEDED               libm.so.6
  NEEDED               libgcc_s.so.1
  NEEDED               libc.so.6
  RUNPATH              ...
  HASH                 0x0000000000000368
  GNU_HASH             0x0000000000000380
  STRTAB               0x00000000000003d0
  SYMTAB               0x00000000000003a0
  STRSZ                0x00000000000000c0
  SYMENT               0x0000000000000018
  DEBUG                0x0000000000000000
  PLTGOT               0x0000000000003fe8
  PLTRELSZ             0x0000000000000018
  PLTREL               0x0000000000000007
  JMPREL               0x00000000000004b8
  FLAGS_1              0x0000000008000000
  VERNEED              0x0000000000000498
  VERNEEDNUM           0x0000000000000001
  VERSYM               0x0000000000000490

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .interp       00000053  00000000000002e0  00000000000002e0  000002e0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  1 .note.gnu.property 00000030  0000000000000338  0000000000000338  00000338  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .hash         00000014  0000000000000368  0000000000000368  00000368  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 .gnu.hash     0000001c  0000000000000380  0000000000000380  00000380  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .dynsym       00000030  00000000000003a0  00000000000003a0  000003a0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 .dynstr       000000c0  00000000000003d0  00000000000003d0  000003d0  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 .gnu.version  00000004  0000000000000490  0000000000000490  00000490  2**1
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 .gnu.version_r 00000020  0000000000000498  0000000000000498  00000498  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 .rela.plt     00000018  00000000000004b8  00000000000004b8  000004b8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 .plt          00000020  0000000000001000  0000000000001000  00001000  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 10 .text         0000000b  0000000000001020  0000000000001020  00001020  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 11 .eh_frame_hdr 0000001c  0000000000002000  0000000000002000  00002000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 12 .eh_frame     0000005c  0000000000002020  0000000000002020  00002020  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 13 .dynamic      000001a0  0000000000003e48  0000000000003e48  00002e48  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 14 .got.plt      00000020  0000000000003fe8  0000000000003fe8  00002fe8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 15 .comment      00000012  0000000000000000  0000000000000000  00003008  2**0
                  CONTENTS, READONLY
SYMBOL TABLE:
no symbols

Since we have a dynamic section, our binary has to carry everything the dynamic loader needs. At this point, the binary is still dynamically linked: it contains an interpreter path (.interp), dynamic symbol and string tables, relocation metadata, PLT/GOT machinery, and references to shared libraries. That is a lot of infrastructure for a program whose only ambition is to immediately exit. So let's remove three big pieces:

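  • -nostdlib: do not link the standard libraries (or the startup files).
  • -static: avoid the dynamic linking machinery.
  • -no-pie: produce a fixed-address executable instead of a position-independent one.

Without libc there is no exit() to call, so _start now invokes the exit system call (number 60 on x86-64 Linux) directly with inline assembly: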
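// compiled with gcc -static -nostdlib -no-pie empty.cpp
extern "C" __attribute__((noreturn)) void _start() {
    __asm__ volatile(
        "mov $60, %%al\n"     // exit is syscall 60; this sets only the low byte of
                              // %rax, relying on Linux zeroing %rax before _start
        "xor %%dil, %%dil\n"  // exit status 0 (the kernel keeps only the low 8 bits)
        "syscall\n" ::
            : "rax", "rdi", "rcx", "r11");  // syscall itself also overwrites rcx and r11
    __builtin_unreachable();
}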

We are now at 8704 bytes.
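The asm above is tuned purely for size. For comparison, here is a minimal sketch of a general one-argument system call wrapper, assuming x86-64 Linux; raw_syscall1 and matches_getpid are hypothetical helpers of ours, not libc functions. It spells out the full register contract: the syscall number goes in %rax, the first argument in %rdi, the result comes back in %rax, and the syscall instruction itself overwrites %rcx and %r11.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical helper (x86-64 Linux only): issue a system call with one argument.
// Kernel ABI: number in %rax, first argument in %rdi, result returned in %rax.
inline long raw_syscall1(long number, long arg0) {
    long result;
    __asm__ volatile("syscall"
                     : "=a"(result)       // %rax holds the return value afterwards
                     : "0"(number),       // tied to operand 0, so the number goes in %rax
                       "D"(arg0)          // %rdi carries the first argument
                     : "rcx", "r11", "memory");  // syscall saves RIP in rcx, RFLAGS in r11
    return result;
}

// Hypothetical check: syscall 39 is getpid on x86-64 Linux, so both should agree.
inline bool matches_getpid() {
    return raw_syscall1(39, 0) == static_cast<long>(getpid());
}
```

Calling raw_syscall1(60, 0) makes the same exit request as the hand-written asm in _start; the size-optimized version simply skips the bookkeeping it does not need.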

Looking at objdump again (this time using objdump -D a.out to see all sections):

❯ objdump -D a.out

a.out:     file format elf64-x86-64

Disassembly of section .note.gnu.property:

0000000000400190 <.note.gnu.property>:
  400190:	04 00                	add    $0x0,%al
  400192:	00 00                	add    %al,(%rax)
  400194:	20 00                	and    %al,(%rax)
  400196:	00 00                	add    %al,(%rax)
  400198:	05 00 00 00 47       	add    $0x47000000,%eax
  40019d:	4e 55                	rex.WRX push %rbp
  40019f:	00 01                	add    %al,(%rcx)
  4001a1:	00 01                	add    %al,(%rcx)
  4001a3:	c0 04 00 00          	rolb   $0x0,(%rax,%rax,1)
  4001a7:	00 01                	add    %al,(%rcx)
  4001a9:	00 00                	add    %al,(%rax)
  4001ab:	00 00                	add    %al,(%rax)
  4001ad:	00 00                	add    %al,(%rax)
  4001af:	00 02                	add    %al,(%rdx)
  4001b1:	00 01                	add    %al,(%rcx)
  4001b3:	c0 04 00 00          	rolb   $0x0,(%rax,%rax,1)
  4001b7:	00 01                	add    %al,(%rcx)
  4001b9:	00 00                	add    %al,(%rax)
  4001bb:	00 00                	add    %al,(%rax)
  4001bd:	00 00                	add    %al,(%rax)
        ...

Disassembly of section .text:

0000000000401000 <.text>:
  401000:	55                   	push   %rbp
  401001:	48 89 e5             	mov    %rsp,%rbp
  401004:	b0 3c                	mov    $0x3c,%al
  401006:	40 b7 01             	mov    $0x1,%dil
  401009:	0f 05                	syscall


Disassembly of section .eh_frame:

0000000000402000 <.eh_frame>:
  402000:	14 00                	adc    $0x0,%al
  402002:	00 00                	add    %al,(%rax)
  402004:	00 00                	add    %al,(%rax)
  402006:	00 00                	add    %al,(%rax)
  402008:	01 7a 52             	add    %edi,0x52(%rdx)
  40200b:	00 01                	add    %al,(%rcx)
  40200d:	78 10                	js     0x40201f
  40200f:	01 1b                	add    %ebx,(%rbx)
  402011:	0c 07                	or     $0x7,%al
  402013:	08 90 01 00 00 18    	or     %dl,0x18000001(%rax)
  402019:	00 00                	add    %al,(%rax)
  40201b:	00 1c 00             	add    %bl,(%rax,%rax,1)
  40201e:	00 00                	add    %al,(%rax)
  402020:	e0 ef                	loopne 0x402011
  402022:	ff                   	(bad)
  402023:	ff 0b                	decl   (%rbx)
  402025:	00 00                	add    %al,(%rax)
  402027:	00 00                	add    %al,(%rax)
  402029:	41 0e                	rex.B (bad)
  40202b:	10 86 02 43 0d 06    	adc    %al,0x60d4302(%rsi)
  402031:	00 00                	add    %al,(%rax)


Disassembly of section .comment:

0000000000000000 <.comment>:
   0:	47                   	rex.RXB
   1:	43                   	rex.XB
   2:	43 3a 20             	rex.XB cmp (%r8),%spl
   5:	28 47 4e             	sub    %al,0x4e(%rdi)
   8:	55                   	push   %rbp
   9:	29 20                	sub    %esp,(%rax)
   b:	31 35 2e 32 2e 30    	xor    %esi,0x302e322e(%rip)        # 0x302e323f
        ...

Some of you might wonder why there is a .comment section. We are trying to rid our binary of as many things as possible, and we definitely don’t want comments.

It turns out that the .comment section stores the identification string of the compiler that built the binary, in our case GCC: (GNU) 15.2.0 (hex: 47 43 43 3a …). Since -D disassembles every section, objdump decodes those ASCII bytes as if they were instructions, hence the strange output. Adding -fno-ident to the GCC flags removes the .comment section and brings us down to 8616 bytes.
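To confirm that those "instructions" really are text, we can take the bytes from the .comment disassembly above (offsets 0x0 through 0x10) and compare them with the ASCII string at compile time. This is a small C++17 sketch; kCommentBytes and bytes_spell are names made up for illustration, and the byte values are copied from the objdump output. The 17 characters plus a terminating NUL also match the 0x12-byte .comment size reported by objdump -x earlier.

```cpp
#include <array>
#include <cstddef>
#include <string_view>

// Bytes of .comment as printed by objdump -D at offsets 0x0..0x10.
constexpr std::array<unsigned char, 17> kCommentBytes = {
    0x47, 0x43, 0x43, 0x3a, 0x20,        // "GCC: "
    0x28, 0x47, 0x4e, 0x55, 0x29, 0x20,  // "(GNU) "
    0x31, 0x35, 0x2e, 0x32, 0x2e, 0x30   // "15.2.0"
};

// Compare the raw bytes against an ASCII string, byte by byte, at compile time.
constexpr bool bytes_spell(std::string_view text) {
    if (text.size() != kCommentBytes.size()) return false;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (static_cast<unsigned char>(text[i]) != kCommentBytes[i]) return false;
    }
    return true;
}

static_assert(bytes_spell("GCC: (GNU) 15.2.0"), "the .comment bytes are plain ASCII");
```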

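The astute amongst you would also spot the .eh_frame section. That pesky section holds the unwind tables used to walk the stack (for C++ exceptions, debuggers, and profilers), and our program that does nothing needs none of that. By passing -fno-exceptions -fno-asynchronous-unwind-tables to GCC, we get to 4400 bytes, almost small enough to fit into the Apollo Guidance Computer's RAM. The saving is far larger than the section itself because .eh_frame sat in its own page-aligned segment (note its address, 0x402000), so dropping it also drops a page's worth of padding.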

The last section that we will want to remove is the .note.gnu.property section.

❯ readelf -n a.out

Displaying notes found in: .note.gnu.property
  Owner                Data size 	Description
  GNU                  0x00000020	NT_GNU_PROPERTY_TYPE_0
      Properties: x86 feature used: x86
        x86 ISA used: x86-64-baseline

GNU toolchains use this note to record properties of the binary for other tools to read; here it records the x86 ISA level and features the code uses. In our case the assembler added the note, so we simply tell it not to by adding -Wa,-mx86-used-note=no. With that, we are now at 4320 bytes, and objdump shows nothing except the instructions:

❯ objdump -D a.out

a.out:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <.text>:
  401000:	55                   	push   %rbp
  401001:	48 89 e5             	mov    %rsp,%rbp
  401004:	b0 3c                	mov    $0x3c,%al
  401006:	40 b7 01             	mov    $0x1,%dil
  401009:	0f 05                	syscall

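…which is ridiculous. How can five instructions, 11 bytes of code, result in a 4320-byte binary? (Both objdump listings show mov $0x1,%dil, which would exit with status 1, so they appear to come from a slightly different version of the program; the xor %dil,%dil in the source also encodes to three bytes, 40 30 ff, so the sizes are unaffected.) The current readelf output looks as follows: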

❯ readelf -a a.out
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x401000
  Start of program headers:          64 (bytes into file)
  Start of section headers:          4128 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         3
  Size of section headers:           64 (bytes)
  Number of section headers:         3
  Section header string table index: 2

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000401000  00001000
       000000000000000b  0000000000000000  AX       0     0     1
  [ 2] .shstrtab         STRTAB           0000000000000000  0000100b
       0000000000000011  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000e8 0x00000000000000e8  R      0x1000
  LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                 0x000000000000000b 0x000000000000000b  R E    0x1000
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10

 Section to Segment mapping:
  Segment Sections...
   00
   01     .text
   02

There is no dynamic section in this file.

There are no relocations in this file.
No processor specific unwind information to decode

No version information found in this file.

The program header table tells the loader how to map the file into memory segments when starting a program. Here, the first LOAD covers 232 bytes (0xe8): the 64-byte ELF header plus three 56-byte program headers. The second LOAD maps the 11 bytes of .text, and GNU_STACK occupies no file space at all; it only tells the kernel that the stack is readable and writable but not executable. Both LOADs, however, have an alignment requirement of 0x1000. To satisfy it, the linker pads the file so that .text starts at offset 0x1000 (4096), and that padding is where most of our 4320 bytes go. We therefore pass -Wl,--nmagic (short form -Wl,-n) to GCC, which tells the linker to turn off page alignment of sections, making our readelf output look as follows:

❯ readelf -a a.out
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4000b0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          208 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         3
  Section header string table index: 2

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000b0  000000b0
       000000000000000b  0000000000000000  AX       0     0     1
  [ 2] .shstrtab         STRTAB           0000000000000000  000000bb
       0000000000000011  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  D (mbind), l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x00000000000000b0 0x00000000004000b0 0x00000000004000b0
                 0x000000000000000b 0x000000000000000b  R E    0x1
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01

There is no dynamic section in this file.

There are no relocations in this file.
No processor specific unwind information to decode

No version information found in this file.

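There is only one LOAD now: with -n, the linker no longer places the ELF and program headers in a loadable segment, so the single PT_LOAD maps just the 11 bytes of .text, which now sit at file offset 0xb0 (176 = 64 + 2 * 56), immediately after the headers. Finally, we leapfrogged to 400 bytes! But can we do any better?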

I believe the answer is no. Our binary looks like this:

+----------------------------------+
| ELF header                       | 64 B
+----------------------------------+
| Program header: PT_LOAD          | 56 B
+----------------------------------+
| Program header: PT_GNU_STACK     | 56 B
+----------------------------------+
| .text section contents           | 11 B
+----------------------------------+
| .shstrtab section contents       | 17 B
| "\0.shstrtab\0.text\0"           |
+----------------------------------+
| padding for section header       | 4 B
+----------------------------------+
| Section header [0]: NULL         | 64 B
+----------------------------------+
| Section header [1]: .text        | 64 B
+----------------------------------+
| Section header [2]: .shstrtab    | 64 B
+----------------------------------+

The ELF header is self-explanatory. PT_LOAD is needed to load the instructions, and PT_GNU_STACK, which marks the stack as non-executable, is always emitted by GCC's toolchain. The linker also always writes .shstrtab, which holds the section names, and GCC offers no way to drop it. The first section header entry is required by the System V ABI ELF specification to be reserved for the undefined section index, SHN_UNDEF, whose value is 0. In practice, this entry has type SHT_NULL, so tools display it as the NULL section. Post-processing tools like objcopy would let us cut out a few more things, but our rules forbid them, so that is out of today's scope.
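The arithmetic behind both the 4320-byte and the 400-byte files can be checked with a few lines of C++. This is a sketch under stated assumptions: the header sizes come from glibc's <elf.h> structs, .text and .shstrtab are laid out back to back as in the readelf output above, and the section header table starts at the next 8-byte boundary, which matches the offsets readelf reported (4128 and 208). align_up and elf_file_size are hypothetical helpers, not linker APIs.

```cpp
#include <cstdint>
#include <elf.h>

// Round x up to the next multiple of a (a > 0).
constexpr std::uint64_t align_up(std::uint64_t x, std::uint64_t a) {
    return (x + a - 1) / a * a;
}

// Hypothetical model of the files produced above: ELF header, program headers,
// .text (placed at the next multiple of text_align), .shstrtab right after it,
// then the section header table at an 8-byte boundary.
constexpr std::uint64_t elf_file_size(std::uint64_t phnum, std::uint64_t text_align,
                                      std::uint64_t text_size, std::uint64_t shstrtab_size,
                                      std::uint64_t shnum) {
    const std::uint64_t headers = sizeof(Elf64_Ehdr) + phnum * sizeof(Elf64_Phdr);
    const std::uint64_t text_offset = align_up(headers, text_align);
    const std::uint64_t shoff = align_up(text_offset + text_size + shstrtab_size, 8);
    return shoff + shnum * sizeof(Elf64_Shdr);
}

static_assert(sizeof(Elf64_Ehdr) == 64 && sizeof(Elf64_Phdr) == 56 &&
              sizeof(Elf64_Shdr) == 64, "ELF64 header sizes used in the diagram");

// Before -Wl,-n: three program headers, .text page-aligned at 0x1000.
static_assert(elf_file_size(3, 0x1000, 11, 17, 3) == 4320, "");
// After -Wl,-n: two program headers, no page alignment.
static_assert(elf_file_size(2, 1, 11, 17, 3) == 400, "");
```

With page alignment, .text lands at offset 4096 even though the headers only take 232 bytes, so 3864 of the 4320 bytes are zero padding.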

Step                        Flags / change                                   Size (bytes)
Normal main                 gcc empty.c                                      15,816
Strip symbols               -s                                               14,352
Freestanding                -nostartfiles                                    13,632
No libc / static / no PIE   -nostdlib -static -no-pie                        8,704
Remove .comment section     -fno-ident                                       8,616
Remove unwind info          -fno-asynchronous-unwind-tables -fno-exceptions  4,400
Remove GNU property note    -Wa,-mx86-used-note=no                           4,320
Reduce alignment            -Wl,--nmagic / -Wl,-n                            400

So there we have it, the final code and the full command used:

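// gcc empty.cpp -nostdlib -static -no-pie -fno-ident \
//    -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti \
//    -Wa,-mx86-used-note=no -Wl,--build-id=none,-n,--strip-all
extern "C" __attribute__((noreturn)) void _start() {
    __asm__ volatile(
        "mov $60, %%al\n"     // exit is syscall 60; this sets only the low byte of
                              // %rax, relying on Linux zeroing %rax before _start
        "xor %%dil, %%dil\n"  // exit status 0 (the kernel keeps only the low 8 bits)
        "syscall\n" ::
            : "rax", "rdi", "rcx", "r11");  // syscall itself also overwrites rcx and r11
    __builtin_unreachable();
}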