I thought of a cute problem: what is the smallest (size) ./a.out binary I can create?
Here are some rules the program should follow:
- ./a.out must run successfully.
- $? must deterministically be 0.
- The binary must be produced by GCC only; no post-processing with objcopy, hex editors, or manual patching.
We begin with the simplest program possible:
// compiled with gcc empty.c
int main() {
return 0;
}
This gives us a file size of 15816 bytes (from stat). Not too shabby, but we would need roughly four times the RAM of the Apollo Guidance Computer to hold a binary that does nothing.
Looking at file:
❯ file a.out
a.out: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /nix/store/jms7zxzm7w1whczwny5m3gkgdjghmi2r-glibc-2.42-51/lib/ld-linux-x86-64.so.2, for GNU/Linux 3.10.0, not stripped
not stripped looks suspicious. Whatever it is, surely it is better if we can strip stuff out of our binary. It turns out that gcc provides a -s flag that removes all symbol table and relocation information from the executable. With our binary stripped, we are now at 14352 bytes.
Between running ./a.out and hitting int main(), there is a lot of sorcery happening behind the scenes, so much so that Matt Godbolt gave a one-hour talk at CppCon about it. Let’s replace main with our own entry point so that we have a freestanding binary that skips everything that normally happens before int main().
// compiled with gcc empty.c -s -nostartfiles
#include <cstdlib>
extern "C" __attribute__((noreturn)) void _start() { exit(0); }
This only gives us a measly improvement to 13632 bytes. Given how much Matt says happens before int main(), surely there is still code in our binary that we never run!
Checking objdump -x a.out, we can see a bunch of libraries being dynamically loaded:
❯ objdump -x a.out
a.out: file format elf64-x86-64
a.out
architecture: i386:x86-64, flags 0x00000150:
HAS_SYMS, DYNAMIC, D_PAGED
start address 0x0000000000001020
Program Header:
PHDR off 0x0000000000000040 vaddr 0x0000000000000040 paddr 0x0000000000000040 align 2**3
filesz 0x00000000000002a0 memsz 0x00000000000002a0 flags r--
INTERP off 0x00000000000002e0 vaddr 0x00000000000002e0 paddr 0x00000000000002e0 align 2**0
filesz 0x0000000000000053 memsz 0x0000000000000053 flags r--
...
Dynamic Section:
NEEDED libm.so.6
NEEDED libgcc_s.so.1
NEEDED libc.so.6
RUNPATH ...
HASH 0x0000000000000368
GNU_HASH 0x0000000000000380
STRTAB 0x00000000000003d0
SYMTAB 0x00000000000003a0
STRSZ 0x00000000000000c0
SYMENT 0x0000000000000018
DEBUG 0x0000000000000000
PLTGOT 0x0000000000003fe8
PLTRELSZ 0x0000000000000018
PLTREL 0x0000000000000007
JMPREL 0x00000000000004b8
FLAGS_1 0x0000000008000000
VERNEED 0x0000000000000498
VERNEEDNUM 0x0000000000000001
VERSYM 0x0000000000000490
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 00000053 00000000000002e0 00000000000002e0 000002e0 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.gnu.property 00000030 0000000000000338 0000000000000338 00000338 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .hash 00000014 0000000000000368 0000000000000368 00000368 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .gnu.hash 0000001c 0000000000000380 0000000000000380 00000380 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .dynsym 00000030 00000000000003a0 00000000000003a0 000003a0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .dynstr 000000c0 00000000000003d0 00000000000003d0 000003d0 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.version 00000004 0000000000000490 0000000000000490 00000490 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .gnu.version_r 00000020 0000000000000498 0000000000000498 00000498 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.plt 00000018 00000000000004b8 00000000000004b8 000004b8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .plt 00000020 0000000000001000 0000000000001000 00001000 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
10 .text 0000000b 0000000000001020 0000000000001020 00001020 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
11 .eh_frame_hdr 0000001c 0000000000002000 0000000000002000 00002000 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
12 .eh_frame 0000005c 0000000000002020 0000000000002020 00002020 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
13 .dynamic 000001a0 0000000000003e48 0000000000003e48 00002e48 2**3
CONTENTS, ALLOC, LOAD, DATA
14 .got.plt 00000020 0000000000003fe8 0000000000003fe8 00002fe8 2**3
CONTENTS, ALLOC, LOAD, DATA
15 .comment 00000012 0000000000000000 0000000000000000 00003008 2**0
CONTENTS, READONLY
SYMBOL TABLE:
no symbols
Since we have a dynamic section, our binary needs to store the stubs needed to load the dynamic section. At this point, the binary is still dynamically linked. It contains an interpreter path, dynamic symbol tables, relocation metadata, PLT/GOT machinery, and references to shared libraries. That is a lot of infrastructure for a program whose only ambition is to immediately exit. So let’s remove three big pieces:
- -nostdlib: do not link the standard libraries.
- -static: avoid dynamic linking machinery.
- -no-pie: produce a fixed-address executable instead of a position-independent one.
// compiled with gcc -static -nostdlib -no-pie empty.c
extern "C" __attribute__((noreturn)) void _start() {
__asm__ volatile(
"mov $60, %%al\n"
"xor %%dil, %%dil\n"
"syscall\n" ::
: "rax", "rdi");
__builtin_unreachable();
}
We are now at 8704 bytes.
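One detail of the inline assembly is worth modelling: mov into %al and xor on %dil change only the low 8 bits of rax and rdi. The kernel therefore sees syscall number 60 (exit on x86-64 Linux) only if the remaining bits of rax are already zero when _start is entered, which the code above takes for granted. The following is a minimal sketch of those partial writes; write_al and clear_dil are hypothetical helper names made up for illustration.

```cpp
#include <cstdint>

// Model of `mov $imm8, %al`: replaces bits 0-7 of rax, keeps bits 8-63.
// (write_al is a hypothetical helper, not part of any library.)
constexpr std::uint64_t write_al(std::uint64_t rax, std::uint8_t imm8) {
    return (rax & ~std::uint64_t{0xff}) | imm8;
}

// Model of `xor %dil, %dil`: clears bits 0-7 of rdi, keeps bits 8-63.
// (clear_dil is likewise hypothetical.)
constexpr std::uint64_t clear_dil(std::uint64_t rdi) {
    return rdi & ~std::uint64_t{0xff};
}

// Starting from a zeroed register, the result is exactly syscall number 60.
static_assert(write_al(0, 60) == 60, "clean rax yields syscall 60");

// Stale upper bits survive the byte-sized move, so the number is no longer 60.
static_assert(write_al(0x100, 60) == 0x13c, "upper bits are preserved");
```

If that assumption were ever a concern, writing the full registers (mov $60, %eax and xor %edi, %edi) should remove the dependency, at the price of a slightly longer encoding for the mov.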
Looking at objdump again (this time using objdump -D a.out to see all sections):
❯ objdump -D a.out
a.out: file format elf64-x86-64
Disassembly of section .note.gnu.property:
0000000000400190 <.note.gnu.property>:
400190: 04 00 add $0x0,%al
400192: 00 00 add %al,(%rax)
400194: 20 00 and %al,(%rax)
400196: 00 00 add %al,(%rax)
400198: 05 00 00 00 47 add $0x47000000,%eax
40019d: 4e 55 rex.WRX push %rbp
40019f: 00 01 add %al,(%rcx)
4001a1: 00 01 add %al,(%rcx)
4001a3: c0 04 00 00 rolb $0x0,(%rax,%rax,1)
4001a7: 00 01 add %al,(%rcx)
4001a9: 00 00 add %al,(%rax)
4001ab: 00 00 add %al,(%rax)
4001ad: 00 00 add %al,(%rax)
4001af: 00 02 add %al,(%rdx)
4001b1: 00 01 add %al,(%rcx)
4001b3: c0 04 00 00 rolb $0x0,(%rax,%rax,1)
4001b7: 00 01 add %al,(%rcx)
4001b9: 00 00 add %al,(%rax)
4001bb: 00 00 add %al,(%rax)
4001bd: 00 00 add %al,(%rax)
...
Disassembly of section .text:
0000000000401000 <.text>:
401000: 55 push %rbp
401001: 48 89 e5 mov %rsp,%rbp
401004: b0 3c mov $0x3c,%al
401006: 40 b7 01 mov $0x1,%dil
401009: 0f 05 syscall
Disassembly of section .eh_frame:
0000000000402000 <.eh_frame>:
402000: 14 00 adc $0x0,%al
402002: 00 00 add %al,(%rax)
402004: 00 00 add %al,(%rax)
402006: 00 00 add %al,(%rax)
402008: 01 7a 52 add %edi,0x52(%rdx)
40200b: 00 01 add %al,(%rcx)
40200d: 78 10 js 0x40201f
40200f: 01 1b add %ebx,(%rbx)
402011: 0c 07 or $0x7,%al
402013: 08 90 01 00 00 18 or %dl,0x18000001(%rax)
402019: 00 00 add %al,(%rax)
40201b: 00 1c 00 add %bl,(%rax,%rax,1)
40201e: 00 00 add %al,(%rax)
402020: e0 ef loopne 0x402011
402022: ff (bad)
402023: ff 0b decl (%rbx)
402025: 00 00 add %al,(%rax)
402027: 00 00 add %al,(%rax)
402029: 41 0e rex.B (bad)
40202b: 10 86 02 43 0d 06 adc %al,0x60d4302(%rsi)
402031: 00 00 add %al,(%rax)
Disassembly of section .comment:
0000000000000000 <.comment>:
0: 47 rex.RXB
1: 43 rex.XB
2: 43 3a 20 rex.XB cmp (%r8),%spl
5: 28 47 4e sub %al,0x4e(%rdi)
8: 55 push %rbp
9: 29 20 sub %esp,(%rax)
b: 31 35 2e 32 2e 30 xor %esi,0x302e322e(%rip) # 0x302e323f
...
Some of you might wonder why there is a .comment section. We are trying to rid our binary of as many things as possible, and we definitely don’t want comments.
It turns out that the .comment section records which compiler produced the binary, in our case GCC: (GNU) 15.2.0 (hex: 47 43 43 3a …). objdump -D disassembles every section as if it were code, hence those strange instructions. Adding -fno-ident to gcc removes the .comment section and brings us down to 8616B.
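To see that those "instructions" are really text, here is a small sketch that reads the bytes from the objdump listing as ASCII. The byte values are copied from the listing; the names kCommentBytes, comment_text and comment_equals are made up for this example.

```cpp
#include <cstddef>
#include <string>

// The bytes objdump printed for .comment, copied from the listing above.
constexpr char kCommentBytes[] = {
    0x47, 0x43, 0x43, 0x3a, 0x20, 0x28, 0x47, 0x4e, 0x55,
    0x29, 0x20, 0x31, 0x35, 0x2e, 0x32, 0x2e, 0x30};

// Read the bytes as ASCII text rather than as x86 opcodes.
inline std::string comment_text() {
    return std::string(kCommentBytes, sizeof(kCommentBytes));
}

// Compile-time comparison against an expected string (hypothetical helper).
// A shorter `expected` fails at its terminating '\0', so we never read past it.
constexpr bool comment_equals(const char* expected) {
    for (std::size_t i = 0; i < sizeof(kCommentBytes); ++i) {
        if (expected[i] != kCommentBytes[i]) return false;
    }
    return expected[sizeof(kCommentBytes)] == '\0';
}

static_assert(comment_equals("GCC: (GNU) 15.2.0"),
              "the bytes spell the compiler identification string");
```

The 17 characters plus a terminating NUL add up to 18 bytes, which matches the 0x12 size that objdump -x reported for .comment earlier.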
The astute amongst you would also spot the .eh_frame section. That pesky section is used for stack unwinding, and our program that does nothing doesn’t need it for error handling. By telling GCC -fno-exceptions -fno-asynchronous-unwind-tables, we get to 4400B - almost small enough to fit into the Apollo guidance computer’s RAM.
The last section that we will want to remove is the .note.gnu.property section.
❯ readelf -n a.out
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000020 NT_GNU_PROPERTY_TYPE_0
Properties: x86 feature used: x86
x86 ISA used: x86-64-baseline
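The first 16 bytes that objdump -D mis-disassembled in .note.gnu.property are the note header. As a rough sketch (byte values copied from the disassembly above; kNoteHeader and read_le32 are names invented here), they can be decoded as three little-endian 32-bit words followed by the owner name:

```cpp
#include <cstddef>
#include <cstdint>

// First 16 bytes of .note.gnu.property, copied from the objdump -D listing.
constexpr unsigned char kNoteHeader[] = {
    0x04, 0x00, 0x00, 0x00,   // namesz: length of the owner name
    0x20, 0x00, 0x00, 0x00,   // descsz: length of the descriptor
    0x05, 0x00, 0x00, 0x00,   // type
    0x47, 0x4e, 0x55, 0x00};  // owner name: "GNU\0"

// Read a little-endian 32-bit word starting at byte `offset` (hypothetical helper).
constexpr std::uint32_t read_le32(const unsigned char* bytes, std::size_t offset) {
    return static_cast<std::uint32_t>(bytes[offset]) |
           (static_cast<std::uint32_t>(bytes[offset + 1]) << 8) |
           (static_cast<std::uint32_t>(bytes[offset + 2]) << 16) |
           (static_cast<std::uint32_t>(bytes[offset + 3]) << 24);
}

constexpr std::uint32_t kNameSize = read_le32(kNoteHeader, 0);
constexpr std::uint32_t kDescSize = read_le32(kNoteHeader, 4);
constexpr std::uint32_t kNoteType = read_le32(kNoteHeader, 8);

static_assert(kNameSize == 4, "owner name is \"GNU\" plus NUL");
static_assert(kDescSize == 0x20, "matches the data size readelf -n reports");
static_assert(kNoteType == 5, "5 is NT_GNU_PROPERTY_TYPE_0");

// 12 bytes of header words + name + descriptor; no extra padding for these values.
static_assert(12 + kNameSize + kDescSize == 0x30, "whole note is 48 bytes");
```

The 0x30 total agrees with the size listed for .note.gnu.property in the earlier objdump -x output.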
GNU uses this section to leave notes for other tools to read. In this case, the assembler added that note, so we simply tell it not to by adding -Wa,-mx86-used-note=no. With that, we are now at 4320 bytes, and our objdump now looks like it stores nothing except the instructions:
❯ objdump -D a.out
a.out: file format elf64-x86-64
Disassembly of section .text:
0000000000401000 <.text>:
401000: 55 push %rbp
401001: 48 89 e5 mov %rsp,%rbp
401004: b0 3c mov $0x3c,%al
401006: 40 b7 01 mov $0x1,%dil
401009: 0f 05 syscall
…which is ridiculous. How can five lines of instructions result in a 4320B binary??? The current output of readelf looks as follows:
❯ readelf -a a.out
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x401000
Start of program headers: 64 (bytes into file)
Start of section headers: 4128 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 3
Size of section headers: 64 (bytes)
Number of section headers: 3
Section header string table index: 2
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 0000000000401000 00001000
000000000000000b 0000000000000000 AX 0 0 1
[ 2] .shstrtab STRTAB 0000000000000000 0000100b
0000000000000011 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x0000000000000000 0x0000000000400000 0x0000000000400000
0x00000000000000e8 0x00000000000000e8 R 0x1000
LOAD 0x0000000000001000 0x0000000000401000 0x0000000000401000
0x000000000000000b 0x000000000000000b R E 0x1000
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00
01 .text
02
There is no dynamic section in this file.
There are no relocations in this file.
No processor specific unwind information to decode
No version information found in this file.
The program header table tells the OS loader how to map the file into memory segments when starting a program. Here, we see a LOAD of 232B (0xe8), which corresponds to the 64B ELF header plus three 56B program headers. There are also the segment holding the instructions and the GNU_STACK entry. The LOADs, however, have an Align requirement of 0x1000. To fulfil this, the linker had to pad the file so that .text starts at offset 0x1000. We thus pass -Wl,--nmagic to GCC to tell the linker to turn off page alignment of sections, nicely making our readelf output look as follows:
❯ readelf -a a.out
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4000b0
Start of program headers: 64 (bytes into file)
Start of section headers: 208 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 2
Size of section headers: 64 (bytes)
Number of section headers: 3
Section header string table index: 2
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .text PROGBITS 00000000004000b0 000000b0
000000000000000b 0000000000000000 AX 0 0 1
[ 2] .shstrtab STRTAB 0000000000000000 000000bb
0000000000000011 0000000000000000 0 0 1
Key to Flags:
W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
L (link order), O (extra OS processing required), G (group), T (TLS),
C (compressed), x (unknown), o (OS specific), E (exclude),
D (mbind), l (large), p (processor specific)
There are no section groups in this file.
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
LOAD 0x00000000000000b0 0x00000000004000b0 0x00000000004000b0
0x000000000000000b 0x000000000000000b R E 0x1
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
Section to Segment mapping:
Segment Sections...
00 .text
01
There is no dynamic section in this file.
There are no relocations in this file.
No processor specific unwind information to decode
No version information found in this file.
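There is only one LOAD now: it starts at file offset 0xb0, right after the 64B ELF header and the two 56B program headers, and covers just the 11 bytes of .text. The separate page-aligned LOAD for the headers is gone. Finally, we have leapfrogged to 400 bytes! But can we do any better?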
I believe the answer is no. Our binary looks like this:
+----------------------------------+
| ELF header | 64 B
+----------------------------------+
| Program header: PT_LOAD | 56 B
+----------------------------------+
| Program header: PT_GNU_STACK | 56 B
+----------------------------------+
| .text section contents | 11 B
+----------------------------------+
| .shstrtab section contents | 17 B
| "\0.shstrtab\0.text\0" |
+----------------------------------+
| padding for section header | 4 B
+----------------------------------+
| Section header [0]: NULL | 64 B
+----------------------------------+
| Section header [1]: .text | 64 B
+----------------------------------+
| Section header [2]: .shstrtab | 64 B
+----------------------------------+
The ELF header is self-explanatory. PT_LOAD is needed to load the instructions, and PT_GNU_STACK will always be produced by GCC. .shstrtab can’t be removed by GCC either. The first section header entry is required by the System V ABI ELF specification to be reserved for the undefined section index, SHN_UNDEF, whose value is 0. In practice, this entry has type SHT_NULL, so tools display it as the NULL section. Tools like objcopy would let us cut out a few more things, but that is out of today’s scope.
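The layout above can be checked with a bit of arithmetic. This is a sketch under one assumption that is inferred rather than stated: the section header table is aligned to 8 bytes, which is what the 4 bytes of padding in the diagram suggest. The helper names (align_up, headers_size, file_size) are made up for the example, and the elf.h check is only compiled on Linux.

```cpp
#include <cstddef>

#if defined(__linux__)
#include <elf.h>  // Elf64_Ehdr, Elf64_Phdr, Elf64_Shdr
static_assert(sizeof(Elf64_Ehdr) == 64, "ELF64 file header is 64 bytes");
static_assert(sizeof(Elf64_Phdr) == 56, "ELF64 program header is 56 bytes");
static_assert(sizeof(Elf64_Shdr) == 64, "ELF64 section header is 64 bytes");
#endif

constexpr std::size_t kEhdr = 64, kPhdr = 56, kShdr = 64;

// Round `value` up to the next multiple of `alignment`.
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Size of the ELF header plus `program_headers` program header entries.
constexpr std::size_t headers_size(std::size_t program_headers) {
    return kEhdr + program_headers * kPhdr;
}

// Total file size: .text and .shstrtab back to back starting at `text_offset`,
// then the section header table on an (assumed) 8-byte boundary.
constexpr std::size_t file_size(std::size_t text_offset, std::size_t text_size,
                                std::size_t shstrtab_size, std::size_t sections) {
    return align_up(text_offset + text_size + shstrtab_size, 8) + sections * kShdr;
}

static_assert(headers_size(3) == 0xe8, "size of the first LOAD before --nmagic");
static_assert(headers_size(2) == 0xb0, "where .text starts after --nmagic");
static_assert(file_size(0x1000, 11, 17, 3) == 4320, "page-aligned .text");
static_assert(file_size(0xb0, 11, 17, 3) == 400, "packed .text");
```

Both results agree with the "Start of section headers" values in the two readelf dumps (4128 and 208) once the three 64-byte section headers are added.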
| Step | Flags / change | Size (bytes) |
|---|---|---|
| Normal main | gcc empty.c | 15,816 |
| Strip symbols | -s | 14,352 |
| Freestanding | -nostartfiles | 13,632 |
| No libc / static / no PIE | -nostdlib -static -no-pie | 8,704 |
| Remove .comment section | -fno-ident | 8,616 |
| Remove unwind info | -fno-asynchronous-unwind-tables -fno-exceptions | 4,400 |
| Remove GNU property note | -Wa,-mx86-used-note=no | 4,320 |
| Reduce alignment | -Wl,--nmagic / -Wl,-n | 400 |
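For a quick view of which steps mattered most, the per-step savings can be derived from the sizes in the table. A small sketch, with names (kSizes, saved_by, biggest_step) chosen just for this example:

```cpp
#include <cstddef>

// Sizes in bytes after each step, in the order of the table above.
constexpr std::size_t kSizes[] = {15816, 14352, 13632, 8704, 8616, 4400, 4320, 400};
constexpr std::size_t kSteps = sizeof(kSizes) / sizeof(kSizes[0]);

// Bytes saved by step `step` (1-based) relative to the step before it.
constexpr std::size_t saved_by(std::size_t step) {
    return kSizes[step - 1] - kSizes[step];
}

// Index of the step with the largest single saving.
constexpr std::size_t biggest_step() {
    std::size_t best = 1;
    for (std::size_t i = 2; i < kSteps; ++i) {
        if (saved_by(i) > saved_by(best)) best = i;
    }
    return best;
}

static_assert(saved_by(3) == 4928, "-nostdlib -static -no-pie");
static_assert(saved_by(5) == 4216, "removing unwind info");
static_assert(saved_by(7) == 3920, "--nmagic");
static_assert(biggest_step() == 3, "dropping libc and dynamic linking wins");
```

Those three steps together account for 4928 + 4216 + 3920 = 13064 of the 15416 bytes removed overall.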
So there we have it - the final code used:
// gcc empty.c -nostdlib -static -no-pie \
//   -fno-ident -fno-asynchronous-unwind-tables -fno-exceptions -fno-rtti \
//   -Wa,-mx86-used-note=no \
//   -Wl,--build-id=none,-n,--strip-all
extern "C" __attribute__((noreturn)) void _start() {
__asm__ volatile(
"mov $60, %%al\n"
"xor %%dil, %%dil\n"
"syscall\n" ::
: "rax", "rdi");
__builtin_unreachable();
}