Binary format parsing
- Main article: Shellcode
A runtime linker parses an executable image in either the PE (Portable Executable) or ELF (Executable and Linkable Format) format to locate function pointers. This is useful when writing code that must link against different versions of the same shared library. For example, 32-bit and 64-bit Linux assign different numbers to the same system calls, so code that resolves its functions at run time rather than hard-coding those numbers can run in spite of the difference. Each shared library format has its own export table listing the functions available to third-party applications, and resolving functions through that table is the best approach for version-independent code.
Self-linking (or runtime-linking) shellcode is machine code that uses functions already present in memory rather than carrying all of its functionality within itself. In general terms, a runtime linker consists of two parts. One part must locate the base address of any given library loaded into memory; the other must parse that library and return the memory address of the start of any given function.
This is called self-linking shellcode or self-linking machine code because it does not rely on being linked with any kernel; instead, it finds the functionality it needs within the runtime environment and calls functions that already exist in memory. This saves the programmer time and reduces code size, and may even allow the programmer to write a cross-OS machine code application that uses the operating system's built-in functionality by linking itself instead of relying on an external linker to both link and format the binary properly.
- Diagram of a 64-bit ELF Header:
0x0 - 0xf = "ELF Format Information"
Entry-point = 0x18 - 0x1f
Start of section headers = 0x28 - 0x2f
Size of each section header = 0x3a - 0x3b
Number of section headers = 0x3c - 0x3d
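The offsets in the diagram above can be read directly out of an image. The following is a minimal Python sketch, assuming a 64-bit little-endian image held in a bytes object; the function name `parse_elf64_header` and the synthetic header values are hypothetical and chosen only for illustration.

```python
import struct

def parse_elf64_header(image: bytes) -> dict:
    """Read the ELF64 header fields listed in the diagram (hypothetical helper)."""
    if image[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    # Byte 4 is the class (2 = 64-bit), byte 5 the data encoding (1 = little-endian).
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch handles only 64-bit little-endian images")
    entry, = struct.unpack_from("<Q", image, 0x18)               # entry point
    shoff, = struct.unpack_from("<Q", image, 0x28)               # start of section headers
    shentsize, shnum = struct.unpack_from("<HH", image, 0x3a)    # header size, header count
    return {"entry": entry, "shoff": shoff, "shentsize": shentsize, "shnum": shnum}

# Build a synthetic 64-byte header so the example runs without a real binary.
header = bytearray(0x40)
header[0:6] = b"\x7fELF\x02\x01"
struct.pack_into("<Q", header, 0x18, 0x400078)   # made-up entry point
struct.pack_into("<Q", header, 0x28, 0x200)      # made-up section header offset
struct.pack_into("<HH", header, 0x3a, 0x40, 5)   # 0x40-byte headers, 5 of them

fields = parse_elf64_header(bytes(header))
print({key: hex(value) for key, value in fields.items()})
```

Reading the fields by explicit offset, rather than unpacking the whole header at once, keeps the sketch aligned with the diagram.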
- Diagram of a 64-bit section header: (length defined in ELF header)
[0x0-0x3] Offset of the section name within shstrtab.
shstrtab is typically located between the end of
.text and the beginning of the section
headers
[0x4-0x7] section type - 0 is null, 1 is progbits, 2 is symtab, 3 is strtab
[0x8-0xf] section flags
[0x10-0x17] section address
[0x18-0x1f] section offset
[0x20-0x27] section size
[0x28-0x2b] Section Link
[0x2c-0x2f] Section Info
[0x30-0x37] Section Align
[0x38-0x3f] Section EntSize
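A section header with this layout can be unpacked in one step. The sketch below is illustrative only: `SectionHeader`, `parse_section_header` and the values in the synthetic table are hypothetical names and numbers, and little-endian byte order is assumed.

```python
import struct
from collections import namedtuple

# name, type, flags, addr, offset, size, link, info, addralign, entsize
SH_FORMAT = "<IIQQQQIIQQ"
SH_SIZE = struct.calcsize(SH_FORMAT)  # 0x40 bytes, matching the diagram

SectionHeader = namedtuple(
    "SectionHeader",
    "name type flags addr offset size link info addralign entsize",
)

SECTION_TYPES = {0: "null", 1: "progbits", 2: "symtab", 3: "strtab"}

def parse_section_header(image: bytes, shoff: int, index: int) -> SectionHeader:
    """Unpack the section header at the given index (hypothetical helper)."""
    return SectionHeader._make(
        struct.unpack_from(SH_FORMAT, image, shoff + index * SH_SIZE)
    )

# Synthetic table: a null header followed by a made-up symbol table header.
null_header = bytes(SH_SIZE)
symtab_header = struct.pack(SH_FORMAT, 1, 2, 0, 0, 0x100, 0x48, 2, 1, 8, 0x18)
table = null_header + symtab_header

sh = parse_section_header(table, 0, 1)
print(SECTION_TYPES.get(sh.type, "other"), hex(sh.offset), hex(sh.size))
```

For a symbol table header, the Section Link field (`sh.link` here) is the index of the section header describing the associated string table.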
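- Diagram of a 64-bit symbol table entry: (0x18 bytes in length)
[0x0-0x3] Name offset into the associated string table
[0x4] Info (binding in the high 4 bits, type in the low 4 bits)
[0x5] Other (visibility)
[0x6-0x7] Ndx (index of the section the symbol is defined in)
[0x8-0xf] Symbol value (function pointer, data pointer, etc.)
[0x10-0x17] Symbol size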
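A symbol table entry can be decoded the same way. This sketch assumes the standard `Elf64_Sym` layout: a one-byte info field at 0x4 holding binding and type, a one-byte visibility field at 0x5, and the symbol size at 0x10. The function name `parse_symbol` and the sample values are hypothetical.

```python
import struct

SYM_FORMAT = "<IBBHQQ"   # name, info, other, shndx, value, size
SYM_SIZE = struct.calcsize(SYM_FORMAT)  # 0x18 bytes

def parse_symbol(symtab: bytes, index: int) -> dict:
    """Decode one 0x18-byte symbol table entry (hypothetical helper)."""
    name, info, other, shndx, value, size = struct.unpack_from(
        SYM_FORMAT, symtab, index * SYM_SIZE
    )
    return {
        "name": name,        # offset into the string table
        "bind": info >> 4,   # 0 = local, 1 = global, 2 = weak
        "type": info & 0xF,  # 1 = object, 2 = function, ...
        "other": other,
        "shndx": shndx,
        "value": value,
        "size": size,
    }

# Synthetic table: the mandatory null entry, then a made-up global function.
entry = struct.pack(SYM_FORMAT, 1, (1 << 4) | 2, 0, 1, 0x400078, 0x10)
table = bytes(SYM_SIZE) + entry

sym = parse_symbol(table, 1)
print(sym["bind"], sym["type"], hex(sym["value"]))
```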
Example: Printing symbol names
It is relatively trivial to find an image base at runtime with a small amount of assembly, but actually parsing the ELF image is more difficult. The following assembly code (not shellcode) dumps its own symbols. It is unstable, since it performs no error or size checking:
- Set up a pointer to newline characters for later use
startup:
xor %r15, %r15
push $0x0a0a0a
mov %rsp, %r15
- Get the location of currently executing code so we can calculate the base pointer
call getpc # getpc returns the address of the dec %rax instruction on the next line in %rax.
dec %rax
xor %rcx, %rcx
push $0x2
pop %rsi
- Build a loop to determine the base pointer of our file. Every ELF file starts with the byte 0x7f followed by "ELF", so:
find_header:
cmpl $0x464c457f, (%rax,%rcx,4) # Did we find our ELF base pointer?
je find_sections
dec %rax
jmp find_header
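The backward scan performed by find_header can be modelled in a few lines. In this sketch a bytes object stands in for process memory, and `find_image_base` is a hypothetical name. Unlike the assembly, the sketch stops at the start of the buffer instead of reading unmapped memory.

```python
ELF_MAGIC = b"\x7fELF"

def find_image_base(memory: bytes, address: int) -> int:
    """Walk backward one byte at a time until the ELF magic is found."""
    while address >= 0:
        if memory[address:address + 4] == ELF_MAGIC:
            return address
        address -= 1
    raise ValueError("no ELF magic found at or below the starting address")

# Simulated memory: 16 bytes of padding, the magic, then filler standing in for code.
memory = bytes(16) + ELF_MAGIC + b"\x90" * 64

base = find_image_base(memory, 50)  # start somewhere inside the "code"
print(hex(base))
```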
- Extract the section header offset from the ELF header
find_sections:
# %rax now = base pointer of ELF image.
xor %rbx, %rbx
add $0x28, %bl
xorl (%rax,%rbx,1), %ecx # %rcx was zeroed above, so this loads the low 32 bits of the section header offset
addq %rax, %rcx # %rcx = absolute address to section headers
- Iterate through the section headers, looking for a symbol table section header
# each section header is 0x40 bytes in length.
next_section:
xor %rbx, %rbx
xor %rbp, %rbp
add $0x40, %rcx
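# %rcx now = address of the next section header (the null header at index 0 is skipped)
add $0x04, %bl
xor (%rcx,%rbx,1), %ebp # %rbp now contains type
cmp $0x02, %ebp # compare the whole 32-bit type field against symtab (2)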
jne next_section
- Assume the next header is the string table section header (the Section Link field of the symbol table header holds its actual index)
found_symbols:
xor %r8, %r8
mov %rcx, %r8 # %rcx = pointer to top of symbol section header
add $0x40, %r8 # %r8 = pointer to top of string table section header
- Get the addresses of the actual symbol table and string table
xor %rbx, %rbx
xor $0x18, %bl # the section's file offset is stored 0x18 bytes into its header
xor %r9, %r9
xor %r10, %r10
xor (%rcx,%rbx,1), %r9
xor (%r8,%rbx,1), %r10
addq %rax, %r9 # r9 should now point to the first symbol
addq %rax, %r10 # r10 should now point to the first string
addq $0x18, %r9 # step past the null symbol at index 0; next_symbol advances once more before reading
- Iterate through the symbol table, extracting string pointers:
next_symbol:
addq $0x18,%r9
xor %rcx, %rcx
xor %rbp, %rbp
xor %rdi, %rdi
xor (%r9,%rcx,1), %ebp # %rbp now contains string offset.
cmp %rbp, %rdi
je next_symbol
- Call strlen() on the string pointers for write()
print_symbol_name:
mov %rbp, %rsi
addq %r10, %rsi # %rsi should now be a pointer to a string
push $0x01
pop %rax
push %rax
pop %rdi
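call strlen # %rdx = length of the symbol name
syscall # write(1, name, length)
jmp write_to_terminal # do not fall through into the strlen body below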
strlen:
xor %rdx, %rdx
next_byte:
inc %rdx
cmpb $0x00, (%rsi,%rdx,1)
jne next_byte
ret
- Write the string to terminal:
write_to_terminal:
push $0x01
pop %rax
push %rax
pop %rdi
push $0x02
pop %rdx
push %r15
pop %rsi
syscall
jmp next_symbol
[user@host ~]$ ./test_parser
startup
getpc
find_header
find_sections
next_section
found_symbols
next_symbol
print_symbol_name
strlen
next_byte
_start
__bss_start
_edata
_end
Segmentation fault
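The segmentation fault at the end of the run is consistent with the listing never checking where the symbol table ends. The Python sketch below shows one way the same walk could be bounded. It uses the section size to stop the loop and the Section Link field to find the string table, rather than assuming the string table header comes next. The function name `symbol_names` and the synthetic image are hypothetical, a 64-bit little-endian image is assumed, and section offsets are taken relative to the image base as in the assembly.

```python
import struct

SHT_SYMTAB = 2

def symbol_names(image: bytes) -> list:
    """List the non-empty symbol names in an ELF64 image (hypothetical helper)."""
    shoff, = struct.unpack_from("<Q", image, 0x28)
    shentsize, shnum = struct.unpack_from("<HH", image, 0x3a)
    names = []
    for i in range(shnum):
        sh = shoff + i * shentsize
        sh_type, = struct.unpack_from("<I", image, sh + 0x04)
        if sh_type != SHT_SYMTAB:
            continue
        sym_off, sym_size = struct.unpack_from("<QQ", image, sh + 0x18)
        link, = struct.unpack_from("<I", image, sh + 0x28)
        entsize, = struct.unpack_from("<Q", image, sh + 0x38)
        # The linked section header gives the string table's offset.
        str_off, = struct.unpack_from("<Q", image, shoff + link * shentsize + 0x18)
        # Bounded walk: stop at sym_off + sym_size instead of running off the end.
        for sym in range(sym_off, sym_off + sym_size, entsize or 0x18):
            name_off, = struct.unpack_from("<I", image, sym)
            if name_off == 0:
                continue  # unnamed entry, e.g. the null symbol
            start = str_off + name_off
            end = image.index(b"\x00", start)
            names.append(image[start:end].decode("ascii"))
    return names

# Synthetic image: header, string table, symbol table, three section headers.
def make_sym(name_off):
    return struct.pack("<IBBHQQ", name_off, 0, 0, 0, 0, 0)

def make_shdr(sh_type, offset, size, link, entsize):
    return struct.pack("<IIQQQQIIQQ", 0, sh_type, 0, 0, offset, size, link, 0, 0, entsize)

strtab = b"\x00startup\x00getpc\x00"
symtab = make_sym(0) + make_sym(1) + make_sym(9)
str_off = 0x40
sym_off = str_off + len(strtab)
shoff = sym_off + len(symtab)

header = bytearray(0x40)
header[0:6] = b"\x7fELF\x02\x01"
struct.pack_into("<Q", header, 0x28, shoff)
struct.pack_into("<HH", header, 0x3a, 0x40, 3)

image = (bytes(header) + strtab + symtab
         + make_shdr(0, 0, 0, 0, 0)
         + make_shdr(2, sym_off, len(symtab), 2, 0x18)
         + make_shdr(3, str_off, len(strtab), 0, 0))

print(symbol_names(image))  # ['startup', 'getpc']
```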