Shellcode/Parsing
GertieUbpgdd
Revision as of 09:00, 30 November 2012
Binary format parsing
- Main article: Shellcode
A runtime linker parses either the PE (Portable Executable) or the ELF (Executable and Linkable Format) executable format to identify function pointers. This is useful when writing code that must link against different versions of the same shared library. For example, 32-bit and 64-bit Linux use different system call numbers (write is 4 on i386 but 1 on x86-64), so code that hardcodes syscall numbers breaks across the two, while code that resolves library functions at runtime is not tied to those numbers. Each format has an export table listing the functions a shared library makes available to other programs, and resolving functions through that table is the most reliable way to write version-independent code.
Self-linking (or runtime-linking) shellcode is machine code that uses functions already present in memory instead of carrying all of its functionality within itself. Broadly, a runtime linker consists of two parts: one locates the base address of a given library loaded into memory, and the other parses that library and returns the address of the start of a given function.
It is called self-linking shellcode or self-linking machine code because it does not rely on an external linker. Instead, it finds the functionality it needs in the run-time environment and calls functions that already exist in memory. This can save development time and code size. It could even let a programmer write cross-OS machine code that uses the operating system's built-in functionality, because the code links itself instead of relying on an external linker to link and format the binary.
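The C sketch below illustrates the first part of a runtime linker, which locates the base address of each loaded image. It is not how shellcode does it. It asks glibc's dl_iterate_phdr for the list of loaded objects instead of scanning raw memory. It assumes Linux with glibc, and the function names locate_image and list_image_bases are illustrative, not taken from this article. For each object, it takes the PT_LOAD segment that starts at file offset 0, because that segment holds the ELF header, and checks the magic bytes there.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <elf.h>
#include <link.h>
#include <stdio.h>
#include <string.h>

/* Callback for dl_iterate_phdr: locate and verify one loaded object's ELF header. */
static int locate_image(struct dl_phdr_info *info, size_t size, void *data)
{
    int *verified = data;
    (void)size;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        /* The loadable segment mapped from file offset 0 begins with the ELF header. */
        if (ph->p_type == PT_LOAD && ph->p_offset == 0) {
            const unsigned char *base =
                (const unsigned char *)(info->dlpi_addr + ph->p_vaddr);
            if (memcmp(base, ELFMAG, SELFMAG) == 0) {
                const char *name = (info->dlpi_name && info->dlpi_name[0])
                                       ? info->dlpi_name : "(main program)";
                printf("%p  %s\n", (const void *)base, name);
                (*verified)++;
            }
            break;
        }
    }
    return 0; /* 0 tells dl_iterate_phdr to continue with the next object */
}

/* Prints the base address of every loaded image; returns how many were verified. */
int list_image_bases(void)
{
    int verified = 0;
    dl_iterate_phdr(locate_image, &verified);
    return verified;
}
```

Calling list_image_bases() prints one line per loaded image, for example the main program and the C library. The second part of a runtime linker, parsing an image for a named function, is what the rest of this page walks through.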
Header Diagrams
- Diagram of a 64-bit ELF Header:
[0x00-0x0f] ELF format information (e_ident: magic number, class, byte order, version, OS ABI)
[0x18-0x1f] Entry point
[0x28-0x2f] Start of section headers (file offset)
[0x3a-0x3b] Size of each section header
[0x3c-0x3d] Number of section headers
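The offsets in this diagram can be checked against the Elf64_Ehdr structure in <elf.h>. The sketch below assumes a Linux system that provides <elf.h> and a little-endian ELF64 file. The names parse_elf_header, read_le and struct elf_summary are hypothetical. The code reads the four fields listed above directly from raw bytes at those offsets, and static assertions confirm that the offsets match the header definition.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The diagram's offsets agree with <elf.h>'s Elf64_Ehdr layout. */
_Static_assert(offsetof(Elf64_Ehdr, e_entry) == 0x18, "entry point");
_Static_assert(offsetof(Elf64_Ehdr, e_shoff) == 0x28, "section header offset");
_Static_assert(offsetof(Elf64_Ehdr, e_shentsize) == 0x3a, "section header size");
_Static_assert(offsetof(Elf64_Ehdr, e_shnum) == 0x3c, "section header count");

/* Reads an nbytes-wide little-endian field at offset off. */
static uint64_t read_le(const unsigned char *p, size_t off, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[off + (size_t)i];
    return v;
}

struct elf_summary {
    uint64_t entry;     /* 0x18-0x1f */
    uint64_t shoff;     /* 0x28-0x2f: file offset of the section headers */
    uint16_t shentsize; /* 0x3a-0x3b */
    uint16_t shnum;     /* 0x3c-0x3d */
};

/* Returns 0 on success, -1 if img does not start with a 64-bit ELF header. */
int parse_elf_header(const unsigned char *img, struct elf_summary *out)
{
    if (memcmp(img, ELFMAG, SELFMAG) != 0 || img[EI_CLASS] != ELFCLASS64)
        return -1;
    out->entry = read_le(img, 0x18, 8);
    out->shoff = read_le(img, 0x28, 8);
    out->shentsize = (uint16_t)read_le(img, 0x3a, 2);
    out->shnum = (uint16_t)read_le(img, 0x3c, 2);
    return 0;
}
```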
- Diagram of a 64-bit section header: (length defined in ELF header)
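[0x00-0x03] Offset of the section's name in the section-header string table (.shstrtab). In small binaries, .shstrtab typically lies between the end of .text and the beginning of the section headers. Its index is stored in the ELF header at 0x3e-0x3f (e_shstrndx).
[0x04-0x07] Section type: 0 is null, 1 is progbits, 2 is symtab, 3 is strtab
[0x08-0x0f] Section flags
[0x10-0x17] Section address
[0x18-0x1f] Section offset (in the file)
[0x20-0x27] Section size
[0x28-0x2b] Section link (for a symbol table: index of its string table)
[0x2c-0x2f] Section info
[0x30-0x37] Section alignment
[0x38-0x3f] Section entry size (for tables of fixed-size entries)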
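A similar C sketch decodes one section header using the offsets above. The names read_section and struct section_info are hypothetical. The image is assumed to be a little-endian ELF64 file that is already in memory.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(Elf64_Shdr) == 0x40, "64-bit section headers are 0x40 bytes");
_Static_assert(offsetof(Elf64_Shdr, sh_link) == 0x28, "section link");

/* Reads an nbytes-wide little-endian field at offset off. */
static uint64_t read_le(const unsigned char *p, size_t off, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[off + (size_t)i];
    return v;
}

struct section_info {
    uint32_t name;    /* 0x00-0x03: offset of the name in .shstrtab */
    uint32_t type;    /* 0x04-0x07: e.g. SHT_SYMTAB (2), SHT_STRTAB (3) */
    uint64_t offset;  /* 0x18-0x1f: file offset of the contents */
    uint64_t size;    /* 0x20-0x27 */
    uint32_t link;    /* 0x28-0x2b: for SHT_SYMTAB, index of its string table */
    uint64_t entsize; /* 0x38-0x3f: entry size, e.g. 0x18 for symbols */
};

/* Decodes section header number idx, given e_shoff and e_shentsize from the ELF header. */
struct section_info read_section(const unsigned char *img, uint64_t shoff,
                                 uint16_t shentsize, uint16_t idx)
{
    const unsigned char *sh = img + shoff + (uint64_t)idx * shentsize;
    struct section_info s;
    s.name = (uint32_t)read_le(sh, 0x00, 4);
    s.type = (uint32_t)read_le(sh, 0x04, 4);
    s.offset = read_le(sh, 0x18, 8);
    s.size = read_le(sh, 0x20, 8);
    s.link = (uint32_t)read_le(sh, 0x28, 4);
    s.entsize = read_le(sh, 0x38, 8);
    return s;
}
```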
- Diagram of a 64-bit symbol table entry: (0x18 bytes in length)
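[0x00-0x03] Name: offset into the associated string table (the section named by the symbol table's section link)
[0x04] Info: binding (high 4 bits) and type (low 4 bits)
[0x05] Other: visibility
[0x06-0x07] Ndx: index of the section the symbol is defined in
[0x08-0x0f] Value: symbol pointer (function pointer, data pointer, etc.)
[0x10-0x17] Size of the object the symbol refers to (0 when unknown or unset, as for plain assembly labels)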
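The symbol entry layout can be decoded the same way. This is a C sketch with the illustrative names read_symbol and struct symbol_info, again assuming a little-endian ELF64 image in memory. It splits the info byte with the ELF64_ST_BIND and ELF64_ST_TYPE macros from <elf.h>. It also resolves the name offset against the associated string table, with a bounds check.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(Elf64_Sym) == 0x18, "64-bit symbol entries are 0x18 bytes");

/* Reads an nbytes-wide little-endian field at offset off. */
static uint64_t read_le(const unsigned char *p, size_t off, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[off + (size_t)i];
    return v;
}

struct symbol_info {
    const char *name;    /* 0x00-0x03, resolved through the string table */
    unsigned bind;       /* high 4 bits of byte 0x04, e.g. STB_GLOBAL */
    unsigned type;       /* low 4 bits of byte 0x04, e.g. STT_FUNC */
    unsigned visibility; /* low 2 bits of byte 0x05 */
    uint16_t shndx;      /* 0x06-0x07 */
    uint64_t value;      /* 0x08-0x0f */
    uint64_t size;       /* 0x10-0x17 */
};

/* Decodes entry idx; returns -1 if idx or the name offset is out of range. */
int read_symbol(const unsigned char *symtab, size_t symtab_size, size_t idx,
                const char *strtab, size_t strtab_size, struct symbol_info *out)
{
    if (idx >= symtab_size / 0x18)
        return -1;
    const unsigned char *e = symtab + idx * 0x18;
    uint32_t name = (uint32_t)read_le(e, 0x00, 4);
    if (name >= strtab_size)
        return -1;
    out->name = strtab + name;
    out->bind = ELF64_ST_BIND(e[0x04]);
    out->type = ELF64_ST_TYPE(e[0x04]);
    out->visibility = ELF64_ST_VISIBILITY(e[0x05]);
    out->shndx = (uint16_t)read_le(e, 0x06, 2);
    out->value = read_le(e, 0x08, 8);
    out->size = read_le(e, 0x10, 8);
    return 0;
}
```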
Example: Printing symbol names
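Finding the image base at runtime takes only a few assembly instructions. Actually parsing the ELF image is more difficult. The following unstable (no error or size checking) x86-64 assembly program (not shellcode) dumps its own symbol names. It works only because, in a binary this small, the section headers and the symbol and string tables happen to lie within mapped pages. They are not part of any loadable segment and are normally not mapped into memory.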
- Store a pointer to a string of newline characters for later use
startup:
    xor  %r15, %r15
    push $0x0a0a0a                    # three newline bytes on the stack
    mov  %rsp, %r15                   # %r15 = pointer to the newlines
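%r15 ends up pointing at newlines because of byte order. push $0x0a0a0a stores an 8-byte value in little-endian order, so memory holds 0a 0a 0a 00 00 00 00 00, which is the C string "\n\n\n". A tiny C sketch reproduces the same layout, assuming a little-endian host (newline_slot is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mimics `push $0x0a0a0a`: copies the 8-byte value into memory in host
   (assumed little-endian) byte order, yielding "\n\n\n" followed by zeros. */
void newline_slot(unsigned char slot[8])
{
    uint64_t value = 0x0a0a0a;
    memcpy(slot, &value, sizeof value);
}
```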
- Get the location of currently executing code so we can calculate the base pointer
    call getpc                        # this getpc returns the address of dec %rax on the next line into %rax.
    dec  %rax
    xor  %rcx, %rcx
    push $0x2
    pop  %rsi
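The body of getpc is not shown on this page. In C, the closest equivalent is probably the GCC/Clang builtin __builtin_return_address(0). Like a call-based getpc, it yields the address the call returns to, which is the instruction right after the call. The name getpc is reused below for illustration only, and noinline keeps it a real call.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the address of the instruction following the call to getpc(). */
__attribute__((noinline)) void *getpc(void)
{
    return __builtin_return_address(0);
}
```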
- Scan backward one byte at a time to find the base address of the image. Every ELF file begins with the magic bytes 0x7f 'E' 'L' 'F', which read as the little-endian 32-bit value 0x464c457f, so:
find_header:
    cmpl $0x464c457f, (%rax,%rcx,4)   # Did we find our ELF base pointer?
    je   find_sections
    dec  %rax
    jmp  find_header
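The same backward scan can be written in C with an explicit lower bound, which the assembly lacks. This hypothetical helper (find_elf_base) works on a buffer rather than on live memory. It assembles each 4-byte window as a little-endian value and compares it with 0x464c457f.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ELF_MAGIC_LE 0x464c457fu /* bytes 7f 'E' 'L' 'F' read as a little-endian word */

/* Scans backward from offset start toward offset 0 and returns a pointer to the
   first ELF magic found, or NULL if the start of the buffer is reached first. */
const unsigned char *find_elf_base(const unsigned char *buf, size_t len, size_t start)
{
    if (len < 4)
        return NULL;
    if (start > len - 4)
        start = len - 4;                    /* keep the 4-byte window inside buf */
    for (size_t i = start + 1; i-- > 0;) {  /* i = start, start - 1, ..., 0 */
        const unsigned char *p = buf + i;
        uint32_t word = (uint32_t)p[0] | (uint32_t)p[1] << 8 |
                        (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
        if (word == ELF_MAGIC_LE)
            return p;
    }
    return NULL;
}
```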
- Extract the section header offset from the ELF header
find_sections:                        # %rax now = base pointer of ELF image.
    xor  %rbx, %rbx
    add  $0x28, %bl
    xorl (%rax,%rbx,1), %ecx          # %rcx = offset to section headers (low 32 bits of e_shoff)
    addq %rax, %rcx                   # %rcx = absolute address of section headers
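e_shoff is an 8-byte field, but xorl loads only its low 32 bits into %ecx. That is enough for small files. A C version that reads all 8 bytes might look like this (section_headers is a hypothetical name, and a little-endian ELF64 image is assumed):

```c
#include <assert.h>
#include <stdint.h>

/* Returns the absolute address of the section header table: base + e_shoff. */
const unsigned char *section_headers(const unsigned char *base)
{
    uint64_t shoff = 0;
    for (int i = 7; i >= 0; i--)          /* bytes 0x28-0x2f, little-endian */
        shoff = (shoff << 8) | base[0x28 + i];
    return base + shoff;
}
```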
- Iterate through the section headers, looking for a symbol table section header
# each section header is 0x40 bytes in length.
next_section:
    xor  %rbx, %rbx
    xor  %rbp, %rbp
    add  $0x40, %rcx                  # %rcx now = address of the next section header (the first pass skips the null entry)
    add  $0x04, %bl
    xor  (%rcx,%rbx,1), %ebp          # %rbp now contains type
    cmp  $0x02, %bpl                  # 2 = symbol table
    jne  next_section
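The loop above steps by a hard-coded 0x40 and has no upper bound. The C sketch below performs the same search (find_section_by_type is a hypothetical name, and a little-endian ELF64 image is assumed). It takes the entry size and count from the ELF header at 0x3a and 0x3c, and it returns -1 when no section of the requested type exists.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Reads an nbytes-wide little-endian field at offset off. */
static uint64_t read_le(const unsigned char *p, size_t off, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[off + (size_t)i];
    return v;
}

/* Returns the index of the first section with sh_type == wanted, or -1. */
long find_section_by_type(const unsigned char *base, uint32_t wanted)
{
    uint64_t shoff = read_le(base, 0x28, 8);
    uint16_t shentsize = (uint16_t)read_le(base, 0x3a, 2);
    uint16_t shnum = (uint16_t)read_le(base, 0x3c, 2);
    for (uint16_t i = 1; i < shnum; i++) { /* entry 0 is always the null section */
        const unsigned char *sh = base + shoff + (uint64_t)i * shentsize;
        if ((uint32_t)read_le(sh, 0x04, 4) == wanted)
            return i;
    }
    return -1;
}
```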
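- In this binary, the section header that follows the symbol table's is the string table's. A more robust parser would follow the symbol table's section link field (0x28-0x2b) instead.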
found_symbols:
    xor  %r8, %r8
    mov  %rcx, %r8                    # %rcx = pointer to top of symbol section header
    add  $0x40, %r8                   # %r8 = pointer to top of string table section header
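The fixed +0x40 works only when the string table header immediately follows the symbol table header. For a symbol table section, the section link field (0x28-0x2b) holds the index of its string table, so a parser can follow that link instead. The C sketch below does this (linked_strtab_header is an illustrative name, and a little-endian ELF64 image is assumed).

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

/* Reads an nbytes-wide little-endian field at offset off. */
static uint64_t read_le(const unsigned char *p, size_t off, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[off + (size_t)i];
    return v;
}

/* Returns the header of the string table linked from a symbol table header,
   or NULL if the link is empty or out of range. */
const unsigned char *linked_strtab_header(const unsigned char *base,
                                          const unsigned char *symtab_hdr)
{
    uint64_t shoff = read_le(base, 0x28, 8);
    uint16_t shentsize = (uint16_t)read_le(base, 0x3a, 2);
    uint16_t shnum = (uint16_t)read_le(base, 0x3c, 2);
    uint32_t link = (uint32_t)read_le(symtab_hdr, 0x28, 4);
    if (link == 0 || link >= shnum)
        return NULL;
    return base + shoff + (uint64_t)link * shentsize;
}
```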
- Get the addresses to the actual symbol table and string table
    xor  %rbx, %rbx
    xor  $0x18, %bl                   # pointer to actual section is $0x18 bytes from header base
    xor  %r9, %r9
    xor  %r10, %r10
    xor  (%rcx,%rbx,1), %r9
    xor  (%r8,%rbx,1), %r10
    addq %rax, %r9                    # r9 should now point to the first symbol
    addq %rax, %r10                   # r10 should now point to the first string
    addq $0x18, %r9
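The same lookup can be written as a C sketch (section_contents and struct table_view are hypothetical names, and a little-endian ELF64 image is assumed). A section's contents start at the base plus its section offset (0x18). The entry count can be derived from the section size (0x20) divided by the entry size (0x38), which gives a symbol loop a natural end.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>

_Static_assert(offsetof(Elf64_Shdr, sh_offset) == 0x18, "section offset");
_Static_assert(offsetof(Elf64_Shdr, sh_size) == 0x20, "section size");
_Static_assert(offsetof(Elf64_Shdr, sh_entsize) == 0x38, "entry size");

/* Reads an nbytes-wide little-endian field at offset off. */
static uint64_t read_le(const unsigned char *p, size_t off, int nbytes)
{
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--)
        v = (v << 8) | p[off + (size_t)i];
    return v;
}

struct table_view {
    const unsigned char *data; /* base + sh_offset */
    uint64_t size;             /* sh_size in bytes */
    uint64_t count;            /* sh_size / sh_entsize, or 0 if entsize is 0 */
};

struct table_view section_contents(const unsigned char *base, const unsigned char *hdr)
{
    struct table_view t;
    uint64_t entsize = read_le(hdr, 0x38, 8);
    t.data = base + read_le(hdr, 0x18, 8);
    t.size = read_le(hdr, 0x20, 8);
    t.count = entsize ? t.size / entsize : 0; /* string tables usually have entsize 0 */
    return t;
}
```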
- Iterate through the symbol table, extracting string pointers:
next_symbol:
    addq $0x18, %r9
    xor  %rcx, %rcx
    xor  %rbp, %rbp
    xor  %rdi, %rdi
    xor  (%r9,%rcx,1), %ebp           # %rbp now contains string offset.
    cmp  %rbp, %rdi
    je   next_symbol
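Here is a bounded C version of next_symbol, sketched under the same little-endian assumption. print_symbol_names is a hypothetical name, and the caller passes the symbol table together with its entry count. The function skips index 0 and nameless entries, checks each name offset against the string table size, and stops after the last entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints each named symbol; returns how many names were printed. */
size_t print_symbol_names(const unsigned char *symtab, uint64_t count,
                          const char *strtab, uint64_t strtab_size)
{
    size_t printed = 0;
    for (uint64_t i = 1; i < count; i++) {          /* index 0 is the null symbol */
        const unsigned char *e = symtab + i * 0x18; /* entries are 0x18 bytes */
        uint32_t name = (uint32_t)e[0] | (uint32_t)e[1] << 8 |
                        (uint32_t)e[2] << 16 | (uint32_t)e[3] << 24;
        if (name == 0 || name >= strtab_size)       /* nameless or out of range */
            continue;
        size_t len = strnlen(strtab + name, strtab_size - name);
        printf("%.*s\n", (int)len, strtab + name);
        printed++;
    }
    return printed;
}
```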
- Call strlen on the name to get the length argument for write(), then make the write system call
print_symbol_name:
    mov  %rbp, %rsi
    addq %r10, %rsi                   # %rsi should now be a pointer to a string
    push $0x01
    pop  %rax                         # %rax = 1 (write)
    push %rax
    pop  %rdi                         # %rdi = 1 (stdout)
    call strlen                       # %rdx = string length
    syscall
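The register setup corresponds to the write system call. %rax holds the syscall number (1 for write on x86-64), %rdi the file descriptor, %rsi the buffer and %rdx the length. The sketch below makes the same call through the libc syscall() wrapper, assuming Linux (write_name is a hypothetical name).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* write(1, name, strlen(name)) issued as a raw system call; returns bytes written. */
long write_name(const char *name)
{
    return syscall(SYS_write, 1, name, strlen(name));
}
```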
- The example's strlen:
strlen:
    xor  %rdx, %rdx
next_byte:
    inc  %rdx
    cmpb $0x00, (%rsi,%rdx,1)
    jne  next_byte
    ret
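Translated line by line into C (asm_strlen is a hypothetical name), the routine shows a quirk. %rdx is incremented before the first comparison, so byte 0 is never examined. For a non-empty string it returns the usual length. For an empty string it would read past the terminator. The assembly mostly avoids that case by skipping symbols whose name offset is 0.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the assembly strlen; s must be non-empty. */
size_t asm_strlen(const char *s)
{
    size_t rdx = 0;             /* xor %rdx, %rdx */
    do {
        rdx++;                  /* inc %rdx */
    } while (s[rdx] != '\0');   /* cmpb $0x00, (%rsi,%rdx,1); jne next_byte */
    return rdx;                 /* ret, with the length in %rdx */
}
```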
- Write two newline characters to the terminal and move on to the next symbol:
write_to_terminal:
    push $0x01
    pop  %rax
    push %rax
    pop  %rdi
    push $0x02
    pop  %rdx
    push %r15
    pop  %rsi
    syscall
    jmp  next_symbol
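[user@host ~]$ ./test_parser
startup
getpc
find_header
find_sections
next_section
found_symbols
next_symbol
print_symbol_name
strlen
next_byte
_start
__bss_start
_edata
_end
Segmentation fault
Because next_symbol never checks for the end of the symbol table, the loop eventually reads unmapped memory and the program ends with a segmentation fault.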
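For comparison, here is a C sketch of the whole example with the missing checks added. It is not a translation of the assembly. It reads an ELF file from disk instead of scanning its own memory, follows the section link to the string table, and stops at the end of the symbol table. dump_symbols and load_file are hypothetical names. The sketch assumes a little-endian ELF64 host with <elf.h>. The path /proc/self/exe, which is Linux-specific, lets a program inspect itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Loads a whole file into a malloc'd buffer; returns NULL on failure. */
static unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    unsigned char *buf = NULL;
    long size = -1;
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) == 0)
        size = ftell(f);
    if (size > 0 && fseek(f, 0, SEEK_SET) == 0 && (buf = malloc((size_t)size)) != NULL &&
        fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    *len = buf ? (size_t)size : 0;
    return buf;
}

/* Prints every named .symtab symbol; returns the count, or -1 on error. */
long dump_symbols(const char *path)
{
    size_t len;
    unsigned char *img = load_file(path, &len);
    Elf64_Ehdr eh;
    long count = -1;

    if (!img)
        return -1;
    if (len < sizeof eh)
        goto out;
    memcpy(&eh, img, sizeof eh); /* memcpy sidesteps alignment concerns */
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 || eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shoff > len ||
        eh.e_shnum > (len - eh.e_shoff) / sizeof(Elf64_Shdr))
        goto out;

    count = 0;
    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sym, str;
        memcpy(&sym, img + eh.e_shoff + i * sizeof sym, sizeof sym);
        if (sym.sh_type != SHT_SYMTAB || sym.sh_link >= eh.e_shnum)
            continue;
        /* Follow sh_link to the string table instead of assuming it comes next. */
        memcpy(&str, img + eh.e_shoff + sym.sh_link * sizeof str, sizeof str);
        if (sym.sh_offset > len || sym.sh_size > len - sym.sh_offset ||
            str.sh_offset > len || str.sh_size > len - str.sh_offset)
            continue;
        const char *strtab = (const char *)img + str.sh_offset;
        /* Stop at sh_size instead of walking off the end of the table. */
        for (size_t j = 1; j < sym.sh_size / sizeof(Elf64_Sym); j++) {
            Elf64_Sym s;
            memcpy(&s, img + sym.sh_offset + j * sizeof s, sizeof s);
            if (s.st_name == 0 || s.st_name >= str.sh_size)
                continue;
            size_t n = strnlen(strtab + s.st_name, str.sh_size - s.st_name);
            printf("%.*s\n", (int)n, strtab + s.st_name);
            count++;
        }
    }
out:
    free(img);
    return count;
}
```

Calling dump_symbols("/proc/self/exe") prints the program's own symbol names and returns how many it printed. For a stripped binary with no .symtab it returns 0.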