Difference between revisions of "Shellcode/Loaders"

Latest revision as of 02:18, 25 April 2013

Shellcode loaders are used to test shellcode before use in a buffer overflow or other form of binary exploitation. The best way to construct a loader for user-friendly operations is by taking the shellcode as a command line argument and passing it to freshly allocated executable memory space. This article examines the construction of such a loader for Linux in assembly language for the 64-bit x86 instruction set architecture, though a 32-bit shellcode loader is provided in the appendix.

The code discussed here is revisited later to be converted into shellcode for use in a polymorphic shellcode decoder, so it is nearly null-free. Full source available in the appendix or by downloading shellcodecs.

Executable loader

This section examines the 64-bit loader provided in the shellcodecs package.

Command Line Arguments

When the program starts, the stack holds, from the top down: the number of arguments, a pointer to the program name (argument 0), and a pointer to the first command line argument. Therefore, in order to get the shellcode from the arguments, pop into the %rbx register three times. Once this is done, the %rbx register will contain a pointer to the shellcode:

 
_start:
    pop %rbx  # argc
    pop %rbx  # arg0
    pop %rbx  # arg1 pointer
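
As a rough illustration of the three pops, the following C sketch models the words at the top of the stack as an array. The array layout and the helper names `pop_word` and `pop_third_word` are assumptions made for this example; they are not part of the loader.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the words at the top of the stack at process entry:
 * [0] = argc, [1] = pointer to arg0, [2] = pointer to arg1, ... */
static uintptr_t pop_word(const uintptr_t **sp)
{
    uintptr_t value = **sp; /* read the word at the top of the stack */
    (*sp)++;                /* the stack grows down, so a pop moves up */
    return value;
}

/* Mirrors "pop %rbx" three times: each pop overwrites the previous value. */
const char *pop_third_word(const uintptr_t *stack_top)
{
    uintptr_t rbx = 0;
    for (int i = 0; i < 3; i++)
        rbx = pop_word(&stack_top);
    return (const char *)rbx;
}
```

Only the last popped value survives in the modeled register, which is why the first two pops can reuse %rbx as a scratch destination.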
 

Executable memory allocation with mmap()

See also: Unlinked 64-bit system calls, the 64-bit system call table

Because modern operating systems have non-executable stacks by default, a separate region of executable memory must be allocated for successful code execution. This is done with the mmap() system call.

The prototype for mmap() is:

 
  void *mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t off);
 

On 64-bit Linux, system call arguments are passed in registers like so:

 
   function_call(%rax) = function(%rdi,  %rsi,  %rdx,  %r10,  %r8,  %r9)
                 ^system          ^arg1  ^arg2  ^arg3  ^arg4  ^arg5 ^arg6
                  call #
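
The same register assignment can be seen from C through the libc `syscall()` wrapper, which takes the system call number first and then the arguments in order. This is a sketch assuming Linux on x86-64; the function name `raw_mmap` is hypothetical, and the file descriptor is passed as -1 here for portability rather than as the zero the loader uses.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Invokes mmap through the generic syscall() wrapper.
 * Returns the mapped address, or NULL on failure. */
void *raw_mmap(size_t len, int prot)
{
    long ret = syscall(SYS_mmap,                           /* %rax: call number (9 on x86-64) */
                       NULL,                               /* %rdi: addr   */
                       (long)len,                          /* %rsi: len    */
                       (long)prot,                         /* %rdx: prot   */
                       (long)(MAP_PRIVATE | MAP_ANONYMOUS), /* %r10: flags  */
                       -1L,                                /* %r8:  fildes */
                       0L);                                /* %r9:  off    */
    return ret == -1 ? NULL : (void *)ret;
}
```

Calling `raw_mmap(4096, PROT_READ | PROT_WRITE)` should return a writable page that can later be released with `munmap()`.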
 

First, the system call number for mmap() is placed into %rax:

 
    push $0x9
    pop %rax
 

The first argument (%rdi) of mmap() should be null, so using xor, %rdi is set to zero.

 
    xor %rdi, %rdi
 

The desired size of the buffer (4096 bytes or 0x1000 in hex) is passed into %rsi as the second argument to mmap(). The %rsi register is first initialized to zero by pushing %rdi and popping %rsi:

 
    push %rdi
    pop %rsi
 

Then incremented to get it to 0x0001:

 
    inc %rsi
 

And shifted left 12 bits (1 shifted left 12 bits will become 0x1000 or binary 00010000 00000000):

 
    shl $12, %rsi
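
The arithmetic can be checked with a small C sketch that builds the length the same way (zero, increment, shift). The function name `buffer_length` is an assumption for this example. Note that the shift count matters: decimal 12 gives 0x1000 (4096 bytes), whereas hexadecimal 0x12 (decimal 18) would give 0x40000.

```c
#include <stddef.h>

/* Builds the mapping length the way the loader does: zero, increment, shift. */
size_t buffer_length(unsigned shift_count)
{
    size_t rsi = 0;       /* push %rdi; pop %rsi (with %rdi == 0) */
    rsi++;                /* inc %rsi */
    rsi <<= shift_count;  /* shl $count, %rsi */
    return rsi;
}
```

Building the value this way avoids encoding 0x1000 as an immediate, which would place null bytes in the instruction stream.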
 

The third argument (%rdx) contains the memory permissions (read, write, execute, or none). Multiple permissions are combined using bitwise or. Since 7 is the result of ORing the flags PROT_READ, PROT_WRITE, and PROT_EXEC, the or itself is skipped and the value 7 is stored directly in the %rdx register.

 
    push $0x7
    pop %rdx
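
A short C sketch of how the permission bits combine, using the constants from `<sys/mman.h>`. The helper names `prot_rwx` and `prot_allows` are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* The combined permission value the loader stores in %rdx. */
int prot_rwx(void)
{
    return PROT_READ | PROT_WRITE | PROT_EXEC; /* 0x1 | 0x2 | 0x4 */
}

/* True when every bit of wanted is set in prot. */
bool prot_allows(int prot, int wanted)
{
    return (prot & wanted) == wanted;
}
```

Because each permission occupies its own bit, the combined value can be tested for any single permission with a bitwise and.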
 

The flags argument is built the same way as the "prot" argument, using the MAP_* constants. In this case MAP_PRIVATE|MAP_ANONYMOUS, which evaluates to 0x22, is stored in %r10.

 
    push $0x22
    pop %r10
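
The value 0x22 can be confirmed from the header constants. This sketch assumes Linux on x86-64, where MAP_PRIVATE is 0x02 and MAP_ANONYMOUS is 0x20; other platforms may define different values. The function name is made up for the example.

```c
#define _GNU_SOURCE
#include <sys/mman.h>

/* The flags value the loader stores in %r10. */
int anonymous_private_flags(void)
{
    return MAP_PRIVATE | MAP_ANONYMOUS; /* 0x02 | 0x20 on Linux x86-64 */
}
```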
 

The final two arguments (fildes and off) are not needed for an anonymous mapping, so both are set to null in %r8 and %r9.

 
    push %rdi
    push %rdi
    pop %r8
    pop %r9
 

Once the registers are set, a syscall is used to invoke mmap().

 
    syscall   # The syscall for the mmap().
 

The %rax register now contains a pointer to the buffer returned by mmap() to copy the shellcode into.
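
The loader uses the value in %rax without checking it. On Linux, a raw system call reports failure by returning a negated errno value (between -4095 and -1) rather than the MAP_FAILED value returned by the C library wrapper. A check along those lines might look like the following sketch; the function name `raw_syscall_failed` is an assumption, and the code assumes a 64-bit `long`.

```c
#include <stdbool.h>

/* True when a raw system call return value (as left in %rax)
 * encodes a negated errno, i.e. lies in the range -4095..-1. */
bool raw_syscall_failed(long rax)
{
    return (unsigned long)rax >= (unsigned long)-4095L;
}
```

For example, a return value of -12 (ENOMEM) is classified as a failure, while a page-aligned address is not.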

Copying the code into the new memory

The %rsi register is initialized to 0 to be used as a counter:

 
inject: 
    xor %rsi, %rsi
 

%rdi is zeroed as well, because each byte is compared against %dil (the low byte of %rdi) to detect the null byte that marks the end of the shellcode.

 
    push %rsi
    pop %rdi    
 

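At the top of the loop, the current byte is compared with %dil. If it is null, the end of the shellcode has been reached, so execution jumps to inject_finished to finish up and execute the code.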

 
inject_loop:
    cmpb %dil, (%rbx, %rsi, 1)
    je inject_finished
 

Each byte of the shellcode is moved from %rbx + %rsi (current location) into %rax + %rsi (new executable memory) through %r10b, the single-byte sub-register of %r10:

 
    movb (%rbx, %rsi, 1), %r10b
    movb %r10b, (%rax,%rsi,1)
 

%rsi is incremented as both the offset and the counter:

 
    inc %rsi
 

And then the loop restarts:

 
    jmp inject_loop
 

The inject_finished routine then appends the ret opcode, 0xc3, to the end of the shellcode:

 
inject_finished:
    movb $0xc3, (%rax, %rsi, 1)
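
Taken together, the copy loop and inject_finished behave roughly like the C sketch below. The function name `inject` and the variable names (chosen to mirror the registers) are assumptions for this illustration. Like the assembly, it stops at the first null byte and performs no bounds check on the destination, so the shellcode must be null-free and small enough to fit the buffer with one byte to spare.

```c
#include <stddef.h>

/* Copies src up to (not including) its first null byte into dst, then
 * appends the ret opcode. Returns the number of shellcode bytes copied. */
size_t inject(unsigned char *dst, const unsigned char *src)
{
    size_t rsi = 0;                    /* xor %rsi, %rsi */
    const unsigned char dil = 0;       /* push %rsi; pop %rdi */

    while (src[rsi] != dil) {          /* cmpb %dil, (%rbx,%rsi,1); je inject_finished */
        unsigned char r10b = src[rsi]; /* movb (%rbx,%rsi,1), %r10b */
        dst[rsi] = r10b;               /* movb %r10b, (%rax,%rsi,1) */
        rsi++;                         /* inc %rsi */
    }
    dst[rsi] = 0xc3;                   /* inject_finished: append ret */
    return rsi;
}
```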
 
In proper machine language terminology, an opcode is an instruction while bytecode refers to the opcode as well as its arguments, called operands.

Returning to the code

The code is returned to, rather than jumped to or called, because this more closely simulates the environment of a vulnerable application at the time of a buffer overflow. A payload is returned to, and therefore, when shellcode is loaded, it should also be returned to.

First, ret_to_shellcode is called. This causes the address of exit to be pushed onto the stack, so that the end of the shellcode now returns to <address of exit>.

 
    call ret_to_shellcode
 

The address of the shellcode is then pushed on top of that return address and returned into, which leaves the address of exit on the stack for the ret at the end of the shellcode.

 
ret_to_shellcode:
    push %rax
    ret
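
The order of control transfers can be traced with a toy stack model in C. Everything here (the `struct machine` type, the helper names, and the example addresses used when calling it) is hypothetical and only meant to show which address each ret consumes.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_WORDS 8

struct machine {
    uint64_t stack[STACK_WORDS];
    size_t sp;    /* index of the top of the stack; the stack grows down */
    uint64_t rip; /* where control currently is */
};

static void push(struct machine *m, uint64_t v) { m->stack[--m->sp] = v; }
static uint64_t pop(struct machine *m) { return m->stack[m->sp++]; }

/* Runs the call/push/ret sequence and records where control goes. */
void ret_to_shellcode_trace(uint64_t exit_addr, uint64_t shellcode_addr,
                            uint64_t trace[2])
{
    struct machine m = { .sp = STACK_WORDS, .rip = 0 };

    push(&m, exit_addr);      /* call ret_to_shellcode: pushes the address of exit */
    push(&m, shellcode_addr); /* push %rax */
    m.rip = pop(&m);          /* ret: control moves to the shellcode */
    trace[0] = m.rip;
    m.rip = pop(&m);          /* the ret appended to the shellcode */
    trace[1] = m.rip;
}
```

The first entry of the trace is the shellcode address and the second is the address of exit, matching the order described above.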
 

When the shellcode completes, it will return to the exit function to exit cleanly:

 
exit:
    push $60
    pop %rax
    xor %rdi, %rdi
    syscall
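
For comparison, the same raw exit system call can be issued from C with `syscall()`. The sketch below, assuming Linux, runs it in a forked child so the effect can be observed from the parent; the function name `raw_exit_status` is hypothetical.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks, has the child invoke the raw exit system call, and returns the
 * child's exit status (or -1 if something went wrong). */
int raw_exit_status(int status)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        syscall(SYS_exit, (long)status); /* SYS_exit is 60 on x86-64 Linux */
        _exit(127);                      /* not reached */
    }
    int wstatus = 0;
    if (waitpid(pid, &wstatus, 0) != pid || !WIFEXITED(wstatus))
        return -1;
    return WEXITSTATUS(wstatus);
}
```

The loader passes a status of zero (xor %rdi, %rdi), which corresponds to `raw_exit_status(0)` here.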
 

Once the code is complete (the full source is in the appendix), it can be built the same way as any assembly program:

 ╭─user@host ~  
 ╰─➤  as -oloader-64.o loader-64.s
 ╭─user@host ~
 ╰─➤  ld -oloader-64 loader-64.o

Or by typing `make' from the root directory of shellcodecs.

Using the executable loader

The shellcode invoked here is the same as the shellcode constructed and extracted earlier. Notice the change in prompt, and that exit returns the original prompt. This indicates that the shellcode executed successfully.

 ╭─user@host ~
 ╰─➤  ./loader-64 "$(echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05")"
 [user@host ~]$ exit
 exit
 ╭─user@host ~
 ╰─➤
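
Because the shellcode is passed as a command line argument and copied until the first null byte, it has to be null-free. A quick check of the example payload can be sketched in C as follows; the helper names are assumptions for this example, and the byte array simply repeats the bytes from the command above.

```c
#include <stdbool.h>
#include <stddef.h>

/* The example payload from the command above, byte for byte. */
static const unsigned char example_shellcode[] = {
    0x48, 0x31, 0xff, 0x6a, 0x69, 0x58, 0x0f, 0x05, 0x57, 0x57,
    0x5e, 0x5a, 0x48, 0xbf, 0x6a, 0x2f, 0x62, 0x69, 0x6e, 0x2f,
    0x73, 0x68, 0x48, 0xc1, 0xef, 0x08, 0x57, 0x54, 0x5f, 0x6a,
    0x3b, 0x58, 0x0f, 0x05
};

/* True when no byte in code[0..len) is 0x00. */
bool is_null_free(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return false;
    return true;
}

size_t example_shellcode_length(void)
{
    return sizeof example_shellcode;
}

bool example_is_null_free(void)
{
    return is_null_free(example_shellcode, sizeof example_shellcode);
}
```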

Return oriented loader

Return oriented code can be tested using a loader as well. A much smaller loader suffices, because return-oriented code should not require executable memory allocation:

 
_start:
    pop %rbx  # argc
    pop %rbx  # arg0
    pop %rsp  # %rsp now points at the contents of arg1
    ret       # pops the first 8 bytes of arg1 as the return address
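
To make the effect of the final ret concrete, the C sketch below computes the address that would be popped when the stack pointer points at the argument bytes. The function name `first_return_target` and the sample bytes used when calling it are hypothetical; the bytes are assembled in little-endian order, as on x86-64.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads the 8 bytes a ret would pop when %rsp points at arg, assembled in
 * little-endian order. arg must hold at least 8 bytes. */
uint64_t first_return_target(const unsigned char *arg)
{
    uint64_t addr = 0;
    for (size_t i = 0; i < 8; i++)
        addr |= (uint64_t)arg[i] << (8 * i);
    return addr;
}
```

Each subsequent ret in the chain would consume the next 8 bytes of the argument in the same way.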
 


See also

Related tool: shellcodecs

Other loaders include the dynamic loader and the dynamic socket loader. These are used when the shellcode depends on the context of the vulnerable binary application containing a dynamic section for linking.

  • Shellcode
  • Shellcode environment
  • Dynamic shellcode