Difference between revisions of "Shellcode/Environment"
Chantal21I (Talk | contribs) (Tried to grammar everything. →Everything)
Revision as of 02:34, 2 December 2012
It is possible to use shellcode to determine the instruction set architecture, obtain the program counter, find the location last returned to, or detect and bypass int3 breakpoints within the current execution environment.
Alphanumeric x86/x64 GetCPU (any OS)
Architecture can only be determined when compatible channels between the target instruction set architectures can be isolated. As long as the instructions perform valid behavior and do not cause access faults on operating systems native to the architecture, it is possible to use a single bytecode sequence in order to determine architecture across a variety of processors. It generally takes a great amount of familiarity and experience with two or more given instruction sets to write shellcode for multiple architectures.
The x86_64 instruction set architecture (also known as x64) does not differ vastly from x86, because AMD designed it as a backward-compatible 64-bit extension of Intel's 32-bit architecture.
An alphanumeric instruction compatibility chart can be derived by cross-referencing the available alphanumeric 64-bit instructions with the available printable 32-bit instructions.
Inter-compatibility theory
The single-byte opcodes in the range 0x40-0x4f are specifically not compatible between the two modes. On a 32-bit processor they increment or decrement a general-purpose register; in x64 environments they become REX prefixes, which select 64-bit operand sizes and give access to the eight additional 64-bit general-purpose registers, %r8-%r15.
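The dual meaning of these bytes can be sketched in a few lines of Python. This is only an illustration: the helper names `rex_fields` and `x86_meaning` are hypothetical, and the bit layout used is the standard REX layout (0100WRXB).

```python
import string

ALNUM = set(string.ascii_letters + string.digits)
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def rex_fields(byte):
    """x64 view: split a byte in 0x40-0x4f into the REX bits W, R, X, B."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("not in the 0x40-0x4f range")
    return {
        "W": (byte >> 3) & 1,  # 64-bit operand size
        "R": (byte >> 2) & 1,  # extends the ModRM reg field
        "X": (byte >> 1) & 1,  # extends the SIB index field
        "B": byte & 1,         # extends ModRM r/m, SIB base or opcode register
    }

def x86_meaning(byte):
    """x86 view: 0x40-0x47 is inc r32, 0x48-0x4f is dec r32."""
    if not 0x40 <= byte <= 0x4F:
        raise ValueError("not in the 0x40-0x4f range")
    op = "inc" if byte < 0x48 else "dec"
    return f"{op} %{REGS32[byte & 7]}"

# Only 0x41-0x4f ('A'-'O') are alphanumeric; 0x40 is '@'.
for b in range(0x40, 0x50):
    if chr(b) in ALNUM:
        print(f"0x{b:02x} {chr(b)!r}: x86 = {x86_meaning(b)}, x64 REX = {rex_fields(b)}")
```

In this sketch, 0x41 sets only the B bit, which is what turns `pop %edx` (0x5a) into `pop %r10` on x64.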
Because not all opcodes are intercompatible, while comparisons and conditional jumps are, it is possible to determine the architecture of an x86 processor using exclusively alphanumeric opcodes.
By making use of these additional registers (which 32-bit processors do not have), one can perform an operation that sets a value in a different register on each of the two processors. A conditional statement can then be made against one of the two registers to determine whether the value was set.
Because the code must remain alphanumeric, the pop instruction is the most effective way to set the value of a register. Using a register other than %rsp or %esp as a placeholder for the stack pointer enables an effective conditional statement that tests whether the value a register points to equals the value most recently pushed onto or popped from the stack.
Practically Applied: Code
This simple alphanumeric bytecode is 15 bytes long, ending in a comparison which returns equal on a 32 bit system and not equal on a 64 bit system.
When implementing this bytecode, the conditional jump that follows is best encoded with the t (0x74) and u (0x75) opcodes, jump if equal and jump if not equal, respectively.
- Assembled:
- TX4HPZTAZAYVH92
- Disassembly:
The table below lists each opcode sequence with its x86 and x64 disassembly. The two columns are equivalent except where a byte in the range 0x40-0x4f appears (here 0x41 and 0x48).
OpCodes | x86 | x64 |
---|---|---|
54 | push %esp | push %rsp |
58 | pop %eax | pop %rax |
34 48 | xor $0x48, %al | xor $0x48, %al |
50 | push %eax | push %rax |
5a | pop %edx | pop %rdx |
54 | push %esp | push %rsp |
41 5a | inc %ecx; pop %edx | pop %r10 |
41 59 | inc %ecx; pop %ecx | pop %r9 |
56 | push %esi | push %rsi |
48 39 32 | dec %eax; cmp %esi,(%edx) | cmp %rsi,(%rdx) |
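The opcode column can be cross-checked against the assembled string with a short Python sketch. Nothing here is specific to the original tooling; it only inspects the ASCII bytes of the string given above.

```python
ASSEMBLED = "TX4HPZTAZAYVH92"
raw = ASSEMBLED.encode("ascii")

# Every byte must be a letter or digit for the payload to count as alphanumeric.
is_alnum = ASSEMBLED.isalnum()
hex_bytes = raw.hex(" ")

# The suggested follow-up jumps are themselves alphanumeric characters.
JE, JNE = ord("t"), ord("u")

print(len(raw), is_alnum)
print(hex_bytes)
print(hex(JE), hex(JNE))
```

The hex dump lines up row by row with the OpCodes column of the table.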
This code executes similarly, but not identically, on 32-bit and 64-bit systems. The key discrepancy between how the two architectures execute it centers on the opcodes in the range 0x41-0x4f, here 0x41 and 0x48.
Under a 32-bit architecture, the bytes 41 5a increment ecx and pop the value of esp (which was pushed onto the stack by the preceding instruction) into edx. Since the esp register is a pointer to the top of the stack, edx now contains a pointer to the top of the stack. The bytes 41 59 then increment ecx again and pop one more value into ecx, after which the shellcode pushes esi, so the value of esi lands in the slot that edx points to. When the code executes its final comparison, "cmp %esi,(%edx)", it compares the value of esi to the value that edx points to. Because esi has just been written to that location, the comparison is between the value of esi and itself. Therefore, equal is returned.
Under a 64-bit architecture the comparison returns not equal. The opcodes "41 5a 41 59" have a different function: instead of popping the value of rsp into rdx, the code pops it into the 64-bit register %r10 without incrementing %ecx or %rcx, because 0x41 acts as a REX prefix that selects one of the extended registers %r8-%r15. rdx therefore still holds the earlier copy of rsp whose low byte was XORed with 0x48. As a result, when the final comparison is made between rsi and the value referenced by rdx, it returns not equal (unless that stack slot happens to hold the same value as rsi), as rdx does not point to the top of the stack.
On a 64-bit system, this will not cause a segfault, because (%rdx) differs from the stack pointer only in bits 3 and 6 of its low byte and therefore still points somewhere inside the stack.
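The two execution paths can be traced with a toy Python model of the stack. This is a sketch under stated assumptions, not an emulator: the starting stack pointer, the value of esi/rsi, and the `UNKNOWN` placeholder for uninitialised stack slots are all hypothetical, and `run_getcpu` is an illustrative name.

```python
UNKNOWN = 0xDEADBEEF  # stands in for stack contents the shellcode never wrote

def run_getcpu(mode):
    """Trace TX4HPZTAZAYVH92 in 32- or 64-bit mode; True means the cmp reports equal."""
    word = 4 if mode == 32 else 8
    sp = 0x70000100                      # hypothetical stack pointer
    mem = {}                             # sparse stack memory: address -> value
    reg = {"ax": 0, "cx": 0, "dx": 0, "si": 0x12345678, "r9": 0, "r10": 0}

    def push(value):
        nonlocal sp
        sp -= word
        mem[sp] = value

    def pop():
        nonlocal sp
        value = mem.get(sp, UNKNOWN)
        sp += word
        return value

    push(sp)                             # 54     push %esp / %rsp (old value is stored)
    reg["ax"] = pop()                    # 58     pop %eax / %rax
    reg["ax"] ^= 0x48                    # 34 48  xor $0x48,%al (touches the low byte only)
    push(reg["ax"])                      # 50     push %eax / %rax
    reg["dx"] = pop()                    # 5a     pop %edx / %rdx
    push(sp)                             # 54     push %esp / %rsp
    if mode == 32:
        reg["cx"] += 1                   # 41     inc %ecx
        reg["dx"] = pop()                # 5a     pop %edx
        reg["cx"] += 1                   # 41     inc %ecx
        reg["cx"] = pop()                # 59     pop %ecx
    else:
        reg["r10"] = pop()               # 41 5a  pop %r10
        reg["r9"] = pop()                # 41 59  pop %r9
    push(reg["si"])                      # 56     push %esi / %rsi
    if mode == 32:
        reg["ax"] -= 1                   # 48     dec %eax (a REX.W prefix on x64)
    return mem.get(reg["dx"], UNKNOWN) == reg["si"]   # 39 32  cmp %esi,(%edx)

print(run_getcpu(32), run_getcpu(64))
```

In the 64-bit trace the result depends on the assumption that the slot at (%rdx) does not already contain the value of rsi, which is why the model uses a placeholder that differs from it.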
GetPc
The GetPc technique is the implementation of code that obtains the current instruction pointer. This can be useful when writing self-modifying shellcode, or other code that must become aware of its environment, as environment information cannot be supplied prior to execution of the code.
x86 (32 bit)
    jmp startup
    getpc:
        mov (%esp), %eax
        ret
    startup:
        call getpc    # the %eax register now contains %eip on the next line
x64
    jmp startup
    getpc:
        mov (%rsp), %rax
        ret
    startup:
        call getpc    # the %rax register now contains %rip on the next line
- Alternatively:
    jmp startup
    pc:
        nop
    startup:
        lea -1(%rip), %rax    # %rax = address of the next instruction minus 1 (the last byte of this 7-byte lea)
                              # use -8(%rip) instead to obtain the address of `pc'
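A RIP-relative displacement is measured from the address of the next instruction, so the value a given displacement yields can be checked with simple arithmetic. The Python sketch below assumes the assembler emits the usual 7-byte encoding of the lea (48 8d 05 followed by a 4-byte displacement); the addresses and the helper name `rip_relative_target` are hypothetical.

```python
LEA_LEN = 7  # assumed encoding: REX.W (48), opcode (8d), ModRM (05), disp32

def rip_relative_target(instr_addr, disp, instr_len=LEA_LEN):
    """Address produced by `lea disp(%rip), %rax` located at instr_addr."""
    next_instr = instr_addr + instr_len   # the value of %rip seen by the instruction
    return next_instr + disp

startup = 0x401001   # hypothetical address of the lea
pc = startup - 1     # the one-byte nop placed directly before it

for disp in (-1, -7, -8):
    print(disp, hex(rip_relative_target(startup, disp)))
```

Under these assumptions a displacement of -7 yields the address of the lea itself and -8 yields the address of the preceding nop; all three displacements encode without null bytes.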
Last call
Typically, when shellcode is being executed at the time of a buffer overflow, assuming that the nop sled does not modify the stack, the pointer to the beginning of the executing code is at -0x8(%rsp), or -0x4(%esp), because it was just returned to as a result of the call stack being overwritten during the overflow process. In many cases, this can be used in place of a GetPc for polymorphic shellcode.
32-bit
Null-free
    mov -0x4(%esp), %eax
Alphanumeric
    pop %eax
    push %esp
    pop %eax
    xor $0x45, %al
    xor $0x4d, %al      # 0x45 xor 0x4d = 0x08: after the pop, the return pointer is 8 bytes below %esp
    xor %esi, (%eax)
    xor (%eax), %esi
64-bit
Null-free
    mov -0x8(%rsp), %rax
Alphanumeric
    XTX4E4UH10H30
Obtaining the address of the beginning of the shellcode using only alphanumeric code is a little more complex.
First, to avoid destroying the return pointer (the target data), at least 8 must be added to the stack pointer before anything is pushed. Any conventional pop operation does this; in this case it is pop %rax. %rsp is then moved into %rax through a push / pop mov emulation.
    pop %rax
    push %rsp    # move pointer to %rsp into %rax
    pop %rax
Because the pop has added 0x8 to %rsp, 0x10 must be subtracted from %rax in order to access the return pointer. This is emulated by XORing %al with 0x45 and then 0x55:
    xor $0x45,%al    # subtract 0x10 from %rax
    xor $0x55,%al
Together the two XORs toggle a single bit of %al (0x45 xor 0x55 = 0x10), which effectively runs a not operation against that bit. Because of this, the operation actually adds or subtracts 0x10 depending on whether the bit was already set, and is not guaranteed to work on its first execution; this problem is easily mitigated through repeated execution.
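The add-or-subtract behaviour of the XOR pair can be seen with a small Python sketch. The sample values of %al are arbitrary and `xor_pair` is an illustrative name.

```python
def xor_pair(al, first=0x45, second=0x55):
    """Apply the two alphanumeric XORs to an 8-bit value of %al."""
    return (al ^ first ^ second) & 0xFF

TOGGLED_BIT = 0x45 ^ 0x55   # the only bit that differs between the two immediates

for al in (0x58, 0x48, 0xF0, 0xE0):
    result = xor_pair(al)
    print(f"al=0x{al:02x} -> 0x{result:02x} (delta {result - al:+d})")
```

When bit 4 of %al is set the pair subtracts 0x10, as intended; when it is clear the pair adds 0x10 instead.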
The most recently returned-from return address is then moved into %rsi through the use of an xor mov emulation:
    xor %rsi,(%rax)
    xor (%rax),%rsi    # move the return address stored at (%rax) into %rsi
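Why two XORs behave like a mov can be checked with a short Python sketch. The values are arbitrary and `xor_mov` is an illustrative name; the model only tracks one memory cell and one register.

```python
def xor_mov(mem_value, reg_value, bits=64):
    """Model `xor %rsi,(%rax)` followed by `xor (%rax),%rsi`."""
    mask = (1 << bits) - 1
    mem_value = (mem_value ^ reg_value) & mask   # xor %rsi,(%rax)
    reg_value = (reg_value ^ mem_value) & mask   # xor (%rax),%rsi
    return mem_value, reg_value

return_address = 0x00007FFFF7FBE000   # hypothetical value stored at (%rax)
old_rsi = 0x1122334455667788          # arbitrary previous contents of %rsi

new_mem, new_rsi = xor_mov(return_address, old_rsi)
print(hex(new_rsi), hex(new_mem))
```

The register ends up holding the original memory value regardless of its previous contents, while the memory cell is left holding the XOR of the two, so the stored return address is clobbered in the process.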
int3 breakpoints
Int3 breakpoints can be detected during out-of-line code execution when the code in question is being debugged by an in-line debugger.
    .text
    .global _start
    _start:
        jmp startup
    go_retro:
        pop %rcx
        inc %rcx
        jmp *%rcx
    startup:
        call go_retro
    volatile_segment:
        push $0x3458686a
        push $0x0975c084
        nop
The relevant code in this snippet is:
    push $0x3458686a
    push $0x0975c084
When execution jumps to the byte directly after the opcode of the first push (0x68), the CPU reads the code as:
    0:  6a 68    pushq $0x68
    2:  58       pop %rax
    3:  34 68    xor $0x68,%al
    5:  84 c0    test %al,%al
    7:  75 09    jne 0x09
However, it is read by an inline disassembler as:
    d:   68 6a 68 58 34    pushq $0x3458686a
    12:  68 84 c0 75 09    pushq $0x975c084
    17:  90                nop
This is because an inline disassembler decodes code according to how it looks in memory rather than how it is executed; because the first 0x68 is skipped completely, the code that executes differs from what appears in memory. What this code actually does is detect breakpoints. First, it moves 0x68 into %rax with a push and a pop. The immediate operand of the xor that follows is the opcode byte of the second push instruction. If a breakpoint has been set on that instruction, the debugger overwrites the byte with 0xcc (the int3 breakpoint instruction), so xor $0x68,%al becomes xor $0xcc,%al, and instead of %al being nulled (0x68 xor 0x68 becomes 0), it becomes 0xa4. The test instruction checks whether %al is zero: if it is not zero, the code then jumps 0x09 bytes forward (this behaviour can be adjusted to act however the programmer desires). This code allows arbitrary shellcode to detect breakpoints and act differently depending on whether or not they exist.
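The overlap between the two views of these bytes, and the effect of a breakpoint on the xor, can be reproduced with a short Python sketch. It only manipulates the ten bytes of the two push instructions; the helper names are hypothetical, and writing 0xcc into the buffer is a stand-in for what a debugger does when it places a software breakpoint.

```python
import struct

def push_imm32(value):
    """Encode `push $imm32`: opcode 0x68 followed by a little-endian immediate."""
    return b"\x68" + struct.pack("<I", value)

def al_after_xor(code):
    """Value of %al after `push $imm8; pop %rax; xor $imm8,%al`, entered at code[1]."""
    executed = code[1:]        # the first 0x68 is skipped by the inc/jmp trampoline
    al = executed[1]           # 6a 68    push $0x68 ; 58 pop %rax
    imm = executed[4]          # 34 ??    xor $imm,%al (this byte is the second push opcode)
    return al ^ imm

code = push_imm32(0x3458686A) + push_imm32(0x0975C084)

patched = bytearray(code)
patched[5] = 0xCC              # int3 written over the opcode of the second push

print(code.hex(" "))
print(hex(al_after_xor(code)), hex(al_after_xor(patched)))
```

Without a breakpoint %al ends up zero and the jne falls through; with the byte patched to 0xcc it ends up 0xa4 and the jump is taken.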
The following is a demonstration of this specific code in use. In the first demonstration, a breakpoint is set on the nop instruction and the breakpoint is hit. In the second, the breakpoint is set on the second push instruction, and the breakpoint is skipped.
    {} shellcode gdb loaders/loader-64
    Reading symbols from /home/user/loaders/loader-64...(no debugging symbols found)...done.
    (gdb) break ret_to_shellcode
    Breakpoint 1 at 0x4000b1
    (gdb) run "$(generators/shellcode-generator.py --file=int3 --raw)"
    Starting program: /home/user/loaders/loader-64 "$(generators/shellcode-generator.py --file=int3 --raw)"
    Breakpoint 1, 0x00000000004000b1 in ret_to_shellcode ()
    (gdb) x/24i $rax
       0x7ffff7fbe000: jmp 0x7ffff7fbe008
       0x7ffff7fbe002: pop %rcx
       0x7ffff7fbe003: inc %rcx
       0x7ffff7fbe006: jmpq *%rcx
       0x7ffff7fbe008: callq 0x7ffff7fbe002
       0x7ffff7fbe00d: pushq $0x3458686a
       0x7ffff7fbe012: pushq $0x975c084
       0x7ffff7fbe017: nop
       ...
    (gdb) break *0x7ffff7fbe017
    Breakpoint 2 at 0x7ffff7fbe017
    (gdb) c
    Continuing.
    Breakpoint 2, 0x00007ffff7fbe017 in ?? ()
    (gdb) quit
    A debugging session is active.
    Inferior 1 [process 9760] will be killed.
    Quit anyway? (y or n) y
    {} shellcode gdb loaders/loader-64
    Reading symbols from /home/user/loaders/loader-64...(no debugging symbols found)...done.
    (gdb) break ret_to_shellcode
    Breakpoint 1 at 0x4000b1
    (gdb) run "$(generators/shellcode-generator.py --file=int3 --raw)"
    Starting program: /home/user/loaders/loader-64 "$(generators/shellcode-generator.py --file=int3 --raw)"
    Breakpoint 1, 0x00000000004000b1 in ret_to_shellcode ()
    (gdb) x/24i $rax
       0x7ffff7fbe000: jmp 0x7ffff7fbe008
       0x7ffff7fbe002: pop %rcx
       0x7ffff7fbe003: inc %rcx
       0x7ffff7fbe006: jmpq *%rcx
       0x7ffff7fbe008: callq 0x7ffff7fbe002
       0x7ffff7fbe00d: pushq $0x3458686a
       0x7ffff7fbe012: pushq $0x975c084
       0x7ffff7fbe017: nop
       ...
    (gdb) break *0x7ffff7fbe012
    Breakpoint 2 at 0x7ffff7fbe012
    (gdb) c
    Continuing.
    [Inferior 1 (process 9778) exited normally]
    (gdb)