C

From NetSec

Revision as of 20:07, 2 June 2012

C is a high-level, compiled programming language. A compiler, typically cc (short for C compiler) or gcc (the GNU Compiler Collection), translates C source code into machine code that the processor can execute. Most Linux distributions ship gcc as a base package, so no further setup is necessary to start developing C programs; if it is missing, you can install all necessary tools through your distribution's package manager with these commands:

  • Debian/Ubuntu
 apt-get install build-essential
  • Arch Linux
 pacman -S base-devel

Overview

Basic programs are built from three main building blocks: variables, loops, and if/else statements.

Basic Formatting

Each C program follows a general format.

Includes

Includes are preprocessor directives that pull the declarations from a header file (functions, global variables, or compile-time defines, i.e. macros) into a C program. They let you reuse previously implemented functionality instead of reinventing the wheel with each program. A collection of "standard" headers makes up the standard C library, and on top of that there are OS-level headers, such as the ones defined by POSIX.

Includes in C follow this syntax:

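<syntaxhighlight lang="c">
/* Angle brackets: search for library.h in the compiler's standard include directories. */
#include <library.h>

/* Quotes: use the given path. With gcc, a quoted relative name such as "library.h"
   is looked up in the directory of the including file first, then in the standard directories. */
#include "/path/library.h"
</syntaxhighlight>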

A few includes appear in almost every C program, most notably stdio.h, which declares the basic input and output functions such as printf(). A C program can be compiled without any includes, but you will be fairly limited in the functionality you are able to leverage. By convention, includes are placed at the beginning of a file, although the only strict requirement is that a declaration appears before it is used.

The main() Function

The main() function is the entry point of the program. Unlike interpreted languages, which parse a script and then run it from the top, most executable formats require an entry point so that the operating system knows where to "start"; in a C program, the C runtime's start-up code calls main().

Every other function in the program is called, directly or indirectly, from main(). As an example, consider the canonical "Hello World" in C:

<syntaxhighlight lang="c">
#include <stdio.h>

int main(void) {
    printf("Hello, world!\n");
    return 0;
}
</syntaxhighlight>

Execution starts inside main(), which prints "Hello, world!" and then ends the program with "return 0;", where 0 tells the operating system that the program succeeded.

Variables

A variable is a named location in memory that stores data which can be read and modified later. To declare a variable in C, you first write its type and then the variable's name. Some of the basic variable types are:

 
<syntaxhighlight lang="c">
int iName;
float fName;
double dName;
char cName;
</syntaxhighlight>
 

Integer (int) variables store whole numbers, while float and double variables store floating-point numbers, i.e. numbers with a fractional part; double offers more precision than float. A char variable holds a single character (one byte). C has no built-in string type; instead, a string is an array of characters terminated by a null character ('\0'), commonly called a C string.
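As a minimal sketch of the C string convention described above, the function below finds a string's length by walking the array until it reaches the terminating '\0'. The name c_string_length and the greeting array are hypothetical examples; in real code you would call strlen() from <string.h>, which does the same job.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters of a C string, i.e. the chars
 * that come before the terminating null character '\0' (like strlen()). */
size_t c_string_length(const char *s)
{
    size_t length = 0;
    while (s[length] != '\0')   /* stop at the null terminator */
        length++;
    return length;
}

/* An array initialised from a string literal gets room for the terminator
 * automatically: "hello" occupies 6 bytes, not 5. */
char greeting[] = "hello";
```

Note the difference between the two sizes: c_string_length(greeting) is 5, while sizeof greeting is 6 because it includes the '\0'.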

You can also declare pointers or arrays of any type by adding the corresponding symbol: * before the name for a pointer, [] after it for an array. An array stores multiple values of the same type in one variable, while a pointer stores a memory address, for example the address of another variable or of dynamically allocated memory:

 
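<syntaxhighlight lang="c">
int *iPointer;     // pointer to an int
char cArray[10];   // array of 10 chars; a size (or an initializer) is required
</syntaxhighlight>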
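To show arrays and pointers working together, here is a hypothetical sum_array function (the name is illustrative only). When an array is passed to a function, it is converted to a pointer to its first element, so the function also has to be told how many elements there are.

```c
#include <stddef.h>

/* Hypothetical example: sums an array that is passed by pointer.
 * The pointer alone does not say where the array ends, so the caller
 * also passes the element count. */
int sum_array(const int *values, size_t count)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];   /* values[i] is the same as *(values + i) */
    return total;
}
```

A caller can compute the count of a real array with sizeof numbers / sizeof numbers[0]; this trick does not work on a pointer, which is exactly why the count must be passed in.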
 

Loops

C has three kinds of loops that let you accomplish a repetitive task without repeating lines of code: the for loop, the while loop, and the do-while loop. Each is controlled by a condition: the loop keeps running as long as the condition evaluates to true (non-zero) and stops as soon as it evaluates to false (0). A for loop suits tasks where you know how many times you want to repeat, while a while loop is normally used when the number of iterations is unknown, for example when reading a text file until its end. A do-while loop is almost the same as a while loop, with one difference: it runs its body once before checking the condition, so the body always executes at least once.
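As a sketch of the "unknown number of iterations" case, the hypothetical function count_lines below reads a text stream one character at a time with fgetc() and keeps looping until fgetc() returns EOF. Note that c is declared as int rather than char, so that the special EOF value can be told apart from every valid character.

```c
#include <stdio.h>

/* Hypothetical helper: counts the newline characters in a stream.
 * The number of iterations is not known in advance, so a while loop
 * keeps reading until fgetc() reports end of file (EOF). */
long count_lines(FILE *stream)
{
    long lines = 0;
    int c;   /* int, not char, so EOF can be distinguished from data */
    while ((c = fgetc(stream)) != EOF) {
        if (c == '\n')
            lines++;
    }
    return lines;
}
```

For example, count_lines(stdin) would count the lines piped into the program.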


The for loop deserves special attention, as it is slightly more complicated than the other two:


<syntaxhighlight lang="c">
int i;
for (i = 0; i < 10; i++) { // start "i" at 0; loop while "i" is less than 10; add 1 to "i" after each pass
    // code to repeat 10 times (i = 0, 1, ..., 9)
}
</syntaxhighlight>

The loop header has three parts, separated by semicolons. The first part, in this case "i = 0", is the initializer: it runs once, before the first iteration, so here it sets the value of "i" to 0. The second part is the condition (or predicate), an expression treated as a boolean (true/false) value; it is checked before every iteration, and as soon as it evaluates to false (0), the loop ends. The third part, here "i++", is an expression that is evaluated *at the end of* every full iteration, before the condition is checked again.
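One way to see what each part of the header does is to rewrite a for loop as the equivalent while loop. In the sketch below (function names are hypothetical), both functions add up 0 + 1 + ... + (n - 1); the comments mark where the initializer, condition and step end up. The equivalence holds as long as the body contains no continue statement, since in a for loop continue still runs the step expression.

```c
/* Sums 0 + 1 + ... + (n - 1) with a for loop. */
int sum_below_for(int n)
{
    int total = 0;
    int i;
    for (i = 0; i < n; i++)   /* initializer; condition; step */
        total += i;
    return total;
}

/* The same loop written as a while loop. */
int sum_below_while(int n)
{
    int total = 0;
    int i = 0;                /* 1. initializer: runs once, before the loop */
    while (i < n) {           /* 2. condition: checked before every iteration */
        total += i;
        i++;                  /* 3. step: runs at the end of every iteration */
    }
    return total;
}
```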


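The following while loop keeps reading characters from the user until a 'c' is entered:

<syntaxhighlight lang="c">
char myChar = '\0';               // initialize so the first comparison is well-defined
while (myChar != 'c') {           // while "myChar" does not equal 'c', keep looping
    if (scanf("%c", &myChar) != 1) // get input from the user and put it into "myChar"
        break;                    // stop at end of input or on a read error
}
</syntaxhighlight>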


A do-while loop runs its body once before the condition is checked for the first time:

<syntaxhighlight lang="c">
int x = 0;
do {                 // the body always runs at least once
    x = x + 1;       // x equals itself plus one (if x equals 0, then x = 0 + 1)
} while (x < 2);     // loop again while x is less than 2
</syntaxhighlight>
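The difference between do-while and while only shows when the condition is false from the start. In this sketch (function names are hypothetical), both loops use the body and condition from the example above and report how many passes they made: starting from x = 0, both run twice, but starting from x = 5, the do-while loop still runs once while the while loop does not run at all.

```c
/* Counts the passes of a do-while loop: the body runs before the test. */
int do_while_passes(int x)
{
    int passes = 0;
    do {
        passes++;
        x = x + 1;
    } while (x < 2);
    return passes;
}

/* Same body and condition in a while loop, which tests first. */
int while_passes(int x)
{
    int passes = 0;
    while (x < 2) {
        passes++;
        x = x + 1;
    }
    return passes;
}
```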


If/Else

If/else statements control the flow of execution of your code. They work like asking a question: depending on whether the answer (the condition) is true or false, the program executes a different block of code.

Simple example:


<syntaxhighlight lang="c">
if (1 == 1) { // if 1 equals 1, execute the true code block
    printf("This is the true code block");  // executed when the condition is true
} else {      // if the condition is not true
    printf("This is the false code block"); // executed when the condition is false
}
</syntaxhighlight>


You can chain "else" and "if" to create complex flows. Here is an example:


<syntaxhighlight lang="c">
if (a == 1) {          // if a equals 1, execute the following block
    printf("a equals 1!");
} else if (b == 1) {   // otherwise, check if b equals 1
    printf("b equals 1!");
} else {               // runs only if all the conditions before it failed: a "catch-all"
    printf("Neither a nor b equals 1!");
}
</syntaxhighlight>
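As a further sketch of an else if chain, the hypothetical letter_grade function below maps a score to a letter. Because the conditions are tested from top to bottom and only the first true branch runs, each later condition can rely on the earlier ones having failed, and the final else catches every remaining value. The grade boundaries are arbitrary examples.

```c
/* Hypothetical example: maps a score (0-100) to a letter grade. */
char letter_grade(int score)
{
    if (score >= 90) {
        return 'A';
    } else if (score >= 80) {   /* reached only if score < 90 */
        return 'B';
    } else if (score >= 70) {   /* reached only if score < 80 */
        return 'C';
    } else {                    /* catch-all for every remaining score */
        return 'F';
    }
}
```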

Optimizing Security of your Programs

This part presents a few functions that should be avoided, the safer counterparts that should be preferred, and general advice on secure programming in C.

Avoiding Buffer Overflow Vulnerabilities

Buffer overflows occur when a program tries to store more data in a buffer than the memory allocated for it. For example, an array of 9 characters has space for 8 characters plus the terminating null byte, so if it receives its input from stdin without a length check, a user can easily corrupt your program's memory simply by entering 9 or more characters. This can only be avoided by limiting every write to the size of the buffer and validating your input. Consider the following code snippet:

<syntaxhighlight lang="c">

 char password[9];
 printf("Please enter your password: ");
 fflush(stdout);
 gets(password); /* UNSAFE: gets() cannot limit the input length (removed in C11) */

</syntaxhighlight>

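The danger lies in the gets() function, which copies an entire input line into the buffer without knowing the buffer's size. Because password is a local array on the stack, longer input overwrites adjacent stack memory, which will most likely corrupt data and can be exploited to take control of the program. gets() is so dangerous that it was removed from the C standard in C11. The safe solution is to use fgets() instead, which takes the buffer size as an additional parameter and never writes past it. Simply change the last line of the snippet to: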

   fgets(password, sizeof password, stdin);

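With this change, fgets() reads at most 8 characters (one less than the buffer size) and always appends the terminating null byte, so the buffer cannot overflow. Two details remain to handle: if the input line fits, fgets() keeps its trailing newline, and if the line is too long, the remaining characters stay unread in stdin and will be returned by the next read. Checking the length of the result (at most sizeof(password) - 1 characters) and whether it ends in '\n' tells you which case occurred.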
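The checks described above can be wrapped in a small helper. The following is a minimal sketch, assuming a hypothetical function read_line that returns 0 when the whole line fit, 1 when it was truncated, and -1 at end of input; it strips the newline and discards the unread remainder of an over-long line so it cannot leak into the next read.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper built on fgets(): reads one line into buf (of the
 * given size, at least 2), strips the newline, and discards the rest of
 * an over-long line. Returns 0 if the line fit, 1 if it was truncated,
 * -1 on end of file or read error. */
int read_line(char *buf, size_t size, FILE *stream)
{
    if (fgets(buf, (int)size, stream) == NULL)
        return -1;                       /* end of file or read error */

    size_t len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';             /* whole line fit: drop the newline */
        return 0;
    }

    /* No newline in buf: read and discard the rest of this line. If any
     * character other than the newline is found, input was truncated. */
    int c;
    int truncated = 0;
    while ((c = fgetc(stream)) != EOF && c != '\n')
        truncated = 1;
    return truncated;
}
```

With the password buffer from the snippet above, read_line(password, sizeof password, stdin) can replace the fgets() call.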

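Other functions to avoid for the same reasons are strcpy() and sprintf(), which write without knowing the size of the destination, and strlen(), which reads past the end of a buffer that lacks a terminating null byte. Use bounded counterparts instead: snprintf(), strnlen() (a POSIX function, not part of ISO C) and strncpy(). Be aware that strncpy() does not null-terminate the destination when the source is too long, so you must terminate it yourself.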
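As an illustration of the bounded alternatives, here is a sketch of two hypothetical copy helpers. copy_string uses snprintf(), which never writes more than the given size, always null-terminates (for a size greater than 0) and returns the length the complete result would have had, so truncation can be detected. copy_with_strncpy shows the extra step strncpy() needs: terminating the destination yourself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies src into dst (of the given size) with
 * snprintf(). Returns 0 if the whole string fit, 1 if it was truncated,
 * -1 on an encoding error. */
int copy_string(char *dst, size_t size, const char *src)
{
    int needed = snprintf(dst, size, "%s", src);
    if (needed < 0)
        return -1;
    return (size_t)needed >= size;   /* result needed more room than size */
}

/* Hypothetical helper: strncpy() plus explicit termination.
 * Requires size > 0. */
void copy_with_strncpy(char *dst, size_t size, const char *src)
{
    strncpy(dst, src, size - 1);     /* copies at most size - 1 chars */
    dst[size - 1] = '\0';            /* strncpy may not have terminated dst */
}
```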

Program Environment

A well-written program should never rely on assumptions about its environment, such as its working directory or the value of its umask. For this reason, use absolute path names instead of relative ones when working with external files. Also consider which UID and GID your program runs under: it is extremely important that vulnerable or potentially malicious programs do not run as root. Always grant the program only the permissions it absolutely needs to perform its tasks, for example by running it under a dedicated UID and a corresponding GID that have access only to the resources the program requires.
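A common way to apply this advice is for a program that must start as root to switch to an unprivileged account as early as possible. The following is a minimal sketch using the POSIX setgid() and setuid() calls; drop_privileges is a hypothetical helper, and the UID/GID to switch to (for example those of a dedicated service account) are left to the caller. The group is changed first, because after giving up root the process may no longer be allowed to change its group. Supplementary groups are not handled here; on Linux and the BSDs this is usually done with setgroups() before setgid().

```c
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: permanently switches the process to the given
 * user and group. Returns 0 on success, -1 on failure; on failure the
 * caller should exit rather than continue with elevated privileges. */
int drop_privileges(uid_t uid, gid_t gid)
{
    if (setgid(gid) == -1)        /* group first, while still privileged */
        return -1;
    if (setuid(uid) == -1)
        return -1;
    /* Verify the drop is permanent: regaining root must now fail. */
    if (uid != 0 && setuid(0) != -1)
        return -1;
    return 0;
}
```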

system() & popen()

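These functions are used to run external programs installed on your system. They should be avoided because they spawn a shell (/bin/sh) to do so: the shell interprets the whole command string, so untrusted data placed in it can inject additional commands, and the shell's behaviour depends on environment variables such as PATH. Instead, use fork() together with one of the exec() family of functions, which runs the program directly with an explicit argument list and no shell, achieving the same goal without compromising the security of your program's environment.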
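Here is a minimal sketch of the fork()/exec() approach, assuming a hypothetical helper run_program that takes the program's absolute path and its arguments as a NULL-terminated array. execv() replaces the child process with the target program and passes every argument through unchanged, so no shell ever interprets characters such as ; or | in untrusted input. The parent uses waitpid() to collect the exit status; returning 127 when the program cannot be executed follows the shell's convention.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program at the absolute path argv[0]
 * with the NULL-terminated argument list argv, without a shell.
 * Returns the program's exit status, 127 if it could not be executed,
 * or -1 if fork()/waitpid() failed or the program was killed by a signal. */
int run_program(char *const argv[])
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;                    /* could not create a child process */

    if (pid == 0) {
        /* Child: replace this process image with the target program. */
        execv(argv[0], argv);
        _exit(127);                   /* only reached if execv() failed */
    }

    /* Parent: wait for the child and collect its exit status. */
    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;                        /* terminated by a signal */
}
```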


Compilation

To run a finished program, you first have to create an executable binary by compiling the source code with cc/gcc:

 gcc -Wall -o <output file> <sourcecode file>

where -Wall enables most of the commonly useful compiler warnings (despite its name, not all of them; -Wextra adds more). gcc already marks the output file as executable, so no separate chmod +x is needed.

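Sometimes you will have to link against a specific library and, if the linker cannot find it, specify the directory in which it is located. This is done with the -l and -L options: -l takes the library name without the "lib" prefix and the file extension, and -L adds a directory to the library search path. For example, to link against the library libconv-core.a located in /usr/local/char/lib, the command would look like this (place -l options after the source files, because the linker resolves symbols in command-line order):

 gcc -Wall -o <output file> <sourcecode file> -L/usr/local/char/lib -lconv-core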

Example Program

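This simple program uses two helper functions, daten (numeric formats) and datew (alphabetic formats), called in the chain main -> daten -> datew, to generate a list of dates between the year 999 and the end of 2012, in all-numeric and a few alphabetic formats (English and French month names), using only the simple methods described in this article. It prints every day from 1 to 31 for every month, so impossible dates such as 31 February are included too, which is harmless for a wordlist.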

<syntaxhighlight lang="c">
#include <stdio.h>

/* datew() is defined below daten(), so declare it before its first use. */
void datew(int day, int month, int n, const char *limiter);

/* Prints numeric formats for every day (1-31) of every month of the
   years 999 to 2012, then calls datew() for the alphabetic formats. */
void daten(const char *limiter) {
    int month;
    int day;
    int n;
    for (n = 999; n < 2013; n++) {
        for (month = 1; month < 13; month++) {
            for (day = 1; day < 32; day++) {
                if ((day < 10) && (month < 10)) {
                    printf("0%d%s%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s%d%s0%d\n", n, limiter, month, limiter, day);
                    printf("%d%s0%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s0%d%s%d\n", n, limiter, month, limiter, day);
                    printf("0%d%s0%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s0%d%s0%d\n", n, limiter, month, limiter, day);
                    printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                }
                else if ((day < 10) && (month >= 10)) {
                    printf("0%d%s%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s0%d%s%d\n", month, limiter, day, limiter, n);
                    printf("%d%s%d%s0%d\n", n, limiter, month, limiter, day);
                    printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s%d%s%d\n", month, limiter, day, limiter, n);
                    printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                }
                else if ((month < 10) && (day >= 10)) {
                    printf("%d%s0%d%s%d\n", day, limiter, month, limiter, n);
                    printf("0%d%s%d%s%d\n", month, limiter, day, limiter, n);
                    printf("%d%s0%d%s%d\n", n, limiter, month, limiter, day);
                    printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s%d%s%d\n", month, limiter, day, limiter, n);
                    printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                }
                else {
                    printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                    printf("%d%s%d%s%d\n", month, limiter, day, limiter, n);
                    printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                }
                datew(day, month, n, limiter);
            }
        }
    }
}

/* Month names, indexed by month - 1. The French names use ISO-8859-1
   (Latin-1) byte escapes: \xe9 is é and \xfb is û. Each escape ends its
   string literal ("D\xe9" "cembre") so that a following letter that is
   also a hex digit, like the 'c' in "cembre", is not absorbed into it. */
static const char *const en_upper[12] = {
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December"
};
static const char *const en_lower[12] = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december"
};
static const char *const fr_upper[12] = {
    "Janvier", "F\xe9" "vrier", "Mars", "Avril", "Mai", "Juin",
    "Juillet", "Ao\xfb" "t", "Septembre", "Octobre", "Novembre", "D\xe9" "cembre"
};
static const char *const fr_lower[12] = {
    "janvier", "f\xe9" "vrier", "mars", "avril", "mai", "juin",
    "juillet", "ao\xfb" "t", "septembre", "octobre", "novembre", "d\xe9" "cembre"
};

/* Prints alphabetic formats (English and French month names) for one date. */
void datew(int day, int month, int n, const char *limiter) {
    const char *en1 = en_upper[month - 1];
    const char *en2 = en_lower[month - 1];
    const char *fr1 = fr_upper[month - 1];
    const char *fr2 = fr_lower[month - 1];
    if (day < 10) {
        printf("0%d%s%s%s%d\n", day, limiter, fr1, limiter, n);
        printf("%s%s0%d%s%d\n", fr1, limiter, day, limiter, n);
        printf("0%d%s%s%s%d\n", day, limiter, fr2, limiter, n);
        printf("%s%s0%d%s%d\n", fr2, limiter, day, limiter, n);
        printf("0%d%s%s%s%d\n", day, limiter, en1, limiter, n);
        printf("%s%s0%d%s%d\n", en1, limiter, day, limiter, n);
        printf("0%d%s%s%s%d\n", day, limiter, en2, limiter, n);
        printf("%s%s0%d%s%d\n", en2, limiter, day, limiter, n);
    }
    printf("%d%s%s%s%d\n", day, limiter, en1, limiter, n);
    printf("%s%s%d%s%d\n", en1, limiter, day, limiter, n);
    printf("%d%s%s%s%d\n", day, limiter, en2, limiter, n);
    printf("%s%s%d%s%d\n", en2, limiter, day, limiter, n);
    printf("%d%s%s%s%d\n", day, limiter, fr1, limiter, n);
    printf("%s%s%d%s%d\n", fr1, limiter, day, limiter, n);
    printf("%d%s%s%s%d\n", day, limiter, fr2, limiter, n);
    printf("%s%s%d%s%d\n", fr2, limiter, day, limiter, n);
}

int main(void) {
    daten("\\");
    daten("/");
    daten(" ");
    daten("_");
    daten("*");
    daten("^");
    daten("-");
    return 0;
}
</syntaxhighlight>

To compile the program and save the binary under the filename "date-gen", run the following (gcc already marks the output file as executable):

 gcc <filename.c> -o date-gen

If you deduplicate the output with sort -u and use awk to keep only the entries between 8 and 63 characters long (the length limits of a WPA passphrase), this program creates a wordlist of a few million entries that can be used for WPA password recovery and similar purposes:

 ./date-gen | sort -u >> wordlist.txt 
 awk '{if ((length($0) >= 8) && (length($0) <= 63)){ print $0 }}' wordlist.txt > wordlist.wpa.txt
C is part of a series on programming.

Intermediate Concepts

The Language and the Machine

C is often mistaken for a low-level language, since it abstracts little in terms of memory management and does not provide many "abstracted" data structures. Indeed, the primitive types in C are based on what is available in the hardware, such as (typically) 32-bit integers and 64-bit floating-point numbers. Although it is very close to the machine in this sense, and often touted as a "portable assembler", it is still a high-level language, because its grammar is not a one-to-one (or nearly one-to-one) mapping onto the underlying instruction set. This is why it requires a compiler, whereas low-level (assembly) languages need only an assembler, which performs a largely direct translation.
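The sizes of the primitive types are chosen by each implementation to suit the target hardware, and you can inspect them with the sizeof operator. The sketch below (the function name is hypothetical) reports how many bits of storage an int occupies: the C standard only guarantees an int range that needs at least 16 bits, while 32 bits is the common choice on today's desktop and server platforms.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: number of bits of storage used by an int on this
 * platform. sizeof counts bytes; CHAR_BIT is the number of bits per byte
 * (at least 8). */
size_t int_storage_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```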

Having the language be so close to the machine is often seen as a disadvantage, but if you understand the underlying implementation stack and the environment you are working in, it instead becomes a huge advantage. It allows you to create applications that perform well and, most importantly, perform predictably, which is a prerequisite for making them secure.

Pointers in C

Pointers are an important concept and feature in C. The address of any variable can be taken and stored in a pointer, and the ability to inspect pointers gives insight into how the memory model of the underlying machine works. In short, pointers are exactly what their name implies: they "point" to other data. When you access an ordinary variable, you access the data stored in it. When you access the value of a pointer, you access an address, such as the address of another variable. When you "dereference" a pointer, you access the value -stored- at that address. Pointers are declared like so:

<syntaxhighlight lang="c">

 int *pointer; // declare integer pointer
 int number = 4; // declare variable that holds the value 4
 pointer = &number; // Assign the address of the "number" variable to the pointer
 *pointer = 5; // Assign the value 5 to the address contained within "pointer", which at the moment is the address of "number", which changes the value of "number" to 5 

</syntaxhighlight>

The data type comes first, then the pointer's name prefixed by *. In a declaration, the * marks the variable as a pointer; it is the same symbol that acts as the "dereference" operator in expressions such as *pointer = 5. The data type is necessary because it tells the compiler how much data the pointer refers to and how to interpret it: a char is one byte, while an int is usually 4 bytes. So if you access an int through a char pointer, you only access a single byte, the one at the lowest address, which is the least significant byte on a little-endian machine and the most significant byte on a big-endian one. This is allowed, and used in many cases, but make sure it is intentional: the compiler warns when you assign between incompatible pointer types unless you make the conversion explicit with a cast.
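The byte-level access described above can be demonstrated with a short sketch. The hypothetical function lowest_address_byte reads a 32-bit value through an unsigned char pointer, which C explicitly allows for any object; the explicit cast is what keeps the compiler from warning. On a little-endian machine the result for 0x01020304 is 0x04, on a big-endian machine it is 0x01.

```c
#include <stdint.h>

/* Hypothetical helper: returns the byte stored at the lowest address of
 * a 32-bit value. Character pointers may alias any object, so this is
 * well-defined; the result depends on the machine's byte order. */
unsigned char lowest_address_byte(uint32_t value)
{
    unsigned char *bytes = (unsigned char *)&value;   /* explicit cast */
    return bytes[0];
}

/* Returns 1 on little-endian machines (least significant byte first). */
int is_little_endian(void)
{
    return lowest_address_byte(1u) == 1;
}
```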

Under the hood, every variable is simply a location in memory. If you use the address-of operator &, you can get the address of any variable, which is exactly what a pointer stores as its value. The difference lies in how the name is used: accessing a variable's value reads or writes its memory directly, whereas a pointer's own value is an address, and you must dereference it to reach the data stored there. Variable names are thus a conceptual abstraction that lets the programmer think of stored values as discrete and isolated.
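A classic illustration of why access to a variable's address matters is a swap function. C passes arguments by value, so a function can only change the caller's variables if it receives their addresses and writes through them. In this sketch, swap_ints is a hypothetical name.

```c
/* Hypothetical example: swaps two ints through pointers. Each parameter
 * holds the address of one of the caller's variables. */
void swap_ints(int *a, int *b)
{
    int tmp = *a;   /* read the value stored at address a */
    *a = *b;        /* write the value found at address b into a */
    *b = tmp;       /* write the saved value into b */
}
```

For example, after int x = 1, y = 2; swap_ints(&x, &y); x holds 2 and y holds 1.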