Difference between revisions of "C"
'''C''' is a high-level programming language in which programs are written as human-readable source code. A C [[compiler]], typically <code>cc</code> (the system's C compiler) or <code>gcc</code> (the GNU C compiler), converts that source code into machine code that the computer can execute.
+ | |||
+ | = Installation = | ||
+ | |||
Most distributions include <code>gcc</code> as a base package, so no further setup is necessary to start developing C programs. If yours does not, you can install all necessary applications through your package manager with these commands:
+ | |||
+ | *Debian/Ubuntu | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="bash"> | ||
+ | # apt-get install build-essential | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | *Arch Linux | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="bash"> | ||
+ | # pacman -S base-devel | ||
+ | </source> | ||
+ | }} | ||
= Overview =

== Basic Formatting ==

Each C program follows a general format.

=== Includes ===
Includes are preprocessor directives that pull a set of declarations (functions, global variables, or compile-time macros) into a C program. They let you reuse previously implemented functionality instead of reinventing the wheel with each program. A collection of "standard" includes makes up the interface to the standard C library, and on top of that there are OS-standard includes, such as the ones defined in POSIX.

Includes in C follow this syntax:
{{code|text=
<source lang="c">
#include <library.h>
//searches for library.h in the default directory of libraries
#include "/path/library.h"
//searches for library.h in the defined path
</source>
}}
A few includes are recommended for nearly every C program, most notably <code>stdio.h</code> (a header declaring functions for basic input and output). A C program can be compiled without any includes, but you will be fairly limited in the functionality you are able to leverage. By convention, includes are placed at the beginning of a program, although this is not required.

=== The <code>main()</code> Function ===

The <code>main()</code> function is the entry point of the program. Unlike interpreted languages, which are parsed linearly and then run after the definition tree is built, most executable formats require an entry point so that the operating system knows where to "start".

The main function calls all other functions, directly or indirectly. As an example, consider the canonical "Hello World" in C:
{{code|text=
<source lang="c">
#include <stdio.h>

int main(void)
{
    printf("Hello, world!\n");
    return 0;
}
</source>
}}
+ | |||
+ | Execution starts inside of main, which prints out <code>Hello, world!</code> and exits on <code>return 0</code>. | ||
== Variables ==
A variable is a named piece of storage whose value can be read, modified, and used at a later time. To declare a variable in the C language you first declare its '''type''' and then the variable '''name'''. Some of the basic variable types are:
{{code|text= | {{code|text= | ||
Line 56: | Line 76: | ||
}} | }} | ||
Integer or <code>int</code> variables can store whole numbers, while <code>float</code>s and <code>double</code>s can hold numbers with a fractional part. A <code>char</code> type variable can only hold a single character. While C itself does not have a string variable type, you can use a null-terminated array of characters, usually referred to through a <code>char *</code> (often called a C string), to accomplish the same task.
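As a minimal sketch of the points above, the two helpers below show a <code>double</code> result computed from <code>int</code> inputs, and a string handled as a null-terminated array of <code>char</code>. The names <code>average</code> and <code>count_chars</code> are hypothetical and chosen only for illustration; <code>count_chars</code> does the same job as the standard <code>strlen</code>.

```c
#include <stddef.h>

/* Average of two whole numbers. Dividing by 2.0 (a double) keeps the
 * fractional part, which an int result would throw away. */
double average(int a, int b)
{
    return (a + b) / 2.0;
}

/* Count the characters in a C string by walking the char array until
 * the terminating '\0' byte is reached. */
size_t count_chars(const char *text)
{
    size_t length = 0;
    while (text[length] != '\0') {
        length++;
    }
    return length;
}
```

Calling <code>average(1, 2)</code> gives 1.5, whereas the integer expression <code>(1 + 2) / 2</code> would give 1.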
+ | |||
Furthermore, you can declare variables of any type as pointers or arrays by adding the respective symbols. An array stores multiple values in one variable, while a pointer stores the memory address of another value:

{{code|text=
<source lang="c">
int *iPointer;    // pointer to an int
char cArray[16];  // array of 16 chars; an array definition needs a size (16 is an arbitrary example)
</source>
}}
+ | |||
+ | == Logical Operators == | ||
In C, logical operators combine or negate conditions, which helps control program flow in <code>if</code>, <code>while</code>, and <code>for</code> statements. There are three logical operators in C: logical OR, logical AND, and logical NOT.
+ | |||
+ | * || | ||
+ | The above is a logical OR, and is used as so: | ||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | if(a == 5 || a == 10) | ||
+ | printf("The 'a' variable is either 5, or 10\n"); | ||
+ | </source> | ||
+ | }} | ||
+ | |||
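* !
The logical NOT operator inverts the truth value of its operand: <code>!a</code> evaluates to 1 if <code>a</code> is 0, and to 0 otherwise. For example, <code>if(!(a == 5))</code> is true whenever <code>a</code> is not 5.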
+ | |||
+ | * && | ||
The AND operator is used similarly to the OR operator, but the condition is true only if both sides are true. An example is shown below:
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | /* Assume a and b are user input */ | ||
+ | if(a == 5 && b == 10) | ||
+ | printf("'a' and 'b' are correct.\n"); | ||
+ | </source> | ||
+ | }} | ||
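The three logical operators can be combined in a single condition. The sketch below uses the Gregorian leap-year rule as an example; the function names <code>is_leap_year</code> and <code>is_common_year</code> are hypothetical, and <code>stdbool.h</code> assumes a C99 or later compiler.

```c
#include <stdbool.h>

/* A year is a leap year if it is divisible by 4 AND not divisible by 100,
 * OR if it is divisible by 400. */
bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Logical NOT inverts the result of the check above. */
bool is_common_year(int year)
{
    return !is_leap_year(year);
}
```

The parentheses around the <code>&&</code> part are not strictly required, since <code>&&</code> binds tighter than <code>||</code>, but they make the intent easier to read.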
+ | |||
+ | == Bitwise Operators == | ||
C has six bitwise operators, twice as many as the logical operators. Where logical operators treat each operand as a single true or false value, bitwise operators work on the individual bits of integer values.
+ | |||
+ | * & | ||
This is the bitwise AND operator. It compares two values, such as <code>1101</code> and <code>0011</code>, bit by bit; a bit in the result is turned on only if the corresponding bit is turned on in both values. An example:
+ | |||
+ | {| | ||
+ | | 1101 & | ||
+ | |- | ||
+ | | 0011 | ||
+ | |- | ||
+ | | ---------- | ||
+ | |- | ||
+ | | 0001 | ||
+ | |} | ||
+ | |||
+ | * | | ||
This is the bitwise OR operator. It works like bitwise AND, except that a bit in the result is turned on if the corresponding bit is turned on in at least one of the two values:
+ | |||
+ | {| | ||
+ | | 0110 | | ||
+ | |- | ||
+ | | 1101 | ||
+ | |- | ||
+ | | ---------- | ||
+ | |- | ||
+ | | 1111 | ||
+ | |} | ||
+ | |||
+ | * ^ | ||
The XOR (exclusive OR) operator differs from OR in one respect: if the corresponding bit is turned on in both values, the result bit is 0. A result bit is 1 only when exactly one of the two bits is turned on:
+ | |||
+ | {| | ||
+ | | 0011 ^ | ||
+ | |- | ||
+ | | 1010 | ||
+ | |- | ||
+ | | ---------- | ||
+ | |- | ||
+ | | 1001 | ||
+ | |} | ||
+ | |||
+ | * ~ | ||
The unary bitwise NOT operator flips every bit of its operand: each 1 becomes 0, and vice versa.
The next two operators are a little more complicated, but still simple to grasp and use.
+ | |||
+ | * >> and << | ||
A right shift or left shift moves the bits of a value to either side by a given number of positions.
 [ Shift right by 1 ] <code>00111101 >> 1 = 00011110</code>
 [ Shift right by 2 ] <code>00111101 >> 2 = 00001111</code>

In C code, the number following the shift operator is the number of positions to shift by. An example of this:
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | int a = 1337, b = 2, c; | ||
+ | printf("'a' is currently %d\n", a); | ||
+ | c = a >> b; | ||
+ | |||
+ | printf("'a' is now %d\n", c); | ||
+ | return 0; | ||
+ | </source> | ||
+ | }} | ||
+ | The program would output the following: | ||
+ | |||
 'a' is currently 1337
 'a' is now 334
+ | |||
+ | ...which is exactly what the value would be if 1337 was converted into binary, shifted two bits to the right, and back into decimal. | ||
+ | The same applies for a left shift, however, to the opposite side: | ||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | int a = 1337, b = 2, c; | ||
+ | printf("'a' is currently %d\n", a); | ||
+ | c = a << b; | ||
+ | |||
+ | printf("'a' is now %d\n", c); | ||
+ | return 0; | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | And the program output to follow: | ||
+ | |||
 'a' is currently 1337
 'a' is now 5348
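A common use of the bitwise operators is to treat an integer as a set of on/off flags. The following is a minimal sketch, not a standard API: the helper names are hypothetical, bit 0 is taken to be the least significant bit, and <code>pos</code> is assumed to be smaller than the number of bits in an <code>unsigned int</code>.

```c
/* Turn bit number pos on: OR with a mask that has only that bit set. */
unsigned int set_bit(unsigned int value, unsigned int pos)
{
    return value | (1u << pos);
}

/* Turn bit number pos off: AND with the complement of the mask. */
unsigned int clear_bit(unsigned int value, unsigned int pos)
{
    return value & ~(1u << pos);
}

/* Flip bit number pos: XOR with the mask. */
unsigned int toggle_bit(unsigned int value, unsigned int pos)
{
    return value ^ (1u << pos);
}

/* Return 1 if bit number pos is on, 0 otherwise: shift it down to
 * position 0 and mask everything else away. */
int test_bit(unsigned int value, unsigned int pos)
{
    return (int)((value >> pos) & 1u);
}
```

Unsigned types are used here because shifting negative signed values has implementation-defined or undefined results in C.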
+ | |||
+ | == Arithmetic == | ||
+ | |||
+ | In C, there are arithmetic operators for addition, subtraction, multiplication, division, and modulus. | ||
+ | |||
{| class="wikitable" style="text-align: center;"
+ | ! align="left"| Operation | ||
+ | ! Operator | ||
+ | ! Assignment Operator | ||
+ | |- | ||
+ | | Addition || + || += | ||
+ | |- | ||
+ | | Subtraction || - || -= | ||
+ | |- | ||
+ | | Multiplication || * || *= | ||
+ | |- | ||
+ | | Division || / || /= | ||
+ | |- | ||
+ | | Modulus || % || %= | ||
+ | |} | ||
+ | |||
Each operator has a compound assignment form, which applies the operation to a variable and stores the result back in it:
+ | |||
+ | {{code|text=<source lang="c"> | ||
+ | int main(int argc, const char *argv[]) | ||
+ | { | ||
+ | int a = 3, b = 4; | ||
+ | a *= 4; | ||
+ | a += b; | ||
+ | return 0; | ||
+ | } | ||
+ | </source>}} | ||
+ | |||
On line 4, the variable ''a'' becomes ''a'' multiplied by 4 (12); on line 5, ''a'' becomes ''a'' plus ''b'' (16).
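Integer division and modulus are often used together: <code>% 10</code> yields the last decimal digit of a number and <code>/ 10</code> removes it. The sketch below adds up the digits of a non-negative number this way; <code>digit_sum</code> is a hypothetical name used only for this example.

```c
/* Sum the decimal digits of a non-negative number. */
int digit_sum(int number)
{
    int sum = 0;
    while (number > 0) {
        sum += number % 10;  /* add the last digit */
        number /= 10;        /* drop the last digit (integer division) */
    }
    return sum;
}
```

For 1337 the digits 7, 3, 3 and 1 are added in that order, giving 14.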
== Loops ==
In C there are three types of loops that allow you to accomplish a repetitive task without repeating numerous lines of code: the <code>for</code> loop, the <code>while</code> loop, and the <code>do while</code> loop. Each has its own purpose, but they share a similar syntax. Every loop is controlled by a condition: the loop keeps running for as long as the condition evaluates to true, and stops once it evaluates to false. A <code>for</code> loop is good for a repetitive task where you know how many times you want to repeat, while a <code>while</code> loop is normally used when the number of iterations is unknown in advance, such as when reading a text document. A <code>do while</code> loop is almost the same as a <code>while</code> loop, with one difference: it runs its code at least once before checking whether it should stop looping.

The <code>for</code> loop deserves special attention as it's slightly more complicated than the other two:
---- | ---- | ||
{{code|text=
<source lang="c">
int i;
for(i = 0; i < 10; i++) //Assign 0 to "i"; while "i" is less than 10, run the body and then increment "i" by one
{
    //code to repeat 10 times
}
</source>
}}
+ | |||
The loop header has 3 parts. The part before the first semicolon, in this case "i = 0", is the loop initializer. It is run once when the loop is first entered, so in this case it sets the value of "i" to 0. The second part is the check or predicate: an expression that's treated as a boolean (true/false) value and evaluated before each iteration. If it ever evaluates to false (or 0), the loop ends. The last part is executed ''at the end of'' each full iteration of the loop.
---- | ---- | ||
{{code|text=
<source lang="c">
char myChar = '\0'; //initialize so the first comparison reads a defined value
while(myChar != 'c') //While "myChar" does not equal "c" continue to loop
{
    scanf("%c", &myChar); //get input from the user and put it into variable "myChar"
}
</source>
}}
{{code|text=
<source lang="c">
int x = 0; //start from 0 (assumed here; the loop needs x to be declared and initialized)
do //loop at least once
{
    x = x + 1; //variable x equals itself plus one (if x equals 0 then x = 0 + 1)
} while(x < 2); //keep looping while x is less than 2
</source>
}}
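To compare the three loop types side by side, here is a small sketch with one function per loop. The function names are hypothetical. Note how the <code>do while</code> version naturally handles the number 0, which still has one digit, because its body runs once before the condition is checked.

```c
/* for loop: the number of iterations is known up front.
 * Returns 1 + 2 + ... + n (0 if n is less than 1). */
int sum_first(int n)
{
    int total = 0;
    int i;
    for (i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

/* while loop: repeat until a condition stops holding.
 * Returns how many times n can be halved before it reaches 1. */
int halvings(int n)
{
    int count = 0;
    while (n > 1) {
        n /= 2;
        count++;
    }
    return count;
}

/* do while loop: the body always runs at least once.
 * Returns the number of decimal digits in a non-negative number. */
int count_digits(int number)
{
    int digits = 0;
    do {
        digits++;
        number /= 10;
    } while (number != 0);
    return digits;
}
```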
== If/Else ==
If/Else statements are used when you need some way to control the flow of execution of your code. These statements are like asking a question: depending on whether the answer is true or false, the program executes a different block of code.

Simple example:
---- | ---- | ||
{{code|text=
<source lang="c">
if(1 == 1) //if 1 equals 1 execute the true code block
{
    printf("This is the true code block"); //execute the true code block
}
else{ //if the statement is not true
    printf("This is the false code block"); //execute the false code block
}
</source>
}}
---- | ---- | ||
+ | You can chain "else" and "if" to create complex flows. Here is an example: | ||
+ | |||
+ | ---- | ||
+ | |||
{{code|text=
<source lang="c">
if(a == 1) //if a equals 1 execute the following block
{
    printf("a equals 1!");
}
else if (b == 1) { //otherwise, check if b equals 1
    printf("b equals 1!");
}
else { // This executes only if all the predicates before it failed; else can be used as a "catch-all"
    printf("Neither a nor b equals 1!");
}
</source>
}}
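An if / else if / else chain also works well inside a function that returns a different result for each case. The following is a minimal sketch; <code>describe_sign</code> is a hypothetical name, and the returned strings are literals, so the caller must not modify or free them.

```c
/* Classify a number using an if / else if / else chain. */
const char *describe_sign(int number)
{
    if (number < 0) {
        return "negative";
    }
    else if (number == 0) {
        return "zero";
    }
    else {  /* catch-all: everything that is neither negative nor zero */
        return "positive";
    }
}
```

Only the first branch whose condition is true runs; the remaining conditions are not evaluated.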
+ | |||
+ | ---- | ||
+ | == File Stream == | ||
+ | |||
Opening a file in C so that your program can use the information it contains is a bit more complicated than one would suspect. The following example shows how to read the entire contents of a file into a local variable.
+ | |||
{{code|text=
<source lang="c">
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    long fLen = 0;
    FILE *textFile = NULL;
    char *fileBytes = NULL;

    if ((textFile = fopen("/home/directory/textfile", "r")) == NULL) { // try to open the file for reading
        printf("No file.\n");                   // fopen returns NULL if the file cannot be opened
        return 1;
    }

    fseek(textFile, 0, SEEK_END);               // jump to the end to determine the file length
    fLen = ftell(textFile);
    if (fLen < 0) {                             // ftell returns -1 on error
        fclose(textFile);
        return 1;
    }

    if ((fileBytes = malloc((size_t)fLen + 1)) == NULL) { // allocate fLen bytes plus 1 for the terminating '\0'
        printf("Memory allocation failed.\n");
        fclose(textFile);
        return 1;
    }

    fseek(textFile, 0, SEEK_SET);               // rewind to the start of the file
    fLen = (long)fread(fileBytes, sizeof(char), (size_t)fLen, textFile); // copy the content into fileBytes
    fileBytes[fLen] = '\0';                     // terminate the string right after the last byte read

    printf("%s", fileBytes);                    // print out the file content

    free(fileBytes);                            // release the buffer and close the file when done
    fclose(textFile);
    return 0;
}
</source>
}}
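When the whole file does not need to be held in memory, it can also be processed one character at a time. The sketch below counts newline characters with <code>fgetc</code>; the function name <code>count_lines</code> and its convention of returning -1 on failure are assumptions made for this example.

```c
#include <stdio.h>

/* Count the newline characters in the file at path.
 * Returns -1 if the file cannot be opened. */
long count_lines(const char *path)
{
    FILE *file = fopen(path, "r");
    long lines = 0;
    int ch;  /* int, not char, so that EOF can be told apart from real data */

    if (file == NULL) {
        return -1;
    }

    while ((ch = fgetc(file)) != EOF) {
        if (ch == '\n') {
            lines++;
        }
    }

    fclose(file);
    return lines;
}
```

Because only one character is held at a time, this approach needs no <code>malloc</code> and works for files of any size.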
+ | |||
+ | ---- | ||
+ | |||
+ | == Compilation == | ||
+ | |||
In order to run a finished program you will have to create an executable binary by compiling the source code, which is done with <code>cc</code> or <code>gcc</code>:
+ | |||
+ | {{code|text= | ||
+ | <source lang="bash"> | ||
+ | gcc -Wall -o <output file> <sourcecode file> | ||
+ | chmod +x <output file> | ||
+ | </source> | ||
+ | }} | ||
+ | |||
where <i>-Wall</i> enables most of the compiler's commonly useful warnings during the compilation process.
+ | |||
Sometimes you will have to link against a specific library and, if it's not found, specify the directory in which it is contained. This is done with the <code>-l</code> and <code>-L</code> parameters. For example, if you need to link the library libconv-core.a, which is located in /usr/local/char/lib, the command would look like this (libraries are listed after the source file, because the linker resolves symbols from left to right):

{{code|text=
<source lang="bash">
gcc -Wall -o <output file> <sourcecode file> -L/usr/local/char/lib -lconv-core
</source>
}}
+ | |||
+ | ---- | ||
+ | |||
+ | == Example Program == | ||
+ | |||
This simple little program makes use of two functions (<code>daten</code> and <code>datew</code>), each called by another function (main->daten->datew), in order to generate a list of all dates between the year 999 and the end of 2012, in all numeric and a few alphabetic formats, using only the simple methods described in this article.
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | #include <stdio.h> | ||
+ | |||
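+ | int datew(int day, int month, int n, char *limiter); // declared here because daten() calls it before its definition | ||
+ | |||
+ | int daten(char *limiter) { | ||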
+ | int month; | ||
+ | int day; | ||
+ | int n; | ||
+ | for (n = 999; n < 2013; n++) { | ||
+ | for (month = 1; month < 13; month++) { | ||
+ | for (day = 1; day < 32; day++) { | ||
+ | if ((day < 10) && (month < 10)) { | ||
+ | printf("0%d%s%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s%d%s0%d\n", n, limiter, month, limiter, day); | ||
+ | printf("%d%s0%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s0%d%s%d\n", n, limiter, month, limiter, day); | ||
+ | printf("0%d%s0%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s0%d%s0%d\n", n, limiter, month, limiter, day); | ||
+ | printf("%d%s%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", n, limiter, month, limiter, day); | ||
+ | } | ||
+ | else if ((day < 10) && (month >= 10)) { | ||
+ | printf("0%d%s%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s0%d%s%d\n", month, limiter, day, limiter, n); | ||
+ | printf("%d%s%d%s0%d\n", n, limiter, month, limiter, day); | ||
+ | printf("%d%s%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", month, limiter, day, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", n, limiter, month, limiter, day); | ||
+ | } | ||
+ | else if ((month < 10) && (day >= 10)) { | ||
+ | printf("%d%s0%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("0%d%s%d%s%d\n", month, limiter, day, limiter, n); | ||
+ | printf("%d%s0%d%s%d\n", n, limiter, month, limiter, day); | ||
+ | printf("%d%s%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", month, limiter, day, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", n, limiter, month, limiter, day); | ||
+ | } | ||
+ | else { | ||
+ | printf("%d%s%d%s%d\n", day, limiter, month, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", month, limiter, day, limiter, n); | ||
+ | printf("%d%s%d%s%d\n", n, limiter, month, limiter, day); | ||
+ | } | ||
+ | datew(day, month, n, limiter); | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | } | ||
+ | |||
+ | int datew(int day, int month, int n, char *limiter) { | ||
+ | char *en; | ||
+ | char *fr; | ||
+ | if (month == 1) { | ||
+ | en = "January"; | ||
+ | fr = "Janvier"; | ||
+ | } | ||
+ | if (month == 2) { | ||
+ | en = "February"; | ||
+ | fr = "F\xe9" "vrier"; | ||
+ | } | ||
+ | if (month == 3) { | ||
+ | en = "March"; | ||
+ | fr = "Mars"; | ||
+ | } | ||
+ | if (month == 4) { | ||
+ | en = "April"; | ||
+ | fr = "Avril"; | ||
+ | } | ||
+ | if (month == 5) { | ||
+ | en = "May"; | ||
+ | fr = "Mai"; | ||
+ | } | ||
+ | if (month == 6) { | ||
+ | en = "June"; | ||
+ | fr = "Juin"; | ||
+ | } | ||
+ | if (month == 7) { | ||
+ | en = "July"; | ||
+ | fr = "Juillet"; | ||
+ | } | ||
+ | if (month == 8) { | ||
+ | en = "August"; | ||
+ | fr = "Ao\xfb" "t"; | ||
+ | } | ||
+ | if (month == 9) { | ||
+ | en = "September"; | ||
+ | fr = "Septembre"; | ||
+ | } | ||
+ | if (month == 10) { | ||
+ | en = "October"; | ||
+ | fr = "Octobre"; | ||
+ | } | ||
+ | if (month == 11) { | ||
+ | en = "November"; | ||
+ | fr = "Novembre"; | ||
+ | } | ||
+ | if (month == 12) { | ||
+ | en = "December"; | ||
+ | fr = "D\xe9" "cembre"; | ||
+ | } | ||
+ | if (day < 10) { | ||
+ | printf("0%d%s%s%s%d\n", day, limiter, en, limiter, n); | ||
+ | printf("%s%s0%d%s%d\n", en, limiter, day, limiter, n); | ||
+ | printf("0%d%s%s%s%d\n", day, limiter, fr, limiter, n); | ||
+ | printf("%s%s0%d%s%d\n", fr, limiter, day, limiter, n); | ||
+ | } | ||
+ | printf("%d%s%s%s%d\n", day, limiter, en, limiter, n); | ||
+ | printf("%s%s%d%s%d\n", en, limiter, day, limiter, n); | ||
+ | printf("%d%s%s%s%d\n", day, limiter, fr, limiter, n); | ||
+ | printf("%s%s%d%s%d\n", fr, limiter, day, limiter, n); | ||
+ | } | ||
+ | |||
+ | int main(void) { | ||
+ | daten("\\"); | ||
+ | daten("/"); | ||
+ | daten(" "); | ||
+ | daten("_"); | ||
+ | daten("*"); | ||
+ | daten("^"); | ||
+ | daten("-"); | ||
+ | } | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | In order to compile and save the binary under the filename "date-gen" run the following: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="bash"> | ||
+ | gcc <filename.c> -o date-gen | ||
+ | chmod +x date-gen | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | <i>If you sort -u the output and use awk to delete all entries that're shorter than 8 characters this script will create a wordlist of a few million entries which can be used for the purpose of WPA password-recovery and the likes.</i> | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="bash"> | ||
+ | ./date-gen | sort -u >> wordlist.txt | ||
+ | awk '{if ((length($0) >= 8) && (length($0) <= 63)){ print $0 }}' wordlist.txt > wordlist.wpa.txt | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | = Intermediate Concepts = | ||
+ | |||
+ | In this section the implementation and design of the C language is covered, along with some of the harder to understand concepts in C. | ||
+ | |||
+ | == The Language and the Machine == | ||
+ | |||
C is often mistaken for a low-level language, since it abstracts little in terms of memory management and doesn't provide many "abstracted" data structures. In fact, the primitive types in C are based on what is available in the hardware: 32-bit integers, 64-bit floats, and so on. Although it is very close to the machine in this sense, and often touted as a "portable assembler", it is still a high-level language since its grammar is not a 1 to 1 (or close) mapping to the underlying instruction set. It requires a compiler because it is a high-level language, whereas low-level languages need only assemblers or substitution/translator operations.
+ | |||
+ | Having the language be so close to the machine is often seen as a disadvantage, but if you understand the underlying implementation stack and the environment you're working in, it instead becomes a huge advantage. It allows you to create applications that perform well, and most importantly, perform predictably, and hence are secure. | ||
+ | |||
+ | === Pointers in C === | ||
+ | |||
Pointers are an important concept and feature in C. Every variable has an address that can be stored in a pointer, and the ability to inspect pointers gives one an insight into how the memory model of the underlying machine works. In short, pointers are exactly what their name implies: they "point" to other data. When you access a variable, you are accessing the data the variable is associated with. When you access the value of a pointer, you are accessing the address of the data it points to. When you "dereference" a pointer, you are accessing the value ''stored'' at that address. Pointers are declared like so:
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | int *pointer; // declare integer pointer | ||
+ | int number = 4; // declare variable that holds the value 4 | ||
+ | pointer = &number; // Assign the address of the "number" variable to the pointer | ||
+ | *pointer = 5; // Assign the value 5 to the address contained within "pointer", | ||
+ | // which at the moment is the address of "number", which changes the value of "number" to 5 | ||
+ | </source> | ||
+ | }} | ||
+ | |||
The data type comes first, then the pointer's name prefixed by the <code>*</code>, usually called the "dereference" operator. The data type is necessary as it tells the compiler how much data the pointer's domain covers: a <code>char</code> is one byte, whereas an <code>int</code> is usually 4 bytes, so if you access an <code>int</code> through a <code>char</code> pointer, you will only access the byte at the lowest address (the least significant byte on a little-endian machine, the most significant on a big-endian one). This is allowed, and used in many cases, but you should make sure that it's intentional. The compiler will normally issue a warning when you assign between incompatible pointer types unless you silence it with a cast (a cast is just a way to tell the compiler explicitly how to treat the data, so that it doesn't complain; it will be covered in later sections).
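The effect of reading an <code>int</code> through a character pointer can be sketched as follows. The function name <code>first_byte</code> is hypothetical. Which byte of the integer sits at the lowest address depends on the machine's endianness, so the result is only predictable for values whose bytes are all equal, or when the byte order of the platform is known.

```c
/* Return the byte stored at the lowest address of an int.
 * The explicit cast tells the compiler the narrower access is intentional. */
unsigned char first_byte(const int *value)
{
    const unsigned char *bytes = (const unsigned char *)value;
    return bytes[0];
}
```

On a little-endian machine, <code>first_byte</code> applied to an <code>int</code> holding 5 would return 5, since the least significant byte is stored first.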
+ | |||
In a sense, every variable behaves like an already-dereferenced pointer. If you use the address-of operator <code>&</code>, you can get the address of any variable, which is exactly what a pointer stores as its value. Accessing a variable's value reads the memory at its address directly, whereas accessing a pointer's value gives you the address at which some other value is stored. It's only a conceptual abstraction designed to give the programmer the ability to think of stored values as discrete and isolated.
+ | |||
+ | === The memory model === | ||
+ | |||
What is meant by "the address of" a variable? Implementation-wise, local variables are stored in a linear region of memory called the "stack". The implementation of a stack itself will be covered later on in the Data Structures section, but in short, you can push data onto it and pop data from it. A "push" puts data on top of the stack, and a "pop" retrieves data from the top of the stack.

You can think of the stack as a pile of papers in which each paper has a number. The one on the bottom has the lowest page number, '0', and the one on the top has the highest. If you "pop" a paper, you take one off the top of the pile. If you "push" a paper, you add one to the top of the pile, increasing the page count. Each page has an "address", which is its page number, and the value at that "address" is what is written on the page. In the computer's stack, each address holds a 1-byte value, which can be written on the page as two [[Assembly_Basics#Counting|hexadecimal]] digits.

In the machine's memory model, the stack is allocated at a high address and "grows" down as more data is allocated. The paper pile analogy still works: if, for example, the stack starts at memory address 100 and I "push" 4 bytes (pieces of paper) onto it, the new address (number of free pages) will be 96. So you can view the machine's stack as a subtractive model. You start with a certain number of blank pages, and each "push" decreases the number of blank pages. A "pop" is similar to blanking out the pages again. The current address, that is, which page to write to next or pop from, is stored in the stack pointer.
+ | |||
+ | As a practical example, I'll describe what happens in the stack in the following C source snippet: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | int varone; | ||
+ | int vartwo; | ||
+ | int *point; | ||
+ | varone = 5; | ||
+ | point = &varone; | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | Here we declare 3 variables. Two integers and one integer pointer. The two integers each take up 4 bytes, so if the stack space started at 100, the first integer is stored at 97-100, the second integer is stored at 93-96, and, assuming a 32-bit architecture, the integer pointer is stored at 89-92. Now, when we assign the value '5' to varone, we are actually writing that to the pages 97-100. Stored as a 32-bit integer in [[Byte#Little-endianness|little-endian]], the pages would look as follows: 0x05,0x00,0x00,0x00 where 0x05 is page 97 and pages 98, 99, and 100 have a 0x00 on them. In the next line we assign the address of varone to point. Remember the address of varone is 97, since that's where its data starts on the page, so we are writing 0x61,0x00,0x00,0x00 to pages 89, 90, 91, 92 respectively. 0x61 equals 97 in hex. So when we dereference the pointer "point" in C, we are actually saying "access the data at page 97-100". When we assign a value to the pointer, we are just changing the page number saved inside the pointer. In this sense you can think of pointers like memory bookmarks. | ||
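The downward-growing stack described in this section can be modelled with a plain array. This is only a toy sketch: the <code>toy_stack</code> type and its functions are hypothetical names, real stacks are managed by the compiler and CPU rather than by code like this, and the array uses 0-based indices, so four pushed bytes land at indices 96 to 99 rather than on pages 97 to 100.

```c
#include <stddef.h>

#define TOY_STACK_SIZE 100

struct toy_stack {
    unsigned char memory[TOY_STACK_SIZE];
    size_t top;  /* plays the role of the stack pointer; TOY_STACK_SIZE when empty */
};

/* An empty stack starts at the highest address. */
void toy_init(struct toy_stack *stack)
{
    stack->top = TOY_STACK_SIZE;
}

/* Push one byte: move the stack pointer down, then write.
 * Returns 0 on success, -1 if the stack is full. */
int toy_push(struct toy_stack *stack, unsigned char byte)
{
    if (stack->top == 0) {
        return -1;
    }
    stack->top--;
    stack->memory[stack->top] = byte;
    return 0;
}

/* Pop one byte: read, then move the stack pointer back up.
 * Returns 0 on success, -1 if the stack is empty. */
int toy_pop(struct toy_stack *stack, unsigned char *out)
{
    if (stack->top == TOY_STACK_SIZE) {
        return -1;
    }
    *out = stack->memory[stack->top];
    stack->top++;
    return 0;
}
```

After <code>toy_init</code> and four calls to <code>toy_push</code>, <code>top</code> has moved from 100 down to 96, mirroring the example in the text.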
+ | |||
+ | === By reference vs By value === | ||
+ | |||
+ | Building on pointers, in C you can pass values to functions either by reference or by value. Passing variables by reference means you're passing the address of that variable to the function, so only the 4 (or 8) byte address gets pushed onto the stack. If you pass variables by value, the entire data block the variable references is pushed onto the stack. An important difference is that if data is passed by value, a function has access to a local -copy- of the data, which disappears after the function finishes. If a variable is passed by reference, the function operates on the same memory the original variable referenced, so any changes made will persist when the function ends. | ||
+ | |||
+ | Here's an example of pass by value: | ||
+ | |||
+ | |||
{{code|text=
<source lang="c">
#include <stdio.h>

void byvalue(int number){
    number = 10;
    printf("%d\n", number);
    return;
}

int main(void){
    int local = 5;
    byvalue(local);          // prints 10
    printf("%d\n", local);   // prints 5, value unchanged as function only got a copy
    return 0;
}
</source>
}}
+ | |||
Here we are pushing the value "5" onto the stack, making a copy, which the function byvalue accesses under the name "number". The memory is changed to contain "10", the value is then printed, and when the function ends, the stack gets popped again, and the memory where the "10" is held is relinquished back to the system.
+ | |||
+ | Here, instead, we will do the same thing, but we'll pass the variable by reference: | ||
+ | |||
{{code|text=
<source lang="c">
#include <stdio.h>

void byref(int *number){
    *number = 10;
    printf("%d\n", *number);
    return;
}

int main(void){
    int local = 5;
    byref(&local);           // prints 10
    printf("%d\n", local);   // prints 10, value changed since the function
                             // accessed the memory directly instead of making a copy
    return 0;
}
</source>
}}
+ | |||
+ | Here, instead of the value "5" being pushed to the stack, we are pushing the page number of "local" to the stack. When the function is called, it uses the page number to find the memory and assigns "10" to it, prints the new value at that memory, and returns. When the function returns, only the memory bookmark (the pointer) is released. The value remains changed, since the function wrote directly to memory that belongs to the caller. So when the main function prints the value of "local", it results in the new value being printed, "10". | ||
+ | |||
+ | In most cases, especially when dealing with large data structures, it is more efficient to pass variables by reference. If the function does not need, or should not have, write access to the original memory, you can either pass by value or mark the pointed-to data as constant using the "const" qualifier. The latter causes problems when the data passed to the function needs to be changed temporarily, so it is something to think about on a case by case basis. | ||
+ | |||
+ | Function returns work in the same way. If a function returns a pointer, the address it holds must stay valid after the function ends, which usually means memory allocated on the heap (with malloc). If it returns a pointer to data allocated on the function's local stack, that data is released as soon as the function terminates, and using it afterwards is a common cause of segfaults. If you return a variable by value, then a copy of the data block is passed to the calling function, and it will persist in the calling function's scope. | ||
+ | |||
+ | == Data Structures == | ||
+ | |||
+ | C is a simple language by design, and therefore does not natively support many of the complex data structures other languages provide. Any data structure that is computationally feasible can still be implemented in C. Implementing the data structures yourself has two advantages: you can customize them to do exactly what you want, and because you actually understand how they work, you can use them more efficiently than if they were handed to you as an abstraction. | ||
+ | |||
+ | === The struct === | ||
+ | |||
+ | The general-purpose data structure in C is the struct. A struct definition is just a collection of values unified inside of a single variable reference. Here's an example: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | struct String { | ||
+ | int length; | ||
+ | char *string; | ||
+ | }; | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | This defines a structure named "String", which has one integer member named "length", and one character pointer named "string". Note that this does not declare the structure, as in, no memory is allocated yet. The structure is only defined. To actually declare a variable with a type of this structure, this syntax is used: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | struct String mystring; | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | This declares a variable "mystring" of type "struct String". To access the members of "mystring" the . (dot) operator is used: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | mystring.string = "A sample string"; | ||
+ | mystring.length = strlen(mystring.string); | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | Here we assign the string "A sample string" to the character pointer (the "A sample string" is stored in static memory; we've only assigned the location of the string to the pointer), and then assign the length of the string to mystring.length using the standard library function strlen, which takes a character pointer and returns the length of the string the pointer points to. | ||
+ | |||
+ | Struct definitions are used to compartmentalize data, and along with wrapper functions, you can abstract away operations and implement any data type. As a simple example, using the String structure defined above, I'll implement a String data type, which will give us the ability to use strings without having to worry explicitly about their length: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | |||
+ | #include <stdio.h> | ||
+ | #include <string.h> | ||
+ | #include <stdlib.h> | ||
+ | |||
+ | typedef struct String { | ||
+ | int length; | ||
+ | char *string; | ||
+ | } String; | ||
+ | |||
+ | |||
+ | String newString(const char *string){ | ||
+ | |||
+ | String myString; | ||
+ | myString.length = strlen(string); | ||
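+ | myString.string = malloc(myString.length + 1); /* +1 for the terminating null that printString relies on */ | ||
+ | memcpy(myString.string, string, myString.length + 1); /* copies the terminating null as well */ | ||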
+ | |||
+ | return myString; | ||
+ | } | ||
+ | |||
+ | String catString(String first, String second){ | ||
+ | |||
+ | String myString; | ||
+ | myString.length = first.length + second.length; | ||
+ | myString.string = malloc(myString.length + 1); /* +1 for the terminating null */ | ||
+ | memcpy(myString.string, first.string, first.length); | ||
+ | memcpy(myString.string+first.length, second.string, second.length); | ||
+ | myString.string[myString.length] = '\0'; | ||
+ | |||
+ | return myString; | ||
+ | } | ||
+ | |||
+ | String cloneString(String toClone){ | ||
+ | |||
+ | String myString; | ||
+ | myString.length = toClone.length; | ||
+ | myString.string = malloc(toClone.length + 1); /* +1 for the terminating null */ | ||
+ | memcpy(myString.string, toClone.string, toClone.length); | ||
+ | myString.string[myString.length] = '\0'; | ||
+ | |||
+ | return myString; | ||
+ | |||
+ | } | ||
+ | |||
+ | void appendString(String *myString, const char *toAppend){ | ||
+ | |||
+ | int toAppendLength = strlen(toAppend); | ||
+ | myString->string = realloc(myString->string, myString->length + toAppendLength + 1); /* +1 for the terminating null */ | ||
+ | memcpy(myString->string+myString->length, toAppend, toAppendLength); | ||
+ | myString->length = myString->length + toAppendLength; | ||
+ | myString->string[myString->length] = '\0'; | ||
+ | |||
+ | return; | ||
+ | } | ||
+ | |||
+ | void prependString(String *myString, const char *toPrepend){ | ||
+ | int toPrependLength = strlen(toPrepend); | ||
+ | myString->string = realloc(myString->string, myString->length + toPrependLength + 1); /* +1 for the terminating null */ | ||
+ | memmove(myString->string+toPrependLength, myString->string, myString->length); /* shift the old text right; memmove allows overlapping regions */ | ||
+ | memcpy(myString->string, toPrepend, toPrependLength); | ||
+ | myString->length = myString->length + toPrependLength; | ||
+ | myString->string[myString->length] = '\0'; | ||
+ | |||
+ | return; | ||
+ | |||
+ | } | ||
+ | |||
+ | void destroyString(String *myString){ | ||
+ | |||
+ | myString->length = 0; | ||
+ | free(myString->string); | ||
+ | myString->string = NULL; /* avoid leaving a dangling pointer behind */ | ||
+ | |||
+ | return; | ||
+ | } | ||
+ | |||
+ | void printString(String myString){ | ||
+ | |||
+ | fprintf(stdout, "%s\n", myString.string); | ||
+ | |||
+ | return; | ||
+ | |||
+ | } | ||
+ | |||
+ | int lengthOfString(String myString){ | ||
+ | |||
+ | return myString.length; | ||
+ | |||
+ | } | ||
+ | |||
+ | int main(void){ | ||
+ | |||
+ | String aString = newString("A sample string... "); | ||
+ | |||
+ | String bString = newString("Another sample string... "); | ||
+ | |||
+ | String cString = catString(aString, bString); | ||
+ | |||
+ | String cStringCopy = cloneString(cString); | ||
+ | |||
+ | printString(cString); | ||
+ | |||
+ | fprintf(stdout, "(Before append/prepend) Length of cStringCopy: %d\n", lengthOfString(cStringCopy)); | ||
+ | |||
+ | appendString(&cStringCopy, "(This is a copy)"); | ||
+ | prependString(&cStringCopy, "(This is a copy) "); | ||
+ | |||
+ | printString(cStringCopy); | ||
+ | |||
+ | fprintf(stdout, "(After append/prepend) Length of cStringCopy: %d\n", lengthOfString(cStringCopy)); | ||
+ | |||
+ | /* Free all the strings */ | ||
+ | |||
+ | destroyString(&aString); | ||
+ | destroyString(&bString); | ||
+ | destroyString(&cString); | ||
+ | destroyString(&cStringCopy); | ||
+ | |||
+ | return 0; | ||
+ | |||
+ | } | ||
+ | |||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | Here "typedef" just gives a shortcut so that we don't have to type "struct" in front of "String" for each declaration. | ||
+ | |||
+ | When you have a structure pointer, the notation for accessing the members of the structure that the pointer points to is as follows: | ||
+ | |||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | |||
+ | struct String aString; | ||
+ | struct String *stringPointer; | ||
+ | |||
+ | stringPointer = &aString; | ||
+ | |||
+ | stringPointer->length // accesses the "length" member of aString, equivalent to doing aString.length | ||
+ | |||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | The -> operator dereferences the pointer and accesses the member in one step. It is syntactic sugar; the following is functionally equivalent: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | |||
+ | (*stringPointer).length | ||
+ | |||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | But much "uglier". | ||
+ | |||
+ | |||
+ | The functionality in this example may seem trivial, but the concepts can be extended to implementing anything from linked lists to hash tables. | ||
+ | |||
+ | |||
+ | = Optimizing Security of your Programs = | ||
+ | |||
+ | This part presents a few functions that should be avoided and the counterparts that are to be preferred, as well as general advice on secure programming with C. | ||
+ | |||
+ | == Avoiding Buffer Overflow Vulnerabilities == | ||
+ | |||
+ | [[Buffer Overflows]] occur when programs try to store more information in a variable than it has memory allocated for. For example, if you declare a variable as an array of 9 characters, it has space for 8 characters plus the terminating null. If this variable receives its input from stdin, it is easy for a user to corrupt your program's functionality simply by entering 9 or more characters. This can only be avoided by limiting and sanitizing your input. For example, consider the following code snippet: | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | char password[9]; | ||
+ | |||
+ | printf("Please enter your password: "); | ||
+ | fflush(stdout); | ||
+ | gets(password); | ||
+ | </source> | ||
+ | }} | ||
+ | |||
+ | The danger here lies within the gets() function, which has no way of knowing how large the buffer is. It simply copies the whole input onto the stack, which will most likely result in data corruption; for this reason gets() was removed from the language in the C11 standard. The safe way to solve this is to use <b>fgets()</b> instead, which takes the buffer size as an additional parameter. Simply change the last line of the program to | ||
+ | |||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | fgets(password, 9, stdin); | ||
+ | </source> | ||
+ | }} | ||
+ | |||
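+ | Like this, the fgets() function copies at most the first 8 characters from stdin onto the stack, followed by the terminating null. In general, fgets() reads at most one character fewer than the size you pass it, so passing the size of the buffer always leaves room for the null byte. Note that fgets() also stores the newline if it fits, and that any input beyond the limit remains in stdin for the next read. | ||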
+ | |||
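+ | Other functions that are to be avoided for the same reasons are <i>strcpy(), strlen()</i> and <i>sprintf()</i>. | ||
+ | Instead use their bounded counterparts <b>strncpy(), strnlen()</b> and <b>snprintf()</b>. Be aware that strncpy() does not add a terminating null when the source is as long as or longer than the limit, so you have to terminate the destination yourself. | ||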
+ | |||
+ | == Initial Variable Values == | ||
+ | |||
+ | When you declare a global variable in C without an initializer, its value is implicitly initialized to zero. A local variable gets no such treatment: until you assign to it, its value is indeterminate, and in practice it holds whatever value the last function left in the part of stack memory where the variable is allocated. | ||
+ | As a consequence, the behavior of your program can become unpredictable when it uses uninitialized variables. | ||
+ | A related pitfall is declaring the same name twice. For example, let's say you start writing your program and declare a variable at the very beginning of your code as follows | ||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | int n = 0; | ||
+ | </source> | ||
+ | }} | ||
+ | Now you decide later on that you want to use n in order to conditionalize a function, so you add something like | ||
+ | {{code|text= | ||
+ | <source lang="c"> | ||
+ | if (condition) { | ||
+ | int n = 23; | ||
+ | } | ||
+ | </source> | ||
+ | }} | ||
+ | This code will compile, but it will not do what you intended in the cases where the condition turns out true. | ||
+ | Because the line inside the braces starts with a type, it declares a second, separate variable n that only exists inside the if block and hides ("shadows") the outer one. As soon as the block ends, the inner n is gone, and the original variable n with the value 0 comes back into view, unchanged. To change the outer variable, assign to it without repeating the type. | ||
+ | This is relevant not only for effective but also for secure programming, because a variable that is read before it is assigned can expose whatever data an earlier function left on the stack, which can turn out to be a serious security flaw. | ||
+ | |||
+ | == Program Environment == | ||
+ | |||
+ | A well written program should never rely on information about its environment, such as the working directory or the value of its umask, which is why you should use full path names instead of relative names when working with external files. You should also consider which UID and GID your program runs under. For example, it is extremely important that vulnerable or potentially malicious programs don't run as root. It is always advised to grant the program only the permissions it absolutely needs to perform its tasks, so consider running the program under a designated UID and a corresponding GID that have only restricted access to whatever the program requires. | ||
+ | |||
+ | == Disable Core Dumping == | ||
+ | |||
+ | Limiting the size of your program's core dump to 0 bytes ensures that potential attackers can't find sensitive information from the program's memory in a dump file. This is done either with the shell command <i>ulimit</i> (<code>ulimit -c 0</code>) before running the program or by calling the <b>setrlimit()</b> function at the beginning of the program. | ||
+ | |||
+ | == system() & popen() == | ||
+ | |||
+ | These functions are used to call external programs that are installed on your system. They should be avoided because they spawn a shell in order to do so, which means the command string is interpreted by that shell. | ||
+ | Instead use <b>fork()</b> together with one of the <b>exec()</b> family of functions (such as execv() or execve()) to achieve the same goal without compromising the security of your program's environment. | ||
− | |||
− | [[Category:Programming Languages]] | + | {{programming}} |
+ | [[Category:Programming Languages]] [[Category:Compiled languages]] |
Latest revision as of 00:21, 17 July 2016
C is a high-level programming language in which programs are written as structured source text. When compiled (typically using cc, short for C compiler, or gcc, the GNU C compiler), the C code is converted into machine code that the system can execute.
Installation
Most distributions include gcc as a base package, so no further setup is necessary in order to start developing C programs. If yours does not, you can install all necessary applications through your package manager with these commands:
- Debian/Ubuntu
    # apt-get install build-essential
- Arch Linux
    # pacman -S base-devel
Overview
Basic programs can be broken down into 3 main categories: variables, loops, and If/Else statements.
Basic Formatting
Each C program follows a general format.
Includes
Includes are calls from within a C program which reference a set of declarations, whether functions or global variables, or compile-time macros. They are used for including sets of previously implemented functionality, as opposed to reinventing the wheel with each program. There is a collection of "standard" includes which make up the standard C library, and on top of that there are OS-standard includes, such as the ones defined in POSIX.
Includes in C follow this syntax:
    #include <library.h>        // searches for library.h in the default include directories
    #include "/path/library.h"  // searches for library.h in the given path
One include is recommended for nearly every C program, namely stdio.h (a library defining functions to deal with basic input and output). A C program can be compiled without any includes, but you will be fairly limited in the functionality you are able to leverage. By convention, includes are normally placed at the beginning of a program, although it is not necessary.
The main() Function
The main() function is the entry point of the program. Unlike interpreted languages, which are parsed linearly and then run after the definition tree is built, most executable formats require an entry point so that the operating system knows where to "start".
The main function is the function that calls all other functions. As an example, consider the canonical "Hello World" in C:
    #include <stdio.h>

    int main()
    {
        printf("%s", "Hello, world!\n");   /* the format string must be quoted */
        return 0;
    }
Execution starts inside of main, which prints out Hello, world! and exits on return 0.
Variables
A variable is a named piece of storage whose data can be read, modified, and used at a later time. To declare a variable in the C language you first declare its type and then the variable name. Some of the basic variable types are:
    int iName;
    float fName;
    double dName;
    char cName;
Integer or int variables can store whole numbers, while floats and doubles can hold numbers with decimal places. A char type variable can only hold a single character. While C itself does not have a string variable type, you can create an array of characters, referred to as a char * (or CString in the Microsoft realm), to accomplish the same task.
Further, you can use variables of any type as pointers or arrays by adding their respective signs. These can be used to store multiple values in one variable (array) or to store the address of another variable (pointer):
    int *iPointer;
    char cArray[16];   /* an array needs a size (or an initializer) when it is declared */
Logical Operators
In C, logical operators are used to help program flow and to expand upon if, while, and for statements by combining several conditions into one. There are three logical operators in C: a logical OR, a logical AND, and a logical NOT.
- ||
The above is a logical OR, and is used as so:
    if(a == 5 || a == 10)
        printf("The 'a' variable is either 5, or 10\n");
- !
This is the logical NOT. It inverts a truth value: !a is 1 if a is 0, and 0 for any nonzero a. It should not be confused with the bitwise operators, which work on individual bits.
- &&
The AND operator is used similarly to the OR operator. An example is shown below:
    /* Assume a and b are user input */
    if(a == 5 && b == 10)
        printf("'a' and 'b' are correct.\n");
Bitwise Operators
C has six bitwise operators, which work on the individual bits of a value rather than on whole truth values. There are twice as many of them as there are logical operators.
- &
That is the bitwise AND operator. It compares two values, such as 1101 and 0011, bit by bit, and turns on a bit in the result only where that bit is turned on in both values. An example:
    1101 &
    0011
    ----------
    0001
- |
This is the bitwise OR operator. It works like the bitwise AND operator, except that a bit in the result is turned on as long as that bit is turned on in at least one of the two values:
    1101 |
    0011
    ----------
    1111
- ^
The XOR (exclusive OR) operator differs from the OR operator in one way: where a bit is turned on in both values, the result for that bit is 0:
    0011 ^
    1010
    ----------
    1001
- ~
The unary ~ (complement) operator reverses every bit of a value: each 1 becomes 0, and vice versa. The next two operators are a little more complicated, but still very simple to grasp and use.
- >> and <<
A right shift or left shift moves all the bits of a value to either side by a given number of steps.
    [ Shift right by 1 ] 00111101 >> 1 = 00011110
    [ Shift right by 2 ] 00111101 >> 2 = 00001111
When shifting bits in C code, the number following the operator is how many positions to shift. An example of this:
    int a = 1337, b = 2, c;
    printf("'a' is currently %d\n", a);
    c = a >> b;
    printf("'a' is now %d\n", c);
    return 0;
The program would output the following:
    'a' is currently 1337
    'a' is now 334
...which is exactly what the value would be if 1337 was converted into binary, shifted two bits to the right, and back into decimal. The same applies for a left shift, however, to the opposite side:
    int a = 1337, b = 2, c;
    printf("'a' is currently %d\n", a);
    c = a << b;
    printf("'a' is now %d\n", c);
    return 0;
And the program output to follow:
    'a' is currently 1337
    'a' is now 5348
Arithmetic
In C, there are arithmetic operators for addition, subtraction, multiplication, division, and modulus.
Operation | Operator | Assignment Operator |
---|---|---|
Addition | + | += |
Subtraction | - | -= |
Multiplication | * | *= |
Division | / | /= |
Modulus | % | %= |
Each operator can be used as an assignment operator:
    int main(int argc, const char *argv[])
    {
        int a = 3, b = 4;
        a *= 4;
        a += b;
        return 0;
    }
On line 4, the variable a will be equal to a multiplied by 4, on line 5, a is equal to a plus b.
Loops
In C there are three types of loops that allow the user to accomplish a repetitive task without repeating numerous lines of code. These three basic loops are called the for loop, the while loop, and the do while loop. Each loop has its own purpose, and they all follow a similar syntax. All loops are based on a condition, and as long as that condition does not evaluate to false the looping will not halt. A for loop is good for a repetitive task that you know how many times you want to repeat, while a while loop is normally used when the number of repetitions is unknown, like when you are reading a text document. A do while loop is almost the same as a while loop except for one difference: it runs its code at least once before checking if it should stop looping.
The for loop deserves special attention as it's slightly more complicated than the other two:
    int i;
    for(i = 0; i < 10; i++) // start "i" at 0; while "i" is less than 10, run the body, then increment "i" by one
    {
        // code to repeat 10 times
    }
The loop check has 3 parts. The part before the first semicolon, in this case "i = 0", is the loop initializer. The code inside it is run once when the loop is first called, so in this case, it sets the value of "i" to 0. The second part is the check or predicate, this is a statement that's treated as a boolean (true/false) value, if the statement ever evaluates to false (or 0), the loop ends. The last part is the block that's executed *at the end of* a full iteration of the loop.
    char myChar = '\0';                    // initialize so that the first comparison is well defined
    while(myChar != 'c')                   // while "myChar" does not equal "c" continue to loop
    {
        if (scanf("%c", &myChar) != 1)     // get input from the user and put it into variable "myChar"
            break;                         // stop if there is no more input
    }
    int x = 0;
    do                  // loop at least once
    {
        x = x + 1;      // variable x equals itself plus one (if x equals 0 then x = 0 + 1)
    } while(x < 2);     // check to see if condition to stop looping is met
If/Else
If/Else statements are used when you need some way to control the flow of execution of your code. These statements are just like asking questions and depending upon if the answer is true or false the program may execute differently.
Simple example:
    if(1 == 1) // if 1 equals 1 execute the true code block
    {
        printf("This is the true code block");  // execute the true code block
    }
    else { // if the statement is not true
        printf("This is the false code block"); // execute the false code block
    }
You can chain "else" and "if" to create complex flows. Here is an example:
    if(a == 1) // if a equals 1 execute the following block
    {
        printf("a equals 1!");
    }
    else if (b == 1) { // otherwise, check if b equals 1
        printf("b equals 1!");
    }
    else { // this executes only if all the predicates before it failed; else can be used as a "catch-all"
        printf("Neither a nor b equal 1!");
    }
File Stream
Opening a file with C so that your program can use the information it contains is a bit more complicated than one would suspect. The following example shows how to store all the information from the file in a local variable.
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        long fLen = 0;
        FILE *textFile = NULL;
        char *fileBytes = NULL;

        // fopen returns NULL if /home/../textfile cannot be opened
        if ((textFile = fopen("/home/directory/textfile", "r")) == NULL) {
            printf("No file.\n");
            return 0;
        }

        fseek(textFile, 0, SEEK_END);   // jump to the end to determine the file length
        fLen = ftell(textFile);

        // allocate space the size of fLen + 1 byte for the terminating null
        if ((fileBytes = malloc(fLen * sizeof(char) + 1)) == NULL) {
            printf("Memory allocation failed.\n");
            fclose(textFile);
            return 0;
        }

        fseek(textFile, 0, SEEK_SET);   // rewind, then copy the file content over to fileBytes
        fLen = fread(fileBytes, sizeof(char), fLen, textFile);
        fileBytes[fLen] = '\0';         // terminate after the bytes actually read

        printf("%s", fileBytes);        // print out fileBytes

        free(fileBytes);
        fclose(textFile);
        return 0;
    }
Compilation
In order to run a finished program you will have to create an executable binary by compiling the source code, which is done with cc/gcc:
    gcc -Wall -o <output file> <sourcecode file>
    chmod +x <output file>
where -Wall enables most of the commonly useful warnings during the compilation process.
Sometimes you will have to link against a specific library and, if it's not found, specify the directory in which it is contained. This is done with the -l and -L parameters. For example, if you need the library libconv-core.a, which is located in /usr/local/char/lib, the command to use would look like this (libraries are listed after the source file so that the linker can resolve the symbols the source uses):
    gcc -Wall -o <output file> <sourcecode file> -L/usr/local/char/lib -lconv-core
Example Program
This simple little program makes use of two subfunctions (datew and daten) which are each called by another function (main->daten->datew) in order to generate a list of all dates between the year 999 and the end of 2012, in all numeric and a few alphabetic formats, with only those simple methods described in this article.
    #include <stdio.h>

    void datew(int day, int month, int n, const char *limiter);   /* declared before first use */

    /* Print the numeric spellings of every date from year 999 through 2012. */
    void daten(const char *limiter)
    {
        int month, day, n;

        for (n = 999; n < 2013; n++) {
            for (month = 1; month < 13; month++) {
                for (day = 1; day < 32; day++) {
                    if ((day < 10) && (month < 10)) {
                        printf("0%d%s%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s%d%s0%d\n", n, limiter, month, limiter, day);
                        printf("%d%s0%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s0%d%s%d\n", n, limiter, month, limiter, day);
                        printf("0%d%s0%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s0%d%s0%d\n", n, limiter, month, limiter, day);
                        printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                    } else if ((day < 10) && (month >= 10)) {
                        printf("0%d%s%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s0%d%s%d\n", month, limiter, day, limiter, n);
                        printf("%d%s%d%s0%d\n", n, limiter, month, limiter, day);
                        printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s%d%s%d\n", month, limiter, day, limiter, n);
                        printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                    } else if ((month < 10) && (day >= 10)) {
                        printf("%d%s0%d%s%d\n", day, limiter, month, limiter, n);
                        printf("0%d%s%d%s%d\n", month, limiter, day, limiter, n);
                        printf("%d%s0%d%s%d\n", n, limiter, month, limiter, day);
                        printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s%d%s%d\n", month, limiter, day, limiter, n);
                        printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                    } else {
                        printf("%d%s%d%s%d\n", day, limiter, month, limiter, n);
                        printf("%d%s%d%s%d\n", month, limiter, day, limiter, n);
                        printf("%d%s%d%s%d\n", n, limiter, month, limiter, day);
                    }
                    datew(day, month, n, limiter);
                }
            }
        }
    }

    /* Print the date with the month spelled out in English and in French. */
    void datew(int day, int month, int n, const char *limiter)
    {
        static const char *en_names[] = { "January", "February", "March", "April", "May", "June",
            "July", "August", "September", "October", "November", "December" };
        static const char *fr_names[] = { "Janvier", "F\xe9" "vrier", "Mars", "Avril", "Mai", "Juin",
            "Juillet", "Ao\xfb" "t", "Septembre", "Octobre", "Novembre", "D\xe9" "cembre" };
        const char *en = en_names[month - 1];
        const char *fr = fr_names[month - 1];

        if (day < 10) {
            printf("0%d%s%s%s%d\n", day, limiter, en, limiter, n);
            printf("%s%s0%d%s%d\n", en, limiter, day, limiter, n);
            printf("0%d%s%s%s%d\n", day, limiter, fr, limiter, n);
            printf("%s%s0%d%s%d\n", fr, limiter, day, limiter, n);
        }
        printf("%d%s%s%s%d\n", day, limiter, en, limiter, n);
        printf("%s%s%d%s%d\n", en, limiter, day, limiter, n);
        printf("%d%s%s%s%d\n", day, limiter, fr, limiter, n);
        printf("%s%s%d%s%d\n", fr, limiter, day, limiter, n);
    }

    int main(void)
    {
        daten("\\");
        daten("/");
        daten(" ");
        daten("_");
        daten("*");
        daten("^");
        daten("-");
        return 0;
    }
In order to compile and save the binary under the filename "date-gen" run the following:
    gcc <filename.c> -o date-gen
    chmod +x date-gen
If you sort -u the output and use awk to delete all entries that're shorter than 8 characters this script will create a wordlist of a few million entries which can be used for the purpose of WPA password-recovery and the likes.
    ./date-gen | sort -u >> wordlist.txt
    awk '{if ((length($0) >= 8) && (length($0) <= 63)){ print $0 }}' wordlist.txt > wordlist.wpa.txt
Intermediate Concepts
In this section the implementation and design of the C language is covered, along with some of the harder to understand concepts in C.
The Language and the Machine
C is often mistaken for a low-level language, since it abstracts little in terms of memory management and doesn't provide many "abstracted" data structures. In fact, the primitive types in C are based on what is available in the hardware: 32-bit integers, 64-bit floats, and so on. Although it is very close to the machine in this sense, and often touted as a "portable assembler", it is still a high-level language since its grammar is not a 1 to 1 (or close) mapping to the underlying instruction set. It requires a compiler because it is a high-level language, whereas low-level languages need only assemblers or substitution/translator operations.
Having the language be so close to the machine is often seen as a disadvantage, but if you understand the underlying implementation stack and the environment you're working in, it instead becomes a huge advantage. It allows you to create applications that perform well, and most importantly, perform predictably, and hence are secure.
Pointers in C
Pointers are an important concept and feature in C. All variables can be treated as pointers, and the ability to inspect pointers gives one an insight into how the memory model of the underlying machine works. In short, pointers are exactly what their name implies, they "point" to other data. When you access a variable, you are accessing the data the variable is associated with. When you access the value of a pointer, you are accessing the address of that variable. When you "dereference" a pointer, you are accessing the value -stored- at the address. Pointers are declared like so:
    int *pointer;       // declare integer pointer
    int number = 4;     // declare variable that holds the value 4
    pointer = &number;  // assign the address of the "number" variable to the pointer
    *pointer = 5;       // assign the value 5 to the address contained within "pointer",
                        // which at the moment is the address of "number", so "number" becomes 5
The data type comes first, then the pointer's name prefixed by the *, usually called the "dereference" operator. The data type is necessary as it tells the compiler how much data the pointer's domain covers. A char is one byte and an int is usually 4 bytes, so if you try to access an int through a char pointer, you will only access its first byte in memory (on a little-endian machine, the least significant byte). This is allowed, and used in many cases, but you should make sure that it's intentional. The compiler will issue warnings unless you manually dispel them using a cast (a cast is just a way to tell the compiler explicitly how to treat the data, so that it doesn't complain; it will be covered in the later sections).
Conceptually, every variable behaves like an implicitly dereferenced pointer. With the address-of operator &, you can get the address of any variable, which is exactly what a pointer stores as its value. The difference is that accessing a variable's value directly accesses its memory, whereas accessing a pointer's value yields the address where that value is stored. It is a conceptual abstraction designed to let the programmer think of stored values as discrete and isolated.
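The relationship between a variable, its address, and a pointer holding that address can be sketched as follows. This is only an illustration: the function names `store_at` and `show_variable_and_pointer` are made up for this example, and the addresses printed will differ from run to run.

```c
#include <stdio.h>

/* Hypothetical helper: stores `value` at the address held in `target`
 * and returns whatever is stored there afterwards. */
int store_at(int *target, int value)
{
    *target = value;   /* dereference: write to the pointed-to memory */
    return *target;    /* dereference: read it back */
}

/* Prints a variable's value and address, then the same facts as seen
 * through a pointer to it. Both lines should show the same address. */
void show_variable_and_pointer(void)
{
    int number = 4;
    int *pointer = &number;

    printf("number   = %d, &number = %p\n", number, (void *)&number);
    printf("*pointer = %d, pointer = %p\n", *pointer, (void *)pointer);
}
```

Calling `store_at(&x, 7)` on some `int x` changes `x` itself, because the function receives the address of `x` rather than a copy of its value.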
The memory model
What is meant by "the address of" a variable? Implementation-wise, local variables are stored in a linear region of memory called the "stack". The implementation of a stack itself will be covered later on in the Data Structures section, but in short, it has the properties that you can push data onto it and pop data from it. A "push" puts data on top of the stack, and a "pop" retrieves data from the top of the stack.

You can think of the stack as a pile of papers in which each paper has a number. The one on the bottom has the lowest page number, '0', and the one on the top has the highest. If you "pop" a paper, you take one off the pile. If you "push" a paper, you add one to the top of the pile, increasing the page count. Each page has an "address", which is its page number, and the value at that "address" is what is written on the page. In the computer's stack, each address holds a 1-byte value, which can be written on the page as two hexadecimal digits.

Now, in the machine's memory model, the stack is allocated at a high address and "grows" down as more data is allocated. The paper pile analogy still works: if, for example, the stack starts at memory address 100 and I "push" 4 bytes (pieces of paper) onto it, the new address (number of free pages) will be 96. So you can view the machine's stack as a subtractive model. You start with a certain number of blank pages, and when you "push", the number of blank pages decreases. A "pop" is similar to blanking out the pages again. The current address, that is, which page to write to next or pop from, is stored in the stack pointer.
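The subtractive model described above can be sketched with a plain array. This is a toy model, not how the hardware stack is actually accessed from C; the names `pages`, `stack_pointer`, `push_byte`, `pop_byte` and `free_pages` are invented for the illustration, and the size of 100 is taken from the example in the text.

```c
#include <stddef.h>

#define STACK_SIZE 100   /* number of "pages", matching the example address 100 */

static unsigned char pages[STACK_SIZE];    /* one byte per page */
static size_t stack_pointer = STACK_SIZE;  /* starts at the high end, grows down */

/* Push one byte: move the stack pointer down, then write the page.
 * Returns 0 on success, -1 if no blank pages are left. */
int push_byte(unsigned char value)
{
    if (stack_pointer == 0)
        return -1;
    pages[--stack_pointer] = value;
    return 0;
}

/* Pop one byte: read the page, then move the stack pointer back up.
 * Returns 0 on success, -1 if the stack is empty. */
int pop_byte(unsigned char *out)
{
    if (stack_pointer == STACK_SIZE)
        return -1;
    *out = pages[stack_pointer++];
    return 0;
}

/* Number of blank pages left, which equals the current stack pointer. */
size_t free_pages(void)
{
    return stack_pointer;
}
```

After four calls to `push_byte`, `free_pages()` reports 96, the same figure as in the paper pile example.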
As a practical example, I'll describe what happens in the stack in the following C source snippet:
    int varone;
    int vartwo;
    int *point;
    varone = 5;
    point = &varone;
Here we declare three variables: two integers and one integer pointer. The two integers each take up 4 bytes, so if the stack space started at 100, the first integer is stored at 97-100, the second integer is stored at 93-96, and, assuming a 32-bit architecture, the integer pointer is stored at 89-92. (A real compiler is free to arrange or pad local variables differently; the layout here is simplified.) Now, when we assign the value '5' to varone, we are actually writing that to the pages 97-100. Stored as a 32-bit integer in little-endian, the pages would look as follows: 0x05,0x00,0x00,0x00, where 0x05 is page 97 and pages 98, 99, and 100 have a 0x00 on them. In the next line we assign the address of varone to point. Remember the address of varone is 97, since that's where its data starts, so we are writing 0x61,0x00,0x00,0x00 to pages 89, 90, 91, 92 respectively (0x61 is 97 in hexadecimal). So when we dereference the pointer "point" in C, we are actually saying "access the data at pages 97-100". When we assign a value to the pointer, we are just changing the page number saved inside the pointer. In this sense you can think of pointers as memory bookmarks.
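The byte layout described above can be inspected directly by reading an int through an unsigned char pointer. The sketch below uses made-up helper names (`int_bytes`, `is_little_endian`, `print_int_bytes`); the actual byte order and the size of int depend on the machine.

```c
#include <stdio.h>

/* Copies the bytes of `value` into `out`, lowest address first. */
void int_bytes(int value, unsigned char out[sizeof(int)])
{
    const unsigned char *p = (const unsigned char *)&value;
    for (size_t i = 0; i < sizeof(int); i++)
        out[i] = p[i];
}

/* Returns 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    unsigned int probe = 1;
    return *(unsigned char *)&probe == 1;
}

/* Prints the bytes of `value` in memory order, e.g. for comparing with
 * the page-by-page description in the text. */
void print_int_bytes(int value)
{
    unsigned char bytes[sizeof(int)];
    int_bytes(value, bytes);
    for (size_t i = 0; i < sizeof(int); i++)
        printf("0x%02x%s", bytes[i], i + 1 < sizeof(int) ? "," : "\n");
}
```

On a little-endian machine with 4-byte ints, `print_int_bytes(5)` prints `0x05,0x00,0x00,0x00`, matching the pages 97-100 in the example.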
By reference vs By value
Building on pointers, in C you can pass data to functions either by value or by reference. Strictly speaking, C always passes arguments by value; "passing by reference" means passing a pointer, so the value that gets copied is the address of the variable and only the 4 (or 8) byte address gets pushed onto the stack. If you pass a variable by value, the entire data block the variable refers to is pushed onto the stack. An important difference is that if data is passed by value, the function has access to a local copy of the data, which disappears after the function finishes. If a variable is passed by reference, the function operates on the same memory the original variable refers to, so any changes made will persist when the function ends.
Here's an example of pass by value:
    #include <stdio.h>

    void byvalue(int number){
        number = 10;
        printf("%d\n", number);
        return;
    }

    int main(void){
        int local = 5;
        byvalue(local);         // prints 10
        printf("%d\n", local);  // prints 5, value unchanged as function only got a copy
        return 0;
    }
Here we are pushing the value "5" onto the stack, making a copy, and the function byvalue refers to that copy by the name "number". The copy is changed to contain "10", the value is then printed, and when the function ends, the stack gets popped again and the memory holding the "10" is released.
Here, instead, we will do the same thing, but we'll pass the variable by reference:
    #include <stdio.h>

    void byref(int *number){
        *number = 10;
        printf("%d\n", *number);
        return;
    }

    int main(void){
        int local = 5;
        byref(&local);          // prints 10
        printf("%d\n", local);  // prints 10, value changed since the function
                                // accessed the memory directly instead of making a copy
        return 0;
    }
Here, instead of the value "5" being pushed to the stack, we are pushing the page number of "local" to the stack. When the function is called, it uses the page number to find the memory and assigns "10" to it, prints the new value at that memory, and returns. When the function returns, only the memory bookmark (the pointer) is released; the value remains changed since the function wrote directly to memory that belongs to the caller. So when the main function prints the value of "local", it prints the new value, "10".
In most cases, especially when dealing with large data structures, it is more efficient to pass variables by reference. If the function does not need or should not have write access to the original memory, you can pass by value, or you can mark the pointed-to data as constant using the "const" qualifier. The latter gets in the way when the function needs to change the data temporarily, so it is something to decide on a case-by-case basis.
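A read-only pass by reference might look like the sketch below. The function name `sum_values` and its parameters are invented for the example; the point is only that the `const` qualifier lets the function read the caller's memory without copying it, while the compiler rejects writes through the pointer.

```c
#include <stddef.h>

/* Sums `count` integers starting at `values`. The data is passed by
 * reference (only the address is copied), and `const` promises that the
 * function will not modify it. */
long sum_values(const int *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++)
        total += values[i];
    /* values[0] = 0;   <- would not compile: assignment to read-only location */
    return total;
}
```

For an array `int data[] = {1, 2, 3};`, the call `sum_values(data, 3)` returns 6 and leaves `data` untouched.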
Function returns work in the same way. If a function returns a pointer, the address it holds must remain valid after the function ends: typically this is memory allocated on the heap (with malloc), although static storage or memory supplied by the caller works too. If it returns a pointer to data on the function's own stack, that data is released as soon as the function returns; using it afterwards is undefined behavior and a common cause of segfaults. If you return a variable by value, a copy of the data block is passed to the calling function, and it will persist in the calling function's scope.
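The difference between the two kinds of return can be sketched like this. The function names are hypothetical, and the broken variant is shown only as a comment so that the block stays safe to compile and run.

```c
#include <stdlib.h>

/* Returns a pointer to a heap-allocated int holding `value`, or NULL if
 * the allocation fails. The memory outlives the call, so the caller owns
 * it and must free() it. */
int *make_number(int value)
{
    int *number = malloc(sizeof *number);
    if (number != NULL)
        *number = value;
    return number;
}

/* Returning by value: the caller receives its own copy of the data. */
int make_number_by_value(int value)
{
    int number = value;
    return number;
}

/* What NOT to do: `number` lives on this function's stack and is gone
 * once the function returns, so the returned address is dangling.
 *
 *   int *broken_number(int value) {
 *       int number = value;
 *       return &number;
 *   }
 */
```

A caller would use `int *p = make_number(5);`, check `p` against NULL, and call `free(p)` when done.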
Data Structures
C is a simple language by design, and therefore does not natively support many of the complex data structures other languages support. However, any data structure that is computationally feasible can be implemented in C. Implementing the data structures yourself has two advantages: you can customize them to do exactly what you want, and because you actually understand them after implementing them, you can use them more efficiently than if you were just given them as an abstracted device.
The struct
The general-purpose data structure in C is the struct. A struct is just a collection of values grouped under a single name. Here's an example:
    struct String {
        int length;
        char *string;
    };
This defines a structure type named "String", which has one integer member named "length" and one character pointer member named "string". Note that this does not declare a variable; no memory is allocated yet. Only the type is defined. To actually declare a variable of this structure type, this syntax is used:
    struct String mystring;
This declares a variable "mystring" of type "struct String". To access the members of "mystring" the . (dot) operator is used:
    mystring.string = "A sample string";
    mystring.length = strlen(mystring.string);
Here we assign the string "A sample string" to the character pointer (the string itself is stored in static memory; we've only assigned its location to the pointer), and then assign the length of the string to mystring.length using the standard library function strlen, which takes a character pointer and returns the length of the string the pointer points to.
Struct definitions are used to compartmentalize data, and along with wrapper functions, you can abstract away operations and implement any data type. As a simple example, using the String structure defined above, I'll implement a String data type, which gives us the ability to use strings without having to worry explicitly about their length:
    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    /* Error checking of malloc/realloc is omitted for brevity. */
    typedef struct String {
        int length;     // number of characters, not counting the terminating null
        char *string;   // heap-allocated and always null-terminated
    } String;

    String newString(const char *string){
        String myString;
        myString.length = strlen(string);
        myString.string = malloc(myString.length + 1);          // +1 for the terminating null
        memcpy(myString.string, string, myString.length + 1);   // copy the null as well
        return myString;
    }

    String catString(String first, String second){
        String myString;
        myString.length = first.length + second.length;
        myString.string = malloc(myString.length + 1);
        memcpy(myString.string, first.string, first.length);
        memcpy(myString.string+first.length, second.string, second.length + 1);
        return myString;
    }

    String cloneString(String toClone){
        String myString;
        myString.length = toClone.length;
        myString.string = malloc(toClone.length + 1);
        memcpy(myString.string, toClone.string, toClone.length + 1);
        return myString;
    }

    void appendString(String *myString, const char *toAppend){
        int toAppendLength = strlen(toAppend);
        myString->string = realloc(myString->string, myString->length + toAppendLength + 1);
        memcpy(myString->string+myString->length, toAppend, toAppendLength + 1);
        myString->length = myString->length + toAppendLength;
        return;
    }

    void prependString(String *myString, const char *toPrepend){
        int toPrependLength = strlen(toPrepend);
        myString->string = realloc(myString->string, myString->length + toPrependLength + 1);
        // shift the existing text (and its null) to the right; memmove handles the overlap
        memmove(myString->string+toPrependLength, myString->string, myString->length + 1);
        memcpy(myString->string, toPrepend, toPrependLength);
        myString->length = myString->length + toPrependLength;
        return;
    }

    void destroyString(String *myString){
        free(myString->string);
        myString->string = NULL;    // avoid leaving a dangling pointer behind
        myString->length = 0;
        return;
    }

    void printString(String myString){
        fprintf(stdout, "%s\n", myString.string);
        return;
    }

    int lengthOfString(String myString){
        return myString.length;
    }

    int main(void){
        String aString = newString("A sample string... ");
        String bString = newString("Another sample string... \n");
        String cString = catString(aString, bString);
        String cStringCopy = cloneString(cString);

        printString(cString);
        fprintf(stdout, "(Before append/prepend) Length of cStringCopy: %d\n", lengthOfString(cStringCopy));
        appendString(&cStringCopy, "(This is a copy)");
        prependString(&cStringCopy, "(This is a copy) ");
        printString(cStringCopy);
        fprintf(stdout, "(After append/prepend) Length of cStringCopy: %d\n", lengthOfString(cStringCopy));

        /* Free all the strings */
        destroyString(&aString);
        destroyString(&bString);
        destroyString(&cString);
        destroyString(&cStringCopy);
        return 0;
    }
Here "typedef" just gives a shortcut so that we don't have to type "struct" in front of "String" for each declaration.
When you have a structure pointer, the notation for accessing the members of the structure that the pointer points to is as follows:
    struct String aString;
    struct String *stringPointer;
    stringPointer = &aString;
    stringPointer->length   // accesses the "length" member of aString, equivalent to doing aString.length
It dereferences the pointer and accesses the member in one step. It is syntactic sugar; the following is functionally equivalent:
    (*stringPointer).length
But much "uglier".
The functionality in this example may seem trivial, but the concepts can be extended to implementing anything from linked lists to hash tables.
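As one illustration of how the same ideas extend further, a singly linked list can be built from a struct that contains a pointer to another struct of the same type. The sketch below is a minimal version under the assumption that the list stores plain ints; the names `Node`, `list_push`, `list_length` and `list_free` are invented for the example.

```c
#include <stdlib.h>

/* One list element: a value plus a pointer to the next element. */
struct Node {
    int value;
    struct Node *next;   /* NULL marks the end of the list */
};

/* Adds a new element in front of `head` and returns the new head.
 * Returns NULL if the allocation fails; the old list is left intact. */
struct Node *list_push(struct Node *head, int value)
{
    struct Node *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->value = value;
    node->next = head;
    return node;
}

/* Counts the elements by following the next pointers. */
size_t list_length(const struct Node *head)
{
    size_t count = 0;
    for (; head != NULL; head = head->next)
        count++;
    return count;
}

/* Frees every element; the next pointer is saved before each free(). */
void list_free(struct Node *head)
{
    while (head != NULL) {
        struct Node *next = head->next;
        free(head);
        head = next;
    }
}
```

An empty list is simply a NULL pointer, so `list_push(NULL, 1)` creates a list with one element.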
Optimizing Security of your Programs
In this part we present a few functions that should be avoided, their counterparts which are to be preferred, and general advice on secure programming in C.
Avoiding Buffer Overflow Vulnerabilities
Buffer overflows occur when a program tries to store more data in a variable than it has memory allocated for. For example, a variable declared as an array of 9 characters has space for 8 characters plus the terminating null, so if this variable receives its input from stdin, a user can corrupt your program's memory simply by entering 9 or more characters. This can only be avoided by limiting how much input is copied. Consider the following code snippet:
    char password[9];
    printf("Please enter your password: ");
    fflush(stdout);
    gets(password);
The danger here lies in the gets() function, which copies the whole input onto the stack regardless of the size of the buffer, so long input overwrites whatever is stored next to it. For this reason gets() was removed from the language in the C11 standard. The safe way to solve this is to use fgets() instead, which takes the buffer size as an additional parameter. Simply change the last line of the program to
    fgets(password, 9, stdin);
This way the fgets() function copies at most the first 8 characters from stdin onto the stack, followed by the terminating null. In general it reads at most one character fewer than the size you pass, so passing the full size of the buffer (sizeof password) always leaves room for the null byte. Note that fgets() also stores the newline character if it fits, so you may need to strip it.
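A small wrapper around fgets() might look like the sketch below. The helper name `read_line` is made up for this example; it assumes the caller passes the real size of the buffer and treats end-of-file and read errors the same way.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: reads at most size-1 characters of one line from
 * `stream` into `buffer` and strips the trailing newline, if any.
 * Returns 0 on success, -1 on end-of-file, read error, or size == 0. */
int read_line(FILE *stream, char *buffer, size_t size)
{
    if (size == 0 || fgets(buffer, (int)size, stream) == NULL)
        return -1;
    buffer[strcspn(buffer, "\n")] = '\0';   /* cut at the newline, if present */
    return 0;
}
```

With `char password[9];`, calling `read_line(stdin, password, sizeof password)` stores at most 8 characters plus the terminating null, no matter how much the user types. Any excess input stays in the stream and would have to be discarded separately.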
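Other functions that are to be avoided for similar reasons are strcpy(), strlen() and sprintf(): the first and last write without regard for the size of the destination, and strlen() keeps reading until it finds a null byte. Instead use their bounded counterparts strncpy(), strnlen() and snprintf(). Be aware that strncpy() does not add a terminating null when the source fills the whole buffer, so you have to add it yourself.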
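One way to copy a string into a fixed-size buffer without overflowing it is sketched below using snprintf(), which always null-terminates when the size is non-zero. The helper name `copy_bounded` is an assumption made for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies `src` into `dst`, writing at most
 * `dst_size` bytes including the terminating null.
 * Returns 0 if the whole string fit, 1 if it was truncated or an
 * error occurred. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    /* snprintf returns the length the full output would have had */
    int needed = snprintf(dst, dst_size, "%s", src);
    return needed < 0 || (size_t)needed >= dst_size;
}
```

Checking the return value matters: silently truncated data can be a bug in its own right, for example when the string is a file name.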
Initial Variable Values
When you declare a global variable in C, its value is implicitly initialized to zero. A local variable gets no such treatment: if you don't initialize it, it holds whatever value the last function left in the part of stack memory where the variable is allocated. As a consequence, the behavior of your program can become unpredictable when it uses uninitialized variables. A related pitfall is redeclaring a variable that already exists in an enclosing scope. For example, let's say you start writing your program and declare a variable at the very beginning of your code as follows
    int n = 0;
Now you decide later on that you want to set n depending on a condition, so you add something like
    if (condition) {
        int n = 23;
    }
This code compiles, but it does not do what was intended when the condition is true. The "int n = 23;" inside the braces declares a new local variable that merely hides the outer one; as soon as the block ends, the local n is gone and the global n, still holding the value 0, is visible again. The fix is to assign to the existing variable ("n = 23;") instead of redeclaring it. This is relevant not only for correct but also for secure programming: stale or uninitialized stack memory may contain leftovers from earlier function calls, and a program that exposes it can leak sensitive information.
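The difference between redeclaring and assigning can be seen in the following sketch. The function names `shadowed` and `assigned` are invented for the illustration, and `condition` is passed in as a parameter.

```c
int n = 0;   /* global: would be zero even without the initializer */

/* Redeclares n inside the block: the global is never touched. */
int shadowed(int condition)
{
    if (condition) {
        int n = 23;   /* new local variable that hides the global n */
        (void)n;      /* silence the "unused variable" warning */
    }
    return n;         /* the global n again */
}

/* Assigns to the existing variable: the global is updated. */
int assigned(int condition)
{
    if (condition) {
        n = 23;       /* no type in front, so this is the global n */
    }
    return n;
}
```

`shadowed(1)` returns 0, whereas `assigned(1)` returns 23. With GCC or Clang, the `-Wshadow` option makes the compiler warn about declarations like the one in `shadowed`.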
Program Environment
A well-written program should never rely on information about its environment, such as the working directory or the value of its umask, which is why you should use full path names instead of relative names when working with external files. You should also consider which UID and GID your program runs under; for example, it is extremely important that vulnerable or potentially malicious programs don't run as root. It is always advised to grant the program only the permissions it absolutely needs to perform its tasks, so you can, for example, run the program under a designated UID and a corresponding GID that have only restricted access to whatever the program requires.
Disable Core Dumping
Limiting the size of your program's core dump to 0 bytes ensures that potential attackers can't extract sensitive information about the program's memory from it. This is done either with the shell command ulimit before running the program or by calling the setrlimit() function at the beginning of the program.
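The setrlimit() variant could be sketched as follows on a POSIX system. The wrapper name `disable_core_dumps` is made up for the example; the shell equivalent is `ulimit -c 0`.

```c
#include <sys/resource.h>

/* Hypothetical wrapper: sets both the soft and the hard core file size
 * limit to 0 bytes for the current process.
 * Returns 0 on success, -1 on failure (errno is set by setrlimit). */
int disable_core_dumps(void)
{
    struct rlimit limit;
    limit.rlim_cur = 0;   /* soft limit: what is enforced right now */
    limit.rlim_max = 0;   /* hard limit: prevents raising it again later */
    return setrlimit(RLIMIT_CORE, &limit);
}
```

Calling this at the very start of main() means the limit is already in place before any sensitive data is loaded into memory.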
system() & popen()
These functions are used to call external programs that are installed on your system. They should be avoided because they spawn a shell in order to do so, and the shell interprets the command string and is influenced by the environment. Instead use fork() together with one of the exec() functions to achieve the same goal without compromising the security of your program's environment.
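Running an external program without a shell might look like the sketch below on a POSIX system. The function name `run_program` is an assumption for this example, as is the convention of exiting with status 127 when the program cannot be executed (borrowed from what shells do).

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program at the absolute `path` with the
 * argument vector `argv` (terminated by NULL) and waits for it.
 * No shell is involved, so the arguments are never parsed or expanded.
 * Returns the program's exit status, 127 if it could not be executed,
 * or -1 if fork/waitpid failed or the child was killed by a signal. */
int run_program(const char *path, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* child: replace this process image with the new program */
        execv(path, argv);
        _exit(127);   /* only reached if execv failed */
    }

    /* parent: wait for the child to finish */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as `run_program("/bin/ls", args)` with `char *args[] = {"ls", "-l", NULL};` starts the program directly; execv() takes a full path and does not search PATH.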