Sunday, January 15, 2012

Bash Basics

I've been doing a lot of bash scripting lately.  I haven't really done any bash scripting since, oh, 2007 or so, and nothing too complicated even then, so it's been an interesting learning / remembering experience for me.  Hopefully writing it down this time will make it easier next time.  Here's a preview of what you'll find here:

  • Defining and using variables
  • Parameter substitution and arithmetic expansions
  • Arrays
  • If statements
  • While loops
  • For loops
  • Input redirection, output redirection, and pipes
  • Regex tools: grep, sed
  • Command substitution
  • Useful commands

Before I start...

I'm going to be giving a lot of example code.  If you have a bash terminal, you should be able to follow right along, typing what I type, and get the exact same output.  A line of code (what you type) will look like this:

line of code

And a line of output (what you get back) will look like this:

line of output.

With that said, let's look at...

Defining and Using Variables

Defining a variable looks like this:

variable=value

Pretty simple.  Make sure you don't put spaces around the equals sign, or bash will think you are trying to run a command called variable:

variable = value
bash: variable: command not found

You can access the value of a variable using the $ symbol.  Here, I'll use the built-in command echo to write the value of our variable to the console:

echo $variable
value

If I had left out the $, you would see this instead:

echo variable
variable

The $ is part of our next topic, parameter substitution, which is a huge part of bash.  Parameter substitution will also happen within double quotes, but not single quotes:

echo "$variable"
value
echo '$variable'
$variable

This is mostly useful when you need a string with spaces in it to function as a single unit.
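
Here's a minimal sketch of why the quotes matter.  The variable name greeting and its value are made up for illustration.  Without quotes, bash splits the value into separate words before echo ever sees it, so the run of spaces collapses:

```shell
# Hypothetical variable holding a value with several spaces in it.
greeting="hello    world"

# Unquoted: the value is split into two words, and echo joins them
# back together with a single space.
echo $greeting
# hello world

# Quoted: the value reaches echo as one argument, spaces intact.
echo "$greeting"
# hello    world
```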

Parameter Substitution and Arithmetic Expansions

Parameter substitution is a big part of bash.  We've already seen the most basic use, the $ symbol.  Putting $ before a variable is shorthand for ${variableName}:

echo ${variable}
value

This is the most basic form of substitution, replacing the expression with the value of the variable.  Many other types of substitutions are possible - more than I am able to write about here.  A more exhaustive list can be found here: http://tldp.org/LDP/abs/html/parameter-substitution.html

You can get the length of a variable with the # symbol:

echo ${#variable}
5

You can get a subset starting at an offset with the : symbol:

echo ${variable:1}
alue

You can get a subset starting at an offset with a given length with an additional : symbol:

echo ${variable:1:3}
alu

It's also possible to do some pattern matching with parameter substitution, using the #, ##, %, %% operators.  The # operators match at the beginning, the % match at the end.  The single symbols match the shortest match, and doubled symbols match the longest match.  All of them remove what they match.  Here they are in action:

variable=aaa.bbb.ccc
echo ${variable#*.}
bbb.ccc
echo ${variable##*.}
ccc
echo ${variable%.*}
aaa.bbb
echo ${variable%%.*}
aaa

You can also do some substitution using the / symbol:

echo ${variable/b/d}
aaa.dbb.ccc

Notice that it only replaced the first match.  Doubling the first / will make it replace all matches:

echo ${variable//b/d}
aaa.ddd.ccc

You can use the # and % as part of your pattern to require it to match the prefix or suffix of the variable:

echo ${variable/#bbb/ddd}
aaa.bbb.ccc
echo ${variable/#aaa/ddd}
ddd.bbb.ccc
echo ${variable/%bbb/ddd}
aaa.bbb.ccc
echo ${variable/%ccc/ddd}
aaa.bbb.ddd
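
One place I'd expect these operators to come in handy is pulling a file path apart.  This is only a sketch: the path and the variable names (path, filename, directory, extension, stem) are made up for the example.

```shell
# Hypothetical path, used only for illustration.
path=/home/user/archive.tar.gz

# Remove the longest match of "*/" from the front: the file name.
filename=${path##*/}
# Remove the shortest match of "/*" from the end: the directory.
directory=${path%/*}
# Remove the longest match of "*." from the front: the last extension.
extension=${filename##*.}
# Remove the longest match of ".*" from the end: the name with no extensions.
stem=${filename%%.*}

echo "$directory"
# /home/user
echo "$filename"
# archive.tar.gz
echo "$extension"
# gz
echo "$stem"
# archive
```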

Arithmetic is similar, but uses double parentheses instead of curly braces:

two=$(( 1 + 1 ))
echo $two
2

Variables within double parentheses don't need a $ prefix:

three=$(( two + 1 ))
echo $three
3

If you aren't using the value of the expression, like when incrementing a variable, you have to leave off the $:

$(( three++ ))
bash: 3: command not found
(( three++ ))
echo $three
5

Notice that the increment still happened both times, despite the error.
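
Here's a slightly bigger sketch of arithmetic expansion.  The variable names and numbers are invented for the example.  Note that bash arithmetic is integer-only, so division truncates:

```shell
# Hypothetical values for illustration.
width=7
height=3

area=$(( width * height ))
quotient=$(( width / height ))     # integer division, the remainder is dropped
remainder=$(( width % height ))

# (( )) without the $ evaluates the expression purely for its side effect.
count=0
(( count += 5 ))

echo "$area $quotient $remainder $count"
# 21 2 1 5
```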

Arrays

You can define variables that are arrays like this:

array=( 1 2 3 4 5 6 7 8 9 10 )
echo $array
1

Wait, why did it only print 1?  It's because you have to expand the array using [@] to see the whole thing:

echo ${array[@]}
1 2 3 4 5 6 7 8 9 10

Subsets work with expanded arrays like they did with strings:

echo ${array[@]:3:4}
4 5 6 7

You can also get the size of an array:

echo ${#array[@]}
10

You can access individual elements of the array with [ ], using zero-based indices:

echo ${array[3]}
4

If you have bash 4.2 or later (use 'bash --version' to check your version), you can use a negative index to count backward from the end:

echo ${array[-3]}
8

Using the above, you can see how you might add to an array:

array=( ${array[@]} 11 )
echo ${array[@]}
1 2 3 4 5 6 7 8 9 10 11

We'll see how to use arrays in loops later on.
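
One caveat with the append shown above: if the elements contain spaces, re-expanding the array without quotes will split them apart.  The sketch below uses a made-up array named files to show the difference, along with the += operator, which appends without re-expanding anything:

```shell
# Hypothetical array whose elements contain spaces.
files=( "first file.txt" "second file.txt" )

# += appends to the array in place.
files+=( "third file.txt" )

echo "${#files[@]}"
# 3
echo "${files[2]}"
# third file.txt

# Unquoted re-expansion splits on the spaces: 3 elements become 6 words.
resplit=( ${files[@]} )
echo "${#resplit[@]}"
# 6

# Quoting the expansion keeps each element whole.
copy=( "${files[@]}" )
echo "${#copy[@]}"
# 3
```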

If Statements

If statements are exactly what you might think they are: a condition to test and code to execute if it's true.  The -z operator tests whether a string is empty, which is also what an undefined variable expands to.  Quote the variable so the test still sees an argument when the value is empty:

if [ -z "$nullvariable" ]
then
echo "variable is undefined"
fi
variable is undefined
if [ -z "$variable" ]
then
echo "variable is undefined"
fi

You can provide an else case if the test fails:

if [ -z "$variable" ]
then
echo "variable is undefined"
else
echo "variable is defined"
fi
variable is defined

You can also chain conditions together using elif, and at this point it's helpful to use a ; to join statements into single lines:

if [ -z "$variable" ]; then
echo "undefined"
elif [ ${#variable} -gt 10 ]; then
echo "variable size is greater than 10"
else
echo "variable size is less than or equal to 10"
fi
variable size is greater than 10

There are too many different operators for use in these conditions to go over them all here, but here are a few I've used.

  • ! - logical not
  • -a - logical and
  • -o - logical or
  • -gt - greater than
  • -ge - greater than or equal to
  • -lt - less than
  • -le - less than or equal to
  • -eq - equality
  • -ne - inequality
  • -z - true if the string is empty (including an undefined variable)
  • -n - true if the string is not empty

It's a good idea to put your variables in quotes when doing most comparisons.  Evaluating the condition will expand the variable, and if it's a string with spaces, your condition may become invalid. 
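
To make that concrete, here's a small sketch with a made-up variable called name.  Without the quotes, the test would receive John and Smith as two separate arguments and bash would complain rather than compare them:

```shell
# Hypothetical value containing a space.
name="John Smith"

# Quoted, the value stays a single argument to [ and the comparison works.
if [ "$name" = "John Smith" ]; then
    result="match"
else
    result="no match"
fi
echo "$result"
# match

# Quoting also keeps -z working when the variable is unset or empty.
unset missing
if [ -z "$missing" ]; then
    empty="yes"
else
    empty="no"
fi
echo "$empty"
# yes
```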

While Loops

While loops look a lot like a basic if statement:

i=0
while [ $i -lt 5 ]
do
echo Iteration number $i
i=$(( i + 1 ))
done
Iteration number 0
Iteration number 1
Iteration number 2
Iteration number 3
Iteration number 4

Any command can be used as the loop condition: the loop keeps running for as long as the command exits with a success status.  For example, this code will allow you to print each line of a file:

--- file.txt ---
Line 1
Line 2
Line 3
--- end file.txt ---
while read line
do
echo $line
done < file.txt
Line 1
Line 2
Line 3

Note the input redirection at the end.  That's how the read command knows what file to read.  More on that later.
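
Here's a variation on that loop that numbers the lines as it goes.  It's a sketch with a couple of additions that aren't in the example above: mktemp creates a scratch file so the block stands alone, IFS= stops read from trimming leading and trailing whitespace, and -r stops it from treating backslashes specially.

```shell
# Create a scratch file so the example is self-contained
# (mktemp picks the name; the contents mirror file.txt above).
tmpfile=$(mktemp)
printf 'Line 1\nLine 2\nLine 3\n' > "$tmpfile"

count=0
while IFS= read -r line
do
    count=$(( count + 1 ))
    echo "$count: $line"
done < "$tmpfile"
# 1: Line 1
# 2: Line 2
# 3: Line 3

rm -f "$tmpfile"
```

Because the loop is fed by redirection rather than a pipe, it runs in the current shell, so count is still 3 after the loop finishes.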

For Loops

I've used two kinds of for loops so far.  The first will iterate over a list:

for i in 1 2 3 4 5
do
echo Iteration number $i
done
Iteration number 1
Iteration number 2
Iteration number 3
Iteration number 4
Iteration number 5

It will also iterate over an expanded array:

for i in ${array[@]}
do
echo $i
done
1
2
3
4
5
6
7
8
9
10
11

The second kind is just like a 3 expression for loop in a language like C, and uses arithmetic expansion-like syntax:

for (( i=0; i<5; i++ ))
do
echo Iteration number $i
done
Iteration number 0
Iteration number 1
Iteration number 2
Iteration number 3
Iteration number 4
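
The two kinds can be combined: the C-style loop is handy when you need the index as well as the element.  Here's a sketch using a made-up array named colors, with ${#colors[@]} supplying the upper bound:

```shell
# Hypothetical array for illustration.
colors=( red green blue )
labeled=()

for (( i=0; i<${#colors[@]}; i++ ))
do
    # Inside [ ] the index is an arithmetic expression, so no $ is needed.
    labeled+=( "$i:${colors[i]}" )
done

echo "${labeled[@]}"
# 0:red 1:green 2:blue
```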

Input Redirection, Output Redirection, and Pipes

Occasionally, you'll want to interact with the file system, instead of just using console input and output.  There are several ways to accomplish this.

Input can be rerouted with the < symbol, as we've already seen.  Here's a simpler example using the file we used earlier in the while loop example, with the sort command.  It does just what it says - it sorts its input line by line.  The -r switch performs the sort in reverse order:

sort -r < file.txt
Line 3
Line 2
Line 1

Output redirection is accomplished with the > symbol:

echo some text > file2.txt

Notice that no output is produced on the console.  But if we check the contents of file2.txt:

cat file2.txt
some text

The output is there.  The > operator will overwrite an existing file.  If you want to append instead, use >>:

echo some more text >> file2.txt
cat file2.txt
some text
some more text

You can use both input and output redirection on the same line:

sort -r < file.txt > file2.txt
cat file2.txt
Line 3
Line 2
Line 1

Sometimes, you will want to use the output of one command as the input to another.  This can be done with the | symbol, called a pipe:

cat file.txt | sort -r
Line 3
Line 2
Line 1

You can chain pipes together if you like:

cat file.txt | sort -r | head -1
Line 3
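
Here's one more pipeline as a sketch, this time over a made-up list of words with duplicates.  It leans on uniq (which drops adjacent duplicate lines, hence the sort first) and wc -l (which counts lines); neither appears elsewhere in this post.  The $( ) syntax captures a command's output and is equivalent to the backticks described under Command Substitution.

```shell
# Made-up input: a short list of words with duplicates.
words=$(printf 'pear\napple\npear\nfig\napple\npear\n')

# sort groups the duplicates together, uniq removes them, wc -l counts what is left.
distinct=$(echo "$words" | sort | uniq | wc -l)

# Same pipeline, but keep only the first line of the sorted, de-duplicated list.
first=$(echo "$words" | sort | uniq | head -1)

echo "$first"
# apple
```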

Regex Tools: grep, sed

If you do much file parsing, you're going to need regular expressions sooner or later!  The full range of possibilities with regular expressions is beyond the scope of this post, as are all the ins and outs of sed - it is, after all, a Turing-complete language all by itself.  However, this should get you going with the basics.

grep is a tool for pattern matching.  Normally, it will match line by line, and output lines that match:

grep 1 file.txt
Line 1

If you need extended regular expressions (for example, the + operator), you'll need the -E option:

grep -E 'Line [2-9]+' file.txt
Line 2
Line 3

grep also exits with a success code if it finds a match, and with a failure code if it doesn't.  This makes it useful in conditional statements - especially with the -q option, which suppresses output:

if grep -q Line file.txt; then
echo Found
fi
Found

Notice the lack of [ ] in this condition. The reason is that grep is a command. It turns out that [ also happens to be a command, and the different operators are passed to it as command-line arguments! Any command that returns with an exit code can be used in a condition.
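
Here's a sketch of both outcomes, plus ! to invert the exit status.  The scratch file comes from mktemp so the block stands alone, and the variable names are made up:

```shell
# Scratch file with the same contents as file.txt above.
tmpfile=$(mktemp)
printf 'Line 1\nLine 2\nLine 3\n' > "$tmpfile"

# grep exits with success when the pattern is found...
if grep -q 'Line 2' "$tmpfile"; then found_two="yes"; else found_two="no"; fi

# ...and with failure when it is not.
if grep -q 'Line 9' "$tmpfile"; then found_nine="yes"; else found_nine="no"; fi

# ! flips the exit status, so this branch runs when there is no match.
if ! grep -q 'Line 9' "$tmpfile"; then
    echo "no Line 9 in the file"
fi

echo "$found_two $found_nine"
# yes no

rm -f "$tmpfile"
```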

If you need to do replacements, you'll need to use sed instead, which does pattern matching and replacement.  Here's a basic example:

sed 's/Line [2-9]/DELETED/g' < file.txt
Line 1
DELETED
DELETED

Here's what's going on.  s is the operation, meaning substitute.  It's one of many operations, but probably the most common.  The next character that follows will be used as your delimiter between parts.  Pick something that is not in your pattern or replacement.  Commonly /, :, |, and _ are used.  Next is the pattern that will be matched, followed by a delimiter, then the replacement, then another delimiter, and then finally options.  In this case, we used a g, which means replace all matches within a line, not just the first match.

If you want to use part of the match in the replacement, you can use backreferences - a \ followed by a digit 1 through 9.  Backreferences are captured by placing parts of your pattern in escaped parentheses \( \):

sed 's/Line \([2-9]\)/DELETED \1/' < file.txt
Line 1
DELETED 2
DELETED 3

You can save up to 9 backreferences in a single pattern.
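
Two more sketches, both on made-up input.  The first uses two backreferences to swap a pair of words; the second uses | as the delimiter so that the slashes in a path don't need escaping.  The $( ) syntax captures the output and is equivalent to the backticks described under Command Substitution.

```shell
# Swap "last, first" into "first last" using two captured groups.
swapped=$(echo 'Smith, John' | sed 's/\([A-Za-z]*\), \([A-Za-z]*\)/\2 \1/')
echo "$swapped"
# John Smith

# Use | as the delimiter because the pattern and replacement contain /.
moved=$(echo '/usr/local/bin' | sed 's|/usr/local|/opt|')
echo "$moved"
# /opt/bin
```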

A great resource for regular expressions in general can be found here: http://www.regular-expressions.info/

Some grep examples: http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html

Lots more about sed can be found here: http://www.grymoire.com/Unix/Sed.html

Command Substitution

Command substitution allows you to capture the output of a command or series of commands as a single unit, to do with as you please, using backticks ` `.  For example, you can assign the cumulative output of a loop to a variable:

loopvar=`for i in 1 2 3 4 5; do echo -n $i; done`
echo $loopvar
12345

Note that the commands between the backticks will execute in a subshell, but you will still have access to variables from the parent shell:

loopvar2=`echo $loopvar`
echo $loopvar2
12345

More on command substitution here: http://tldp.org/LDP/abs/html/commandsub.html
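
Bash also accepts $( ) as an alternative to backticks, and that form can be nested without any escaping.  The sketch below uses made-up variable names and also shows the other side of the subshell behavior: the subshell can read the parent's variables, but changes it makes to them don't come back.

```shell
# $( ) does the same job as backticks and nests cleanly.
nested=$(echo "outer $(echo inner)")
echo "$nested"
# outer inner

# The subshell starts with a copy of the parent's variables...
counter=1
inner=$(counter=99; echo "$counter")
echo "$inner"
# 99

# ...but its assignment does not affect the parent shell.
echo "$counter"
# 1
```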

Useful Commands

There are a lot of built-in Unix and bash commands that can be extremely useful when writing bash scripts.  Each has its own list of options; type 'man <command>' to find out more about them.  You can find a better list with examples here: http://tldp.org/LDP/abs/html/internal.html

  • cat - concatenates its input to its output stream
  • cd - change directory
  • cp - copy files
  • du - get file size information
  • echo - produce output
  • exit - ends the script, optionally with an exit value
  • getopts - helper for processing command line arguments and options
  • ls - list directory contents
  • kill - kill process by process ID
  • mv - move or rename a file
  • printf - produce formatted output
  • ps - list running processes
  • pwd - print working directory
  • read - capture line of input into a variable
  • sleep - wait the specified number of seconds
  • sort - sort text line by line
  • wait - wait for a process ID to complete

Anyway, that's definitely more than enough to get you started (or re-started) with bash.  Enjoy!
