Jun Li

Shell Programming

Monday, November 7, 2016, posted from New York

If you have any suggestions or questions about this post, you can open an Issue here. Thanks! :)

Today, I want to summarize some useful shell commands for ease of reference. They are basic, but I keep forgetting them and end up Googling them over and over again.

1. Basic Bash Shell

First, open your terminal. The most useful and common commands are as follows:
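
A few commands that usually appear in such a list are sketched below. This is an illustrative selection, not the author's original list, and the directory and file names (notes, todo.txt, old.txt) are made up for the example. Everything runs inside a temporary directory so nothing important is touched.

```shell
# Work inside a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

pwd                                       # print the current directory
mkdir notes                               # create a directory
touch notes/todo.txt                      # create an empty file
cp notes/todo.txt notes/todo_backup.txt   # copy a file
mv notes/todo_backup.txt notes/old.txt    # rename (move) a file
ls notes                                  # list the directory contents
rm notes/old.txt                          # delete a file

remaining=$(ls notes)                     # what is left after the delete
echo "$remaining"
```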

2. Manipulate Output

First, we need some data to practice on. For example, run

wget https://www.*****.com/README.txt

to get data from the Internet. Then use

cat README.txt

to see the contents of the file. Sometimes the file is so large that you don’t want to print all of it to the terminal. To see the beginning of the file, use

head README.txt

To see the tail of the file, use

tail README.txt

We can also specify the number of lines to output:

head --lines=3 README.txt
tail --lines=3 README.txt
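
The long option --lines is the GNU coreutils spelling and may not be accepted by the BSD/macOS versions of these tools. The short form -n is the portable one. Here is a small sketch, assuming a generated 10-line sample file rather than README.txt:

```shell
# Build a 10-line sample file to experiment with.
sample=$(mktemp)
seq 1 10 > "$sample"

first_three=$(head -n 3 "$sample")   # lines 1, 2, 3
last_three=$(tail -n 3 "$sample")    # lines 8, 9, 10

echo "$first_three"
echo "$last_three"
rm "$sample"
```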

To view the file one page at a time, use (press “Enter” to scroll down one line, “Space” to scroll down one page, and “q” to quit)

less README.txt

To count the number of lines in the file, use

wc -l README.txt

To see the lines that contain the word “book”, use

grep "book" README.txt

To find the lines that contain the word “book” and save them to a .txt file, use (“>” redirects the output)

grep "book" README.txt > book.txt

To count the lines that contain the word “book”, use (“|” sends the output of “grep” to “wc”)

grep "book" README.txt | wc -l

Note: “>” overwrites the original content of the file. “>>” does not overwrite it; the new data is appended to the end of the file instead.
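
The difference between the two redirections, and the pipe into “wc”, can be checked on a small scratch file. This is a minimal sketch with made-up lines of text, not the README.txt used above:

```shell
log=$(mktemp)

echo "a book on bash" >  "$log"    # ">" creates or overwrites the file
echo "a pen"          >> "$log"    # ">>" appends a second line
echo "another book"   >> "$log"    # and a third

total=$(wc -l < "$log")                  # count all lines
matches=$(grep "book" "$log" | wc -l)    # count lines containing "book"
direct=$(grep -c "book" "$log")          # grep -c counts matching lines itself

echo "first line only" > "$log"          # overwriting discards the old content
after=$(wc -l < "$log")

echo "total=$total matches=$matches direct=$direct after=$after"
rm "$log"
```

The pipe and grep -c give the same count here; the pipe form is the more general pattern because any command can sit on the right-hand side.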

3. Loops and Conditionals

Like most programming languages, Bash supports loops and conditionals.

for i in 1 2 3; do
  echo "$i"
done

The spaces in the first line are crucial: they separate the items of the list, and you cannot omit them.

for i in $( seq 1 10 )
do
  echo "$i"
done
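
“seq” is an external program. Bash itself offers two other ways to write a counting loop: brace expansion and the C-style “for”. The sketch below sums the numbers 1 to 10 both ways; the variable names are chosen only for illustration.

```shell
#!/bin/bash

sum_brace=0
for i in {1..10}; do                  # brace expansion: expands to 1 2 3 ... 10
  sum_brace=$(( sum_brace + i ))
done

sum_cstyle=0
for (( i = 1; i <= 10; i++ )); do     # C-style loop: handy when the bound is a variable
  sum_cstyle=$(( sum_cstyle + i ))
done

echo "$sum_brace $sum_cstyle"         # prints "55 55"
```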

Here, $ accesses the value of a variable, and we can name the variable whatever we want:

for animal in "cat" "dog" "monkey" "bird"; do
  echo "I am a $animal" > "$animal".txt
done

This creates four .txt files. Each file is named after an animal and contains “I am a” followed by that animal’s name.

We can also loop over all the files that match a pattern, for example,

for file in *.txt; do
  echo "There is a file called $file"
done
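
One detail worth knowing: if no file matches the pattern, Bash leaves the pattern unexpanded and the loop body runs once with the literal pattern as the “file” name. Below is a minimal sketch of a guard against that. The function name count_txt_files is hypothetical.

```shell
# Count the .txt files in a directory, coping with the case where none exist.
count_txt_files() {
  local dir=$1 count=0 file
  for file in "$dir"/*.txt; do
    # With no matches the unexpanded pattern is passed through as-is,
    # so skip anything that does not exist.
    [ -e "$file" ] || continue
    count=$(( count + 1 ))
  done
  echo "$count"
}

demo=$(mktemp -d)
count_txt_files "$demo"                  # prints 0
touch "$demo/cat.txt" "$demo/dog.txt"
count_txt_files "$demo"                  # prints 2
```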

We can also use “if” statements in Bash:

i=0
if [ $i -eq 0 ]
then
  echo "The value is zero!"
fi

The terminal then prints “The value is zero!”. Here is a simple exercise you can try:

for animal in "cat" "dog" "monkey" "bird"; do
  if [[ $animal == c* ]]
  then
    echo "$animal starts with c"
  fi
done
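
When there are several patterns to check, a “case” statement is often easier to read than a chain of “if” tests. The following sketch is a variation on the exercise above, not part of the original; the function name first_letter_group is hypothetical.

```shell
# Report which group an animal name falls into, based on its first letter.
first_letter_group() {
  case $1 in
    c*)    echo "$1 starts with c" ;;
    [dm]*) echo "$1 starts with d or m" ;;
    *)     echo "$1 starts with something else" ;;
  esac
}

for animal in cat dog monkey bird; do
  first_letter_group "$animal"
done
```

The patterns are tried from top to bottom and only the first match runs, so the catch-all “*” goes last.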

4. Working with a Remote Server

In the previous section, we used “wget” to download data files from the Internet to our local system. Now we need to copy files from a remote system, which we access with “SSH”, to our local machine. First, log in to the remote server; you need to know the server’s name and your password.

ssh jl7333@server_name

Then use scp to transfer files. The syntax is:

scp source destination

Attention: run scp on your local machine, whether you want to fetch files or send files. To fetch a file into the current directory:

scp jl7333@server.nyu.edu:~/README.txt .

This README.txt file is in the home directory on the remote server, and “.” denotes the current local directory. If you want to send a file to the remote server, use:

scp README.txt jl7333@server.nyu.edu:~/README_2.txt

Sometimes we want to upload files to the Internet (the opposite of wget). In that case, use:

curl --upload-file ./README.txt https://www.*****.com/README.txt

This will return a URL where you can view and download the file you have just uploaded.

5. Shell Scripts

Now we need to do something more complicated: create a workflow. We save all the steps in one file and run that file whenever we need it. Such a file is called a shell script. First, create a file with:

nano hello.sh

Then write some code in your .sh file, like this:

#!/bin/bash

echo "Hello World"

The first line tells the operating system to use Bash as the interpreter. Notice that the .sh file is not executable yet, so we need to change its mode:

chmod a+x hello.sh

Now we can run it using:

./hello.sh

or

bash hello.sh
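
Scripts become more useful once they accept arguments. Inside a script, “$1” is the first argument given on the command line. The sketch below writes a small variant of hello.sh to a temporary file and runs it with and without an argument; the greeting logic and the default value “World” are assumptions made for this example.

```shell
script=$(mktemp)

# The quoted 'EOF' keeps "$1" literal, so it is stored in the script
# instead of being expanded while the file is written.
cat > "$script" <<'EOF'
#!/bin/bash
# Greet the name given as the first argument, or "World" if none is given.
name=${1:-World}
echo "Hello $name"
EOF

greeting_default=$(bash "$script")        # no argument
greeting_named=$(bash "$script" Jun)      # one argument

echo "$greeting_default"                  # Hello World
echo "$greeting_named"                    # Hello Jun
rm "$script"
```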

Sometimes a program takes a long time to run. When we work on a remote system, it is useful to leave it running and come back later.

#!/bin/bash

for i in $(seq 100); do
  echo "on iteration $i"
  sleep 30
done

This script needs a long time to run (100 iterations of 30 seconds each, about 50 minutes), so we first start a new screen session with

screen -S "jun_new_session"

Then start the time-consuming program inside this new session, and detach from the session by pressing “Ctrl+A” followed by “D”. Don’t worry, your script is still running, and you can now do something else. To resume your session, first run

screen -ls

to list all the sessions, then run

screen -R jun_new_session

to re-attach to your running session.
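
If screen is not installed, a simpler option is to run the job in the background and send its output to a log file. This is a different technique from screen: you cannot re-attach to the job, only read its log. The sketch below uses a shortened stand-in for the slow script (three iterations of one second), and the function name slow_job is hypothetical. For a job that has to survive logging out, the usual form is to prefix the script with “nohup”, for example nohup ./script.sh > out.log 2>&1 &.

```shell
log=$(mktemp)

# A stand-in for a slow job: three short iterations instead of 100 long ones.
slow_job() {
  for i in 1 2 3; do
    echo "on iteration $i"
    sleep 1
  done
}

slow_job > "$log" 2>&1 &     # "&" runs the job in the background
job_pid=$!                   # process ID of the background job

echo "job $job_pid started; the shell is free for other work"

wait "$job_pid"              # block until the job has finished
lines=$(wc -l < "$log")
echo "job wrote $lines lines to the log"
```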

6. A Simple Practice Task: Sort CitiBike Data

In this exercise, our goal is to retrieve and process a set of data files about CitiBike. We use the CitiBike data from 2016-01 to 2016-09, which come as .csv files.

Our output should be a file in which each row contains the name of a station and the number of times it appears as either the “start station” or the “end station” in the dataset. The file should be sorted so that the most popular stations are at the top. We should write all the code in a shell script so that we can reproduce the result in the future.

Hints: first use “wget” to get the data from the Internet, then use “unzip” to extract the files. Use “awk” to print the columns you want, then combine “sort”, “uniq”, and “wc”.

My code is as follows:

#!/bin/bash

echo "getting data start!"
for i in $( seq 1 9 ); do
  wget http://witestlab.poly.edu/bikes/20160${i}-citibike-tripdata.zip
  mv 20160${i}-citibike-tripdata.zip ~/citibike/data/20160${i}-citibike-tripdat$
  unzip ~/citibike/data/20160${i}-citibike-tripdata.zip
  mv 20160${i}-citibike-tripdata.csv ~/citibike/data/20160${i}-data.csv
done
echo "getting data end!"

The second script extracts the station names and counts them:

#!/bin/bash

echo "process start"
for i in $( seq 1 9 ); do
  awk -F "\"*,\"*" '{print $5}' ~/citibike/data/20160${i}-data.csv >> ~/citibik$
  awk -F "\"*,\"*" '{print $9}' ~/citibike/data/20160${i}-data.csv >> ~/citibik$
done
echo "sort data start"
sort ~/citibike/data/2016-station.txt | uniq -c > ~/citibike/data/sort.txt
sort -r -n -k 1 -t " " ~/citibike/data/sort.txt > ~/citibike/data/result.txt
echo "done!"
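
The counting step can be tried on a tiny sample before running it on the full dataset. The sketch below is a variant of the pipeline above, not the script that produced the result: it uses a made-up three-row file with ten quoted columns, station names in columns 5 and 9 as in the awk calls above, and made-up station names. Unlike the script above, it also skips the header row with NR > 1 so that the column titles are not counted as stations.

```shell
csv=$(mktemp)

# Made-up sample data: columns 5 and 9 hold the start and end station names.
cat > "$csv" <<'EOF'
"trip","start","stop","start id","start station name","lat","lon","end id","end station name","bike"
"60","s","e","1","A St","0","0","2","B St","7"
"60","s","e","2","B St","0","0","1","A St","8"
"60","s","e","2","B St","0","0","3","C St","9"
EOF

ranking=$(
  # Print both station columns, group identical names, count them,
  # and sort by count with the largest first.
  awk -F '"*,"*' 'NR > 1 { print $5; print $9 }' "$csv" |
    sort | uniq -c | sort -rn
)

echo "$ranking"
rm "$csv"
```

“uniq -c” only merges adjacent identical lines, which is why the first “sort” has to come before it.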

The result is:

jl7333@server:~/citibike/data$ head result.txt 
 221327 Pershing Square North
 158631 West St & Chambers St
 153278 W 21 St & 6 Ave
 153254 E 17 St & Broadway
 150042 Broadway & E 22 St
 129900 W 20 St & 11 Ave
 129170 12 Ave & W 40 St
 127663 Greenwich Ave & 8 Ave
 126638 8 Ave & W 33 St
 126540 Cleveland Pl & Spring St
jl7333@server:~/citibike/data$ 

It shows that Pershing Square North is the most popular station: it appears 221327 times as a start or end station over the nine months. Maybe more bikes should be placed there in the future.

7. References

1. Scientific Computing Workshop from NYU Tandon Department of ECE

2. Stack Overflow