Basic Unix

In the Arizona State University AML610 course “Computational and Statistical Methods in Applied Mathematics”, we will ultimately be using supercomputing resources at ASU and the NSF XSEDE initiative to fit the parameters of a biological model to data.  To do this, it is necessary to know basic Unix commands to copy, rename, and delete files and directories, and how to list directories and locate files.  We will also be compiling all our C++ programs from the Unix shell, and using the command line to direct the output of our programs to files.

A Unix shell is a command-line interpreter that provides an interface to the Unix operating system.  There are a few different kinds of Unix shells (like bash and C shell, for instance), but for the kinds of simple things we’ll be doing it frankly doesn’t matter what shell you are using.

One of the most useful things to know in any operating system is how to get help.  In Unix, the

man <insert name of Unix command here>

command (short for “manual”) will provide help for a Unix command.  Try typing

man man

which will give you the help page on the man command itself, and the options it takes (options to Unix commands begin with a dash, -). To scroll down in the help page, press the “f” key (for “forward”). To scroll up, press the “b” key (for “backward”). To exit, press the “q” key. Notice, in reading about the options for the man command, that the command

man -k <insert name of topic here>

will suggest to you a list of Unix commands and system files related to that topic. Try typing

man -k copy

Look at all the commands and system files it returns! To scroll through the list, use the “f” and “b” keys like before.

One of the most used Unix commands is “ls” to list the contents of a directory (“ls” is Unix shorthand for “list”).  Try typing

ls

on the command line right now. You will get a list of files in your current directory. To find out more about the options of the ls command, type

man ls

The options I tend to use the most are -l (which gives the long format of a directory listing, including the file sizes and modification dates, amongst other things), -t (which sorts the listing by modification date, newest first), -r (which reverses the sort order), and -a (which lists all files, including “hidden” files whose names begin with a “.”). Thus, if you type

ls -art

(note that you can string a bunch of command options together after the “-”) you will get a full list of the files in your directory, hidden ones included, sorted from earliest modification date to most recent. Another useful option of the ls command is -F, which puts a “/” after the name of each directory in the listing.
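To see these options in action without touching your own files, here is a minimal sketch, assuming a POSIX shell (sh or bash) on Linux or macOS. It works in a scratch directory made with mktemp -d; the file names and timestamps are made up for illustration, and touch -t is used to set each file’s modification time so that the sort order is predictable.

```shell
# Work in a throwaway directory so your own files are untouched.
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Make empty files with chosen modification times.
# touch -t takes a timestamp of the form [[CC]YY]MMDDhhmm.
touch -t 201901010000 .hidden      # a "dot file": hidden unless you use -a
touch -t 202001010000 oldest.txt
touch -t 202101010000 middle.txt
touch -t 202201010000 newest.txt

ls          # alphabetical order; .hidden is not shown
ls -t       # newest.txt first (newest to oldest)
ls -rt      # oldest.txt first (-r reverses the -t order)
ls -art     # same order as -rt, but also shows .hidden, . and ..
ls -l       # long format: permissions, owner, size, modification date
```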

If you want to make a new directory under your directory tree, you use the mkdir command. Try typing

mkdir anewdir

and then type

ls -F

You should see anewdir/ in your file list (the trailing “/” added by -F marks it as a directory).

The Unix command “cd” changes directories. To change to the anewdir directory, type

cd anewdir

Now type ls. It won’t list anything, because you haven’t put any files in that directory yet. To get the full directory listing type

ls -art

Notice that it returns two entries, “.” and “..”, even though the directory is otherwise empty. In Unix, “.” (often written ./) is a shorthand that always refers to the current directory (you will see a context it is used in a bit when we discuss the find command), and “..” (or ../) is a shorthand for the parent directory. In the directory anewdir, type

cd ../

and then type ls. You will see the files in the original directory that you started from because you returned to the parent directory of anewdir. To print the full path name of the directory you are currently in, use the Unix pwd (“print working directory”) command. This command is very useful if you forget what directory you are in!
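The steps above can be replayed in one go with the following sketch, assuming sh or bash and a scratch directory from mktemp -d (so that any real anewdir of yours is left alone):

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

mkdir anewdir   # make a new, empty directory
ls -F           # anewdir/   (the trailing / marks a directory)
cd anewdir      # move into it
pwd             # full path, ending in /anewdir
ls              # prints nothing: the directory is empty
ls -a           # only . (this directory) and .. (its parent)
cd ..           # back up to the parent directory
pwd             # the parent's full path again
```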

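Now let’s talk about wildcards: the asterisk, *, is the Unix wildcard, and it matches any string of characters (including no characters at all). The shell replaces a pattern containing * with the list of matching file names before the command runs. In order to list all the files in your directory that are Word document files, you would type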

ls *.doc

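You can list all the files whose names contain the string “new” by typing

ls *new*

Notice that your anewdir directory matches the pattern too. Because it is a directory, ls shows its contents (nothing, since it is still empty) rather than its name; to list the matching directory by name, use ls -d *new* instead.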

To list the files in a directory other than your current directory type

ls <pathname of the directory>
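Here is a minimal sketch of the wildcard examples, using made-up file names in a scratch directory (assuming sh or bash). The echo line shows that it is the shell, not ls, that turns *.doc into a list of names:

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Made-up example files.
touch report.doc notes.doc data.csv newfile.txt
mkdir anewdir

echo *.doc      # the shell expands the wildcard first: notes.doc report.doc
ls *.doc        # so ls receives and lists: notes.doc report.doc
ls *new*        # newfile.txt, then the (empty) contents of anewdir
ls -d *new*     # -d lists directories by name: anewdir newfile.txt

touch anewdir/inside.txt
ls anewdir      # list a directory other than the current one: inside.txt
```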

If you want to find all Word document files anywhere in the directory tree under your current directory, you would use the Unix find command like this:

find . -name \*.doc -print

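Recall that “.” refers to your current directory, so find starts its search there and works down through every subdirectory. The backslash in “\*” stops the shell from expanding the wildcard itself, so that find receives the pattern *.doc and does the matching on its own (quoting the pattern, as in '*.doc', works too). The -name and -print options tell find, respectively, which file names to look for and to print the path of each match. The find command is a very useful one to remember, because it is easy to forget which directory you put a particular file in.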
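A small made-up directory tree makes it easier to see what the backslash in the find command is for. This sketch assumes sh or bash; the last find command deliberately leaves the * unprotected to show what goes wrong:

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# A small made-up directory tree.
mkdir -p projects/2023 projects/2024/drafts
touch top.doc projects/2023/a.doc projects/2024/drafts/b.doc projects/2024/c.txt

# The \ stops the shell from expanding *.doc, so find receives the pattern
# and searches every directory under . for matching names.
find . -name \*.doc -print     # the three .doc files, in no guaranteed order
find . -name '*.doc' -print    # quoting the pattern works the same way

# Without the \ or quotes, the shell expands *.doc to top.doc (the only
# match in this directory), so find looks only for files named top.doc.
find . -name *.doc -print      # prints only ./top.doc
```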

To create an empty file, type

touch <your filename here>

To rename or move a file, type

mv <old filename> <new filename>

(mv is the Unix move command).

To copy a file to another file, type

cp <first filename> <new filename>

(cp is the Unix copy command).  Use “man cp” to get information on all of its many options.

To copy an entire directory and everything in it (say, a directory called dir1) to a new directory (say, dir2), go to the directory that contains dir1 and type

cp -R dir1 dir2

If dir2 already exists, this instead puts the copy inside it, as dir2/dir1. Alternatively, you can use the full path names of the directories, in which case it doesn’t matter which directory you are in.
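The following sketch, run in a scratch directory with hypothetical file names (assuming sh or bash), walks through touch, mv, cp and cp -R, including what cp -R does when the target directory already exists:

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

touch draft.txt                  # create an empty file
mv draft.txt final.txt           # rename it
cp final.txt backup.txt          # copy it; final.txt is untouched

mkdir dir1
touch dir1/a.txt dir1/b.txt
cp -R dir1 dir2                  # dir2 did not exist, so it becomes a copy of dir1
ls dir2                          # a.txt b.txt

mkdir dir3
cp -R dir1 dir3                  # dir3 already exists, so dir1 is copied INTO it
ls dir3                          # dir1
```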

To delete a file, type

rm <the filename>

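(rm is the Unix remove command). Be careful: a file removed with rm is gone for good, since there is no trash can to recover it from. The rm command won’t remove a directory without the -R (recursive) option, which deletes the directory along with everything inside it. Thus, to delete a directory, type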

rm -R <directory name>
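A minimal sketch of rm and rm -R, run in a scratch directory so that nothing of yours can be deleted (the names are made up; assumes sh or bash):

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

touch junk.txt
mkdir -p olddir/sub
touch olddir/sub/file.txt

rm junk.txt       # delete a file; there is no trash can to recover it from
# Without -R, rm refuses to delete a directory and reports an error.
rm olddir 2>/dev/null || echo "rm needs -R to remove olddir"
rm -R olddir      # delete olddir and everything inside it
# Tip: rm -i asks for confirmation before each deletion.
```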

Pipes are a very useful aspect of Unix. The “|” character is used on the command line sandwiched between two Unix commands; the standard output of the command to the left of the pipe is sent as the standard input of the command to the right of the pipe. Pipes are particularly useful with the Unix grep command, which prints the lines of its input that contain text matching a pattern. For instance, typing

ls -F | grep /

will print the lines in the output of the ls -F command that contain the “/” character (i.e., the directories!). It is a nice way to get a list of just the directories in your current directory. Note that ls does not look inside subdirectories, so directories further down the tree are not included.
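Here is a small sketch of the pipe example in a made-up directory tree (assuming sh or bash). It also shows grep -v, which prints the lines that do not match, and, for comparison, find . -type d, which lists directories at every level of the tree rather than just the current one:

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

mkdir -p data/raw scripts
touch notes.txt data/raw/run1.csv scripts/fit.R

ls -F | grep /       # data/ and scripts/ (only this directory's subdirectories)
ls -F | grep -v /    # -v inverts the match: notes.txt
find . -type d       # every directory in the tree, including ./data/raw
ls | wc -l           # another pipe: counts the 3 entries here
```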

The Unix cat command is a useful way to concatenate (join) two or more files. Say you had a bunch of data files corresponding to different years of data, with names like data_1997.csv, data_1998.csv, data_1999.csv, etc., and you wanted to concatenate them into one long file called data_total.txt. You would use the command

cat data_*.csv > data_total.txt

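The shell expands data_*.csv into the matching file names in alphabetical order (here, year order), and the “>” is the Unix output redirection operator: it sends the output of the cat command to the file named to the right of the “>”, creating that file or overwriting it if it already exists. If you didn’t redirect the output, and just typed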

cat data_*.csv

the output would go to the screen (i.e., standard output).

You can also concatenate two or more files by listing them explicitly in the cat command like this

cat file1 file2 file3 > output_file
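A minimal sketch with three tiny made-up yearly files (the numbers are placeholders, not real data), assuming sh or bash:

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# Three tiny made-up yearly files (placeholder numbers, no header rows).
printf '1997,10\n1997,12\n' > data_1997.csv
printf '1998,7\n' > data_1998.csv
printf '1999,15\n' > data_1999.csv

cat data_*.csv > data_total.txt   # joined in 1997, 1998, 1999 order
cat data_total.txt                # show the combined file on the screen

# Listing files explicitly lets you choose the order yourself.
cat data_1999.csv data_1997.csv > reordered.txt
```

One thing to watch for: cat joins files byte for byte and knows nothing about CSV, so if each yearly file began with a header row, the combined file would contain that header once per year.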

Before starting this module, all students in AML610 were asked to apply for an XSEDE user portal account by going to this link, filling in the form, and choosing a username.  Now I will show you how to use ssh to log in to the user portal.  Type

ssh -l <your username> login.xsede.org

This also works

ssh <your username>@login.xsede.org

When you want to log off, type

exit

To transfer a file from your current directory to a directory on another machine on which you have an account (for instance, a hypothetical machine with host name sunsystem.math.asu.edu, or the XSEDE login host login.xsede.org used in the commands below), type

scp <name of file> <your username>@login.xsede.org:<path name and file name on remote machine>

You will be prompted for your password on the remote machine, and once you have entered it successfully, the file transfer will take place.

To transfer a file from the remote machine to your current directory, type

scp <your username>@login.xsede.org:<pathname and filename on remote machine> .

Example (which won’t work for you, because you don’t know the password to my XSEDE portal account)

scp stowers@login.xsede.org:~/hess.R .

Or, I could give the local copy a different name:

scp stowers@login.xsede.org:~/hess.R mynewfile.R

If you want to transfer an entire directory, you would use the -r option in scp, similar to the -R option in the cp command.

Another useful Unix command is wget, which downloads files from the web.  For instance, on most Linux systems (macOS does not come with wget installed) you can type on the command line

wget http://www.cdc.gov/flu/weekly/regions1997-1998/datafinalHHS/whoreg1.csv

The file whoreg1.csv will be downloaded to your working directory (this file contains CDC data on the confirmed influenza cases during the 1997-98 season in Department of Health and Human Services geographic region 1). On a Mac, the curl command is the rough equivalent of wget. Unlike wget, curl prints what it downloads to the screen unless you give it a file name with the -o option, and it only follows web redirects if you add the -L option. Thus on a Mac you would type

curl -L http://www.cdc.gov/flu/weekly/regions1997-1998/datafinalHHS/whoreg1.csv -o whoreg1_1997_1998.txt

If you have to download a bunch of files (say, in this example, flu data corresponding to numerous years and numerous geographic regions), a useful trick is to write an R script that uses R’s cat function to write the curl commands to a file (say, myget.txt). The R script wget_example.R gives an example of how to do this.

Then, after running the R script, in your working directory on the Unix command line you would type

chmod +x myget.txt 
./myget.txt

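The first command changes the file permissions to make the myget.txt file executable, and the next command executes the commands within that file. The ./ in front of the file name is needed because the shell normally only looks for commands in the directories listed in your PATH, which usually does not include the current directory.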
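If you would rather not use R, the same idea can be sketched directly in the shell with a for loop. This is only a sketch: it assumes that the files for the other HHS regions follow the same naming pattern as whoreg1.csv (whoreg2.csv through whoreg10.csv), which is a hypothetical pattern not confirmed here. The script is built and made executable, but the line that would run the downloads is left commented out:

```shell
WORK=$(mktemp -d)
cd "$WORK" || exit 1

# HYPOTHETICAL URL pattern: assumes regions 2-10 are named like whoreg1.csv.
BASE=http://www.cdc.gov/flu/weekly/regions1997-1998/datafinalHHS

# The first line (#!/bin/sh) tells the system which shell should run the file.
printf '%s\n' '#!/bin/sh' > myget.txt
for region in 1 2 3 4 5 6 7 8 9 10; do
  # -L makes curl follow redirects; -o names the local copy.
  echo "curl -L $BASE/whoreg$region.csv -o whoreg${region}_1997_1998.txt" >> myget.txt
done

cat myget.txt          # inspect the generated commands
chmod +x myget.txt     # make the file executable
ls -l myget.txt        # the permissions now include x
# ./myget.txt          # uncomment to actually run the downloads
```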
