In the Arizona State University AML610 course “Computational and Statistical Methods in Applied Mathematics”, we will ultimately be using supercomputing resources at ASU and the NSF XSEDE initiative to fit the parameters of a biological model to data. To do this, it is necessary to know basic Unix commands to copy, rename, and delete files and directories, and how to list directories and locate files. We will also be compiling all our C++ programs from the Unix shell, and directing the output of our programs to files from the command line.
A Unix shell is a command-line interpreter that provides an interface to the Unix operating system. There are several different Unix shells (bash and the C shell, for instance), but for the simple things we’ll be doing it frankly doesn’t matter which one you use.
One of the most useful things to know in any operating system is how to get help. In Unix, the
man <insert name of Unix command here>
command (short for “manual”) will provide help for a Unix command. Try typing
man man
which will give you the help page on the man command itself, and the options it takes (options to Unix commands begin with a dash, -). To scroll down in the help page, press the “f” key (for “forward”). To scroll up, press the “b” key (for “backward”). To exit, press the “q” key. As you read through the options for the man command, notice that the command
man -k <insert name of topic here>
will list the Unix commands and system files whose manual-page descriptions mention that topic. Try typing
man -k copy
Look at all the commands and system files it returns! To scroll through the list, use the “f” and “b” keys like before.
One of the most used Unix commands is “ls” to list the contents of a directory (“ls” is Unix shorthand for “list”). Try typing
ls
on the command line right now. You will get a list of files in your current directory. To find out more about the options of the ls command, type
man ls
The options I tend to use the most are -a (which lists all files, including hidden files whose names begin with a “.”), -l (which gives the long format of a directory listing that includes the file sizes and modification dates, amongst other things), -t (which sorts the listing by modification date, newest first), and -r (which reverses the sort order). Thus, if you type
ls -art
(note that you can string several options together after a single “-”) you will get a full list of the files in your directory, including hidden files, sorted from earliest modification date to most recent. Another useful option of the ls command is -F, which puts a “/” after the name of each directory in the listing.
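A quick way to see what -t, -r and -a actually do is to create a few files with known modification dates and list them. The following is a minimal sketch for a bash-like shell, assuming mktemp -d (which creates a uniquely named scratch directory and prints its name) and touch -t (which sets a file’s timestamp); the file names are made up for the demonstration.

```shell
# Work in a throwaway directory so nothing else is touched
demo=$(mktemp -d)
cd "$demo"

# Three empty files with explicit modification times ([CC]YYMMDDhhmm)
touch -t 199701010000 oldest.txt
touch -t 199801010000 middle.txt
touch -t 199901010000 newest.txt

ls -t     # newest first: newest.txt, middle.txt, oldest.txt
ls -rt    # -r reverses the -t order, so oldest.txt comes first
ls -art   # same idea, but also shows the hidden entries . and ..
ls -F     # directories (none here yet) would get a trailing /
```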
If you want to make a new directory under your directory tree, you use the mkdir command. Try typing
mkdir anewdir
and then type
ls -F
You should see a directory called anewdir in your file list.
The Unix command “cd” changes directories. To change to the anewdir directory, type
cd anewdir
Now type ls. It won’t list anything, because you haven’t put any files in that directory yet. To get the full directory listing type
ls -art
Notice that it returns two entries, “.” and “..”. In Unix, . (or ./) is shorthand that always refers to the current directory (you will see it used shortly when we discuss the find command), and .. (or ../) is shorthand for the parent directory. In the directory anewdir, type
cd ../
and then type ls. You will see the files in the original directory that you started from because you returned to the parent directory of anewdir. To print the full path name of the directory you are currently in, use the Unix pwd (“print working directory”) command. This command is very useful if you forget what directory you are in!
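The mkdir/cd/pwd round trip described above can be replayed in one go. This is a small sketch run inside a scratch directory made with mktemp -d, so the anewdir it creates is a throwaway copy rather than the one in your home directory.

```shell
demo=$(mktemp -d)
cd "$demo"

mkdir anewdir
cd anewdir
pwd          # full path of the current directory, ending in /anewdir
ls           # prints nothing: the directory is empty
ls -a        # shows only the two special entries . and ..
cd ../       # move up to the parent directory
pwd          # back in the scratch directory
```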
Now let’s talk about wildcards. The asterisk, *, is the most commonly used Unix wildcard: it matches any sequence of characters in a file name. To list all the Word document files in your directory (that is, all files whose names end in .doc), you would type
ls *.doc
You can list all the files that have the phrase “new” in their names by typing
ls *new*
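The anewdir directory matches this pattern too. Because it is a directory, ls shows its (currently empty) contents rather than its name; to list the names of matching directories themselves, add the -d option (ls -d *new*).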
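It helps to remember that the shell, not ls, expands the wildcard: ls only ever sees the list of matching names. The sketch below shows this with a few made-up files in a scratch directory; echo is used to print the expanded list directly.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir anewdir
touch report.doc notes.doc newfile.txt data.csv

ls *.doc        # notes.doc and report.doc
ls *new*        # newfile.txt, then the (empty) contents of anewdir
ls -d *new*     # -d lists the names themselves: anewdir and newfile.txt
echo *.doc      # shows exactly what the shell passes to the command
```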
To list the files in a directory other than your current directory type
ls <pathname of the directory>
If you want to find all Word document files anywhere in the directory tree under your current directory, you would use the Unix find command like this:
find . -name \*.doc -print
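Recall that “.” refers to your current directory, so find searches it and every directory beneath it. The backslash in \*.doc stops the shell from expanding the * itself, so the pattern is passed unchanged to find, which matches it against file names throughout the tree (quoting the pattern, as in '*.doc', works too). -name and -print are options to the find command. The find command is a very useful one to remember, because it is easy to forget which directory you put a particular file in.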
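To see why the * must be protected from the shell, here is a sketch with a small made-up directory tree. With one .doc file sitting in the current directory, an unprotected *.doc would be expanded by the shell to that single name before find ever ran.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p thesis/chapters old
touch top.doc thesis/chapters/intro.doc old/draft.doc thesis/notes.txt

# Both forms pass the literal pattern *.doc to find,
# which then matches it in every directory under "."
find . -name '*.doc' -print
find . -name \*.doc -print
```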
To create an empty file, type
touch <your filename here>
To rename or move a file, type
mv <old filename> <new filename>
(mv is the Unix move command).
To copy a file to another file, type
cp <original filename> <new filename>
(cp is the Unix copy command). Use “man cp” to get information on all of its many options.
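The three file commands above fit together as follows. This is a sketch in a scratch directory; the file names and the archive directory are hypothetical, chosen only to show that mv can both rename a file and move it into another directory.

```shell
demo=$(mktemp -d)
cd "$demo"

touch results.txt                # create an empty file
mv results.txt run1.txt          # rename: results.txt no longer exists
cp run1.txt run1_backup.txt      # copy: both files now exist
mkdir archive
mv run1_backup.txt archive/      # move the copy into the archive directory
ls -F                            # archive/ and run1.txt
```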
To copy an entire directory (say, a directory called dir1) to another directory (say, a directory called dir2), go to the directory that contains dir1 (one level above it in the directory tree), and type
cp -R dir1 dir2
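If dir2 does not already exist, it is created as a copy of dir1; if it does exist, dir1 is copied inside it, as dir2/dir1. Alternatively, you can use the full path names of the directories.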
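cp -R behaves differently depending on whether the destination directory already exists, which is easy to check in a scratch directory. In this sketch dir3 and the file names are made up for the demonstration.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir dir1
touch dir1/a.txt dir1/b.txt

cp -R dir1 dir2      # dir2 does not exist yet: it becomes a copy of dir1
ls dir2              # a.txt and b.txt

mkdir dir3
cp -R dir1 dir3      # dir3 already exists: dir1 is copied inside it
ls dir3              # dir1
```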
To delete a file, type
rm <the filename>
(rm is the Unix remove command). The rm command won’t remove a directory unless you give it the -R option (short for “recursive”), which deletes the directory and everything in it. Thus, to delete a directory, type
rm -R <directory name>
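Here is a sketch of rm on a file and on a directory, run on made-up files in a scratch directory. The plain rm on the directory is expected to fail, so its error is caught with || to keep the example going.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p olddir/sub
touch olddir/sub/x.txt junk.txt

rm junk.txt                                   # removes a single file
rm olddir || echo "rm needs -R to remove a directory"
rm -R olddir                                  # removes olddir and everything under it
ls                                            # prints nothing
```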
Pipes are a very useful aspect of Unix. The “|” character is placed on the command line between two Unix commands, and sends the standard output of the command on its left to the standard input of the command on its right. Pipes are particularly useful with the Unix grep command, which prints the lines of its input that match a pattern. For instance, typing
ls -F | grep /
will print only the lines in the output of the ls -F command that contain the “/” character (i.e., all the directories!). It is a nice way to get a list of just the directories in your current directory.
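The same pipe can be reused with a couple of standard grep options: -c counts the matching lines instead of printing them, and -v prints the lines that do not match. A sketch with made-up names in a scratch directory:

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir data scripts
touch notes.txt run.sh

ls -F | grep /        # data/ and scripts/
ls -F | grep -c /     # number of directories: 2
ls -F | grep -v /     # everything that is not a directory
```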
The Unix cat command is a useful way to concatenate (join) two or more files. Say you had a bunch of data files for different years, named something like data_1997.csv, data_1998.csv, data_1999.csv, etc., and you wanted to concatenate them into one long file called data_total.txt. You would use the command
cat data_*.csv > data_total.txt
The “>” is the Unix output redirection operator; it puts the output of the cat command into the file named to the right of the “>” (creating that file, or overwriting it if it already exists). If you didn’t redirect the output, and just typed
cat data_*.csv
the output would go to the screen (i.e., standard output).
You can also concatenate two or more files by listing them explicitly in the cat command, like this:
cat file1 file2 file3 > output_file
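Both ways of calling cat can be checked on a few tiny made-up data files. Two details are worth noticing in this sketch: the shell expands data_*.csv in sorted order, so the years stay in sequence, and giving the output a .txt name keeps it from matching the data_*.csv pattern if the command is run again.

```shell
demo=$(mktemp -d)
cd "$demo"

# Three tiny made-up data files, one per year
printf '1997,12\n' > data_1997.csv
printf '1998,15\n' > data_1998.csv
printf '1999,9\n'  > data_1999.csv

cat data_*.csv > data_total.txt      # wildcard version
cat data_total.txt                   # 1997 first, then 1998, then 1999

# Listing the files explicitly gives the same result
cat data_1997.csv data_1998.csv data_1999.csv > data_explicit.txt
```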
Before starting this module, all students in AML610 were asked to apply for an XSEDE user portal account by filling in the online request form and choosing a username. Now I will show you how to use ssh to log in to the user portal. Type
ssh -l <your username> login.xsede.org
This also works:
ssh <your username>@login.xsede.org
When you want to log off, type
exit
To transfer a file from your current directory to a directory on a remote machine on which you have an account (for instance, login.xsede.org, the XSEDE login host), type
scp <name of file> <your username>@login.xsede.org:<path name and file name on remote machine>
You will be prompted for your password on the remote machine, and once you have entered it successfully, the file transfer will take place.
To transfer a file from the remote machine to your current directory (denoted by the final “.”), type
scp <your username>@login.xsede.org:<pathname and filename on remote machine> .
Example (which won’t work for you, because you don’t know the password to my XSEDE portal account):
scp stowers@login.xsede.org:~/hess.R .
Or, I could give the local copy a different name:
scp stowers@login.xsede.org:~/hess.R mynewfile.R
If you want to transfer an entire directory, you would use the -r option in scp, similar to the -R option in the cp command.
Another useful Unix command is wget, which downloads files from the web. For instance, if you are using a Unix system other than a Mac (macOS does not come with wget installed), you can type on the command line
wget http://www.cdc.gov/flu/weekly/regions1997-1998/datafinalHHS/whoreg1.csv
The file whoreg1.csv will be downloaded to your working directory (this file contains CDC data on the confirmed influenza cases during the 1997-98 season in Department of Health and Human Services geographic region 1). On a Mac, the curl command is more or less the equivalent of wget; its -o option gives the name of the file to save to (without -o, curl writes the download to standard output, i.e., the screen). Thus on a Mac you would type
curl http://www.cdc.gov/flu/weekly/regions1997-1998/datafinalHHS/whoreg1.csv -o whoreg1_1997_1998.txt
If you have to download a bunch of files (say, in this example, flu data for numerous years and numerous geographic regions), a useful trick is to write an R script that uses R’s cat() function to write the curl commands to a file (say, myget.txt). The R script wget_example.R gives an example of how to do this.
Then, after running the R script, in your working directory on the Unix command line you would type
chmod +x myget.txt
./myget.txt
The first command changes the file permission to make the myget.txt file executable, and the next command executes the commands within that file (the ./ tells the shell to look for myget.txt in the current directory).
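If you would rather not use R, the same kind of command file can be generated directly in the shell with a for loop. This is a sketch of that swapped-in approach, not the wget_example.R script itself. It assumes, hypothetically, that the files for the other HHS regions follow the same naming pattern as the region 1 file above (whoreg2.csv through whoreg10.csv), which you should check before relying on it. The sketch only builds, marks executable, and inspects myget.txt; it does not download anything.

```shell
demo=$(mktemp -d)
cd "$demo"

# HYPOTHETICAL URL pattern, extrapolated from the region 1 file in the text
base=http://www.cdc.gov/flu/weekly/regions1997-1998/datafinalHHS
: > myget.txt                      # start with an empty file
for region in 1 2 3 4 5 6 7 8 9 10; do
  # >> appends a line instead of overwriting the file
  echo "curl $base/whoreg$region.csv -o whoreg${region}_1997_1998.txt" >> myget.txt
done

chmod +x myget.txt     # make the command file executable
head -n 2 myget.txt    # inspect the generated commands before running them
# ./myget.txt          # uncomment to actually run the downloads
```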