ASU AML 610 Module IX: Introduction to C++ for computational epidemiologists

After going through this module, students should be familiar with basic skills in C++ programming, including the structure of a basic program, variable types, scope, functions (and function overloading), control structures, and the standard template library.

So far in this course we have used R to explore methods related to fitting model parameters to data (in particular, we explored the Simplex method for parameter estimation).  As we’ve shown, parameter estimation can be a very computationally intensive process.

When you use R, it gives you a prompt, and waits for you to input commands, either directly through the command line, or through an R script that you source.  Because R is a non-compiled language, and instead interprets code step-by-step, it does not have the ability to optimize calculations by pre-processing the code.

In contrast, compiled programming languages like C, Java, or C++ (to name just a few) use a compiler to process the code and optimize the computational algorithms.  In fact, most compilers have optional arguments that set the level of optimization you desire (with the downside that the optimization process itself can be computationally intensive, so compilation takes longer).  Optimized code generally runs faster than non-optimized code.

For the purposes of this course, we could just as easily use C, Java, or C++.  I will be teaching you how to write relatively simple programs in C++, which I prefer to C because C++ has a very useful library called the Standard Template Library (more on that later… I love the STL, and we will be heavily using it in this course).  However, everything I teach you how to do in this course in C++ can also be done in pretty much all other compiled programming languages (perhaps not quite as prettily at times).

I certainly will not be covering all aspects of the C++ programming language in this course.  For instance, I will not discuss pointers, and we will only touch upon the subject of objects.  For those who want to extensively use C++ in computational epidemiology after this course, I recommend taking a full one semester course in C++.

Hello world

Let’s begin our introduction to C++ with a “hello world” program. Download hello.cpp to your working directory (C++ programs normally end in .cpp), and type

g++ hello.cpp -o hello

The argument after the -o option is the name of the compiled program (if you don’t provide the -o option, the program gets put into a.out).  (Note that if you don’t have the g++ compiler on your machine, you may have another C++ compiler installed under a name such as c++ or clang++.)
Now type

./hello

And you will see “Hello, world!” printed to your terminal.

Let’s take a look at the program

#include <iostream>

int main(){
   std::cout << "Hello, world!\n";
   return 0;
}

First of all, note that comments in C++ begin with //.  If you want to comment out an entire block of code you can put a /* at the beginning of the block, and a */ at the end, and the compiler will ignore all text in between.

#include <iostream>
Note that lines beginning with a hash sign are not comments in C++ like they are in R! Lines beginning with a hash sign (#) in C++ are directives for the pre-processor. They are not regular lines of code containing expressions, but instructions to the compiler’s pre-processor. In this case the directive #include <iostream> tells the pre-processor to include the iostream standard header file. This file contains the declarations of the basic standard input-output library in C++, and it is included because its functionality is used later in the program. The C++ standard library is made up of about 50 such headers in the original standard (later versions of the standard add more), and you can find a list of them here.  In particular, the <vector> header is part of what is commonly referred to as the Standard Template Library (STL), along with headers such as <algorithm> and <numeric>, and we will be using it a lot in upcoming lectures.

int main ()
This line corresponds to the beginning of the definition of the main function. The main function is the point where all C++ programs start their execution, independently of its location within the source code. It does not matter whether there are other functions with other names defined before or after it: the instructions contained within this function’s definition will always be the first ones to be executed in any C++ program. For that same reason, it is essential that all C++ programs have a main function. The word main is followed in the code by a pair of parentheses (()). That is because it is a function declaration: in C++, what differentiates a function declaration from other types of expressions are these parentheses that follow its name. Optionally, these parentheses may enclose a list of parameters within them. Right after these parentheses we can find the body of the main function enclosed in braces ({}). What is contained within these braces is what the function does when it is executed.

std::cout << "Hello, world!\n";
This line is a C++ statement. A statement is a simple or compound expression that can actually produce some effect. In fact, this statement performs the only action that generates a visible effect in our first program. cout is the name of the standard output stream in C++, and the meaning of the entire statement is to insert a sequence of characters (in this case the “Hello, world!” sequence of characters) into the standard output stream (cout, which usually corresponds to the screen). cout is declared in the iostream standard file within the std namespace, which is why we refer to it as std::cout (the letters before the :: denote the name of the namespace). The “<<” is what is known as the insertion operator; it directs what comes after it to the standard output.

Notice that each code statement ends with a semicolon character (;). This character is used to mark the end of the statement and in fact it must be included at the end of all expression statements in all C++ programs (one of the most common syntax errors is to forget to include the semicolon after a statement).

Your first C++ program bug (deliberately added)

Edit hello.cpp, and remove the semicolon at the end of the cout line.  Now try to compile the program.  The compiler should complain at you, and give you some indication of the line of the program where the problem lies (usually the bug will lie somewhere within the near vicinity of the line the compiler points out… not always on the exact line).

Slightly simplifying the Hello World program

If we add a line to the Hello World program just before main() that says “using namespace std;” the C++ compiler will automatically look for names (like cout) in the std namespace of the C++ standard library. This way, instead of using std::cout throughout the program, you can just put cout.  Like so:

#include <iostream>
using namespace std;
int main(){
   cout << "Hello, world!\n";
}

Variable types in C++

Variable names in C++ consist of one or more letters, digits, or the underscore (_) character.  No other characters are allowed.  In addition, variable names have to begin with a letter or an underscore; they cannot begin with a digit.

Just like in R, variable names in C++ are case sensitive.

There are a number of reserved keywords in C++ that are forbidden as variable names. These are: asm, auto, bool, break, case, catch, char, class, const, const_cast, continue, default, delete, do, double, dynamic_cast, else, enum, explicit, export, extern, false, float, for, friend, goto, if, inline, int, long, mutable, namespace, new, operator, private, protected, public, register, reinterpret_cast, return, short, signed, sizeof, static, static_cast, struct, switch, template, this, throw, true, try, typedef, typeid, typename, union, unsigned, using, virtual, void, volatile, wchar_t, while

There are several different fundamental data types in C++;  the ones we will most commonly use in this course are int, bool, float, and double.

In order to use a variable in C++, we must first declare it specifying which data type we want it to be. The syntax to declare a new variable is to write the specifier of the desired data type (like int, bool, float…) followed by a valid variable identifier. For example:

  int a;
  float mynumber;

These are two valid declarations of variables. The first one declares a variable of type int with the identifier a. The second one declares a variable of type float with the identifier mynumber. Once declared, the variables a and mynumber can be used within the rest of their scope in the program (more on scope in a minute).

If you are going to declare more than one variable of the same type, you can declare all of them in a single statement by separating their identifiers with commas. For example:

  int a,b,c;

The cmath library contains the usual set of functions needed to do mathematical calculations: abs, acos, asin, atan, atan2, ceil, cos, cosh, exp, fabs, floor, fmod, frexp, ldexp, log, log10, modf, pow, sin, sinh, sqrt, tan, tanh

The program arith.cpp gives examples of variable declarations and arithmetic operations.

#include <iostream>
#include <cmath>
using namespace std;

int main (){
  int a=0,b=-1; // can fill with initial values
  int result;

  double c,d;   // don't need to fill with initial values if you don't want to
  double result_double;

  //******************************************************************************
  // now do integer arithmetic, double arithmetic, and mixed integer double
  // arithmetic
  //******************************************************************************
  cout << "\n";
  cout << "\n";
  cout << "Do calculations with integers, and store the result in an integer\n";
  a = 6;
  b = 2;
  a = a + 1;
  cout << "a is integer " << a << "\n";
  cout << "b is integer " << b << "\n";
  result = b/a;
  cout << "b/a " << result << "\n";
  result = a/b;
  cout << "a/b " << result << "\n";

  cout << "\n";
  cout << "Do calculations with doubles, and store the result in a double\n";
  c = 7;
  d = 2;
  cout << "c is double " << c << "\n";
  cout << "d is double " << d << "\n";
  result_double = d/c;
  cout << "d/c " << result_double << "\n";
  result_double = c/d;
  cout << "c/d " << result_double << "\n";

  cout << "\n";
  cout << "Do calculations with mixed integers and doubles, and store the result in a double\n";
  cout << "\n";
  cout << "c is double  " << c << "\n";
  cout << "b is integer " << b << "\n";
  result_double = b/c;
  cout << "b/c " << result_double << "\n";
  result_double = c/b;
  cout << "c/b " << result_double << "\n";

  cout << "\n";
  cout << "Do calculations with mixed integers and doubles, and store the result in an integer\n";
  cout << "a is integer " << a << "\n";
  cout << "d is double  " << d << "\n";
  result = d/a;
  cout << "d/a " << result << "\n";
  result = a/d;
  cout << "a/d " << result << "\n";

  //******************************************************************************
  // try some built-in math functions in cmath
  //******************************************************************************
  cout << "\n";
  cout << "Log of -a is  " << log(-a) << endl;
  cout << "Log of 0 is   " << log(0) << endl;
  cout << "Log of +a is  " << log(+a) << endl;
  cout << "Pi is         " << acos(-1.0) << endl;
  cout << "Sin of +a is  " << sin(+a) << endl;
  cout << "c^2 is        " << pow(+c,2) << endl;
  cout << "c^1.5 is      " << pow(+c,1.5) << endl;
  cout << "a modulo b is " << a%b << endl;           // won't compile with anything other than integers
  cout << "c modulo d is " << fmod(c,d) << endl;       
  cout << "exp(c) is     " << exp(c) << endl;
  cout << "abs(log(1/a)) " << abs(log(1/a)) << endl; // Unexpected result?
  cout << "abs(log(1/c)) " << abs(log(1/c)) << endl;

  cout << "\n";

  return 0;
}

++, --, +=, and -= are special operators in C++ that increment or decrement an integer.  If int i = 10, i++ will increment it to be equal to 11.  If int i = 10, i+=5 will increment it to be equal to 15.

If used as a prefix, ++i will increment the value of i before it is used in the remainder of the expression.  If used as a suffix, i++ will increment the value of i after it is used in the remainder of the expression.

Examples:

int x = 5;
int y = ++x; // x is now equal to 6, and 6 is assigned to y
int u = 5;   // a new name is used here, because x cannot be declared twice in the same scope
int v = u++; // u is now equal to 6, and 5 is assigned to v

 


Strings

In order to use strings within a C++ program, you need to include the C++ string standard library (this code is in string.cpp):

#include <iostream>
#include <string>
using namespace std;

int main (){
  string mystring = "An example of a string";
  cout << mystring << "\n";
  string mystringb = "Another example of a string";
  string mystringc = mystringb+"!"; // concatenate two strings
  cout << mystringc << "\n";
  return 0;
}

 

Boolean logic

C++ has a bool type that can be used in Boolean logic, as demonstrated in bool.cpp:

#include <iostream>
using namespace std;

int main (){
  int a=0,b=-1; // can fill with initial values
  int result;

  bool check = (a>=b);
  cout << "\n";
  cout << "check that 0 is greater than or equal to -1: " << check << endl; // if true, a 1 will be output, 0 otherwise
  bool checkb = !check;
  cout << "checkb = not(check): " << checkb << endl; 

  cout << "check or checkb:     " << bool(check||checkb) << endl;
  cout << "check and checkb:    " << bool(check&&checkb) << endl;
  cout << "!(check and checkb): " << bool(!(check&&checkb)) << endl;
  cout << "!check or ! checkb:  " << bool(!check||!checkb) << endl;
  cout << "\n";

  return 0;
}

 

Scope of variables in C++

All the variables that we intend to use in a program must be declared, with a type specifier, at an earlier point in the code, as we did at the beginning of the body of main in the previous code when we declared that a, b, and result were of type int.

A variable can be either of global or local scope. A global variable is a variable declared in the main body of the source code, outside all functions, while a local variable is one declared within the body of a function or a block. Just like in R, a block of code in C++ is enclosed within curly braces.

Local scope:   A variable name declared within a block is accessible only within that block and blocks enclosed by it, and only after the point of declaration. The C++ program scope.cpp demonstrates scoping.

// from  http://en.wikibooks.org/wiki/C%2B%2B_Programming/Scope/Examples
#include <iostream>

using namespace std;  /* outermost level of scope starts here */

int i=10;

int main(){           /* next level of scope starts here */
  cout << i << endl;
  int i;
  i = 5;
  cout << i << endl;
  {                   /* next level of scope starts here */
    cout << i << endl;
    int j,i;
    j = 1;
    i = 0;
    cout << i << endl;

    {                 /* innermost level of scope of this program starts here */
      cout << i << endl;
      int k, i;
      i = -1;
      j = 6;
      k = 2;
      cout << i << endl;
    }                 /* innermost level of scope of this program ends here */

  }                   /* next level of scope ends here */

  cout << i << endl;
  return 0;
}                     /* next and outermost levels of scope end here */

A brief introduction to functions in C++

Functions (called methods when they belong to a class) allow us to structure our programs in a more modular way, accessing all the potential that structured programming can offer to us in C++.

A function is a group of statements that is executed when it is called from some point of the program. The following is its format:

type name ( parameter1, parameter2, ...) {statements }

where:

  • type is the data type specifier of the data returned by the function.
  • name is the identifier by which it will be possible to call the function.
  • parameters (as many as needed): Each parameter consists of a data type specifier followed by an identifier, like any regular variable declaration (for example: int x), and acts within the function as a regular local variable. Parameters allow arguments to be passed to the function when it is called. The different parameters are separated by commas.
  • statements is the function’s body. It is a block of statements surrounded by braces { }.

Here is an example of a function (function_example.cpp)


#include <iostream>
using namespace std;

int subtraction(int a, int b){
  int r;
  cout << "a is " << a << endl;
  cout << "b is " << b << endl;
  r=a-b;
  return (r);
}

int main (){
  int z;
  cout << endl;
  z = subtraction(2,-1.5); // note: -1.5 is implicitly converted to the int -1
  cout << "The result is " << z << endl;
  cout << endl;
  return 0;
}

Arguments to main

As described above, near the beginning of this module, main() is a function.  This means that we can pass arguments to it (arguments_to_main.cpp).

#include <iostream>

using namespace std;

int main(int argc, char **argv){
  cout << "The name of the program is: " << argv[0] << endl;
  cout << "The arguments passed to the program are\n";
  for (int n=1; n<argc; n++){
    cout << n << ": " << argv[n] << '\n';
  }
  return 0;
}

Arguments to main can be a nice way to pass info to a program running in batch.  I’ve used it before to set the random seed of jobs running in batch with the batch job number (passed to main) and the time stamp (in seconds from Jan 1, 1970).  This program can be found in set_random_seed.cpp

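#include <cstdlib>   // atoi, srand
#include <ctime>     // time
#include <iostream>

using namespace std;
int main(int argc, char **argv){
  // make sure the batch job number was actually passed to the program
  if (argc < 2){
    cerr << "Usage: " << argv[0] << " <job number>\n";
    return 1;
  }

  int iranda = unsigned(time(0)); // seconds since Jan 1st, 1970
  int irandb = iranda + atoi(argv[1]);
  cout << iranda << " " << irandb << endl;
  srand(irandb);//set the random seed
  return 0;
}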

The source of Jan 1,1970 as the beginning of history as we know it, is thought to be attributed to this guy.

 

C++ control structures

Just like pretty much any other computing language (compiled or interpreted), C++ has a suite of control structures that allow it to repeat sections of code (for/while loops), or make decisions (if statements).

If/then/else statements

If statements in C++ can either be one liners, like this:

if (x == 1) cout << "x is 1";

or you can enclose multiple lines of code in braces:

if (y == 1){
   cout << "y is " << y << endl;
   y++;
}

We can also do if/then/else statements:

if (y > 0){
  cout << "y is positive";
}else if (y < 0){
  cout << "y is negative";
}else{
  cout << "y is 0";
}

 

While loops

While loops are one way to repeat sections of code. while_example.cpp is a program that plays a guessing game, using a while loop.  At the beginning of the loop the program evaluates the logical statement, and performs the code within the loop if the logical is true.

#include <iostream>
#include <cstdlib>   // rand, srand
#include <ctime>     // time
using namespace std;

int main (){
  //*************************************************************
  // first generate a random integer between 0 and 100
  // set the random seed with the time stamp
  //*************************************************************
  int iranda = unsigned(time(0)); // seconds since Jan 1st, 1970
  srand(iranda); // set the random seed
  int secret_number = rand()%101; // rand() returns random #'s from 0 to RAND_MAX

  cout << "I am thinking of a number between 0 to 100... try to guess it!\n";

  int n;
  cout << "Enter your starting guess\n";
  cin >> n;

  while (n!=secret_number) {
    if (n>secret_number){
      cout << "You guessed too high.  Guess again!\n";
    }else{
      cout << "You guessed too low.  Guess again!\n";
    }
    cin >> n;
  }
  cout << "You guessed it!\n";

  return 0;
}

 

For loops

For loops also allow you to repeat sections of code.  The format of a for loop is

for ( init; condition; increment ){
   <your code>;
}
  • The init step is executed first, and only once. This step allows you to declare and initialize any loop control variables. You are not required to put a statement here, as long as a semicolon appears.
  • Next, the condition is evaluated. If it is true, the body of the loop is executed. If it is false, the body of the loop does not execute and flow of control jumps to the next statement just after the for loop.
  • After the body of the for loop executes, the flow of control jumps back up to the increment statement. This statement allows you to update any loop control variables. This statement can be left blank, as long as a semicolon appears after the condition.
  • The condition is now evaluated again. If it is true, the loop executes and the process repeats itself (body of loop, then increment step, and then again condition). After the condition becomes false, the for loop terminates.

An example:

#include <iostream>
using namespace std;

int main (){
   for (int a=0; a<20; a++){
       cout << "The value of a is: " << a << endl;
   }
   return 0;
}

What would I put in the increment if I wanted a to go in steps of 3?

Functions: pass by value

In C++ it is possible to pass the values of variables to a function by value, which means that the function can manipulate the variables any way it wants, but it does not change the values of those variables in the main block of code that called the function.  Consider the following program example_fun_by_value.cpp:

#include <iostream>
using namespace std;

int subtraction(int a, int b){
  int r;
  cout << "a is " << a << endl;
  cout << "b is " << b << endl;
  r=a-b;
  a++;
  b++;
  cout << "new a is " << a << endl;   
  cout << "new b is " << b << endl;
  return (r);
}

int main (){
  int z;
  int a = 2;
  int b = 3;
  cout << endl;
  z = subtraction(a,b);
  cout << "(a-b) is " << z << endl;
  cout << "The value of a in the main program is " << a << endl;
  cout << "The value of b in the main program is " << b << endl;
  cout << endl;
  return 0;
}

The subtraction() function will not change the values of a and b outside the scope of that function, because a and b were passed by value.

Functions: pass by reference

In contrast, C++ also allows you to pass variables to functions by reference, which means that the function can change those values, and the change will be passed back to the code which called the function.  Consider the following code in example_fun_by_ref.cpp.  In this case, the subtraction() function has variable a passed by value, and variable b passed by reference.  The changes to variable b within subtraction() are visible in main().  Passing by reference saves memory, because a copy is not made.

#include <iostream>
using namespace std;

int subtraction(int  a   // a is passed by value
               ,int& b){ // b is passed by reference
  int r;
  cout << "a is " << a << endl;
  cout << "b is " << b << endl;
  r=a-b;
  a++;
  b++;
  cout << "new a is " << a << endl;
  cout << "new b is " << b << endl;
  return (r);
}

int main (){
  int z;
  int a = 2;
  int b = 3;

  cout << endl;
  z = subtraction(a,b);
  cout << "(a-b) is " << z << endl;
  cout << "The value of a in the main program is " << a << endl;
  cout << "The value of b in the main program is " << b << endl;

  cout << endl;
  return 0;
}

 

Functions: overloading

Function overloading is a feature of C++ that allows us to create multiple functions with the same name, so long as they have different parameters.  Consider the following program in function_overload.cpp.  It creates 4 different versions of a function myprint() that prints to standard output what is passed to it.  The different versions take int, double, float, and string.  Note that if you just pass a floating point literal like 610.00, its type is double by default.  (Also note the “void” return type of the functions, which means they do not return a value.)

#include <iostream>
#include <string>
using namespace std;

void myprint(int i) {
  cout << " Printing out an int " << i << endl;
}
void myprint(double  d) {
  cout << " Printing out a double " << d << endl;
}
void myprint(float  f) {
  cout << " Printing out a float " << f << endl;
}

void myprint(string a) {
  cout << " Printing out a string " << a << endl;
}

int main() {
  myprint(610);
  myprint(610.00);
  myprint("AML610");
  float f=610.00;
  myprint(f);
}

The standard template library: vectors

A very useful part of the C++ standard library is the vector library, which allows you to manipulate vector array objects not so differently than in R.  Consider the following program (example_vector.cpp):

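#include <iostream>
#include <vector>     // need this to use vectors
#include <numeric>    // need this for various operations on vectors, like summing
#include <algorithm>  // need this for max_element, min_element, and transform
#include <functional> // need this for multiplies and bind2nd
using namespace std;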

//************************************************************************
//************************************************************************
//************************************************************************
int main (){
  vector<double> vt;  // creates an empty vector
  cout << endl;
  cout << "The size of the vt vector is " << vt.size() << endl;
  cout << endl;

  double delta_t = 1.0;
  double t = 0.0;
  while(t<(2.0)){
     vt.push_back(t);  // push_back() appends to the end of the vector
     t = t + delta_t;
  }
  vt.push_back(-2.0);

  cout << "The size of the vt vector is " << vt.size() << endl;
  cout << "vt is: " << endl;
  for (int i=0;i<vt.size();i++){ // indexing of vectors starts at 0
     cout << vt[i] << " ";
  }
  cout << endl;
  cout << endl;

  double sumvt = accumulate(vt.begin(),vt.end(),0.0); // use 0.0, not 0, so the sum is done in double
  cout << "The sum of vt is " << sumvt << endl;
  cout << endl;

  int iind = max_element(vt.begin(),vt.end())-vt.begin();
  cout << "The index and value of the maximum is "
       << iind << " " << vt[iind] << endl;

  cout << endl;

  int jind = min_element(vt.begin(),vt.end())-vt.begin();
  cout << "The index and value of the minimum is "
       << jind << " " << vt[jind] << endl;
  cout << endl;

  vector<double> vtb;
  vtb = vt;
  transform(vt.begin()
           ,vt.end()
           ,vtb.begin()
           ,bind2nd(multiplies<double>(), 2.0) // note: bind2nd is deprecated in C++11 and removed in C++17
           );
  cout << "vt*2 is: " << endl;
  for (int i=0;i<vtb.size();i++){
     cout << vtb[i] << " ";
  }
  cout << endl;
  cout << endl;

  vector<double> vtc;
  vtc = vt;
  vtc.insert(vtc.end(),vtb.begin(),vtb.end());
  cout << "vtc=(vt*2 appended to vt) is: " << endl;
  for (int i=0;i<vtc.size();i++){
     cout << vtc[i] << " ";
  }
  cout << endl;
  cout << endl;

  // now remove the third element from vtc
  vtc.erase(vtc.begin()+2);
  cout << "vtc with the third element removed is: " << endl;
  for (int i=0;i<vtc.size();i++){
     cout << vtc[i] << " ";
  }
  cout << endl;
  cout << endl;

  // now erase all elements from vt
  vt.clear();
  cout << "The length of the cleared vt vector is " << vt.size() << endl << endl;

  return 0;
}

 

Vectors in multi-dimensions

The standard template library allows you to make vectors of vectors (and vectors of vectors of vectors…), enabling multi-dimensional arrays and even ragged arrays. Consider the following code (example_multid.cpp):

#include <iostream>
#include <vector>   // need this to use vectors
using namespace std;

//************************************************************************
//************************************************************************
//************************************************************************
int main (){
  vector< vector <double> > mvt;  // creates an empty 2D vector

  // create a ragged array of 10 rows
  cout << endl;
  for (int i=0;i<10;i++){
     vector<double> v;
     cout << "Row " << i+1 << " of the mvt vector: ";
     for (int j=0;j<=i;j++){
        v.push_back(j);
        cout << j << " ";
     }
     cout << endl;
     mvt.push_back(v);
  }
  cout << endl;

  cout << "The number of rows of the mvt vector is "
       << mvt.size() << endl;
  cout << "The length of the fourth row of the mvt vector is "
       << mvt[3].size() << endl;
  cout << "The value of the third row and second column is "
       << mvt[2][1] << endl;
  cout << endl;

  vector< vector <vector <double> > > mvt3d;  // creates an empty 3D vector
  mvt3d.push_back(mvt);
  mvt3d.push_back(mvt);
  mvt3d.push_back(mvt);
  cout << "The [2,5,3]th element of mvt3d is " << mvt3d[1][4][2] << endl;

  mvt.clear();
  mvt3d.clear();
  cout << endl;

  return 0;
}

 

 

Reading in data from comma delimited files

The following code gives an example of parsing comma delimited data, in this case from the file ozone_2000.csv

The program is in example_csv.cpp

#include <algorithm> // remove
#include <cstdlib>   // atoi, atof
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

int main(){
  ifstream infile("ozone_2000.csv");

  int icount = 0;
  vector<double> vozone;
  vector<int> vmonth;
  vector<int> vday;
  vector<int> vyear;
  while (infile){
    string myline;
    if (!getline(infile,myline)) break;
    //cout << myline << endl;
    istringstream ss(myline);

    int ifield = 0;
    while (ss){
      string myelement;
      if (!getline(ss,myelement, ',' )) break;
      ifield++;
      if (ifield==1&&icount>0){
        //cout << "The date string is " << myelement << endl;
        //cout << "The length of the date string (including quotes) "
        //     << myelement.length() << endl;

        string mymonth = myelement.substr(1,2);
        string myday = myelement.substr(4,2);
        string myyear = myelement.substr(7,4);
        vmonth.push_back(atoi(mymonth.c_str()));
        vday.push_back(atoi(myday.c_str()));
        vyear.push_back(atoi(myyear.c_str()));
      }
      if (ifield==4&&icount>0){
        //cout << "The ozone reading string is " << myelement << endl;
        // remove the quotes from the string
        myelement.erase(remove(myelement.begin()
                              ,myelement.end(),'\"' )
                       ,myelement.end());
        double ozone = atof(myelement.c_str());
        vozone.push_back(ozone);

        //cout << "The ozone reading string (without qoutes) is "
        //     << myelement << endl;
        //cout << "The ozone reading is " << ozone << endl;

     }
    }

    icount++;
  }// end loop over lines in file

  for (int i=0;i<vozone.size();i++){
     cout << i+1       << " "
          << vmonth[i] << " "
          << vday[i]   << " "
          << vyear[i]  << " "
          << vozone[i] << endl;
  }

  if (!infile.eof()){
    cerr << "File not found!\n";
  }

  return 0;
}

 


Makefiles

“Make” is a UNIX utility that automatically builds executable programs from source code by reading files called makefiles, which have directives that specify how to compile the target program.

The Make utility detects which source files have changed since the last build, and only compiles the files that have changed.

To compile a program using Make, you have to create a makefile.  In the file makefile_csv, you will find a makefile to compile the example_csv.cpp file above.  To compile the program, type

make -f makefile_csv example_csv


Classes and Objects

Objects are one of the main things that distinguish C++ from C (and from other non-object-oriented languages like Fortran).
A class is a type that defines what an object contains and what it can do.  An object is a variable whose type is a class (just as a variable can be of type int or double).

Before you can use an object of a class in a program, you must first define what the class is (perhaps this isn’t surprising…)

Usually when you are looking at a class that has already been defined, you don’t care too much about the source code, but you do want to know how to use the class (just like when using R or Matlab, you usually don’t care how the algorithms in the built-in functions are implemented, you only care about what they do).

Thus, in C++ it is good form to put the class definition in a file called the “header” file (usually ending in a .h suffix), and the source code for the implementation of the class in a source code file (usually ending in a .cpp suffix).

Let’s look at an example of a class header file in C++, in this case, a class that contains methods related to rectangles.  The header file looks like this (in Rectangle.h):

class Rectangle {
  private:
    double _x, _y;

  public:
    void SetValues(double my_x
                  ,double my_y);
    double GetArea();
    double GetPerimeter();
};

The class name is Rectangle.  Note that the class definition is enclosed in curly braces, and that the closing brace is followed by a semicolon.  “private” members of a class (either variables or methods) are only accessible from within the methods of the class.  “public” members of a class (either variables or methods) are accessible from anywhere an object of the class is in scope (examples in a moment).

The source code for the Rectangle class is in Rectangle.cpp:

#include "Rectangle.h"
void Rectangle::SetValues(double my_x
                         ,double my_y){
   _x = my_x;
   _y = my_y;
}
double Rectangle::GetArea(){
   return (_x*_y);
}
double Rectangle::GetPerimeter(){
   return (2*_x+2*_y);
}

The .cpp file containing the class source code has to begin with an include directive telling the preprocessor to include the header file that contains the class definition.  Within the .cpp file, the name of each class method must be preceded by the class name followed by :: (for example, Rectangle::GetArea).

The file example_rectangle.cpp gives an example of how to use the Rectangle class within a program.  It “instantiates” an object of the class Rectangle, with variable name myrectangle:

#include <iostream>
#include "Rectangle.h"

using namespace std;
int main(int argc, char **argv){
  Rectangle myrectangle;
  double x=3.0,y=4.0;

  cout << "the width and length are " << x << " " << y << endl;
  myrectangle.SetValues(x,y);
  cout << "The area is: " << myrectangle.GetArea() << endl;

  cout << "The perimeter is: " << myrectangle.GetPerimeter() << endl;

  return 0;

} // end program

The file makefile_rectangle is the makefile that first compiles the Rectangle class source code into the object file Rectangle.o, and then compiles example_rectangle.cpp and links it with Rectangle.o to create the executable:

Rectangle.o : Rectangle.cpp Rectangle.h
        g++ -g -c Rectangle.cpp

example_rectangle : example_rectangle.cpp Rectangle.o
        g++ -g -o example_rectangle example_rectangle.cpp Rectangle.o

To compile example_rectangle.cpp type

make -f makefile_rectangle example_rectangle

A somewhat more complicated example

Examine the file Rectangle_b.h.  The file is similar to Rectangle.h, except that the class definition now declares a class constructor (in this case it takes two arguments) and a class destructor (which never takes arguments).  The constructor has the same name as the class, and the destructor is the class name preceded by ~.  The constructor gets called when a class object is instantiated.  The destructor is the last method called, just as your class object goes out of scope (for instance, just as the main program ends if you had instantiated the object within the main program).

If you have manually allocated memory within an object, release it in the destructor to prevent memory leaks.  STL containers such as vector are an exception: they release their own memory automatically when the object that holds them is destroyed, so calling .clear() on them in the destructor is harmless but not required.

class Rectangle_b {
  private:
    double _x, _y;

  public:
    Rectangle_b(double my_x
               ,double my_y);

    ~Rectangle_b();   // destructor
    void SetValues(double my_x
                  ,double my_y);
    double GetArea();
    double GetPerimeter();
};

The file Rectangle_b.cpp contains the implementation of the class methods:

#include <iostream>
#include "Rectangle_b.h"

using namespace std;

Rectangle_b::Rectangle_b(double my_x
                        ,double my_y
                        ):_x(my_x)
                         ,_y(my_y){
}

Rectangle_b::~Rectangle_b(){
  cout << "Rectangle_b: here we are in the destructor " << endl;
}

void Rectangle_b::SetValues(double my_x
                           ,double my_y){
   _x = my_x;
   _y = my_y;
}

double Rectangle_b::GetArea(){
   return (_x*_y);
}

double Rectangle_b::GetPerimeter(){
   return (2*_x+2*_y);
}

The file example_rectangle_b.cpp is a program that uses the Rectangle_b class:

#include <iostream>
#include "Rectangle_b.h"

using namespace std;
int main(int argc, char **argv){
  double x=3.0,y=4.0;
  cout << "the width and length are " << x << " " << y << endl;

  Rectangle_b myrectangleb(x,y);
  cout << "The area after using the custom constructor is: "
       << myrectangleb.GetArea() << endl;

  cout << "about to end the program..." << endl;
  return 0;

} // end program

The makefile_rectangle makefile also compiles the example_rectangle_b program.

UsefulUtils: a class with lots of Useful Utility methods

In the files UsefulUtils.h and UsefulUtils.cpp I have compiled a number of utility methods I’ve developed over the years for use in my own research.  These include methods to generate random numbers drawn from the Normal, Poisson, Exponential, and Uniform probability distributions, among others (the RandomNormal, RandomPoisson, RandomExponential and RandomUniform methods).

The RandomSample method selects k numbers out of a sequence of 1 to n without replacement.

There are methods in there to calculate the inverse of matrices (when the matrices are expressed as a vector<vector<double> > array), multiply two matrices, get the dot product of vectors, etc.

I’ve written C++ equivalents of R’s pnorm, ppois, and pchisq functions.

There are also methods to sum an STL vector (Sum), find the index of its minimum and maximum (WhichMin and WhichMax), and return the minimum or maximum (Min and Max).  There is a method Seq that is the equivalent of R’s seq() function.  There are also methods to calculate the mean, standard deviation, and variance of a vector.

I also have put in several methods useful for comparing models to data, like PearsonChisquared and likelihood functions.

In the C++ program useful_example.cpp I give examples of how to use the methods in UsefulUtils.  You can compile the program using the makefile_use makefile by typing

make -f makefile_use useful_example

To run the program type

./useful_example

If you run the program again, do the random numbers change? If not, why not?  How could we fix it?
