Part 1

What is R?

R is a piece of software that you interact with via a programming language, which is also called R. So, in short, you can say that R is a programming language.

As with any other programming language, the logic of interacting with R is that you give it input, and it gives you output. Or: you enter expressions and R evaluates them.


Who is Hadley Wickham?

Hadley is a statistician, and a regular man with admirers who think he is from a higher sphere, and who live in the “Hadleyverse.”

[Image: Hadley]

[Image: Hadley as Obama poster]

Hadley grew up in New Zealand, and has the accent to prove it (look him up on YouTube). He now lives in Houston, TX, where he used to be a professor of statistics at Rice University. He’s not a regular professor anymore, because he doesn’t need to be. What did Hadley do to become this famous?

In everything he does, he tries to make R much, much easier to use by writing packages that simplify core functions of the data science workflow.

Hadley wrote packages for R that have become very widely used, such as

  • ggplot2
  • dplyr
  • tidyr

and many others. He also wrote books about writing packages for R. They are all freely and legally available online in PDF format. In addition, he is the “chief scientist” at RStudio.


Setting up a working environment

RStudio: script, console, data file(s), viewing options


Command line functionality: R as calculator

You can use R as a calculator: the most straightforward example of “expression in, evaluation out.”

What happens if you enter this in R (and then hit Return)?

4 + 2

Try to have R perform a few other arithmetic operations, e.g.

  • six times eleven
  • six point nine times eleven
  • two minus three
  • four divided by two

You must of course figure out how to communicate these mathematical operations to R in a way that it understands.
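
As a hint, here is a minimal sketch of R's basic arithmetic operators. The numbers are made up for illustration and are deliberately different from the exercises above.

```r
3 * 5      # multiplication uses the asterisk
## [1] 15
2.5 * 4    # decimals are written with a point, not a comma
## [1] 10
10 - 12    # subtraction can give a negative result
## [1] -2
9 / 3      # division uses the forward slash
## [1] 3
```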


Assigning variables

The most fundamental functionality in computer programming is assigning variables. That is to say: linking content to placeholders. In the following example, x and y are variable names, and each is assigned numeric content.

x = 7
y = 2

Once you have assigned variables, the variables can be used just like their actual content. In other words: you can

  • add x to y,
  • divide y by x,
  • divide x by 2 and then add y,
  • etc.

Try those operations on your own.
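
If you want to compare notes, here is a small sketch of the same idea with different variables. The names width, height, and total are hypothetical and chosen only for illustration. The last lines also show the arrow operator (<-), which does the same job as = here and is the form you will see in most R code.

```r
width = 6
height = 3
width + height        # add the two variables
## [1] 9
height / width        # divide one by the other
## [1] 0.5
width / 2 + height    # division happens before addition
## [1] 6
total <- width + height   # <- assigns, just like = does above
total
## [1] 9
```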


Equal and double-equal

First, try this:

a = 3.5
b = 4
a + b

What happens when you do this?

a = b

and then add a + b?

What happened is that we assigned the value of b to a. That’s what = does.

There is, however, another equal sign: the “double equal” sign (==).

# FIRST TEST OF THE == SIGN
a = 9
b = 12
a + b
## [1] 21
a == b
## [1] FALSE
# SECOND TEST
a = b
a + b
## [1] 24
a == b
## [1] TRUE
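
One more sketch may help separate the two signs. It uses the hypothetical variables p and q. Assignment copies the value at that moment, so changing q afterwards leaves p alone, while == only asks a question and changes nothing.

```r
p = 5
q = 8
p = q      # p now holds 8
q = 10     # changing q afterwards does not change p
p
## [1] 8
p == q     # a question: are the two values equal?
## [1] FALSE
p != q     # the opposite question: are they different?
## [1] TRUE
```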

Packages

There is a lot of functionality in “base R,” which is the set of functions that R understands as soon as you start it (for example, from your Applications folder). But R is used in many different disciplines and areas, so new specialized uses come up every day. For these, users write specialized functions, and these functions can be made available through packages.

In order to use an existing package (or “library”), you need to

  1. download it,
  2. install it, and
  3. load it.

You can use RStudio’s GUI for 1. and 2., and either one of these functions for 3.:

library(packagename)

#or

require(packagename)
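
Steps 1 and 2 can also be done from the console instead of the GUI: install.packages() downloads and installs a package in one call. The sketch below leaves that line commented out because it needs an internet connection. It then shows one difference between the two loading functions: library() stops with an error if the package is missing, while require() returns TRUE or FALSE. The package name aPackageThatDoesNotExist is made up for illustration, and stats is used because it ships with R.

```r
# Steps 1 and 2 in one call (needs an internet connection, so commented out):
# install.packages("tigerstats")

# Step 3, tried on a package that comes with R
loaded = require(stats)
loaded
## [1] TRUE

# The same call on a made-up package name returns FALSE instead of stopping
found = suppressWarnings(
  require("aPackageThatDoesNotExist", character.only = TRUE, quietly = TRUE)
)
found
## [1] FALSE
```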

Try to call the help text for the function rowPerc(). You can call help on a function by simply entering the name of the function, preceded by one question mark:

?rowPerc

Presumably, you will get some sort of error message because your R doesn’t know the function yet. That’s because it is part of a specialized package that doesn’t come with base R.

So try to download, install, and load the tigerstats package. Once you have loaded it (step 3.), call help on rowPerc() again.
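
If you want to check whether R currently knows a function without opening its help page, base R's exists() is one option. This is only a sketch: the variable names are hypothetical, and the second result depends on whether tigerstats has been loaded in your session.

```r
knows_mean = exists("mean")        # mean() is part of base R
knows_mean
## [1] TRUE

knows_rowPerc = exists("rowPerc")  # FALSE in a fresh session, before tigerstats is loaded
knows_rowPerc
```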