Part 3


Crosstabulating nominal variables

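The standard function for crosstabs is xtabs(). It takes a one-sided formula and a data frame as arguments; the first factor in the formula supplies the rows of the table and the second the columns:

xtabs(~ ExplanatoryFactor + ResponseFactor, data = DataFrame)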

So, to crosstabulate the text genre (explanatory factor) against the choice of that as relativizer (response factor) in our “relatives” data, we would use:

xtabs(~ genre + that, data = relatives)
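The relatives data are not bundled with R, so as a runnable sketch of the same call pattern, the block below uses the built-in mtcars data set as a stand-in (our assumption, not part of the original example), crossing the number of cylinders (rows) with the number of gears (columns):

```r
# Built-in mtcars stands in for the relatives data (assumption for illustration)
tab <- xtabs(~ cyl + gear, data = mtcars)

# The first factor (cyl) supplies the rows, the second (gear) the columns
print(tab)

# Marginal totals per cylinder group (rows) and per gear group (columns)
rowSums(tab)
colSums(tab)
```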

There are two utility functions that may prove helpful: rowPerc() and colPerc(), which turn such a table into row and column percentages. They are part of the tigerstats package.

# install.packages('tigerstats')

require(tigerstats)
# m111survey is a sample survey data set included in tigerstats
SexSeat <- xtabs(~ sex + seat, data = m111survey)

print(SexSeat)
##         seat
## sex      1_front 2_middle 3_back
##   female      19       16      5
##   male         8       16      7
rowPerc(SexSeat)
##         seat
## sex      1_front 2_middle 3_back  Total
##   female   47.50    40.00  12.50 100.00
##   male     25.81    51.61  22.58 100.00
colPerc(SexSeat)
##         seat
## sex      1_front 2_middle 3_back
##   female   70.37       50  41.67
##   male     29.63       50  58.33
##   Total   100.00      100 100.00
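If tigerstats is not available, base R's prop.table() yields the same percentages. The sketch below rebuilds the SexSeat counts from the printed output above instead of loading m111survey, and uses addmargins() for the totals, which it labels "Sum" rather than "Total"; tigerstats' own implementation may differ in details such as rounding:

```r
# Rebuild the sex-by-seat counts printed above (from m111survey) as a table
SexSeat <- as.table(matrix(
  c(19, 8, 16, 16, 5, 7), nrow = 2,
  dimnames = list(sex = c("female", "male"),
                  seat = c("1_front", "2_middle", "3_back"))))

# Row percentages: each row sums to 100 (compare rowPerc())
rowPct <- round(100 * prop.table(SexSeat, margin = 1), 2)

# Column percentages: each column sums to 100 (compare colPerc())
colPct <- round(100 * prop.table(SexSeat, margin = 2), 2)

# addmargins() appends a "Sum" column (margin = 2) or row (margin = 1)
addmargins(rowPct, margin = 2)
addmargins(colPct, margin = 1)
```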

Regular barplot

The table that xtabs() outputs is useful in itself. To make a barplot, we use (surprise!) the barplot() function. Its main argument, height, is a vector or a matrix of bar heights, so it can take the 2D table from xtabs() directly: each column of the table becomes one bar, and the rows are stacked within it.

So, let’s reuse our xtabs() call from above, assign the result to a variable (I will use x), and then plot it:

x <- xtabs(~ genre + that, data = relatives)

barplot(x)
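To see how barplot() reads a two-way table, here is a sketch on the built-in mtcars data (again a stand-in for relatives, not the original data): by default each column becomes one stacked bar, while beside = TRUE draws grouped bars instead. barplot() also returns the bar midpoints, which can later be reused for labelling with text():

```r
# Built-in mtcars stands in for the relatives data (assumption for illustration)
tab <- xtabs(~ cyl + gear, data = mtcars)

# Default (beside = FALSE): one bar per column (gear), stacked by row (cyl)
mids <- barplot(tab, legend.text = TRUE,
                xlab = "Number of gears", ylab = "Count")

# beside = TRUE: one cluster per column, one bar per row within each cluster
midsGrouped <- barplot(tab, beside = TRUE, legend.text = TRUE,
                       xlab = "Number of gears", ylab = "Count")

# The returned x positions of the bar centres
mids
```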

Take a moment to open the help page for barplot(). It lists all the options you can specify. Remember: to get help on a function, precede its name with a ?, as in ?barplot.

Now make the same plot as you just did, but specify a main title.

Releveling a factor

If we would like, as is often done, to arrange our bars by decreasing height, we need to relevel the genre factor, i.e. set the order of its levels explicitly, since R by default orders the levels of a factor created from text alphabetically. Below is a toy example of releveling; I will let you figure out how to apply it to the genre factor in our relatives dataset:

# install.packages('sciplot')

# Compare:

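library(sciplot)
# Toy data: groups a, b, c with two observations each (group means 2.5, 3.5, 4.5)
fac <- rep(c("a", "b", "c"), 2)
response <- 1:6
# Default level order is alphabetical (a, b, c), so the bars increase in height
bargraph.CI(response = response, x.factor = fac)

# With:
# Explicit level order c, b, a, so the bars decrease in height
newfac <- factor(fac, levels = c("c", "b", "a"))
bargraph.CI(response = response, x.factor = newfac)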
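Typing the level order by hand is fine for three levels, but base R can also derive it from the data. The sketch below shows two options on hypothetical toy vectors: reorder(), which orders levels by the mean of a numeric variable, and sorting by table() counts, which suits bars that show frequencies. Neither is specific to sciplot; both simply produce a factor you can pass to any plotting function:

```r
# Toy data from the example above
fac <- rep(c("a", "b", "c"), 2)
response <- 1:6

# Option 1: reorder() sorts levels by increasing mean of the second argument;
# negating response turns that into decreasing mean response
byMean <- reorder(fac, -response)
levels(byMean)

# Option 2: sort levels by decreasing frequency
# (hypothetical vector, for illustration only)
counts <- c("x", "y", "y", "z", "z", "z")
byFreq <- factor(counts, levels = names(sort(table(counts), decreasing = TRUE)))
levels(byFreq)
```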


Stacked percentage bar chart

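For a stacked percentage bar chart, we need to give barplot() a matrix rather than a single vector: each column of the matrix becomes one bar, and each row one stacked segment within it. Since our data frame has one row per village, we transpose it with t() so that the villages become columns.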

mydata <- data.frame(
    row.names =c(100, 200, 300, 400, 500),
    Male =c(68.33333, 53.33333, 70, 70, 61.66667),
    Female =c(31.66667, 46.66667, 30, 30, 38.33333))

# t() turns villages into columns, i.e. one bar per village
x <- barplot(t(as.matrix(mydata)), col = c("yellow", "green"),
    legend.text = TRUE, border = NA, xlim = c(0, 8),
    args.legend = list(bty = "n", border = NA),
    ylab = "Cumulative percentage", xlab = "Village number")
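Here mydata already contains percentages. If you start from raw counts instead, prop.table() with margin = 2 rescales each column to sum to 100 before plotting. A minimal sketch, again assuming the built-in mtcars data as a stand-in:

```r
# Built-in mtcars: counts of cylinders (rows) within each gear group (columns)
tab <- xtabs(~ cyl + gear, data = mtcars)

# Convert each column to percentages so that every bar sums to 100
pct <- 100 * prop.table(tab, margin = 2)

barplot(pct, legend.text = TRUE, args.legend = list(bty = "n"),
        xlim = c(0, ncol(pct) + 2),  # extra room on the right for the legend
        ylab = "Percentage", xlab = "Number of gears")
```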

Another helpful function is text(), which adds text to an existing plot. To label our bars, we can use the following code, which begins by reproducing the plot from above:

mydata <- data.frame(
    row.names =c(100, 200, 300, 400, 500),
    Male =c(68.33333, 53.33333, 70, 70, 61.66667),
    Female =c(31.66667, 46.66667, 30, 30, 38.33333))

# t() turns villages into columns, i.e. one bar per village
x <- barplot(t(as.matrix(mydata)), col = c("yellow", "green"),
    legend.text = TRUE, border = NA, xlim = c(0, 8),
    args.legend = list(bty = "n", border = NA),
    ylab = "Cumulative percentage", xlab = "Village number")

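# x holds the bar midpoints returned by barplot()
# Male segments run from 0 to Male: label them 10 points below their top
text(x, mydata$Male - 10, labels = round(mydata$Male), col = "black")
# Female segments start at Male: label them 10 points above that boundary
text(x, mydata$Male + 10, labels = 100 - round(mydata$Male))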
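The offsets of 10 points above are fixed, so if a segment were shorter than 10 points its label would fall outside it. As an alternative (our own variation, not the original code), the sketch below places each label at the vertical centre of its segment, computed from running totals within each bar:

```r
# Same village data as above
mydata <- data.frame(
    row.names = c(100, 200, 300, 400, 500),
    Male = c(68.33333, 53.33333, 70, 70, 61.66667),
    Female = c(31.66667, 46.66667, 30, 30, 38.33333))
m <- t(as.matrix(mydata))  # rows = segments (Male, Female), columns = bars

x <- barplot(m, col = c("yellow", "green"), border = NA,
             ylab = "Cumulative percentage", xlab = "Village number")

# Centre of each segment: running total up to its top minus half its height
ypos <- apply(m, 2, cumsum) - m / 2

# as.vector() reads column by column (Male, Female per bar), so repeat each x
text(rep(x, each = nrow(m)), as.vector(ypos), labels = as.vector(round(m)))
```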