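The standard function for crosstabs is xtabs(). It takes a one-sided formula and a data frame as arguments; the first factor in the formula forms the rows of the table, the second the columns:
~ ExplanatoryFactor + ResponseFactor, data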
So, in order to crosstabulate, in our “relatives” data, the text genre (as explanatory factor) and the choice of that as relativizer (as response), we would use:
xtabs(~ genre + that, data = relatives)
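Since the "relatives" data may not be at hand, here is a minimal sketch of the same call on a small made-up data frame. The name `toy` and its values are hypothetical and only stand in for the real dataset.

```r
# Hypothetical toy data standing in for the "relatives" dataset
toy <- data.frame(
  genre = c("fiction", "fiction", "news", "news", "news", "academic"),
  that  = c("yes", "no", "yes", "yes", "no", "no")
)

# The first factor in the formula forms the rows, the second the columns
tab <- xtabs(~ genre + that, data = toy)
print(tab)

# Individual cells can be read off by level name
tab["news", "yes"]
```

The result is an ordinary contingency table, so it can be passed on to functions such as barplot() or prop.table().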
There are two utility functions that may prove helpful: rowPerc() and colPerc(). They are part of the tigerstats package.
# install.packages('tigerstats')
require(tigerstats)
## Loading required package: tigerstats
## Loading required package: abd
## Loading required package: nlme
## Loading required package: lattice
## Loading required package: grid
## Loading required package: mosaic
## Loading required package: dplyr
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:nlme':
##
## collapse
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Loading required package: ggplot2
## Loading required package: mosaicData
## Loading required package: Matrix
##
## The 'mosaic' package masks several functions from core packages in order to add additional features.
## The original behavior of these functions should not be affected by this.
##
## Attaching package: 'mosaic'
## The following object is masked from 'package:Matrix':
##
## mean
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cov, D, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
SexSeat <- xtabs(~sex+seat, data=m111survey)
print(SexSeat)
##         seat
## sex      1_front 2_middle 3_back
##   female      19       16      5
##   male         8       16      7
rowPerc(SexSeat)
##         seat
## sex      1_front 2_middle 3_back  Total
##   female   47.50    40.00  12.50 100.00
##   male     25.81    51.61  22.58 100.00
colPerc(SexSeat)
##         seat
## sex      1_front 2_middle 3_back
##   female   70.37       50  41.67
##   male     29.63       50  58.33
##   Total   100.00      100 100.00
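If you would rather not load tigerstats, similar percentages can be computed with base R's prop.table(). The sketch below rebuilds the sex-by-seat counts by hand (so it runs without the m111survey data). Unlike rowPerc() and colPerc(), it does not add a Total row or column.

```r
# Rebuild the sex-by-seat counts shown above as a plain table
SexSeat <- as.table(matrix(
  c(19, 16, 5,
     8, 16, 7),
  nrow = 2, byrow = TRUE,
  dimnames = list(sex  = c("female", "male"),
                  seat = c("1_front", "2_middle", "3_back"))
))

# margin = 1: percentages within each row (compare rowPerc)
row_pct <- round(100 * prop.table(SexSeat, margin = 1), 2)
# margin = 2: percentages within each column (compare colPerc)
col_pct <- round(100 * prop.table(SexSeat, margin = 2), 2)

print(row_pct)
print(col_pct)
```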
The table that xtabs() outputs is pretty useful in itself. To make a barplot, we use (surprise!) the barplot() function. Its main argument, height, can be a vector of bar heights, but it can also be a 2D table or matrix; in that case each column becomes one bar, and the rows are stacked within it.
So, let's re-use our xtabs() call from above and assign it to a variable (I will use x), and then plot it:
x <- xtabs(~ genre + that, data = relatives)
barplot(x)
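To see how barplot() treats a two-dimensional table, here is a small sketch on made-up counts (the numbers are hypothetical, not taken from the relatives data). It shows the default stacked layout, and how t() and beside = TRUE change which factor defines the bars.

```r
# Hypothetical counts; rows are genres, columns are levels of "that"
tab <- as.table(matrix(
  c(12, 30,
    25, 18,
     8, 40),
  nrow = 3, byrow = TRUE,
  dimnames = list(genre = c("academic", "fiction", "news"),
                  that  = c("no", "yes"))
))

# Default: one stacked bar per column (here: per level of "that")
barplot(tab, legend.text = TRUE)

# t() swaps rows and columns, giving one group of bars per genre;
# beside = TRUE draws the segments side by side instead of stacked
mids <- barplot(t(tab), beside = TRUE, legend.text = TRUE)
```

barplot() invisibly returns the horizontal midpoints of the bars, which is why the result can be stored (here in `mids`) and reused later for placing labels.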
Take a moment to call the help on barplot(). It will show you all the different options you can specify. Remember: to call help on a function, you just need to precede the function name with a ? (for example, ?barplot).
Now make the same plot as you just did, but specify a main title.
If we would like, as is often done, to arrange our bars by decreasing height, we’d need to relevel the genre factor. I will give you a theoretical example of releveling below, and let you figure out how to apply it to the genre factor in our relatives dataset:
# install.packages('sciplot')
library(sciplot)
fac <- rep(c("a", "b", "c"), 2)
response <- 1:6
# Compare the default (alphabetical) level order:
bargraph.CI(x.factor = fac, response = response)
# With an explicitly specified level order:
newfac <- factor(fac, levels = c("b", "c", "a"))
bargraph.CI(x.factor = newfac, response = response)
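The level order does not have to be typed out by hand. As a sketch in base R only, on made-up data (the vector `obs` is hypothetical), the levels can be derived from the sorted frequency table:

```r
# Hypothetical observations of a three-level factor
obs <- factor(c("a", "b", "b", "c", "c", "c"))

# Count each level, sort the counts, and reuse the sorted names as level order
counts <- sort(table(obs), decreasing = TRUE)
obs_sorted <- factor(obs, levels = names(counts))

levels(obs)         # default alphabetical order
levels(obs_sorted)  # most frequent level first

# Bars now appear in order of decreasing height
barplot(table(obs_sorted))
```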
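For a percentage bar chart, barplot() needs more than a single vector of bar heights. It requires a matrix in which each column holds the percentages for one bar and each row is one stacked segment; this is why the data frame below is converted with as.matrix() and transposed with t().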
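# One row per village; Male and Female percentages sum to 100 in each row
mydata <- data.frame(
  row.names = c(100, 200, 300, 400, 500),
  Male      = c(68.33333, 53.33333, 70, 70, 61.66667),
  Female    = c(31.66667, 46.66667, 30, 30, 38.33333))
# barplot() draws one bar per matrix column, so transpose to put villages in columns
x <- barplot(t(as.matrix(mydata)), col = c("yellow", "green"),
             legend = TRUE, border = NA, xlim = c(0, 8),
             args.legend = list(bty = "n", border = NA),
             ylab = "Cumulative percentage", xlab = "Village number")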
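In practice you will often start from counts rather than ready-made percentages. The following sketch, on hypothetical counts, converts each column to percentages with prop.table() before plotting; the matrix name `counts` and its values are made up for illustration.

```r
# Hypothetical counts; rows are the stacked categories, columns are the bars
counts <- matrix(
  c(41, 32, 42,
    19, 28, 18),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("Male", "Female"), c("100", "200", "300"))
)

# margin = 2 rescales each column so that it sums to 100
pct <- 100 * prop.table(counts, margin = 2)

barplot(pct, col = c("yellow", "green"), legend.text = TRUE,
        ylab = "Cumulative percentage", xlab = "Village number")
```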
Another function, text(), is helpful for adding text to an existing chart. If we want to label our bars, we can use the following code, which begins by producing the plot itself as above:
mydata <- data.frame(
  row.names = c(100, 200, 300, 400, 500),
  Male      = c(68.33333, 53.33333, 70, 70, 61.66667),
  Female    = c(31.66667, 46.66667, 30, 30, 38.33333))
# barplot() returns the horizontal midpoint of each bar; keep them in x
x <- barplot(t(as.matrix(mydata)), col = c("yellow", "green"),
             legend = TRUE, border = NA, xlim = c(0, 8),
             args.legend = list(bty = "n", border = NA),
             ylab = "Cumulative percentage", xlab = "Village number")
# Male label: 10 units below the top of the Male segment
text(x, mydata$Male - 10, labels = round(mydata$Male), col = "black")
# Female label: 10 units above the top of the Male segment
text(x, mydata$Male + 10, labels = 100 - round(mydata$Male))
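The fixed offsets of 10 work for two segments, but they do not generalize to more categories. One possible alternative, sketched here as an assumption rather than as part of the original recipe, is to compute the vertical centre of every segment with cumsum() and place each label there:

```r
# Same percentages as above, already arranged with one column per bar
pct <- rbind(Male   = c(68.33333, 53.33333, 70, 70, 61.66667),
             Female = c(31.66667, 46.66667, 30, 30, 38.33333))
colnames(pct) <- c("100", "200", "300", "400", "500")

mids <- barplot(pct, col = c("yellow", "green"), border = NA,
                ylab = "Cumulative percentage", xlab = "Village number")

# Top edge of every segment, then step back half a segment to its centre
tops    <- apply(pct, 2, cumsum)
centres <- tops - pct / 2

# rep(mids, each = nrow(pct)) pairs each bar position with its segments
text(rep(mids, each = nrow(pct)), as.vector(centres),
     labels = round(as.vector(pct)))
```

Because the positions are derived from the data, the same code keeps working if a third row is added to `pct`.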