streamR
There are tens of thousands of software packages that expand the capacities of R and tailor it to very specific purposes. If you are a nuclear physicist studying, I don’t know, neutrons in space, R will have your back: there is probably a package that helps you do exactly the statistics you need to do.
One such specialized package is streamR. It is designed specifically to let you download Twitter data and then load it into R.
Here is how you load a package in R:
library(streamR)
Try that on your computer.
Didn’t work? That is probably because you haven’t installed the package yet.
So let’s install it:
install.packages("streamR") # remember to use double quotes
Does it look like it worked? Good. Now try loading it again.
library(streamR) # remember to use no quotes
## Loading required package: RCurl
## Loading required package: bitops
## Loading required package: rjson
Above, you see what your R console should be showing you: confirmation that the package (and the packages it depends on) loaded.
Before we do any harvesting, we need to (a) have a Twitter account and (b) get friendly with the interface that Twitter provides for developers: its “API” (application programming interface).
In preparation for this workshop, I asked you to register as a developer with Twitter and do this:
Just kidding, we’re not quite ready. There’s one more thing: we first need to load one more package, and chances are you’ll need to install it (see above).
library(ROAuth)
(After installing ROAuth, don’t forget to run the library(ROAuth) command again.)
Once you have streamR and ROAuth installed and loaded, copy, paste, adapt and run this code.
requestURL <- "https://api.twitter.com/oauth/request_token"
accessURL <- "https://api.twitter.com/oauth/access_token"
authURL <- "https://api.twitter.com/oauth/authorize"
consumerKey <- "YOUR CONSUMER KEY"
consumerSecret <- "YOUR CONSUMER SECRET"
my_oauth <- OAuthFactory$new(consumerKey=consumerKey,
consumerSecret=consumerSecret,
requestURL=requestURL,
accessURL=accessURL, authURL=authURL)
my_oauth$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
The last command will open a browser window that asks you to approve the request and then shows a PIN. Enter the PIN into R. Then you should be ready to use the commands in the streamR package.
Use a command like this:
tweets <- try(filterStream("filename.json",
locations = c(-180, -90, 180, 90),
timeout = 45,
oauth = my_oauth))
"filename.json" is the name you assign to the file in which your downloaded tweets will be stored. You can call it alfred, mildred, thomas, or whatever you wish, but be sure to end the name in .json and to enclose it in double quotes.
locations is a bounding box of longitude-longitude value pairs (longitude first). In this case, the values cover the whole globe; in other words, the call downloads all geolocated tweets. If you want to download all tweets that your account receives, even if they aren’t geotagged, set locations to NULL.
timeout is the time, in seconds, that R will spend collecting tweets. If set to 0, it will continue indefinitely.
If you are looking to study specific lexical items, use the track argument (consult the streamR documentation for details).
To restrict your search to just tweets in English, add language = "en" as an argument.
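Putting these arguments together, here is a sketch of a call that tracks a single term in English-language tweets for one minute. The search term and the file name are just examples, not part of any exercise:

```r
# Track the term "coffee" in English-language tweets for 60 seconds.
# Wrapping the call in try() keeps an interrupted stream from
# aborting your R session.
tweets <- try(filterStream("coffee.json",
                           track = "coffee",
                           language = "en",
                           timeout = 60,
                           oauth = my_oauth))
```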
For the following exercises, be sure to have the streamR documentation open.
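Whenever a collection run has finished, the tweets sit in a JSON file on your disk. streamR’s parseTweets() function turns such a file into a data frame you can inspect; a minimal sketch, assuming your tweets ended up in filename.json:

```r
library(streamR)

# Read the downloaded tweets into a data frame: one row per tweet,
# with columns such as the tweet text, the user's screen name, and
# (where available) coordinates.
tweets.df <- parseTweets("filename.json")

# Peek at the first few tweet texts.
head(tweets.df$text)
```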
Use the filterStream() command provided above, adjust the collection time to 2 min 30 sec, and run it.
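One way this could look (2 min 30 sec is 150 seconds; the file name is just an example):

```r
# Collect geolocated tweets worldwide for 150 seconds (2 min 30 sec).
tweets <- try(filterStream("exercise1.json",
                           locations = c(-180, -90, 180, 90),
                           timeout = 150,
                           oauth = my_oauth))
```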
Design another filterStream() command, this time tracking specifically the key phrases “dumb ass” and “big ass”. Since you are looking for more than one phrase, you will need to provide these as a vector, which should have this format:
c("", "")
Run this call as well, restricting it to 2:30 just like in exercise 1.
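A sketch of what that call could look like (the file name is just an example):

```r
# Track two key phrases for 150 seconds. For multi-word phrases,
# the Twitter streaming API matches tweets containing all of the
# words, regardless of their order.
tweets <- try(filterStream("exercise2.json",
                           track = c("dumb ass", "big ass"),
                           timeout = 150,
                           oauth = my_oauth))
```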
Design and run a third call. This time, restrict the search to the Republic of Ireland.
Remember that Google Maps and the Twitter API work with opposite sequencing of latitude and longitude values…
Include a track argument of your own choosing as well.
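A sketch of what such a call could look like. The bounding box below is only a rough approximation of the Republic of Ireland that you should verify yourself, and the track term is just an example:

```r
# Approximate bounding box for the Republic of Ireland, in the order
# the Twitter API expects: southwest longitude, southwest latitude,
# northeast longitude, northeast latitude. Verify these values.
ireland <- c(-10.7, 51.4, -5.9, 55.4)

tweets <- try(filterStream("exercise3.json",
                           locations = ireland,
                           track = "weather",  # any term of your choosing
                           timeout = 150,
                           oauth = my_oauth))
```

Be aware that the streaming API treats track and locations as alternatives (a tweet matching either one is returned), so you may want to filter the results again after parsing.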