This week, Riinu, Steve, and Cameron are attending the annual RStudio Workshops (Tue-Wed), Conference (Thu-Fri), and the tidyverse developer day (Sat) in Austin, Texas.
We won’t even try to summarise everything we’re learning here, since the content is vast and the learning is very much hands-on, but we will be posting a small selection of take-aways on this blog.
We’re each attending a different workshop (Machine Learning, Big Data, and R Markdown & Shiny).
Interesting take-away no. 1: terminology
Classification means categorical outcome variable.
Regression means continuous (numeric) outcome variable.
This is a bit confusing when it comes to logistic regression, which by this definition is “classification” rather than “regression”. But it is very common machine learning terminology, and it makes sense given the wide range of methods used for classification (not just regression-based ones).
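To make the distinction concrete, here is a small sketch of our own (not from the workshop) using only base R and the built-in mtcars data. A linear model predicts a continuous outcome (regression), while a logistic regression predicts a binary outcome, here the transmission type `am`, so in machine learning terms it is doing classification. The 0.5 cut-off for turning probabilities into classes is an assumption chosen for illustration.

```r
# Regression: continuous outcome (miles per gallon)
reg_fit = lm(mpg ~ wt, data = mtcars)

# Classification: binary outcome (am: 0 = automatic, 1 = manual),
# fitted with logistic regression despite the "regression" in its name
clf_fit = glm(am ~ wt, data = mtcars, family = binomial)

# Logistic regression gives probabilities; an (assumed) 0.5 cut-off
# turns them into class labels
prob_manual = predict(clf_fit, type = "response")
pred_class = ifelse(prob_manual > 0.5, "manual", "automatic")

head(predict(reg_fit))  # numeric predictions
table(pred_class)       # counts of predicted classes
```

The model-fitting code looks almost identical; what makes one "classification" and the other "regression" is the type of outcome variable.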
Interesting take-away no. 2: library(parsnip)
The biggest strength of R is how many different packages (i.e. extensions) it has. Basically, if you can think of a statistical or machine learning method, it’s probably implemented in R. This is because a lot of R users are also R developers: if you find a method that you really want to use but that hasn’t been implemented yet, you can go ahead and implement it yourself, and then publish this new functionality as an R package that everyone can use.
However, this also means that different R packages sometimes do similar things using very different syntax. This is where the parsnip package comes to the rescue, providing a unified interface to many of these modelling packages.
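library(parsnip)

spec_lin_reg = linear_reg()                      # model type: linear regression
spec_lm = set_engine(spec_lin_reg, "lm")         # fitted with base R lm()
spec_stan = set_engine(spec_lin_reg, "stan")     # fitted with Stan (Bayesian)
spec_spark = set_engine(spec_lin_reg, "spark")   # fitted with Spark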
Instead of figuring out the syntax for lm() (basic linear regression), and then for Stan, and Spark, and keras, library(parsnip) gives us a unified interface to all of these different methods for linear regression.
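As a minimal sketch (our own illustration, not the course example), assuming parsnip is installed, the "lm" specification can be fitted to the built-in mtcars data. The choice of mpg ~ wt is arbitrary and only for demonstration.

```r
library(parsnip)

# Specify the model type once, then choose the engine that does the fitting
spec_lm = set_engine(linear_reg(), "lm")

# Fit miles per gallon against car weight
fit_lm = fit(spec_lm, mpg ~ wt, data = mtcars)

# The underlying engine object (here a plain lm fit) is stored in $fit
coef(fit_lm$fit)

# predict() returns a tibble with a .pred column, whichever engine was used
preds = predict(fit_lm, new_data = mtcars[1:3, ])
preds
```

In principle, switching to "stan" or "spark" changes only the set_engine() call, while fit() and predict() keep the same form; those engines do need their own packages installed (rstanarm, sparklyr), and Spark also needs a connection and data held in Spark.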
A fully working example can be found in the course materials (all publicly available):
Interesting take-away no. 3: communication by a new means
The R Markdown workshop raised two interesting points within the first few minutes. First, how prevalent communication via HTML has become (i.e. the internet, and the use of interactive documents and apps to relay industry and research data to colleagues and the wider community).
Second, and maybe more importantly, how little the general public understands about how easily it can be used to build impressive interactivity with only a few lines of code… followed by the question: how about that raise, boss??
For example, the plotly package can add instant interactivity on top of all the ggplot basics learnt at healthyR:
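library(ggplot2)
library(plotly)
# Dumbbell chart joining the values in columns `1` and `7` for each country
p = ggplot(data = Just_my_countries, aes(y = name)) +
  geom_segment(aes(x = `1`, xend = `7`, y = name, yend = name)) +
  geom_point(aes(x = `1`), color = "red", size = 5) +
  geom_point(aes(x = `7`), color = "blue", size = 5) +
  labs(x = "Price in USD")
ggplotly(p)  # turn the static ggplot into an interactive plotly widget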
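Since the Just_my_countries data isn't included here, this is a self-contained sketch of the same idea using the built-in mtcars data instead (our choice, purely for illustration), assuming ggplot2 and plotly are installed. Any ggplot object can be passed to ggplotly().

```r
library(ggplot2)
library(plotly)

# A static ggplot built with the usual ggplot2 grammar
p = ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "blue", size = 3) +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")

# ggplotly() converts it into an interactive htmlwidget
# (hover tooltips, zoom and pan in the RStudio Viewer or a browser)
interactive_p = ggplotly(p)
```

Typing interactive_p at the console displays the widget; in an R Markdown document knitted to HTML, the interactivity is carried over into the output file.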
And when you come across a website called “pimp my Rmarkdown”, how can you not want to play?!!!
Interesting take-away no. 4: monitor progress with Viewer pane
Knitting regularly, starting right at the beginning of an Rmd document so that errors surface early, is key. Your R Markdown document is a toddler who loves to misbehave. Previewing your document in a new window can take time and slow you down…
Frequent knitting into the Viewer pane can give you quick updates on how your code is behaving and identify bugs early!
By default, RStudio opens the knitted document in a new window when you hit the Knit button. To load the preview into the Viewer pane instead:
Tools > Global Options > R Markdown > set “Show output preview in” to Viewer Pane
Rmarkdown hack of the day: New chunk shortcut
Control + Alt + I (Windows/Linux)
Cmd + Option + I (Mac)