Flow diagrams using the consort package in R

This post was originally published here

TLDR: library(consort) is a great package for creating CONSORT/patient flow diagrams in R. Thank you author Alim Dayim!
Jump to example code.
Documentation.
Introduction: The easiest way to make a one-off diagram is with something that has a graphical interface, such as PowerPoint, OmniGraffle, or Lucidchart, to name a few.
If, however, you need a diagram that updates automatically when the underlying dataset changes, then a programmatic solution using R is possible.
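To give a flavour of the syntax, here is a minimal hand-built sketch using the package's add_box()/add_side_box()/add_split() helpers; the box text and n values below are made up for illustration, and the full worked example is linked above.

# install.packages("consort")
library(consort)

# Build the diagram box by box (all numbers below are placeholders)
g <- add_box(txt = "Assessed for eligibility (n = 200)")
g <- add_side_box(g, txt = "Excluded (n = 40)")
g <- add_box(g, txt = "Randomised (n = 160)")
g <- add_split(g, txt = c("Allocated to treatment (n = 80)",
                          "Allocated to control (n = 80)"))

plot(g)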

HealthyR advent calendar

This post was originally published here

The HealthyR Advent Calendar 2022 was a series of 24 R tips I shared on Twitter last December.
It is based on “R for Health Data Science” by Harrison and Pius. Use JKL20 for 20% off, including free worldwide shipping.
Here’s a selection of the most popular ones; all 24 can be found at this website: https://healthyradvent.netlify.app/
More information about HealthyR, including the book and freely available resources, can be found at: https://healthyr.surgicalinformatics.org

Creating a Quarto website served on Netlify

This post was originally published here

Contents:
  What is Quarto?
  Prerequisites
  1. Create a new Quarto website project (Troubleshooting)
  2. Edit your Quarto website
  3. Add a page to your website
  4. Add R code to your website
  5. Serve your website using Netlify (Optional: if you want to keep the site for longer than 1h)
  6. Update your website
  Optional advanced: automatic deploys via GitHub

I’ve put together a quick ‘getting started with Quarto and Netlify and GitHub (optional) workshop’.
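If you prefer to drive Quarto from R rather than the command line, a minimal sketch using the quarto R package looks like the following; this is an assumption on my part, as the workshop itself may use the Quarto CLI or the RStudio IDE instead.

# install.packages("quarto")   # R wrapper around the Quarto command-line tool
library(quarto)

# Render the website project in the current working directory
quarto_render()

# Live-preview the site locally while you edit pages
quarto_preview()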

Zoomorphic adventures with data animals at the Surgical & Critical Care Informatics away day

If data were an animal, what would it be? 

Cath Montgomery, Medical Sociologist, writes about our recent exploration into pottery and data animals.

Often we think about data as an object: inert, manipulable, and something we control. We collect it, harvest it, scrape it, clean it, curate it, store it, share it, analyse it and display it. In these endeavours, we think about human agency and the work that we – as clinicians, statisticians, data scientists, sociologists – do to make sense of the world through quantified means. But what about the data themselves? What do they do? And what would it mean to give them agency? This is something that Science & Technology Studies scholars do routinely to underscore the ways in which the material world interacts with humans to create societal order. As a fun and playful way to think about some of the features of data that we identify with or relate to, we can ask, “If data were an animal, what would it be?”

After a full day and a half of talking about data at the strategy away day, it was time for people to get their hands dirty at Doodles pottery for some ‘team-building’. What better occasion to paint our own data animals! Everyone chose an item to paint – a mug, a jug, a bowl – and got to work dabbing, splatting, etching, and painting their designs. Creative activities like pottery painting are said to be good for team building because they nurture trust between colleagues; usually, everyone starts with minimal expertise, which is a good leveller, and everyone makes themselves a little bit vulnerable by putting their creations out into the world. This kind of activity also helps people get in touch with their inner artist and the parts of their brain responsible for creativity, imagination and intuition. This is the birthplace of data animals!

If the description of data as inert, manipulable, something we control were sufficient, we might have seen a lot of domestic data animals – cats and dogs, rabbits and rodents. Of these, there were none. Instead, we had a zebra and a giraffe, centipedes and dragonflies, frogs, foxes, owls, a death butterfly and a skull. Certainly, it seems that data are not tame in this group’s collective imagination! 

So what did our data animals have to say about data? Riinu’s rainbow zebra shows the importance of reading between the lines; data analysis is not black and white and datasets are diverse, represented by the zebra’s rainbow stripes. Sarah’s giraffe represents the ability to use data to reach resources that would otherwise be difficult to access (it’s also an animal in long format). George’s frog follows an r-selection breeding strategy, otherwise known as an ‘r-strategist’: “this narrative is inspired by my approach to model selection – generating as many as one can sensibly think of and then whittling them down using natural selection/data metric driven selection”. Liz’s centipedes represent lots of quick-moving arms but, overall, somewhat slow going; Annemarie’s death butterfly is superficially elegant and beautiful, but must be treated with respect as it can be deadly if provoked or used badly. Ewen’s “ripped off owl jug” embodies imitation as the sincerest form of flattery: in data science, it is best to build on what has already been a success. Cath’s barn owl is a flash of light in the dark, but also eats other data animals for breakfast (sociologists of science and medicine can be a critical bunch). Ian’s animal is deceased and only the skull remains: “being the oldest member of the group I have datasets dead and buried all over Scotland… but a little bit of ‘data mining’ might resurrect some of them?”

So: from sex and death to work and the constant striving for resources, social benefit and success, the data animals have it all. It would be disingenuous to suggest that the explanations we wove to account for our creations preceded the act of painting them; nonetheless, the stories we tell about data are an important way in which we relate to the world and the work that we do to make sense of it through research. 

Making a Research Focus Wordcloud

Is it better to have a narrow or broad research focus? There are obviously pros/cons to both options (and arguably these aren’t mutually exclusive!), but it’s certainly an interesting thought posed in a recent tweet from @dnepo.

While I’m sure we all have a vague idea of where we sit on that spectrum of broad-narrow focus, there’s nothing like a bit of objective data (like a word cloud) to help us understand this better! While there are some online tools out there, R can make getting, cleaning, and displaying this data very easy and reproducible.

We will aim to cut down on the work required in collecting all your publication data by using Google Scholar – if you don’t have an account already, make one!

Firstly, we need 3 packages to achieve this:

  1. scholar: to download publications associated with your Google Scholar account.
  2. tidyverse: to clean and wrangle your publication data into the required format.
  3. wordcloud2: to generate a pretty wordcloud of your publication titles.
# install.packages(c("tidyverse", "scholar", "wordcloud2"))
library(tidyverse); library(scholar); library(wordcloud2)

Secondly, we need to provide specific information to R to allow it to do the task.

  1. We need to get our Google Scholar ID from our account (look at the URL) to tell R where to download from (we’ll use mine as an example, but anyone’s can be used here).
  2. We want to tell R which words we can ignore because they’re just filler words or irrelevant (e.g. we don’t care how many times titles have “and” in them!). This is optional, but recommended!
gscholarid <- "MfGBD3EAAAAJ" # Kenneth McLean
remove <- c("and", "a","or", "in", "of", "on","an", "to", "the", "for", "with")

Finally, we can generate our word cloud! The code below is generic, so it works for anyone, so long as you supply the Google Scholar ID (gscholarid) and filler words (remove).

# Download dataframe of publications from Google Scholar
scholar::get_publications(id = gscholarid) %>%
  tibble::as_tibble() %>%
  
  # Do some basic cleaning of paper titles
  dplyr::mutate(title = stringr::str_to_lower(title),
                title = stringr::str_replace_all(title, ":|,|;|\\?", " "),
                title = stringr::str_remove_all(title, "\\(|\\)"),
                title = stringr::str_remove_all(title, "…"),
                title = stringr::str_remove_all(title, "\\."),
                title = stringr::str_squish(title)) %>%
  
  # Combine all text together then separate by spaces (" ")
  dplyr::summarise(word = paste(title, collapse = " ")) %>%
  tidyr::separate_rows(word, sep = " ") %>%
  
  # Count each unique word
  dplyr::group_by(word) %>%
  dplyr::summarise(freq = n()) %>%
  
  # Remove common filler words
  dplyr::filter(! (word %in% remove)) %>%
  
  # Put into descending order
  dplyr::arrange(-freq) %>%
  
  wordcloud2::wordcloud2()

And here we go! I think it’s safe to say I’m surgically focussed, but there are quite a lot of different topics under that umbrella! Why not run the code and see how your own publications break down?

World map using the tidyverse (ggplot2) and an equal-area projection

This post was originally published here

There are several different ways to make maps in R, and I always have to look it up and figure this out again from previous examples that I’ve used. Today I had another look at what’s currently possible and what’s an easy way of making a world map in ggplot2 that doesn’t require fetching data from various places.
TLDR: Copy this code to plot a world map using the tidyverse:
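A minimal sketch along those lines, assuming the world polygons bundled with the maps package and a Mollweide equal-area projection via coord_map() (which needs the mapproj package); the original post’s exact code may differ:

# install.packages(c("maps", "mapproj"))
library(tidyverse)

# World country polygons shipped with the maps package – nothing to download
world <- map_data("world")

ggplot(world, aes(x = long, y = lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "grey40") +
  # Mollweide is an equal-area projection
  coord_map("mollweide", xlim = c(-180, 180)) +
  theme_void()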

Reshaping multiple variables into tidy data (wide to long)

This post was originally published here

There’s some explanation of what reshaping data in R means, why we do it, as well as the history (e.g., melt() vs gather() vs pivot_longer()) in a previous post: New intuitive ways for reshaping data in R.
That post shows how to reshape a single variable that had been recorded/entered across multiple different columns. But if multiple different variables are recorded over multiple different columns, then this is what you might want to do:
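As a minimal sketch (with made-up data – the previous post’s example will differ), pivot_longer()’s ".value" sentinel handles exactly this case: the first part of each column name stays as its own variable, and the rest becomes a new key column.

library(tidyverse)

# Made-up wide data: weight and height each recorded once per visit
wide_data <- tibble(
  id            = 1:3,
  weight_visit1 = c(70, 82, 65),
  weight_visit2 = c(68, 80, 66),
  height_visit1 = c(170, 180, 165),
  height_visit2 = c(171, 181, 166)
)

# ".value" keeps weight/height as their own columns;
# the part after "_" becomes the new visit column
wide_data %>%
  pivot_longer(
    cols      = -id,
    names_to  = c(".value", "visit"),
    names_sep = "_"
  )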

Setting up a simple one page website using Nicepage and Netlify

This post was originally published here

I’ve just set up a single page website (= online business card) for myself and my husband: https://pius.cloud/ . This post summarises what I did. If you’re looking to get started with something super quickly, then only the first two steps are essential (Creating a website and Serving a website).
Creating a website (using Nicepage): I’ve created websites using various tools such as straight up HTML, WordPress, Hugo+blogdown (this site – riinu.

HealthyR Online: Lockdown Learning

With news of the lockdown in March came the dawning reality that we wouldn’t be able to deliver our usual HealthyR 2.5 day quick start course in May.

The course is always over-subscribed so we were keen to find a solution rather than cancelling altogether.

HealthyR teaches the Notebook format, which is already hosted online in RStudio Cloud – so we knew that bit would work online. But what to do about getting attendees and tutors online, delivering lectures and offering interactive support with coding? Could we recreate our usual classroom environment online?

Never a group to shy away from a technical challenge, and with expertise in online education, we set about researching what online tools could be used.

After trying various options we went with Blackboard Collaborate to provide an online classroom, together with our usual RStudio Cloud to provide the Notebooks interface. Collaborate has a really nice feature of ‘break-out rooms’ where small groups can be assigned a separate online room with a tutor to work through exercises. The tutor can provide support and answer questions, using the screen share option to see exactly what each person might be having difficulty with.

After a few rehearsals to work out which roles to assign to our moderators and attendees, and how to send people to the breakout rooms and recall them back to the main room, we were set!

Ahead of the course, attendees were emailed the usual pre-course materials and a login for their RStudio Cloud accounts, together with an invite to a Collaborate session for each of the 3 days. We split the 20 attendees who had confirmed attendance into groups of 5 and assigned one of our fantastic tutors to each group.

We also set up an extra breakout room with a dedicated tutor, which could be used by anyone needing specific one-to-one help.

After the ice-breaker, ‘What’s a new thing you’ve done since lockdown?’ – everything from macrame to margaritas plus tie-dying and a lot of baking – the course got underway with the first lecture.

One or two delegates had some problems with internet connections, and the assigning of breakout rooms took a bit of getting used to, but Riinu soon worked out an efficient system and the first coding exercises were underway!

We were delighted that the course received really positive feedback overall – none of us were sure this would work, but it did! The live coding sessions and pop quizzes were particularly popular.

We’ll definitely run HealthyR online again if the lockdown continues. Even after the lockdown, moving online widens access and offers the possibility for our international collaborators to join a course without having to travel.

Thank you to all our attendees who quickly adapted to the online format, and to our amazing tutors, Tom, Kenny, Derek, Peter, Katie, Stephen, Michael and Ewen, who provided 3 days of their time to run the course, led, as ever, by Riinu.

Course Feedback

Collaborate and RStudio Cloud worked very well for me. The breakout rooms were a nice touch to allow discussions.

Very well set-up, particularly considering the challenges of online teaching! Collaborate and RStudio made the course very accessible. Also a fantastic ratio of tutors to pupils and very clear explanations of key concepts in ’R’ language and stats!

Clear and easy instructions. Worked seamlessly!

Teaching materials fantastic. In particular I thought linear and logistic regressions were superbly well taught (as difficult to teach/understand). I think I now understand these for the first time, having wasted loads of time reading about them in the past!

This was a great course. I think in person would have allowed more interaction so I would still keep your original format available after this lockdown is over but well done on adapting and providing an excellent course.

Resources

https://healthyr.surgicalinformatics.org

All the HealthyR resources, including our new online book, are available for free on the HealthyR website.