Teaching our REDCap basic anatomy, cancer classifications, and some common sense

This post was originally published here

With over 11,000 patients now entered into REDCap and more being entered every day, we thought it would be a good time to reflect on some of the ways in which GlobalSurg 3 has been set up to help collaborators from around the world enter accurate and high-quality data.

With so many important things to know about each patient undergoing cancer surgery, the GlobalSurg team of nearly 3000 collaborators has been busy entering data into our secure database at redcap.globalsurg.org (REDCap is amazing database software developed by Vanderbilt University). With over 750,000 values entered, it isn’t surprising that from time to time a mistake occurs whilst entering the data. This might be because the data entered onto a paper form was incorrect to begin with, or it might be due to accidentally clicking on the wrong option when entering the data into REDCap. In some cases, the incorrect data might even appear in the notes if the surgeon or the anaesthetist has forgotten how to decide on the most appropriate ASA grade.

To try to help our collaborators identify cases when these mistakes may have happened, we have taught our REDCap some basic anatomy, cancer classifications and some common sense so that it can alert collaborators to mistakes as soon as they occur. The automatic alerts appear when a collaborator tries to save a page with incorrect data, meaning that they can change it immediately while they still have access to the patient notes.

Our REDCap now knows 58 things about cancer surgery that it is using to help collaborators enter accurate data.


Figure 1. Examples of our (a) paper data collection form, (b) REDCap interface, and (c) a data quality pop-up warning.

All 58 rules are given at the bottom of this post, but here are some examples:

  • Our REDCap knows some basic anaesthesiology:
    • As a collaborator, if you have tried to enter a patient with diabetes mellitus and stated that they have an ASA grade of 1, our REDCap should have informed you that diabetic patients should really have an ASA grade of 2 or more
  • Our REDCap knows the basics of anatomy:
    • It knows that a patient who had a total colectomy doesn’t have a colostomy
  • Our REDCap knows about TNM staging:
    • It knows that patients with an M score of M1 should also have an Essential TNM score of M+
  • Our REDCap also knows some common sense:
    • It knows that patients can’t have more involved lymph nodes in a specimen than the total number of lymph nodes in that specimen
    • It knows that a patient couldn’t have their operation before being admitted to the hospital
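Each of these rules boils down to a logical check on a single record. The sketch below expresses three of them in base R rather than in REDCap's own rule syntax. The column names and the three example patients are made up for illustration and are not the GlobalSurg 3 field names.

```r
# Hypothetical records; column names are illustrative, not the real REDCap fields
patients = data.frame(
  id             = 1:3,
  diabetes       = c(TRUE, FALSE, FALSE),
  asa_grade      = c(1, 2, 1),
  nodes_involved = c(2, 12, 0),
  nodes_total    = c(10, 8, 5),
  admission_date = as.Date(c("2018-05-01", "2018-05-03", "2018-05-10")),
  operation_date = as.Date(c("2018-05-02", "2018-05-04", "2018-05-09"))
)

# Each rule returns TRUE for a record that should trigger a warning
rules = list(
  "Diabetic patients should have an ASA grade of 2 or more" =
    function(d) d$diabetes & d$asa_grade < 2,
  "Involved lymph nodes cannot exceed total lymph nodes" =
    function(d) d$nodes_involved > d$nodes_total,
  "Operation cannot be before admission" =
    function(d) d$operation_date < d$admission_date
)

# Apply every rule and list the record ids that break it
flagged = lapply(rules, function(rule) patients$id[rule(patients)])
flagged
```

In this toy data each patient breaks exactly one rule, so every rule flags a single record id.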

Our REDCap has been working tirelessly for several months to generate these alerts and help collaborators ensure their data is accurate. We hope that training REDCap to detect problems with the data will make the GlobalSurg 3 analysis more efficient and contribute to the accuracy of the data.

The final data entry deadline is 17th December so remember to upload all of your data before then. Our REDCap is ready and waiting to store and check the data.

View and download all of our Data Quality rules at github.com/SurgicalInformatics

Global map of country names

This post was originally published here

This post demonstrates the use of two very cool R packages – ggrepel and patchwork.

ggrepel deals with overlapping text labels (Code#1 at the bottom of this post):

patchwork is a very convenient new package for combining multiple different plots together (i.e. what we used to use grid and gridExtra for).

More info:

To really demonstrate the power of them, let’s make a global map of country names using ggrepel:

library(tidyverse)
library(ggrepel)
library(patchwork)

# data from https://worldmap.harvard.edu/data/geonode:country_centroids_az8

orig_data = read_csv("country_centroids_az8.csv")


centroidsdata = orig_data %>% 
  select(country = admin, continent, lat = Latitude, lon = Longitude) %>% 
  filter(continent != "Seven seas (open ocean)" & continent != "Antarctica") %>% 
  mutate(continent  = fct_collapse(continent, "Americas" = c("North America", "South America")))

head(centroidsdata)
## # A tibble: 6 x 4
##   country     continent   lat   lon
##   <chr>       <fct>     <dbl> <dbl>
## 1 Aruba       Americas   12.5 -70.0
## 2 Afghanistan Asia       33.8  66.0
## 3 Angola      Africa    -12.3  17.5
## 4 Anguilla    Americas   18.2 -63.1
## 5 Albania     Europe     41.1  20.0
## 6 Aland       Europe     60.2  20.0
plot1 = centroidsdata %>% 
  ggplot(aes(x = lon, y = lat, label = country, colour = continent)) +
  geom_text_repel(segment.alpha = 0)   +
  theme_void() +
  scale_color_brewer(palette = "Dark2")

plot1

Now this is very good already with hardly any overlapping labels and the world is pretty recognisable. And really, you can make this plot with just 2 lines of code:

ggplot(centroidsdata, aes(x = lon, y = lat, label = country)) +
geom_text_repel(segment.alpha = 0)

So what these two lines make is already very amazing.

But I feel like Europe is a little bit misshapen and that the Caribbean and Africa are too close together. So I divided the world into regions (in this case the same as continents, except Russia is its own region – it’s just so big). I then wrote two functions that ask ggrepel to plot each region separately and use patchwork to patch the regions together:

centroidsdata = centroidsdata %>% 
  mutate(region = continent %>% fct_expand("Russia")) %>% 
  mutate(region = replace(region, country == "Russia", "Russia"))

mapbounds = centroidsdata %>% 
  group_by(region) %>% 
  summarise(xmin = min(lon), xmax = max(lon), ymin = min(lat), ymax = max(lat))


create_labelmap = function(mydata, mycontinent, myforce = 1, mycolour = "black"){
  
  mymapbounds = mapbounds %>% 
    filter(region == mycontinent)
  
  mydata %>% 
    filter(region == mycontinent) %>% 
    ggplot(aes(x = lon, y = lat, label = country)) +
    geom_text_repel(segment.alpha = 0, force = myforce, colour = mycolour)   +
    theme_void() +
    theme(legend.position = "none") +
    scale_y_continuous(limits = c(mymapbounds$ymin, mymapbounds$ymax)) +
    scale_x_continuous(limits = c(mymapbounds$xmin, mymapbounds$xmax))
  
}

mycolours = RColorBrewer::brewer.pal(5,"Dark2")

make_world = function(mydata){
  afr = create_labelmap(mydata, "Africa",   mycolour = mycolours[1])
  ame = create_labelmap(mydata, "Americas", mycolour = mycolours[4])
  asi = create_labelmap(mydata, "Asia",     mycolour = mycolours[2])
  eur = create_labelmap(mydata, "Europe",   mycolour = mycolours[3])
  rus = create_labelmap(mydata, "Russia",   mycolour = mycolours[3])
  oce = create_labelmap(mydata, "Oceania",  mycolour = mycolours[5])

  
  (ame + (eur / afr) + (rus / asi / oce)) + plot_layout(ncol = 3)
}

plot2 = make_world(centroidsdata)

plot2

This gives continents a much better shape, but it does severely misplace Polynesia. See if you can find where, e.g., Tonga is and where it should be.
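The layout in make_world() relies on two patchwork operators: `+` adds plots to the layout and `/` stacks plots on top of each other. Here is a minimal sketch of the same three-column arrangement using empty, bordered panels. The panel() helper is a hypothetical stand-in for create_labelmap() so that the example runs without the centroids data.

```r
library(ggplot2)
library(patchwork)

# Hypothetical helper: an empty titled panel with a border so the layout is visible
panel = function(title) {
  ggplot() +
    ggtitle(title) +
    theme_void() +
    theme(panel.border = element_rect(colour = "black", fill = NA))
}

# `+` adds panels to the layout, `/` stacks panels on top of each other
left   = panel("Americas")
middle = panel("Europe") / panel("Africa")
right  = panel("Russia") / panel("Asia") / panel("Oceania")

# Three columns: one panel, a stack of two, and a stack of three
layout = (left + middle + right) + plot_layout(ncol = 3)
layout
```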

To see what I did with patchwork there, let’s add black borders to each region (Code#2):

Code#1:

#devtools::install_github("slowkow/ggrepel")
#devtools::install_github("thomasp85/patchwork")


library(tidyverse)
library(ggrepel)
library(patchwork)

# tibble() replaces the deprecated data_frame()
mydata  = tibble(x = c(1, 1.3), y = c(1, 1), mylabel = c("Point-1", "Point-2"))

p = mydata %>% 
  ggplot(aes(x, y, label = mylabel, colour = mylabel)) +
  geom_point() +
  coord_cartesian(xlim = c(-3, 3), ylim = c(-3, 3)) +
  theme_bw() +
  theme(legend.position = "none") +
  scale_colour_viridis_d()

plot1 = p + geom_text() + ggtitle("geom_text()")

plot2 = p + geom_text_repel() + ggtitle("geom_text_repel()")

plot1 + plot2

Code#2:

create_labelmap = function(mydata, mycontinent, myforce = 1, mycolour = "black"){
  
  mymapbounds = mapbounds %>% 
    filter(region == mycontinent)
  
  mydata %>% 
    filter(region == mycontinent) %>% 
    ggplot(aes(x = lon, y = lat, label = country)) +
    geom_text_repel(segment.alpha = 0, force = myforce, colour = mycolour)   +
    theme_void() +
    theme(legend.position = "none") +
    scale_y_continuous(limits = c(mymapbounds$ymin, mymapbounds$ymax)) +
    scale_x_continuous(limits = c(mymapbounds$xmin, mymapbounds$xmax)) +
    theme(panel.border = element_rect(colour = "black", fill=NA, size=5))
  
}

plot3 = make_world(centroidsdata)

plot3

My data science toolbox

This post was originally published here

I’ve been doing data science for over 10 years now, although for most of this time I didn’t realise I was doing data science. I thought I was just doing normal science but focusing on simulations and data analysis, rather than field or lab work. I’ve switched fields a few times now: physics BSc, chemistry PhD, now working in medical research. Therefore, instead of this lengthy introduction:

“I’m a physicist by background with substantial interdisciplinary expertise in simulations, data analysis, programming…”

I just go with:

I’m a data scientist.

Anyway, here’s how my toolbox and technical skills have evolved over the years:

My data science toolbox evolution.

P.S. Once a physicist, always a physicist.

Islay distilleries in 3 days

This post was originally published here

Day 0 (Sunday 18-February 2018)

Left Edinburgh at 8am for the 1pm ferry from Kennacraig to Port Askaig (Islay). Edinburgh-Kennacraig should be a 3.5h drive (and it was), but we left early to allow for any delays on the road. Arrived on Islay at 3pm and our accommodation near Port Ellen (southern Islay, close to Ardbeg, Lagavulin, Laphroaig) was a 40 min drive from the port.

Map of Islay with all its lovely distilleries.


Day 1 (Monday 19-February 2018): Ardbeg, Lagavulin, Laphroaig

We hadn’t booked anything other than the ferry and accommodation. February is very low season so we were right to think that no other advance bookings were necessary.

We had a lazy morning and drove to Laphroaig at about 11am. We asked which tours or tasting events were on that day and booked Einar onto the Layers of Laphroaig tasting at 3pm (as the driver, I was allowed to accompany him for free). We then drove to Lagavulin (just a few miles from Laphroaig) and booked ourselves onto the tour at 1pm. We then drove to Ardbeg (another few miles) and had second breakfast at their cafe. Then drove back to Lagavulin for the tour, and then back to Laphroaig for the tasting.

Ardbeg’s epic cafe.


Waiting for the tour to begin at Lagavulin’s homey tasting room.


In Laphroaig’s tasting room: The Layers of Laphroaig introduced whiskies from different casks that make up their range of malts. These include ex-bourbon, virgin oak (I did not know Scotch could be matured in virgin casks – I thought it always had to be ex-something!), ex-sherry, ex-port. We were the only ones booked on this so it ended up being a private tasting.


Day 2 (Tuesday 20-February 2018): Kilchoman, Bruichladdich

Einar drove us to Kilchoman where I had a tasting of their 3 limited edition malts in the visitor centre. Kilchoman is a “farm-distillery” and they even grow some of their own barley. We bought a bottle of their “100% Islay” which is made from barley grown at the premises. Unfortunately, we completely forgot to take any pictures there. Must go back.

Driving on Islay.


We then went to Bruichladdich and booked me on the Warehouse Experience at 2pm. Similarly to Laphroaig, the driver was allowed to accompany for free. We had lunch at Port Charlotte while waiting for the event.

Bruichladdich warehouse experience


We then went by Bowmore (it was nearly 5pm) and asked about the different tours and experiences they have on the next day. Decided to do the “Bottle Your Own in the Vaults” first thing on Wednesday morning.

Day 3 (Wednesday 21-February 2018): Bowmore, Bunnahabhain, Ardnahoe, Caol Ila

Bottling a 17-year-old sherry cask beauty at Bowmore.


We then dropped by Bunnahabhain – no tours were running that day but we were offered a few free tasters at the shop. On our way back from Bunnahabhain we took a picture at Ardnahoe (a new distillery that opens any day now).

Visiting Bunnahabhain and stopping at soon to be opened Ardnahoe.


The final distillery was Caol Ila where we went on the standard tour. The view in the stills room was just out of this world. They didn’t allow us to take pictures inside, so I took this from their website:

Caol Ila stills with a view of the Isle of Jura. Picture from: https://www.malts.com/en-row/distilleries/caol-ila/


Me outside Caol Ila with Jura in the background


What we brought back with us

In addition to whisky distilleries, we also visited a nano-brewery, and it turns out that The Botanist (a gin) is made at Bruichladdich.


Converting old WordPress posts to Hugo

This post was originally published here

Between 2014 and 2018 I published 29 posts on riinudata.wordpress.com. Today I’m converting all of those to my new website powered by blogdown-Hugo.

Step 1

Read the Migration: From WordPress chapter of the blogdown book.

Step 2

Get all your wordpress posts into one XML: WP Admin – Tools – Export.

Step 3

Install Exitwp and its dependencies (pyyaml, beautifulsoup4, html2text):

git clone https://github.com/thomasf/exitwp.git
sudo easy_install pip
sudo pip install pyyaml
sudo pip install beautifulsoup4
sudo pip install html2text

This worked on macOS[1] High Sierra – I already had Python installed.

Step 4

Working in the directory that git clone created (exitwp):

  • Put the WordPress XML in the wordpress-xml directory.
  • Run xmllint riinu_wordpress.xml. This worked the first time for me without any errors (so I’m not sure what fixing the errors, if there are any, would entail).
  • Back in the exitwp folder, run python exitwp.py
  • This created folders build/jekyll/riinudata.wordpress.com/_posts and the content looked like this:
  • Move all these into exitwp/post folder.

Step 5

  • Take a copy of https://github.com/yihui/oldblog_xml/blob/master/convert.R to clean up these .markdown files and get them ready for Hugo. I edited the first three lines, skipped the “Do not run if…” chunk as I’d already done that in Step 3, edited the authors = c(), and did not run the very last chunk (local({if (!dir.exist...})).
  • Move all of the files (now .md) into content/post of your blogdown repo. Build and voila!

Further modifications

Looks like most of my posts were converted like a charm, with nicely formatted code blocks and images. But there are a few things I noticed that I think I have to fix:

  • GitHub gists are now displayed as links; I will make those into code blocks (or embed them using a Hugo shortcode).
  • Most images show up perfectly, but some have gotten stuck in a code block, e.g. showing up as <img src="https://surgicalinformatics.org/wp-content/uploads/2018/02/rplot.png" alt="Rplot"/>. Will sort these.

Overall I feared a lot worse and am super happy with the conversion experience. Took exactly 3 h.

My name is Hildegard and I approve this message.



  1. I’m only 1.5 years late to discover that OS X has been rebranded as macOS: https://www.wired.com/2016/06/apple-os-x-dead-long-live-macos/

Hello world: blogdown loves Hugo

This post was originally published here

We are live!

I wrote my last blog post on WordPress on 20-October 2017 and promised myself this was the last time. I’ve been blogging on WordPress since 2014 and the more I used it the more painful it got! This is most likely caused by the fact that I have been drifting further and further away from point-and-click interfaces anyway…oh and discovering MARKDOWN.

My two rules:

So I finally got round to creating a blogdown-Hugo site:

Hugo is a website generator that is code-based (no more dragging around those pesky WordPress elements); blogdown is an R package that will help you generate Hugo, Jekyll, or Hexo sites, especially if you will be including R Markdown in it.

Steps on 12-February 2018:

  • Created a new blogdown project on RStudio, set kakawait/hugo-tranquilpeak-theme as the theme
  • Edited my name, email etc. information in the config.toml.
  • Absolutely could not figure out how to change coverImage = "cover.jpg". Tried putting my cover image in /static/img/, /static/_images/, source/assets/images and tried linking to these any way I could think of (e.g. with and without the first /) but it just wasn’t happening. Ended up putting my picture in /themes/hugo-tranquilpeak-theme/static/images/ and blatantly naming it cover.jpg (replacing the theme’s default photo). This worked.
  • Pushed the whole project to https://github.com/riinuots/hugo-tranquil-website and then created a submodule in https://github.com/riinuots/hugo-tranquil-website/tree/master/themes so when the theme gets updated I can pull the new version. This is not essential. I need to figure out the cover image issue though.
  • Set up Netlify as in https://bookdown.org/yihui/blogdown/netlify.html which was superquick but then spent some time troubleshooting why my theme wasn’t displaying properly. Turns out that for this theme, it is essential to set the baseURL = "https://riinu.netlify.com/" (in config.toml).
  • Created this Hello World post which seemed to work fine at first. I then added an unquoted colon to the title, broke everything and spent 2 h trying to figure out what went wrong. These were the errors I was getting and that no-one else in the world (Google) seemed to have reported:
    • edits to the new post not happening, but the site isn’t broken either
    • clean_site() errors with:

rmarkdown::clean_site() Error in file.exists(files) : invalid 'file' argument

  • After spending 2 h on Google/github/rstudio/rmarkdown, blogdown book, blogdown repo, Hugo documentation, I finally came across hugo -v (v for verbose). I noticed:

yaml: line 1: mapping values are not allowed in this context

(which I had indeed seen before at some point during these 2 hours). Anyway, seeing it for the second time clicked – the YAML parser thinks I’m mapping something that shouldn’t be mapped (mapping usually means defining variables). My title was (second line of the markdown file, really) title: Hello world: blogdown loves Hugo, but if using a colon you need quotes: title: "Hello world: blogdown loves Hugo".
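The same behaviour can be reproduced outside Hugo. This is a minimal sketch using the yaml R package, which is my choice for illustration. Hugo uses its own YAML parser, so the exact wording of the error may differ.

```r
library(yaml)

# Unquoted: the second colon looks like the start of another key-value mapping
unquoted = tryCatch(
  yaml.load("title: Hello world: blogdown loves Hugo"),
  error = function(e) e
)
inherits(unquoted, "error")

# Quoted: the whole string is read as the value of title
quoted = yaml.load('title: "Hello world: blogdown loves Hugo"')
quoted$title
```

The first call fails to parse and the second returns the full title, colon included.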

Still better than WordPress.

Next steps:

  • Set up Disqus (comments).
  • Bring over old posts from https://riinudata.wordpress.com
  • Write all the new posts ideas I’ve been gathering over the past 4 months.

Your first Shiny app

This post was originally published here

What is Shiny?

Shiny is an R package (install.packages("shiny")) for making your outputs interactive. Furthermore, Shiny creates web apps, meaning your work can be shared online with people who don’t use R. In other words: with Shiny, R people can make websites without ever learning JavaScript etc.

I am completely obsessed with Shiny and these days I end up presenting most of my work in a Shiny app.

If it’s not worth putting in a Shiny app it’s not worth doing.

Your first Shiny app

Getting started with Shiny is actually a lot easier than a lot of people make it out to be. So I created a very short (9 slides) presentation outlining my 5-step programme for your first Shiny app.

This is the app: https://riinu.shinyapps.io/shiny_testing/

This is the presentation: http://rpubs.com/riinu/shiny

And here are the steps (also included in the presentation):

STEP 0: install.packages("shiny"). Use RStudio.

STEP 1: Create a script called app.R using this skeleton:

https://gist.github.com/riinuots/c6ec0691633df2929adc7de90bdbc792

STEP 2: Copy your plot code into the renderPlot function.

STEP 3: Add a sliderInput to your User Interface (ui). A slider is just one of the many Shiny widgets you could be using: https://shiny.rstudio.com/gallery/widget-gallery.html

STEP 4: Tell your Server you wish the dplyr::filter() to use the value from the slider. All inputs from the User Interface (ui) are stored in input$variable_name: replace the 2007 with input$year.

STEP 5 (optional): Add animate = TRUE.

Press Control+Shift+Enter or the “Run App” button. You now have a Shiny app running on your computer. To deploy it to the internet, e.g. like I’ve done in the link above, see this.
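Put together, the steps could look something like the sketch below. It is not the skeleton from the gist: the data frame, the input id "year", the output id "myplot" and the base R plot are all placeholders I made up so that the example is self-contained. It also assumes dplyr is installed.

```r
library(shiny)

# Hypothetical data standing in for your own; `year` is the column the slider filters on
mydata = data.frame(
  year = rep(c(1997, 2002, 2007), each = 3),
  x    = c(1, 2, 3, 2, 3, 4, 3, 4, 5),
  y    = c(2, 4, 6, 3, 5, 7, 4, 6, 8)
)

# STEP 3: the slider lives in the user interface; STEP 5: animate = TRUE adds a play button
ui = fluidPage(
  sliderInput("year", "Year", min = 1997, max = 2007, value = 2007,
              step = 5, sep = "", animate = TRUE),
  plotOutput("myplot")
)

# STEP 2 and STEP 4: the plot code goes inside renderPlot() and filters on input$year
server = function(input, output) {
  output$myplot = renderPlot({
    plotdata = dplyr::filter(mydata, year == input$year)
    plot(plotdata$x, plotdata$y, xlim = c(0, 6), ylim = c(0, 9),
         xlab = "x", ylab = "y", main = input$year)
  })
}

# Build the app object; launch it with runApp(app) or the "Run App" button
app = shinyApp(ui = ui, server = server)
```

The app object is assigned rather than printed, so sourcing the script does not start a server until you ask for it.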

Handling your .bib file (LaTeX bibliography)

This post was originally published here

To create a .bib file that only includes the citations you used in the manuscript:

bibexport -o extracted_file.bib manuscript.aux

There are a few issues with this though. The command bibexport comes with the installation of TeX Live, but my Windows computer (bless) does not cooperate (“bibexport is not recognised as an internal or external command…”). So I can only use it on my Mac (luv ya).

Get data from ggplot()

This post was originally published here

ggplot includes built-in and seamless functionality that summarises your data before plotting it. As shown in the example below, ggplot_build() can be used to access the summarised dataset.

summarised_barplot

fill         y count prop x PANEL group    ...
#D7301F 0.2147239    35    1 1     1     4 ...
#FC8D59 0.6871166    77    1 1     1     3 ...
#FDCC8A 0.9570552    44    1 1     1     2 ...
#FEF0D9 1.0000000     7    1 1     1     1 ...
#D7301F 0.1696429    38    1 2     1     8 ...
#FC8D59 0.6116071    99    1 2     1     7 ...
...
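The code that produced the output above is not included here, so the following is only a sketch of the same idea on the built-in mtcars dataset. The choice of variables is mine. It draws a proportional bar plot and then pulls out the counts and proportions that ggplot computed for it.

```r
library(ggplot2)

# A filled (proportional) bar plot of a built-in dataset; ggplot counts the rows for us
p = ggplot(mtcars, aes(x = factor(cyl), fill = factor(gear))) +
  geom_bar(position = "fill")

# ggplot_build() returns the computed data, one data frame per layer
summarised = ggplot_build(p)$data[[1]]

head(summarised[, c("fill", "x", "y", "count", "prop")])
```

The count column holds the number of rows behind each bar segment, so the counts add up to the number of rows in the original data.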