Making a Research Focus Wordcloud

Is it better to have a narrow or broad research focus? There are obviously pros/cons to both options (and arguably these aren’t mutually exclusive!), but it’s certainly an interesting thought posed in a recent tweet from @dnepo.

While I’m sure we all have a vague idea of where we sit on that spectrum of broad-narrow focus, there’s nothing like a bit of objective data (like a word cloud) to help us understand this better! While there are some online tools out there, R can make getting, cleaning, and displaying this data very easy and reproducible.

We will cut down on the work required to collect all your publication data by using Google Scholar. If you don’t have an account already, make one!

Firstly, we need 3 packages to achieve this:

  1. scholar: to download publications associated with your Google Scholar account.
  2. tidyverse: to clean and wrangle your publication data into the required format.
  3. wordcloud2: to generate a pretty wordcloud of your publication titles.
# install.packages(c("tidyverse", "scholar", "wordcloud2"))
library(tidyverse); library(scholar); library(wordcloud2)

Secondly, we need to provide specific information to R to allow it to do the task.

  1. We need to get our Google Scholar ID from our account (look at the URL) to tell R where to download from (we’ll use mine as an example, but anyone’s can be used here).
  2. We want to tell R which words we can ignore because they’re just filler words or irrelevant (e.g. we don’t care how many times titles have “and” in them!). This is optional, but recommended!
gscholarid <- "MfGBD3EAAAAJ" # Kenneth McLean
remove <- c("and", "a","or", "in", "of", "on","an", "to", "the", "for", "with")
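
If you’re not sure which part of the URL is the ID, here is a minimal sketch that pulls it out for you. It assumes the profile URL has the usual `citations?user=...` shape; the `profile_url` value below is just an illustrative example built from the ID above, so swap in the one from your own browser.

```r
# Example profile URL (assumed shape); replace with your own
profile_url <- "https://scholar.google.com/citations?user=MfGBD3EAAAAJ&hl=en"

# Capture everything after "user=" up to the next "&" (or the end of the string);
# str_match() returns a matrix whose second column is the captured group
gscholarid <- stringr::str_match(profile_url, "user=([^&]+)")[, 2]
gscholarid
```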

Finally, we can generate our word cloud! The code below is generic, so it works for anyone as long as you supply the Google Scholar ID (“gscholarid”) and filler words (“remove”).

# Download dataframe of publications from Google Scholar
scholar::get_publications(id = gscholarid) %>%
  tibble::as_tibble() %>%
  # Do some basic cleaning of paper titles
  dplyr::mutate(title = stringr::str_to_lower(title),
                title = stringr::str_replace_all(title, ":|,|;|\\?", " "),
                title = stringr::str_remove_all(title, "\\(|\\)"),
                title = stringr::str_remove_all(title, "…"),
                title = stringr::str_remove_all(title, "\\."),
                title = stringr::str_squish(title)) %>%
  # Combine all text together then separate by spaces (" ")
  dplyr::summarise(word = paste(title, collapse = " ")) %>%
  tidyr::separate_rows(word, sep = " ") %>%
  # Count each unique word
  dplyr::group_by(word) %>%
  dplyr::summarise(freq = n()) %>%
  # Remove common filler words
  dplyr::filter(! (word %in% remove)) %>%
  # Put into descending order
  dplyr::arrange(-freq) %>%
  # Draw the word cloud (default wordcloud2 settings assumed)
  wordcloud2::wordcloud2()

And here we go! I think it’s safe to say I’m surgically focussed, but with quite a lot of different topics under that umbrella! Why not run the code here and figure out how your publications break down?
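
If you’d like to sanity-check the counting steps before downloading anything, here is a rough sketch that runs the same kind of cleaning and counting on a few made-up titles. The `toy` data frame and its titles are invented purely for illustration, and `dplyr::count()` is used as a shorthand for the group, summarise, and arrange steps.

```r
library(dplyr)

# Made-up titles purely for illustration (not real publications)
toy <- tibble::tibble(title = c("Outcomes of surgery in children",
                                "Surgery and recovery: a cohort study",
                                "Recovery after surgery"))
remove <- c("and", "a", "or", "in", "of", "on", "an", "to", "the", "for", "with")

counts <- toy %>%
  # Lower-case, swap punctuation for spaces, and tidy up whitespace
  dplyr::mutate(title = stringr::str_to_lower(title),
                title = stringr::str_replace_all(title, ":|,|;|\\?", " "),
                title = stringr::str_squish(title)) %>%
  # Combine all titles, then split into one word per row
  dplyr::summarise(word = paste(title, collapse = " ")) %>%
  tidyr::separate_rows(word, sep = " ") %>%
  # Count each unique word, most frequent first
  dplyr::count(word, name = "freq", sort = TRUE) %>%
  # Remove common filler words
  dplyr::filter(!(word %in% remove))

counts
```

The result is a two-column table of `word` and `freq`, which is the shape that `wordcloud2::wordcloud2()` takes as its data.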
