## How long did my R script run?

## Adding space between rows in LaTeX tables

By default, LaTeX tables are very tight:

```latex
\usepackage{booktabs}

\begin{table}[]
\centering
\caption{My caption}
\label{my-label}
\begin{tabular}{@{}lll@{}}
\toprule
Rows  & Column 1 & Column 2 \\
\midrule
Row 1 & 1234 & 2345 \\
Row 2 & 3456 & 4567 \\
Row 3 & 5678 & 6789 \\
Row 4 & 7890 & 8901 \\
Row 5 & 9012 & 10000 \\
\bottomrule
\end{tabular}
\end{table}
```

Adding this to the document preamble will add space between the rows:

```latex
\renewcommand{\arraystretch}{1.7}
```

And this command can be used to add space between rows manually:

```latex
\vspace{1cm}
```

## My minimal LaTeX preamble

My minimal example:

```latex
\documentclass[a4paper]{article}

%%% FIGURES AND TABLES %%%%
\usepackage{graphicx} %gives the \includegraphics[width=0.5\textwidth]{my_image}

%%% PAGE AND TEXT SET-UP %%%%
\usepackage{fullpage} %gets rid of the wide default borders
\renewcommand{\baselinestretch}{1.5} %space between lines

\begin{document}

Hello hello hello

\end{document}
```

And then one that is not so minimal, but still pretty basic and useful:

```latex
\documentclass[a4paper]{article}

%%% FIGURES AND TABLES %%%%
\usepackage{graphicx} %gives the \includegraphics[width=0.5\textwidth]{my_image}
\usepackage{booktabs} %for nicer tables
\usepackage{tabu} %advanced control over tables
\renewcommand{\thetable}{S\arabic{table}} %if this is supplement (this numbers tables as S1, S2...), comment out if main
\renewcommand{\thefigure}{S\arabic{figure}} %if this is supplement, replace S with A if Appendix

%%% SPECIAL CHARACTERS %%%%
\usepackage{amsmath} % amsmath provides extra maths symbols
\newcommand{\degree}{\ensuremath{^\circ}} %for some reason I can not find a degree symbol from other packages or the packages I do find it from clash with some others
\usepackage{times} %these packages will make \texttildelow look normal
\usepackage{textcomp}

%%% REFERENCES %%%%
\usepackage{natbib} %references as \citet (textual) or \citep (parenthetical)

%%% PAGE AND TEXT SET-UP %%%%
\usepackage{fullpage} %gets rid of the wide default borders
\usepackage{caption}
\captionsetup[table]{skip=10pt} %this adds space between the table caption and the table itself
\renewcommand{\baselinestretch}{1.5} %space between lines

\begin{document}

Hello hello hello

\bibliographystyle{apalike}
\bibliography{mybibfile.bib}

\end{document}
```

## How to “increase” array resolution in R (replicate each element both column-wise and row-wise)

One picture says more than a thousand words. You have what is on the left, and you want what is on the right.

```r
my_matrix = matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow=3)

#matrix is a 2D array, this next row creates a third dimension,
#duplicating the data
my_array = array(my_matrix, dim = c(3,3,2))
```

There are a few different ways to do this, but by far the cleanest and quickest way is to just select the rows and columns multiple times, by replicating row and column numbers (instead of actually replicating each element):

```r
#2D:
increased_matrix = my_matrix[rep(1:nrow(my_matrix), each=3), rep(1:ncol(my_matrix), each=3)]

#3D (same really, just one extra comma for the third dimension):
increased_array = my_array[rep(1:nrow(my_array), each=3), rep(1:ncol(my_array), each=3), ]
```

Note that by default, in rep(something, n) the n is **times** so equivalent to rep(something, times=n), but in this case we need to use **each** instead of **times**.
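To make the difference between **times** and **each** concrete, here is a small sketch on a 2x2 matrix (the names `m`, `m_each` and `m_times` are just made up for this illustration). With **each**, every element grows into a block; with **times**, the whole matrix is tiled instead.

```r
rep(1:3, 2)         # same as times=2: 1 2 3 1 2 3
rep(1:3, each = 2)  # 1 1 2 2 3 3

# hypothetical small matrix, filled column-wise: rows are (1, 3) and (2, 4)
m = matrix(1:4, nrow = 2)

# each: every element becomes a 2x2 block (this is the "increased resolution")
m_each = m[rep(1:nrow(m), each = 2), rep(1:ncol(m), each = 2)]

# times: the whole matrix is repeated next to and below itself
m_times = m[rep(1:nrow(m), times = 2), rep(1:ncol(m), times = 2)]

m_each
m_times
```

Both results are 4x4, but only `m_each` keeps neighbouring copies of the same element together.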

## Cut a time period from netCDF with nco

## Converting R Markdown to LaTeX

Install Pandoc: http://pandoc.org/

```r
library(knitr)
knit('report.Rmd')

#This creates 'report.md'
```

Open the Terminal, Command Prompt (search for cmd) or Windows PowerShell, go to the folder that contains report.md and run:

`pandoc -s report.md -o report.tex`

And that’s it!

(Read this, if you want vector images.)

## Reordering factor levels in R and what could go wrong

I’ve recently started using ggplot2 in addition to lattice (see this post that I made a while ago, explaining how I got into using lattice in the first place). Hint: when using ggplot2, you’ll need to make use of the reshape2 package (also written by the amazing Hadley Wickham) to get your data into a form that ggplot2 works best with. Another thing that you’ll want to think about when using ggplot2 is factor levels. This post will show how to (and also how not to) rearrange factor levels in R.

Let’s create a quick barplot with strings as x labels.

```r
library(ggplot2)

#create dummy data
a = paste('my', 1:11)
b = 1:11
df = data.frame(a, b)
df

qplot(a, b, data=df, geom='bar', stat='identity') +
  theme(axis.text=element_text(size=16, angle=45))
```

As df$a is an array of strings, R sets the factor levels alphabetically: my 1, my 10, my 11, my 2…which is not what we want, so let’s rearrange factor levels:

```r
df$a = factor(df$a, levels = paste('my', 1:11))
df$a

qplot(a, b, data=df, geom='bar', stat='identity') +
  theme(axis.text=element_text(size=16, angle=45))
```

And finally, the wrong way to rearrange factor levels would be by using the levels() function:

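```r
#stringsAsFactors=TRUE is needed in R >= 4.0, where strings are
#no longer converted to factors by default
df = data.frame(a, b, stringsAsFactors=TRUE)
levels(df$a) = paste('my', 1:11)

qplot(a, b, data=df, geom='bar', stat='identity') +
  theme(axis.text=element_text(size=16, angle=45))
```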

So be careful: if your data is not as obvious as this example and you are a bit new to factors and levels, you might end up plotting wrong results (as in the last example, where “my 2” and “my 3” were plotted with the values 10 and 11).
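You can also see the mix-up without plotting anything. The sketch below rebuilds the same dummy data and compares the two approaches directly; `f_right` and `f_wrong` are names invented for this illustration. Calling `factor(..., levels=...)` reorders the levels while every label stays attached to its own row, whereas `levels(x) = ...` only renames the existing (alphabetically sorted) levels in place.

```r
a = paste('my', 1:11)
b = 1:11

# right way: set the level order, labels stay with their rows
f_right = factor(a, levels = paste('my', 1:11))

# wrong way: levels are first sorted alphabetically (my 1, my 10, my 11, my 2, ...)
# and then simply renamed position by position
f_wrong = factor(a)
levels(f_wrong) = paste('my', 1:11)

b[f_right == 'my 2']  # 2
b[f_wrong == 'my 2']  # 10, this row was originally labelled 'my 10'
b[f_wrong == 'my 3']  # 11, this row was originally labelled 'my 11'
```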

## LaTeX tables: column widths and alignments

Firstly, start off your table in http://www.tablesgenerator.com/.

Tables Generator will do a lot for you. Its most useful features are importing from .csv and merging cells. The Booktabs table style (alternative to default table style from the menu) looks a bit nicer and is “publication quality”. Note that publication quality tables should not contain vertical lines.

Code #1 is the code from Tables Generator with the addition of caption, label and LaTeX document begin-end (so it’s compilable). Continuing from that table, let’s centre the contents of columns 1-3 and the whole table in your document, by adding `\centering` and changing the table specs from l’s to c’s: Code #2.

Finally, if your cell contents are long and need wrapping:

Note that if your table is too wide for your document margins, then LaTeX issues a warning, not an error. So you need to check for warnings like *“Overfull \hbox (9.75735pt too wide) in paragraph at lines 55–63”* in your compilation log. A quick solution to wide cells is like this (Code #4):

But this solution does not include decent central alignment. Using m (so `m{2cm}` instead of `p{2cm}`) would do the vertical centering (*e.g.* look how the first row is aligned), but still not horizontal. So following this StackOverflow post, I started defining column types and widths using the array package. See Code #5.

Next time I might write a post on how to add extra space between lines.

## Why does a linear model without an intercept (forced through the origin) have a higher R-squared value? [calculated by R]

*This is a short note based on this.*

Answer in short: Because different formulas are used to calculate the R-squared of a linear regression, depending on whether it has an intercept or not.

R² for a linear model that has an intercept:

$$R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2},$$

where $y$ is the variable that the linear model is trying to predict (the response variable),

$\hat{y}$ is the predicted value and

$\bar{y}$ is the mean value of the response variable.

And the R² for a linear model that is forced through the origin:

$$R_0^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2},$$

basically the mean value of the response variable is removed from the equation, making the denominator bigger (and the result of the division smaller). The reason why the mean cannot be used for this calculation is that it does not make sense any more: forcing the fit through zero kind of means adding an infinite number of (0,0) points into the dataset.

This means that the R-squared values of two different linear models (one with an intercept, one without) cannot really be compared. When the intercept is quite small compared to the residuals (basically the numerator), the R² “advantage” that the through-origin regression gets from its bigger denominator is relatively bigger than the decrease in residuals gained by including the intercept.
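If you want to check this yourself, here is a minimal sketch that computes both versions of R-squared by hand and compares them with what `summary()` reports. The data is entirely made up (the seed, `x`, `y` and the model names are only assumptions for this illustration), so the exact numbers mean nothing; the point is only that R switches denominators depending on whether the model has an intercept.

```r
# hypothetical data: a clear intercept, a weak slope and some noise
set.seed(1)
x = 1:20
y = 5 + 0.1 * x + rnorm(20)

fit_int = lm(y ~ x)         # with an intercept
fit_origin = lm(y ~ x - 1)  # forced through the origin

# with an intercept: residuals compared to the spread around the mean of y
r2_int = 1 - sum(residuals(fit_int)^2) / sum((y - mean(y))^2)

# through the origin: the mean is dropped from the denominator
r2_origin = 1 - sum(residuals(fit_origin)^2) / sum(y^2)

# both should match what R reports
all.equal(r2_int, summary(fit_int)$r.squared)
all.equal(r2_origin, summary(fit_origin)$r.squared)

c(with_intercept = r2_int, through_origin = r2_origin)
```

The last line prints the two values side by side, which makes it easy to see how different they can be even though both models were fitted to the same data.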