My minimal LaTeX preamble

This post was originally published here

My minimal example:

How to “increase” array resolution in R (replicate each element both column-wise and row-wise)

This post was originally published here

A picture is worth a thousand words: you have what is on the left, and you want what is on the right.


There are a few different ways to do this, but by far the cleanest and quickest way is to just select the rows and columns multiple times, by replicating row and column numbers (instead of actually replicating each element):

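Note that rep(something, n) is the same as rep(something, times = n), which repeats the whole vector n times. Here we need each = n instead, which repeats every element n times in place: for example, rep(1:3, times = 2) gives 1 2 3 1 2 3, while rep(1:3, each = 2) gives 1 1 2 2 3 3.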
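A minimal sketch of the index-replication approach. The matrix m and the scale factor k are made-up values for illustration, not the data from the original picture:

```r
# Hypothetical 2 x 3 example matrix
m <- matrix(1:6, nrow = 2)

# Scale factor: each element becomes a k x k block
k <- 2

# Repeat every row index and every column index k times (each, not times),
# then select rows and columns with those repeated indices
big <- m[rep(seq_len(nrow(m)), each = k), rep(seq_len(ncol(m)), each = k)]

print(big)
```

Base R's kronecker(m, matrix(1, k, k)) gives the same values for a numeric matrix, but it multiplies elements, whereas indexing only copies them, so the indexing version also works for character or logical matrices.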

Converting R Markdown to LaTeX

This post was originally published here

Install Pandoc:

Open the Terminal, Command Prompt (search for cmd) or Windows PowerShell, go to the folder containing your file and run:

pandoc -s -o report.tex

And that’s it!

(Read this if you want vector images.)

Reordering factor levels in R and what could go wrong

This post was originally published here

I’ve recently started using ggplot2 in addition to lattice (see this post that I made a while ago, explaining how I got into using lattice in the first place). Hint: when using ggplot2, you’ll often need the reshape2 package (also written by the amazing Hadley Wickham) to get your data into the form that ggplot2 works best with. Another thing to think about when using ggplot2 is factor levels. This post shows how (and how not) to reorder factor levels in R.

Let’s create a quick barplot with strings as x labels.


As df$a is a vector of strings, R sorts the factor levels alphabetically: “my 1”, “my 10”, “my 11”, “my 2”, and so on. That is not what we want, so let’s reorder the factor levels:
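A minimal sketch of the reordering, assuming a hypothetical data frame df with a character column a (“my 1” to “my 11”) and a numeric column b; the bar plot itself is left out so that the example only needs base R:

```r
# Hypothetical data resembling the post's example
df <- data.frame(a = paste("my", 1:11), b = 1:11)

# Default conversion: levels are sorted alphabetically as strings
f_default <- factor(df$a)
print(levels(f_default)[1:4])   # "my 1" "my 10" "my 11" "my 2"

# Reorder correctly: rebuild the factor, listing the levels in the desired order
df$a <- factor(df$a, levels = paste("my", 1:11))
print(levels(df$a)[1:4])        # "my 1" "my 2" "my 3" "my 4"

# Each label still belongs to the same value: "my 2" has b == 2
print(df$b[df$a == "my 2"])
```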


Finally, the wrong way to reorder factor levels is to assign new levels with the levels() function:


So be careful: if your data is not as obvious as in this example and you are a bit new to factors and levels, you might end up plotting wrong results (as in the last example, where “my 2” and “my 3” were plotted with the values 10 and 11).
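A minimal sketch of why this happens, on the same kind of hypothetical data (df, a and b are illustrative names, not the post's original data). Assigning to levels() renames the existing levels position by position instead of reordering them:

```r
df <- data.frame(a = paste("my", 1:11), b = 1:11)
df$a <- factor(df$a)   # levels: "my 1" "my 10" "my 11" "my 2" ... "my 9"

# WRONG: this relabels level 2 ("my 10") as "my 2", level 3 ("my 11") as "my 3", etc.
wrong <- df
levels(wrong$a) <- paste("my", 1:11)

# The labels are now attached to the wrong values
print(wrong$b[wrong$a == "my 2"])   # 10, not 2
print(wrong$b[wrong$a == "my 3"])   # 11, not 3
```

Rebuilding the factor with factor(x, levels = ...) keeps every value attached to its original label and only changes the level order.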

LaTeX tables: column widths and alignments

This post was originally published here

Firstly, start off your table in Tables Generator.

Tables Generator will do a lot of the work for you. Its most useful features are importing from .csv and merging cells. The Booktabs table style (an alternative to the default table style, chosen from the menu) looks a bit nicer and is “publication quality”. Note that publication-quality tables should not contain vertical lines.

Screen shot of Tables Generator


Code #1 is the code from Tables Generator with a caption, a label and the LaTeX document begin/end added (so it compiles). Continuing from that table, let’s centre the contents of columns 1-3 and the whole table in the document, by adding \centering and changing the column specifiers from l’s to c’s: Code #2.


Finally, if your cell contents are long and need wrapping:

table 3

Note that if your table is too wide for your document margins, LaTeX issues a warning, not an error. So you need to check your compilation log for warnings like “Overfull \hbox (9.75735pt too wide) in paragraph at lines 55–63”. A quick solution for wide cells is to give those columns a fixed width with p{} (Code #4):



But this solution does not give decent centred alignment. Using m (m{2cm} instead of p{2cm}) would centre the cells vertically (e.g. look at how the first row is aligned), but still not horizontally. So, following this StackOverflow post, I started defining column types and widths using the array package. See Code #5.



Next time I might write a post on how to add extra space between lines.

Why does a linear model without an intercept (forced through the origin) have a higher R-squared value? [calculated by R]

This post was originally published here

This is a short note based on this.

Short answer: because different formulas are used to calculate the R-squared of a linear regression, depending on whether it has an intercept or not.

R² for a linear model that has an intercept:

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where y is the variable that the linear model is trying to predict (the response variable), ŷ is the predicted value and ȳ is the mean value of the response variable.

And the R² for a linear model that is forced through the origin:

R_0^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i y_i^2}

Basically, the mean value of the response variable is removed from the denominator, making the denominator bigger (and the fraction smaller, so R² larger). The reason why the mean cannot be used for this calculation is that it no longer makes sense: forcing the fit through zero is, loosely speaking, like adding an infinite number of (0,0) points to the dataset.
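To see why dropping the mean can only make the denominator larger, expand the sum of squares around the mean (a standard identity; n is the number of observations):

```latex
\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2 + n\,\bar{y}^2 \;\ge\; \sum_{i=1}^{n} (y_i - \bar{y})^2
```

The two denominators differ by n ȳ², so the further the response is from zero on average, the more the through-origin R² is inflated for a given residual sum of squares; they are equal only when ȳ = 0.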

This means that the R-squared values of two different linear models (one with an intercept, one without) cannot really be compared, because they are measured against different baselines. Including the intercept reduces the residuals (the numerator), but when the intercept is quite small compared to the residuals, that reduction is smaller than the R² “advantage” the through-origin regression gets from its larger denominator.
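A minimal sketch that computes both R-squared definitions by hand and checks them against what summary() reports. The data are made up (a response with a large mean and deterministic “noise”, so the example is reproducible):

```r
# Hypothetical data: x well away from 0, response with a large mean
x <- 21:40
y <- 50 + 0.5 * x + 3 * sin(x)

fit1 <- lm(y ~ x)       # with intercept
fit0 <- lm(y ~ x + 0)   # forced through the origin (y ~ x - 1 is equivalent)

rss1 <- sum(residuals(fit1)^2)
rss0 <- sum(residuals(fit0)^2)

# With intercept: the denominator is centred on the mean of y
r2_1 <- 1 - rss1 / sum((y - mean(y))^2)
# Through the origin: the denominator is the raw sum of squares of y
r2_0 <- 1 - rss0 / sum(y^2)

# The hand-computed values match summary()
c(manual = r2_1, summary = summary(fit1)$r.squared)
c(manual = r2_0, summary = summary(fit0)$r.squared)

rss0 > rss1   # TRUE: the through-origin model fits worse ...
r2_0 > r2_1   # TRUE: ... yet reports the higher R-squared
```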

Symbolic links and 2 common errors with them

This post was originally published here

I don’t know if it’s good or bad, but I like it when the files I’m working with are in the working directory (so instead of using path names to my files I can just type filename or ./filename). But to avoid copying data and wasting space, symbolic links are the way to go. The command for that is:

ln -s target_file sym_link,

where -s stands for “symbolic” (plain ln would create a hard link).

However, if you are not a complete UNIX guru, then trying to access your linked files is likely to produce one of these errors:

No such file or directory OR Too many levels of symbolic links

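The solution to both of these is to always use full paths to the files and their symbolic links (ln -s /home/folder/file.txt /home/folder2/file.txt). The reason is that a relative target is stored in the link exactly as typed and is resolved relative to the directory containing the link, not the directory you ran ln from. For further information, see this and this. Apparently you can have 32 levels of symbolic links, so getting “Too many levels of symbolic links” after creating just one means that there is some serious recursion going on (for example, a link that points to itself).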
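A minimal sketch reproducing both errors and the full-path fix, written in R to match the rest of the blog. It uses base R's file.symlink(from, to), which creates the same kind of link as ln -s from to; it assumes a Unix-like system, and the folder and file names are hypothetical:

```r
# Throwaway directory (removed when the R session ends)
base <- tempfile("symlink_demo_")
dir.create(file.path(base, "folder"), recursive = TRUE)
dir.create(file.path(base, "folder2"))
writeLines("hello", file.path(base, "folder", "file.txt"))

old <- setwd(base)

# Like `ln -s folder/file.txt folder2/link1.txt` run from base:
# the target is resolved relative to folder2/, i.e. folder2/folder/file.txt,
# which does not exist -> "No such file or directory"
file.symlink("folder/file.txt", "folder2/link1.txt")

# Like `ln -s file.txt folder2/file.txt`: the link resolves to itself
# -> "Too many levels of symbolic links"
file.symlink("file.txt", "folder2/file.txt")

# Like `ln -s /full/path/folder/file.txt folder2/link2.txt`: always resolves
file.symlink(file.path(base, "folder", "file.txt"), "folder2/link2.txt")

setwd(old)

# file.exists() follows links, so broken links report FALSE
file.exists(file.path(base, "folder2", c("link1.txt", "file.txt", "link2.txt")))
readLines(file.path(base, "folder2", "link2.txt"))
```

Sys.readlink() on a link returns the target exactly as it was stored, which is a quick way to spot a relative target that points somewhere unexpected.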

Remove symbolic links just as you remove files:

rm sym_link