Piping tibbles in tidyverse

A couple of weeks ago, while preparing for the R for ecologists class, I came across tidyverse, a suite of packages (mostly written by Hadley Wickham) for manipulating data in R (thanks to Viktoria Wagner for her wonderful online materials on managing vegetation data!). Back in 2009 I attended the useR! conference in Rennes, France, where Hadley held a workshop on one of these packages (I think it was still called plyr at that time). I attended it, but to be honest, I remember nothing from that workshop; it was way too abstract for me. Since then, several times when preparing data for analysis in R, I have thought that I should look at it again and finally learn it. And here we are: preparing the R class has finally pushed me to do it. I also discovered the piping of data through functions, with an entirely different logic of sending data from function to function*, and a new form of data frames called tibbles (no idea what the word means). All the new fancy things; R moves forward quite wildly and erratically. But this post is not about piping, tibbles, and tidyverse. It is a quick reflection on how I use R, how I have changed, and whether I should do something about that or not.

I first used R around 2005, when I started to work in Brno, at my previous university, and a colleague asked me to calculate and draw species curves, which were hard to do anywhere else. I recalled S-PLUS, a program I had used as a master’s student when I took the Modern regression methods class while studying in České Budějovice. At that time Petr Šmilauer taught it in S-PLUS, a commercial system built on the S language, R’s ancestor. Knowledge of S-PLUS came in handy, and I was able to make quite pretty figures in R. At that time there were just a few R packages and no RStudio (I think I was using the Tinn-R editor; it actually still exists). R was no sweet candy at the beginning; I remember quite some time spent in frustration over the annoying error messages that kept popping up, but eventually I kind of started to like it. Later I found that the skill of using R is a powerful advantage: I was able to calculate or draw almost anything, things that would otherwise have to be done in a cumbersome way in some point-and-click software. At that time only a couple of other people were keen on R. I started to teach R, which was a great way to push myself to improve and to spread the knowledge of R among others. How different it is today, when R is considered the lingua franca of ecologists, scientists publish papers with R code appended, and skill in R is a common requirement for a newly hired research assistant, postdoc or even faculty member.

But I have also changed. There was a time, a couple of years ago, when I was eager to learn about every new development in R, study fancy packages, and try new analytical or visualisation methods. At one point, together with Víťa, another R guy from Brno, we even taught seminaR, a class focused purely on trying new fancy things in R. But that is gone. When it comes to R, I have become somewhat conservative; I stick to what I know and feel quite reluctant to discover new things. Partly perhaps I got lazy, but partly there are reasons. Some of those new packages, methods and “new trends” in using R turned out in the end to be ephemeral: packages were not maintained, their developers abandoned them and turned their interest to something else. Also, the use of R is now so broad that it is hard to keep track of all the new and exciting things. And my time is also not what it used to be; I can’t spend a week or more tweaking R code this way and that, thinking about it day and night, excited and barely sleeping. It is not that I don’t use R anymore. Actually, I use it on a daily basis; at any time I can be sure that one of my computers has RStudio open, with some half-written script or some long code running. But I use it as a tool to solve a problem, and I focus on the problem itself instead of polishing my skills with the tool. Learning R was, without doubt, one of the best investments I have made in my professional life, and now I hope to move on while still using it to do something else.

Since I now teach R for ecologists every winter semester, I keep myself reasonably fit in R, because I have to sort things out well enough to explain them to students. The class is actually focused on pretty basic R stuff. In the first half we mostly draw figures, because this does not require the additional theoretical background that would be needed if I used R to teach, e.g., statistics. In the second half we focus on “business as usual, with R”: loops, functions, importing, exporting and manipulating data, and some simple calculations, mostly to show students the benefit R may bring them if they use it. But that is actually the point; I do not focus on bringing new fancy things to the class, but instead teach R golden oldies in as smooth and digestible a way as possible.

But piping tibbles in tidyverse made me think that it may be time to sit down for a while and check how R has changed in the past couple of years when I was not following it. Here comes my point. Do you have any suggestions on what to look for? What in R have you found so useful recently that you cannot breathe without it and would suggest I learn it? No specialised packages, just something for “everyday business”. For example, I still wonder whether I should learn ggplot2 or be happy with my base R graphics combined with lattice. Do you have an opinion about that? Any comments welcome!

* If you are familiar with R, then with piping (library magrittr) you can replace (for example) sum (log (sqrt (abs (1:10)))) with 1:10 %>% abs %>% sqrt %>% log %>% sum. Quite a revolutionary way to put a script together. If you are not familiar with R, just don’t worry about that 🙂
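
For those who want to try it, here is a minimal sketch of the footnote's example as a runnable script. It assumes the magrittr package is installed; the variable names `nested` and `piped` are only for illustration. It computes the same quantity both ways and checks that the two results agree.

```r
# Assumes the magrittr package is installed: install.packages("magrittr")
library(magrittr)  # provides the %>% pipe operator

# Classic nested calls: read from the inside out
nested <- sum(log(sqrt(abs(1:10))))

# The same computation as a pipeline: read from left to right,
# each step receives the result of the previous one as its first argument
piped <- 1:10 %>% abs %>% sqrt %>% log %>% sum

# Both approaches should give the same number
all.equal(nested, piped)
```

The pipe does not change what is computed, only the order in which the steps are written, which is why the two values should match.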
