Monthly Archives: December 2017

Piping tibbles in tidyverse

A couple of weeks ago, while preparing for the R for ecologists class, I discovered tidyverse, a suite of packages (mostly written by Hadley Wickham) for various ways to tweak data in R (thanks to Viktoria Wagner for her wonderful online materials on managing vegetation data!). Sometime in 2009 I attended the useR! conference in Rennes, France, where Hadley held a workshop on one of these packages (I think at that time it was still called plyr). I attended it, but to be honest, I remember nothing from that workshop; it was way too abstract for me. Since then, several times when preparing data for analysis in R, I thought that I should check it again and finally learn it. And here we are: preparing the R class finally pushed me to do it. I also discovered piping data through functions, an entirely different logic of sending data from function to function*, and a new form of data frame called a tibble (no idea what the word means). All these new fancy things; R moves forward quite wildly and erratically. But this post is not about piping, tibbles, and tidyverse. It is a quick reflection on how I use R, how I have changed, and whether I should do something about it or not.

I first used R sometime around 2005, when I started to work in Brno, at my previous university, and a colleague asked me to calculate and draw species curves which were hard to do anywhere else. I remembered S-PLUS, a program I used as a master’s student when I took the class Modern regression methods while studying in České Budějovice. At that time Petr Šmilauer taught it in S-PLUS, a commercial system built on the S language, which turned out to be R’s ancestor. My knowledge of S-PLUS came in handy, and I was able to make quite pretty figures in R. At that time there were just a few R packages and no RStudio (I think I was using the Tinn-R editor, which actually still exists). R was no sweet candy at the beginning; I remember spending quite some time frustrated by constantly recurring, annoying error messages, but eventually I kind of started to like it. Later I found that the skill of using R is a powerful advantage: I was able to calculate or draw almost anything, things that would otherwise need to be done in a cumbersome way in some point-and-click software. At that time only a couple of other people were into R. I started to teach R, which was a great way to push myself to improve and to spread the knowledge of R among others. How different it is today, when R is considered the lingua franca of ecologists, scientists publish papers with R code appended, and a common requirement for a newly hired research assistant, postdoc or even faculty member is the ability to use R.

But I have also changed. There was a time, a couple of years ago, when I was eager to learn about all the new developments in R, study fancy packages, and try new analytical or visualisation methods. At one point, together with Víťa, another R guy from Brno, we even taught seminaR, a class focused purely on trying new fancy things in R. But that’s gone. When it comes to R, I have become somewhat conservative; I stick to what I know and feel quite reluctant to discover new things. Partly, perhaps, I got lazy, but partly there are reasons. Some of those new packages, methods and “new trends” in using R eventually turned out to be ephemeral: the packages were not maintained, and their developers deserted them and turned their interest to something else. Also, R is now used so broadly that it is hard to keep track of all the new and exciting things. And my time is not what it used to be; I can’t spend a week or more trying to tweak R code this way and that, thinking about it day and night, excited and barely sleeping. It is not that I don’t use R anymore; actually, I use it daily, and at any time I can be sure that one of my computers has RStudio open, with some half-written script or some long code running. But I use it as a tool to solve a problem, and I focus on the problem itself instead of continually polishing my skills with the tool. Learning R was, without doubt, one of the best investments I made in my professional life, and now I hope to move further while continuing to use it to do other things.

Since I now teach R for ecologists every winter semester, I keep myself reasonably fit in R, sorting things out so that I can explain them to students. The class actually focuses on pretty basic R stuff. In the first half we mostly draw figures, because this does not require the theoretical background that would be needed if I used R to teach, e.g., statistics. In the second half we focus on “business as usual, with R”: loops, functions, importing, exporting and manipulating data, and some simple calculations, mostly to show students the benefits R may bring them if they use it. But that is actually the point; I do not focus on bringing new fancy things to the class, but instead teach the golden oldies of R in a way that is as smooth and digestible as possible.

But piping tibbles in tidyverse made me think that it may be time to sit down for a while and check how R has changed over the past couple of years, during which I did not follow it. Here comes my point. Do you have any suggestions about what to look for? What have you recently found useful in R, something you cannot breathe without, that you would suggest I learn? Not specialised packages, but something for “everyday business”. For example, I still wonder whether I should learn ggplot2 or be happy with my base R graphics combined with lattice; do you have an opinion on that? Any comments welcome!

* If you are familiar with R, then piping (package magrittr) can replace, for example, sum(log(sqrt(abs(1:10)))) with 1:10 %>% abs %>% sqrt %>% log %>% sum. Quite a revolutionary way to assemble a script. If you are not familiar with R, just don’t worry about it 🙂
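For readers who want to try the footnote’s example, here is a minimal sketch that computes the same value both ways and checks that they agree. It assumes the magrittr package is installed.

```r
library(magrittr)   # provides the %>% pipe operator

# Nested calls are read from the inside out ...
nested <- sum(log(sqrt(abs(1:10))))

# ... while the pipe passes each result on as the first argument of the
# next function, so the chain reads from left to right
piped <- 1:10 %>% abs %>% sqrt %>% log %>% sum

all.equal(nested, piped)   # TRUE
```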

Visualizing random drift in ecological communities. Part 2.

In the first part of this post, I focused on a simple illustration of random drift without considering any other process. In the case of trees growing on a small island with only 16 individuals, it did not matter which individual died and which reproduced to replace it, because all offspring could disperse anywhere, and the whole process looked more like swapping the colours of balls in a bag. Here, I explicitly include dispersal in two forms: as movement of offspring limited to a short distance, and as immigration of species from outside. Then, in the second step, I also add the process of selection, which means that I include environmental heterogeneity (elevation) in the model and let species differ from each other in their ecological preferences. While in the first part I was interested in how the number of species, the number of individuals and the number of “generations” influence the outcome of random drift, here I ask how important ecological drift is when it interacts with immigration and selection. For all simulations I used an R package developed by Tyler Smith and published in Smith & Lundholm (2010); for more technical details, see the last section below.

Artificial landscape: it matters if the individual has a neighbour and how far its offspring can disperse

For all further simulations in this post, I use an artificial landscape of a given size (e.g. 50×50 or 100×100 cells, each cell accommodating at most one individual) and a fixed number of species (10). At the beginning, each cell in the landscape is occupied by one individual of a randomly chosen species, which means that each species is represented by a similar number of individuals. In each cycle, a fixed proportion of individuals die, and a fixed proportion of individuals produce one offspring. Dispersal of the offspring is spatially limited: it can get only to one of the neighbouring cells, rarely a bit further. From the second generation on, there can be vacant places in the landscape left by individuals which died; they remain vacant until the offspring of some neighbouring individual occupies them. The figure below shows this dynamic for a landscape with only four species. In the scenario without immigration, this is all; if we add immigration, then in every generation, individuals of randomly selected species are added to the marginal cells of the landscape. Note that all individuals of all species are demographically equal (i.e. all have the same probability to die or reproduce) and all cells within the landscape have the same “environmental conditions”. In this sense, the model describes a “neutral” community where only neutral processes (drift and dispersal) operate.

Example of neutral dynamics in a landscape with only 4 species. Each circle is an individual, and each colour is a species. At the beginning (1), all cells are filled by individuals of randomly chosen species. The next panel (2) shows the result after the first generation; some individuals died, and others replaced them, but only within the neighbouring cells; white spots are empty. Panels (3) to (9) show the results of the following generations.
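The neutral update described above can be sketched in a few lines of base R. This is not the neutral.vp code used for the figures, only a minimal illustration under stated assumptions: the landscape is a matrix of species numbers with NA for empty cells, the death and birth proportions (p_death, p_birth) are hypothetical values, offspring land only in one of the eight immediate neighbours (the occasional longer jump is omitted), and one_generation is a name chosen for this sketch.

```r
set.seed(1)
n_side    <- 50    # landscape is n_side x n_side cells
n_species <- 10
p_death   <- 0.1   # hypothetical: proportion of individuals dying per generation
p_birth   <- 0.1   # hypothetical: proportion of survivors producing one offspring

# Each cell holds at most one individual: a species number, or NA when empty
land <- matrix(sample(n_species, n_side^2, replace = TRUE), n_side, n_side)

one_generation <- function(land, p_death, p_birth) {
  # Deaths: a fixed proportion of the occupied cells become empty
  occupied <- which(!is.na(land))
  dead <- occupied[sample.int(length(occupied), round(p_death * length(occupied)))]
  land[dead] <- NA

  # Births: a fixed proportion of the survivors produce one offspring each
  survivors <- which(!is.na(land))
  parents <- survivors[sample.int(length(survivors), round(p_birth * length(survivors)))]
  for (cell in parents) {
    # Offspring lands in one of the 8 neighbouring cells (an offset of 0, 0
    # points back at the occupied parent cell, so that offspring is lost)
    target <- arrayInd(cell, dim(land)) + sample(-1:1, 2, replace = TRUE)
    inside <- all(target >= 1 & target <= dim(land))
    # Only vacant cells inside the landscape can be colonised
    if (inside && is.na(land[target])) land[target] <- land[cell]
  }
  land
}

land1 <- one_generation(land, p_death, p_birth)
table(land1, useNA = "ifany")   # individuals per species, plus empty cells (NA)
```

Calling one_generation repeatedly on its own output gives the sequence of snapshots; without immigration, nothing ever brings a lost species back.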

An interesting property of this model is that the abundances of species, which at the beginning were almost equal, become highly uneven. I added a diagram with the species abundance distribution (see the figure below, right panel), in which species are ordered according to their number of individuals (from high to low), and each species is represented by a bar of the corresponding colour. The order of species in this distribution changes dynamically, and over time some species become rare and eventually go extinct. In the end, only one species dominates the whole landscape.
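The rank-ordered, relative abundance bars can be computed from a landscape matrix as follows. This is a sketch assuming the landscape is stored as a matrix of species numbers (1 to n_species) with NA for empty cells; species_abundance_distribution is a helper name made up here, and the toy landscape is random.

```r
species_abundance_distribution <- function(land, n_species) {
  counts <- tabulate(land[!is.na(land)], nbins = n_species)   # individuals per species
  names(counts) <- seq_len(n_species)                         # keep species identity
  ranked <- sort(counts, decreasing = TRUE)                   # from high to low
  ranked / ranked[1]                                          # relative to the dominant species
}

set.seed(2)
land <- matrix(sample(10, 50 * 50, replace = TRUE), 50, 50)
land[sample(length(land), 300)] <- NA    # some empty cells
sad <- species_abundance_distribution(land, 10)

# Colour each bar by species, as in the figure
barplot(sad, col = rainbow(10)[as.integer(names(sad))], ylab = "Relative abundance")
```

An extinct species simply keeps a zero-height bar at the end of the ranking.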

Artificial landscape with 50×50 cells, each occupied by at most one individual, and 10 species. The left panel shows the distribution of species in the landscape; the right barplot shows the species abundance distribution (bars are ordered decreasingly by the number of individuals and scaled relative to the most dominant species). This is the initial situation: a landscape with all cells occupied by one individual of a randomly selected species.

Artificial landscape after the first generation. Some cells became empty (white), and species abundances start to differentiate from each other.

After 100 generations, species abundances are even more differentiated. No clear spatial pattern yet, and no species has gone extinct yet.

After 200 generations. A clear spatial pattern arises, one species becomes dominant, and one species has gone extinct.

After 500 generations. Only three species are left, one of which (red) is clearly going to go extinct soon. Abundances are highly uneven.

To see the whole sequence from beginning to end, watch the video below.

In the video you can see that after the red species goes extinct (time stamp 1:06), the two remaining species keep silently fighting for a long time; in fact, I didn’t have the patience to run the simulation long enough, but eventually one of the two species will surely win.

The simulation above is quite similar to the one in Part 1, except that the landscape is “spatially explicit”, meaning that it matters where in the landscape an individual occurs and who its neighbours are.

Including immigration: constant supply of individuals from outside

Now let’s add immigration. The following simulation has the same parameters as the one before (a landscape of 50×50 cells with ten species), except that after each cycle, individuals of randomly selected species are added to the marginal cells (that’s why the landscape keeps colourful margins all the time). The initial situation looks very similar to the one above, so below I show only the situation after 200 and 500 generations.

Artificial landscape with immigration, after 200 generations. The species abundance distribution is highly uneven (the light blue species currently dominates), but no species has gone extinct.

Artificial landscape with immigration, after 500 generations. Not much is happening; species happily coexist.

Again, see the video below for the whole time sequence. Just note that it is actually pretty boring; drift makes some of the species rather rare, but due to constant immigration no species goes extinct for good (and if one does, it will most likely be added back to the marginal cells in the next generation). If the rate of immigration is high enough, as in this simulation, the species happily coexist.
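Immigration into the marginal cells can be sketched as an extra step run after each generation. The placement rule here is an assumption, since the post does not say exactly how neutral.vp places immigrants: every empty marginal cell receives an individual of a randomly chosen species from the species pool. immigrate is a hypothetical helper, and the landscape is again a matrix of species numbers with NA for empty cells.

```r
immigrate <- function(land, n_species) {
  n_row <- nrow(land)
  n_col <- ncol(land)
  # Marginal cells: first or last row, first or last column
  is_margin <- row(land) == 1 | row(land) == n_row |
               col(land) == 1 | col(land) == n_col
  # Assumption: immigrants settle only in empty marginal cells
  targets <- which(is_margin & is.na(land))
  land[targets] <- sample(n_species, length(targets), replace = TRUE)
  land
}

set.seed(3)
land <- matrix(NA_integer_, 50, 50)     # an empty 50 x 50 landscape
land <- immigrate(land, n_species = 10)
sum(!is.na(land))                       # 196 marginal cells now occupied
```

Because immigrants are drawn from the full species pool every time, a species that drifted to extinction inside the landscape can reappear at the margin in the next generation.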

Some conclusions from what we have learned so far. In an isolated landscape without immigration (like an island far in the ocean, or a patch of natural habitat isolated from other patches by human activities), random drift sooner or later results in the extinction of all but one species; how long it takes depends on the number of individuals (size of the island) and the initial number of species. Immigration can reverse this outcome by returning species to the game.
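To get a feel for how the time to monodominance grows with the number of individuals, here is a minimal non-spatial sketch in the spirit of Part 1. It is a Moran-type model swapped in for illustration: in each step one random individual dies and is replaced by the offspring of a random individual, with no dispersal limitation. The function name, the replicate counts, and counting n_ind deaths as one generation are assumptions of this sketch, not outputs of neutral.vp.

```r
time_to_monodominance <- function(n_ind, n_species, max_steps = 1e6) {
  community <- rep_len(seq_len(n_species), n_ind)   # near-equal starting abundances
  steps <- 0
  while (any(community != community[1]) && steps < max_steps) {
    dead   <- sample.int(n_ind, 1)   # a random individual dies ...
    parent <- sample.int(n_ind, 1)   # ... and is replaced by the offspring of a random
                                     # individual (possibly itself, a common simplification)
    community[dead] <- community[parent]
    steps <- steps + 1
  }
  steps / n_ind   # n_ind deaths are counted as one generation
}

set.seed(4)
small <- replicate(20, time_to_monodominance(16, 4))    # small island
large <- replicate(20, time_to_monodominance(100, 4))   # larger island
c(small = mean(small), large = mean(large))
```

Individual runs vary a lot, which is why the sketch averages over replicates before comparing island sizes.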

The end of neutrality: adding environmental heterogeneity and selection

So far, all cells within the artificial landscape were equal, and all species had an equal chance to survive in any of them. The next step is to include environmental heterogeneity and selection between species. Environmental heterogeneity means that different cells have different environmental conditions (or habitat properties).

In our model there is only one environmental gradient; let’s call it elevation. The spatial distribution of elevation is not random; it forms four “humps” which we will call hills. So far, species were also equal in that they did not have any environmental preferences (and even if they had, it would not have mattered, because the environment was perfectly “flat”). Now each species gets its ecological optimum in one elevation zone, but it can (with lower success) also grow in the adjacent elevation zones. For example, species 5 has its optimum in elevation zone 5, but its offspring can also survive in zones 4 or 6 (but not in zones further away). This is selection by the environment: species 5 has an advantage within the cells of elevation zone 5, while species 4 has an advantage within the cells of elevation zone 4; if species 5 gets to elevation zone 4, it may still survive there, but with a lower probability than species 4. Our model has four hills, ten elevation zones and ten species, each specialised on one elevation zone. The proportion of cells is roughly equal for each elevation zone. All other parameters are identical to the previous simulation, except that the landscape is 100×100 cells large (to fit all four hills).
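The selection rule can be written as a species × zone matrix of establishment probabilities. The post only says that survival one zone away from the optimum is less likely and further away impossible; the value p_adjacent = 0.5 below is a hypothetical choice, and establish and try_establish are names made up for this sketch.

```r
n_species  <- 10
n_zones    <- 10
p_adjacent <- 0.5   # hypothetical establishment probability one zone from the optimum

# Rows = species (species i has its optimum in zone i), columns = elevation zones
establish <- outer(seq_len(n_species), seq_len(n_zones), function(sp, zone) {
  d <- abs(sp - zone)
  ifelse(d == 0, 1, ifelse(d == 1, p_adjacent, 0))
})

# An offspring of species `sp` arriving in an empty cell of elevation zone `zone`
# establishes with probability establish[sp, zone]
try_establish <- function(sp, zone) runif(1) < establish[sp, zone]

establish[5, 3:7]   # species 5 in zones 3 to 7: 0.0 0.5 1.0 0.5 0.0
```

In a spatial model this check would sit right before an offspring is placed in a vacant cell, so species 4 and 5 compete on equal dispersal terms but unequal establishment odds in zone 4.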

To visualise this, each snapshot has three panels (see below). I removed the species abundance distribution and included a 3D visualisation of our four-hill landscape instead. I also added stacked barplots showing the relative abundances of individual species within each elevation zone on each hill. As it turns out, this visualisation is quite important for understanding what is happening.
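The numbers behind the stacked barplots are the proportions of species within each hill × elevation zone combination. A minimal sketch, assuming three matrices of the same size: land (species number, NA for empty), elev (zone 1 to 10) and hill (a label per cell). The toy input uses random zones and assigns hills by quadrant, a simplification of the real four-hill landscape; zone_composition is a hypothetical helper.

```r
zone_composition <- function(land, elev, hill) {
  ok <- !is.na(land)                      # occupied cells only
  counts <- table(hill = hill[ok], zone = elev[ok], species = land[ok])
  prop.table(counts, margin = c(1, 2))    # proportions sum to 1 within each hill x zone
}

# Toy input: random species and random zones
set.seed(5)
n <- 100
land <- matrix(sample(10, n * n, replace = TRUE), n, n)
elev <- matrix(sample(10, n * n, replace = TRUE), n, n)
# Hills by quadrant, assuming rows run west -> east and columns south -> north
# (the orientation image() uses when drawing a matrix)
hill <- matrix(paste0(ifelse(col(land) > n / 2, "N", "S"),
                      ifelse(row(land) > n / 2, "E", "W")), n, n)

comp <- zone_composition(land, elev, hill)
# One stacked barplot per hill: zones along the x axis, species stacked
barplot(t(comp["NE", , ]), col = rainbow(10), xlab = "Elevation zone")
```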

Artificial landscape with four hills representing elevation (min 1, max 10). This is the initial situation, with individuals randomly dispersed across the landscape. The left panel shows an aerial view of this landscape, with elevation contours and individual peaks marked according to their geographical position (NW, NE, SW, SE for north-west, north-east, south-west and south-east, respectively). The upper right panel shows the same in 3D (note that the hills have been “smoothed”, otherwise they would have “stairs” on the slopes). The lower right panel shows four stacked barplots, one per hill, each showing the relative abundances of species within each elevation zone. In this initial step, all species are everywhere, and the proportions of species in each elevation zone on each hill are similar. The legend “Species elevation optima” shows which species (colour) has its optimum in which elevation zone (number).

Artificial landscape with topography after 100 generations. A spatial pattern starts to emerge (left and upper right panels). All species are still present in most elevation zones (barplots in the lower right panel), but differentiation according to species’ ecological preferences is beginning (e.g. the dark violet species starts to prevail at lower elevations, while the light and dark blue species prevail at higher elevations).

After 500 generations. The spatial pattern is rather clear, and so is species differentiation along elevation. Note one interesting thing: different peaks start to be dominated by different species (dark blue vs light blue). Peaks are like small islands in a sea of lowland, and ecological drift works faster here than in other parts of the landscape.

After 3000 generations. The dark blue species dominates the NE and SE peaks, while the light blue one dominates the NW and SW peaks (together with light green on the SW peak).

Video with the whole sequence is below.

Perhaps the most interesting result is that different hill peaks become dominated by different species as a result of random drift (the dark blue species dominates the eastern peaks, the light blue one the western peaks). The lowland is also partly differentiated, but since the lowlands of all hills are interconnected, this differentiation is arbitrary. In this way, random drift increased beta diversity at high elevations: if we put a sample on each peak, the differences in species composition (beta diversity) among them would be high, while among lowland samples they would not be so high.
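The beta-diversity argument can be made concrete by comparing the average pairwise dissimilarity among peak samples with that among lowland samples. This sketch uses Bray-Curtis dissimilarity (my choice for illustration; the post does not name an index) and small made-up species counts that only mimic the pattern described above; they are not data from the simulation.

```r
bray_curtis <- function(a, b) sum(abs(a - b)) / sum(a + b)

mean_pairwise_bc <- function(samples) {
  pairs <- combn(nrow(samples), 2)   # all pairs of samples (rows)
  mean(apply(pairs, 2, function(p) bray_curtis(samples[p[1], ], samples[p[2], ])))
}

# Made-up counts of 4 species; rows = samples, columns = species
peaks <- rbind(NE = c(20, 0, 0, 0), SE = c(18, 2, 0, 0),
               NW = c(0, 20, 0, 0), SW = c(0, 15, 5, 0))
lowland <- rbind(c(5, 5, 5, 5), c(6, 4, 5, 5),
                 c(4, 6, 6, 4), c(5, 5, 4, 6))

# peaks about 0.69, lowland 0.10
c(peaks = mean_pairwise_bc(peaks), lowland = mean_pairwise_bc(lowland))
```

Two peaks dominated by different species share nothing (dissimilarity 1), which is what pushes the peak average up.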

With immigration, the pattern is similar, just more dynamic. Lowland cells hold most of the species, but this is mostly the result of immigration into the marginal cells of the landscape (which, due to the construction of the model, are all in the lowland).

Adding selection to the mix of drift and dispersal has an interesting outcome. Stochastic ecological drift acts against the deterministic effect of selection via “environmental filtering”, and as a result, ecologically identical habitats (elevation zones) can have similar or somewhat different species composition, depending on their isolation. Hill peaks in our model are the most isolated, and as a result of drift and dispersal limitation, different peaks became dominated by different species. Immigration can counteract the effect of drift, but only if the habitats are not isolated (the influence of immigration is high in the interconnected lowland, but not at the more isolated higher elevations).

This leads me to the question: how important is ecological drift in real communities? How much of the pattern we see in nature is actually the result of predictable species’ ecological preferences and biotic interactions, and how much results from the unpredictable effects of drift and dispersal? This “niche-neutral” riddle has been around for a couple of years already, but my impression is that methodological problems and a lack of appropriate data about ecological communities make it difficult to answer. But that’s a topic for another post sometime in the future.

Technical details: how I made it

The simulations were done in R with the package neutral.vp, written by Tyler Smith and published in Smith & Lundholm’s 2010 Ecography paper. The original library, published as an appendix to the paper, no longer works, but I fixed it so that it compiles in newer versions of R (details here). I played with this library a couple of years ago, when I was deeply in love with variation partitioning of beta diversity into environmental and spatial components (the love is gone, luckily). There are some other (even more advanced) models simulating spatially explicit communities (one overview is here), but for the purpose of this blog, neutral.vp is perfect and rather fast (at least for reasonably small landscapes). Visualisations were done in R (the square landscape using the functions image and contour, the 3D models of the landscape with four hills using the function persp). Figures were assembled into a video in Movie Maker (Windows). The R scripts are a bit messy, so I don’t provide them here, but if you wish, I will be happy to share them.
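A minimal sketch of this kind of plot, using the same base R functions (image, contour, persp). The four-hill surface is an assumption: a sum of four Gaussian humps on a 100×100 grid with hypothetical centres and width, cut into ten zones of roughly equal area at its deciles. The plot goes to a temporary PDF so that it also runs without a screen.

```r
n <- 100
x <- seq_len(n)
y <- seq_len(n)
centres <- list(c(25, 25), c(75, 25), c(25, 75), c(75, 75))   # hypothetical hill positions
width <- 15                                                    # hypothetical hill width (cells)

# Continuous height surface: sum of four Gaussian humps
height <- outer(x, y, function(px, py) {
  h <- 0
  for (cc in centres) h <- h + exp(-((px - cc[1])^2 + (py - cc[2])^2) / (2 * width^2))
  h
})

# Ten elevation zones with roughly equal numbers of cells (cut at the deciles)
elev <- matrix(cut(height, quantile(height, seq(0, 1, 0.1)),
                   include.lowest = TRUE, labels = FALSE), n, n)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 5)
op <- par(mfrow = c(1, 2))
# Aerial view: zones as colours, contour lines on top
image(x, y, elev, col = terrain.colors(10), asp = 1, main = "Aerial view")
contour(x, y, elev, add = TRUE, drawlabels = FALSE)
# 3D view of the continuous surface, so the slopes show no "stairs"
persp(x, y, height, theta = 30, phi = 35, col = "grey80",
      border = NA, shade = 0.5, main = "3D view")
par(op)
invisible(dev.off())
```

Drawing persp on the continuous surface rather than on the integer zones is one way to get the “smoothed” hills mentioned in the figure captions.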

References

  • Smith, T.W. & Lundholm, J.T. 2010. Variation partitioning as a tool to distinguish between niche and neutral processes. Ecography 33: 648-655.