Should you learn the Latin names of plants? You should, but perhaps for a different reason than you think

Sarothamnus scoparius, according to recent nomenclature Cytisus scoparius. Photo by Evka Šmerdová (née Hettenbergerová).

The title of this post is aimed at my Taiwanese students, but the content is hopefully useful also for a broader audience. Since I started to teach at National Taiwan University in 2015, I have constantly been facing one interesting kind of resistance from my students: resistance to learning Latin names (of plants in my case, but it applies in general, also to other organisms). I know it’s not easy (well, what is?). Latin is very different from the Chinese language most students use, and it may not seem so useful at first, especially when others around them do not use it either. So why bother? I did my own survey (highly biased and with a very low n), asking students why they think they need to learn Latin names. By far the most common answer (just after “because you force us”) was: because of foreigners; they don’t know Chinese names, so if they ask us, we need to tell them the “scientific name” (some directly say “English name”, as if Latin and English were the same thing). At that point, I felt a bit guilty, because it means that if I were not asking for the names in Latin, they would not need to bother and could do just fine with the Chinese ones. Fair enough, but maybe that’s not how it works.

Let me depart from the topic, but just for a while. Recently I read an opinion piece in a newspaper which commented on the strategy of the Taiwanese government to turn Taiwan into an officially bilingual country by 2030. The author of that article (a foreigner who has lived in Taiwan for more than 30 years) claimed that this is an impossible task. Not because people cannot learn English, but because – why should they? Just because “the government decided” to show how progressive Taiwan is and how good a place for foreign investors it could be? That does not sound like a good motivation for people to climb to the top of such a steep mountain, or even to start thinking about it. Unless people experience the real need and usefulness of English, it is hard to motivate them to learn it. Just like with the Latin names.

I started to learn Latin names in the first year of my undergraduate study at the Biological Faculty. During the first week of the first semester, which we spent at a field station in the middle of the lovely South Bohemian landscape, the teachers told us: now, forget the Czech names of plants, you won’t need them anymore; learn the Latin names. No “discussion” was allowed – in fact, there was nothing to discuss: from the first excursion on, the teachers always showed us the plants and gave the Latin name first, the Czech one only if someone asked. When we took the final test from the excursion, we walked with the teacher and had to name any species s/he pointed at; the Latin name counted for 2 points, the Czech name just for 1 point. It was not easy; Czech is a Slavic language, not a Romance one, and those Latin names, with few exceptions, really don’t sound like anything familiar. I still remember how I was walking back and forth through the yard of the field station before my first final test, memorizing “Sa-ro-tha-mnus sco-pa-ri-us” and trying to find some similarity with something in Czech (the untranslatable Sáro, tam hnus! Skopárius!). And yes, at the beginning we grumbled a lot with my classmates about why we should learn something we would never need in the future! How wrong I was, by the way; those who were not kicked out of the school in the first year will agree that it would be impossible to get a Master’s degree, or even just to finish the undergraduate study, without knowing the Latin names of the organisms we study. Not because the teachers pushed us – later they didn’t need to: everyone was using Latin names, and it would be really difficult (and embarrassing) not to know them. Now I teach vegetation ecology at National Taiwan University; where would I be if I had resisted back then and kept learning only the Czech plant names?

With English, my beginnings were even more painful. Until the Velvet Revolution (the end of Communism in Czechoslovakia, 1989) we learned Russian at school as the second language (I was eleven that year). Then our Russian language teacher almost overnight turned into an English language teacher, and from then on I studied English, through the rest of my elementary, secondary and also university studies. When I decided to travel to New Zealand for a volunteering job with the Department of Conservation (that was in 2002 and I was 24, with a fresh Master’s diploma in my pocket), I thought I spoke English pretty well after sooo many years of learning it. How wrong I was I learned already at Auckland’s airport, where I got off the plane and (surprisingly) everyone spoke English, with that unmistakable New Zealand accent (Gud daj!). The immigration officer asked me “Do you have some food in your backpack?”, and I said “yes”, because I thought she was asking about “foot”, and I thought that “foot” was the same as “shoes”. She said “but you cannot have any food in your backpack” and wanted me to throw it out; I resisted, arguing “but it is hiking food”; after I showed her my hiking boots, it was clear that something was not quite working. But yeah, after half a year in New Zealand, speaking English all the time and Czech only occasionally with my family and friends over the phone (no Skype or Facebook at that time!), I learnt how to use English. I do have a weird accent, though (one British girl told me I sound pretty Indian, whatever she meant by that), but I can survive with that.

It never came to my mind that I would be learning the Latin names of plants or studying English because of “foreigners” who may not know the Czech plant names or may ask me something in English on the street. And yeah, there are pretty many foreigners in the Czech Republic, but who cares. In the beginning, I was pushed to do it, mostly by my annoying teachers, who insisted we needed to learn it (I can never thank them enough for their wisdom and their resistance to our ignorance!). Later, no one needed to convince me, because I found out how important it is for me personally. You don’t learn something “because of others”. You always learn it because of yourself. The knowledge and skills you have are what makes you special and what opens new possibilities for you. So, please, don’t waste your energy searching for arguments why you just can’t or don’t need to do it, and just do it.

Massenerhebung effect in Taiwan

When we compare vegetation zonation in smaller and isolated mountains with that in large mountain ranges, we often find that in the smaller mountains the same vegetation zones occur at lower elevations than in the larger, more “massive” ones. This pattern, first described from the European Alps by German researchers at the beginning of the 20th century, is known as the Massenerhebung effect, or mass elevation effect (Massen = mass, Erhebung = uprising or elevation). In Taiwan, this pattern was pointed out by Su (1984a,b), who plotted the elevation ranges of the main vegetation zones in Taiwan along latitude (see below). Since the island of Taiwan stretches mainly in a latitudinal direction, one would expect that the vegetation zones will steadily decrease in elevation toward the northern, cooler part of the island. Instead, the schema from Su (1984b) shows quite clearly that “[t]he altitudinal level of these zones shift downward towards both northern and southern tips as a result of Massenerhebung”, a pattern which is most obvious for the lower Quercus zone (lower cloud forest).

Latitudinal pattern in distribution of vegetation zones in Taiwan. Combination of Fig. 3 from Su (1984b) with part of Fig. 1 from Su (1984a).

The scientific literature recognises three potential reasons why the Massenerhebung effect occurs. The first is the landmass heating effect: large mountain masses slow down the speed at which temperature decreases with elevation (in other words, they decrease the lapse rate). The temperature in the mountains at a certain elevation is always higher than the temperature of the free air at the same elevation; this is because the solar rays reflected by the mountains have a considerable heating effect on the surrounding air. A solitary mountain (e.g. a volcano) is not capable of reflecting as much heat as an entire mountain range, and as a result, the temperature at a given elevation on the solitary mountain will be cooler than at a comparable elevation in an extensive (“massive”) mountain range. The second is the effect of wind: wind cools down the outer mountain margins and isolated mountains more than the inner parts of mountain ranges. The windward parts of the mountains are usually more humid, which means that more heat has to be spent on evaporating the water instead of heating the ground. The inner parts of the mountains will be drier and warmer (having a more “continental” climate). The last is the effect of clouds: isolated mountains, and mountains at the outer margins of mountain ranges in coastal regions, are more exposed to prevailing cloud formations. High air humidity causes lower air temperature, and high water content in the soil results in slower humus decomposition. A slow decomposition rate causes a lack of nutrients available for vegetation, typical for the mountain cloud forest; this may be partly the reason why on isolated mountains, and on mountains at the margins of mountain ranges, the cloud forest occurs at lower elevations than in the central parts of the mountain ranges. Some authors consider as Massenerhebung only the first effect (landmass heating), others include all three together, since their effects are not easy to separate.

On a large scale, the Massenerhebung effect is noticeable for example in the Himalayas, where the timberline (the upper climatic boundary of the forest) shifts to a considerably higher elevation than in smaller mountains of comparable latitude. On the scale of Taiwan, along with the descriptive study of Su (1984b), the pattern can be recognised also in the distribution of the cloud forest from data in the vegetation database (Schulz et al. 2017, their Figure 2), which shows the hump-shaped pattern (red crosses indicate cloud forest plots).

Latitudinal pattern of cloud forest vegetation (red crosses) in Taiwan (Schulz et al. 2017, their Figure 2). Yes, it looks a bit strange; I modified the figure by stretching it five times along the vertical axis to magnify the hump-shaped pattern of the cloud forest (red crosses).

As far as I know, there is only a single study attempting to explain the environmental correlates of the Massenerhebung effect in Taiwan. Chiou et al. (2010) used data from the National Vegetation Database of Taiwan and concluded that the cooling effect of the northeastern monsoon is more important than the Massenerhebung effect caused by landmass heating. The authors selected 76 woody species occurring commonly across the island and for each of them estimated the upper altitudinal limit based on the database data. If only data from the eastern part of Taiwan were used, the results showed elevation decreasing with increasing latitude (lowest in the northern part of Taiwan and higher in the central and southern parts), while for the dataset from the western part there was no clear trend. The authors concluded that, since they had not detected a decrease in the altitudinal limits in the southern part of Taiwan, the “[h]eat retention of Massenerhebung is not the mechanism which can best explain the patterns of species altitudinal distribution in Taiwan”.

Distribution of vegetation plots used in the study of Chiou et al. (2010).

My modest opinion is that although the authors of Chiou et al. (2010) use a rather large dataset of vegetation plots (1604 plots, each 400 m2), the analysis itself is not too persuasive, and I would be more careful with bold conclusions of no Massenerhebung in Taiwan. It seems that in their definition the Massenerhebung effect includes only the landmass heating effect, while the effects of wind and clouds are considered separately. But at the same time, the authors have chosen three areas in Taiwan of comparable elevation span (each reaching well over 3000 m asl), while not including the marginal mountain systems in the northern and southern parts of the island. This means that the massiveness of all three selected regions is comparable, so I am not sure why the authors expect to see a decreasing trend caused by a lower landmass heating effect. Additionally, the authors used only 76 out of almost two thousand woody species in Taiwan, including only the most common species occurring in all three selected regions. So, not including the marginal mountains (the less “massive” far northern and far southern parts of the island) and using only a limited number of species may be exactly the reasons why the hump-shaped pattern was not detected. But who knows.

The disadvantage of the schema drawn by Su (1984b) is the relatively subjective nature of the data it builds upon. Although it is based on more than 200 vegetation records from different elevation zones and different watersheds, combined with data from published studies, the boundaries of the vegetation zones are still delineated somewhat arbitrarily, without really using any quantitative tool. But since we now have data from the National Vegetation Database of Taiwan and the classification of forest vegetation types according to the study of Li et al. (2013), it is possible to construct a similar schema using real vegetation plots. This study, which Woody wrote together with co-authors as a part of his PhD during his study in Brno (Czech Republic), contains a description of 21 zonal and azonal forest vegetation types in Taiwan (zonal types are those determined by climate, while azonal ones are those determined mainly by edaphic conditions). The paper additionally contains one convenient instrument: the Cocktail Determination Key (CoDeK), an R-based application which allows automatic assignment of vegetation plots to the vegetation types described in the paper, based on plot species composition. I used CoDeK to assign plots to vegetation types, chose only the plots belonging to zonal vegetation types, and merged analogous tropical and subtropical types into the same units. I plotted the latitudinal-altitudinal distribution of the plots and fitted the lower boundary of each vegetation type with quadratic quantile regression. Except for the high-elevation conifer-dominated forest types, all other types show a clear hump-shaped pattern, with a decrease not only in the northern but also in the southern part of Taiwan. And since the zonal forest types in Li et al. (2013) are largely analogous to the vegetation types in Su (1984b), the schema agrees nicely with the one in Su (1984b) and supports his observation.
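For readers curious how such a lower-boundary fit can be done: in R, quantile regression is typically done with the rq function from the quantreg package (the post does not say which tool was actually used, so take this as an assumption). The data frame plots, with columns lat and elev, is entirely made up here, standing in for the real database plots of one vegetation type.

```r
# Sketch: fit the lower elevational boundary of one (made-up) vegetation type
# with quadratic quantile regression. 'plots', 'lat' and 'elev' are hypothetical
# names; the real analysis used the National Vegetation Database of Taiwan.
library (quantreg)
set.seed (1)
plots <- data.frame (lat = runif (500, min = 22, max = 25.3))
plots$elev <- 2000 - 150 * (plots$lat - 23.7)^2 + rnorm (500, sd = 300)  # toy hump
lower.fit <- rq (elev ~ poly (lat, 2), tau = 0.05, data = plots)  # 5% quantile ~ lower boundary
lat.new <- data.frame (lat = seq (22, 25.3, length.out = 100))
lower.boundary <- predict (lower.fit, newdata = lat.new)  # curve to draw on the schema
```

Here tau = 0.05 means the fitted curve runs below roughly 95% of the plots of that type, which is one reasonable operational definition of a "lower boundary".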

Latitudinal-altitudinal pattern of forest vegetation zones (data from National Vegetation Database of Taiwan, with only zonal forest vegetation types according to Li et al. 2013). Lower boundaries of vegetation zones are delineated by quadratic quantile regression curve.

Indeed, the hump-shaped pattern of vegetation types along latitude is just an empirical observation and does not say anything about the factors which are causing it. In fact, it is quite likely that the pattern results from all three factors together – the landmass heating effect may be responsible for the uplift of vegetation zones in the central, drier and less windy parts of the mountain ranges, while the effects of wind and clouds may be responsible for the downslope shift in the marginal areas. Additionally, land-use changes caused by human activities may also play a role. With all the data readily available, it should not be so difficult to find out.


  • Chiou, C.-R., Song, G.-Z.M., Chien, J.-H., Hsieh, C.-F., Wang, J.-C., Chen, M.-Y., Liu, H.-Y., Hsia, Y.-J., & Chen, T.-Y. (2010). Altitudinal distribution patterns of plant species in Taiwan are mainly determined by the northeast monsoon rather than the heat retention mechanism of Massenerhebung. Botanical Studies, 51: 89-97.
  • Li C.-F., Chytrý M., Zelený D., Chen M.-Y., Chen T.-Y., Chiou C.-R., Hsia Y.-J., Liu H.-Y., Yang S.-Z., Yeh C.-L., Wang J.-C., Yu C.-F., Lai Y.-J., Chao W.-C., & Hsieh C.-F. (2013). Classification of Taiwan forest vegetation. Applied Vegetation Science, 16: 698–719.
  • Schulz, H.M., Li, C.-F., Thies, B., Chang, S.-C., & Bendix, J. (2017). Mapping the montane cloud forest of Taiwan using 12 year MODIS-derived ground fog frequency data. PLoS ONE, 12: e0172663.
  • Su, H.-J. (1984a). Studies on the climate and vegetation types of the natural forests in Taiwan (I): Analysis of the variations in climatic factors. Quarterly Journal of Chinese Forestry, 17: 57–73.
  • Su, H.-J. (1984b). Studies on the climate and vegetation types of the natural forests in Taiwan (II): Altitudinal vegetation zones in relation to temperature gradient. Quarterly Journal of Chinese Forestry, 17: 1–14.

About P-values and multiple testing issue

The Internet is flooded with explanations related to P-values (To P or not to P?) and problems around multiple testing (the more tests you do, the more significant results you get, even if everything is random). I don’t bring anything new here; I just wrap it up for myself and add a few examples that hopefully make some concepts clearer. I will talk about Type I and Type II error rates, the power of a test, the disadvantages of multiple tests, what it looks like when we apply a correction for multiple tests (not many significant results are left) and what else we could do about it (some global analysis?). Read further if you also think about these things.


Several decades ago, ecologists argued strongly that it was important to start using rigorous statistical approaches and standards in ecology; that was the era when ecological studies were based mostly on natural history observations instead of systematically sampled data analysed by rigorous statistics. Meanwhile, ecology grew into a standard scientific discipline, and some have started to argue that P-values are often misused and misunderstood, and should be banned from ecological publications (as has happened in medicine or psychology). Putting this discussion aside, P-values still represent a useful tool telling us how much evidence we have against the null hypothesis we are testing.

The P-value is the probability of obtaining a value of the test statistic equal to or more extreme than the observed one, given that the null hypothesis is true. The null hypothesis is usually formulated as “there is no relationship between the variables” (or there is some fixed relationship, from which we are testing the difference). For example, if I relate species composition to some environmental factor using constrained ordination, the P-value is the probability of obtaining a test statistic (R2, pseudo-F value) equal to or higher than the one we actually calculated, under the assumption that the environment is truly not related to species composition (the null hypothesis is true). If the P-value is sufficiently low (e.g. less than 5%, which is the common standard, proposed originally by Ronald Fisher), we may conclude that it is quite unlikely that the result could be obtained in a situation where the null hypothesis is true, and call the observed value “significant”. That 5% (or, in general, the alpha level) is the probability of a Type I error, and we report such a result as “significant” at the given alpha threshold (e.g. P < 0.05). Some examples. First, let’s generate data for which we are sure that the null hypothesis is true: we generate two random variables, independently of each other; then we calculate the correlation between them (Pearson’s r) and test it using a parametric t-test (all this is done by the cor.test function in R):

x.rand <- rnorm (100)
y.rand <- rnorm (100)
cor.test (x.rand, y.rand)$p.value

We obtain some P-value (e.g. if you include set.seed (1234) before the script above, the P-value should be P = 0.8020633, i.e. not significant at alpha = 0.05). Yet, even though these variables are randomly generated, occasionally we can still obtain a “significant” result with a P-value smaller than alpha, with a probability of alpha (try it – use e.g. set.seed (1257), and you obtain a P-value = 0.001678019, well significant at the 0.05 level).
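A quick way to convince yourself that this false positive rate really equals alpha is to repeat the test many times on freshly generated random data and count the “significant” results (a small simulation sketch of my own, not part of the examples above):

```r
# Repeat the test on random (null) data many times; the proportion of
# P-values below alpha should be close to alpha itself (here 5%).
set.seed (1)
p.null <- replicate (1000, cor.test (rnorm (100), rnorm (100))$p.value)
mean (p.null < 0.05)  # should be close to 0.05
hist (p.null)         # under the null hypothesis, P-values are uniformly distributed
```

The histogram is flat because, when the null hypothesis is true, the P-value is uniformly distributed between 0 and 1 – which is exactly why 5% of the tests fall below 0.05.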

Now, the second situation – we generate data for which we are sure that the null hypothesis is not true; variable x is still generated randomly, but variable y is a function of x, with added random noise. The same as before – calculate the correlation and test it:

x.dep <- rnorm (100)
y.dep <- 0.2 * x.dep + rnorm (100)
cor.test (x.dep, y.dep)$p.value

If in the code above you use set.seed (12345), you get a P-value = 0.001676154, well below the 0.05 threshold, so you would conclude that the result is significant at P < 0.05. But of course, it may also happen that the result is not significant even though it should be (e.g. with set.seed (12355), you obtain a P-value = 0.06517154, which would no longer be significant at the 0.05 significance level). To summarize, see the possible outcomes in the table:

The left column shows the situation where the null hypothesis is, in reality, true (our first example with randomly generated variables). If I still reject it, I commit a Type I error, or a false positive. The probability that this will happen is set by me and equals the alpha level at which I decided to reject the null hypothesis (alpha = 0.05, or 5%, in this case). If I do not reject it, I make a true negative decision (with a probability of 1 – alpha).

The right column shows the situation where the null hypothesis is in reality not true (our second example, where y depends on x because I enforced this dependence). If I reject the null hypothesis, I make a correct decision (true positive); if I do not reject it, I commit a Type II error, or a false negative. The probability of a Type II error is beta, and the probability of a true positive is 1 – beta, also called the power of the test. Beta and the power of the test depend on several factors – on the test itself and how powerful it is (some are more and some less powerful), and also on the data – the sample size and the amount of variance. In our case, the data are quite noisy and I sampled only 100 observations, so it may not be so easy to find the relationship which is there (this will improve if we increase the sample size, e.g. to 1000).

Now, in reality I do not know whether the null hypothesis is true or not (that is why I want to test it), and I need to collect the data and use statistics to get an idea. Both Type I and Type II errors are problematic – they mean that I rejected something I should not have (Type I error) or did not reject something I should have (Type II error). The probability of a Type I error is selected by me (here 0.05), while the probability of a Type II error depends on three factors: the sample size (a larger sample size means a lower Type II error), the selected value of the Type I error (a lower alpha means a higher beta), and the true difference of the data from the null hypothesis, i.e. the true effect size (the higher the difference, the lower the Type II error rate – it is easier to reject the null and score a true positive). This means that decreasing our choice of the Type I error rate (alpha) increases the probability of a Type II error (beta) and decreases the power of the test (1 – beta). Not an easy life; the best solution is to sample more to get a higher sample size, which will reduce the probability of a Type II error (of course, this is not always possible).
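The effect of sample size on power can be estimated by simulation for our second example (y = 0.2 * x + noise): generate data where the null hypothesis is not true, test it many times, and count how often we correctly reject the null. This is my own sketch; the helper function estimate.power is made up for this illustration.

```r
# Estimate the power of the correlation test by simulation: the proportion
# of repetitions in which a truly existing effect is detected as significant.
estimate.power <- function (n, n.rep = 1000, alpha = 0.05)
  mean (replicate (n.rep, {
    x <- rnorm (n)
    y <- 0.2 * x + rnorm (n)
    cor.test (x, y)$p.value < alpha
  }))
set.seed (1)
estimate.power (n = 100)   # roughly 0.5 - the true effect is missed about half the time
estimate.power (n = 1000)  # close to 1 - larger sample size, higher power
```

With 100 samples the power is only around 50%, which is why the not-significant outcome with set.seed (12355) above is not surprising at all.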

Both the effect size and the number of samples in the dataset also influence the P-value (it decreases with both), so in the case of a large dataset we are likely to get strongly significant results even if the true effect (e.g. R2) is pretty small. Some argue that for datasets with more than several hundred samples the P-value no longer represents a useful tool for decisions, because virtually anything can become highly significant. Also, the threshold for rejecting the null hypothesis (often set at P < 0.05) should not be taken too seriously. It does not mean that at P = 0.048 we have evidence against the null hypothesis, but at P = 0.052 we no longer do. It is better to consider the P-value a continuous measure and to use the thresholds (P < 0.05, P < 0.01, P < 0.001) only as a recommendation for which results we should pay more attention to. The other extreme, i.e. interpreting everything, would return us several decades back, to the era when the interpretation of ecological studies was based on feelings and intuition rather than on the results of statistical analyses.

Multiple testing issue

Sometimes a study includes an analysis based on testing more than one null hypothesis. Imagine an example: you want to know whether plant functional traits (e.g. leaf area, SLA) are related to some environmental factors (e.g. elevation, soil P). Let’s keep the technical details of how this can be done aside; just imagine that you have ten different traits and ten different environmental variables, and you collect 100 samples (e.g. vegetation plots) in which you do all these measurements. Then you take each of the ten traits and regress it against each of the ten environmental variables, test the significance, and interpret the significant results. What is the problem?

Let’s do the same exercise as above. First, simulate the situation when all null hypotheses are true (the variables are randomly and independently generated). The R code gets more complex, so I deposited it in the GitHub repository here. What I do: I generate two random variables from the normal distribution, each having 100 values, calculate the regression of y on x, and test the significance (by a parametric F-test in this case; note that if I calculated Pearson’s r correlation instead, the P-value would be identical). I repeat this 100 times and highlight the significant results by adding a red regression line to the scatterplot.
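The core of that simulation can be sketched in a few lines (a simplified version without the scatterplot grid; the full plotting code is in the repository):

```r
# 100 regressions of two independent random variables; count how many come out
# "significant" at alpha = 0.05, even though all null hypotheses are true.
set.seed (1)
p.all <- replicate (100, {
  x <- rnorm (100)
  y <- rnorm (100)
  anova (lm (y ~ x))$'Pr(>F)'[1]  # P-value of the parametric F-test
})
sum (p.all < 0.05)  # a few false positives are expected (on average 5 out of 100)
```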

You see that I got a few significant results, although I should not have – the null hypothesis is truly correct; I generated the data that way. These are false positives, and their probability equals the significance level alpha I set (here 0.05, i.e. 5%). You can see that if I repeat the same analysis again, I again get some significant hits, just this time in different combinations of traits and environmental variables.

In fact, I can calculate the probability that, for a given number of tests and a given alpha value, I will get at least one significant result (with P < alpha) even if the null hypothesis is true: P{at least one significant result} = 1 – (1 – alpha)^m, where m is the number of tests. In the case of 100 tests, the probability of obtaining at least one result significant at P < 0.05 is 1 – (1 – 0.05)^100 = 0.994; I can be almost sure that I will get at least one! Now, the second scenario, when all 100 relationships are generated with the null hypothesis not true – I created variables which are dependent on each other. The same as above applies – I test each of them and, if significant, add the red regression line. In this case, I coloured the background of each scatterplot light blue, to indicate that the variables are truly not independent.
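Incidentally, the formula for the probability of at least one false positive is easy to check directly in R (the function name fwer, for family-wise error rate, is my own shorthand):

```r
# Probability of at least one false positive among m independent tests,
# each evaluated at significance level alpha:
fwer <- function (alpha, m) 1 - (1 - alpha)^m
fwer (0.05, 100)  # 0.994 - with 100 tests, at least one false positive is almost certain
fwer (0.05, 10)   # 0.401 - even 10 tests give a 40% chance of a false positive
fwer (0.005, 100) # 0.394 - lowering alpha to 0.005 helps considerably
```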

The number of hits is high, but not all combinations are significant. Those which are not are false negatives, and those which are significant are true positives; the probability of a true positive equals the power of the test, in this case something around 50% (How do I know that? Repeat the analysis many times, e.g. 1000 times, and see the proportion of true positives; this gives you the power of the test, from which you can calculate the Type II error rate as beta = 1 – power). If you repeat the whole thing, again quite a high proportion of the relationships will be significant (true positives), but some will not be, even though they should (false negatives).

In reality, we don’t know for which combinations of variables the null hypothesis is true and for which it is not. Let’s make a third scenario, in which we mix both options – some of the relationships are generated from independent variables (the null hypothesis is true), some from related variables (the null hypothesis is not true). For simplicity, the 10 relationships in the first row are those with the null hypothesis not true, and the rest (90) are those with the null hypothesis true.

The figure shows the whole spectrum of situations: true positives (blue background with a red regression line), true negatives (white background, no regression line), false positives (white background, regression line) and false negatives (blue background, no regression line). The problem is that in reality we do not know which of the relationships have the blue background (the null hypothesis is not true), so when we interpret the results one by one, we may well interpret false positives (not correct!) as if they were true positives (correct!).

How to deal with this? One way is to adjust the P-values of the individual tests to correct for the multiple testing issue. The simplest is the Bonferroni correction, which multiplies each P-value by the number of tests done (Padj = P * m) and then evaluates Padj instead of the original P for significance. This helps to control the overall (or family-wise) Type I error rate, but at the same time it quite drastically reduces the power of the test: you are unlikely to get false positives, but you are also quite unlikely to get true positives. There are better methods than Bonferroni (e.g. FDR, the false discovery rate, or Holm’s correction), but the drastic reduction of the test’s power applies to all of them.
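In R, these corrections are all available through the p.adjust function; a small example with made-up P-values:

```r
# Adjusting a vector of (made-up) P-values for multiple testing; Bonferroni
# multiplies each P-value by the number of tests (capped at 1), Holm and
# FDR (method "BH") are less conservative alternatives.
p.raw <- c (0.001, 0.008, 0.039, 0.041, 0.270)
p.adjust (p.raw, method = "bonferroni")  # 0.005 0.040 0.195 0.205 1.000
p.adjust (p.raw, method = "holm")
p.adjust (p.raw, method = "BH")          # controls the false discovery rate
```

Note that after the Bonferroni correction, the two P-values around 0.04, which looked “significant” on their own, are no longer significant at the 0.05 level.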

If we increase the power of the test by increasing the sample size, this problem will be less serious, but this is not always possible. Below, I increased the number of samples in each comparison from 100 to 500.

This looks great! No false positives, and only two false negatives. However, with real data this is quite unlikely to happen, because you won’t have a test power of more than 99%, as in this case.

It seems that the multiple testing strategy is not the best way to go – either you suffer from too many false positives, or you correct and get (almost) nothing significant. If possible, opt for another type of analysis, one which conducts a single global test instead of a series of pairwise ones, or combine the two strategies. A well-known example of such an approach is one-way ANOVA with post-hoc comparison. ANOVA tests whether the mean of at least one of the groups deviates from the overall mean; if yes, one can conduct a pairwise post-hoc comparison of the groups to identify which groups differ (still, this pairwise comparison is not a simple series of t-tests, but includes an implicit correction for multiple testing). In the case of trying to find the relationship between two sets of variables (e.g. traits and environment, as in our example), you may opt for constrained ordination, which is in fact a multivariate regression (many traits regressed on many environmental variables; more precisely CWM-RDA in this case, but the details are beyond this text). If the ordination is significant, then go for pairwise regressions to see which of the variables are truly related (still, it is advisable to use some of the corrections for multiple testing).
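A minimal illustration of the global-test-first approach in R, with made-up data (three groups, of which one has a shifted mean):

```r
# One-way ANOVA as the global test, followed by Tukey's HSD post-hoc
# comparison, which has the multiple-testing correction built in.
set.seed (1)
dat <- data.frame (group = factor (rep (c ("A", "B", "C"), each = 20)),
                   value = c (rnorm (20), rnorm (20), rnorm (20, mean = 1.5)))
fit <- aov (value ~ group, data = dat)
summary (fit)    # global test: does at least one group mean differ?
TukeyHSD (fit)   # if yes, which pairs of groups differ?
```

Only if the global ANOVA test is significant does it make sense to look at the pairwise Tukey comparisons – the same logic as running the constrained ordination first and the pairwise regressions afterwards.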

You may ask: so if I measure things, relate them to each other and find a significant relationship, does it mean that these variables may in fact have no real meaning, that they could actually be just random variables? The answer is yes, it may happen that the result is just a false positive. The way to make sure that the relationship between the variables (e.g. a given trait and an environmental factor) truly exists is to replicate the study, ideally in a different location. Or search the literature to see whether someone has ever detected the same relationship in a similar context. It is unlikely that the same combination of variables would be a false positive in many studies (you can see above that false positives occur, but in a new replication they occur in different variable combinations). If you repeat the study again (and again and again) and get the same relationship significant (or you see multiple papers reporting the same thing), that is strong evidence that it may truly mean something.

Where to go next?

A nice readable text on this topic is the chapter The p value and the base rate fallacy in the book Statistics Done Wrong by Alex Reinhart (see the comics there!). Also, a few years ago there was an inspiring series of forum papers in Ecology, started by the paper of Paul A. Murtaugh entitled In defense of P values (Murtaugh 2014), followed up by several others and wrapped together again by Murtaugh. There is also a recent interesting suggestion that we should redefine statistical significance and lower the alpha threshold from 0.05 to 0.005, which would greatly reduce the probability of reporting false positives in scientific studies and increase reproducibility. As it goes, one of the first of many comments on this paper is entitled "Remove, rather than redefine, statistical significance"…

The R script for all calculations in this post is in a GitHub repository:

Five years with community weighted mean

Sometime in 2008, I made a surprising finding. I was doing a routine analysis: unconstrained ordination of community composition data (I think it was DCA) and fitting mean Ellenberg indicator values (1) onto the ordination diagram for ease of interpretation (using the function envfit from vegan). What you get is an ordination diagram, where points are samples or species, and vectors are mean Ellenberg indicator values pointing in one direction or another, showing which way nutrients or pH or light increases. To decide which of the six mean Ellenbergs to display, I tested the regression of the axes on mean Ellenbergs using permutation tests and included only those which were significant (often most of them are, mean Ellenbergs are so good!). To my surprise, when I looked at the ordination, it just did not make ecological sense. I knew my data, I knew where to expect more nutrients, higher pH and more light, but the arrows pointed in different directions, although almost all of them were significant.

I rechecked everything again and found the mistake – in Excel, where I sorted the Ellenberg values for the calculation, I had sorted the species in a different order than in the species composition matrix. I had actually calculated mean Ellenberg values from the values of different species than those occurring in the plots. But how come I still got such nicely significant results? I tried several times and found that indeed, even if I calculate mean Ellenbergs from permuted or randomly generated species Ellenberg indicator values, I still have a high chance of getting a significant relationship with the ordination axes, although the direction of the vectors just makes no sense. I did more experiments: I correlated mean Ellenbergs calculated from randomized indicator values with species richness, or compared differences between groups of samples in clusters from numerical classification. The same problem repeated: the chance of getting a significant relationship was pretty high, even though the species indicator values were just random.
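The experiment is easy to reproduce. Here is a sketch in R, using the dune dataset shipped with vegan instead of my original data (the "indicator values" here are pure random numbers):

```r
library(vegan)
data(dune)  # example community dataset from the vegan package

# invent completely random "indicator values" for the species, and calculate
# their cover-weighted mean per plot -- a fake "mean Ellenberg value"
set.seed(1)
iv  <- runif(ncol(dune), min = 1, max = 9)
mIV <- apply(dune, 1, function(cover) weighted.mean(iv, w = cover))

# fit this fake mean indicator value onto a DCA of the same data; repeat the
# whole experiment many times, and significant fits appear far more often
# than the nominal 5%
dca <- decorana(dune)
envfit(dca, data.frame(mIV), permutations = 999)
```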

I presented this at the IAVS meeting in Crete in spring 2009 and wrote the first draft of a paper for the Journal of Vegetation Science in summer 2010. It was a rather short Forum paper, but reviewers did not like the structure, so I reshaped it into a standard research article, and after three rounds of reviews (and with one of the reviewers, André Schaffers, becoming a new co-author, since he greatly helped me to make the text much more readable), it was finally published in autumn 2011 as the "too good to be true" paper (Zelený & Schaffers 2012). I was happy like a monkey; this was the first paper where I got my own idea published, and it still remains my most cited paper. In the paper, we suggested that one should be aware of the problem with the inflated Type I error rate, and either not test the relationship of mean Ellenberg values with ordination/species richness/cluster analysis results, or use what we called the "modified permutation test", which compares the observed relationship with those of mean Ellenberg values calculated from randomized species indicator values. I made a simple R application to do this test (MoPeT) and wrote a blog post about it. The story could have ended there.

But it didn't. By playing around, I found that the same problem occurs even if I relate mean Ellenbergs calculated from randomised species indicator values to measured environmental variables, like soil pH. This started to be more serious and perplexing at the same time. Why is this happening? When relating mean Ellenbergs to ordination, one can argue that the problem is that two things which are not entirely independent are being tested against each other: mean Ellenbergs are calculated from indicator values and species composition, and ordination/richness/clusters are calculated from the same species composition matrix (I call this the "similarity issue" in the 2012 JVS paper). But when relating mean Ellenbergs to environmental variables, there is no such thing; the environment is completely independent of species composition (Brad Hawkins, whom I met later and on whose paper related to community weighted mean I became a co-author, calls environmental variables "extrinsic", in contrast to "intrinsic" ones, which are derived from species composition data; Hawkins et al. 2017).

I wrote a short paper about that and submitted it for the first time in November 2013. I aimed high, because it was clear that this is a serious problem and many studies using community weighted means (not only Ellenberg values but also traits) are influenced by overly optimistic results. The first submission went to Ecology Letters (rejected in four days based on editorial reading), then to Journal of Ecology (this took the editors five days to reject – they thought the paper was too statistical) and finally to Methods in Ecology and Evolution, where it went out for review. The result was reject-resubmit, with one of the reviewers being Bill Shipley, who gave me quite interesting suggestions about the different types of null hypotheses tested by the CWM approach. I presented the idea at the European Vegetation Survey workshop in Ljubljana (beautiful city!), Slovenia, in May 2014, and then again in July at the International Statistical Ecology Conference in Montpellier, France. There I met Stéphane Dray, whose R workshop about the ade4 package I took before the conference, and later also Pedro Peres-Neto. After my presentation, the three of us had quite an emotional discussion in the canteen. Pedro and Stéphane showed me their almost finished manuscript about the same thing I was talking about at the conference – that the relationship of community weighted means with environmental variables has an inflated Type I error rate if tested by standard tests.

I closed myself in the hotel for the rest of the meeting and worked hard to prepare the resubmission of the paper rejected in MEE, with only breaks to watch the FIFA World Cup, in which Brazil lost to Germany 1:7 (I never watched football before and never after, but this one was really an awesome experience). In the end, I was not that fast; I submitted the rebuttal to MEE in October and got rejected again a month later, this time with Cajo ter Braak as one of the reviewers. While Cajo was reviewing the paper, he contacted me to clarify some details and we had a lively email exchange (some of the emails I wrote at night from a tent while sleeping at a tea plantation in central Taiwan during a survey trip to a one-hectare plot in a nearby forest). Again, I got useful hints on how to go further, but I started to feel more and more frustrated that I could not get the message through.

That was also the time I was about to move to Taiwan permanently to start my new life in Asia, and it took me almost one and a half years to submit the new version, this time to Ecology (April 2016, rejected by the editor) and again to Ecology (August 2016, rejected after reviews). But it became clear that I was late; in July 2016, the paper of Pedro, Stéphane and newly also Cajo was published in Ecography (Peres-Neto et al. 2017). First I thought let's forget it and move on, but in the end, I remade the paper into a Synthesis format, where I mostly reviewed what is known and brought in a few new things from the previous versions(2). I submitted it to the Journal of Vegetation Science in April 2017, got reject-resubmit, remade it again and submitted in June 2018, got a minor revision, and finally got acceptance on the first of October 2018. Five years from the first version, eight submissions and five different journals(3), yeah!

I was never a fast guy, at least not when it comes to writing papers. However, this one was really an extreme case and taught me a lesson. I knew the problem, and I knew that the issue is actually quite important, but I could not get it through and publish it. In fact, later, desperately, I started to post preprints of each submitted manuscript to bioRxiv (e.g. Zelený 2016), to get a feeling that there is at least something out there.

Looking backwards, I see the main reasons why it did not work. I was not able to crack the problem down, because I do not have that type of skill. I am not a statistician or mathematician, although I often dig into methodological problems, and in this case, that showed up as a handicap. Although I knew about the problem of the inflated Type I error rate, my solution (the modified permutation test) did not work properly, which at the beginning I was not able to fully understand. Also, I spent quite a lot of time pushing the analogy between weighted means and compositional autocorrelation, which, clearly, most people cannot understand, although I felt it was so clear (no, it was not, but still, even now I cannot get rid of that idea). In addition, I struggled to find a way to structure a manuscript that does not have a standard "report" structure. I studied several other papers trying to figure out how the flow of the text should go and get inspired, but from the reactions of reviewers (complaining about my "hard to follow" text) I could see I was not able to nail it. And, at a few moments, I got a bit too emotional about the whole situation, not able to see things rationally and from a distance. First, this happened when I read the paper of Otto Wildi (2016), who criticised our 2012 JVS paper. I felt quite frustrated; not because of the critique (some of it justified, some not), but because although Wildi's paper contains several obvious misunderstandings, it could still get published(4). Other emotions surfaced when the Peres-Neto et al. paper appeared online in Ecography. At first I was not even able to read it; later I did (also because I needed to refer to it), and I found that it is actually a pretty good one, pointing clearly to the whole problem and bringing an elegant solution.

Although my interest has already shifted somewhat away, I still have some plans left with weimea (as I call the weighted mean problem for myself). I hope to finish the R package (yes, weimea), which is already pretty functional (also appended to the 2018 JVS paper). And there are actually a couple of things in the previous manuscripts which never made it into the final published version, although they are important. I hope to be faster this time…

(1) Mean Ellenberg indicator values (mEIV) are a kind of European speciality; Heinz Ellenberg, the German plant ecologist, created a set of species indicator values, in which he assigned to most Central European species an ordinal value according to their ecological preference along main environmental gradients like nutrients, moisture or soil pH. When you make a vegetation plot with a list of species occurring at a specific location and calculate the mean of the indicator values for the species occurring in it, you get a quite meaningful estimate of the ecological conditions at that location, as plants interpret them. Ellenberg was not the only one who had this idea, but his system was perhaps the first that was rather comprehensive, an important criterion for a successful application.
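A tiny numerical illustration (species and indicator values invented for the example):

```r
# hypothetical plot with three species and invented Ellenberg nutrient values
iv    <- c(Urtica = 8, Festuca = 2, Trifolium = 6)
cover <- c(Urtica = 60, Festuca = 10, Trifolium = 30)  # % cover in the plot

mean(iv)                  # unweighted mean: (8 + 2 + 6) / 3 = 5.33
weighted.mean(iv, cover)  # cover-weighted mean (CWM): 6.8, pulled up by Urtica
```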

(2) The original acknowledgement, which was in the preprint on bioRxiv but which I later removed, reflected the whole story in short: "My thanks go to Bill Shipley, Cajo ter Braak and several anonymous reviewers for critical comments on previous versions of this manuscript, which motivated me to heavily rework it several times. Thanks also go to Pedro Peres-Neto and Stéphane Dray for the (emotional) discussion of the differences between my modified permutation test solution and their fourth-corner one during the ISEC 2014 conference in Montpellier." Acknowledgements are sometimes the only place in a paper recording its background story, although, of course, seen only from the author's point of view.

(3) List of the submitted manuscript titles, journals and dates:

  • Relationship between community-weighted mean trait values and environmental variables or ecosystem properties is biased due to compositional autocorrelation (Ecology Letters, submitted Nov-2013, rejected Nov-2013);
  • Relationship between community-weighted mean trait values and environmental variables or ecosystem properties is biased (Journal of Ecology, submitted Nov-2013, rejected Dec-2013);
  • Relationship between community-weighted mean trait values and environmental variables or ecosystem properties is biased (Methods in Ecology and Evolution, submitted Dec-2013, rejected Feb-2014);
  • Linking weighted-mean of species attributes with sample attributes: the problem of inflated Type I error rate (Methods in Ecology and Evolution, submitted Oct-2014, rejected Nov-2014);
  • Bias in community-weighted mean analysis relating species attributes to sample attributes: justification and remedy (Ecology, submitted Apr-2016, rejected Apr-2016);
  • Bias in community-weighted mean analysis relating species attributes to sample attributes: justification and remedy (Ecology, submitted Aug-2016, rejected Oct-2016);
  • Bias in community-weighted mean analysis of plant functional traits and species indicator values (Journal of Vegetation Science, submitted Apr-2017, reject-resubmit May-2017);
  • Which results of the standard test for community weighted mean approach are too optimistic? (Journal of Vegetation Science, submitted Jun-2018, minor revision Aug-2018, accepted Oct-2018).

(4) One example of the misunderstandings for all: in Zelený & Schaffers (2012) we did not say anything about Ellenberg indicator values themselves being biased, and still, Wildi spent considerable space arguing that they are indeed not (because they are not measured variables), and even put this argument in the title of the paper (Why mean indicator values are not biased). What we claimed was that "results of analyses based on mean Ellenberg indicator values may be biased", which is something I still stand behind, although I would perhaps not use the term "biased" in this context anymore (and would use inflated or overly optimistic instead). At first, I started to write a reply paper, but gave up when I found that I would have to spend too much energy correcting those misunderstandings, proving that the author perhaps had not read the original paper properly. But not all was negative: Otto was right with the argument that testing the relationship of mEIV with ordination scores/richness/cluster results goes against statistical logic, because by definition these two variables are not independent (both are numerically in some way derived from the same species composition matrix), so the null hypothesis of independence is easy to reject. I touched on this in the 2018 JVS paper, arguing that if we see this as a case of spurious correlation, the modified permutation test (testing a modified null hypothesis) is a valid solution.


  • Hawkins B.A., Leroy B., Rodríguez M.Á., Singer A., Vilela B., Villalobos F., Wang X. & Zelený D. (2017): Structural bias in aggregated trait variables driven by species co-occurrences: a pervasive problem in community and assemblage data. Journal of Biogeography, 44: 1199–1211.
  • Peres-Neto P.R., Dray S. & ter Braak C.J.F. (2017): Linking trait variation to the environment: critical issues with community-weighted mean correlation resolved by the fourth-corner approach. Ecography, 40: 806–816.
  • Wildi O. (2016): Why mean indicator values are not biased. Journal of Vegetation Science, 27: 40–49.
  • Zelený D. & Schaffers A.P. (2012): Too good to be true: pitfalls of using mean Ellenberg indicator values in vegetation analyses. Journal of Vegetation Science, 23: 419–431. (free read-only pdf)
  • Zelený D. (2016): Bias in community-weighted mean analysis relating species attributes to sample attributes: justification and remedy. bioRxiv. (this is the first version, later updated several times)
  • Zelený D. (2018): Which results of the standard test for community weighted mean approach are too optimistic? Journal of Vegetation Science, 29: 953–966. (free read-only pdf)

Computer addictions, “No Computer Day” and idle brain

Years ago my computer addictions included playing cards (I think it was called Solitaire), browsing websites (news during the day, porn at night), later also Facebook and some other social apps. Now I find myself addicted to R. However stupid it may sound, it seems to be true. By addiction I mean: it is not easy to stop when you need to do something really important. All those things before (cards, news, porn, Facebook) are understandable, and I bet I am not the only one who fights with them. But R? Come on. I thought that I had the perfect therapy last year after a semester of teaching R three hours per week (and countless hours preparing the class and checking assignments). Right, I was sick of seeing R for a while, but not for long. It was around that time I also found Shiny, an interactive way of playing with R, and coded my first Shiny app. It's nice – you click, something changes, click again, changes again, keep clicking, improving, adding new ideas, making new apps… and the addiction is cooked, no escape.

Some time ago I came up with the idea of introducing a "No Computer Day": a day when you should not touch the computer at all. Obviously, it cannot be a working day (unless I go to the field), so the weekend is a hot candidate. It would be cheating to set the No Computer Day on a day you go out of the house – that way you would not touch it anyway. No, just pick an ordinary Sunday, when you plan to relax but would ordinarily still end up sitting in front of the computer (I may do something useful today) and keep clicking (doing nothing useful at all). It's similar to "a week without alcohol" (all bottles in the house need to be locked away, since the late-evening feeling of "just one beer" is surprisingly strong). Just leave your computer switched off (not just sleeping – you may touch the mouse, and it will wake up, its blinking screen luring your attention). You will see how difficult it is to resist (ok, I will just check emails, what if something important is there!), but how rewarding it can get (clean the house, read a book, fix what should have been fixed a long time ago, go for a walk, sleep – add your own "no-computer" items).

Sometimes I wonder: what did the life of a scientist (let's say a university teacher also doing research, or simply a full-time scientist in academia) look like before computers were standing on her/his desk? Come to the office, make a coffee, check the mail, do paperwork (ok, this sounds almost the same as now). And then? Take a pencil, read a paper and write notes, take a typewriter, write a short piece of manuscript, put it away, take a pencil again, make a phone call, take a typewriter, go for lunch, meet a colleague for coffee, pencil, paper, typewriter, … Anyway, the most important currency of scientists is ideas, and those are not born in the computer, but in the head, often while doing things entirely unrelated to work or when the brain is simply idle. What if sitting the whole day in front of a blinking computer makes the time for these idle moments rare, and thus productivity goes down? Ok, by productivity I don't mean the number of published papers (although I should, since that's precisely how my productivity is measured by school/grant agency/colleagues). By productivity I mean coming up with some interesting idea. What if switching off the computer (or even temporarily moving it somewhere out of sight) in fact increases this kind of productivity?

Ok, but I still need to boil these ideas down to a paper…

Unconstrained ordinations in vegan: an online Shiny app

The R package "vegan", written and maintained by Jari Oksanen, Gavin Simpson and others, contains a whole set of functions needed for unconstrained ordination (and many other things) in R. I also experimented with other packages ("labdsv" by Dave Roberts or "ade4" by Stéphane Dray et al.), but "vegan" is unbeatable when it comes to ordinations. One slightly annoying thing in "vegan" is the naming convention of the various ordination functions. It's relatively easy to get used to the fact that PCA (principal component analysis) and CA (correspondence analysis) are calculated by "rda" and "cca", respectively, which, if one also supplies environmental variables, calculate the constrained versions (RDA aka redundancy analysis and CCA aka canonical correspondence analysis). DCA (detrended correspondence analysis) is calculated by "decorana", which also makes sense if you know the history of the method and the fact that the first program written by Mark Hill in Fortran to calculate DCA was called exactly that (and in fact the "decorana" function in R is based on the original Fortran code ported into R by Jari). It is somewhat harder to remember that NMDS (non-metric multidimensional scaling) is calculated by a function called "metaMDS", and the definite killer for me is PCoA (principal coordinate analysis), calculated by "cmdscale". When you are teaching these to students, it may get a bit frustrating, so some handouts come in handy. That's partly why I got this idea: an interactive online tool which would allow using a few example community ecology datasets, analysing them by a set of standard ordination methods, drawing the resulting ordination diagram, printing the summary output, and mainly – showing also the R code which was used to do all that. The result is here, welcome to click through!
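As a small cheat sheet, the naming convention looks like this in practice (using the dune dataset shipped with vegan):

```r
library(vegan)
data(dune)

pca  <- rda(dune)       # PCA; given environmental variables it becomes RDA
ca   <- cca(dune)       # CA;  with environmental variables it becomes CCA
dca  <- decorana(dune)  # DCA, named after Mark Hill's original Fortran program
nmds <- metaMDS(dune)   # NMDS
pcoa <- cmdscale(vegdist(dune), eig = TRUE)  # PCoA, hiding in base R's stats
```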

Previously I used Shiny apps to play with the visualisation of random drift, and I got quite hooked on the concept of a dynamic website with R running behind it. In the case of this project, it got more complex, and I spent a whole day learning various Shiny hacks (e.g. how to dynamically modify the clickable menu in the left panel). Although the result is far from perfect (and will perhaps stay like that for a while), it's functional. You may choose one of the datasets (a short description and a link to the website with details will appear), then choose one of six ordination methods, optionally transform the community data, and decide what to display in the ordination diagram. The figure above shows a PCA of a simulated community dataset structured by one long gradient, and the lovely horse-shoe artefact which PCA chronically produces on datasets with many zeros.

Don't get me wrong – this is not an attempt to provide an online, clickable alternative interface to "vegan"; I have no intention of doing something like that. But I think that Shiny apps are a great teaching aid and visualisation tool. If you have a new R package and want to show how it works, a small Shiny example may be much better than a long online tutorial (at least as an appetiser aimed at hooking users into actually trying the package).

Welcome to play, and thanks for any feedback!

Piping tibbles in tidyverse

A couple of weeks ago, while preparing for the R for ecologists class, I found tidyverse, a suite of packages (mostly written by Hadley Wickham) for various ways of tweaking data in R (thanks to Viktoria Wagner for wonderful online materials on managing vegetation data!). Sometime in 2009 I attended the useR! conference in Rennes, France, where Hadley held a workshop on using one of these packages (I think at that time it was still called plyr). I attended it, but to be honest, I remember nothing from that workshop; it was way too abstract for me. Since then, several times when preparing data for analysis in R, I thought that I should check it again and finally learn it. And here we are: preparing the R class finally pushed me to do it. I also discovered the piping of data through functions, with an entirely different logic of sending data from function to function*, and a new form of data frames called tibbles (no idea what the word means). All the new fancy things; R moves forward quite wildly and erratically. But this post is not about piping, tibbles, and tidyverse. It is a quick thought about me using R, how I have changed, and whether I should do something about that or not.
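For readers who have not met them yet, a small taste of what tibbles and pipes look like (a made-up toy dataset):

```r
library(tibble)
library(dplyr)

# a tibble is a data frame with stricter behaviour and tidier printing
veg <- tibble(plot    = rep(1:3, each = 2),
              species = c("a", "b", "a", "c", "b", "c"),
              cover   = c(60, 10, 30, 40, 20, 80))

# the pipe sends the data through a chain of verbs, read top to bottom
veg %>%
  group_by(plot) %>%
  summarise(richness = n(), total.cover = sum(cover))
```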

I used R for the first time sometime around 2005, when I started to work in Brno, at my previous university, and a colleague asked me to calculate and draw species curves which were hard to do anywhere else. I recalled S-PLUS, a program I had used as a master student when I took the class on Modern regression methods while studying in České Budějovice. Petr Šmilauer at that time taught it in S-PLUS, a commercial system which turned out to be R's ancestor. The knowledge of S-PLUS came in handy, and I was able to do quite pretty figures in R. At that time there were just a few R packages, and no RStudio (I think I was using the Tinn-R editor; it actually still exists). R was not a sweet candy at the beginning; I remember quite some time spent in frustration over the ever-recurring annoying error messages, but eventually I kind of started to like it. Later I found that the skill of using R is a powerful advantage – I was able to calculate or draw almost anything, things that would otherwise need to be done in a cumbersome way in some clickable software. At that time there were just a couple of other people around fluent in R. I started to teach R, which was a great way to push myself to improve, and to spread the knowledge of R among others. How different it is from today, when R is considered a lingua franca of ecologists, scientists publish papers with R code appended, and a common requirement for a newly hired research assistant, postdoc or even faculty member is the skill of using R.

But I also changed. There was a time, a couple of years ago, when I was eager to learn all the different new developments in R, study fancy packages, try new analytical or visualisation methods. At one time, together with Víťa, another R guy from Brno, we even taught seminaR, a class focused purely on trying new fancy things in R. But that's gone. When it comes to R, I am starting to be somewhat conservative; I stick to what I know and feel quite reluctant to discover new things. Partly perhaps I got lazy, but partly there are reasons. Some of those new packages, methods and "new trends" in using R turned out, in the end, to be just ephemeral matter; packages were not maintained, their developers deserted them and turned their interest to something else. Also, the use of R is now so broad that it is hard to keep track of all the new and exciting things. And my time is also not what it used to be; I can't spend a week or more trying to tweak R code this way and that, thinking about it day and night, excited and barely sleeping. It is not that I don't use R anymore – actually I use it on a daily basis; at any time I can be sure that one of my computers has RStudio on, with some half-written script or some long code running. But I use it as a tool to solve a problem, and I focus on the problem itself, instead of keeping polishing my skills with the tool. Learning R was, without doubt, one of the best investments I made in my professional life, and now I hope to move further while continuing to use it to do something else.

Since I now teach R for ecologists every winter semester, I keep myself somewhat fit in using R, as I have to sort things out well enough to explain them to students. The class is actually focused on pretty basic R stuff. For half of it we mostly draw figures, because this does not require the added theoretical background which would be needed if I used R to teach, e.g., statistics. In the second half of the class we focus on "business as usual, with R": loops, functions, importing, exporting and manipulating data, and some simple calculations, mostly to show students the benefit R may bring them if they use it. But that is actually the point; I do not focus on bringing new fancy things into the class, but instead teach the R oldies-but-goldies in a way as smooth and digestible as possible.

But piping tibbles in tidyverse made me think that it may be time to sit down for a while and recheck how R has changed in the past couple of years when I did not follow it. Here comes my point. Do you have some suggestions on what to look for? What in R have you found so useful recently that you cannot breathe without it and would suggest I learn it? No specialised packages; something for "everyday business". For example, I still wonder whether I should learn ggplot2 or be happy with my base R graphics combined with lattice – do you have an opinion about that? Any comments welcome!

* If you are familiar with R, then piping (library magrittr) can replace (for example) sum(log(sqrt(abs(1:10)))) by 1:10 %>% abs %>% sqrt %>% log %>% sum. Quite a revolutionary way of assembling a script. If not familiar with R, just don't worry about it 🙂

Visualizing random drift in ecological communities. Part 2.

In the first part of this post, I focused on a simple illustration of random drift without acknowledging any other process. In the case of trees growing on a small island with only 16 individuals, it does not matter which individual died and which reproduced to replace it, because all offspring could disperse anywhere, and the whole process looked more like swapping the colours of balls in a bag. Here, I explicitly include dispersal in two forms: as movement of offspring limited to some short distance, and as immigration of species from outside. Then, in a second step, I also add the process of selection, which means that I include environmental heterogeneity in the model (elevation) and let species differ from each other in their ecological preferences. While in the first part I was interested in how changes in the number of species, number of individuals and number of "generations" influence the outcome of random drift, here I will ask how important ecological drift is when interacting with immigration and selection. For all simulations I used an R package developed by Tyler Smith and published in Smith & Lundholm (2010); for more technical details, see the last chapter below.

Artificial landscape: it matters if the individual has a neighbour and how far its offspring can disperse

For all further simulations in this post, I will use an artificial landscape of a certain size (e.g. 50×50 or 100×100 cells, each cell accommodating at most one individual) and a fixed number of species (10). At the beginning, each cell in the landscape is occupied by one individual of a randomly chosen species, which means that each species is represented by a similar number of individuals. In each cycle, a fixed proportion of individuals will die, and a fixed proportion of individuals will produce one offspring. The dispersal of the offspring is spatially limited in that it can get only to one of the neighbouring cells, rarely a bit further. Starting from the second generation, there can be vacant places in the landscape left by individuals which died; they remain vacant until the offspring of some neighbouring individual occupies them. See the figure below showing this dynamic on the example of a landscape with only four species. In the scenario without immigration, this is all; if we add immigration, then in every generation, individuals of randomly selected species are added to the marginal cells of the landscape. Note that all individuals of all species are demographically equal (i.e. all have the same probability of dying or reproducing) and all cells within the landscape have the same "environmental conditions". In this sense, the model shows a "neutral" community, where only neutral processes (drift and dispersal) operate.
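Such a model is simple enough to sketch in a few lines of R. The code below is not the implementation of Smith & Lundholm (2010) used for the figures, just an illustrative toy version of the rules described above (no immigration, no selection): individuals die at random, and empty cells are recolonised from the surrounding neighbourhood:

```r
set.seed(1)
n <- 50; n.sp <- 10; mortality <- 0.2; gens <- 200
L <- matrix(sample(n.sp, n * n, replace = TRUE), n, n)  # initial landscape

for (g in seq_len(gens)) {
  L[sample(length(L), mortality * length(L))] <- NA     # random deaths
  for (i in which(is.na(L))) {                          # refill empty cells
    ro <- (i - 1) %% n + 1
    co <- (i - 1) %/% n + 1
    nb <- L[max(1, ro - 1):min(n, ro + 1), max(1, co - 1):min(n, co + 1)]
    nb <- nb[!is.na(nb)]                                # living neighbours
    if (length(nb) > 0) L[i] <- nb[sample(length(nb), 1)]
  }
}
sort(table(L), decreasing = TRUE)  # abundances have become highly uneven
```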

Example of neutral dynamics in a landscape with only 4 species. Each circle is an individual, and each colour is a species. At the beginning (1), all cells are filled by individuals of randomly chosen species. The next panel shows the result after the first generation: some individuals died, and others replaced them, but only from the neighbouring cells; white spots are empty. Panels (3) to (9) show the results of the next generations.

An interesting property of this model is that the abundances of species, which were almost equal at the beginning, become highly uneven. I added a diagram with the species abundance distribution (see the figure below, right panel), in which species are ordered according to their number of individuals (from high to low), and each species is represented by a bar of the relevant colour. The order of species in this distribution changes dynamically, and in the course of time some species become rare and eventually go extinct. In the end, a single species dominates the whole landscape.

Artificial landscape with 50×50 cells, each occupied by at most one individual, and 10 species. The left panel shows the distribution of species in the landscape; the right barplot shows the species abundance distribution (bars are ordered decreasingly by the number of individuals and scaled relative to the most dominant species). This is the initial situation: a landscape with all cells occupied by one individual of a randomly selected species.

Artificial landscape after the first generation. Some cells became empty (white), and species abundances start to diverge from each other.

After 100 generations, species abundances have diverged even more. No clear spatial pattern yet, and no species has gone extinct.

After 200 generations. A clear spatial pattern arises, one species becomes dominant, and one species has gone extinct.

After 500 generations. Only three species are left, of which one (red) is clearly heading for extinction soon. Abundances are highly uneven.

To see the whole sequence from beginning to end, watch the video below.

In the video you can see that after the red species goes extinct (time stamp 1:06), the two remaining species keep silently fighting for a long time; in fact, I did not have the patience to run the simulation long enough, but eventually one of the two species will win for sure.

The simulation above is quite similar to the one done in Part 1, except that the landscape is “spatially explicit”, meaning that it matters where in the landscape an individual occurs and which individuals are its neighbours.

Including immigration: constant supply of individuals from outside

Now let’s add immigration. The following simulation has the same parameters as the one before (a landscape of 50×50 cells with ten species), except that after each cycle, individuals of randomly selected species are added to the marginal cells (that’s why the landscape keeps colourful margins all the time). The initial situation looks very similar to the one above, so below I show only the situation after 200 and 500 generations.
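The immigration step itself is simple: pick a few marginal cells and drop immigrants of randomly chosen species into them. A minimal Python sketch (again my own illustration, not the neutral.vp implementation; the number of immigrants per generation is an arbitrary assumption):

```python
import random

def immigration_step(grid, size, n_species, n_immigrants=20):
    """Each generation, individuals of randomly chosen species arrive
    into randomly chosen marginal (edge) cells of the landscape."""
    edge = ([(0, c) for c in range(size)] +
            [(size - 1, c) for c in range(size)] +
            [(r, 0) for r in range(1, size - 1)] +
            [(r, size - 1) for r in range(1, size - 1)])
    for r, c in random.sample(edge, n_immigrants):
        grid[r][c] = random.randint(1, n_species)

random.seed(2)
size = 50
grid = [[0] * size for _ in range(size)]  # empty landscape, for illustration
immigration_step(grid, size, n_species=10)
```

Because immigrants land only on the margin, any species rescued from extinction has to spread back into the interior by the short-distance dispersal of its offspring.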

Artificial landscape with immigration, after 200 generations. The species abundance distribution is highly uneven (the light blue species currently dominates), but no species has gone extinct.

Artificial landscape with immigration after 500 generations. Not much is happening; the species happily coexist.

Again, see the video below for the whole time sequence. Note that it is actually pretty boring; drift makes some of the species rather rare, but due to constant immigration, no species goes extinct (and if one does, it will most likely be added back into the marginal cells in the next generation). If the rate of immigration is high enough, as in this simulation, the species happily coexist.

Some conclusions from what we have learned so far. In an isolated landscape without immigration (like an island far out in the ocean, or a patch of natural habitat isolated from other patches by human activities), random drift sooner or later results in the extinction of all but one species; how long it takes depends on the number of individuals (the size of the island) and the initial number of species. Immigration can reverse this outcome by returning species to the game.

The end of neutrality: adding environmental heterogeneity and selection

So far, all cells within the artificial landscape were equal, and all species had an equal chance to survive in any of them. The next step is to include environmental heterogeneity and selection between species. Environmental heterogeneity means that different cells have different environmental conditions (or habitat properties).

In our model there is only one environmental gradient; let’s call it elevation. The spatial distribution of elevation is not random – it forms four “humps”, which we will call hills. So far, species were also equal in that they did not have any environmental preferences (even if they had, it would not matter, since the environment was perfectly “flat”). Now, each species gets its ecological optimum in one elevation zone, but it can (with lower success) grow also in the adjacent elevation zones. For example, species 5 has its optimum in elevation zone 5, but its offspring can also survive in zones 6 or 4 (though not much higher or lower). This is selection by the environment: species 5 has an advantage within the cells of elevation zone 5, while species 4 has an advantage within the cells of elevation zone 4; if species 5 gets to elevation zone 4, it may still survive there, but with a lower probability than species 4. Our model has four hills, ten elevation zones and ten species, each specialised in one elevation zone. The proportion of cells is roughly equal for each elevation zone. All other parameters are identical to the previous simulation, except that the landscape is 100×100 cells large (to squeeze in all four hills).
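The selection rule for species 4 and 5 described above can be expressed as a simple establishment-probability function. The following Python sketch is purely illustrative – the function name and the probabilities 1.0 and 0.5 are my own assumptions, not the values used by neutral.vp:

```python
def establishment_prob(species_optimum, cell_elevation, tolerance=1):
    """Probability that an offspring establishes in a cell, given the
    distance between its species' optimum and the cell's elevation zone."""
    distance = abs(species_optimum - cell_elevation)
    if distance == 0:
        return 1.0   # the species' optimal zone: full success
    if distance <= tolerance:
        return 0.5   # adjacent zone: survives, but with lower success
    return 0.0       # too far from the optimum: offspring cannot establish

# species 5 can invade zone 4, but the resident species 4 has the edge:
p_resident = establishment_prob(4, 4)  # 1.0
p_intruder = establishment_prob(5, 4)  # 0.5
```

This asymmetry is all that “environmental filtering” needs: within each zone, the locally adapted species wins on average, but the intruder is not excluded outright, leaving room for drift.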

To visualise this, each snapshot has three different panels (see below). I removed the species abundance distribution and included a 3D visualisation of our four-hill landscape instead. I also added stacked barplots showing the relative abundances of individual species within each elevation zone of each mountain. As it will turn out, this visualisation will become quite important for understanding what is happening.

Artificial landscape with four hills representing elevation (min 1, max 10). This is the initial situation, with individuals randomly dispersed across the landscape. The left panel shows the aerial view of this landscape, with contours for elevation and individual peaks labelled according to geographical position (NW, NE, SW, SE for north-west, north-east, south-west and south-east, respectively). The upper right panel shows the same in 3D (note that the hills have been “smoothed”, otherwise they would have “stairs” on the slopes). The lower right panel shows four stacked barplots, each showing the relative abundances of each species within each elevation zone. In this initial step, all species are everywhere, and the proportions of species in each elevation zone of each hill are similar. “Species elevation optima” shows which species (colour) has its optimum in which elevation zone (number).

Artificial landscape with topography after 100 generations. A spatial pattern starts to emerge (left and upper right panels). All species are still present in most elevation zones (barplots in the lower right panel), but differentiation according to species’ ecological preferences begins (e.g. the dark violet species starts to prevail in lower elevations, while the light and dark blue species prevail in higher elevations).

After 500 generations. The spatial pattern is rather clear, and so is species differentiation along elevation. Note one interesting thing: different peaks start to be dominated by different species (dark blue vs light blue). Peaks are like small islands in a sea of lowland, and ecological drift works faster here than in other parts of the landscape.

After 3000 generations. The dark blue species dominates the NE and SE peaks, while the light blue one dominates the NW and SW peaks (with light green in the SW).

Video with the whole sequence is below.

Perhaps the most interesting result is that different peaks become dominated by different species as a result of random drift (the dark blue species dominates the eastern peaks, the light blue the western ones). The lowland is also partly differentiated, but since the lowlands around the individual hills are all interconnected, this differentiation is arbitrary. In this way, random drift increased the beta diversity along elevation: if we took a sample on each peak, the differences in species composition (beta diversity) among peaks would be high, while among lowland samples they would not be so high.

With immigration, the pattern is similar, just more dynamic. Lowland cells contain most of the species, but this is mostly the result of immigration into the marginal cells of the landscape (which, due to the construction of the model, all lie in the lowland).

Adding selection to the mix of drift and dispersal has an interesting outcome. Stochastic ecological drift acts against the deterministic effect of selection via “environmental filtering”, and as a result, ecologically identical habitats (elevation zones) can have similar or somewhat different species composition, depending on their isolation. Hill peaks in our model are the most isolated, and as a result of drift and dispersal limitation, different peaks became dominated by different species. Immigration can reverse the effect of drift, but only if the habitats are not isolated (the influence of immigration is high in the interconnected lowland, but not in the more isolated higher elevations).

This leads me to the question: how important is ecological drift in real communities? How much of the pattern we see in nature is actually the result of predictable species ecological preferences and biotic interactions, and how much results from the unpredictable effects of drift and dispersal? This “niche vs neutral” riddle has been around for quite a few years, but in my impression, methodological problems and the lack of appropriate data about ecological communities make it difficult to answer. But that’s a topic for another post sometime in the future.

Technical details: how I made it

The simulations were done in R with the package neutral.vp, written by Tyler Smith and published in Smith & Lundholm’s 2010 Ecography paper. The original library published as an appendix to the paper no longer works, but I fixed it so that it compiles in newer versions of R (details here). I played with this library a couple of years ago, when I was deeply in love with variance partitioning of beta diversity into environmental and spatial components (the love is gone, luckily). There are other (even more advanced) models simulating spatially explicit communities (one overview is here), but for the purpose of this blog, neutral.vp is perfect and rather fast (at least for reasonably small landscapes). Visualisations were done in R (the square landscape using the functions image and contour, the 3D models of the landscape with four hills using the function persp). Figures were assembled into a video in Movie Maker (Windows). The R scripts are a bit messy, so I don’t provide them here, but if you wish, I will be happy to share them.


  • Smith, T.W. & Lundholm, J.T. 2010. Variation partitioning as a tool to distinguish between niche and neutral processes. Ecography 33: 648–655.

Visualizing random drift in ecological communities. Part 1.

Ecological drift is one of the four main processes maintaining species coexistence in ecological communities, along with selection, dispersal and speciation (Vellend 2010, 2016). Of these four processes, the effect of ecological drift on the diversity and species composition of ecological communities seems to me the least intuitive. Here I offer a few interactive illustrations and videos showing how ecological drift works and what it does.

I separated this post into two parts, forming a logical sequence. Here, in Part 1, I focus purely on random drift, without any other process in action. The destiny of all simulated communities is the same: eventually, only one species will prevail, and all others will go extinct; it is just a matter of time. In Part 2, I will add dispersal and create a virtual landscape in which it matters who your neighbour is and how far your offspring can disperse. Then I will also add selection, making species ecologically unequal and letting them search for their ecological optima. The combination of drift with dispersal and selection will most closely resemble community assembly in real ecological communities, although it is still very far from the situation in any real community. And hopefully, it will convince you that random drift is a process which can be more important than you would think.

But let’s start slowly.

Island with 16 trees of two species: how the ecological drift works

Imagine a small, tiny island – so tiny that only 16 trees can squeeze in, not a single one more. The island is surrounded by sea, far from any mainland, perfectly isolated. In our example, we start with eight individuals of each of two species (red and black, see the figure below). A single process is happening: at any given moment, one individual dies and empties its place, and another individual reproduces, and its offspring fills the vacant place. The choice of which individual dies and which reproduces is random; it does not matter to which species an individual belongs or where on the island it occurs. See the illustration below.

Random drift on a remote island with 16 trees of two species. The yellow circle indicates the individual selected to reproduce, whose offspring fills the vacant place.

What happens after several cycles of die-reproduce-replace? Watch below.

The population is small, so even tiny fluctuations of a few individuals eventually lead to the extinction of one species (in this case the red one). If both species have an equal proportion of individuals at the beginning, the probability of extinction is equal for both, since there is no difference between species or between individuals (the species are ecologically equal, meaning that they have no fitness differences – they have the same probability of reproducing and surviving to the next generation).
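The die–reproduce–replace cycle described above can be sketched in a few lines. Below is a minimal Python illustration (my own sketch; the author’s original simulations used a custom R function, random.drift):

```python
import random

def drift(community, generations):
    """Random drift: in each step, one randomly chosen individual dies
    and is replaced by the offspring of another randomly chosen
    individual. Returns the community state after each step."""
    community = list(community)
    history = [community.copy()]
    for _ in range(generations):
        dead = random.randrange(len(community))
        parent = random.randrange(len(community))
        community[dead] = community[parent]  # offspring fills the vacancy
        history.append(community.copy())
    return history

random.seed(42)
island = ["red"] * 8 + ["black"] * 8  # 16 trees, two species
history = drift(island, 300)
survivors = set(history[-1])  # often only a single species remains
```

Note that nothing in the loop favours either species: which one eventually takes over the island is decided purely by the accumulated sampling accidents.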

The same thing can be drawn with the x-axis as time and the y-axis as individuals, with different species displayed in different colours. Because the position of individuals in the space of the island does not play any role in this example, we can stack the individuals of the same species (the same colour) together to see the overall trends of species abundances in time:

Trend of abundances of two species in time, influenced by random drift. In this case, fewer than 100 generations were needed for one species to go extinct from the island.

Now, let’s rerun the same scenario several times (nine times in this case) to see what happens:

Random drift on island with 16 individuals and two species, running for 300 generations.

Note that I increased the number of generations from 100 to 300 in this multiple-run example, to show that for most of the replicates this is enough time for one species to disappear from the island. The result: in eight cases, one species went extinct (in four cases it was the red one, in four the black), and in one case both species still coexist on the island. Note that the first panel (top left) is the same as the single figure above (with 300 instead of 100 generations).

We can play further with this visualisation. How about a bigger island, say with 100 individuals?

Random drift on island with 100 individuals and two species, running for 300 generations.

Not enough generations for either of the two species to go extinct. But if we considerably increase the time (to 5000 generations), things happen:

Random drift on island with 100 individuals and two species, running for 5000 generations.

The size of the population decides how fast random drift will cause one of the species to go extinct; more individuals need more time. But either way, it is just a matter of time before a species “drifts away” from the community.
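This scaling of time to extinction with community size is easy to check numerically. The following Python sketch (my own illustration, not the author’s R code) measures the average number of replacement events until one of two equally common species is lost, for communities of 16 vs 100 individuals:

```python
import random

def time_to_fixation(n, max_steps=200_000):
    """Run two-species drift (one random death + one random birth per
    step) until only one species remains; return the number of steps."""
    community = [0] * (n // 2) + [1] * (n - n // 2)
    for step in range(1, max_steps + 1):
        community[random.randrange(n)] = community[random.randrange(n)]
        if len(set(community)) == 1:
            return step
    return max_steps  # did not fix within the step limit

random.seed(1)
small = sum(time_to_fixation(16) for _ in range(20)) / 20
large = sum(time_to_fixation(100) for _ in range(20)) / 20
# a larger community takes, on average, much longer to lose a species
```

Averaged over a handful of replicates, the 100-individual community takes far longer to reach fixation than the 16-individual one, matching the intuition in the text.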

Now, how about more species? If a community of 100 individuals is cut into ten species, the population of each species becomes rather small and prone to extinction.

Random drift on island with 100 individuals and 10 species, running for 300 generations.

Even though the time is short (300 generations), in three out of nine replicates one species went extinct from the community (check the text in the title of each plot: “start” indicates how many species were in the community at the beginning of the simulation, “end” counts how many were left at the end). If we take the same communities and increase the time to 5000 generations, again, things happen:

Random drift on island with 100 individuals and 10 species, running for 5000 generations. The first 300 generations in each panel are identical to the previous figure.

After 5000 generations, between one and three species are left of what was a community of 10 species at the beginning. A little hard to imagine that this figure shows the same communities as the previous one, just “much later”, right? See the animation below (which focuses on the first panel (A) of the nine):

From the rainbow flag of the initial situation (after 100 generations, all species still have a rather similar number of individuals), after 5000 generations all but one species eventually go extinct.

Forest fragment with ~5000 trees and 74 species: getting community parameters closer to something real

Ok, so let’s start to get more real. First, populations of organisms are usually much larger than 100 individuals. Also, different species have different relative abundances; some are common and some are rare (this is called the species abundance distribution, which is usually highly uneven). Obviously, more common species seem to have a lower probability of extinction compared to rare species. To get more realistic parameters for our island community, I borrowed the data from a real forest in Taiwan, the 25-ha forest dynamics permanent plot in Lienhuachih (LHC). As in other forest dynamics plots established following the standards of the Smithsonian Institution, all tree individuals with DBH (diameter at breast height) larger than 1 cm are measured, determined to species, their exact location recorded, and each is permanently marked. Altogether, during the 2005 census of the LHC plot, researchers found 153,255 woody individuals of 144 species. To make things simple here, I will consider only individuals with DBH > 20 cm; this gives 5449 stems of 74 species. Their absolute abundances are rather variable: the two most abundant species, Engelhardia roxburghiana and Schefflera octophylla, have 599 and 532 individuals, respectively, while seven species have only one individual each. The two figures below illustrate the flavour of these data:

Distribution of individual trees (DBH > 20 cm) in the 25-ha Lienhuachih forest dynamics plot, Taiwan. The size of each circle is proportional to the DBH of the individual tree, and different colours mean different species.

Lienhuachih forest dynamics plot. Photo credit: David Zelený.

Let’s put these data into the random drift visualisation, directly with 5000 generations:

Random drift in forest with 5449 trees, 74 species and realistic species abundance distribution, running for 5000 generations.

Seven species went extinct (mostly the rarest ones), but otherwise not much is happening in such a short time for such a big community. Note the new visualisation tool here: the left panel is the same type of figure you have already seen above (x-axis is time, y-axis is the absolute number of individuals of each species), while the right panel shows the species abundance distribution at the beginning (grey bars) and at the end of the 5000 generations (colourful transparent bars). In the barplot, species are sorted from the most to the least abundant at the start of the simulation (i.e. the tallest bar is Engelhardia roxburghiana with 599 individuals, the second tallest is Schefflera octophylla with 532 individuals).

The figures below show the same community, progressing further (50,000, 500,000 and 5,000,000 generations):

Random drift in forest with 5449 trees, 74 species and realistic species abundance distribution, running for ~50,000 generations.

Random drift in forest with 5449 trees, 74 species and realistic species abundance distribution, running for ~500,000 generations.

Random drift in forest with 5449 trees, 74 species and realistic species abundance distribution, running for ~5,000,000 generations.

The result is as expected: rare species went extinct rather quickly, and a few dominant species survived. The expected probability that a species will go extinct is negatively related to its initial abundance, being highest for rare species. But that is the “expected probability”; what we see in this simulation is one of many possible ways random drift can change species composition in time. In the figure with 500,000 generations, you can see that the dark blue species, which was initially rather rare, became quite abundant (almost 200 individuals); eventually, however, it still went extinct. The video below shows the whole sequence between 5,000 and 5,000,000 generations in steps of 10,000 generations:

Play with random drift: interactive application

Time to try things yourself. Below is an interactive application in which you can play with random drift. By default, it creates a community of 10 species, 100 individuals and 1000 generations, and displays the abundances of individual species in the colours of the rainbow. Feel free to modify the parameters; for example, set two species, 16 individuals and 100 generations to get the same scenario as displayed in the first figure of this post. Whenever you modify some parameter, the whole simulation is recalculated with newly generated random values. If you want to stick to one version, fix the seed of the pseudorandom generator (set seed to YES); this will return the same generated community. You can calculate more than one (four or nine) scenario in parallel.

Feel free to play! (click here to open the app in a new window)

Technical details: how I made it

Everything on this website was done in R, using a relatively simple random.drift function. I originally created this function as an exercise with students in my R for Ecologists class; it was a simple function with two species, one of which would eventually drift away, and its purpose was to let students practise writing a loop and a function. Eventually, I found it interesting to add an option for more species and more arguments to improve flexibility. See the definition of the function on my AnaDat-R website. The videos are not fully R-made – I used R to generate the sequence of *.png images but merged them into a video using Windows Movie Maker. I hope that in the future this step can be replaced by some function from the “magick” package, a recent fresh port of ImageMagick into R (though it seems not to have all the functionality ImageMagick has). And the GIF animations were done with the “animation” package (function saveGIF); there are some shortcomings in this package – for example, while I can modify the width and height of the figure, I cannot modify the resolution and pointsize arguments – or maybe I just need to study it more.

In addition to the R code, I also made a Shiny application for the random.drift function (see the previous section). The background calculation runs in R on Shiny servers, and if you choose larger values, it may take a while to display the outcome. I got quite fascinated by the possibility of creating such interactive apps; it is actually quite simple. More will surely come in the future.


  • Vellend, M. 2010. Conceptual synthesis in community ecology. The Quarterly Review of Biology 85: 183–206.
  • Vellend, M. 2016. The Theory of Ecological Communities. Princeton University Press.

Simpson’s similarity index vs Simpson’s diversity index

Since I started to teach multivariate methods in community ecology and, in addition to chapters about ordination, included a chapter about diversity, I have found it confusing that there are two “Simpson’s indices”, one for similarity and one for diversity. Each index has a completely different meaning and use, and I may well be the only one confused by them, but for a while I had the impression that the two indices were somehow related and maybe could be derived from each other. I did some searching to clarify it for myself, and it turned out that although both indices were published at roughly the same time (in the 1940s), each is attributed to a different Mr Simpson and appeared in a quite different context.

A bit of theory first. Simpson’s similarity index is used to calculate the similarity between a pair of community samples, to quantify whether their species composition is similar (they share most or all species) or different. It comes in the company of other, similar indices, like Jaccard and Sørensen, which do the same job with slightly different logic. If we have two samples, there will be some number of species shared between both, and some species occurring only in the first or only in the second sample. Jaccard or Sørensen divides the number of species both samples share by the number of all species occurring in the two samples together (Sørensen cares about shared species more than Jaccard, so it counts them twice, in both the numerator and the denominator). A problem occurs if one of the samples has much higher species richness than the other. The number of shared species (in the numerator) may be the same, but the high richness of one of the samples will increase the denominator and decrease the overall index value, making the index lose its resolution power. Here comes Simpson’s similarity index, which has the same numerator (the number of shared species), but in the denominator takes only the number of species occurring in the species-poorer sample. Its use is recommended when there are large differences in species richness between the samples, and eventually it turned out that this index is also the one quantifying the nestedness of two community samples (if the first sample contains the same species as the second one plus some more, the second sample is perfectly nested within the first, and the value of Simpson’s similarity index equals one).
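The three indices take only a few lines each. The following Python sketch (function names and the toy species lists are mine) shows why Simpson’s similarity behaves differently when one sample is much richer than the other:

```python
def jaccard(a, b):
    """Shared species divided by all species in the two samples together."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def sorensen(a, b):
    """Like Jaccard, but shared species are counted twice."""
    a, b = set(a), set(b)
    return 2 * len(a & b) / (len(a) + len(b))

def simpson_similarity(a, b):
    """Shared species divided by the richness of the poorer sample."""
    a, b = set(a), set(b)
    return len(a & b) / min(len(a), len(b))

# sample b contains all species of sample a plus four more (nestedness):
a = {"oak", "beech"}
b = {"oak", "beech", "ash", "elm", "lime", "maple"}
jaccard(a, b)             # 2/6 = 0.33 - dragged down by b's high richness
simpson_similarity(a, b)  # 2/2 = 1.0 - perfect nestedness
```

Because the denominator depends only on the poorer sample, Simpson’s similarity is insensitive to the richness surplus of the richer one, which is exactly why it signals nestedness.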

Simpson’s diversity index is a completely different story. It aims to quantify the diversity of a single community sample, considering both species richness (the number of species recorded in the sample) and relative species abundances (some species within the sample can be dominant, represented by a high number of individuals/higher biomass/larger cover, while others are less common or rare). Two communities may have the same species richness (i.e. number of species), but if they differ in relative species abundances (i.e. evenness), their overall diversity will be different. If we have two forests, both with ten species (richness), but the first with all species represented by the same number of individuals (perfectly even) and the second with one or two dominant trees (highly uneven), the first will intuitively (and also functionally) be more diverse – a walk through such a forest will give you a feeling of diversity, since you keep meeting different trees. In contrast, the forest dominated by a single species or a few species will feel species-poor (the rare species are also there, so richness is the same, but you are not likely to encounter them as frequently, and they are not likely to have an important ecological function in the community). Simpson’s diversity index, when calculating diversity, weighs the importance of species richness and relative species abundances. It comes in the company of its famous sibling, the Shannon index, which does the same thing in a slightly different way. While Simpson’s index cares more about relative abundances, the Shannon index cares more about species richness; or, put another way, the importance of rare species decreases in the order species richness > Shannon index > Simpson index.

Now about the name. Both indices share the author’s surname, but in each case it was a different Mr Simpson. Simpson’s similarity index was introduced by the American palaeontologist George Gaylord Simpson (1902–1984) to deal with a situation he often encountered in his work with fossil samples, where some samples are much richer than others, so that conventional similarity indices (like Jaccard and Sørensen) do not work properly. In 1943, G.G. Simpson wrote the paper “Mammals and the nature of continents” and published it in the American Journal of Science; later, in 1960, he published (in the same journal) the paper “Notes on the measurement of faunal resemblance”, where he also provided the formula for his index. The index was rediscovered by Jack J. Lennon et al. in 2001 as βsim (often called beta sim; I suspect “sim” stems from “Simpson”). Note that Lennon et al. (2001) do not talk about “similarity”; instead, they used Simpson’s formula to calculate mean pairwise dissimilarities between a focal sample and the other samples. In Koleff et al. (2003), however, beta sim is already treated as a pairwise dissimilarity measure based on the original Simpson’s index (1 − Simpson).

Simpson’s diversity index (also called the concentration index) was published by the British statistician Edward Hugh Simpson (born 1922, living in Oxfordshire), who is mostly famous for his formulation of Simpson’s paradox. E.H. Simpson published the index in the 1949 Nature paper entitled “Measurement of diversity”. It quantifies the probability that two individuals taken independently (without replacement) from the sample will be of the same species. With increasing diversity the index actually decreases (two individuals have the highest chance of being of the same species if the sample contains only one species), so more often the complement (1 − Simpson) or the reciprocal (1/Simpson) is used, also known as the Gini-Simpson index or Hurlbert’s PIE (probability of interspecific encounter; Hurlbert 1971).
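The definition above translates directly into code. A minimal Python sketch (function name and the toy abundance vectors are my own), illustrating why an uneven community scores as less diverse:

```python
def simpson_diversity(abundances):
    """Probability that two individuals drawn without replacement from
    the sample belong to the same species (E.H. Simpson 1949)."""
    n = sum(abundances)
    return sum(a * (a - 1) for a in abundances) / (n * (n - 1))

even = [10] * 10         # ten species, perfectly even community
uneven = [91] + [1] * 9  # same richness, one strong dominant

# higher diversity means a LOWER index, hence the common complement
# (Gini-Simpson, 1 - D) or reciprocal (inverse Simpson, 1 / D):
gini_even = 1 - simpson_diversity(even)
gini_uneven = 1 - simpson_diversity(uneven)  # lower: the uneven forest "feels" poorer
```

Both communities have ten species, but two random walkers in the uneven forest keep bumping into the dominant tree, so its Gini-Simpson value is much lower than that of the even forest.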

So here we go – no reason to confuse them any more!


  • Hurlbert, S.H. 1971. The nonconcept of species diversity: A critique and alternative parameters. Ecology, 52: 577–586.
  • Koleff, P., Gaston, K.J. & Lennon, J.J. 2003. Measuring beta diversity for presence-absence data. Journal of Animal Ecology, 72: 367–382.
  • Lennon, J.J., Koleff, P., Greenwood, J.J.D. & Gaston, K.J. 2001. The geographical structure of British bird distributions: diversity, spatial turnover and scale. Journal of Animal Ecology, 70: 966–979.
  • Simpson, E.H. 1949. Measurement of diversity. Nature, 163: 688.
  • Simpson, G.G. 1943. Mammals and the nature of continents. American Journal of Science, 241: 1–31.
  • Simpson, G.G. 1960. Notes on the measurement of faunal resemblance. American Journal of Science, 258-A: 300–311.