
Permanent plots in observational vegetation studies: better to have many small, few large, or several intermediate?

Permanent vegetation plots allow the vegetation to be resurveyed, ideally at exactly the same locations as in the original survey. For simplicity, I will talk here about permanent plots in forest vegetation and exclude non-forest ones; there are quite a lot of permanent grassland plots (including e.g. the GLORIA network on mountain tops), but practically all of these are rather small. I will also focus only on observational studies and exclude experimental ones (e.g. those adding fertilizer or testing management effects). For logistical reasons, most studies using permanent forest plots face a trade-off between how large the individual plot is and how many plots can be surveyed (see the schema above). I tentatively divided the studies into three categories according to the area of a single plot (grain scale). Small plots (100-400 m²) are relatively easy and fast to survey, which allows higher numbers of them to be established and surveyed. They usually come with no (or limited) information about within-plot vegetation structure and environmental heterogeneity, and the focus is entirely on between-plot variation. Large plots (10-50 ha), on the other hand, are difficult and slow to complete, which makes their overall number within a project (or even a database) rather limited. Most plots in this category follow the ForestGEO (originally Smithsonian Tropical Research Institute) guidelines for establishment and survey, and include rich within-plot information about vegetation structure and (some) environmental heterogeneity. Very few studies, however, have been able to use several of these plots to also include between-plot environmental heterogeneity.

And then there is a third, intermediate category of middle-sized plots (around 1 ha), which are not as difficult and slow to survey as large plots, but not as easy and fast as small ones. They come in tens or so, and in theory allow a focus on both within-plot and between-plot vegetation and environmental parameters. There are, however, surprisingly few studies using this kind of data, and I wonder why that is. One reason may be that there is a clear niche separation between the many-small and few-large study designs: the former focuses entirely on between-plot differences (usually related to large-scale environmental factors such as climate or geology), while the latter focuses (almost) entirely on within-plot characteristics (related to small-scale environmental variables, such as soil characteristics or topographical differences, and also to small-scale demographic patterns). The several-intermediate design is a kind of compromise offering both large- and small-scale perspectives, but it may be that each is represented insufficiently to allow proper conclusions. Another reason may be that there is still some methodological inconsistency in how intermediate plots are surveyed, which prevents wider use of such data in synthesis studies: some plots follow the same protocol as large plots (ForestGEO), but others do not; the rules for which trees should be measured (e.g. trees with DBH > 1 cm, > 10 cm or even larger) differ among studies, as does whether trees are permanently labelled. Data analysis may also be more problematic, because the several-intermediate design is strictly hierarchically nested: within-plot variability is nested in between-plot variability, and subplots (made within each intermediate plot) are akin to split-plots within a whole-plot design. Such data need to be analysed as a split-plot design, which limits the error degrees of freedom available for testing between-plot effects.
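To make the degrees-of-freedom limitation concrete, here is a rough sketch in Python. It assumes a balanced design with one plot-level predictor and one subplot-level predictor; the function name and the numbers (10 plots of 1 ha, each divided into 25 subplots of 20 × 20 m) are hypothetical and chosen only for illustration.

```python
def error_degrees_of_freedom(n_plots, n_subplots,
                             plot_predictors=1, subplot_predictors=1):
    """Residual df in the two strata of a balanced nested (split-plot-like) design.

    Assumption: every plot contains the same number of subplots.
    """
    # Between-plot stratum: n_plots - 1 df in total, minus plot-level predictors.
    between = (n_plots - 1) - plot_predictors
    # Within-plot stratum: n_plots * (n_subplots - 1) df, minus subplot-level predictors.
    within = n_plots * (n_subplots - 1) - subplot_predictors
    return between, within


# Hypothetical example: 10 intermediate plots, 25 subplots in each.
between_df, within_df = error_degrees_of_freedom(n_plots=10, n_subplots=25)
print(between_df, within_df)
```

In this hypothetical case a plot-level variable such as climate would be tested against only 8 error degrees of freedom, although the dataset contains 250 subplots; adding subplots increases only the within-plot figure.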

For some types of studies, however, the several-intermediate design may be more suitable than either many small or few large plots. These include studies focused on the effect of both large- and fine-scale environmental factors, and possibly their interaction, that also need more detailed information about the demography of individual stands collected through resurveys (e.g. relative growth rate of individual trees). For example, if the question is about the effect of cloud frequency (a large-scale environmental factor, changing over distances of kilometres) and soil nutrient limitation (a fine-scale pattern affected by topography and changing over distances of tens of metres), combining more detailed information from several locations can be useful. If this information should in addition be combined with e.g. the trait response of individual species in the context of their demography (approximated e.g. by relative growth rate), the several-intermediate design should offer all that is needed. A further benefit is that some data from intermediate permanent plots are already available, usually from small projects focused on individual localities; what is needed is to resurvey them, extend the existing network and compile the data.

Still, there are some important questions that need to be answered. Mainly, how many intermediate-sized plots are needed for an analysis focused on regional environmental drivers to have sufficient power? And is one hectare, a common area of an intermediate plot, large enough to provide sufficient detail about within-plot vegetation structure and environmental heterogeneity? As the number of plots increases, so does our ability to describe larger-scale environmental patterns and generalize the results; as the area of individual plots (and the number of subplots) increases, so does our ability to get a detailed understanding of local patterns and processes, which are, however, very idiosyncratic and valid mostly for the given location and vegetation type. A study combining simulated spatially explicit data with a review (or meta-analysis) of existing studies may help estimate such numbers.

Finally, a more general question, related not only to permanent plots but to many scientific studies: can a compromise solution be better than either of the extreme options? What if you just end up with the worst of both alternatives…

Mean Ellenberg indicator values: too good to be true

I started to think more intensively about the Ellenberg indicator values issue at a conference, while listening to a presentation in which a colleague used mean Ellenberg indicator values as explanatory variables in constrained ordination. I considered this a kind of statistical heresy, a perfect example of circular reasoning: you take your vegetation data, calculate mean Ellenberg indicator values for each plot, and in turn use these mean values to explain the original data. But it’s tempting: mean Ellenberg values are often considered good proxies for measured environmental variables, and they are easy to calculate, so using them as explanatory variables is attractive. I tried it. I took a dataset with measured soil pH, calculated mean Ellenberg values for soil reaction, and compared how much variation in the species data was explained by pH and how much by mean Ellenberg values; Ellenberg was a far better predictor than measured pH. OK, so here we have the consequence of circularity. Thinking it through, I concluded that the reason is that mean Ellenberg values carry the legacy of the species composition from which they were calculated: if two plots have the same species composition, their mean Ellenberg values will be identical (considering a mean not weighted by species covers), and if the species composition differs a bit, the mean Ellenberg value will change only slightly (changing one or a few numbers when calculating a mean doesn’t change the result much).
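A tiny numerical illustration of this legacy effect, sketched in Python with NumPy. The species values and the three plots are made up for the example (they are not real Ellenberg values), and the function name is hypothetical.

```python
import numpy as np

# Hypothetical indicator values for five species (not real Ellenberg values).
species_values = np.array([3.0, 5.0, 6.0, 7.0, 8.0])

# Presence/absence of the five species in three plots (rows = plots).
plots = np.array([
    [1, 1, 1, 1, 0],  # plot A
    [1, 1, 1, 1, 0],  # plot B: same species as plot A
    [1, 1, 1, 0, 1],  # plot C: one species replaced by another
])


def mean_indicator_values(presence, values):
    """Unweighted mean of species values over the species present in each plot."""
    presence = np.asarray(presence, dtype=float)
    return presence @ values / presence.sum(axis=1)


means = mean_indicator_values(plots, species_values)
print(means)
```

Plots A and B share their species and therefore get the same mean (5.25), while replacing one species in plot C shifts the mean only to 5.5. The means are a function of species composition alone, whatever the values happen to represent.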

I wondered what would happen if I reshuffled the Ellenberg values among species before calculating the mean for each vegetation sample, or if I replaced the original species values with randomly generated ones. This would remove the ecological meaning of the values, but keep the side effects of calculating the mean. Not surprisingly, the mean of randomized species Ellenberg values, when used in constrained ordination, still explains more variation than random numbers do (just keep in mind that every variable, even a randomly generated one, explains some variation; if this doesn’t make you happy, consider using adjusted R² instead). There is no ecological information in these randomized values, so the extra explained variance is the legacy of species composition imprinted in the mean Ellenberg values. I used the randomization of species values as the basis for a modified permutation test, which can be applied to correct the issue, not only in constrained ordination, in which mean Ellenberg values are rarely used (scanning the JVS and AVS journals over the last ten years returned two papers), but also in unconstrained ordination, when mean Ellenberg values are correlated with ordination axes and the correlation is tested (this treatment is actually fairly common, although the problem of circular reasoning is exactly the same as in constrained ordination, just less obvious).
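The idea of randomizing species values can be sketched roughly as follows. This is a minimal illustration in Python with NumPy, not the original R code or MoPeT; the function names are hypothetical, the first PCA axis stands in for whatever ordination axis is actually used, and the test statistic (absolute Pearson correlation) is an assumption made for the example.

```python
import numpy as np


def mean_indicator_values(comm, species_values):
    """Unweighted mean of species values per plot (presence/absence only)."""
    presence = (np.asarray(comm) > 0).astype(float)
    return presence @ species_values / presence.sum(axis=1)


def first_pca_axis(comm):
    """Plot scores on the first principal component (a stand-in ordination axis)."""
    comm = np.asarray(comm, dtype=float)
    centered = comm - comm.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def modified_permutation_test(comm, species_values, axis_scores,
                              n_perm=999, seed=0):
    """Test the correlation between mean indicator values and an ordination axis.

    The null distribution comes from reshuffling the values among species,
    so every permuted mean still carries the legacy of species composition.
    Assumption: each plot contains at least one species.
    """
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(
        mean_indicator_values(comm, species_values), axis_scores)[0, 1])
    null = np.empty(n_perm)
    for i in range(n_perm):
        shuffled = rng.permutation(species_values)  # shuffle among species
        null[i] = abs(np.corrcoef(
            mean_indicator_values(comm, shuffled), axis_scores)[0, 1])
    # The observed value is counted as one member of the null distribution.
    p_value = (np.sum(null >= observed) + 1) / (n_perm + 1)
    return observed, p_value
```

A standard permutation test would instead shuffle the mean values among plots, which breaks their link to species composition and so tends to give overly optimistic results. Called as `modified_permutation_test(comm, values, first_pca_axis(comm))`, the sketch returns the observed correlation and a p-value based on species-level shuffling.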

I wrote R code and later also a simple point-and-click program (MoPeT) that runs in R and calculates the modified permutation tests, which are otherwise not easy to do. I presented the results at the 2009 IAVS meeting in Crete and later published a paper in JVS, together with André Schaffers, who helped me a lot in sorting out the ideas and writing the manuscript. Still, I am not sure whether anybody will ever really use this routine; mean Ellenberg values are great for description, but it’s perhaps better to keep them away from more sophisticated statistical treatments.

The story actually has an unexpected follow-up, although I had hoped that I wouldn’t touch Ellenberg values any more. By an irony of life, I am now working on a paper describing a statistical way to justify the use of mean Ellenberg indicator values as explanatory variables in constrained ordination, and even to do such things as partitioning the variation among different Ellenberg values or between Ellenberg values and measured variables. I presented this topic at the EVS workshop in Vienna this spring, where it passed without any feedback; I guess the audience was even a little disgusted by such an overly technical talk. I really don’t feel like convincing anybody to use mean Ellenberg values as explanatory variables in constrained ordination, but I can’t help feeling quite fascinated by the idea that something like this is actually possible.