Simply put, pseudoreplications are replicates in a study that increase the sample size (*n*) but not the amount of information. The typical problem arises when we treat pseudoreplications as independent samples, assuming that each contributes a new piece of information (and hence a full degree of freedom). Here, the problem is equivalent to the issue of spatial autocorrelation in the data: a sample close to the focal one is likely to be more similar to it than a random sample selected from the dataset, and is therefore to some extent a pseudoreplication. One of the consequences is an inflated Type I error rate in the significance test, with results more optimistic than the data warrant. Sometimes way more optimistic, as the numerical example below shows.

Why did I write this post? Because of my recent experience as a reader of scientific papers, a reviewer and also an editor. I have repeatedly encountered situations where authors analyzed data with clear pseudoreplications as if all samples were independent. I have even seen a few papers published in nice journals that claimed an effect of a specific environmental variable on species richness/composition, while the effect was in reality just a result of an inflated Type I error rate. (How do I know that? The authors provided the original data used for their analysis, and I could show that even if I randomize their environmental variable while keeping its hierarchical structure, I have a high probability of obtaining significant results with otherwise completely irrelevant effects.) Last but not least, I write it also as a potential author of papers with flawed results, because I am not always honest enough (with myself and others) to admit that I have a pseudoreplication issue in my own data.

## Some background first

In experimental ecological studies, pseudoreplications are not that common, and if they appear, they usually result from a flawed sampling design, and the authors eventually get punished for them by reviewers. The situation is less clear in observational ecological studies based on field sampling. Look at the figure below for an example. Imagine that you want to study the change in diversity or species composition of some community (sampled within small square plots) along some environmental gradient (with intensity changing from left to right, bright to dark). Panel (a) shows a situation where individual samples are more or less randomly distributed, in panel (b) they are somewhat clustered, and in panel (c) they are strongly clustered into three groups. We intuitively feel that the number of independent samples in panel (a) is close to 12, in panel (c) close to 3, and in panel (b) somewhere in between. In terms of pseudoreplications, while in panel (a) the samples are practically independent, in panel (c) the samples within the same group are almost complete pseudoreplications; they will likely have similar biological and environmental values since they are sampled in close vicinity of each other.
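The intuition about "12, 3, or something in between" can be made a bit more concrete with the design-effect formula commonly used for equally sized clusters. This is only a rough sketch: the symbols $m$ (samples per group) and $\rho$ (correlation between samples of the same group) are my notation, and I assume the 12 samples in panel (c) are split evenly, so $m = 4$.

$$n_{\text{eff}} = \frac{n}{1 + (m - 1)\,\rho}$$

With $n = 12$ and $m = 4$, fully independent samples ($\rho = 0$) give $n_{\text{eff}} = 12$, complete pseudoreplications ($\rho = 1$) give $n_{\text{eff}} = 12 / 4 = 3$, and an intermediate value such as $\rho = 0.5$ gives $n_{\text{eff}} = 12 / 2.5 = 4.8$.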

When analyzing data with some level of pseudoreplication, or spatial autocorrelation, we need to consider the hierarchical structure of the data in the model or modify the test of significance accordingly. In the situation of randomly distributed samples (panel a), we may think it is not necessary, especially if the samples are far enough apart (but what is far enough?). In the situation of strong clustering (panel c), we would perhaps directly include “block” in the model, effectively reducing our sample size to the number of blocks. But most situations are somewhere in the middle; there is some level of clustering, and we are left not knowing how big a problem this could be. Since a simple analysis (such as linear regression) is much easier than an analysis corrected for spatial autocorrelation (such as spatial autoregressive models), it is tempting to close our eyes a little and ignore the problem.

## Numerical example with richness along elevation studied on one or several mountains

The example below illustrates the problem that pseudoreplications can cause if they are not properly handled in the test of significance. Imagine that we study the pattern of richness (e.g. of tree species) along elevation (see the figure below). Each mountain (triangle) is divided into elevation zones (five in this example); in each zone we collect one or several samples and count the number of tree species in each. The figure below shows two scenarios, both generating datasets with 25 samples. In the first scenario (panel a) there are five mountains, and each elevation zone on each mountain has only one sample. In the second scenario (panel b) there is only one mountain, and each elevation zone on this mountain has five samples. Samples in the same elevation zone but on different mountains may be considered independent: they are far from each other, and their richness can be quite different (e.g. affected by other, regionally specific effects). In contrast, samples in the same elevation zone on the same mountain are clear pseudoreplications, since they are located close to each other and their richness is likely to be rather similar (influenced by the same regional factors). For the purpose of this example, let’s assume that different elevation zones on the same mountain are far from each other and are not spatially dependent (in reality, mountains are not that sharp).

Let’s extend this example by adding one more scenario and some math. The third scenario (not in the figure above, since I was too lazy to click around in PowerPoint) again has only one mountain, but each elevation zone has 100 samples, increasing the sample size to 500. And now the math: we will analyze the correlation between richness and elevation while generating the richness values of individual samples randomly (i.e. creating a dataset where the null hypothesis, that richness and elevation are not related, is true, and our test should not reject it, or at least not reject it too often).

For scenario 1 (five mountains, one sample in each of the five elevation zones, n = 25), I assume that samples in the same elevation zone but on different mountains are independent, and I generated the species richness of each sample as a random value from a uniform distribution between 0 and 100. For scenario 2 (one mountain, five samples in each of the five elevation zones, n = 25), I assume that samples in the same elevation zone on the same mountain are not completely independent and have rather similar richness values, since richness may (besides elevation) also be affected by some locally specific factors (e.g. past deforestation of the locality). I first generated a richness value for each elevation zone (five random numbers from a uniform distribution between 0 and 100); each sample within an elevation zone then gets the richness of that zone plus a randomly generated value (normal distribution, mean zero, sd = 10). In this way, the richness of samples within the same elevation zone is slightly different (but not by much). For scenario 3 (one mountain, 100 samples in each of the five elevation zones, n = 500), I repeated the same procedure as for scenario 2, but increased the number of samples within each elevation zone to 100.
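The data generation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code linked at the end of this post: the function names are made up for this sketch, and the zone elevations (five evenly spaced values between 200 and 1000 m) are an assumption.

```python
import random

# Assumed zone elevations in m a.s.l. (five evenly spaced zones).
ELEVATIONS = [200, 400, 600, 800, 1000]


def scenario_independent(n_mountains=5, rng=random):
    """Scenario 1: one sample per zone on each mountain, all values independent."""
    elevation, richness = [], []
    for _ in range(n_mountains):
        for elev in ELEVATIONS:
            elevation.append(elev)
            richness.append(rng.uniform(0, 100))
    return elevation, richness


def scenario_clustered(samples_per_zone=5, noise_sd=10, rng=random):
    """Scenarios 2 and 3: one mountain, samples in a zone share a zone-level richness."""
    elevation, richness = [], []
    for elev in ELEVATIONS:
        zone_richness = rng.uniform(0, 100)  # one random value per elevation zone
        for _ in range(samples_per_zone):
            elevation.append(elev)
            # each sample = zone value plus a small normal deviation
            richness.append(zone_richness + rng.gauss(0, noise_sd))
    return elevation, richness


rng = random.Random(1)  # fixed seed only to make the sketch repeatable
x1, y1 = scenario_independent(rng=rng)      # scenario 1
x2, y2 = scenario_clustered(5, rng=rng)     # scenario 2
x3, y3 = scenario_clustered(100, rng=rng)   # scenario 3
print(len(y1), len(y2), len(y3))  # 25 25 500
```

Note that in none of the three datasets does richness depend on elevation; the scenarios differ only in how many samples share the same zone-level value.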

One example for each scenario is shown in the figure below (panels a, b and c for scenarios 1, 2 and 3, respectively). The x-axis shows hypothetical elevation (from 200 to 1000 m asl), and the y-axis shows the species richness of individual samples. I calculated the correlation between richness and elevation for each scenario and recorded whether the correlation t-test was significant at P < 0.05 or not. I repeated the simulation 100 times and counted how many significant results I got for each scenario (out of 100). Since we know that the null hypothesis is true (because we generated our data that way), and we decided that a Type I error rate of 5% is acceptably low (we consider P-values lower than 0.05 significant), we can expect about 5% of the tests in each scenario to be false positives (around 5 out of 100).

Panel (d) shows the number of results significant at P < 0.05, compared to the expectation (5%). While for scenario 1 the number fits the expectation (around 5%), scenario 2 gives almost 50% significant results, and scenario 3 almost 90%.
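For readers who want to try this themselves, here is a rough standard-library Python sketch of the whole experiment. It is not the code linked at the end of this post, and several things are assumptions made for the sketch: the function names, the zone elevations, and the use of tabulated two-sided 5% critical t values (about 2.069 for 23 degrees of freedom and about 1.965 for 498) instead of exact P-values.

```python
import math
import random

ELEVATIONS = [200, 400, 600, 800, 1000]  # assumed zone elevations (m a.s.l.)
T_CRIT_DF23 = 2.069   # two-sided 5% critical t, df = 23 (n = 25), from a t table
T_CRIT_DF498 = 1.965  # two-sided 5% critical t, df = 498 (n = 500), from a t table


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def is_significant(x, y, t_crit):
    """Naive correlation t-test that treats every sample as independent."""
    r = pearson_r(x, y)
    t = r * math.sqrt((len(x) - 2) / (1 - r * r))
    return abs(t) > t_crit


def simulate(samples_per_zone, clustered, rng):
    """Null data: richness never depends on elevation."""
    x, y = [], []
    for elev in ELEVATIONS:
        zone = rng.uniform(0, 100)  # zone-level richness (used only if clustered)
        for _ in range(samples_per_zone):
            x.append(elev)
            if clustered:   # same mountain: zone value plus small noise
                y.append(zone + rng.gauss(0, 10))
            else:           # different mountains: fully independent values
                y.append(rng.uniform(0, 100))
    return x, y


def false_positive_rate(samples_per_zone, clustered, t_crit, n_rep=100, seed=1):
    """Share of simulated null datasets with a 'significant' correlation."""
    rng = random.Random(seed)
    hits = sum(
        is_significant(*simulate(samples_per_zone, clustered, rng), t_crit)
        for _ in range(n_rep)
    )
    return hits / n_rep


if __name__ == "__main__":
    print("scenario 1:", false_positive_rate(5, False, T_CRIT_DF23))
    print("scenario 2:", false_positive_rate(5, True, T_CRIT_DF23))
    print("scenario 3:", false_positive_rate(100, True, T_CRIT_DF498))
```

The exact numbers depend on the seed and on the number of repetitions, but the ordering of the three scenarios should match panel (d).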

## A few notes at the end

The conclusion from the numerical example above could be: if you want to show that richness is related to elevation (even though it may not be), do not bother climbing too many mountains, just try hard to collect as many samples as you can on one of them! But good luck convincing reviewers and readers that all your samples are independent replicates and that they can trust your simple correlation results…

The more general suggestion would be: if your sampling design is like panel (a) or (b) in the first figure above, check for autocorrelation in the residuals of your model, and if it is there, be honest and correct for it, or at least report that it may be a problem but that you haven’t corrected for it (and why). If your design resembles the situation in panel (c), analyze your data directly as a block design (e.g. using linear mixed-effects models with blocks as random effects). If you do not, while possibly having many replicates within each “block”, your P-values are likely to be hugely over-optimistic. A highly significant result with a very small effect size (such as a low R²) should definitely cast doubt on your results.
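To illustrate what "reducing the sample size to the number of blocks" does to the test, here is a small Python sketch. It deliberately swaps the mixed-effects model mentioned above for a much cruder technique, averaging the samples within each block and testing the correlation on the block means only, because that needs nothing beyond the standard library. The function names, the zone elevations and the tabulated critical t value (about 3.182 for 3 degrees of freedom) are assumptions of this sketch.

```python
import math
import random
from collections import defaultdict

T_CRIT_DF3 = 3.182  # two-sided 5% critical t, df = 3 (five blocks), from a t table


def block_means(block_ids, x, y):
    """Average x and y within each block: one (x, y) pair per block."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for b, xi, yi in zip(block_ids, x, y):
        s = sums[b]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    xs = [s[0] / s[2] for s in sums.values()]
    ys = [s[1] / s[2] for s in sums.values()]
    return xs, ys


def correlation_t(x, y):
    """t statistic of the Pearson correlation between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))


def clustered_sample(samples_per_zone, rng):
    """Null data with pseudoreplications: zone richness plus small noise."""
    blocks, x, y = [], [], []
    for zone, elev in enumerate([200, 400, 600, 800, 1000]):  # assumed elevations
        base = rng.uniform(0, 100)
        for _ in range(samples_per_zone):
            blocks.append(zone)
            x.append(elev)
            y.append(base + rng.gauss(0, 10))
    return blocks, x, y


def block_level_false_positive_rate(samples_per_zone=100, n_rep=200, seed=1):
    """Share of null datasets flagged as significant when testing block means."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rep):
        blocks, x, y = clustered_sample(samples_per_zone, rng)
        t = correlation_t(*block_means(blocks, x, y))
        hits += abs(t) > T_CRIT_DF3
    return hits / n_rep


if __name__ == "__main__":
    print(block_level_false_positive_rate())
```

With the test done on five block means instead of 500 samples, the share of false positives should fall back to roughly the nominal 5%, at the price of a test with only 3 degrees of freedom. A mixed-effects model is the more flexible tool for real data.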

## Data and code

- Simulation of three scenarios with richness along elevation: https://gist.github.com/zdealveindy/75f8c7ad7334d7ee72a745695b540a8a