# Simpson’s similarity index vs Simpson’s diversity index

Since I started to teach multivariate methods in community ecology and additionally to chapters about ordination I also included the chapter about diversity, I found it confusing that there are two “Simpson’s indices,” one for similarity, and one for diversity. Indeed, each index has a completely different meaning and use, and I may well be the only one confused by them, but for a while, I had an impression that both indices are somehow related and maybe could be derived from each other. I did some search to clarify it for myself, and it turned out that although both indices have been published in roughly the same time (in the ’40s), each one is attributed to different Mr Simspon and appeared in quite a different context.

A bit of theory first. Simpson’s similarity index is used to calculate the similarity between a pair of community samples, to quantify whether their species composition is similar (they share most or all the species) or different. It comes in the company with other, similar indices, like Jaccard and Sørensen, which do the same job but with slightly different logic. If we have two samples, there will be some number of species shared between both, and some species occurring only in the first or only in the second sample. Jaccard or Sørensen divides the number of species both samples share by the number of all species occurring in both samples together (and Sørensen cares about shared species more than Jaccard, so it multiplies it twice in both numerator and denominator). The problem occurs if one of the samples has much higher species richness than the other. The number of shared species (in the numerator) may be the same, but the high richness of one of the samples will increase the denominator and decrease the overall index value, making the index losing the resolution power. Here comes the Simpson’s similarity index, which has the same nominator (number of shared species), but in the denominator, it takes only the number of species occurring in the smaller sample. Its use is recommended in a situation when there are large differences in species richness between the samples, and eventually, it turned out that this index is also the one quantifying the nestedness of two community samples (if the first sample contains the same species as the other one plus some more, the second sample is perfectly nested within the first one, and the value of the Simpson’s similarity index is equal to one).

Simpson’s diversity index is a completely different story. It aims to quantify the diversity of a single community sample, while considering both species richness (the number of species recorded in the sample) and also the relative species abundances (some species within the sample can be dominant, represented by a high number of individuals/higher biomass/larger cover, while some other are less common or rare). Two communities may have the same species richness (i.e. number of species), but if they differ by relative species abundance (i.e. evenness), their overall diversity will be different. If we have two forests, both with ten species (richness), but first with all species represented by the same number of individuals (perfectly even), while the second with one or two dominant trees (highly uneven), the first will intuitively (and also functionally) be more diverse – a walk through such forest will give you a feeling of diversity since you keep meeting different trees. In contrast, the forest dominated by a single or few species will feel species-poor (rare species are also there, so richness is the same, but they are not likely to be encountered as frequently as and also they are not likely to have an important ecological function in the community). Simpson’s diversity index, when calculating the diversity, is weighing the importance of species richness and relative species abundances. It comes in the company of his famous sibling, Shannon index, which does the same thing in a slightly different way. While Simpson’s index cares more about relative abundances, the Shannon index cares more about species richness; or, put in another way, the importance of rare species decreases in order species richness > Shannon index > Simpson index.

Now about the name. Both indices share the author’s surname, but in each case, it was a different Mr Simspon. Simpson’s similarity index was introduced by an American palaeontologist George Gaylord Simpson (1902-1984) to solve the situation often encountered in his practice with fossil samples, where some samples are much richer than the others, meaning that conventional similarity indices (like Jaccard and Sørensen) do not work properly. In 1943, G.G. Simpson wrote a paper “Mammals and the nature of continents” and published it in American Journal of Science; later, in 1960, he published (in the same journal) paper “Notes on the measurement of faunal resemblance”, where he also provided the formula for his index. The index has been rediscovered by Jack J. Lennon et al. in 2001 as βsim (often called beta sim; I suspect “sim” stems from “Simpson”). Note that Lennon et al. (2001) does not talk about “similarity”, but instead they used Simpson’s formula to calculate mean pairwise dissimilarities between the focal sample and the other samples; in Koleff et al. (2003), however, the beta sim is already considered as a pairwise dissimilarity measure based on the original Simpson’s index (1 – Simpson).

Simpson’s diversity index (also called concentration index) was published by British statistician Edward Hugh Simpson (born 1922, living in Oxfordshire), who is mostly famous for his formulation of Simpson’s paradox. E.H.Simpson published the index in the 1949’s Nature’s paper entitled “Measurement of diversity”. It quantifies the probability that two individuals took independently (without replacement) from the sample will be of the same species. With increasing diversity the index actually decreases (two individuals have the highest chance to be of the same species if the sample has only one species), so more often the one complement (1-Simpson) or reciprocal (1/Simpson) values are used, also known as Gini-Simpson index or Hurlbert’s PIE (probability of interspecific encounter, Hurlbert 1971).

So here we go – no reason to confuse them any more!

References

• Hurlbert, S.H. 1971. The nonconcept of species diversity: A critique and alternative parameters. Ecology, 52: 577–586.
• Koleff, P. Gaston, K.J. & Lennon, J.J. 2003. Measuring beta diversity for presence-absence data. Journal of Animal Ecology, 72: 367-382.
• Lennon, J.J., Koleff, P., Greenwood, J.J.D. & Gaston, K.J. 2001. The geographical structure of British bird distributions: diversity, spatial turnover and scale. Journal of Animal Ecology, 70: 966-979.
• Simpson, E.H. 1949 Measurement of diversity. Nature, 163: 688.
• Simpson, G.G. 1943. Mammals and the nature of continents. American Journal of Science, 241: 1-31.
• Simpson, G.G. 1960. Notes on the measurement of faunal resemblance. American Journal of Science, 258-A: 300-311.

## 2 thoughts on “Simpson’s similarity index vs Simpson’s diversity index”

1. Svea Hu

Is there any way to calculate community phylogenetic similarity between plots using Simpsons similarity calculation logic?

1. David Zelený Post author

I am afraid I am not familiar with these options… I wonder whether some paper from Anne Chao, who was doing research on similarities also in the context of functional and phylogenetic relationships, would work, but unfortunately do not know any from the top of my head now… sorry 🙂 David