Recently I used R package vegan to recalculate several partial RDA analyses, which were originally calculated in CANOCO 5 (C5). Partial RDA (pRDA) is a constrained ordination which contains both explanatory variables and covariables. To my surprise, the values of explained variation (R2) and adjusted explained variation (adjR2) differ between vegan and C5, with values in vegan being systematically lower. Note that this difference will occur only in case of partial constrained ordination – if you do not have covariables, then R2 and adjR2 in vegan will be the same as R2 and adjR2 in C5.
After some searching and a help from Petr Šmilauer, I realized that vegan and C5 differ in the way how they calculate R2 and adjR2. In R, vegan is calculating so called semipartial R2, in which the denominator is total inertia only (i.e. pure variation explained by explanatory variables / total inertia). On the other hand, C5 calculates so called partial R2, in which variation explained by explanatory variables is divided by total inertia without the effect of covariables (i.e. pure variation explained by explanatory variables / (total inertia – variation explained by covariables)). Note that “pure variation explained by explanatory variables” is the amount of variation explanatory variables explain after removing the effect of covariables. More on this e.g. in Legendre & Legendre (2012), page 651.
In case of adjusted R2, the situation is also different, and slightly more complex. In both programs adjusted R2 is calculated using Ezekiel’s formula adjR2 = 1 – (1 – R2)*(n – 1)/(n – m – 1), where n is number of samples in the dataset, and m is number of variables (note that if some of these variables is a factor, it brings “number of factor levels – 1” degrees of freedom).
In vegan, adjusted R2 of explained variation after removing the effect of covariables is calculated as fraction [a] in variance partitioning between explanatory variables and covariables, where [a + b] is variation explained by explanatory variables, and [b + c] variation explained by covariables. Then, adjR2[a] = adjR2[a+b+c] – adjR2[b+c], i.e. adjR2 of model containing both explanatory variables and covariables minus adjR2 of model containing only covariables. If p is the number of covariables (or number of degrees of freedom relevant to covariables, in case that some of them is a factor), than
adjR2[a] = adjR2[a+b+c] – adjR2[b+c] = (1 – (1 – R2[a+b+c])*(n – 1)/(n – m – p – 1)) – (1 – (1 – R2[b+c])*(n – 1)/(n – p – 1))
In C5, the adjusted R2 explained by explanatory variables after considering covariables is not calculated by subtracting adjR2 of the covariables from the full model, but directly by adjusting the original R2 values using modified number of degrees of freedom in the formula. Hence,
adjR2 = 1 – (1 – R2)*(n – p – 1)/(n – m – p – 1)
which means that the number of degrees for covariables is subtracted at both numerator and denominator of the right hand formula side.
Let’s see the difference on the real example. We will use Vltava dataset, and build the pRDA in both vegan and C5.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | library (vegan) # download Vltava dataset from http://www.davidzeleny.net/anadat-r/ vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1) vltava.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-env.txt') vltava.spe.hell <- decostand (vltava.spe, method = 'hell') # Build partial RDA model with pRDA <- rda (vltava.spe.hell ~ pH + ASPSSW + Condition (SOILDPT + SLOPE), data = vltava.env) pRDA # Call: rda(formula = vltava.spe.hell ~ pH + ASPSSW + Condition(SOILDPT + SLOPE), # data = vltava.env) # # Inertia Proportion Rank # Total 0.71576 1.00000 # Conditional 0.04264 0.05957 2 # Constrained 0.06245 0.08725 2 # Unconstrained 0.61067 0.85318 92 # Inertia is variance # # Eigenvalues for constrained axes: # RDA1 RDA2 # 0.03810 0.02435 # # Eigenvalues for unconstrained axes: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 # 0.06648 0.06138 0.04468 0.03336 0.02994 0.02693 0.02328 0.02114 # (Showed only 8 of all 92 unconstrained eigenvalues) RsquareAdj (pRDA) # $r.squared # [1] 0.0872482 # # $adj.r.squared # [1] 0.07016251 # First, let's define variables with explained variations: constrained <- pRDA$CCA$tot.chi # 0.06244857 - variation explained by # explanatory variables after removing the effect of covariables conditional <- pRDA$pCCA$tot.chi # 0.04263834 - variation explained by # covariables total <- pRDA$tot.chi # 0.7157577 - total variation of the compositional dataset # In R, the R2 is calculated as constrained variation / total inertia: constrained / total # [1] 0.0872482 # and adjusted R2 is calculating by subtracting adjR2 explained by covariables # from adjR2 explained by both explanatory variables and covariables: R2abc <- (constrained + conditional) / total R2bc <- conditional / total n <- nrow (vltava.spe) # number of samples in the dataset m <- 2 # number of explanatory variables p <- 2 # number of covariables adjR2abc <- 1 - (1 - R2abc)*(n - 1)/(n - m - p - 1) adjR2bc <- 1 - (1 - R2bc)*(n - 1)/(n - p - 1) adjR2a <- adjR2abc - adjR2bc adjR2a # [1] 0.07016251 |
In CANOCO 5, if the same dataset and the same variables and covariables are used in the same pRDA analysis on Hellinger transformed data, the results are different (if you have CANOCO 5 installed on your computer, download and open this file):
Analysis 'Constrained-partial' Method: partial RDA Partial variation is 64.61974, explanatory variables account for 9.3% (adjusted explained variation is 7.3%) Summary Table: Statistic Axis 1 Axis 2 Axis 3 Axis 4 Eigenvalues 0.0532 0.0340 0.0929 0.0858 Explained variation (cumulative) 5.66 9.28 19.15 28.27 Pseudo-canonical correlation 0.7624 0.6491 0.0000 0.0000 Explained fitted variation (cumulative) 61.01 100.00 Permutation Test Results: On All Axes pseudo-F=4.7, P=0.001
As you can see, R2 is 9.3%, while adjR2 is 7.3%, in contrast to vegan‘s 8.7% and 7.0% for R2 and adjR2, respectively. The sum of eigenvalues for the two constrained axes is 0.0532+0.0340 = 0.0872, the same value as the R2 in vegan, but in the header, C5 reports higher values. These are calculated in the following way (continuing the R code above):
1 2 3 4 5 6 7 8 | # R2 in CANOCO as R2.C5 <- constrained / (total - conditional) R2.C5 # [1] 0.09277488 - after rounding 9.3% # adjR2 in CANOCO as 1 - (1 - R2.C5)*(n - p - 1)/(n - m - p - 1) # [1] 0.0730526 - after rounding 7.3 |
Note that the other parameters are identical in both vegan and CANOCO (eigenvalues, F-value and also the significance).
In the end, I wrote this quick-and-dirty hack of the vegan‘s RsquareAdj function, which includes argument type with the values either “partial” (vegan’s version) or “semipartial” (C5 version):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | RsquareAdj2.rda <- function (x, type = 'semipartial', ...) { m <- x$CCA$qrank n <- nrow(x$CCA$u) if (is.null(x$pCCA)) { R2 <- x$CCA$tot.chi/x$tot.chi radj <- RsquareAdj(R2, n, m) } else if (type == 'semipartial') { R2 <- x$CCA$tot.chi/x$tot.chi R2p <- x$pCCA$tot.chi/x$tot.chi p <- x$pCCA$rank radj <- RsquareAdj(R2 + R2p, n, m + p) - RsquareAdj(R2p, n, p) } else if (type == 'partial'){ R2 <- x$CCA$tot.chi/(x$tot.chi - x$pCCA$tot.chi) p <- x$pCCA$rank radj <- 1 - (1 - R2)*(n - p - 1)/(n - m - p - 1) if (any(na <- m >= n - 1)) radj[na] <- NA } list(r.squared = R2, adj.r.squared = radj, type = type) } |
Quick check whether it works:
1 2 | RsquareAdj2.rda (pRDA, type = 'semipartial') # 0.08724 vs 0.0701 RsquareAdj2.rda (pRDA, type = 'partial') # 0.09277 vs 0.07305 |
Looks like it does. Note, however, if you want to use it, it needs to be further tested (e.g. with datasets of environmental variables include also factors as explanatory variables or covariables).