Recently I used R package *vegan* to recalculate several partial RDA analyses, which were originally calculated in CANOCO 5 (C5). Partial RDA (pRDA) is a constrained ordination which contains both explanatory variables and covariables. To my surprise, the values of explained variation (*R2*) and adjusted explained variation (*adjR2*) differ between *vegan* and C5, with values in *vegan* being systematically lower. Note that this difference will occur only in case of partial constrained ordination – if you do not have covariables, then *R2* and *adjR2* in vegan will be the same as *R2* and *adjR2* in C5.

After some searching and a help from Petr Šmilauer, I realized that *vegan* and C5 differ in the way how they calculate *R2* and *adjR2*. In R, *vegan* is calculating so called *semipartial R2*, in which the denominator is total inertia only (i.e. pure variation explained by explanatory variables / total inertia). On the other hand, C5 calculates so called *partial R2*, in which variation explained by explanatory variables is divided by total inertia without the effect of covariables (i.e. pure variation explained by explanatory variables / (total inertia – variation explained by covariables)). Note that “pure variation explained by explanatory variables” is the amount of variation explanatory variables explain after removing the effect of covariables. More on this e.g. in Legendre & Legendre (2012), page 651.

In case of adjusted *R2*, the situation is also different, and slightly more complex. In both programs adjusted *R2* is calculated using Ezekiel’s formula *adjR2* = 1 – (1 – *R2*)*(*n* – 1)/(*n* – *m* – 1), where *n* is number of samples in the dataset, and *m* is number of variables (note that if some of these variables is a factor, it brings “number of factor levels – 1” degrees of freedom).

In *vegan*, adjusted *R2* of explained variation after removing the effect of covariables is calculated as fraction [a] in variance partitioning between explanatory variables and covariables, where [a + b] is variation explained by explanatory variables, and [b + c] variation explained by covariables. Then, *adjR2*[a] = *adjR2*[a+b+c] – *adjR2*[b+c], i.e. *adjR2* of model containing both explanatory variables and covariables minus *adjR2* of model containing only covariables. If *p* is the number of covariables (or number of degrees of freedom relevant to covariables, in case that some of them is a factor), than

*adjR2*[a] = *adjR2*[a+b+c] – *adjR2*[b+c] = (1 – (1 – *R2*[a+b+c])*(*n* – 1)/(*n* – *m* – *p* – 1)) – (1 – (1 – *R2*[b+c])*(*n* – 1)/(*n* – *p* – 1))

In C5, the adjusted *R2* explained by explanatory variables after considering covariables is not calculated by subtracting *adjR2* of the covariables from the full model, but directly by adjusting the original *R2* values using modified number of degrees of freedom in the formula. Hence,

*adjR2* = 1 – (1 – *R2*)*(*n* – *p* – 1)/(*n* – *m* – *p* – 1)

which means that the number of degrees for covariables is subtracted at both numerator and denominator of the right hand formula side.

Let’s see the difference on the real example. We will use Vltava dataset, and build the pRDA in both *vegan* and C5.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 | library (vegan) # download Vltava dataset from http://www.davidzeleny.net/anadat-r/ vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1) vltava.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-env.txt') vltava.spe.hell <- decostand (vltava.spe, method = 'hell') # Build partial RDA model with pRDA <- rda (vltava.spe.hell ~ pH + ASPSSW + Condition (SOILDPT + SLOPE), data = vltava.env) pRDA # Call: rda(formula = vltava.spe.hell ~ pH + ASPSSW + Condition(SOILDPT + SLOPE), # data = vltava.env) # # Inertia Proportion Rank # Total 0.71576 1.00000 # Conditional 0.04264 0.05957 2 # Constrained 0.06245 0.08725 2 # Unconstrained 0.61067 0.85318 92 # Inertia is variance # # Eigenvalues for constrained axes: # RDA1 RDA2 # 0.03810 0.02435 # # Eigenvalues for unconstrained axes: # PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 # 0.06648 0.06138 0.04468 0.03336 0.02994 0.02693 0.02328 0.02114 # (Showed only 8 of all 92 unconstrained eigenvalues) RsquareAdj (pRDA) # $r.squared # [1] 0.0872482 # # $adj.r.squared # [1] 0.07016251 # First, let's define variables with explained variations: constrained <- pRDA$CCA$tot.chi # 0.06244857 - variation explained by # explanatory variables after removing the effect of covariables conditional <- pRDA$pCCA$tot.chi # 0.04263834 - variation explained by # covariables total <- pRDA$tot.chi # 0.7157577 - total variation of the compositional dataset # In R, the R2 is calculated as constrained variation / total inertia: constrained / total # [1] 0.0872482 # and adjusted R2 is calculating by subtracting adjR2 explained by covariables # from adjR2 explained by both explanatory variables and covariables: R2abc <- (constrained + conditional) / total R2bc <- conditional / total n <- nrow (vltava.spe) # number of samples in the dataset m <- 2 # number of explanatory variables p <- 2 # number of covariables adjR2abc <- 1 - (1 - R2abc)*(n - 1)/(n - m - p - 1) adjR2bc <- 1 - (1 - R2bc)*(n - 1)/(n - p - 1) adjR2a <- adjR2abc - adjR2bc adjR2a # [1] 0.07016251 |

In CANOCO 5, if the same dataset and the same variables and covariables are used in the same pRDA analysis on Hellinger transformed data, the results are different (if you have CANOCO 5 installed on your computer, download and open this file):

Analysis 'Constrained-partial' Method: partial RDA Partial variation is 64.61974, explanatory variables account for9.3%(adjusted explained variation is7.3%) Summary Table: Statistic Axis 1 Axis 2 Axis 3 Axis 4 Eigenvalues 0.0532 0.0340 0.0929 0.0858 Explained variation (cumulative) 5.66 9.28 19.15 28.27 Pseudo-canonical correlation 0.7624 0.6491 0.0000 0.0000 Explained fitted variation (cumulative) 61.01 100.00 Permutation Test Results: On All Axes pseudo-F=4.7, P=0.001

As you can see, R2 is 9.3%, while adjR2 is 7.3%, in contrast to *vegan*‘s 8.7% and 7.0% for R2 and adjR2, respectively. The sum of eigenvalues for the two constrained axes is 0.0532+0.0340 = 0.0872, the same value as the R2 in *vegan*, but in the header, C5 reports higher values. These are calculated in the following way (continuing the R code above):

1 2 3 4 5 6 7 8 | # R2 in CANOCO as R2.C5 <- constrained / (total - conditional) R2.C5 # [1] 0.09277488 - after rounding 9.3% # adjR2 in CANOCO as 1 - (1 - R2.C5)*(n - p - 1)/(n - m - p - 1) # [1] 0.0730526 - after rounding 7.3 |

Note that the other parameters are identical in both vegan and CANOCO (eigenvalues, F-value and also the significance).

In the end, I wrote this quick-and-dirty hack of the *vegan*‘s RsquareAdj function, which includes argument *type* with the values either “partial” (vegan’s version) or “semipartial” (C5 version):

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | RsquareAdj2.rda <- function (x, type = 'semipartial', ...) { m <- x$CCA$qrank n <- nrow(x$CCA$u) if (is.null(x$pCCA)) { R2 <- x$CCA$tot.chi/x$tot.chi radj <- RsquareAdj(R2, n, m) } else if (type == 'semipartial') { R2 <- x$CCA$tot.chi/x$tot.chi R2p <- x$pCCA$tot.chi/x$tot.chi p <- x$pCCA$rank radj <- RsquareAdj(R2 + R2p, n, m + p) - RsquareAdj(R2p, n, p) } else if (type == 'partial'){ R2 <- x$CCA$tot.chi/(x$tot.chi - x$pCCA$tot.chi) p <- x$pCCA$rank radj <- 1 - (1 - R2)*(n - p - 1)/(n - m - p - 1) if (any(na <- m >= n - 1)) radj[na] <- NA } list(r.squared = R2, adj.r.squared = radj, type = type) } |

Quick check whether it works:

1 2 | RsquareAdj2.rda (pRDA, type = 'semipartial') # 0.08724 vs 0.0701 RsquareAdj2.rda (pRDA, type = 'partial') # 0.09277 vs 0.07305 |

Looks like it does. Note, however, if you want to use it, it needs to be further tested (e.g. with datasets of environmental variables include also factors as explanatory variables or covariables).