Recently I used R package *vegan* to recalculate several partial RDA analyses, which were originally calculated in CANOCO 5 (C5). Partial RDA (pRDA) is a constrained ordination which contains both explanatory variables and covariables. To my surprise, the values of explained variation (*R2*) and adjusted explained variation (*adjR2*) differ between *vegan* and C5, with values in *vegan* being systematically lower. Note that this difference will occur only in case of partial constrained ordination – if you do not have covariables, then *R2* and *adjR2* in vegan will be the same as *R2* and *adjR2* in C5.

After some searching and a help from Petr Šmilauer, I realized that *vegan* and C5 differ in the way how they calculate *R2* and *adjR2*. In R, *vegan* is calculating so called *semipartial R2*, in which the denominator is total inertia only (i.e. pure variation explained by explanatory variables / total inertia). On the other hand, C5 calculates so called *partial R2*, in which variation explained by explanatory variables is divided by total inertia without the effect of covariables (i.e. pure variation explained by explanatory variables / (total inertia – variation explained by covariables)). Note that “pure variation explained by explanatory variables” is the amount of variation explanatory variables explain after removing the effect of covariables. More on this e.g. in Legendre & Legendre (2012), page 651.

In case of adjusted *R2*, the situation is also different, and slightly more complex. In both programs adjusted *R2* is calculated using Ezekiel’s formula *adjR2* = 1 – (1 – *R2*)*(*n* – 1)/(*n* – *m* – 1), where *n* is number of samples in the dataset, and *m* is number of variables (note that if some of these variables is a factor, it brings “number of factor levels – 1” degrees of freedom).

In *vegan*, adjusted *R2* of explained variation after removing the effect of covariables is calculated as fraction [a] in variance partitioning between explanatory variables and covariables, where [a + b] is variation explained by explanatory variables, and [b + c] variation explained by covariables. Then, *adjR2*[a] = *adjR2*[a+b+c] – *adjR2*[b+c], i.e. *adjR2* of model containing both explanatory variables and covariables minus *adjR2* of model containing only covariables. If *p* is the number of covariables (or number of degrees of freedom relevant to covariables, in case that some of them is a factor), than

*adjR2*[a] = *adjR2*[a+b+c] – *adjR2*[b+c] = (1 – (1 – *R2*[a+b+c])*(*n* – 1)/(*n* – *m* – *p* – 1)) – (1 – (1 – *R2*[b+c])*(*n* – 1)/(*n* – *p* – 1))

In C5, the adjusted *R2* explained by explanatory variables after considering covariables is not calculated by subtracting *adjR2* of the covariables from the full model, but directly by adjusting the original *R2* values using modified number of degrees of freedom in the formula. Hence,

*adjR2* = 1 – (1 – *R2*)*(*n* – *p* – 1)/(*n* – *m* – *p* – 1)

which means that the number of degrees for covariables is subtracted at both numerator and denominator of the right hand formula side.

Let’s see the difference on the real example. We will use Vltava dataset, and build the pRDA in both *vegan* and C5.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
| library (vegan)
# download Vltava dataset from http://www.davidzeleny.net/anadat-r/
vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1)
vltava.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-env.txt')
vltava.spe.hell <- decostand (vltava.spe, method = 'hell')
# Build partial RDA model with
pRDA <- rda (vltava.spe.hell ~ pH + ASPSSW + Condition (SOILDPT + SLOPE),
data = vltava.env)
pRDA
# Call: rda(formula = vltava.spe.hell ~ pH + ASPSSW + Condition(SOILDPT + SLOPE),
# data = vltava.env)
#
# Inertia Proportion Rank
# Total 0.71576 1.00000
# Conditional 0.04264 0.05957 2
# Constrained 0.06245 0.08725 2
# Unconstrained 0.61067 0.85318 92
# Inertia is variance
#
# Eigenvalues for constrained axes:
# RDA1 RDA2
# 0.03810 0.02435
#
# Eigenvalues for unconstrained axes:
# PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
# 0.06648 0.06138 0.04468 0.03336 0.02994 0.02693 0.02328 0.02114
# (Showed only 8 of all 92 unconstrained eigenvalues)
RsquareAdj (pRDA)
# $r.squared
# [1] 0.0872482
#
# $adj.r.squared
# [1] 0.07016251
# First, let's define variables with explained variations:
constrained <- pRDA$CCA$tot.chi # 0.06244857 - variation explained by
# explanatory variables after removing the effect of covariables
conditional <- pRDA$pCCA$tot.chi # 0.04263834 - variation explained by
# covariables
total <- pRDA$tot.chi # 0.7157577 - total variation of the compositional dataset
# In R, the R2 is calculated as constrained variation / total inertia:
constrained / total
# [1] 0.0872482
# and adjusted R2 is calculating by subtracting adjR2 explained by covariables
# from adjR2 explained by both explanatory variables and covariables:
R2abc <- (constrained + conditional) / total
R2bc <- conditional / total
n <- nrow (vltava.spe) # number of samples in the dataset
m <- 2 # number of explanatory variables
p <- 2 # number of covariables
adjR2abc <- 1 - (1 - R2abc)*(n - 1)/(n - m - p - 1)
adjR2bc <- 1 - (1 - R2bc)*(n - 1)/(n - p - 1)
adjR2a <- adjR2abc - adjR2bc
adjR2a
# [1] 0.07016251 |

library (vegan)
# download Vltava dataset from http://www.davidzeleny.net/anadat-r/
vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1)
vltava.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-env.txt')
vltava.spe.hell <- decostand (vltava.spe, method = 'hell')
# Build partial RDA model with
pRDA <- rda (vltava.spe.hell ~ pH + ASPSSW + Condition (SOILDPT + SLOPE),
data = vltava.env)
pRDA
# Call: rda(formula = vltava.spe.hell ~ pH + ASPSSW + Condition(SOILDPT + SLOPE),
# data = vltava.env)
#
# Inertia Proportion Rank
# Total 0.71576 1.00000
# Conditional 0.04264 0.05957 2
# Constrained 0.06245 0.08725 2
# Unconstrained 0.61067 0.85318 92
# Inertia is variance
#
# Eigenvalues for constrained axes:
# RDA1 RDA2
# 0.03810 0.02435
#
# Eigenvalues for unconstrained axes:
# PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8
# 0.06648 0.06138 0.04468 0.03336 0.02994 0.02693 0.02328 0.02114
# (Showed only 8 of all 92 unconstrained eigenvalues)
RsquareAdj (pRDA)
# $r.squared
# [1] 0.0872482
#
# $adj.r.squared
# [1] 0.07016251
# First, let's define variables with explained variations:
constrained <- pRDA$CCA$tot.chi # 0.06244857 - variation explained by
# explanatory variables after removing the effect of covariables
conditional <- pRDA$pCCA$tot.chi # 0.04263834 - variation explained by
# covariables
total <- pRDA$tot.chi # 0.7157577 - total variation of the compositional dataset
# In R, the R2 is calculated as constrained variation / total inertia:
constrained / total
# [1] 0.0872482
# and adjusted R2 is calculating by subtracting adjR2 explained by covariables
# from adjR2 explained by both explanatory variables and covariables:
R2abc <- (constrained + conditional) / total
R2bc <- conditional / total
n <- nrow (vltava.spe) # number of samples in the dataset
m <- 2 # number of explanatory variables
p <- 2 # number of covariables
adjR2abc <- 1 - (1 - R2abc)*(n - 1)/(n - m - p - 1)
adjR2bc <- 1 - (1 - R2bc)*(n - 1)/(n - p - 1)
adjR2a <- adjR2abc - adjR2bc
adjR2a
# [1] 0.07016251

In CANOCO 5, if the same dataset and the same variables and covariables are used in the same pRDA analysis on Hellinger transformed data, the results are different (if you have CANOCO 5 installed on your computer, download and open this file):

Analysis 'Constrained-partial'
Method: partial RDA
Partial variation is 64.61974, explanatory variables account for ** 9.3%**
(adjusted explained variation is **7.3%**)
Summary Table:
Statistic Axis 1 Axis 2 Axis 3 Axis 4
Eigenvalues 0.0532 0.0340 0.0929 0.0858
Explained variation (cumulative) 5.66 9.28 19.15 28.27
Pseudo-canonical correlation 0.7624 0.6491 0.0000 0.0000
Explained fitted variation (cumulative) 61.01 100.00
Permutation Test Results:
On All Axes pseudo-F=4.7, P=0.001

As you can see, R2 is 9.3%, while adjR2 is 7.3%, in contrast to *vegan*‘s 8.7% and 7.0% for R2 and adjR2, respectively. The sum of eigenvalues for the two constrained axes is 0.0532+0.0340 = 0.0872, the same value as the R2 in *vegan*, but in the header, C5 reports higher values. These are calculated in the following way (continuing the R code above):

1
2
3
4
5
6
7
8
| # R2 in CANOCO as
R2.C5 <- constrained / (total - conditional)
R2.C5
# [1] 0.09277488 - after rounding 9.3%
# adjR2 in CANOCO as
1 - (1 - R2.C5)*(n - p - 1)/(n - m - p - 1)
# [1] 0.0730526 - after rounding 7.3 |

# R2 in CANOCO as
R2.C5 <- constrained / (total - conditional)
R2.C5
# [1] 0.09277488 - after rounding 9.3%
# adjR2 in CANOCO as
1 - (1 - R2.C5)*(n - p - 1)/(n - m - p - 1)
# [1] 0.0730526 - after rounding 7.3

Note that the other parameters are identical in both vegan and CANOCO (eigenvalues, F-value and also the significance).

In the end, I wrote this quick-and-dirty hack of the *vegan*‘s RsquareAdj function, which includes argument *type* with the values either “partial” (vegan’s version) or “semipartial” (C5 version):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
| RsquareAdj2.rda <- function (x, type = 'semipartial', ...)
{
m <- x$CCA$qrank
n <- nrow(x$CCA$u)
if (is.null(x$pCCA)) {
R2 <- x$CCA$tot.chi/x$tot.chi
radj <- RsquareAdj(R2, n, m)
}
else if (type == 'semipartial') {
R2 <- x$CCA$tot.chi/x$tot.chi
R2p <- x$pCCA$tot.chi/x$tot.chi
p <- x$pCCA$rank
radj <- RsquareAdj(R2 + R2p, n, m + p) - RsquareAdj(R2p, n, p)
} else if (type == 'partial'){
R2 <- x$CCA$tot.chi/(x$tot.chi - x$pCCA$tot.chi)
p <- x$pCCA$rank
radj <- 1 - (1 - R2)*(n - p - 1)/(n - m - p - 1)
if (any(na <- m >= n - 1)) radj[na] <- NA
}
list(r.squared = R2, adj.r.squared = radj, type = type)
} |

RsquareAdj2.rda <- function (x, type = 'semipartial', ...)
{
m <- x$CCA$qrank
n <- nrow(x$CCA$u)
if (is.null(x$pCCA)) {
R2 <- x$CCA$tot.chi/x$tot.chi
radj <- RsquareAdj(R2, n, m)
}
else if (type == 'semipartial') {
R2 <- x$CCA$tot.chi/x$tot.chi
R2p <- x$pCCA$tot.chi/x$tot.chi
p <- x$pCCA$rank
radj <- RsquareAdj(R2 + R2p, n, m + p) - RsquareAdj(R2p, n, p)
} else if (type == 'partial'){
R2 <- x$CCA$tot.chi/(x$tot.chi - x$pCCA$tot.chi)
p <- x$pCCA$rank
radj <- 1 - (1 - R2)*(n - p - 1)/(n - m - p - 1)
if (any(na <- m >= n - 1)) radj[na] <- NA
}
list(r.squared = R2, adj.r.squared = radj, type = type)
}

Quick check whether it works:

1
2
| RsquareAdj2.rda (pRDA, type = 'semipartial') # 0.08724 vs 0.0701
RsquareAdj2.rda (pRDA, type = 'partial') # 0.09277 vs 0.07305 |

RsquareAdj2.rda (pRDA, type = 'semipartial') # 0.08724 vs 0.0701
RsquareAdj2.rda (pRDA, type = 'partial') # 0.09277 vs 0.07305

Looks like it does. Note, however, if you want to use it, it needs to be further tested (e.g. with datasets of environmental variables include also factors as explanatory variables or covariables).