
Adjusted R2 in partial constrained ordination: the difference between R (vegan) and CANOCO 5

Recently I used the R package vegan to recalculate several partial RDA analyses originally calculated in CANOCO 5 (C5). Partial RDA (pRDA) is a constrained ordination which contains both explanatory variables and covariables. To my surprise, the values of explained variation (R2) and adjusted explained variation (adjR2) differ between vegan and C5, with the values in vegan being systematically lower. Note that this difference occurs only in the case of partial constrained ordination – if you have no covariables, R2 and adjR2 in vegan will be the same as R2 and adjR2 in C5.

After some searching and help from Petr Šmilauer, I realized that vegan and C5 differ in the way they calculate R2 and adjR2. In R, vegan calculates the so-called semipartial R2, in which the denominator is the total inertia only (i.e. pure variation explained by explanatory variables / total inertia). On the other hand, C5 calculates the so-called partial R2, in which the variation explained by explanatory variables is divided by the total inertia without the effect of covariables (i.e. pure variation explained by explanatory variables / (total inertia – variation explained by covariables)). Note that “pure variation explained by explanatory variables” is the amount of variation the explanatory variables explain after removing the effect of covariables. More on this e.g. in Legendre & Legendre (2012), page 651.
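To see the two definitions side by side, here is a minimal sketch with made-up inertia values (the object names expl, cova and tot are mine, standing for the pure effect of explanatory variables, the effect of covariables and the total inertia):

# made-up inertia values, only for illustration
expl <- 0.06   # pure variation explained by explanatory variables (covariables removed)
cova <- 0.04   # variation explained by covariables
tot  <- 0.70   # total inertia of the compositional dataset

expl / tot           # semipartial R2, as reported by vegan
expl / (tot - cova)  # partial R2, as reported by CANOCO 5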

In the case of adjusted R2, the situation is also different, and slightly more complex. In both programs the adjusted R2 is calculated using Ezekiel’s formula adjR2 = 1 – (1 – R2)*(n – 1)/(n – m – 1), where n is the number of samples in the dataset and m is the number of explanatory variables (note that if any of these variables is a factor, it contributes “number of factor levels – 1” degrees of freedom).
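For reference, Ezekiel’s formula can be wrapped into a small helper function (a sketch of mine, not a function from either program):

# Ezekiel's adjustment: R2 = unadjusted R2, n = number of samples,
# m = number of degrees of freedom used by the (explanatory) variables
adj.ezekiel <- function (R2, n, m) 1 - (1 - R2)*(n - 1)/(n - m - 1)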

In vegan, the adjusted R2 of variation explained after removing the effect of covariables is calculated as fraction [a] in the variance partitioning between explanatory variables and covariables, where [a + b] is the variation explained by explanatory variables and [b + c] the variation explained by covariables. Then adjR2[a] = adjR2[a+b+c] – adjR2[b+c], i.e. the adjR2 of the model containing both explanatory variables and covariables minus the adjR2 of the model containing only covariables. If p is the number of covariables (or the number of degrees of freedom relevant to covariables, in case some of them are factors), then

adjR2[a] = adjR2[a+b+c] – adjR2[b+c] = (1 – (1 – R2[a+b+c])*(n – 1)/(n – m – p – 1)) – (1 – (1 – R2[b+c])*(n – 1)/(n – p – 1))

In C5, the adjusted R2 explained by explanatory variables after accounting for covariables is not calculated by subtracting the adjR2 of the covariables-only model from that of the full model, but directly, by adjusting the partial R2 with a modified number of degrees of freedom in the formula. Hence,

adjR2 = 1 – (1 – R2)*(n – p – 1)/(n – m – p – 1)

which means that the number of degrees of freedom for covariables is subtracted in both the numerator and the denominator on the right-hand side of the formula.
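Put side by side, the two adjustments look like this – a sketch reusing the made-up inertia values (expl, cova, tot) and the adj.ezekiel helper defined above, with made-up n, m and p as well:

n <- 100; m <- 2; p <- 2  # made-up numbers of samples, explanatory variables and covariables

# vegan: difference of two Ezekiel-adjusted R2 values (semipartial adjR2)
adj.ezekiel ((expl + cova) / tot, n, m + p) - adj.ezekiel (cova / tot, n, p)

# CANOCO 5: partial R2 adjusted with the covariable degrees of freedom removed (partial adjR2)
1 - (1 - expl / (tot - cova)) * (n - p - 1)/(n - m - p - 1)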

Let’s see the difference using a real example. We will use the Vltava dataset and build the pRDA in both vegan and C5.

library (vegan)
# download Vltava dataset from http://www.davidzeleny.net/anadat-r/
vltava.spe <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-spe.txt', row.names = 1)
vltava.env <- read.delim ('http://www.davidzeleny.net/anadat-r/data-download/vltava-env.txt')
vltava.spe.hell <- decostand (vltava.spe, method = 'hell')
 
# Build partial RDA model with pH and ASPSSW as explanatory variables
# and SOILDPT and SLOPE as covariables
pRDA <- rda (vltava.spe.hell ~ pH + ASPSSW + Condition (SOILDPT + SLOPE),
             data = vltava.env)
 
pRDA
# Call: rda(formula = vltava.spe.hell ~ pH + ASPSSW + Condition(SOILDPT + SLOPE),
#           data = vltava.env)
#
# Inertia Proportion Rank
# Total         0.71576    1.00000
# Conditional   0.04264    0.05957    2
# Constrained   0.06245    0.08725    2
# Unconstrained 0.61067    0.85318   92
# Inertia is variance
#
# Eigenvalues for constrained axes:
#   RDA1    RDA2
# 0.03810 0.02435
#
# Eigenvalues for unconstrained axes:
#   PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8
# 0.06648 0.06138 0.04468 0.03336 0.02994 0.02693 0.02328 0.02114
# (Showed only 8 of all 92 unconstrained eigenvalues)
 
RsquareAdj (pRDA)
# $r.squared
# [1] 0.0872482
#
# $adj.r.squared
# [1] 0.07016251
 
# First, let's define variables with explained variations:
constrained <- pRDA$CCA$tot.chi  # 0.06244857 - variation explained by
# explanatory variables after removing the effect of covariables
conditional <- pRDA$pCCA$tot.chi # 0.04263834 - variation explained by
# covariables
total <- pRDA$tot.chi # 0.7157577 - total variation of the compositional dataset
 
# In R, the R2 is calculated as constrained variation / total inertia:
constrained / total
# [1] 0.0872482
 
# and adjusted R2 is calculated by subtracting adjR2 explained by covariables
# from adjR2 explained by both explanatory variables and covariables:
R2abc <- (constrained + conditional) / total
R2bc <- conditional / total
n <- nrow (vltava.spe) # number of samples in the dataset
m <- 2 # number of explanatory variables
p <- 2 # number of covariables
 
adjR2abc <- 1 - (1 - R2abc)*(n - 1)/(n - m - p - 1)
adjR2bc <- 1 - (1 - R2bc)*(n - 1)/(n - p - 1)
adjR2a <- adjR2abc - adjR2bc
adjR2a
# [1] 0.07016251

In CANOCO 5, if the same dataset, explanatory variables and covariables are used in the same pRDA analysis on Hellinger-transformed data, the results are different (if you have CANOCO 5 installed on your computer, download and open this file):

Analysis 'Constrained-partial'
Method: partial RDA
Partial variation is 64.61974, explanatory variables account for   9.3%
(adjusted explained variation is   7.3%)

Summary Table:
  Statistic    Axis 1    Axis 2    Axis 3    Axis 4
Eigenvalues    0.0532    0.0340    0.0929    0.0858
Explained variation (cumulative)    5.66    9.28    19.15    28.27
Pseudo-canonical correlation    0.7624    0.6491    0.0000    0.0000
Explained fitted variation (cumulative)    61.01    100.00        

Permutation Test Results:
  On All Axes    pseudo-F=4.7, P=0.001

As you can see, R2 is 9.3% and adjR2 is 7.3%, in contrast to vegan's 8.7% and 7.0% for R2 and adjR2, respectively. The sum of the eigenvalues of the two constrained axes is 0.0532 + 0.0340 = 0.0872, the same value as the R2 in vegan, but in the header C5 reports higher values. These are calculated in the following way (continuing the R code above):

# R2 in CANOCO is calculated as
R2.C5 <- constrained / (total - conditional)
R2.C5
# [1] 0.09277488 - after rounding 9.3%
 
# adjR2 in CANOCO is calculated as
1 - (1 - R2.C5)*(n - p - 1)/(n - m - p - 1)
# [1] 0.0730526 - after rounding 7.3%

Note that the other results are equivalent in both vegan and CANOCO: the eigenvalues (C5 reports them as proportions of total variation), the pseudo-F value and also the significance.
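If you want to check this on the vegan side, you can run the permutation test of the pRDA model directly (a standard vegan call; the exact P value will vary with the random permutations):

anova (pRDA, permutations = 999)
# the constrained fraction should give a pseudo-F of about 4.7,
# matching the permutation test reported by CANOCO 5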

In the end, I wrote this quick-and-dirty hack of vegan's RsquareAdj function, which includes the argument type with values either “semipartial” (vegan's version) or “partial” (C5's version):

RsquareAdj2.rda <- function (x, type = 'semipartial', ...)
{
  m <- x$CCA$qrank          # degrees of freedom of explanatory variables
  n <- nrow (x$CCA$u)       # number of samples
  if (is.null (x$pCCA)) {   # no covariables: ordinary RsquareAdj
    R2 <- x$CCA$tot.chi/x$tot.chi
    radj <- RsquareAdj (R2, n, m)
  } else if (type == 'semipartial') {  # vegan's way: difference of two adjusted R2 values
    R2 <- x$CCA$tot.chi/x$tot.chi
    R2p <- x$pCCA$tot.chi/x$tot.chi
    p <- x$pCCA$rank        # degrees of freedom of covariables
    radj <- RsquareAdj (R2 + R2p, n, m + p) - RsquareAdj (R2p, n, p)
  } else if (type == 'partial') {      # C5's way: modified degrees of freedom
    R2 <- x$CCA$tot.chi/(x$tot.chi - x$pCCA$tot.chi)
    p <- x$pCCA$rank
    radj <- 1 - (1 - R2)*(n - p - 1)/(n - m - p - 1)
    if (any (na <- m >= n - 1)) radj[na] <- NA
  }
  list (r.squared = R2, adj.r.squared = radj, type = type)
}

Quick check whether it works:

RsquareAdj2.rda (pRDA, type = 'semipartial')  # 0.08724 vs 0.0701
RsquareAdj2.rda (pRDA, type = 'partial')      # 0.09277 vs 0.07305

Looks like it does. Note, however, that if you want to use it, it needs to be tested further (e.g. with datasets in which the environmental variables include factors as explanatory variables or covariables).