3. Pairwise similarity/dissimilarity metrics
Maxime Lenormand, Boris Leroy and Pierre Denelle
2025-01-17
The function `similarity()` computes well-known and customized pairwise similarity metrics based on a co-occurrence matrix such as `vegemat`. In the example below, the Simpson similarity index is computed between each pair of sites.
```r
sim <- similarity(vegemat, metric = "Simpson", formula = NULL, method = "prodmat")
sim[1:10,]
## Data.frame of similarity between sites
## - Total number of sites: 715
## - Total number of species: 3697
## - Number of rows: 255255
## - Number of similarity metrics: 1
##
##
## Site1 Site2 Simpson
## 2 35 36 0.9767442
## 3 35 37 0.9689922
## 4 35 38 0.9457364
## 5 35 39 0.9457364
## 6 35 84 0.2790698
## 7 35 85 0.9147287
## 8 35 86 1.0000000
## 9 35 87 0.9922481
## 10 35 88 0.9844961
## 11 35 89 0.6821705
```
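The number of rows reported in the printout is consistent with each unordered pair of distinct sites appearing exactly once: 715 sites give 715 times 714 divided by 2 pairs. A quick check in base R:

```r
n_sites <- 715                              # total number of sites reported above
n_pairs <- n_sites * (n_sites - 1) / 2      # unordered pairs of distinct sites
n_pairs                                     # 255255
choose(n_sites, 2)                          # same count with base R's choose()
```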
The resulting `data.frame` is stored in a `bioregion.pairwise.metric` object containing the Simpson similarity between each pair of sites. The function `similarity()` can handle three types of metrics: metrics based on `abc`, metrics based on `ABC`, and one metric based on the Euclidean distance.
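As a reminder of what the third type measures, the Euclidean distance between two sites is the square root of the summed squared differences of their species values. The sketch below uses a small hypothetical abundance matrix and base R's `dist()`; how `similarity()` turns this distance into a similarity is not shown here.

```r
# Hypothetical abundance matrix: 3 sites (rows) x 4 species (columns)
M <- rbind(site1 = c(5, 0, 3, 2),
           site2 = c(1, 4, 3, 0),
           site3 = c(0, 0, 1, 1))

# Euclidean distance between site1 and site2, computed by hand
d12 <- sqrt(sum((M["site1", ] - M["site2", ])^2))
d12                                          # 6

# Distances for all pairs of sites with base R
d_all <- as.matrix(dist(M, method = "euclidean"))
d_all
```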
The first kind of metrics, such as Jaccard, the turnover component of Jaccard, Simpson or Sorensen, are based on presence data, with `a` the number of species shared by a pair of sites, `b` the number of species only present in the first site and `c` the number of species only present in the second site.

Two methods can be used to compute the `abc`-based metrics. The first method is based on a matrix product (performed with the `tcrossprod()` function from the R package Matrix). This method is fast but memory-hungry. The second method is based on a three-loop function coded in C++ and largely inspired by the `bcdist()` function from the R package ecodist (version 2.0.7). It is slower than the matrix product but can handle co-occurrence matrices with a large number of sites and/or species.
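The matrix-product approach can be sketched in base R on a small dense presence/absence matrix: `tcrossprod(M)` gives `a` for every pair of sites, and `b` and `c` follow from the species richness of each site. This is a minimal illustration assuming a dense base R matrix (bioregion relies on the version from the Matrix package, and its internals may differ); all object names are hypothetical. Note that it builds full sites × sites matrices, which is why this approach is memory-hungry.

```r
# Hypothetical presence/absence matrix: 3 sites (rows) x 5 species (columns)
M <- matrix(c(1, 1, 0, 1, 0,
              1, 0, 1, 1, 0,
              0, 0, 0, 1, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("s1", "s2", "s3"), paste0("sp", 1:5)))

a_mat <- tcrossprod(M)        # a[i, j]: species shared by sites i and j
richness <- rowSums(M)        # number of species in each site
b_mat <- richness - a_mat     # b[i, j] = richness[i] - a[i, j]: only in site i
c_mat <- t(b_mat)             # c[i, j] = richness[j] - a[i, j]: only in site j

# Keep each unordered pair of sites once (upper triangle) in a long data.frame
idx <- which(upper.tri(a_mat), arr.ind = TRUE)
abc <- data.frame(Site1 = rownames(M)[idx[, 1]],
                  Site2 = rownames(M)[idx[, 2]],
                  a = a_mat[idx], b = b_mat[idx], c = c_mat[idx])

# Two abc-based similarities derived from the components
abc$Jaccard <- abc$a / (abc$a + abc$b + abc$c)
abc$Simpson <- abc$a / (abc$a + pmin(abc$b, abc$c))
abc
```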
The second kind of metrics, such as Bray-Curtis and the turnover component of Bray-Curtis, are based on abundance data, with `A` the sum of the lesser abundances of the species shared by a pair of sites. `B` and `C` are the total numbers of specimens counted in the first and the second site, respectively, minus `A`. Only the three-loop method is available for the `ABC`-based metrics.
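To make the three-loop idea concrete, here is a plain R sketch using a hypothetical helper `ABC_loops()` (not bioregion's C++ code). It loops over the first site, the second site and the species to accumulate `A`, then derives `B` and `C` from each site's total abundance. Only one value per pair is stored, rather than full sites × sites matrices, which is what makes a loop-based approach lighter on memory at the cost of speed.

```r
# Hypothetical helper (not part of bioregion): A, B and C for every pair of
# sites (rows) of an abundance matrix M, computed with three nested loops
ABC_loops <- function(M) {
  n <- nrow(M)
  n_pairs <- n * (n - 1) / 2
  site1 <- site2 <- integer(n_pairs)
  A <- B <- C <- numeric(n_pairs)
  totals <- unname(rowSums(M))           # total abundance of each site
  k <- 0
  for (i in seq_len(n - 1)) {            # loop 1: first site
    for (j in (i + 1):n) {               # loop 2: second site
      k <- k + 1
      shared <- 0
      for (s in seq_len(ncol(M))) {      # loop 3: species
        shared <- shared + min(M[i, s], M[j, s])
      }
      site1[k] <- i
      site2[k] <- j
      A[k] <- shared                     # sum of the lesser abundances
      B[k] <- totals[i] - shared         # specimens of site i not counted in A
      C[k] <- totals[j] - shared         # specimens of site j not counted in A
    }
  }
  data.frame(Site1 = site1, Site2 = site2, A = A, B = B, C = C)
}

# Hypothetical abundance matrix: 3 sites (rows) x 4 species (columns)
M <- rbind(c(5, 0, 3, 2),
           c(1, 4, 3, 0),
           c(0, 0, 1, 1))
ABC <- ABC_loops(M)
ABC$Bray <- 2 * ABC$A / (2 * ABC$A + ABC$B + ABC$C)   # Bray-Curtis similarity
ABC
```

For the first pair, `1 - Bray` equals the textbook Bray-Curtis dissimilarity `sum(abs(x - y)) / sum(x + y)`, that is 10 / 18.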
The main advantages of the `similarity()` function are that it can compute and return several metrics at once, allows the computation of customized metrics with the `formula` argument, and can return `a`, `b` and `c` and/or `A`, `B` and `C`. This last feature is particularly useful for computing similarity metrics on large co-occurrence matrices.
```r
sim <- similarity(vegemat,
                  metric = c("abc", "ABC", "Simpson", "Bray"),
                  formula = c("(b + c) / (a + b + c)", "(B + C) / (2*A + B + C)"))
sim[1:10,]
## Data.frame of similarity between sites
## - Total number of sites: 715
## - Total number of species: 3697
## - Number of rows: 255255
## - Number of similarity metrics: 4
##
##
## Site1 Site2 Simpson Bray a b c A B C
## 2 35 36 0.9767442 0.01901485 126 3 741 420 3 43333
## 3 35 37 0.9689922 0.03745203 125 4 534 366 57 18756
## 4 35 38 0.9457364 0.04025289 122 7 440 347 76 16471
## 5 35 39 0.9457364 0.09754761 122 7 501 356 67 6520
## 6 35 84 0.2790698 0.18757921 36 93 177 74 349 292
## 7 35 85 0.9147287 0.13256181 118 11 614 378 45 4902
## 8 35 86 1.0000000 0.02663928 129 0 753 415 8 30319
## 9 35 87 0.9922481 0.02332663 128 1 909 406 17 33981
## 10 35 88 0.9844961 0.02198536 127 2 812 395 28 35115
## 11 35 89 0.6821705 0.15954416 88 41 177 196 227 1838
## (b + c) / (a + b + c) (B + C) / (2*A + B + C)
## 2 0.8551724 0.9809852
## 3 0.8114630 0.9625480
## 4 0.7855888 0.9597471
## 5 0.8063492 0.9024524
## 6 0.8823529 0.8124208
## 7 0.8411844 0.8674382
## 8 0.8537415 0.9733607
## 9 0.8766859 0.9766734
## 10 0.8650372 0.9780146
## 11 0.7124183 0.8404558
```
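Because the components are returned alongside the metrics, further metrics can be derived from them afterwards. As a check, the sketch below copies `a`, `b`, `c`, `A`, `B` and `C` for the first three pairs printed above into a small `data.frame` and recomputes the `Simpson` column as a / (a + min(b, c)), the `Bray` column as 2A / (2A + B + C), and the two customized formulas; these expressions reproduce the printed values. The column names `jac_dis` and `bc_dis` are hypothetical. The same expressions could presumably be applied to the columns of `sim` directly, assuming they can be accessed like those of a regular `data.frame`.

```r
# Components copied from the first three rows printed above (site 35 vs 36, 37, 38)
comp <- data.frame(Site1 = c(35, 35, 35), Site2 = c(36, 37, 38),
                   a = c(126, 125, 122), b = c(3, 4, 7), c = c(741, 534, 440),
                   A = c(420, 366, 347), B = c(3, 57, 76),
                   C = c(43333, 18756, 16471))

comp$Simpson <- comp$a / (comp$a + pmin(comp$b, comp$c))
comp$Bray    <- 2 * comp$A / (2 * comp$A + comp$B + comp$C)
comp$jac_dis <- (comp$b + comp$c) / (comp$a + comp$b + comp$c)      # 1st formula
comp$bc_dis  <- (comp$B + comp$C) / (2 * comp$A + comp$B + comp$C)  # 2nd formula
round(comp[, c("Simpson", "Bray", "jac_dis", "bc_dis")], 7)
```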
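The `dissimilarity()` function works in the same way, except that it computes the dissimilarity version of the available metrics. The functions `dissimilarity_to_similarity()` and `similarity_to_dissimilarity()` can be used to switch between similarity and dissimilarity metrics. Note that customized formulas are evaluated exactly as written by both `similarity()` and `dissimilarity()`, whereas `similarity_to_dissimilarity()` converts every metric column, customized ones included: compare the last column of `dissim1` and `dissim2` below.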
```r
sim <- similarity(vegemat, metric = c("abc","Simpson"), formula = "(b + c) / (a + b + c)")
sim[1:10,]
## Data.frame of similarity between sites
## - Total number of sites: 715
## - Total number of species: 3697
## - Number of rows: 255255
## - Number of similarity metrics: 2
##
##
## Site1 Site2 Simpson a b c (b + c) / (a + b + c)
## 2 35 36 0.9767442 126 3 741 0.8551724
## 3 35 37 0.9689922 125 4 534 0.8114630
## 4 35 38 0.9457364 122 7 440 0.7855888
## 5 35 39 0.9457364 122 7 501 0.8063492
## 6 35 84 0.2790698 36 93 177 0.8823529
## 7 35 85 0.9147287 118 11 614 0.8411844
## 8 35 86 1.0000000 129 0 753 0.8537415
## 9 35 87 0.9922481 128 1 909 0.8766859
## 10 35 88 0.9844961 127 2 812 0.8650372
## 11 35 89 0.6821705 88 41 177 0.7124183
```
```r
dissim1 <- dissimilarity(vegemat, metric = c("abc","Simpson"), formula = "(b + c) / (a + b + c)")
dissim1[1:10,]
## Data.frame of dissimilarity between sites
## - Total number of sites: 715
## - Total number of species: 3697
## - Number of rows: 255255
## - Number of dissimilarity metrics: 2
##
##
## Site1 Site2 Simpson a b c (b + c) / (a + b + c)
## 2 35 36 0.023255814 126 3 741 0.8551724
## 3 35 37 0.031007752 125 4 534 0.8114630
## 4 35 38 0.054263566 122 7 440 0.7855888
## 5 35 39 0.054263566 122 7 501 0.8063492
## 6 35 84 0.720930233 36 93 177 0.8823529
## 7 35 85 0.085271318 118 11 614 0.8411844
## 8 35 86 0.000000000 129 0 753 0.8537415
## 9 35 87 0.007751938 128 1 909 0.8766859
## 10 35 88 0.015503876 127 2 812 0.8650372
## 11 35 89 0.317829457 88 41 177 0.7124183
```
```r
dissim2 <- similarity_to_dissimilarity(sim)
dissim2[1:10,]
## Data.frame of dissimilarity between sites
## - Total number of sites: 715
## - Total number of species: 3697
## - Number of rows: 255255
## - Number of dissimilarity metrics: 2
##
##
## Site1 Site2 Simpson a b c (b + c) / (a + b + c)
## 2 35 36 0.023255814 126 3 741 0.1448276
## 3 35 37 0.031007752 125 4 534 0.1885370
## 4 35 38 0.054263566 122 7 440 0.2144112
## 5 35 39 0.054263566 122 7 501 0.1936508
## 6 35 84 0.720930233 36 93 177 0.1176471
## 7 35 85 0.085271318 118 11 614 0.1588156
## 8 35 86 0.000000000 129 0 753 0.1462585
## 9 35 87 0.007751938 128 1 909 0.1233141
## 10 35 88 0.015503876 127 2 812 0.1349628
## 11 35 89 0.317829457 88 41 177 0.2875817
```
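For the two metrics shown here, the printed values are consistent with `similarity_to_dissimilarity()` replacing each metric value x by 1 - x while leaving the components `a`, `b` and `c` untouched. The sketch below reproduces that relationship on the first three rows of `sim` printed earlier, using a hypothetical helper `flip_metrics()` and a hypothetical column name `jac` for the customized formula. It is an illustration under that assumption, checked only against the Simpson and customized columns, not the package's implementation.

```r
# Hypothetical helper (not part of bioregion): convert the metric columns of a
# data.frame with 1 - x, leaving site identifiers and a, b, c unchanged
flip_metrics <- function(df, metric_cols) {
  df[metric_cols] <- 1 - df[metric_cols]
  df
}

# First three rows of `sim` as printed above; `jac` holds (b + c) / (a + b + c)
sim3 <- data.frame(Site1 = c(35, 35, 35), Site2 = c(36, 37, 38),
                   Simpson = c(0.9767442, 0.9689922, 0.9457364),
                   a = c(126, 125, 122), b = c(3, 4, 7), c = c(741, 534, 440),
                   jac = c(0.8551724, 0.8114630, 0.7855888))
dissim3 <- flip_metrics(sim3, c("Simpson", "jac"))
dissim3
```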