# Compute similarity metrics between sites based on species composition

Source: `R/similarity.R`

This function creates a `data.frame` where each row provides one or several similarity metric(s) between each pair of sites from a co-occurrence `matrix` with sites as rows and species as columns.

## Arguments

- comat
a co-occurrence `matrix` with sites as rows and species as columns.

- metric
a `character` vector indicating which metrics to choose (see Details). Available options are *abc*, *ABC*, *Jaccard*, *Jaccardturn*, *Sorensen*, *Simpson*, *Bray*, *Brayturn* or *Euclidean*. If `"all"` is specified, then all metrics will be calculated. Can be set to `NULL` if `formula` is used.

- formula
a `character` vector with your own formula(s) based on the `a`, `b`, `c`, `A`, `B`, and `C` quantities (see Details). `formula` is set to `NULL` by default.

- method
a string indicating which method should be used to compute `abc` (see Details). The default, `method = "prodmat"`, is faster but can use a lot of memory; `method = "loops"` is slower but uses less memory.
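The trade-off between the two `method` values can be illustrated in base R. The sketch below is an assumption about what a matrix-product approach and a loop approach might look like, not the package's actual implementation; the toy matrix and all variable names are hypothetical.

```r
# Hypothetical presence/absence matrix: 3 sites (rows) x 4 species (columns).
comat <- rbind(Site1 = c(1, 1, 1, 0),
               Site2 = c(1, 0, 1, 1),
               Site3 = c(0, 0, 1, 1))
pa <- (comat > 0) * 1                 # force 0/1 values
pairs <- t(combn(nrow(pa), 2))        # one row per pair of sites (i < j)
richness <- unname(rowSums(pa))       # number of species per site

# Matrix-product style: an n x n matrix holds `a` for every pair at once,
# which is fast but needs memory proportional to n^2.
shared <- tcrossprod(pa)              # shared[i, j] = species present in both
a_prod <- shared[pairs]
b_prod <- richness[pairs[, 1]] - a_prod   # only in the first site
c_prod <- richness[pairs[, 2]] - a_prod   # only in the second site

# Loop style: one pair at a time, so no n x n matrix is kept in memory.
a_loop <- numeric(nrow(pairs))
for (k in seq_len(nrow(pairs))) {
  a_loop[k] <- sum(pa[pairs[k, 1], ] * pa[pairs[k, 2], ])
}

# Both styles give the same `a` for every pair.
data.frame(Site1 = rownames(pa)[pairs[, 1]],
           Site2 = rownames(pa)[pairs[, 2]],
           a = a_prod, b = b_prod, c = c_prod)
```

With many sites, the n x n intermediate matrix is what makes the first style memory-hungry, while the loop only ever holds two rows at a time.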

## Value

A `data.frame` with additional class `bioregion.pairwise.metric`, providing one or several similarity metric(s) between each pair of sites. The first two columns identify each pair of sites. There is one column per similarity metric provided in `metric` and `formula`, except for the metrics *abc* and *ABC*, which are each stored in three columns (one for each letter).

## Details

The presence-based metrics use three quantities: `a`, the number of species shared by a pair of sites; `b`, the number of species only present in the first site; and `c`, the number of species only present in the second site.

\(Jaccard = 1 - (b + c) / (a + b + c)\)

\(Jaccardturn = 1 - 2min(b, c) / (a + 2min(b, c))\) (Baselga 2012)

\(Sorensen = 1 - (b + c) / (2a + b + c)\)

\(Simpson = 1 - min(b, c) / (a + min(b, c))\)
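As a worked illustration, the four formulas above can be evaluated by hand in base R for a single pair of sites. The two presence/absence vectors are hypothetical, and the sketch does not call the package.

```r
# Hypothetical presence (1) / absence (0) of six species at two sites.
site1 <- c(1, 1, 1, 0, 0, 1)
site2 <- c(1, 0, 1, 1, 0, 0)

# The three quantities defined above. Naming a variable `c` is safe here:
# R still resolves calls such as c(1, 2) to the function.
a <- sum(site1 == 1 & site2 == 1)  # shared species
b <- sum(site1 == 1 & site2 == 0)  # only in the first site
c <- sum(site1 == 0 & site2 == 1)  # only in the second site

jaccard     <- 1 - (b + c) / (a + b + c)
jaccardturn <- 1 - 2 * min(b, c) / (a + 2 * min(b, c))
sorensen    <- 1 - (b + c) / (2 * a + b + c)
simpson     <- 1 - min(b, c) / (a + min(b, c))

round(c(Jaccard = jaccard, Jaccardturn = jaccardturn,
        Sorensen = sorensen, Simpson = simpson), 3)
```

For these two sites, a = 2, b = 2 and c = 1, so the Jaccard similarity is 0.4 and the Simpson similarity is 2/3.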

If abundance data are available, Bray-Curtis and its turnover component can also be computed with the following equations:

\(Bray = 1 - (B + C) / (2A + B + C)\)

\(Brayturn = 1 - min(B, C)/(A + min(B, C))\) (Baselga 2013)

where `A` is the sum of the lesser abundance values of the species shared by a pair of sites, and `B` and `C` are the total numbers of specimens counted at the first and second site, respectively, minus `A`.
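The abundance-based quantities can be sketched in base R in the same way. The abundance vectors are hypothetical, and the sketch assumes that `B` and `C` are the totals of the first and second site, respectively, minus `A`.

```r
# Hypothetical abundances of four species at two sites.
site1 <- c(10, 0, 3, 5)
site2 <- c(4, 2, 0, 8)

# A: sum of the lesser abundance of each species; species absent from one
# site contribute 0, so only shared species count.
A <- sum(pmin(site1, site2))
B <- sum(site1) - A   # assumed: total of the first site minus A
C <- sum(site2) - A   # assumed: total of the second site minus A

bray     <- 1 - (B + C) / (2 * A + B + C)
brayturn <- 1 - min(B, C) / (A + min(B, C))

c(A = A, B = B, C = C, Bray = bray, Brayturn = brayturn)
```

Here A = 9, B = 9 and C = 5, which gives a Bray-Curtis similarity of 0.5625.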

`formula` can be used to compute customized metrics with the terms `a`, `b`, `c`, `A`, `B`, and `C`. For example, `formula = c("1 - pmin(b,c) / (a + pmin(b,c))", "1 - (B + C) / (2*A + B + C)")` will compute the Simpson and Bray-Curtis similarity metrics, respectively. **Note that pmin is used in the Simpson formula because a, b, c, A, B and C are numeric vectors.**
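The difference between `pmin` and `min` matters as soon as there is more than one pair of sites. The sketch below uses a hypothetical table of `a`, `b` and `c` values for three pairs; evaluating the formula string with `eval(parse())` is only an illustration and is not claimed to be how the package processes `formula`.

```r
# Hypothetical a, b, c values for three pairs of sites (one row per pair).
abc <- data.frame(a = c(3, 4, 1),
                  b = c(2, 0, 5),
                  c = c(1, 3, 2))

# pmin() takes the minimum pair by pair, which is what Simpson needs.
simpson_pmin <- with(abc, 1 - pmin(b, c) / (a + pmin(b, c)))

# min() collapses b and c to a single overall minimum (0 here), so every
# pair wrongly gets the same correction term.
simpson_min <- with(abc, 1 - min(b, c) / (a + min(b, c)))

# A formula given as a string can be evaluated against the same columns.
f <- "1 - pmin(b,c) / (a + pmin(b,c))"
simpson_str <- eval(parse(text = f), envir = abc)

rbind(pmin = simpson_pmin, min = simpson_min, string = simpson_str)
```

With `min`, all three pairs get a similarity of 1, whereas `pmin` returns a distinct value for each pair.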

Euclidean computes the Euclidean similarity between each pair of sites using the following equation:

\(Euclidean = 1 / (1 + d_{ij})\)

where \(d_{ij}\) is the Euclidean distance between site i and site j in terms of species composition.
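A minimal base R sketch of this transformation, using `stats::dist()` on a hypothetical abundance matrix (the package may compute the distance differently):

```r
# Hypothetical abundance matrix: 3 sites (rows) x 3 species (columns).
comat <- rbind(Site1 = c(3, 0, 4),
               Site2 = c(0, 0, 0),
               Site3 = c(3, 4, 4))

# Pairwise Euclidean distances between rows (sites).
d <- as.matrix(dist(comat, method = "euclidean"))

# Turn distances into similarities: identical sites get 1, and the
# similarity approaches 0 as the distance grows.
euclidean_sim <- 1 / (1 + d)

euclidean_sim["Site1", "Site2"]
```

The distance between `Site1` and `Site2` is 5, so their similarity is 1/6; the diagonal of `euclidean_sim` is 1 because each site is at distance 0 from itself.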

## References

Baselga A (2012).
“The Relationship between Species Replacement, Dissimilarity Derived from Nestedness, and Nestedness.”
*Global Ecology and Biogeography*, **21**(12), 1223–1232.

Baselga A (2013).
“Separating the two components of abundance-based dissimilarity: balanced changes in abundance vs. abundance gradients.”
*Methods in Ecology and Evolution*, **4**(6), 552–557.

## Author

Maxime Lenormand (maxime.lenormand@inrae.fr), Pierre Denelle (pierre.denelle@gmail.com) and Boris Leroy (leroy.boris@gmail.com)

## Examples

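```
# Simulate a 5 x 10 abundance matrix (sites as rows, species as columns);
# small counts are drawn more often than large ones
comat <- matrix(sample(0:1000, size = 50, replace = TRUE,
                       prob = 1 / 1:1001), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)

# Selected metrics; "abc" and "ABC" each add three columns
sim <- similarity(comat, metric = c("abc", "ABC", "Simpson", "Brayturn"))

# All built-in metrics plus a custom formula (here the Jaccard similarity)
sim <- similarity(comat, metric = "all",
                  formula = "1 - (b + c) / (a + b + c)")
```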