Calculate metrics for one or several partitions
Source: R/bioregionalization_metrics.R
This function calculates metrics for one or several partitions, usually
obtained from the netclu_, hclu_ or nhclu_ families of functions. Depending
on the metric, users must provide either a similarity or dissimilarity
matrix, or the initial species-site table.
Usage
bioregionalization_metrics(
  cluster_object,
  dissimilarity = NULL,
  dissimilarity_index = NULL,
  net = NULL,
  site_col = 1,
  species_col = 2,
  eval_metric = c("pc_distance", "anosim", "avg_endemism", "tot_endemism")
)
Arguments
- cluster_object
a bioregion.clusters object.
- dissimilarity
a dist object or a bioregion.pairwise.metric object (output from similarity_to_dissimilarity()). Necessary if eval_metric includes pc_distance and cluster_object is not a bioregion.hierar.tree object.
- dissimilarity_index
a character string indicating the dissimilarity (beta-diversity) index to be used if dissimilarity is a data.frame with multiple dissimilarity indices.
- net
the species-site network (i.e., bipartite network), provided as a data.frame. Necessary if eval_metric includes "avg_endemism" or "tot_endemism".
- site_col
name or number of the column of site nodes (i.e., primary nodes). Should be provided if eval_metric includes "avg_endemism" or "tot_endemism".
- species_col
name or number of the column of species nodes (i.e., feature nodes). Should be provided if eval_metric includes "avg_endemism" or "tot_endemism".
- eval_metric
a character vector or a single character string indicating the metric(s) to be calculated to investigate the effect of different numbers of clusters. Available options are "pc_distance", "anosim", "avg_endemism" and "tot_endemism".
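The net argument expects a long-format species-site table, one row per site-species link. The Examples section builds it with mat_to_net() and then passes site_col = "Node1" and species_col = "Node2". The base-R sketch below only illustrates that shape: matrix_to_net_sketch() is a hypothetical helper, not part of the package, and its Weight column is an assumption that may not match what mat_to_net() actually returns.

```r
# Hypothetical helper (not the bioregion implementation): turns a
# site x species matrix into a long data.frame with one row per non-zero
# cell. Node1 = sites, Node2 = species, mirroring the Examples section;
# the Weight column is an assumption.
matrix_to_net_sketch <- function(comat) {
  idx <- which(comat > 0, arr.ind = TRUE)   # row/column indices of non-zero cells
  data.frame(Node1 = rownames(comat)[idx[, "row"]],
             Node2 = colnames(comat)[idx[, "col"]],
             Weight = comat[idx],            # cell values (e.g. abundances)
             stringsAsFactors = FALSE)
}

comat <- matrix(c(3, 0, 1, 2), nrow = 2,
                dimnames = list(c("Site1", "Site2"),
                                c("Species1", "Species2")))
net <- matrix_to_net_sketch(comat)
net
```

A table of this shape would be referenced with site_col = "Node1" and species_col = "Node2" (or the column numbers 1 and 2).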
Value
a list of class bioregion.partition.metrics with two or three elements:
- args: input arguments
- evaluation_df: the data.frame containing eval_metric for all explored numbers of clusters
- endemism_results: if endemism calculations were requested, a list with the endemism results for each partition
Details
Evaluation metrics:
- pc_distance: this metric is the method used by Holt et al. (2013). It is the ratio of the between-cluster sum of dissimilarity (beta-diversity) to the total sum of dissimilarity (beta-diversity) for the full dissimilarity matrix. It is calculated from two elements. First, the total sum of dissimilarity is obtained by summing the entire dissimilarity matrix. Second, the between-cluster sum of dissimilarity is calculated as follows: for a given number of clusters, dissimilarity is summed only between clusters, not within clusters. To do this efficiently, all pairs of sites within the same cluster have their dissimilarity set to zero in the dissimilarity matrix, and the matrix is then summed. The pc_distance ratio is obtained by dividing the between-cluster sum of dissimilarity by the total sum of dissimilarity.
- anosim: this metric is the statistic used in Analysis of Similarities, as suggested in Castro-Insua et al. (2018) (see vegan::anosim()). It compares between-cluster dissimilarities to within-cluster dissimilarities, based on the difference of mean ranks, with the following formula: R = (r_B - r_W) / (N (N - 1) / 4), where r_B and r_W are the average ranks of the between-cluster and within-cluster dissimilarities respectively, and N is the total number of sites. Note that the function does not estimate significance; it only computes the statistic. For significance testing, see vegan::anosim().
- avg_endemism: this metric is the average percentage of endemism in clusters, as recommended by Kreft & Jetz (2010), calculated as End_mean = sum_i (E_i / S_i) / K, where E_i is the number of endemic species in cluster i, S_i is the number of species in cluster i, and K is the number of clusters.
- tot_endemism: this metric is the total endemism across all clusters, as recommended by Kreft & Jetz (2010), calculated as End_tot = E / C, where E is the total number of endemic species (i.e., species found in only one cluster) and C is the number of non-endemic species.
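The four metrics above can be illustrated on a toy example in base R. The helpers pc_distance_sketch(), anosim_r_sketch() and endemism_sketch() are hypothetical, written directly from the formulas in this section; they are not the package's internal code. The sketch assumes cluster labels are given in the same site order as the dissimilarity matrix and the site-species matrix, and tot_endemism follows E / C exactly as written above, so it can exceed 1.

```r
# Hypothetical sketches of the four metrics, written from the formulas above
# (NOT the bioregion implementation).
# d: "dist" object; groups: cluster ID per site (same order as d);
# comat: site x species matrix, sites as rows in the same order as groups.

pc_distance_sketch <- function(d, groups) {
  m <- as.matrix(d)
  total <- sum(m)                       # total sum of dissimilarity
  m[outer(groups, groups, "==")] <- 0   # zero out within-cluster pairs
  sum(m) / total                        # between-cluster sum / total sum
}

anosim_r_sketch <- function(d, groups) {
  m <- as.matrix(d)
  low <- lower.tri(m)                   # each site pair counted once
  r <- rank(m[low])                     # ranks of dissimilarities (ties averaged)
  within <- outer(groups, groups, "==")[low]
  n <- length(groups)
  (mean(r[!within]) - mean(r[within])) / (n * (n - 1) / 4)
}

endemism_sketch <- function(comat, groups) {
  pa <- comat > 0
  ks <- sort(unique(groups))
  # species x cluster matrix: TRUE if the species occurs in the cluster
  occ <- sapply(ks, function(k) colSums(pa[groups == k, , drop = FALSE]) > 0)
  n_clust <- rowSums(occ)               # number of clusters per species
  endemic <- n_clust == 1               # species found in only one cluster
  E_i <- colSums(occ & endemic)         # endemic species per cluster
  S_i <- colSums(occ)                   # species per cluster
  list(avg_endemism = sum(E_i / S_i) / length(ks),
       tot_endemism = sum(endemic) / sum(n_clust > 1))  # E / C as written
}

# Toy data: two well-separated clusters of two sites each
m <- matrix(c(0.0, 0.1, 0.8, 0.9,
              0.1, 0.0, 0.7, 0.6,
              0.8, 0.7, 0.0, 0.2,
              0.9, 0.6, 0.2, 0.0), nrow = 4,
            dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D")))
d <- as.dist(m)
groups <- c(1, 1, 2, 2)
comat <- matrix(c(1, 1, 0, 0,    # sp1: only in cluster 1
                  0, 0, 1, 1,    # sp2: only in cluster 2
                  1, 1, 1, 1),   # sp3: in both clusters
                nrow = 4, dimnames = list(c("A", "B", "C", "D"),
                                          c("sp1", "sp2", "sp3")))

pc_distance_sketch(d, groups)   # about 0.909 (6.0 / 6.6)
anosim_r_sketch(d, groups)      # 1: all between-cluster pairs rank above within pairs
endemism_sketch(comat, groups)  # avg_endemism 0.5, tot_endemism 2
```

Here every between-cluster dissimilarity is larger than every within-cluster one, which is why R reaches its maximum of 1. The R value is expected to agree with the statistic element returned by vegan::anosim() for the same inputs, although that comparison is not run here.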
References
Castro-Insua A, Gómez-Rodríguez C & Baselga A (2018) Dissimilarity measures affected by richness differences yield biased delimitations of biogeographic realms. Nature Communications, 9(1), 9-11.
Holt BG, Lessard J, Borregaard MK, Fritz SA, Araújo MB, Dimitrov D, Fabre P, Graham CH, Graves GR, Jønsson KA, Nogués-Bravo D, Wang Z, Whittaker RJ, Fjeldså J & Rahbek C (2013) An update of Wallace's zoogeographic regions of the world. Science, 339(6115), 74-78.
Kreft H & Jetz W (2010) A framework for delineating biogeographical regions based on species distributions. Journal of Biogeography, 37, 2029-2053.
Author
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Pierre Denelle (pierre.denelle@gmail.com)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissim,
                          n_clust = 2:20, index = "Simpson")
#> Building the iterative hierarchical consensus tree... Note that this process can take time especially if you have a lot of sites.
#>
#> Final tree has a 0.5241 cophenetic correlation coefficient with the initial dissimilarity
#> matrix
#> Warning: The requested number of cluster could not be found
#> for k = 16. Closest number found: 15
#> Warning: The requested number of cluster could not be found
#> for k = 17. Closest number found: 15
#> Warning: The requested number of cluster could not be found
#> for k = 18. Closest number found: 15
#> Warning: The requested number of cluster could not be found
#> for k = 19. Closest number found: 15
#> Warning: The requested number of cluster could not be found
#> for k = 20. Closest number found: 15
tree1
#> Clustering results for algorithm : hclu_hierarclust
#> (hierarchical clustering based on a dissimilarity matrix)
#> - Number of sites: 20
#> - Name of dissimilarity metric: Simpson
#> - Tree construction method: average
#> - Randomization of the dissimilarity matrix: yes, number of trials 100
#> - Method to compute the final tree: Iterative consensus hierarchical tree
#> - Cophenetic correlation coefficient: 0.524
#> - Number of clusters requested by the user: 2 3 4 5 6 7 8 9 10 11 ... (with 9 more values)
#> Clustering results:
#> - Number of partitions: 19
#> - Partitions are hierarchical
#> - Number of clusters: 2 3 4 5 6 7 8 9 10 11 ... (with 9 more values)
#> - Height of cut of the hierarchical tree: 0.125 0.121 0.109 0.102 0.094 0.088 0.078 0.07 0.062 0.055 ... (with 9 more values)
a <- bioregionalization_metrics(tree1, dissimilarity = dissim, net = comnet,
                                site_col = "Node1", species_col = "Node2",
                                eval_metric = c("tot_endemism", "avg_endemism",
                                                "pc_distance", "anosim"))
#> Computing similarity-based metrics...
#> - pc_distance OK
#> - anosim OK
#> Computing composition-based metrics...
#> - avg_endemism OK
#> - tot_endemism OK
a
#> Partition metrics:
#> - 19 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): tot_endemism avg_endemism pc_distance anosim
#> - Metric summary:
#> tot_endemism avg_endemism pc_distance anosim
#> Min 0.00000000 0.000000000 0.1333886 0.3764235
#> Mean 0.01052632 0.004912281 0.8232435 0.7230993
#> Max 0.16000000 0.080000000 1.0000000 0.9565217
#>
#> Access the data.frame of metrics with your_object$evaluation_df
#> Details of endemism % for each partition are available in
#> your_object$endemism_results