Divisive hierarchical clustering based on dissimilarity or beta-diversity
Source: R/hclu_diana.R
This function computes a divisive hierarchical clustering from a dissimilarity (beta-diversity) data.frame, calculates the cophenetic correlation coefficient, and can generate clusters from the tree if requested by the user. The function implements randomization of the dissimilarity matrix to generate the tree, with a selection method based on the optimal cophenetic correlation coefficient. Typically, the dissimilarity data.frame is a bioregion.pairwise.metric object obtained by running dissimilarity(), or by running similarity() followed by similarity_to_dissimilarity().
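The core steps described above (pairwise table to distance object, divisive tree, cophenetic correlation, clusters) can be sketched with base R and cluster::diana(), on which the Details section says this function is based. This is a minimal illustration, not the bioregion implementation: pairs_to_dist() is a hypothetical helper, Euclidean distances between random points stand in for a beta-diversity index, and the cophenetic correlation is assumed to follow its usual definition (Pearson correlation between the input dissimilarities and the tree's cophenetic distances).

```r
library(cluster)

# Hypothetical helper (not part of bioregion): convert a long pairwise table
# (site 1, site 2, metric columns...) into a "dist" object using column `index`.
pairs_to_dist <- function(df, index = 3) {
  sites <- sort(unique(c(df[[1]], df[[2]])))
  m <- matrix(0, length(sites), length(sites),
              dimnames = list(sites, sites))
  i <- match(df[[1]], sites)
  j <- match(df[[2]], sites)
  m[cbind(i, j)] <- df[[index]]  # fill both triangles so the matrix is symmetric
  m[cbind(j, i)] <- df[[index]]
  as.dist(m)
}

# Toy pairwise table for 6 sites (assumption: Euclidean distance as the index)
set.seed(1)
pts <- matrix(runif(12), ncol = 2,
              dimnames = list(paste0("Site", 1:6), NULL))
pairs <- t(combn(rownames(pts), 2))
long <- data.frame(Site1 = pairs[, 1], Site2 = pairs[, 2],
                   Euclidean = as.vector(dist(pts)),
                   stringsAsFactors = FALSE)

d <- pairs_to_dist(long, index = "Euclidean")
tree <- diana(d, diss = TRUE)   # divisive (DIANA) tree
htree <- as.hclust(tree)        # hclust form, usable by cophenetic() and cutree()

# Cophenetic correlation: input dissimilarities vs. tree-implied distances
coph_cor <- cor(as.vector(d), as.vector(cophenetic(htree)))
round(coph_cor, 2)

groups <- cutree(htree, k = 3)  # three clusters cut from the tree
```

A value close to 1 means the tree preserves the original dissimilarities well; the message printed in the Examples section reports this kind of coefficient.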
Usage
hclu_diana(
  dissimilarity,
  index = names(dissimilarity)[3],
  n_clust = NULL,
  cut_height = NULL,
  find_h = TRUE,
  h_max = 1,
  h_min = 0
)
Arguments
- dissimilarity
The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the remaining column(s) contain the dissimilarity indices.
- index
The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- n_clust
An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics(). Should not be used concurrently with cut_height.
- cut_height
A numeric vector indicating the height(s) at which the tree should be cut. Should not be used concurrently with n_clust.
- find_h
A boolean indicating whether the cutting height should be determined for the requested n_clust.
- h_max
A numeric value indicating the maximum possible tree height for the chosen index.
- h_min
A numeric value indicating the minimum possible height in the tree for the chosen index.
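One way to read n_clust, find_h, h_min and h_max together: for a requested number of clusters k, search the interval [h_min, h_max] for a height at which cutting the tree yields exactly k clusters. The sketch assumes a plain bisection search; find_cut_height() is a hypothetical helper, not part of bioregion, and the package's internal procedure may differ.

```r
library(cluster)

# Hypothetical helper (not part of bioregion): bisection search for a
# height in [h_min, h_max] at which an hclust tree yields exactly k clusters.
find_cut_height <- function(tree, k, h_min = 0, h_max = 1,
                            tol = 1e-8, max_iter = 100) {
  n_at <- function(h) length(unique(cutree(tree, h = h)))
  lo <- h_min
  hi <- h_max
  for (iter in seq_len(max_iter)) {
    mid <- (lo + hi) / 2
    n_mid <- n_at(mid)
    if (n_mid == k) return(mid)
    # Too many clusters means the cut is too low, so search higher.
    if (n_mid > k) lo <- mid else hi <- mid
    if (hi - lo < tol) break
  }
  NA_real_  # no height in the interval gives exactly k clusters
}

# Usage on a DIANA tree built from random points (illustrative data)
set.seed(42)
d <- dist(matrix(runif(20), ncol = 2))
tree <- as.hclust(diana(d))
h <- find_cut_height(tree, k = 3, h_min = 0, h_max = max(tree$height))
if (!is.na(h)) table(cutree(tree, h = h))
```

If several merges share the same height, some values of k cannot be obtained with a single cut height; the helper then returns NA instead of looping indefinitely.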
Value
A list of class bioregion.clusters with five slots:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list describing the characteristics of the clustering process.
- algorithm: A list containing all objects associated with the clustering procedure, such as the original cluster objects.
- clusters: A data.frame containing the clustering results.
Details
The function is based on diana() from the cluster package. Chapter 6 of Kaufman & Rousseeuw (1990) describes the DIANA algorithm in full.
To find an optimal number of clusters, see bioregionalization_metrics().
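To make the divisive procedure concrete, here is a base-R sketch of DIANA as described by Kaufman & Rousseeuw: repeatedly pick the cluster with the largest diameter, seed a "splinter group" with its most dissimilar object, then move over objects that are on average closer to the splinter group than to the rest. diana_split() and diana_sketch() are hypothetical names; this is a teaching sketch, not the code used by cluster::diana() or hclu_diana(), and tie-breaking may differ.

```r
# Hypothetical helper: one DIANA split of the objects `members`,
# given a full symmetric dissimilarity matrix D.
diana_split <- function(D, members) {
  if (length(members) < 2) stop("need at least two objects to split")
  sub <- D[members, members, drop = FALSE]
  # Seed the splinter group with the object whose average dissimilarity
  # to the other members is largest.
  avg_all <- rowSums(sub) / (length(members) - 1)
  splinter <- members[which.max(avg_all)]
  old <- setdiff(members, splinter)
  # Move, one at a time, the object with the largest positive gain:
  # average dissimilarity to the old group minus that to the splinter group.
  while (length(old) > 1) {
    gain <- vapply(old, function(i) {
      mean(D[i, setdiff(old, i)]) - mean(D[i, splinter])
    }, numeric(1))
    if (max(gain) <= 0) break
    moved <- old[which.max(gain)]
    splinter <- c(splinter, moved)
    old <- setdiff(old, moved)
  }
  list(old, splinter)
}

# Hypothetical helper: split the cluster with the largest diameter
# (largest within-cluster dissimilarity) until k clusters exist.
diana_sketch <- function(D, k) {
  D <- as.matrix(D)
  clusters <- list(seq_len(nrow(D)))
  diameter <- function(m) if (length(m) < 2) 0 else max(D[m, m])
  while (length(clusters) < k) {
    diam <- vapply(clusters, diameter, numeric(1))
    j <- which.max(diam)
    if (diam[j] == 0) break  # nothing left that can be split
    clusters <- c(clusters[-j], diana_split(D, clusters[[j]]))
  }
  groups <- integer(nrow(D))
  for (g in seq_along(clusters)) groups[clusters[[g]]] <- g
  groups
}

x <- c(0, 1, 2, 10, 11)
diana_sketch(dist(x), k = 2)
#> [1] 1 1 1 2 2
```

In the real algorithm the splits continue until every object is a singleton, and the diameter at each split becomes the height of the corresponding node in the tree.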
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. John Wiley & Sons.
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: cut_tree
Author
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
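library(bioregion)

# Random site-by-species abundance matrix (20 sites x 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
# All available dissimilarity metrics for the random matrix
dissim <- dissimilarity(comat, metric = "all")

# Fish dataset shipped with bioregion
data("fishmat")
fishdissim <- dissimilarity(fishmat)
fish_diana <- hclu_diana(fishdissim, index = "Simpson")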
#> Output tree has a 0.51 cophenetic correlation coefficient with the initial dissimilarity matrix