Divisive hierarchical clustering based on dissimilarity or beta-diversity
Source: R/hclu_diana.R
This function computes a divisive hierarchical clustering from a
dissimilarity (beta-diversity) data.frame, calculates the cophenetic
correlation coefficient, and can generate clusters from the tree if requested
by the user. The function implements randomization of the dissimilarity matrix
to generate the tree, with a selection method based on the optimal cophenetic
correlation coefficient. Typically, the dissimilarity data.frame is a
bioregion.pairwise object obtained by running dissimilarity()
or similarity() followed by similarity_to_dissimilarity().
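As a rough illustration of the two core steps described above (building a divisive tree, then scoring it with the cophenetic correlation coefficient), the base R sketch below calls cluster::diana() and stats::cophenetic() directly on a toy abundance matrix. It is not the code inside hclu_diana(): the Manhattan distance, the simulated data and the use of a Pearson correlation are assumptions made for this example only, and the randomization step is not reproduced.

```r
# Sketch only: divisive tree + cophenetic correlation with base R tools.
# Assumptions: toy Poisson abundances, Manhattan distance, Pearson correlation.
library(cluster)

set.seed(42)
# Toy site x species abundance matrix (illustrative data only)
comat <- matrix(rpois(20 * 8, lambda = 3), nrow = 20,
                dimnames = list(paste0("Site", 1:20), paste0("Species", 1:8)))
d <- dist(comat, method = "manhattan")

dv <- diana(d, diss = TRUE)  # divisive hierarchical clustering (DIANA)
tree <- as.hclust(dv)        # convert so that stats helpers can be used

# Cophenetic correlation: correlation between the input dissimilarities and
# the tree heights at which each pair of sites is separated
coph_cor <- cor(as.vector(d), as.vector(cophenetic(tree)))
round(coph_cor, 2)
```

Values close to 1 indicate that the tree preserves the original pairwise dissimilarities well.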
Usage
hclu_diana(
dissimilarity,
index = names(dissimilarity)[3],
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0
)
Arguments
- dissimilarity
The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the remaining column(s) contain the dissimilarity indices.
- index
The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- n_clust
An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics(). Should not be used concurrently with cut_height.
- cut_height
A numeric vector indicating the height(s) at which the tree should be cut. Should not be used concurrently with n_clust.
- find_h
A boolean indicating whether the cutting height should be determined for the requested n_clust.
- h_max
A numeric value indicating the maximum possible tree height for the chosen index.
- h_min
A numeric value indicating the minimum possible height in the tree for the chosen index.
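The arguments find_h, h_min and h_max suggest that, for a requested n_clust, a cutting height is searched for between two bounds. This page does not say how that search is done, so the sketch below assumes a simple bisection over the height range using stats::cutree(). The helper find_cut_height() is hypothetical and is not part of bioregion.

```r
# Hypothetical helper (NOT the bioregion implementation): bisection search
# for a cut height that yields exactly k clusters, bounded by h_min and h_max.
library(cluster)

find_cut_height <- function(tree, k, h_min = 0, h_max = max(tree$height),
                            max_iter = 100) {
  lo <- h_min
  hi <- h_max
  for (i in seq_len(max_iter)) {
    mid <- (lo + hi) / 2
    n_found <- length(unique(cutree(tree, h = mid)))
    if (n_found == k) return(mid)
    # Too many clusters means the cut is too low: raise the lower bound
    if (n_found > k) lo <- mid else hi <- mid
  }
  NA_real_  # no height found (e.g. tied heights skip this k)
}

# Usage on a toy divisive tree
set.seed(1)
d <- dist(matrix(rnorm(40), nrow = 20))
tree <- as.hclust(diana(d))
h <- find_cut_height(tree, k = 4)
table(cutree(tree, h = h))
```

Bisection works here because the number of clusters never increases as the cutting height grows.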
Value
A list of class bioregion.clusters with five slots:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list describing the characteristics of the clustering process.
- algorithm: A list containing all objects associated with the clustering procedure, such as the original cluster objects.
- clusters: A data.frame containing the clustering results.
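The slots listed above can be inspected directly on the returned object. The sketch below reuses the fishmat data from the Examples and requests two partitions with n_clust; the choice of 3 and 5 clusters is arbitrary and only for illustration.

```r
library(bioregion)

data("fishmat")
fishdissim <- dissimilarity(fishmat)

# Request two partitions from the same divisive tree (arbitrary sizes)
fish_diana <- hclu_diana(fishdissim, index = "Simpson", n_clust = c(3, 5))

inherits(fish_diana, "bioregion.clusters")  # class of the returned list
fish_diana$name            # name of the algorithm
str(fish_diana$inputs)     # characteristics of the clustering process
head(fish_diana$clusters)  # clustering results
```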
Details
The function is based on cluster::diana(). Chapter 6 of Kaufman & Rousseeuw (2009) fully details how the DIANA algorithm works.
To find an optimal number of clusters, see bioregionalization_metrics().
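In DIANA, the cluster with the largest diameter is split at each stage. Within it, the object with the largest average dissimilarity to the others starts a "splinter group"; objects that are, on average, closer to the splinter group than to the rest of the cluster are then moved to it one at a time (largest difference first) until no such object remains. The base R sketch below implements only that single splitting step, for illustration. split_cluster() is a hypothetical helper and is not the code used by cluster::diana().

```r
# Hypothetical helper: one DIANA splitting step on a full dissimilarity matrix.
split_cluster <- function(dmat, members) {
  if (length(members) < 2) stop("need at least two members to split")
  sub <- dmat[members, members, drop = FALSE]
  # Seed the splinter group with the most disparate object
  # (diagonal is 0, so dividing by n - 1 gives the mean to the others)
  avg <- rowSums(sub) / (length(members) - 1)
  splinter <- members[which.max(avg)]
  rest <- setdiff(members, splinter)
  repeat {
    if (length(rest) < 2) break
    # Mean dissimilarity to the remaining objects minus mean to the splinters
    diffs <- vapply(rest, function(i) {
      others <- setdiff(rest, i)
      mean(dmat[i, others]) - mean(dmat[i, splinter])
    }, numeric(1))
    if (max(diffs) <= 0) break  # nobody is closer to the splinter group
    move <- rest[which.max(diffs)]
    splinter <- c(splinter, move)
    rest <- setdiff(rest, move)
  }
  list(splinter = sort(splinter), rest = sort(rest))
}

# Usage: two well-separated groups of points on a line
x <- c(0, 1, 2, 10, 11, 12)
res <- split_cluster(as.matrix(dist(x)), 1:6)
res
```

Applying this step repeatedly to the cluster with the largest diameter, until every object is on its own, yields the full divisive tree.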
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. John Wiley & Sons.
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: cut_tree
Author
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
library(bioregion)

# Simulated site x species abundance matrix
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
# Compute all available dissimilarity metrics
dissim <- dissimilarity(comat, metric = "all")

# Divisive clustering of the fish dataset shipped with bioregion
data("fishmat")
fishdissim <- dissimilarity(fishmat)
fish_diana <- hclu_diana(fishdissim, index = "Simpson")
#> Output tree has a 0.51 cophenetic correlation coefficient with the initial dissimilarity matrix