Divisive hierarchical clustering based on dissimilarity or beta-diversity
Source: R/hclu_diana.R
hclu_diana.Rd
This function computes a divisive hierarchical clustering from a
dissimilarity (beta-diversity) data.frame, calculates the cophenetic
correlation coefficient, and can extract clusters from the tree if requested
by the user. The function implements randomization of the dissimilarity matrix
to generate the tree, with a selection method based on the optimal cophenetic
correlation coefficient. Typically, the dissimilarity data.frame is a
bioregion.pairwise.metric object obtained by running dissimilarity(), or
similarity() followed by similarity_to_dissimilarity().
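The cophenetic correlation coefficient mentioned above can be illustrated outside the package. The following is a minimal sketch, assuming a toy Euclidean dissimilarity on simulated data and using cluster::diana() and stats::cophenetic() directly; it is not necessarily how hclu_diana() computes the value internally, and the object names (x, d, tree, ccc) are hypothetical.

```r
library(cluster)  # provides diana()

# Toy data (assumption): 10 sites described by 4 simulated variables
set.seed(1)
x <- matrix(rnorm(40), nrow = 10,
            dimnames = list(paste0("Site", 1:10), NULL))
d <- dist(x)  # Euclidean dissimilarities between sites

# Divisive hierarchical clustering on the dissimilarities
tree <- diana(d)

# Cophenetic distances: the height at which two sites are first separated
coph <- cophenetic(as.hclust(tree))

# Correlation between the original and the cophenetic dissimilarities
ccc <- cor(as.vector(d), as.vector(coph))
```

A value of ccc close to 1 suggests that the tree preserves the original pairwise dissimilarities well.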
Usage
hclu_diana(
dissimilarity,
index = names(dissimilarity)[3],
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0
)
Arguments
- dissimilarity
  the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
- index
  name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- n_clust
  an integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics. Should not be used at the same time as cut_height.
- cut_height
  a numeric vector indicating the height(s) at which the tree should be cut. Should not be used at the same time as n_clust.
- find_h
  a boolean indicating whether the cut height should be found for the requested n_clust.
- h_max
  a numeric indicating the maximum possible tree height for the chosen index.
- h_min
  a numeric indicating the minimum possible height in the tree for the chosen index.
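To make the input layout and the two cutting modes concrete, here is a minimal sketch in base R plus the cluster package. It assumes a hypothetical pairwise table (columns Site1, Site2 and Simpson, filled with random values) and uses stats::cutree() as a stand-in for what n_clust and cut_height request; the package's own internals may differ.

```r
library(cluster)  # provides diana()

# Hypothetical pairwise table: two site columns, then one dissimilarity column
sites <- paste0("Site", 1:6)
pairs <- t(combn(sites, 2))
set.seed(42)
dissim_df <- data.frame(Site1 = pairs[, 1],
                        Site2 = pairs[, 2],
                        Simpson = runif(nrow(pairs)))

# Fill a symmetric matrix from the pairs, then convert it to a dist object
m <- matrix(0, length(sites), length(sites), dimnames = list(sites, sites))
m[cbind(dissim_df$Site1, dissim_df$Site2)] <- dissim_df$Simpson
m[cbind(dissim_df$Site2, dissim_df$Site1)] <- dissim_df$Simpson
d <- as.dist(m)

# Divisive tree, converted to an hclust object so that cutree() can be used
hc <- as.hclust(diana(d))

# Analogue of n_clust: request a fixed number of clusters
by_k <- cutree(hc, k = 3)

# Analogue of cut_height: cut the tree at a fixed height
by_h <- cutree(hc, h = 0.5)
```

Requesting a number of clusters and cutting at a height are two ways of slicing the same tree, which is why only one of the two should be supplied in a given call.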
Value
A list of class bioregion.clusters with five slots:
- name: character containing the name of the algorithm
- args: list of input arguments as provided by the user
- inputs: list of characteristics of the clustering process
- algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
- clusters: data.frame containing the clustering results
Details
The function is based on diana() from the cluster package. Chapter 6 of Kaufman & Rousseeuw (1990) describes the diana algorithm in full.
To find an optimal number of clusters, see bioregionalization_metrics().
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. John Wiley & Sons (first published 1990).
Author
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
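# Simulated site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Pairwise dissimilarities computed with metric = "all"
dissim <- dissimilarity(comat, metric = "all")

# Divisive clustering of the fish dataset using the Simpson index
data("fishmat")
fishdissim <- dissimilarity(fishmat)
fish_diana <- hclu_diana(fishdissim, index = "Simpson")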
#> Output tree has a 0.51 cophenetic correlation coefficient with the initial
#> dissimilarity matrix