This function cuts a hierarchical tree at user-selected heights. It accepts
either the output of hclu_hierarclust or an hclust object. The tree can be cut
based on the chosen number(s) of clusters or on specified height(s).
The function also includes a procedure to automatically determine the cutting
height for the requested number(s) of clusters.
Usage
cut_tree(
  tree,
  n_clust = NULL,
  cut_height = NULL,
  find_h = TRUE,
  h_max = 1,
  h_min = 0,
  dynamic_tree_cut = FALSE,
  dynamic_method = "tree",
  dynamic_minClusterSize = 5,
  dissimilarity = NULL,
  ...
)
Arguments
- tree
A bioregion.hierar.tree or an hclust object.
- n_clust
An integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics(). This should not be used concurrently with cut_height.
- cut_height
A numeric vector specifying the height(s) at which the tree should be cut. This should not be used concurrently with n_clust or optim_method.
- find_h
A boolean indicating whether the cutting height should be determined for the requested n_clust.
- h_max
A numeric value indicating the maximum possible tree height for determining the cutting height when find_h = TRUE.
- h_min
A numeric value specifying the minimum possible height in the tree for determining the cutting height when find_h = TRUE.
- dynamic_tree_cut
A boolean indicating whether the dynamic tree cut method should be used. If TRUE, n_clust and cut_height are ignored.
- dynamic_method
A character string specifying the method to be used for dynamically cutting the tree: either "tree" (clusters searched only within the tree) or "hybrid" (clusters searched in both the tree and the dissimilarity matrix).
- dynamic_minClusterSize
An integer indicating the minimum cluster size for the dynamic tree cut method (see dynamicTreeCut::cutreeDynamic()).
- dissimilarity
Relevant only if dynamic_method = "hybrid". Provide the dissimilarity data.frame used to build the tree.
- ...
Additional arguments passed to dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method.
Value
If tree is an output from hclu_hierarclust(), the same object is returned
with updated content (i.e., args and clusters). If tree is an hclust object,
a data.frame containing the clusters is returned.
Details
The function supports two main methods for cutting the tree. First, the tree
can be cut at a uniform height (specified by cut_height or determined
automatically for the requested n_clust). Second, the dynamic tree cut
method (Langfelder et al., 2008) can be applied, which adapts to the shape
of branches in the tree, cutting at varying heights based on cluster
positions.
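The difference between cutting by number of clusters and cutting at a fixed height can be illustrated with base R alone. The sketch below uses stats::hclust() and stats::cutree() on made-up random points. It does not call cut_tree() and is not meant to reproduce its internals; the data, the linkage method and the height value are assumptions chosen for illustration.

```r
# Illustrative data (not from the package): 20 random points in 2 dimensions.
set.seed(1)
pts <- matrix(rnorm(40), ncol = 2)
hc <- hclust(dist(pts), method = "average")

# Cut by number of clusters: cutree() works out where to cut.
by_k <- cutree(hc, k = 3)

# Cut at an explicit height: the number of clusters follows from the tree.
by_h <- cutree(hc, h = 1.5)

# Cluster sizes under each cut.
table(by_k)
table(by_h)
```

Both calls return one cluster label per object. The first guarantees the number of clusters, the second guarantees the height at which branches are separated.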
The dynamic tree cut method has two variants:
- The tree-based variant (dynamic_method = "tree") uses a top-down approach, relying solely on the tree and the order of clustered objects.
- The hybrid variant (dynamic_method = "hybrid") employs a bottom-up approach, leveraging both the tree and the dissimilarity matrix to identify clusters based on dissimilarity among sites. This approach is useful for detecting outliers within clusters.
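To make the two variants concrete, the sketch below calls dynamicTreeCut::cutreeDynamic() directly on a base R hclust tree. It assumes the dynamicTreeCut package is installed and is skipped otherwise. The random points, the average linkage and the object names are illustrative assumptions, and the calls are not claimed to match what cut_tree() passes internally.

```r
# Assumption: dynamicTreeCut is installed; the block is skipped otherwise.
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  set.seed(1)
  pts <- matrix(rnorm(80), ncol = 2)  # 40 illustrative points
  d <- dist(pts)
  hc <- hclust(d, method = "average")

  # Tree variant: only the dendrogram is used.
  cl_tree <- dynamicTreeCut::cutreeDynamic(
    dendro = hc, method = "tree", minClusterSize = 5
  )

  # Hybrid variant: the dissimilarity matrix is needed as well.
  cl_hybrid <- dynamicTreeCut::cutreeDynamic(
    dendro = hc, method = "hybrid", distM = as.matrix(d),
    minClusterSize = 5
  )

  # In dynamicTreeCut, label 0 marks objects left unassigned.
  print(table(cl_tree))
  print(table(cl_hybrid))
}
```

The hybrid call fails without distM, which mirrors the requirement to supply dissimilarity when dynamic_method = "hybrid".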
Note
The find_h argument is ignored if dynamic_tree_cut = TRUE,
as cutting heights cannot be determined in this case.
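For uniform-height cuts, where find_h does apply, one possible way to search for a height between h_min and h_max is a simple bisection. The helper below, find_cut_height(), is hypothetical and written with base R only. It is a sketch of the idea under that assumption, and bioregion's actual search procedure may differ.

```r
# Hypothetical helper (not part of bioregion): bisection search for a height
# at which stats::cutree() returns exactly `k` clusters.
find_cut_height <- function(hc, k, h_min = 0, h_max = max(hc$height),
                            max_iter = 100) {
  n_groups <- function(h) length(unique(cutree(hc, h = h)))
  for (i in seq_len(max_iter)) {
    h <- (h_min + h_max) / 2
    n <- n_groups(h)
    if (n == k) return(h)
    if (n > k) {
      h_min <- h  # too many clusters: the cut must be higher
    } else {
      h_max <- h  # too few clusters: the cut must be lower
    }
  }
  NA_real_  # no suitable height found within max_iter halvings
}

# Usage on illustrative random data.
set.seed(1)
hc <- hclust(dist(matrix(rnorm(40), ncol = 2)), method = "average")
h5 <- find_cut_height(hc, k = 5)
n5 <- length(unique(cutree(hc, h = h5)))
```

The search relies on the number of clusters never increasing as the cut height rises, which holds for trees whose merge heights are sorted. Narrowing the [h_min, h_max] interval reduces the number of halvings needed.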
References
Langfelder P, Zhang B & Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics 24, 719-720.
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_1_hierarchical_clustering.html.
Associated functions: hclu_hierarclust
Author
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
simil <- similarity(comat, metric = "all")
dissimilarity <- similarity_to_dissimilarity(simil)
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissimilarity,
                          n_clust = 5)
#> Building the iterative hierarchical consensus tree... Note that this process can take time especially if you have a lot of sites.
#>
#> Final tree has a 0.824 cophenetic correlation coefficient with the initial dissimilarity matrix
#> Determining the cut height to reach 5 groups...
#> --> 0.28125
tree2 <- cut_tree(tree1, cut_height = .05)
tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10))
#> Determining the cut height to reach 3 groups...
#> --> 0.375
#> Determining the cut height to reach 5 groups...
#> --> 0.28125
#> Determining the cut height to reach 10 groups...
#> --> 0.21875
tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25))
tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE)
hclust_tree <- tree2$algorithm$final.tree
clusters_2 <- cut_tree(hclust_tree, n_clust = 10)
#> Determining the cut height to reach 10 groups...
#> --> 0.21875
cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE,
                            dissimilarity = dissimilarity)
#> Some sites were not assigned to any cluster. They will have a NA in the cluster data.frame.