This function is designed to work on a hierarchical tree and cut it
at user-selected heights. It works on either outputs from
hclu_hierarclust or hclust objects. It cuts the tree for the chosen
number(s) of clusters or selected height(s). It also includes a procedure to
automatically return the height of cut for the chosen number(s) of clusters.
Usage
cut_tree(
  tree,
  n_clust = NULL,
  cut_height = NULL,
  find_h = TRUE,
  h_max = 1,
  h_min = 0,
  dynamic_tree_cut = FALSE,
  dynamic_method = "tree",
  dynamic_minClusterSize = 5,
  dissimilarity = NULL,
  ...
)
Arguments
- tree
  a bioregion.hierar.tree or a hclust object.
- n_clust
  an integer vector or a single integer indicating the number of clusters to be obtained from the hierarchical tree, or the output from bioregionalization_metrics(). Should not be used at the same time as cut_height.
- cut_height
  a numeric vector indicating the height(s) at which the tree should be cut. Should not be used at the same time as n_clust or optim_method.
- find_h
  a boolean indicating if the height of cut should be found for the requested n_clust.
- h_max
  a numeric value indicating the maximum possible tree height for finding the height of cut when find_h = TRUE.
- h_min
  a numeric value indicating the minimum possible height in the tree for finding the height of cut when find_h = TRUE.
- dynamic_tree_cut
  a boolean indicating if the dynamic tree cut method should be used, in which case n_clust and cut_height are ignored.
- dynamic_method
  a character string indicating the method to be used to dynamically cut the tree: either "tree" (clusters searched only in the tree) or "hybrid" (clusters searched on both tree and dissimilarity matrix).
- dynamic_minClusterSize
  an integer indicating the minimum cluster size to use in the dynamic tree cut method (see dynamicTreeCut::cutreeDynamic()).
- dissimilarity
  only useful if dynamic_method = "hybrid". Provide here the dissimilarity data.frame used to build the tree.
- ...
  further arguments to be passed to dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method.
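The role of find_h, h_min and h_max can be illustrated with a bisection search between the two bounds. This is only a sketch in base R under stated assumptions: find_cut_height() is a hypothetical helper, not a bioregion function, and the package's actual search procedure may differ. It assumes a monotone tree (here average linkage) whose heights lie within [h_min, h_max].

```r
# Hypothetical helper (not part of bioregion): bisection search for a cut
# height that yields exactly `k` clusters on an hclust tree.
find_cut_height <- function(tree, k, h_min = 0, h_max = 1, max_iter = 100) {
  lo <- h_min
  hi <- h_max
  for (i in seq_len(max_iter)) {
    h <- (lo + hi) / 2
    n <- length(unique(stats::cutree(tree, h = h)))
    if (n == k) return(h)
    # Too many clusters: the cut is too low, so search the upper half.
    if (n > k) lo <- h else hi <- h
  }
  NA_real_  # no height found in [h_min, h_max] within max_iter steps
}

set.seed(1)
d <- dist(matrix(runif(40), nrow = 20))
d <- d / max(d)                      # rescale so tree heights lie in [0, 1]
hc <- hclust(d, method = "average")

h <- find_cut_height(hc, k = 5)
n_found <- length(unique(cutree(hc, h = h)))
```

Narrowing h_min and h_max around the expected height reduces the number of iterations, and returning NA_real_ signals that no height in the interval gives the requested number of clusters.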
Value
If tree is an output from hclu_hierarclust(), then the same
object is returned with content updated (i.e., args and clusters). If
tree is a hclust object, then a data.frame containing the clusters is
returned.
Details
The function can cut the tree with two main methods. First, it can cut
the entire tree at the same height (either specified by cut_height or
automatically defined for the chosen n_clust). Second, it can use
the dynamic tree cut method (Langfelder et al., 2008), in which
case clusters are detected with an adaptive method based on the shape of
branches in the tree (thus cuts happen at multiple heights depending on
cluster positions in the tree).
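The first approach, a single cut across the whole tree, corresponds to what stats::cutree() does on a plain hclust object. The sketch below uses base R only and random data invented for illustration; it shows the general idea rather than the internals of cut_tree().

```r
set.seed(42)
x <- matrix(rnorm(60), nrow = 20,
            dimnames = list(paste0("Site", 1:20), NULL))
hc <- hclust(dist(x), method = "average")

# Cut by number of clusters: one column per requested k
by_k <- cutree(hc, k = c(3, 5, 10))

# Cut at a single height (an arbitrary value chosen for illustration)
by_h <- cutree(hc, h = median(hc$height))

dim(by_k)             # sites in rows, one column per k
length(unique(by_h))  # number of clusters obtained at that height
```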
The dynamic tree cut method has two variants.
- The tree-based only variant (dynamic_method = "tree") is a top-down approach which relies only on the tree and follows the order of clustered objects on it.
- The hybrid variant (dynamic_method = "hybrid") is a bottom-up approach which relies on both the tree and the dissimilarity matrix to build clusters on the basis of dissimilarity information among sites. This method is useful to detect outlying members in each cluster.
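The two variants can be compared by calling dynamicTreeCut::cutreeDynamic() directly on a hclust object. This is a minimal sketch on random data invented for illustration; it assumes the dynamicTreeCut package is installed and does not reproduce how cut_tree() prepares its inputs or formats its output.

```r
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  set.seed(7)
  x <- matrix(rnorm(200), nrow = 50)
  d <- dist(x)
  hc <- hclust(d, method = "average")

  # Tree-only variant: uses the dendrogram alone
  cl_tree <- dynamicTreeCut::cutreeDynamic(
    dendro = hc, method = "tree", minClusterSize = 5
  )

  # Hybrid variant: also needs the dissimilarity matrix behind the tree
  cl_hybrid <- dynamicTreeCut::cutreeDynamic(
    dendro = hc, method = "hybrid", distM = as.matrix(d),
    minClusterSize = 5
  )

  # In dynamicTreeCut, label 0 marks objects left unassigned
  print(table(cl_tree))
  print(table(cl_hybrid))
}
```

Objects labelled 0 by cutreeDynamic() are the unassigned ones, which is the situation the NA values in the cluster data.frame refer to.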
Note
The argument find_h is ignored if dynamic_tree_cut = TRUE,
because heights of cut cannot be estimated in this case.
References
Langfelder P, Zhang B & Horvath S (2008) Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R. Bioinformatics, 24(5), 719-720.
Author
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Boris Leroy (leroy.boris@gmail.com)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
simil <- similarity(comat, metric = "all")
dissimilarity <- similarity_to_dissimilarity(simil)
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissimilarity,
                          n_clust = 5)
#> Building the iterative hierarchical consensus tree... Note that this process can take time especially if you have a lot of sites.
#>
#> Final tree has a 0.7262 cophenetic correlation coefficient with the initial dissimilarity
#> matrix
#> Determining the cut height to reach 5 groups...
#> --> 0.265625
tree2 <- cut_tree(tree1, cut_height = .05)
tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10))
#> Determining the cut height to reach 3 groups...
#> --> 0.28125
#> Determining the cut height to reach 5 groups...
#> --> 0.265625
#> Determining the cut height to reach 10 groups...
#> --> 0.1875
tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25))
tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE)
hclust_tree <- tree2$algorithm$final.tree
clusters_2 <- cut_tree(hclust_tree, n_clust = 10)
#> Determining the cut height to reach 10 groups...
#> --> 0.1875
cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE,
                            dissimilarity = dissimilarity)
#> Some sites were not assigned to any cluster. They will have a NA
#> in the cluster data.frame.