This function is designed to work on a hierarchical tree and cut it
at user-selected heights. It accepts either the output of hclu_hierarclust()
or an hclust object. It cuts the tree for the chosen number(s) of clusters
or at the selected height(s). It also includes a procedure to automatically
return the height of cut for the chosen number(s) of clusters.
Usage
cut_tree(
tree,
n_clust = NULL,
cut_height = NULL,
find_h = TRUE,
h_max = 1,
h_min = 0,
dynamic_tree_cut = FALSE,
dynamic_method = "tree",
dynamic_minClusterSize = 5,
dissimilarity = NULL,
...
)
Arguments
- tree
a bioregion.hierar.tree or a hclust object
- n_clust
an integer or a vector of integers indicating the number of clusters to be obtained from the hierarchical tree, or the output from partition_metrics(). Should not be used at the same time as cut_height
- cut_height
a numeric vector indicating the height(s) at which the tree should be cut. Should not be used at the same time as n_clust or optim_method
- find_h
a boolean indicating if the height of cut should be found for the requested n_clust
- h_max
a numeric indicating the maximum possible tree height for finding the height of cut when find_h = TRUE
- h_min
a numeric indicating the minimum possible height in the tree for finding the height of cut when find_h = TRUE
- dynamic_tree_cut
a boolean indicating if the dynamic tree cut method should be used, in which case n_clust and cut_height are ignored
- dynamic_method
a character vector indicating the method to be used to dynamically cut the tree: either "tree" (clusters searched only in the tree) or "hybrid" (clusters searched on both tree and dissimilarity matrix)
- dynamic_minClusterSize
an integer indicating the minimum cluster size to use in the dynamic tree cut method (see dynamicTreeCut::cutreeDynamic())
- dissimilarity
only useful if dynamic_method = "hybrid". Provide here the dissimilarity data.frame used to build the tree
- ...
further arguments to be passed to dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method.
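The find_h, h_min and h_max arguments describe a search for a cut height within a bounded interval. The sketch below illustrates one way such a search could work, using a simple bisection on a base R hclust tree. It is an assumption made for illustration, not the package's actual implementation: find_cut_height is a hypothetical helper, and the toy data are random.

```r
# Hypothetical helper (not part of bioregion): bisection search for a cut
# height that yields exactly `k` clusters, bounded by h_min and h_max.
find_cut_height <- function(hc, k, h_min = 0, h_max = 1, max_iter = 100) {
  for (i in seq_len(max_iter)) {
    h <- (h_min + h_max) / 2
    n <- length(unique(stats::cutree(hc, h = h)))
    if (n == k) return(h)
    if (n > k) {
      h_min <- h  # too many clusters: the cut must be higher
    } else {
      h_max <- h  # too few clusters: the cut must be lower
    }
  }
  NA_real_  # no height in the interval gives exactly k clusters
}

# Toy data: 20 random sites, distances rescaled so tree heights lie in [0, 1]
set.seed(1)
d  <- dist(matrix(runif(40), nrow = 20))
d  <- d / max(d)
hc <- hclust(d, method = "average")

h5 <- find_cut_height(hc, k = 5)
length(unique(cutree(hc, h = h5)))  # number of clusters at the height found
```

If the tree can be taller than 1 (for example with unbounded dissimilarities), the upper bound would need to be raised accordingly, which is the role h_max plays in cut_tree().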
Value
If tree is an output from hclu_hierarclust(), then the same object is
returned with content updated (i.e., args and clusters). If tree is a
hclust object, then a data.frame containing the clusters is returned.
Details
The function can cut the tree with two main methods. First, it can cut
the entire tree at the same height (either specified by cut_height or
automatically defined for the chosen n_clust). Second, it can use
the dynamic tree cut method (Langfelder et al. 2008), in which
case clusters are detected with an adaptive method based on the shape of
branches in the tree (thus cuts happen at multiple heights depending on
cluster positions in the tree).
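The first method, cutting the whole tree at a single height, can be illustrated with base R alone. The sketch below uses stats::cutree() on a plain hclust object rather than cut_tree() itself, and random toy data, to show that asking for a number of clusters and cutting at a matching height describe the same partition.

```r
# Toy tree built from random data (illustration only)
set.seed(42)
x <- matrix(rnorm(60), nrow = 20)
rownames(x) <- paste0("Site", 1:20)
hc <- hclust(dist(x), method = "average")

# Cut by number of clusters
by_k <- cutree(hc, k = 4)

# Cut by height: any height between the merge that leaves 4 clusters and
# the merge that leaves 3 clusters produces 4 clusters
n     <- length(hc$order)
h_mid <- mean(hc$height[c(n - 4, n - 3)])
by_h  <- cutree(hc, h = h_mid)

# Cross-tabulate the two partitions
table(by_k, by_h)
```

Because every branch is cut at the same level, a uniform cut cannot separate a tight cluster nested inside a loose one without also splitting other branches, which is the situation the dynamic tree cut method is meant to handle.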
The dynamic tree cut method has two variants.
- The tree-based only variant (dynamic_method = "tree") is a top-down approach which relies only on the tree and follows the order of clustered objects on it.
- The hybrid variant (dynamic_method = "hybrid") is a bottom-up approach which relies on both the tree and the dissimilarity matrix to build clusters on the basis of dissimilarity information among sites. This method is useful to detect outlying members in each cluster.
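To compare the two variants outside of cut_tree(), the sketch below calls dynamicTreeCut::cutreeDynamic() directly on a base R hclust tree. The toy data (two well-separated groups of random points) and the choice of average linkage are assumptions made for illustration. The code only runs if the dynamicTreeCut package is installed.

```r
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  # Toy data: two groups of 30 points each (illustration only)
  set.seed(7)
  x <- rbind(matrix(rnorm(60, mean = 0), ncol = 2),
             matrix(rnorm(60, mean = 5), ncol = 2))
  d  <- dist(x)
  hc <- hclust(d, method = "average")

  # Tree-only variant: uses the dendrogram alone
  cl_tree <- dynamicTreeCut::cutreeDynamic(
    hc, method = "tree", minClusterSize = 5, verbose = 0
  )

  # Hybrid variant: also needs the dissimilarities as a full matrix
  cl_hybrid <- dynamicTreeCut::cutreeDynamic(
    hc, method = "hybrid", distM = as.matrix(d),
    minClusterSize = 5, verbose = 0
  )

  # Label 0 marks objects left unassigned by cutreeDynamic()
  print(table(tree = cl_tree, hybrid = cl_hybrid))
}
```

In cut_tree(), the same choice is made through dynamic_method, with dissimilarity supplying the distances for the hybrid variant.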
Note
The argument find_h is ignored if dynamic_tree_cut = TRUE,
because heights of cut cannot be estimated in this case.
References
Langfelder P, Zhang B, Horvath S (2008). “Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R.” Bioinformatics, 24(5), 719–720.
Author
Pierre Denelle (pierre.denelle@gmail.com), Maxime Lenormand (maxime.lenormand@inrae.fr) and Boris Leroy (leroy.boris@gmail.com)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
simil <- similarity(comat, metric = "all")
dissimilarity <- similarity_to_dissimilarity(simil)
# User-defined number of clusters
tree1 <- hclu_hierarclust(dissimilarity, n_clust = 5)
#> Randomizing the dissimilarity matrix with 30 trials
#> -- range of cophenetic correlation coefficients among
#> trials: 0.83 - 0.85
#> Optimal tree has a 0.85 cophenetic correlation coefficient with the initial dissimilarity
#> matrix
#> Determining the cut height to reach 5 groups...
#> --> 0.171875
# Cut at a single user-defined height
tree2 <- cut_tree(tree1, cut_height = .05)
# Cut for several numbers of clusters at once
tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10))
#> Determining the cut height to reach 3 groups...
#> --> 0.25
#> Determining the cut height to reach 5 groups...
#> --> 0.171875
#> Determining the cut height to reach 10 groups...
#> --> 0.13671875
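# Cut at several heights at once
tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25))
# Request numbers of clusters without searching for the cut heights
tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE)
# Extract the underlying hclust object and cut it directly
hclust_tree <- tree2$algorithm$final.tree
clusters_2 <- cut_tree(hclust_tree, n_clust = 10)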
#> Determining the cut height to reach 10 groups...
#> --> 0.13671875
# Dynamic tree cut
cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE,
                            dissimilarity = dissimilarity)
#> Some sites were not assigned to any cluster. They will have a NA
#> in the cluster data.frame.