
This function cuts a hierarchical tree at user-selected heights. It accepts either the output of hclu_hierarclust() or an hclust object, and cuts the tree for the chosen number(s) of clusters or at the selected height(s). It can also automatically return the cut height corresponding to the chosen number(s) of clusters.

Usage

cut_tree(
  tree,
  n_clust = NULL,
  cut_height = NULL,
  find_h = TRUE,
  h_max = 1,
  h_min = 0,
  dynamic_tree_cut = FALSE,
  dynamic_method = "tree",
  dynamic_minClusterSize = 5,
  dissimilarity = NULL,
  ...
)

Arguments

tree

a bioregion.hierar.tree or a hclust object

n_clust

an integer or a vector of integers indicating the number of clusters to be obtained from the hierarchical tree, or the output from partition_metrics(). Should not be used at the same time as cut_height

cut_height

a numeric vector indicating the height(s) at which the tree should be cut. Should not be used at the same time as n_clust

find_h

a boolean indicating if the height of cut should be found for the requested n_clust

h_max

a numeric indicating the maximum possible tree height for finding the height of cut when find_h = TRUE

h_min

a numeric indicating the minimum possible height in the tree for finding the height of cut when find_h = TRUE

dynamic_tree_cut

a boolean indicating if the dynamic tree cut method should be used, in which case n_clust and cut_height are ignored

dynamic_method

a character vector indicating the method to be used to dynamically cut the tree: either "tree" (clusters searched only in the tree) or "hybrid" (clusters searched on both tree and dissimilarity matrix)

dynamic_minClusterSize

an integer indicating the minimum cluster size to use in the dynamic tree cut method (see dynamicTreeCut::cutreeDynamic())

dissimilarity

only useful if dynamic_method = "hybrid". Provide here the dissimilarity data.frame used to build the tree

...

further arguments to be passed to dynamicTreeCut::cutreeDynamic() to customize the dynamic tree cut method.

Value

If tree is an output from hclu_hierarclust(), then the same object is returned with content updated (i.e., args and clusters). If tree is a hclust object, then a data.frame containing the clusters is returned.

Details

The function can cut the tree with two main methods. First, it can cut the entire tree at the same height (either specified by cut_height or automatically determined for the chosen n_clust). Second, it can use the dynamic tree cut method (Langfelder et al. 2008), in which case clusters are detected with an adaptive method based on the shape of branches in the tree (thus cuts happen at multiple heights depending on cluster positions in the tree).
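To make the first method concrete, here is a minimal sketch in base R (stats::hclust() and stats::cutree()), not the package's own code. The toy data and the helper find_height() are hypothetical. The heights reported in the Examples (0.25, 0.171875, 0.13671875) are dyadic fractions, which is consistent with an interval-halving search between h_min = 0 and h_max = 1, but that is an inference and the actual implementation may differ.

```r
# Toy data: 20 sites described by 2 variables (hypothetical, for illustration)
set.seed(1)
x <- matrix(runif(40), nrow = 20, dimnames = list(paste0("Site", 1:20), NULL))
d <- dist(x)
d <- d / max(d)  # rescale so that all merge heights fall within [0, 1]
tree <- hclust(d, method = "average")

# Cut the whole tree at one fixed height (analogous to cut_height)
cl_h <- cutree(tree, h = 0.3)

# Cut for a fixed number of clusters (analogous to n_clust)
cl_k <- cutree(tree, k = 5)

# Hypothetical helper: bisection search for a height yielding n_clust clusters
find_height <- function(tree, n_clust, h_min = 0, h_max = 1, max_iter = 50) {
  for (i in seq_len(max_iter)) {
    h <- (h_min + h_max) / 2
    k <- length(unique(cutree(tree, h = h)))
    if (k == n_clust) return(h)
    # Too few clusters means the cut is too high, so search lower; else higher
    if (k < n_clust) h_max <- h else h_min <- h
  }
  NA_real_  # no suitable height found within max_iter iterations
}

h5 <- find_height(tree, n_clust = 5)
```

Any height between two consecutive merge heights gives the same partition, so the value returned by such a search is one valid cut height among many rather than a unique one.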

The dynamic tree cut method has two variants.

  • The tree-only variant (dynamic_method = "tree") is a top-down approach which relies only on the tree and follows the order of clustered objects on it.

  • The hybrid variant (dynamic_method = "hybrid") is a bottom-up approach which relies on both the tree and the dissimilarity matrix to build clusters on the basis of dissimilarity information among sites. This method is useful to detect outlying members in each cluster.
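The two variants can be compared by calling dynamicTreeCut::cutreeDynamic() directly, as in the sketch below. It uses hypothetical toy data and assumes the dynamicTreeCut package is installed; it illustrates the underlying method rather than what cut_tree() does internally. Note that the hybrid variant needs the dissimilarities as a full matrix (distM), which is why cut_tree() asks for the dissimilarity argument in that case.

```r
if (requireNamespace("dynamicTreeCut", quietly = TRUE)) {
  # Toy data: 30 sites described by 2 variables (hypothetical, for illustration)
  set.seed(1)
  x <- matrix(runif(60), nrow = 30, dimnames = list(paste0("Site", 1:30), NULL))
  d <- dist(x)
  tree <- hclust(d, method = "average")

  # Tree-only variant: clusters are searched on the dendrogram alone
  cl_tree <- dynamicTreeCut::cutreeDynamic(tree, method = "tree",
                                           minClusterSize = 5)

  # Hybrid variant: also uses the pairwise dissimilarities
  cl_hybrid <- dynamicTreeCut::cutreeDynamic(tree, method = "hybrid",
                                             distM = as.matrix(d),
                                             minClusterSize = 5)

  # In cutreeDynamic() output, label 0 marks objects left unassigned
  print(table(cl_tree))
  print(table(cl_hybrid))
}
```

Objects labelled 0 by cutreeDynamic() presumably correspond to the sites that cut_tree() reports as NA in the cluster data.frame.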

Note

The argument find_h is ignored if dynamic_tree_cut = TRUE, because heights of cut cannot be estimated in this case.

References

Langfelder P, Zhang B, Horvath S (2008). “Defining clusters from a hierarchical cluster tree: the Dynamic Tree Cut package for R.” Bioinformatics, 24(5), 719–720.

Author

Pierre Denelle (pierre.denelle@gmail.com), Maxime Lenormand (maxime.lenormand@inrae.fr) and Boris Leroy (leroy.boris@gmail.com)

Examples

comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

simil <- similarity(comat, metric = "all")
dissimilarity <- similarity_to_dissimilarity(simil)

# User-defined number of clusters
tree1 <- hclu_hierarclust(dissimilarity, n_clust = 5)
#> Randomizing the dissimilarity matrix with 30 trials
#>  -- range of cophenetic correlation coefficients among
#>                      trials: 0.83 - 0.85
#> Optimal tree has a 0.85 cophenetic correlation coefficient with the initial dissimilarity
#>       matrix
#> Determining the cut height to reach 5 groups...
#> --> 0.171875
tree2 <- cut_tree(tree1, cut_height = .05)
tree3 <- cut_tree(tree1, n_clust = c(3, 5, 10))
#> Determining the cut height to reach 3 groups...
#> --> 0.25
#> Determining the cut height to reach 5 groups...
#> --> 0.171875
#> Determining the cut height to reach 10 groups...
#> --> 0.13671875
tree4 <- cut_tree(tree1, cut_height = c(.05, .1, .15, .2, .25))
tree5 <- cut_tree(tree1, n_clust = c(3, 5, 10), find_h = FALSE)

hclust_tree <- tree2$algorithm$final.tree
clusters_2 <- cut_tree(hclust_tree, n_clust = 10)
#> Determining the cut height to reach 10 groups...
#> --> 0.13671875

cluster_dynamic <- cut_tree(tree1, dynamic_tree_cut = TRUE,
                            dissimilarity = dissimilarity)
#> Some sites were not assigned to any cluster. They will have a NA
#>               in the cluster data.frame.