# Divisive hierarchical clustering based on dissimilarity or beta-diversity

Source: `R/hclu_diana.R`

This function computes a divisive hierarchical clustering from a
dissimilarity (beta-diversity) `data.frame`, calculates the cophenetic
correlation coefficient, and can extract clusters from the tree if requested
by the user. The function implements randomization of the dissimilarity
matrix to generate the tree, with a selection method based on the optimal
cophenetic correlation coefficient. Typically, the dissimilarity `data.frame`
is a `bioregion.pairwise.metric` object obtained by running `dissimilarity`,
or by running `similarity` and then `similarity_to_dissimilarity`.
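
The steps described above (divisive tree, cophenetic correlation, optional cluster extraction) can be sketched with base R and the `cluster` package. This is a minimal illustration under stated assumptions, not the internal code of `hclu_diana()`: the toy matrix `comat`, the Euclidean distance and the choice of three clusters are all hypothetical.

```r
# Illustrative sketch only: base R plus the 'cluster' package,
# not the internals of hclu_diana().
library(cluster)

# Toy site-by-species abundance matrix (hypothetical data).
set.seed(1)
comat <- matrix(rpois(60, lambda = 5), nrow = 6,
                dimnames = list(paste0("Site", 1:6),
                                paste0("Species", 1:10)))

# Pairwise dissimilarities between sites (Euclidean, chosen for simplicity).
d <- dist(comat)

# Divisive hierarchical clustering computed directly on the dissimilarities.
tree <- diana(d, diss = TRUE)

# Cophenetic correlation: how well distances read off the tree
# agree with the input dissimilarities.
hc <- as.hclust(tree)
coph_cor <- cor(d, cophenetic(hc))

# Optional step: extract a fixed number of clusters from the tree.
groups <- cutree(hc, k = 3)
```

A cophenetic correlation close to 1 suggests the tree preserves the original pairwise dissimilarities well; lower values indicate that the hierarchy distorts them.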

## Usage

```
hclu_diana(
  dissimilarity,
  index = names(dissimilarity)[3],
  n_clust = NULL,
  cut_height = NULL,
  find_h = TRUE,
  h_max = 1,
  h_min = 0
)
```

## Arguments

- `dissimilarity`: the output object from `dissimilarity()` or `similarity_to_dissimilarity()`, or a `dist` object. If a `data.frame` is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
- `index`: name or number of the dissimilarity column to use. By default, the third column name of `dissimilarity` is used.
- `n_clust`: an `integer` or an `integer` vector indicating the number of clusters to be obtained from the hierarchical tree, or the output from `partition_metrics()`. Should not be used at the same time as `cut_height`.
- `cut_height`: a `numeric` vector indicating the height(s) at which the tree should be cut. Should not be used at the same time as `n_clust`.
- `find_h`: a `boolean` indicating if the height of cut should be found for the requested `n_clust`.
- `h_max`: a `numeric` indicating the maximum possible tree height for the chosen `index`.
- `h_min`: a `numeric` indicating the minimum possible height in the tree for the chosen `index`.
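
To make the expected `data.frame` layout and the difference between `n_clust` and `cut_height` concrete, here is a small sketch using base R and `cluster::diana()`. The table `pairs`, its column names and its values are hypothetical, and `cutree()` is used only as an analogy for the two ways of cutting a tree; it is not a claim about how `hclu_diana()` performs the cut.

```r
# Hypothetical pairwise table in the layout described above:
# two columns of node identifiers, then one dissimilarity column.
pairs <- data.frame(
  Site1   = c("A", "A", "A", "B", "B", "C"),
  Site2   = c("B", "C", "D", "C", "D", "D"),
  Simpson = c(0.10, 0.80, 0.90, 0.70, 0.85, 0.20)
)

# Mirrors the documented default: use the third column as the index.
index <- names(pairs)[3]

# Rebuild a symmetric matrix, then a 'dist' object, from the pairwise table.
from  <- as.character(pairs[[1]])
to    <- as.character(pairs[[2]])
sites <- sort(unique(c(from, to)))
m <- matrix(0, length(sites), length(sites), dimnames = list(sites, sites))
m[cbind(from, to)] <- pairs[[index]]
m[cbind(to, from)] <- pairs[[index]]
d <- as.dist(m)

# Divisive tree, converted to 'hclust' so that cutree() can be applied.
hc <- as.hclust(cluster::diana(d, diss = TRUE))

# Two alternative ways of cutting the same tree:
by_number <- cutree(hc, k = 2)    # analogous to n_clust = 2
by_height <- cutree(hc, h = 0.5)  # analogous to cut_height = 0.5
```

In this toy case both cuts separate sites A and B from sites C and D, because the only split above height 0.5 is the first one.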

## Value

A `list` of class `bioregion.clusters` with five slots:

- **name**: `character` containing the name of the algorithm
- **args**: `list` of input arguments as provided by the user
- **inputs**: `list` of characteristics of the clustering process
- **algorithm**: `list` of all objects associated with the clustering procedure, such as original cluster objects
- **clusters**: `data.frame` containing the clustering results

## Details

The function is based on `diana()` from the `cluster` package. Chapter 6 of Kaufman and Rousseeuw (1990) fully details the functioning of the diana algorithm.

To find an optimal number of clusters, see `partition_metrics()`.

## References

Kaufman L, Rousseeuw PJ (2009).
*Finding groups in data: An introduction to cluster analysis.*
John Wiley & Sons.

## Author

Pierre Denelle (pierre.denelle@gmail.com), Boris Leroy (leroy.boris@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)

## Examples

```
# Simulated site-by-species matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Compute all available dissimilarity metrics between sites
dissim <- dissimilarity(comat, metric = "all")

# Divisive clustering on the fishmat dataset, using the Simpson index
data("fishmat")
fishdissim <- dissimilarity(fishmat)
fish_diana <- hclu_diana(fishdissim, index = "Simpson")
#> Output tree has a 0.55 cophenetic correlation coefficient with the initial
#> dissimilarity matrix
```