This function performs non-hierarchical clustering based on dissimilarity using a k-means analysis.
Arguments
- dissimilarity
The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.
- index
The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
A value for the random number generator (NULL for random by default).
- n_clust
An integer vector or a single integer value specifying the requested number(s) of clusters.
- iter_max
An integer specifying the maximum number of iterations for the k-means method (see kmeans).
- nstart
An integer specifying how many random sets of n_clust should be selected as starting points for the k-means analysis (see kmeans).
- algorithm
A character specifying the algorithm to use for k-means (see kmeans). Available options are Hartigan-Wong, Lloyd, Forgy, and MacQueen.
- algorithm_in_output
A boolean indicating whether the original output of kmeans should be included in the output. Defaults to TRUE (see Value).
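The data.frame layout described for the dissimilarity argument (two columns of site pairs followed by one or more index columns) can be illustrated with a small base R sketch. The column names Site1, Site2, and Euclidean are hypothetical, chosen only for illustration; in practice the object would normally come from dissimilarity() or similarity_to_dissimilarity().

```r
# Toy example: four sites described by a single variable (hypothetical data)
x <- matrix(c(1, 2, 4, 7), ncol = 1,
            dimnames = list(paste0("Site", 1:4), NULL))

# Pairwise Euclidean distances as a dist object, then as a full matrix
d <- dist(x)
m <- as.matrix(d)

# Keep each pair once (lower triangle) and reshape to long format:
# two columns of site names, then one column per dissimilarity index
idx <- which(lower.tri(m), arr.ind = TRUE)
long <- data.frame(
  Site1 = rownames(m)[idx[, "col"]],
  Site2 = rownames(m)[idx[, "row"]],
  Euclidean = m[idx]
)

long
```

With a table of this shape, index would refer to the third column by default, or to another index column by name or number.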
Value
A list of class bioregion.clusters with five components:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list of characteristics of the clustering process.
- algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of kmeans.
Details
This method partitions data into k groups such that the sum of squared Euclidean distances from each point to its assigned cluster center is minimized. K-means cannot be applied directly to dissimilarity or beta-diversity metrics because these distances are not Euclidean. The function therefore first transforms the dissimilarity matrix using Principal Coordinate Analysis (PCoA) with pcoa, and then applies k-means to the coordinates of the points in the PCoA.
Because this additional transformation alters the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) is preferred.
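The two-step procedure described above can be sketched with base R alone. This is a minimal illustration, not the function's actual implementation: it assumes stats::cmdscale() (classical scaling) as a stand-in for pcoa, uses Manhattan distance as an example of a non-Euclidean dissimilarity, and the number of retained axes (k = 5) and of clusters (3) are arbitrary choices.

```r
set.seed(1)

# Toy site-by-species abundance matrix (hypothetical data)
comat <- matrix(rpois(20 * 25, lambda = 5), nrow = 20, ncol = 25)
rownames(comat) <- paste0("Site", 1:20)

# A non-Euclidean dissimilarity between sites (Manhattan, from base R)
d <- dist(comat, method = "manhattan")

# Step 1: embed the sites in a Euclidean space
# (assumption: cmdscale() stands in for pcoa here)
coords <- cmdscale(d, k = 5)

# Step 2: run k-means on the resulting coordinates
km <- kmeans(coords, centers = 3, iter.max = 100, nstart = 10)

# One cluster label per site
head(km$cluster)
```

Because only the first few axes are retained, the distances between points in coords differ from the original dissimilarities, which is the alteration mentioned above.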
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_dbscan, nhclu_pam, nhclu_affprop
Author
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
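# Simulate a site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Same data in network format
comnet <- mat_to_net(comat)

# Compute all dissimilarity metrics, then cluster on the Simpson index
dissim <- dissimilarity(comat, metric = "all")
clust <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")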