This function performs non-hierarchical clustering based on dissimilarity using a k-means analysis.
Arguments
- dissimilarity
- The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.
- index
- The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
- A value for the random number generator (NULL for random by default).
- n_clust
- An integer vector or a single integer value specifying the requested number(s) of clusters.
- iter_max
- An integer specifying the maximum number of iterations for the k-means method (see kmeans).
- nstart
- An integer specifying how many random sets of n_clust should be selected as starting points for the k-means analysis (see kmeans).
- algorithm
- A character specifying the algorithm to use for k-means (see kmeans). Available options are Hartigan-Wong, Lloyd, Forgy, and MacQueen.
- algorithm_in_output
- A boolean indicating whether the original output of kmeans should be included in the output. Defaults to TRUE (see Value).
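As a rough illustration of what the k-means arguments control, the sketch below calls stats::kmeans directly in base R. It assumes (this page only says "see kmeans") that iter_max, nstart, and algorithm correspond to the iter.max, nstart, and algorithm arguments of kmeans, and that seed plays the role of set.seed(). The matrix x is hypothetical toy data, not output from this package.

```r
# Hypothetical toy coordinates: 100 points in 2 dimensions
set.seed(42)                       # assumed analogue of the `seed` argument
x <- matrix(rnorm(200), ncol = 2)

fit <- kmeans(
  x,
  centers   = 3,                   # one requested number of clusters (cf. n_clust)
  iter.max  = 10,                  # maximum number of iterations (cf. iter_max)
  nstart    = 10,                  # number of random starting sets (cf. nstart)
  algorithm = "Hartigan-Wong"      # or "Lloyd", "Forgy", "MacQueen"
)

# Cluster sizes and the cluster assignment of the first few points
fit$size
head(fit$cluster)
```

Increasing nstart makes kmeans try more random starts and keep the best solution, which usually gives more stable partitions at the cost of run time.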
Value
A list of class bioregion.clusters with five components:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list of characteristics of the clustering process.
- algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of kmeans.
Details
This method partitions data into k groups such that the sum of squared Euclidean distances from each point to its assigned cluster center is minimized. K-means cannot be applied directly to dissimilarity or beta-diversity metrics because these distances are not Euclidean. Therefore, the dissimilarity matrix is first transformed using Principal Coordinate Analysis (PCoA) with pcoa, and k-means is then applied to the coordinates of the points in the PCoA space.
Because this additional transformation alters the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) is preferred.
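The two-step idea described above (ordinate the dissimilarities, then cluster the coordinates) can be sketched in base R. This is a minimal illustration under stated assumptions, not the function's actual implementation: cmdscale (classical multidimensional scaling) is swapped in for pcoa, the binary distance from dist stands in for a beta-diversity index, and the choices of 5 axes and 3 clusters are arbitrary.

```r
# Toy presence/absence data: 20 sites x 25 species (hypothetical)
set.seed(1)
comat <- matrix(rbinom(500, 1, 0.4), nrow = 20, ncol = 25)
rownames(comat) <- paste0("Site", 1:20)

# Step 1: a non-Euclidean dissimilarity between sites
# (dist(method = "binary") is a Jaccard-type dissimilarity)
d <- dist(comat, method = "binary")

# Step 2: principal coordinates via classical MDS,
# used here as a base R stand-in for pcoa; keep 5 axes (arbitrary)
coords <- cmdscale(d, k = 5)

# Step 3: k-means on the ordination coordinates
km <- kmeans(coords, centers = 3, nstart = 10)

# Number of sites assigned to each cluster
table(km$cluster)
```

Because only a subset of axes is kept and the ordination itself distorts non-Euclidean dissimilarities, the distances k-means works with differ from the original ones, which is the reason given above for preferring nhclu_pam.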
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_dbscan, nhclu_pam, nhclu_affprop
Author
Boris Leroy (leroy.boris@gmail.com) 
Pierre Denelle (pierre.denelle@gmail.com) 
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
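# Simulated site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Convert the matrix to network format
comnet <- mat_to_net(comat)

# Pairwise dissimilarity between sites, for all available metrics
dissim <- dissimilarity(comat, metric = "all")

# k-means clustering for 2 to 10 clusters, using the Simpson index
clust <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")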