This function performs non-hierarchical clustering based on dissimilarity using a k-means analysis.
Arguments
- dissimilarity
The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.
- index
The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
A value for the random number generator (NULL for random by default).
- n_clust
An integer vector or a single integer value specifying the requested number(s) of clusters.
- iter_max
An integer specifying the maximum number of iterations for the k-means method (see kmeans).
- nstart
An integer specifying how many random sets of n_clust should be selected as starting points for the k-means analysis (see kmeans).
- algorithm
A character specifying the algorithm to use for k-means (see kmeans). Available options are Hartigan-Wong, Lloyd, Forgy, and MacQueen.
- algorithm_in_output
A boolean indicating whether the original output of kmeans should be included in the output. Defaults to TRUE (see Value).
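As an illustration of the data.frame layout described for the dissimilarity argument, the following base-R sketch builds a long-format table from a dist object. The toy data and the column names Site1, Site2 and Euclidean are assumptions chosen for this example only; they are not produced by the package.

```r
# Toy coordinates for four hypothetical sites (illustrative data only).
set.seed(1)
x <- matrix(runif(12), nrow = 4,
            dimnames = list(paste0("Site", 1:4), NULL))

# A dist object, which the argument description also lists as valid input.
d <- dist(x)

# Long format: one row per pair of sites, dissimilarity in the third column.
m <- as.matrix(d)
idx <- which(lower.tri(m), arr.ind = TRUE)
dissim_df <- data.frame(
  Site1 = rownames(m)[idx[, "col"]],
  Site2 = rownames(m)[idx[, "row"]],
  Euclidean = m[idx]
)
dissim_df
```

With a table shaped like this, the index argument would refer to the third column either by name ("Euclidean" here) or by position (3).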
Value
A list of class bioregion.clusters with five components:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list of characteristics of the clustering process.
- algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of kmeans.
Details
This method partitions data into k groups such that the sum of squares of Euclidean distances from points to the assigned cluster centers is minimized. K-means cannot be applied directly to dissimilarity or beta-diversity metrics because these distances are not Euclidean. Therefore, the dissimilarity matrix is first transformed with a Principal Coordinate Analysis (PCoA, using pcoa), and k-means is then applied to the coordinates of the points in the PCoA space.
Because this additional transformation alters the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) is preferred.
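The two steps described above (PCoA, then k-means on the resulting coordinates) can be sketched with base R alone. This is a minimal illustration under stated assumptions, not the function's actual implementation: it uses stats::cmdscale (classical scaling, i.e. PCoA) in place of pcoa, keeps only the first three axes for simplicity, and runs on made-up presence/absence data.

```r
# Made-up presence/absence matrix: 10 sites x 15 species (illustrative only).
set.seed(42)
pa <- matrix(rbinom(10 * 15, 1, 0.5), nrow = 10,
             dimnames = list(paste0("Site", 1:10), paste0("Species", 1:15)))

# A non-Euclidean dissimilarity between sites (Jaccard, via method = "binary").
d <- dist(pa, method = "binary")

# Step 1: PCoA turns the dissimilarities into coordinates.
# Assumption: cmdscale() stands in for pcoa(); only 3 axes are kept here.
coords <- cmdscale(d, k = 3)

# Step 2: k-means on the PCoA coordinates, here for a single k = 3.
km <- kmeans(coords, centers = 3, iter.max = 10, nstart = 10,
             algorithm = "Hartigan-Wong")

# Cluster membership, named by site.
km$cluster
```

In this sketch, iter.max, nstart and algorithm are the arguments of kmeans that the iter_max, nstart and algorithm arguments of this function presumably correspond to; requesting several values in n_clust would amount to repeating step 2 for each number of clusters.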
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_dbscan, nhclu_pam, nhclu_affprop
Author
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
clust1 <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")
clust2 <- nhclu_kmeans(dissim, n_clust = 2:15, index = "Simpson")
bioregionalization_metrics(clust2, dissimilarity = dissim,
                           eval_metric = "pc_distance")
#> Computing similarity-based metrics...
#> - pc_distance OK
#> Partition metrics:
#> - 14 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): pc_distance
#> - Metric summary:
#> pc_distance
#> Min 0.2764929
#> Mean 0.8465581
#> Max 0.9770536
#>
#> Access the data.frame of metrics with your_object$evaluation_df
bioregionalization_metrics(clust2, net = comnet, species_col = "Node2",
                           site_col = "Node1", eval_metric = "avg_endemism")
#> Computing composition-based metrics...
#> - avg_endemism OK
#> Partition metrics:
#> - 14 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): avg_endemism
#> - Metric summary:
#> avg_endemism
#> Min 0
#> Mean 0
#> Max 0
#>
#> Access the data.frame of metrics with your_object$evaluation_df
#> Details of endemism % for each partition are available in
#> your_object$endemism_results