This function performs non-hierarchical clustering on the basis of dissimilarity with a k-means analysis.
Arguments
- dissimilarity
the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
- index
name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
seed for the random number generator (NULL by default, i.e. a random seed is used).
- n_clust
an integer or an integer vector specifying the requested number(s) of clusters.
- iter_max
an integer specifying the maximum number of iterations for the k-means method (see kmeans).
- nstart
an integer specifying how many random sets of n_clust should be selected as starting points for the k-means analysis (see kmeans).
- algorithm
a character string specifying the algorithm to use for kmeans (see kmeans). Available options are Hartigan-Wong, Lloyd, Forgy and MacQueen.
- algorithm_in_output
a boolean indicating whether the original output of kmeans should be returned in the output (TRUE by default, see Value).
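The iter_max, nstart, algorithm and seed arguments are described with reference to stats::kmeans. As a hedged illustration of what those settings control, the base-R sketch below fits one kmeans() model per value of a toy n_clust vector. It calls stats::kmeans() directly and is not the internal code of nhclu_kmeans(); the helper name fit_all_k(), the random 2-D coordinates and the chosen values (iter_max = 10, nstart = 5) are assumptions made for the example, not the function's defaults.

```r
# Illustrative sketch only: fit_all_k() is a hypothetical helper that calls
# stats::kmeans() directly; it is NOT how nhclu_kmeans() is implemented.
fit_all_k <- function(coords, n_clust, seed = NULL, iter_max = 10,
                      nstart = 5, algorithm = "Hartigan-Wong") {
  # Fix the random number generator only when a seed is supplied
  if (!is.null(seed)) set.seed(seed)
  fits <- lapply(n_clust, function(k) {
    # nstart random sets of k centres are tried; the run with the lowest
    # total within-cluster sum of squares is kept
    kmeans(coords, centers = k, iter.max = iter_max,
           nstart = nstart, algorithm = algorithm)
  })
  names(fits) <- paste0("k_", n_clust)
  fits
}

set.seed(123)
coords <- matrix(rnorm(40), ncol = 2)   # 20 toy points in 2-D (assumed data)

run1 <- fit_all_k(coords, n_clust = 2:5, seed = 1)
run2 <- fit_all_k(coords, n_clust = 2:5, seed = 1)

# One partition (20 cluster labels) per requested number of clusters
sapply(run1, function(f) max(f$cluster))

# Same seed, same random starting centres, hence identical partitions
identical(lapply(run1, `[[`, "cluster"), lapply(run2, `[[`, "cluster"))
```

Passing a vector to n_clust simply produces one partition per requested number of clusters, which is why the sketch loops over it; fixing the seed is what makes the random starts, and therefore the partitions, reproducible.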
Value
A list of class bioregion.clusters with five slots:
- name: character containing the name of the algorithm
- args: list of input arguments as provided by the user
- inputs: list of characteristics of the clustering process
- algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
- clusters: data.frame containing the clustering results
In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of kmeans.
Details
This method partitions the data into k groups such that the sum of squared Euclidean distances from points to their assigned cluster centers is minimized. k-means cannot be applied directly to dissimilarity/beta-diversity metrics, because these dissimilarities are generally not Euclidean distances. Therefore, the dissimilarity matrix is first transformed with a Principal Coordinate Analysis (using the function pcoa), and k-means is then applied to the coordinates of the points in the PCoA space. Because this adds an extra transformation of the initial dissimilarity matrix, the partitioning around medoids method (nhclu_pam) should be preferred.
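A minimal base-R sketch of the two-step approach described above: a dissimilarity matrix is first ordinated with a PCoA, and k-means is then run on the resulting coordinates. Assumptions: stats::cmdscale() (classical multidimensional scaling, which is equivalent to PCoA) stands in for the pcoa() function mentioned in the text, the Jaccard dissimilarity from dist(method = "binary") stands in for the indices produced by dissimilarity(), the presence/absence matrix is random toy data, and discarding axes with non-positive eigenvalues is a choice of this sketch, not necessarily what nhclu_kmeans() does.

```r
# Sketch under stated assumptions: cmdscale() replaces pcoa(), and the Jaccard
# dissimilarity from dist(method = "binary") replaces dissimilarity().
set.seed(2)

# Toy presence/absence matrix: 20 sites x 25 species (assumed data)
comat <- matrix(rbinom(500, size = 1, prob = 0.3), nrow = 20, ncol = 25,
                dimnames = list(paste0("Site", 1:20), paste0("Species", 1:25)))
# Make sure every site holds at least one species
comat[cbind(1:20, sample(25, 20, replace = TRUE))] <- 1

d <- dist(comat, method = "binary")   # Jaccard dissimilarity between sites

# Step 1: PCoA. Negative eigenvalues indicate that the dissimilarities
# cannot be represented exactly as Euclidean distances.
eig <- cmdscale(d, k = 1, eig = TRUE)$eig   # all eigenvalues
n_neg <- sum(eig < -1e-10)
n_pos <- sum(eig > 1e-10)

# Keep only the axes with positive eigenvalues as Euclidean coordinates
coords <- cmdscale(d, k = n_pos)

# Step 2: k-means on the PCoA coordinates
km <- kmeans(coords, centers = 3, nstart = 10)

# k-means minimises the sum of squared Euclidean distances between each
# point and the centre of the cluster it is assigned to
manual_wss <- sum((coords - km$centers[km$cluster, , drop = FALSE])^2)
all.equal(manual_wss, km$tot.withinss)

n_neg        # number of negative eigenvalues (non-Euclidean part)
km$cluster   # cluster membership of each site
```

The manual sum of squares reproduces the criterion stated in the first sentence of this section. Any negative eigenvalues are the part of the dissimilarity structure that the PCoA coordinates cannot carry, which is the extra transformation that makes a medoid-based method working directly on the dissimilarities attractive.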
Author
Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
library(bioregion)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
clust1 <- nhclu_kmeans(dissim, n_clust = 2:10, index = "Simpson")
clust2 <- nhclu_kmeans(dissim, n_clust = 2:15, index = "Simpson")
partition_metrics(clust2, dissimilarity = dissim,
                  eval_metric = "pc_distance")
#> Computing similarity-based metrics...
#> - pc_distance OK
#> Partition metrics:
#> - 14 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): pc_distance
#> - Metric summary:
#> pc_distance
#> Min 0.3119983
#> Mean 0.8514059
#> Max 0.9809241
#>
#> Access the data.frame of metrics with your_object$evaluation_df
partition_metrics(clust2, net = comnet, species_col = "Node2",
                  site_col = "Node1", eval_metric = "avg_endemism")
#> Computing composition-based metrics...
#> - avg_endemism OK
#> Partition metrics:
#> - 14 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): avg_endemism
#> - Metric summary:
#> avg_endemism
#> Min 0
#> Mean 0
#> Max 0
#>
#> Access the data.frame of metrics with your_object$evaluation_df
#> Details of endemism % for each partition are available in
#> your_object$endemism_results