This function performs non-hierarchical clustering based on dissimilarity, using partitioning around medoids (PAM) with the Clustering Large Applications (CLARA) algorithm.
Arguments
- dissimilarity
the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
- index
name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
the seed for the random number generator (NULL by default, i.e. a random seed).
- n_clust
an integer or an integer vector specifying the requested number(s) of clusters.
- maxiter
an integer defining the maximum number of iterations.
- initializer
a character, either 'BUILD' (used in the classic PAM algorithm) or 'LAB' (linear approximative BUILD).
- fasttol
a positive numeric defining the tolerance for fast swapping behavior (1 by default).
- numsamples
a positive integer defining the number of samples to draw.
- sampling
a positive numeric defining the sampling rate.
- independent
a boolean; if TRUE, the medoids found in the previous sample are not kept in the next sample (FALSE by default).
- algorithm_in_output
a boolean indicating whether the original output of fastclara should be included in the output (TRUE by default, see Value).
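When dissimilarity is a data.frame of site pairs, the k-medoids routine needs the values as a distance structure rather than a list of pairs. The base R sketch below shows one way to make that conversion; the helper `pairs_to_dist()` and the toy column names are hypothetical and are not taken from bioregion's internal code, which may handle this differently.

```r
# Hypothetical helper (not part of bioregion): turn a pairwise data.frame
# (site, site, index columns) into a base R 'dist' object.
pairs_to_dist <- function(df, index = names(df)[3]) {
  from <- as.character(df[[1]])
  to <- as.character(df[[2]])
  sites <- sort(unique(c(from, to)))
  # pairs missing from df are left at 0
  m <- matrix(0, nrow = length(sites), ncol = length(sites),
              dimnames = list(sites, sites))
  # character-matrix indexing matches row/column names
  m[cbind(from, to)] <- df[[index]]
  m[cbind(to, from)] <- df[[index]]  # mirror so the matrix is symmetric
  as.dist(m)
}

# Toy pairwise table; the column names are made up for illustration
pairs <- data.frame(Site1 = c("A", "A", "B"),
                    Site2 = c("B", "C", "C"),
                    Simpson = c(0.2, 0.5, 0.4))
d <- pairs_to_dist(pairs, index = "Simpson")
d
```

Mirroring the index argument, the column can be given by name or by position, so pairs_to_dist(pairs, 3) gives the same result here.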
Value
A list of class bioregion.clusters with five slots:
- name: character containing the name of the algorithm
- args: list of input arguments as provided by the user
- inputs: list of characteristics of the clustering process
- algorithm: list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE)
- clusters: data.frame containing the clustering results
In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of fastclara.
Details
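This function is based on fastclara() from the fastkmedoids package, which implements the FastCLARA algorithm of Schubert and Rousseeuw (2019).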
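To make the sampling idea behind CLARA concrete, here is a simplified base R sketch on a full dissimilarity matrix: draw numsamples random subsets of sites, find k medoids within each subset, assign every site to its nearest medoid, and keep the medoid set with the lowest total dissimilarity. It is not FastCLARA. Each sample is clustered with a greedy BUILD-style initialization followed by plain alternating (Voronoi-iteration) updates instead of PAM swaps, and an absolute sample_size (defaulting to 40 + 2k, the default sample size of cluster::clara) stands in for the sampling rate. All function names are hypothetical.

```r
# Simplified CLARA sketch (hypothetical helpers, not bioregion or fastkmedoids code).
# Assumption: D is a full, symmetric numeric dissimilarity matrix.

# Greedy BUILD-style start: add k medoids one at a time
build_init <- function(D, k) {
  med <- which.min(rowSums(D))  # most central object first
  while (length(med) < k) {
    nearest <- apply(D[, med, drop = FALSE], 1, min)
    # gain of each candidate = total reduction of nearest-medoid distances
    gain <- sapply(seq_len(nrow(D)),
                   function(cand) sum(pmax(nearest - D[, cand], 0)))
    gain[med] <- -Inf  # never pick an existing medoid twice
    med <- c(med, which.max(gain))
  }
  med
}

# Alternating (Voronoi-iteration) k-medoids, used here instead of PAM swaps
kmedoids_alt <- function(D, k, maxiter = 100) {
  med <- build_init(D, k)
  for (it in seq_len(maxiter)) {
    cl <- apply(D[, med, drop = FALSE], 1, which.min)
    new_med <- sapply(seq_len(k), function(j) {
      members <- which(cl == j)
      if (length(members) == 0) return(med[j])  # keep medoid of an empty cluster
      members[which.min(rowSums(D[members, members, drop = FALSE]))]
    })
    if (all(new_med == med)) break
    med <- new_med
  }
  med
}

# CLARA loop: cluster random samples, evaluate each medoid set on all sites
clara_sketch <- function(D, k, numsamples = 5, sample_size = 40 + 2 * k) {
  n <- nrow(D)
  best <- list(cost = Inf)
  for (s in seq_len(numsamples)) {
    idx <- sample.int(n, min(n, sample_size))                # random subset of sites
    med <- idx[kmedoids_alt(D[idx, idx, drop = FALSE], k)]  # back to full indices
    # assign every site (not only the sampled ones) to its nearest medoid
    to_med <- D[, med, drop = FALSE]
    cost <- sum(apply(to_med, 1, min))
    if (cost < best$cost) {
      best <- list(cost = cost, medoids = med,
                   clusters = apply(to_med, 1, which.min))
    }
  }
  best
}

# Usage on two well-separated groups of points on a line
set.seed(42)
D <- as.matrix(dist(c(1, 2, 3, 100, 101, 102)))
res <- clara_sketch(D, k = 2, numsamples = 3, sample_size = 6)
res$medoids
res$clusters
```

Because each sample is drawn at random, different samples can yield different medoid sets; scoring every candidate set on all sites and keeping the cheapest is what makes repeated sampling worthwhile.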
References
Schubert E, Rousseeuw PJ (2019). “Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms.” Similarity Search and Applications, 11807, 171–187.
Author
Pierre Denelle (pierre.denelle@gmail.com), Boris Leroy (leroy.boris@gmail.com), and Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
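library(bioregion)

# Simulated site-by-species abundance matrix (20 sites x 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                nrow = 20, ncol = 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
# Compute all available dissimilarity indices between sites
dissim <- dissimilarity(comat, metric = "all")
# CLARA clustering on the Simpson dissimilarity, requesting 5 clusters
clust1 <- nhclu_clara(dissim, index = "Simpson", n_clust = 5)
# Evaluate the resulting partition with the pc_distance metric
partition_metrics(clust1, dissimilarity = dissim,
                  eval_metric = "pc_distance")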
#> Computing similarity-based metrics...
#> - pc_distance OK
#> Partition metrics:
#> - 1 partition(s) evaluated
#> - Range of clusters explored: from 4 to 4
#> - Requested metric(s): pc_distance
#> - Metric summary:
#> pc_distance
#> Min 0.470003
#> Mean 0.470003
#> Max 0.470003
#>
#> Access the data.frame of metrics with your_object$evaluation_df