This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids, implemented via the Clustering Large Applications (CLARA) algorithm.
Arguments
- dissimilarity
The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.
- index
The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
A value for the random number generator. Defaults to NULL, which gives a random initialization.
- n_clust
An integer vector or a single integer specifying the desired number(s) of clusters.
- maxiter
An integer defining the maximum number of iterations.
- initializer
A character string, either "BUILD" (used in the classic PAM algorithm) or "LAB" (Linear Approximate BUILD).
- fasttol
A positive numeric value defining the tolerance for fast swapping behavior. Defaults to 1.
- numsamples
A positive integer specifying the number of samples to draw.
- sampling
A positive numeric value defining the sampling rate.
- independent
A boolean indicating whether the previous medoids are excluded from the next sample. Defaults to FALSE.
- algorithm_in_output
A boolean indicating whether the original output of fastclara should be included in the output. Defaults to TRUE (see Value).
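The data.frame input format described for the dissimilarity argument (one row per pair of sites, then one or more dissimilarity columns) can be illustrated with a small base R sketch that turns such a table into a dist object. The helper pairs_to_dist() and the toy column names are hypothetical and not part of bioregion; the sketch assumes each pair appears once and leaves pairs that are absent from the table as NA.

```r
# Hypothetical helper (not part of bioregion): convert a pairwise data.frame
# (columns 1-2 = pair of sites, `index` = dissimilarity column) into a dist object.
pairs_to_dist <- function(df, index = 3) {
  sites <- sort(unique(c(as.character(df[[1]]), as.character(df[[2]]))))
  m <- matrix(NA_real_, nrow = length(sites), ncol = length(sites),
              dimnames = list(sites, sites))
  diag(m) <- 0                       # a site has zero dissimilarity to itself
  i <- match(as.character(df[[1]]), sites)
  j <- match(as.character(df[[2]]), sites)
  v <- df[[index]]                   # works with a column name or a column number
  m[cbind(i, j)] <- v
  m[cbind(j, i)] <- v                # mirror so the matrix is symmetric
  as.dist(m)
}

# Toy pairwise table with one dissimilarity column named "Simpson"
pairs <- data.frame(Site1 = c("A", "A", "B"),
                    Site2 = c("B", "C", "C"),
                    Simpson = c(0.2, 0.5, 0.4))
d <- pairs_to_dist(pairs, index = "Simpson")
```

Selecting the column by name or by number mirrors how the index argument is described above.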
Value
A list of class bioregion.clusters with five components:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list of characteristics of the clustering process.
- algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.

If algorithm_in_output = TRUE, the algorithm slot includes the output of fastclara.
Details
This function is based on the fastclara() function of the fastkmedoids package.
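The CLARA idea can be sketched in base R: draw several random subsets of the objects, run PAM on each subset, score each candidate set of medoids on all objects, and keep the cheapest one. When independent = FALSE, the best medoids found so far are forced into later samples, as in the original CLARA. This is a simplified illustration, not the fastclara implementation. The helpers pam_cost(), simple_pam() and clara_sketch() are hypothetical. The sketch uses a naive greedy BUILD plus swap PAM instead of FastPAM and ignores initializer, fasttol, maxiter and sampling. It also assumes a fixed sample size of min(n, 40 + 2k), the default of cluster::clara.

```r
# Hypothetical helpers for illustration only; not bioregion or fastkmedoids code.

# Total cost: sum over all objects of the dissimilarity to their nearest medoid.
pam_cost <- function(D, medoids) {
  sum(apply(D[, medoids, drop = FALSE], 1, min))
}

# Naive PAM on a full dissimilarity matrix: greedy BUILD, then repeated
# SWAP passes that accept any improving medoid/non-medoid exchange.
simple_pam <- function(D, k) {
  n <- nrow(D)
  medoids <- which.min(rowSums(D))  # first medoid: the most central object
  while (length(medoids) < k) {
    cand <- setdiff(seq_len(n), medoids)
    costs <- vapply(cand, function(h) pam_cost(D, c(medoids, h)), numeric(1))
    medoids <- c(medoids, cand[which.min(costs)])
  }
  best <- pam_cost(D, medoids)
  repeat {
    improved <- FALSE
    for (pos in seq_along(medoids)) {
      for (h in setdiff(seq_len(n), medoids)) {
        trial <- medoids
        trial[pos] <- h
        cost <- pam_cost(D, trial)
        if (cost < best - 1e-12) {
          medoids <- trial
          best <- cost
          improved <- TRUE
        }
      }
    }
    if (!improved) break
  }
  medoids
}

# CLARA: PAM on random samples, each candidate scored on the full data set.
clara_sketch <- function(D, k, numsamples = 5, independent = FALSE, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- nrow(D)
  sampsize <- min(n, 40 + 2 * k)  # assumed sample size (cluster::clara default)
  best_cost <- Inf
  best_medoids <- NULL
  for (s in seq_len(numsamples)) {
    if (independent || is.null(best_medoids)) {
      idx <- sort(sample.int(n, sampsize))
    } else {
      # Keep the current best medoids in the sample, draw the rest at random
      rest <- setdiff(seq_len(n), best_medoids)
      idx <- sort(c(best_medoids, rest[sample.int(length(rest), sampsize - k)]))
    }
    medoids <- idx[simple_pam(D[idx, idx, drop = FALSE], k)]  # map to full indices
    cost <- pam_cost(D, medoids)                                # score on ALL objects
    if (cost < best_cost) {
      best_cost <- cost
      best_medoids <- medoids
    }
  }
  clusters <- apply(D[, best_medoids, drop = FALSE], 1, which.min)
  list(medoids = best_medoids, clusters = clusters, cost = best_cost)
}

# Toy usage: nine objects on a line forming three tight groups.
x <- c(1, 1.1, 1.2, 5, 5.1, 5.2, 10, 10.1, 10.2)
D <- as.matrix(dist(x))
res <- clara_sketch(D, k = 3, numsamples = 3, seed = 1)
```

Scoring every sampled solution on all objects, rather than only on the sample, is what lets CLARA trade some accuracy for speed on large data sets while still choosing between samples fairly.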
References
Schubert E & Rousseeuw PJ (2019) Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms. Similarity Search and Applications 11807, 171-187.
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clarans, nhclu_dbscan, nhclu_kmeans, nhclu_pam, nhclu_affprop
Author
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site",1:20)
colnames(comat) <- paste0("Species",1:25)
dissim <- dissimilarity(comat, metric = "all")
clust1 <- nhclu_clara(dissim, index = "Simpson", n_clust = 5)
bioregionalization_metrics(clust1, dissimilarity = dissim,
                           eval_metric = "pc_distance")
#> Computing similarity-based metrics...
#> - pc_distance OK
#> Partition metrics:
#> - 1 partition(s) evaluated
#> - Range of clusters explored: from 5 to 5
#> - Requested metric(s): pc_distance
#> - Metric summary:
#> pc_distance
#> Min 0.4639792
#> Mean 0.4639792
#> Max 0.4639792
#>
#> Access the data.frame of metrics with your_object$evaluation_df