This function performs non-hierarchical clustering based on dissimilarity using partitioning around medoids (PAM).
Arguments
- dissimilarity
The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.
- index
The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
A value for the random number generator (NULL for random by default).
- n_clust
An integer vector or a single integer value specifying the requested number(s) of clusters.
- variant
A character string specifying the PAM variant to use. Defaults to faster. Available options are original, o_1, o_2, f_3, f_4, f_5, or faster. See pam for more details.
- nstart
An integer specifying the number of random starts for the PAM algorithm. Defaults to 1 (for the faster variant).
- cluster_only
A boolean specifying whether only the clustering results should be returned from the pam function. Setting this to TRUE makes the function more efficient.
- algorithm_in_output
A boolean indicating whether the original output of pam should be included in the result. Defaults to TRUE (see Value).
- ...
Additional arguments to pass to pam() (see pam).
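As a rough sketch of how these arguments fit together, the call below sets most of them explicitly. It assumes the bioregion package is installed. The simulated matrix mirrors the one in the Examples section, and the particular values (seed = 42, n_clust = 3, nstart = 5) are illustrative choices, not recommendations.

```r
library(bioregion)

# Simulated site-by-species matrix (illustrative data only)
set.seed(1)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1 / 1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

# Pairwise dissimilarities between sites, all available metrics
dissim <- dissimilarity(comat, metric = "all")

clust <- nhclu_pam(
  dissim,
  index = "Simpson",          # dissimilarity column to cluster on
  seed = 42,                  # fixed seed so the random starts are repeatable
  n_clust = 3,                # a single requested number of clusters
  variant = "faster",         # PAM variant passed on to pam()
  nstart = 5,                 # number of random starts
  cluster_only = FALSE,       # keep the full pam() result
  algorithm_in_output = TRUE  # store the pam() output in the result
)
```

Setting seed to a fixed value is mainly useful when results need to be compared across runs; leaving it at NULL keeps the random starts unseeded.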
Value
A list of class bioregion.clusters with five components:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list of characteristics of the clustering process.
- algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of pam.
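The components listed above can be inspected with ordinary list accessors. The following is a minimal sketch, assuming the bioregion package is installed and using simulated data; the exact layout of the clusters data.frame is not described here, so the code only prints its first rows without assuming column names.

```r
library(bioregion)

# Simulated site-by-species matrix (illustrative data only)
set.seed(1)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1 / 1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

dissim <- dissimilarity(comat, metric = "all")

# Request several partitions at once (2, 3 and 4 clusters)
clust <- nhclu_pam(dissim, n_clust = 2:4, index = "Simpson")

names(clust)          # components of the returned list
clust$name            # name of the algorithm
str(clust$args)       # arguments as provided by the user
head(clust$clusters)  # clustering results
```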
Details
This method partitions the data into the chosen number of clusters based on the input dissimilarity matrix. It is more robust than k-means because it minimizes the sum of dissimilarities between cluster centers (medoids) and points assigned to the cluster. In contrast, k-means minimizes the sum of squared Euclidean distances, which makes it unsuitable for dissimilarity matrices that are not based on Euclidean distances.
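To make the objective concrete, the sketch below calls pam() from the cluster package directly on a dissimilarity that is not a Euclidean distance, then recomputes the quantity PAM minimizes: the sum of dissimilarities between each site and the medoid of its cluster. This is only an illustration on made-up presence/absence data, not a description of the internals of this function; the objects comat, d, fit and cost are hypothetical names chosen for the example.

```r
library(cluster)  # provides pam()

# Hypothetical presence/absence table: 20 sites x 25 species
set.seed(1)
comat <- matrix(rbinom(500, size = 1, prob = 0.4), nrow = 20, ncol = 25)
rownames(comat) <- paste0("Site", 1:20)

# Jaccard dissimilarity between sites, returned as a 'dist' object
d <- dist(comat, method = "binary")

# Partition into 3 clusters, working on the dissimilarities only
fit <- pam(d, k = 3, diss = TRUE, variant = "faster")

fit$id.med             # row indices of the medoids (actual sites)
table(fit$clustering)  # cluster sizes

# Sum of dissimilarities between each site and its cluster's medoid
dmat <- as.matrix(d)
med_idx <- fit$id.med[fit$clustering]
cost <- sum(dmat[cbind(seq_len(nrow(dmat)), med_idx)])
cost
```

Because medoids are observed sites, the procedure never needs coordinates or cluster means, which is why it can operate on any dissimilarity matrix.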
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. John Wiley & Sons.
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_dbscan, nhclu_kmeans, nhclu_affprop
Author
Boris Leroy (leroy.boris@gmail.com)
Pierre Denelle (pierre.denelle@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
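# Simulate a site-by-species matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
# Convert the matrix to network format (not used further below)
comnet <- mat_to_net(comat)
# Compute all available dissimilarity metrics between sites
dissim <- dissimilarity(comat, metric = "all")
# Run PAM for 2 to 15 clusters on the Simpson dissimilarity
clust <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")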