This function performs non-hierarchical clustering of a dissimilarity matrix using partitioning around medoids (PAM).
Arguments
- dissimilarity
- The output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the dissimilarity indices.
- index
- The name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
- A value to set the seed of the random number generator (NULL by default, i.e., a random seed).
- n_clust
- An integer vector or a single integer value specifying the requested number(s) of clusters.
- variant
- A character string specifying the PAM variant to use. Defaults to "faster". Available options are "original", "o_1", "o_2", "f_3", "f_4", "f_5", or "faster". See pam() for more details.
- nstart
- An integer specifying the number of random starts for the PAM algorithm. Defaults to 1 (for the "faster" variant).
- cluster_only
- A boolean specifying whether only the clustering results should be returned from the pam() function. Setting this to TRUE makes the function more efficient.
- algorithm_in_output
- A boolean indicating whether the original output of pam() should be included in the result. Defaults to TRUE (see Value).
- ...
- Additional arguments to pass to pam() (see pam).
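A dist object can be passed as is, but a data.frame must follow the long layout described for the dissimilarity argument: two columns identifying each pair of sites, then one column per dissimilarity index. The base-R sketch below shows one way to build that layout from a dist object. The helper dist_to_long() and the column names Site1 and Site2 are illustrative assumptions; they are not functions or names taken from bioregion.

```r
# Hypothetical helper (not part of bioregion): reshape a dist object into
# a data.frame with two site columns followed by one dissimilarity column.
dist_to_long <- function(d, index_name = "Dissimilarity") {
  m <- as.matrix(d)                              # full symmetric matrix
  labels <- rownames(m)                          # as.matrix() uses 1..n if unlabelled
  pairs <- which(upper.tri(m), arr.ind = TRUE)   # each unordered pair once
  out <- data.frame(Site1 = labels[pairs[, "row"]],
                    Site2 = labels[pairs[, "col"]],
                    stringsAsFactors = FALSE)
  out[[index_name]] <- as.numeric(m[pairs])      # dissimilarity of each pair
  out
}

# Usage: three sites described by a single variable
x <- matrix(c(0, 3, 4), ncol = 1, dimnames = list(c("A", "B", "C"), NULL))
df <- dist_to_long(dist(x), index_name = "Euclidean")
df
```

With this layout, the default index (the third column name) would be "Euclidean".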
Value
A list of class bioregion.clusters with five components:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list of characteristics of the clustering process.
- algorithm: A list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of pam().
Details
This method partitions the data into the chosen number of clusters based on the input dissimilarity matrix. Each cluster is represented by a medoid, an actual observation (site) rather than a computed mean, and the algorithm minimizes the sum of dissimilarities between each observation and the medoid of its cluster. This makes it more robust than k-means, which minimizes the sum of squared Euclidean distances to cluster means: squaring inflates the influence of outliers, and the Euclidean objective makes k-means unsuitable for dissimilarity matrices that are not based on Euclidean distances.
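To make the PAM objective concrete, here is a naive base-R sketch of its two classic phases: a greedy BUILD step that adds medoids one at a time, then SWAP steps that apply the best medoid/non-medoid exchange while the total dissimilarity keeps decreasing. The functions pam_cost() and simple_pam() are hypothetical and written only for illustration; this is not how nhclu_pam() or pam() are implemented, and because it recomputes the full cost for every candidate it is only practical for small data sets.

```r
# Total cost: sum of dissimilarities between each site and its closest medoid
pam_cost <- function(d, medoids) {
  sum(apply(d[, medoids, drop = FALSE], 1, min))
}

# Naive PAM (illustrative only): greedy BUILD, then best-improvement SWAP
simple_pam <- function(diss, k) {
  d <- as.matrix(diss)
  n <- nrow(d)
  stopifnot(k >= 1, k <= n)

  # BUILD: repeatedly add the site that lowers the total cost the most
  medoids <- integer(0)
  for (i in seq_len(k)) {
    candidates <- setdiff(seq_len(n), medoids)
    costs <- vapply(candidates,
                    function(cand) pam_cost(d, c(medoids, cand)),
                    numeric(1))
    medoids <- c(medoids, candidates[which.min(costs)])
  }
  best_cost <- pam_cost(d, medoids)

  # SWAP: find the single best medoid/non-medoid exchange; stop when none helps
  repeat {
    best_swap <- NULL
    for (m in seq_len(k)) {
      for (o in setdiff(seq_len(n), medoids)) {
        trial <- medoids
        trial[m] <- o
        cost <- pam_cost(d, trial)
        if (cost < best_cost - 1e-12) {
          best_cost <- cost
          best_swap <- trial
        }
      }
    }
    if (is.null(best_swap)) break
    medoids <- best_swap
  }

  # Assign every site to its nearest medoid (1..k)
  cluster <- apply(d[, medoids, drop = FALSE], 1, which.min)
  list(medoids = medoids, cluster = cluster, cost = best_cost)
}

# Usage: two well-separated groups of sites on a single axis
x <- c(0, 1, 2, 10, 11, 12)
res <- simple_pam(dist(x), k = 2)
res$medoids   # 2 5 (the sites with values 1 and 11)
res$cost      # 4
```

Note that the medoids are indices of actual sites, and the cost is a plain sum of dissimilarities, so any dissimilarity matrix can be used, Euclidean or not.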
References
Kaufman L & Rousseeuw PJ (2009) Finding groups in data: An introduction to cluster analysis. John Wiley & Sons.
See also
For more details illustrated with a practical example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_dbscan, nhclu_kmeans, nhclu_affprop
Author
Boris Leroy (leroy.boris@gmail.com) 
Pierre Denelle (pierre.denelle@gmail.com) 
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
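library(bioregion)
# Simulated site x species matrix: 20 sites, 25 species
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
# Same data in network (long) format
comnet <- mat_to_net(comat)
# Compute all available dissimilarity metrics between sites
dissim <- dissimilarity(comat, metric = "all")
# PAM clustering for 2 to 15 clusters, using the Simpson index
clust <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")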