This function performs non-hierarchical clustering on the basis of dissimilarity with partitioning around medoids.
Arguments
- dissimilarity
the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.
- index
name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.
- seed
seed for the random number generator (NULL for random by default).
- n_clust
an integer or an integer vector specifying the requested number(s) of clusters.
- variant
a character string specifying the variant of pam to use, by default faster. Available options are original, o_1, o_2, f_3, f_4, f_5 or faster. See pam for more details.
- nstart
an integer specifying the number of random starts for the pam algorithm. By default, 1 (for the faster variant).
- cluster_only
a boolean specifying if only the clustering should be returned from the pam function (more efficient).
- algorithm_in_output
a boolean indicating if the original output of pam should be returned in the output (TRUE by default, see Value).
- ...
further arguments to be passed to pam() (see pam).
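The variant, nstart and cluster_only arguments refer to options of the underlying pam function. The sketch below illustrates what those options do when calling pam() from the cluster package directly. It assumes that nhclu_pam() passes them on unchanged (the argument names cluster.only and diss belong to cluster::pam(), not to nhclu_pam()), and that the installed version of cluster is recent enough to support variant and nstart. The toy coordinates are made up for illustration.

```r
library(cluster)  # provides pam()

# Deterministic toy data: three well-separated groups of three points
pts <- cbind(x = c(1, 2, 3, 10, 11, 12, 20, 21, 22),
             y = c(1, 2, 1, 10, 11, 10, 1, 2, 1))
d <- dist(pts, method = "manhattan")

# Classic PAM (deterministic BUILD phase followed by SWAP)
fit_original <- pam(d, k = 3, diss = TRUE, variant = "original")

# "faster" variant with several random starts; fix the seed for reproducibility
set.seed(42)
fit_faster <- pam(d, k = 3, diss = TRUE, variant = "faster", nstart = 5)

# cluster.only = TRUE returns just the membership vector, not a full pam object
membership <- pam(d, k = 3, diss = TRUE, cluster.only = TRUE)

fit_original$id.med   # indices of the medoids chosen by the original variant
fit_faster$id.med     # indices of the medoids chosen by the faster variant
membership            # one cluster label per point
```

Returning only the membership vector skips the extra summaries that pam() normally computes, which is why the cluster_only option is described as more efficient.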
Value
A list of class bioregion.clusters with five slots:
- name: character containing the name of the algorithm
- args: list of input arguments as provided by the user
- inputs: list of characteristics of the clustering process
- algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
- clusters: data.frame containing the clustering results
In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of pam.
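As a hedged illustration of the returned object, the sketch below builds a small random site-by-species matrix (illustrative data only) and inspects the slots described in this section. It assumes the bioregion package is installed. The seed value is arbitrary, and the exact contents of each slot may differ between package versions.

```r
library(bioregion)

set.seed(123)  # arbitrary seed so the toy matrix is reproducible
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

dissim <- dissimilarity(comat, metric = "all")
clust <- nhclu_pam(dissim, n_clust = 2:4, index = "Simpson",
                   algorithm_in_output = TRUE)

class(clust)                          # class of the returned list
names(clust)                          # slot names
clust$name                            # name of the algorithm
head(clust$clusters)                  # data.frame of clustering results
str(clust$algorithm, max.level = 1)   # original pam output, kept on request
```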
Details
This method partitions the data into the chosen number of clusters on the basis of the input dissimilarity matrix. It is more robust than k-means because it minimizes the sum of dissimilarities between cluster centres (the medoids) and the points assigned to each cluster, whereas k-means minimizes the sum of squared Euclidean distances. K-means therefore cannot be applied directly to the input dissimilarity matrix if the distances are not Euclidean.
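The quantity that partitioning around medoids minimizes can be sketched directly with pam() from the cluster package. The example below uses a made-up presence/absence table and the binary (Jaccard-type) distance from stats::dist(), which is generally not Euclidean. It is only an illustration of the objective and not the internal code of nhclu_pam(). The names comat, site_cost and total_cost are chosen here for illustration.

```r
library(cluster)  # provides pam()

# Toy presence/absence table: 6 sites x 5 species (made-up data)
comat <- rbind(
  Site1 = c(1, 1, 0, 0, 0),
  Site2 = c(1, 1, 1, 0, 0),
  Site3 = c(1, 0, 1, 0, 0),
  Site4 = c(0, 0, 0, 1, 1),
  Site5 = c(0, 0, 1, 1, 1),
  Site6 = c(0, 0, 0, 1, 0)
)

# Binary distance: share of species found in only one of the two sites,
# among species found in at least one of them
d <- dist(comat, method = "binary")
dm <- as.matrix(d)

# PAM works on the dissimilarities themselves; no coordinates are needed
fit <- pam(d, k = 2, diss = TRUE)

# Medoids are actual sites, not averaged centroids
rownames(dm)[fit$id.med]

# Dissimilarity between each site and its nearest medoid
site_cost <- apply(dm[, fit$id.med, drop = FALSE], 1, min)

# The criterion PAM tries to minimize: the sum of these dissimilarities
total_cost <- sum(site_cost)
total_cost
```

Because each cluster centre is one of the observed sites, the criterion only requires pairwise dissimilarities, which is why the method can be applied to any dissimilarity index.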
References
Kaufman L, Rousseeuw PJ (2009). Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley & Sons.
Author
Boris Leroy (leroy.boris@gmail.com), Pierre Denelle (pierre.denelle@gmail.com) and Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)
comnet <- mat_to_net(comat)
dissim <- dissimilarity(comat, metric = "all")
clust1 <- nhclu_pam(dissim, n_clust = 2:10, index = "Simpson")
clust2 <- nhclu_pam(dissim, n_clust = 2:15, index = "Simpson")
partition_metrics(clust2, dissimilarity = dissim,
                  eval_metric = "pc_distance")
#> Computing similarity-based metrics...
#> - pc_distance OK
#> Partition metrics:
#> - 14 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): pc_distance
#> - Metric summary:
#> pc_distance
#> Min 0.2715181
#> Mean 0.6583310
#> Max 0.9593075
#>
#> Access the data.frame of metrics with your_object$evaluation_df
partition_metrics(clust2, net = comnet, species_col = "Node2",
                  site_col = "Node1", eval_metric = "avg_endemism")
#> Computing composition-based metrics...
#> - avg_endemism OK
#> Partition metrics:
#> - 14 partition(s) evaluated
#> - Range of clusters explored: from 2 to 15
#> - Requested metric(s): avg_endemism
#> - Metric summary:
#> avg_endemism
#> Min 0.000000000
#> Mean 0.001428571
#> Max 0.020000000
#>
#> Access the data.frame of metrics with your_object$evaluation_df
#> Details of endemism % for each partition are available in
#> your_object$endemism_results