
This function performs non-hierarchical clustering based on dissimilarity, partitioning sites around medoids with the Clustering Large Applications (CLARA) algorithm.

Usage

nhclu_clara(
  dissimilarity,
  index = names(dissimilarity)[3],
  seed = NULL,
  n_clust = c(1, 2, 3),
  maxiter = 0,
  initializer = "LAB",
  fasttol = 1,
  numsamples = 5,
  sampling = 0.25,
  independent = FALSE,
  algorithm_in_output = TRUE
)

Arguments

dissimilarity

the output object from dissimilarity() or similarity_to_dissimilarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the dissimilarity indices.

index

name or number of the dissimilarity column to use. By default, the third column name of dissimilarity is used.

seed

seed for the random number generator (NULL by default, which gives a random seed).

n_clust

an integer or an integer vector specifying the requested number(s) of clusters.

maxiter

an integer defining the maximum number of iterations.

initializer

a character string, either 'BUILD' (used in the classic PAM algorithm) or 'LAB' (linear approximative BUILD, the default).

fasttol

positive numeric defining the tolerance for fast swapping behavior, set to 1 by default.

numsamples

positive integer defining the number of samples to draw (5 by default).

sampling

positive numeric defining the sampling rate (0.25 by default).

independent

a boolean indicating whether the previous medoids are discarded rather than kept in the next sample (FALSE by default, meaning they are kept).

algorithm_in_output

a boolean indicating whether the original output of fastclara should be returned in the output (TRUE by default, see Value).
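The data.frame layout described for the dissimilarity argument (two columns of site pairs followed by one or more index columns) can be sketched in base R as follows. This is only an illustration of the expected shape: the column names Site1, Site2 and Euclidean are chosen for the example and are not taken from the package.

```r
# Build a small long-format dissimilarity table from a base R dist object.
# Column names below are illustrative assumptions, not package output.
set.seed(1)
m <- matrix(runif(12), nrow = 4,
            dimnames = list(paste0("Site", 1:4), NULL))
dm <- as.matrix(dist(m))                      # full symmetric matrix

# Keep each unordered pair of sites once (upper triangle only)
idx <- which(upper.tri(dm), arr.ind = TRUE)

long <- data.frame(
  Site1 = rownames(dm)[idx[, "row"]],
  Site2 = colnames(dm)[idx[, "col"]],
  Euclidean = dm[idx]                         # third column: the index
)

# With this layout, names(long)[3] is what the default `index` would pick
names(long)[3]
```

With four sites the table has one row per pair of sites, six in total.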

Value

A list of class bioregion.clusters with five slots:

  1. name: character containing the name of the algorithm

  2. args: list of input arguments as provided by the user

  3. inputs: list of characteristics of the clustering process

  4. algorithm: list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE)

  5. clusters: data.frame containing the clustering results

In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of fastclara.

Details

This function is based on the fastclara function of the fastkmedoids package.
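As a rough illustration of the general CLARA idea (not the fastkmedoids implementation), the sketch below draws several subsamples, runs PAM on each with cluster::pam(), and keeps the medoids that give the lowest total dissimilarity over all sites. How numsamples and sampling translate into a subsample size here is an assumption made for the example, and the variables k, sample_size, best_medoids and best_cost are hypothetical names.

```r
# Illustrative CLARA-style loop; NOT the algorithm used by fastclara.
library(cluster)  # recommended package shipped with R; provides pam()

set.seed(1)
n <- 40
d <- as.matrix(dist(matrix(rnorm(n * 2), ncol = 2)))  # full dissimilarities

k <- 3             # number of clusters (plays the role of n_clust)
numsamples <- 5    # number of subsamples to draw
sampling <- 0.25   # assumed here to be a fraction of the n sites
sample_size <- max(k + 1, ceiling(sampling * n))

best_cost <- Inf
best_medoids <- NULL
for (s in seq_len(numsamples)) {
  idx <- sample(n, sample_size)                  # draw a subsample of sites
  fit <- pam(as.dist(d[idx, idx]), k = k, diss = TRUE)
  medoids <- idx[fit$id.med]                     # map back to full indices
  # Cost: each site's dissimilarity to its nearest medoid, summed over sites
  cost <- sum(apply(d[, medoids, drop = FALSE], 1, min))
  if (cost < best_cost) {
    best_cost <- cost
    best_medoids <- medoids
  }
}

# Assign every site to its nearest retained medoid
clusters <- apply(d[, best_medoids, drop = FALSE], 1, which.min)
table(clusters)
```

In this sketch every subsample is drawn independently, which loosely corresponds to independent = TRUE; carrying the previous medoids over into the next subsample would be the other variant.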

References

Schubert E, Rousseeuw PJ (2019). “Faster k-Medoids Clustering: Improving the PAM, CLARA, and CLARANS Algorithms.” Similarity Search and Applications, 11807, 171-187.

Author

Pierre Denelle (pierre.denelle@gmail.com), Boris Leroy (leroy.boris@gmail.com), and Maxime Lenormand (maxime.lenormand@inrae.fr)

Examples

# Simulated site-by-species abundance matrix (20 sites, 25 species)
comat <- matrix(sample(0:1000, size = 500, replace = TRUE, prob = 1/1:1001),
                20, 25)
rownames(comat) <- paste0("Site", 1:20)
colnames(comat) <- paste0("Species", 1:25)

dissim <- dissimilarity(comat, metric = "all")

clust1 <- nhclu_clara(dissim, index = "Simpson", n_clust = 5)

partition_metrics(clust1, dissimilarity = dissim,
                  eval_metric = "pc_distance")
#> Computing similarity-based metrics...
#>   - pc_distance OK
#> Partition metrics:
#>  - 1  partition(s) evaluated
#>  - Range of clusters explored: from  5  to  5 
#>  - Requested metric(s):  pc_distance 
#>  - Metric summary:
#>      pc_distance
#> Min    0.4547339
#> Mean   0.4547339
#> Max    0.4547339
#> 
#> Access the data.frame of metrics with your_object$evaluation_df