
This function performs non-hierarchical clustering based on the Affinity Propagation algorithm.

Usage

nhclu_affprop(
  similarity,
  index = names(similarity)[3],
  p = NA,
  q = NA,
  maxits = 1000,
  convits = 100,
  lam = 0.9,
  details = FALSE,
  nonoise = FALSE,
  seed = NULL,
  K = NULL,
  prc = NULL,
  bimaxit = NULL,
  exact = NULL,
  algorithm_in_output = TRUE
)

Arguments

similarity

the output object from similarity() or dissimilarity_to_similarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the similarity indices.
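The conversion from a table of pairs to the square matrix that affinity propagation works on can be sketched as follows. This is a minimal illustration that assumes symmetric similarities; the helper pairs_to_matrix and the example column names are hypothetical, and bioregion performs its own conversion internally.

```r
# Hypothetical helper, not part of bioregion: build a square similarity
# matrix from a data.frame whose first two columns identify the pair of
# nodes and whose `index` column (name or number) holds the similarity.
pairs_to_matrix <- function(df, index = names(df)[3]) {
  from <- as.character(df[[1]])
  to <- as.character(df[[2]])
  nodes <- sort(unique(c(from, to)))
  m <- matrix(NA_real_, length(nodes), length(nodes),
              dimnames = list(nodes, nodes))
  # Fill both triangles: pairwise similarities are assumed symmetric.
  m[cbind(match(from, nodes), match(to, nodes))] <- df[[index]]
  m[cbind(match(to, nodes), match(from, nodes))] <- df[[index]]
  m  # the diagonal stays NA; affinity propagation stores preferences there
}

pairs <- data.frame(Site1 = c("A", "A", "B"),
                    Site2 = c("B", "C", "C"),
                    Simpson = c(0.8, 0.1, 0.3))
pairs_to_matrix(pairs)
```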

index

name or number of the similarity column to use. By default, the third column name of similarity is used.

p

input preference; can be a vector that specifies individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized according to the distribution of non-Inf values in the similarity matrix. How this is done is controlled by the parameter q.

q

used only if p = NA, in which case exemplar preferences are initialized according to the distribution of non-Inf values in the similarity matrix. If q = NA, exemplar preferences are set to the median of non-Inf values in the similarity matrix. If q is a value between 0 and 1, the sample quantile with threshold q is used, whereas q = 0.5 again results in the median.
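A minimal sketch of the default preference rule described for p and q, assuming that only the off-diagonal, finite similarities are considered (the diagonal is where preferences are stored). The helper init_preference is hypothetical and is not the apcluster code.

```r
# Sketch (assumption: follows the documented rule, not apcluster's source)
# of how a default preference can be derived when p = NA.
init_preference <- function(s, q = NA) {
  vals <- s[row(s) != col(s)]    # off-diagonal similarities only
  vals <- vals[is.finite(vals)]  # drop -Inf/Inf (and NA) entries
  if (is.na(q)) median(vals) else unname(quantile(vals, probs = q))
}

s <- matrix(c(0.0, 0.9, 0.2,
              0.9, 0.0, 0.4,
              0.2, 0.4, 0.0), nrow = 3)
init_preference(s)          # median of the off-diagonal values: 0.4
init_preference(s, q = 1)   # largest off-diagonal value: 0.9
```

Because higher preferences make more points willing to act as exemplars, q = 1 (the largest similarity) generally yields more clusters than the default median.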

maxits

maximal number of iterations that should be executed

convits

the algorithm terminates if the exemplars have not changed for convits iterations.

lam

damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur.
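To see why damping helps against oscillations, the sketch below applies the damped update lam * old + (1 - lam) * new, in the form described by Frey & Dueck (2007), to a toy sequence of proposals that flip sign at every iteration. The helper damp and the toy data are hypothetical.

```r
# Damped update: the new message is a convex combination of the previous
# value (weight lam) and the freshly computed one (weight 1 - lam).
damp <- function(old, proposed, lam = 0.9) {
  stopifnot(lam >= 0.5, lam < 1)  # documented range [0.5, 1)
  lam * old + (1 - lam) * proposed
}

# Toy proposals that flip sign each iteration, mimicking an oscillation.
proposals <- rep(c(1, -1), 10)
undamped <- proposals
damped <- Reduce(function(old, new) damp(old, new, lam = 0.9),
                 proposals, 0, accumulate = TRUE)[-1]  # drop the start value
range(undamped)   # -1  1
range(damped)
```

The undamped sequence keeps jumping between -1 and 1, whereas the damped one stays within [-0.1, 0.1].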

details

if TRUE, more detailed information about the algorithm's progress is stored in the output object.

nonoise

if TRUE, disables the small amount of noise that is otherwise added to the similarity matrix to prevent degenerate cases.
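The idea behind this noise can be illustrated with a tiny jitter that breaks exact ties between identical sites. This is a hypothetical sketch: jitter_similarity and its default scale of 1e-12 are assumptions, not the formula apcluster uses; its seed parameter shows how fixing the random number generator (compare the seed argument) makes the perturbation reproducible.

```r
# Hypothetical illustration: perturb a similarity matrix by a tiny random
# amount so that exact ties (e.g. two identical sites) are broken.
# The noise scale is an assumption, not apcluster's exact formula.
jitter_similarity <- function(s, scale = 1e-12, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # reproducible perturbation
  s + scale * matrix(runif(length(s)), nrow(s), ncol(s))
}

# Sites 1 and 2 are identical, so their similarity rows tie exactly.
s <- matrix(c(1.0, 1.0, 0.2,
              1.0, 1.0, 0.2,
              0.2, 0.2, 1.0), nrow = 3)
s_noisy <- jitter_similarity(s, seed = 42)
max(abs(s_noisy - s))   # at most 1e-12
```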

seed

seed of the random number generator.

K

desired number of clusters. If not NULL, the function apclusterK() is called instead of apcluster().

prc

used only when K is not NULL. The algorithm stops if the number of clusters does not deviate by more than prc percent from the desired value K; set it to 0 to obtain exactly K clusters.

bimaxit

used only when K is not NULL. Maximum number of bisection steps to perform; note that no warning is issued if the number of clusters is still not in the desired range.
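The roles of K, prc and bimaxit can be illustrated with a generic bisection on the preference p, which relies on larger preferences generally giving more clusters. This is a sketch under stated assumptions, not a transcription of apclusterK(): bisect_preference and toy_counts are hypothetical, and toy_counts is a simple monotone stand-in for an actual clustering run.

```r
# Hypothetical sketch of a bisection search on the preference p.
# `n_clusters_for` returns the number of clusters obtained with a given p
# and is assumed to be non-decreasing in p.
bisect_preference <- function(n_clusters_for, K, p_lo, p_hi,
                              prc = 10, bimaxit = 20) {
  tol <- K * prc / 100                   # allowed deviation from K
  for (step in seq_len(bimaxit)) {
    p <- (p_lo + p_hi) / 2
    k <- n_clusters_for(p)
    if (abs(k - K) <= tol) return(list(p = p, k = k, steps = step))
    if (k < K) p_lo <- p else p_hi <- p  # too few clusters: raise p
  }
  list(p = p, k = k, steps = bimaxit)    # silently return the last attempt
}

# Stand-in for a clustering run (NOT affinity propagation):
# the cluster count grows in steps as p increases from 0 to 1.
toy_counts <- function(p) 1 + floor(10 * p)

res <- bisect_preference(toy_counts, K = 4, p_lo = 0, p_hi = 1, prc = 0)
res$p   # 0.375
res$k   # 4
```

With prc = 0 the search stops only when the count equals K exactly; once bimaxit steps are used up it returns its last attempt without a warning.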

exact

flag indicating whether or not to compute the initial preference range exactly.

algorithm_in_output

a boolean indicating whether the original output of apcluster should be included in the output (TRUE by default, see Value).

Value

A list of class bioregion.clusters with five slots:

  1. name: character containing the name of the algorithm

  2. args: list of input arguments as provided by the user

  3. inputs: list of characteristics of the clustering process

  4. algorithm: list of all objects associated with the clustering procedure, such as original cluster objects

  5. clusters: data.frame containing the clustering results

In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of apcluster.

Details

Based on the apcluster package (functions apcluster() and apclusterK()).
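To show the message passing that underlies the method, here is a minimal base-R sketch of affinity propagation following Frey & Dueck (2007). It is not the apcluster implementation: the name affprop_sketch is hypothetical, no noise is added, and the stopping rule (exemplar set unchanged for convits consecutive iterations) is a simplifying assumption. Its maxits, convits and lam arguments play the same roles as in nhclu_affprop().

```r
# Minimal affinity propagation in base R (illustrative sketch).
# `s` is a square similarity matrix whose diagonal holds the preferences.
affprop_sketch <- function(s, maxits = 1000, convits = 100, lam = 0.9) {
  n <- nrow(s)
  R <- matrix(0, n, n)  # responsibilities r(i, k)
  A <- matrix(0, n, n)  # availabilities a(i, k)
  rows <- seq_len(n)
  last_ex <- integer(0)
  stable <- 0
  for (it in seq_len(maxits)) {
    # r(i, k) <- s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
    AS <- A + s
    best <- max.col(AS, ties.method = "first")
    first <- AS[cbind(rows, best)]
    AS[cbind(rows, best)] <- -Inf
    second <- apply(AS, 1, max)
    R_new <- s - first                     # subtracts first[i] from row i
    R_new[cbind(rows, best)] <- s[cbind(rows, best)] - second
    R <- lam * R + (1 - lam) * R_new       # damping
    # a(i, k) <- min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
    # a(k, k) <- sum_{i' != k} max(0, r(i', k))
    Rp <- pmax(R, 0)
    diag(Rp) <- diag(R)
    A_new <- matrix(colSums(Rp), n, n, byrow = TRUE) - Rp
    a_kk <- diag(A_new)
    A_new <- pmin(A_new, 0)
    diag(A_new) <- a_kk
    A <- lam * A + (1 - lam) * A_new       # damping
    # Exemplars: points with r(k, k) + a(k, k) > 0
    ex <- which(diag(R) + diag(A) > 0)
    stable <- if (identical(ex, last_ex)) stable + 1 else 0
    last_ex <- ex
    if (length(ex) > 0 && stable >= convits) break
  }
  if (length(ex) == 0) {
    return(list(exemplars = ex, labels = rep(NA_integer_, n), iterations = it))
  }
  # Assign each point to its most similar exemplar; exemplars to themselves.
  labels <- ex[max.col(s[, ex, drop = FALSE], ties.method = "first")]
  labels[ex] <- ex
  list(exemplars = ex, labels = labels, iterations = it)
}

# Two well-separated 1-D groups; similarity = negative squared distance.
set.seed(1)
x <- c(rnorm(5, mean = 0, sd = 0.1), rnorm(5, mean = 10, sd = 0.1))
s <- -as.matrix(dist(x))^2
diag(s) <- median(s[row(s) != col(s)])   # median preference
fit <- affprop_sketch(s)
fit$labels
```

With the median preference, the sketch should separate the two groups; raising the diagonal values (for example to the largest off-diagonal similarity, as q = 1 does) generally yields more exemplars.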

References

Frey B & Dueck D (2007) Clustering by Passing Messages Between Data Points. Science, 315, 972-976.


Author

Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)

Examples

comat_1 <- matrix(sample(0:1000, size = 10*12, replace = TRUE,
                         prob = 1/1:1001), 10, 12)
rownames(comat_1) <- paste0("Site", 1:10)
colnames(comat_1) <- paste0("Species", 1:12)
comat_1 <- cbind(comat_1,
                 matrix(0, 10, 8,
                        dimnames = list(paste0("Site", 1:10),
                                        paste0("Species", 13:20))))

comat_2 <- matrix(sample(0:1000, size = 10*12, replace = TRUE,
                         prob = 1/1:1001), 10, 12)
rownames(comat_2) <- paste0("Site", 11:20)
colnames(comat_2) <- paste0("Species", 9:20)
comat_2 <- cbind(matrix(0, 10, 8,
                        dimnames = list(paste0("Site", 11:20),
                                        paste0("Species", 1:8))),
                 comat_2)

comat <- rbind(comat_1, comat_2)

dissim <- dissimilarity(comat, metric = "Simpson")
sim <- dissimilarity_to_similarity(dissim)

clust1 <- nhclu_affprop(sim)

clust2 <- nhclu_affprop(sim, q = 1)

# Fixed number of clusters
clust3 <- nhclu_affprop(sim, K = 2, prc = 10, bimaxit = 20, exact = FALSE)
#> Trying p = 0.993402 
#>    Number of clusters: 2 
#> Trying p = 0.9340202 
#>    Number of clusters: 2 
#> Trying p = 0.340202 
#>    Number of clusters: 2 
#> 
#> Number of clusters: 2 for p = 0.340202