This function performs non-hierarchical clustering using the Affinity Propagation algorithm.
Usage
nhclu_affprop(
similarity,
index = names(similarity)[3],
seed = NULL,
p = NA,
q = NA,
maxits = 1000,
convits = 100,
lam = 0.9,
details = FALSE,
nonoise = FALSE,
K = NULL,
prc = NULL,
bimaxit = NULL,
exact = NULL,
algorithm_in_output = TRUE
)
Arguments
- similarity
The output object from similarity() or dissimilarity_to_similarity(), or a dist object. If a data.frame is used, the first two columns should represent pairs of sites (or any pair of nodes), and the subsequent column(s) should contain the similarity indices.
- index
The name or number of the similarity column to use. By default, the name of the third column of similarity is used.
- seed
The seed for the random number generator used when nonoise = FALSE.
- p
Input preference, which can be a vector specifying individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized based on the distribution of non-Inf values in the similarity matrix, controlled by q.
- q
If p = NA, exemplar preferences are initialized according to the distribution of non-Inf values in the similarity matrix. If q = NA (the default), the median is used; otherwise, q is a value between 0 and 1 giving the sample quantile to use, where q = 0.5 corresponds to the median.
- maxits
The maximum number of iterations to execute.
- convits
The algorithm terminates if the exemplars do not change for convits consecutive iterations.
- lam
The damping factor, a value in the range [0.5, 1). Higher values correspond to heavier damping, which may help prevent oscillations.
- details
If TRUE, detailed information about the algorithm's progress is stored in the output object.
- nonoise
If TRUE, disables the small amount of noise that is otherwise added to the similarity object to prevent degenerate cases.
- K
The desired number of clusters. If not NULL, the function apclusterK is called.
- prc
A parameter used only when K is not NULL. The algorithm stops once the number of clusters deviates by less than prc percent from the desired value K. Set to 0 to enforce exactly K clusters.
- bimaxit
A parameter used only when K is not NULL. Specifies the maximum number of bisection steps to perform. No warning is issued if the number of clusters remains outside the desired range.
- exact
A flag indicating whether to compute the initial preference range exactly.
- algorithm_in_output
A boolean indicating whether to include the original output of apcluster in the result. Defaults to TRUE.
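To make the roles of p and q more concrete, the following base-R sketch computes a single shared preference as the q-quantile of the finite off-diagonal similarities. It is an illustration under assumptions: the helper init_preference() is hypothetical, and whether apcluster excludes the diagonal when taking the quantile is assumed here rather than taken from its documentation.

```r
# Toy symmetric similarity matrix (hypothetical values for three sites).
s <- matrix(c(0,   0.8, 0.1,
              0.8, 0,   0.3,
              0.1, 0.3, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: one preference shared by all points, taken as the
# q-quantile of the finite off-diagonal similarities (q = 0.5 -> median).
init_preference <- function(s, q = 0.5) {
  vals <- s[row(s) != col(s)]    # drop the diagonal (assumption, see text)
  vals <- vals[is.finite(vals)]  # drop Inf, -Inf and NA entries
  unname(quantile(vals, q))
}

init_preference(s)          # 0.3 (median of the off-diagonal similarities)
init_preference(s, q = 1)   # 0.8 (maximum)
```

A higher preference makes every site more willing to act as its own exemplar, so raising q (for instance q = 1, as in the clust2 example) tends to produce more clusters.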
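When nonoise = FALSE, a tiny random perturbation is added to the similarities so that exact ties are less likely to leave the algorithm hesitating between equally good solutions; seed makes that perturbation reproducible. The sketch below uses a scaling modelled on the noise term in Frey & Dueck's reference code (a relative term of the order of machine precision plus a tiny absolute floor). Treat the exact formula, and the hypothetical helper add_tiny_noise(), as assumptions rather than a description of apcluster's internals.

```r
# Hypothetical helper: perturb a similarity matrix by a tiny random amount.
# The scaling (relative term of the order of machine epsilon plus a tiny
# absolute floor) is an assumption modelled on Frey & Dueck's reference code,
# not a copy of apcluster's internals.
add_tiny_noise <- function(s, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # same seed -> same perturbation
  n <- nrow(s)
  noise <- matrix(rnorm(n * n), n, n)
  s + (.Machine$double.eps * abs(s) + .Machine$double.xmin * 100) * noise
}

s <- matrix(c(0, 0.5, 0.5, 0), 2, 2)
s_noisy <- add_tiny_noise(s, seed = 42)
max(abs(s_noisy - s))   # negligible compared with the similarities themselves
```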
Value
A list of class bioregion.clusters with five slots:
- name: A character string containing the name of the algorithm.
- args: A list of input arguments as provided by the user.
- inputs: A list describing the characteristics of the clustering process.
- algorithm: A list of objects associated with the clustering procedure, such as original cluster objects (if algorithm_in_output = TRUE).
- clusters: A data.frame containing the clustering results.
If algorithm_in_output = TRUE, the algorithm slot includes the output of apcluster.
References
Frey B & Dueck D (2007) Clustering by Passing Messages Between Data Points. Science 315, 972-976.
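To show how maxits, convits and lam enter the computation, here is a minimal base-R sketch of the responsibility and availability updates described by Frey & Dueck (2007). It is a simplified illustration, not the apcluster code called by nhclu_affprop(): it takes a dense matrix and a single scalar preference, adds no noise, and the function name affprop_sketch() is hypothetical.

```r
# Minimal dense affinity propagation, after Frey & Dueck (2007).
# Hypothetical helper for illustration; not the apcluster implementation.
# s: square similarity matrix; p: scalar preference placed on the diagonal.
affprop_sketch <- function(s, p, maxits = 1000, convits = 100, lam = 0.9) {
  s <- unname(as.matrix(s))
  n <- nrow(s)
  diag(s) <- p
  R <- matrix(0, n, n)  # responsibilities r(i, k)
  A <- matrix(0, n, n)  # availabilities a(i, k)
  rows <- seq_len(n)
  ex <- last_ex <- integer(0)
  stable <- 0
  for (it in seq_len(maxits)) {
    # r(i,k) = s(i,k) - max over k' != k of [a(i,k') + s(i,k')]
    AS <- A + s
    idx1 <- apply(AS, 1, which.max)          # best candidate per row
    max1 <- AS[cbind(rows, idx1)]
    AS[cbind(rows, idx1)] <- -Inf
    max2 <- apply(AS, 1, max)                # second best per row
    R_new <- s - max1                        # subtracts max1[i] from row i
    R_new[cbind(rows, idx1)] <- s[cbind(rows, idx1)] - max2
    R <- lam * R + (1 - lam) * R_new         # damping with factor lam
    # a(i,k) = min(0, r(k,k) + sum over i' not in {i,k} of max(0, r(i',k)))
    # a(k,k) = sum over i' != k of max(0, r(i',k))
    Rp <- pmax(R, 0)
    diag(Rp) <- diag(R)
    A_new <- matrix(colSums(Rp), n, n, byrow = TRUE) - Rp
    self_avail <- diag(A_new)
    A_new <- pmin(A_new, 0)
    diag(A_new) <- self_avail
    A <- lam * A + (1 - lam) * A_new         # damping with factor lam
    # Current exemplars: points k with a(k,k) + r(k,k) > 0
    ex <- which(diag(A) + diag(R) > 0)
    stable <- if (identical(ex, last_ex)) stable + 1 else 0
    last_ex <- ex
    # Stop once the exemplar set has been unchanged for convits iterations
    if (length(ex) > 0 && stable >= convits) break
  }
  labels <- rep(NA_integer_, n)
  if (length(ex) > 0) {
    labels <- ex[apply(s[, ex, drop = FALSE], 1, which.max)]
    labels[ex] <- ex                         # exemplars label themselves
  }
  list(exemplars = ex, labels = labels, iterations = it)
}

x <- c(0, 0.1, 0.2, 5, 5.1, 5.2)             # two well-separated groups
s <- -as.matrix(dist(x))^2                     # negative squared distances
p <- median(s[row(s) != col(s)])               # median preference (q = 0.5)
fit <- affprop_sketch(s, p)
fit$labels                                     # cluster label = exemplar index
```

Heavier damping (lam closer to 1) makes each update move only a small step from the previous messages, which slows convergence but helps avoid oscillations; convits then decides how long the exemplar set must stay fixed before stopping early.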
See also
For more details and a worked example, see the vignette: https://biorgeo.github.io/bioregion/articles/a4_2_non_hierarchical_clustering.html.
Associated functions: nhclu_clara, nhclu_clarans, nhclu_dbscan, nhclu_kmeans, nhclu_affprop
Author
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
comat_1 <- matrix(sample(0:1000, size = 10*12, replace = TRUE,
prob = 1/1:1001), 10, 12)
rownames(comat_1) <- paste0("Site", 1:10)
colnames(comat_1) <- paste0("Species", 1:12)
comat_1 <- cbind(comat_1,
matrix(0, 10, 8,
dimnames = list(paste0("Site", 1:10),
paste0("Species", 13:20))))
comat_2 <- matrix(sample(0:1000,
size = 10*12,
replace = TRUE,
prob = 1/1:1001),
10, 12)
rownames(comat_2) <- paste0("Site", 11:20)
colnames(comat_2) <- paste0("Species", 9:20)
comat_2 <- cbind(matrix(0, 10, 8,
dimnames = list(paste0("Site", 11:20),
paste0("Species", 1:8))),
comat_2)
comat <- rbind(comat_1, comat_2)
dissim <- dissimilarity(comat, metric = "Simpson")
sim <- dissimilarity_to_similarity(dissim)
clust1 <- nhclu_affprop(sim)
clust2 <- nhclu_affprop(sim, q = 1)
# Fixed number of clusters
clust3 <- nhclu_affprop(sim, K = 2, prc = 10, bimaxit = 20, exact = FALSE)
#> Trying p = 0.9930872
#> Number of clusters: 6
#> Trying p = 0.9308716
#> Number of clusters: 6
#> Trying p = 0.3087157
#> Number of clusters: 1
#> Trying p = 0.6543579 (bisection step no. 1 )
#> Number of clusters: 1
#> Trying p = 0.8271789 (bisection step no. 2 )
#> Number of clusters: 3
#> Trying p = 0.7407684 (bisection step no. 3 )
#> Number of clusters: 1
#> Trying p = 0.7839737 (bisection step no. 4 )
#> Number of clusters: 1
#> Trying p = 0.8055763 (bisection step no. 5 )
#> Number of clusters: 3
#> Trying p = 0.794775 (bisection step no. 6 )
#> Number of clusters: 2
#>
#> Number of clusters: 2 for p = 0.794775
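The trace printed for clust3 comes from apclusterK, which searches for a preference p that yields K clusters; as the trace shows, after a few initial trials it switches to bisection steps. The stripped-down sketch below illustrates only the bisection idea and how prc and bimaxit bound it. n_clusters_for() is a hypothetical stand-in for running affinity propagation at a given p (here simply a step function that grows with p), bisect_preference() is not apclusterK's actual code, and reading prc as a tolerance of K * prc / 100 clusters is an assumption.

```r
# Hypothetical stand-in for "run affinity propagation with preference p and
# count the clusters": a step function whose count grows with p.
n_clusters_for <- function(p) floor(10 * p) + 1

# Sketch of a bisection on p towards K clusters (not apclusterK's code).
bisect_preference <- function(count_fun, K, lower, upper,
                              prc = 10, bimaxit = 20) {
  stopifnot(bimaxit >= 1, lower < upper)
  tol <- K * prc / 100                       # accepted deviation from K
  for (step in seq_len(bimaxit)) {
    p <- (lower + upper) / 2
    k <- count_fun(p)
    if (abs(k - K) <= tol) break             # prc = 0 requires exactly K
    if (k < K) lower <- p else upper <- p    # more clusters need a larger p
  }
  # As with bimaxit, no warning if the count is still outside the range
  list(p = p, n_clusters = k, steps = step)
}

res <- bisect_preference(n_clusters_for, K = 4, lower = 0, upper = 1, prc = 0)
res$n_clusters   # 4, reached at p = 0.375 after 3 bisection steps
```

With a very small bimaxit (for example 1), the sketch returns after the first midpoint even if the cluster count is still off target, mirroring the documented behaviour that no warning is issued in that case.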