This function performs non-hierarchical clustering based on the Affinity Propagation algorithm.
Usage
nhclu_affprop(
similarity,
index = names(similarity)[3],
p = NA,
q = NA,
maxits = 1000,
convits = 100,
lam = 0.9,
details = FALSE,
nonoise = FALSE,
seed = NULL,
K = NULL,
prc = NULL,
bimaxit = NULL,
exact = NULL,
algorithm_in_output = TRUE
)
Arguments
- similarity
the output object from similarity() or dissimilarity_to_similarity(), or a dist object. If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the similarity indices.
- index
name or number of the similarity column to use. By default, the third column name of similarity is used.
- p
input preference; can be a vector that specifies individual preferences for each data point. If scalar, the same value is used for all data points. If NA, exemplar preferences are initialized according to the distribution of non-Inf values in the similarity matrix. How this is done is controlled by the parameter q.
- q
if p = NA, exemplar preferences are initialized according to the distribution of non-Inf values in the similarity matrix. If q = NA, exemplar preferences are set to the median of non-Inf values in the similarity matrix. If q is a value between 0 and 1, the sample quantile with threshold q is used, whereas q = 0.5 again results in the median.
- maxits
maximum number of iterations to execute.
- convits
the algorithm terminates if the exemplars have not changed for convits iterations.
- lam
damping factor; should be a value in the range [0.5, 1); higher values correspond to heavy damping which may be needed if oscillations occur.
- details
if TRUE, more detailed information about the algorithm's progress is stored in the output object.
- nonoise
by default, a small amount of noise is added to the similarity object to prevent degenerate cases; set to TRUE to disable this.
- seed
seed of the random number generator.
- K
desired number of clusters. If not null, then the function apcluster is called.
- prc
argument needed when K is not null. The algorithm stops if the number of clusters does not deviate more than prc percent from desired value K; set to 0 if you want to have exactly K clusters.
- bimaxit
argument needed when K is not null. Maximum number of bisection steps to perform; note that no warning is issued if the number of clusters is still not in the desired range.
- exact
flag indicating whether or not to compute the initial preference range exactly.
- algorithm_in_output
a boolean indicating whether the original output of apcluster should be returned in the output (TRUE by default, see Value).
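As a rough illustration of the p, q and lam arguments, the base R sketch below derives a preference from the finite off-diagonal similarities and applies the damped message update used in affinity propagation. The helpers init_preference() and damp() are hypothetical names introduced here, the toy matrix is made up, and excluding the diagonal is an assumption about how the underlying implementation treats the matrix; this is not the package's actual code.

```r
# Toy symmetric similarity matrix for four sites (values are made up).
s <- matrix(c(1.0, 0.8, 0.2, 0.1,
              0.8, 1.0, 0.3, 0.2,
              0.2, 0.3, 1.0, 0.9,
              0.1, 0.2, 0.9, 1.0), nrow = 4, byrow = TRUE)

# Hypothetical helper: preference taken from the finite off-diagonal
# similarities (assumption: the diagonal is excluded).
init_preference <- function(s, q = NA) {
  vals <- s[row(s) != col(s) & is.finite(s)]
  if (is.na(q)) median(vals) else unname(quantile(vals, probs = q))
}

p_median <- init_preference(s)           # q = NA: median
p_half   <- init_preference(s, q = 0.5)  # q = 0.5: the median again
p_low    <- init_preference(s, q = 0)    # q = 0: smallest similarity

# Hypothetical helper: damped update, where lam close to 1 means the
# previous message dominates (heavy damping).
damp <- function(old, new, lam = 0.9) lam * old + (1 - lam) * new
damped <- damp(old = 0, new = 1, lam = 0.9)
```

A lower preference generally yields fewer clusters, which is why q is a convenient knob when p is left at NA.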
Value
A list of class bioregion.clusters with five slots:
- name: character containing the name of the algorithm
- args: list of input arguments as provided by the user
- inputs: list of characteristics of the clustering process
- algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
- clusters: data.frame containing the clustering results
In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of apcluster.
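To make the five-slot layout concrete, the sketch below builds a stand-in list by hand and reads from it the way one might read from a real result. Everything inside the object is illustrative: the slot contents, the column names ID and cluster, and the values are assumptions, not what the function actually returns.

```r
# Stand-in object with the five documented slots (all contents hypothetical).
mock <- structure(
  list(
    name = "example",
    args = list(lam = 0.9, maxits = 1000),
    inputs = list(),
    algorithm = list(),
    clusters = data.frame(ID = c("Site1", "Site2", "Site3"),
                          cluster = c("1", "1", "2"),
                          stringsAsFactors = FALSE)
  ),
  class = c("bioregion.clusters", "list")
)

# Slots are reached with the usual list accessors.
slot_names <- names(mock)
is_result  <- inherits(mock, "bioregion.clusters")

# Count how many sites fall in each cluster.
sites_per_cluster <- table(mock$clusters$cluster)
```

With a real result, the same accessors apply, for example clust1$clusters for the assignments and clust1$algorithm for the raw clustering output.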
References
Frey B & Dueck D (2007) Clustering by Passing Messages Between Data Points. Science, 315, 972-976.
Author
Pierre Denelle (pierre.denelle@gmail.com)
Boris Leroy (leroy.boris@gmail.com)
Maxime Lenormand (maxime.lenormand@inrae.fr)
Examples
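# The functions used below come from the bioregion package
library(bioregion)

# Sites 1-10: sampled abundances for Species1-12, zeros for Species13-20
comat_1 <- matrix(sample(0:1000, size = 10 * 12, replace = TRUE,
                         prob = 1 / 1:1001), 10, 12)
rownames(comat_1) <- paste0("Site", 1:10)
colnames(comat_1) <- paste0("Species", 1:12)
comat_1 <- cbind(comat_1,
                 matrix(0, 10, 8,
                        dimnames = list(paste0("Site", 1:10),
                                        paste0("Species", 13:20))))

# Sites 11-20: zeros for Species1-8, sampled abundances for Species9-20
comat_2 <- matrix(sample(0:1000, size = 10 * 12, replace = TRUE,
                         prob = 1 / 1:1001), 10, 12)
rownames(comat_2) <- paste0("Site", 11:20)
colnames(comat_2) <- paste0("Species", 9:20)
comat_2 <- cbind(matrix(0, 10, 8,
                        dimnames = list(paste0("Site", 11:20),
                                        paste0("Species", 1:8))),
                 comat_2)

# Stack both blocks into one site-by-species matrix
comat <- rbind(comat_1, comat_2)

# Convert Simpson dissimilarity to similarity before clustering
dissim <- dissimilarity(comat, metric = "Simpson")
sim <- dissimilarity_to_similarity(dissim)

# Default preferences (p = NA, q = NA)
clust1 <- nhclu_affprop(sim)

# Preferences set from the q = 1 quantile of the similarities
clust2 <- nhclu_affprop(sim, q = 1)

# Fixed number of clusters
clust3 <- nhclu_affprop(sim, K = 2, prc = 10, bimaxit = 20, exact = FALSE)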
#> Trying p = 0.993402
#> Number of clusters: 2
#> Trying p = 0.9340202
#> Number of clusters: 2
#> Trying p = 0.340202
#> Number of clusters: 2
#>
#> Number of clusters: 2 for p = 0.340202
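
The output above shows several preference values being tried until the requested number of clusters is reached. The base R sketch below illustrates how K, prc and bimaxit could interact in a generic bisection over the preference. It is not the package's actual search schedule: n_clusters() is a made-up stand-in for running the clustering, and within_tolerance() and search_preference() are hypothetical helpers.

```r
# Hypothetical stand-in for a clustering run: the number of clusters
# grows with the preference p (made-up monotone rule).
n_clusters <- function(p) max(1L, as.integer(floor(10 * p)))

# Accept n if it deviates from K by at most prc percent of K.
within_tolerance <- function(n, K, prc) abs(n - K) <= K * prc / 100

# Generic bisection on p, limited to bimaxit steps.
search_preference <- function(K, prc = 10, bimaxit = 20, lo = 0, hi = 1) {
  p <- NA_real_
  n <- NA_integer_
  for (i in seq_len(bimaxit)) {
    p <- (lo + hi) / 2
    n <- n_clusters(p)
    if (within_tolerance(n, K, prc)) {
      return(list(p = p, n = n, steps = i))
    }
    # Too many clusters: lower the preference; too few: raise it.
    if (n > K) hi <- p else lo <- p
  }
  # Step limit reached without hitting the range; returned silently.
  list(p = p, n = n, steps = bimaxit)
}

# prc = 0 asks for exactly K clusters.
res <- search_preference(K = 2, prc = 0)
```

Setting prc above 0 widens the accepted range, so the search can stop earlier at the cost of a cluster count that may differ from K.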