This function finds communities in a (un)weighted (un)directed network based on the Infomap algorithm (https://github.com/mapequation/infomap).
Usage
netclu_infomap(
  net,
  weight = TRUE,
  cut_weight = 0,
  index = names(net)[3],
  seed = NULL,
  nbmod = 0,
  markovtime = 1,
  numtrials = 1,
  twolevel = FALSE,
  show_hierarchy = FALSE,
  directed = FALSE,
  bipartite_version = FALSE,
  bipartite = FALSE,
  site_col = 1,
  species_col = 2,
  return_node_type = "both",
  version = "2.8.0",
  binpath = "tempdir",
  path_temp = "infomap_temp",
  delete_temp = TRUE
)
Arguments
- net
the output object from similarity() or dissimilarity_to_similarity(). If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the similarity indices.
- weight
a boolean indicating if the weights should be considered if there are more than two columns.
- cut_weight
a minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).
- index
name or number of the column to use as weight. By default, the third column name of net is used.
- seed
seed for the random number generator (NULL for random by default).
- nbmod
penalize solutions the more they differ from this number (0 by default for no preferred number of modules).
- markovtime
scales link flow to change the cost of moving between modules; higher values result in fewer modules (default is 1).
- numtrials
the number of trials to run before picking the best solution.
- twolevel
a boolean indicating if the algorithm should optimize a two-level partition of the network (default is multi-level).
- show_hierarchy
a boolean specifying if the hierarchy of communities should be identifiable in the outputs (FALSE by default).
- directed
a boolean indicating if the network is directed (from column 1 to column 2).
- bipartite_version
a boolean indicating if the bipartite version of Infomap should be used (see Note).
- bipartite
a boolean indicating if the network is bipartite (see Note).
- site_col
name or number for the column of site nodes (i.e. primary nodes).
- species_col
name or number for the column of species nodes (i.e. feature nodes).
- return_node_type
a character indicating what types of nodes ("site", "species" or "both") should be returned in the output (return_node_type = "both" by default).
- version
a character indicating the Infomap version to use.
- binpath
a character indicating the path to the bin folder (see install_binaries and Details).
- path_temp
a character indicating the path to the temporary folder (see Details).
- delete_temp
a boolean indicating if the temporary folder should be removed (see Details).
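The interplay between weight, index and cut_weight can be pictured with a small base R sketch. This is only an illustration of the documented behaviour (links with a weight strictly lower than cut_weight are dropped), not the package's internal code; the toy edge list and the threshold of 0.2 are assumptions made up for the example.

```r
# Toy edge list (made-up values): two node columns, then two similarity indices.
net <- data.frame(
  Site1 = c("A", "A", "B"),
  Site2 = c("B", "C", "C"),
  Simpson = c(0.8, 0.1, 0.5),
  Jaccard = c(0.6, 0.05, 0.3)
)

index <- names(net)[3]  # documented default: the third column
cut_weight <- 0.2       # hypothetical threshold chosen for illustration

# Keep the two node columns plus the chosen weight column.
links <- net[, c(names(net)[1:2], index)]

# Drop links whose weight is strictly lower than cut_weight.
links <- links[links[[index]] >= cut_weight, ]
print(links)
```

With the default cut_weight = 0, no link with a non-negative weight would be removed by this filter.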
Value
A list of class bioregion.clusters with five slots:
name: character containing the name of the algorithm
args: list of input arguments as provided by the user
inputs: list of characteristics of the clustering process
algorithm: list of all objects associated with the clustering procedure, such as original cluster objects
clusters: data.frame containing the clustering results
In the algorithm slot, users can find the following elements:
cmd: the command line used to run Infomap
version: the Infomap version
web: Infomap's GitHub repository
Details
Infomap is a network clustering algorithm based on the map equation proposed by Rosvall and Bergstrom (2008) that finds communities in (un)weighted and (un)directed networks.
This function is based on the C++ version of Infomap (https://github.com/mapequation/infomap/releases). This function needs binary files to run. They can be installed with install_binaries.
If you changed the default path to the bin folder while running install_binaries PLEASE MAKE SURE to set binpath accordingly.
The C++ version of Infomap generates temporary folders and/or files that are stored in the path_temp folder ("infomap_temp" with a unique timestamp located in the bin folder in binpath by default). This temporary folder is removed by default (delete_temp = TRUE).
Several versions of Infomap are available in the package. See install_binaries for more details.
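As a hedged sketch of the point about binpath: the same folder should be passed to both install_binaries and netclu_infomap. The folder name below is hypothetical, and passing binpath to install_binaries is an assumption based on the description above; check the install_binaries help page for its exact arguments. The calls are wrapped in if (FALSE) so nothing is downloaded or executed when the block is pasted as is.

```r
# Hypothetical custom location for the bin folder (an assumption for this sketch).
bin_dir <- file.path(tempdir(), "bioregion_bin")

if (FALSE) {  # not run: downloads binaries and calls the Infomap executable
  library(bioregion)

  # Install the binaries in the custom folder (argument name assumed).
  install_binaries(binpath = bin_dir)

  # 'net' is a similarity network, e.g. the one built in the Examples section.
  # Point netclu_infomap() to the same folder and keep the temporary files
  # for inspection.
  com <- netclu_infomap(
    net,
    version = "2.8.0",
    binpath = bin_dir,
    path_temp = "infomap_temp",
    delete_temp = FALSE
  )
}
```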
Note
Infomap has been designed to deal with bipartite networks. To use this functionality, set the bipartite_version argument to TRUE in order to approximate a two-step random walker (see https://www.mapequation.org/infomap/ for more information). Note that a bipartite network can also be considered as a unipartite network (bipartite = TRUE).
In both cases, do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e. primary nodes) and which to the species nodes (i.e. feature nodes) using the arguments site_col and species_col.
The type of nodes returned in the output can be chosen with the argument return_node_type: "both" to keep both types of nodes, "site" to keep only the site nodes and "species" to keep only the species nodes.
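A minimal sketch of how a bipartite site-species network could be prepared, using base R only. The column names Site, Species and Weight, the random abundances and the seed are assumptions chosen for the example. The netclu_infomap call is wrapped in if (FALSE) because it needs the Infomap binaries.

```r
set.seed(1)  # arbitrary seed so the toy matrix is reproducible

# Toy site-by-species abundance matrix (made-up values).
comat <- matrix(sample(0:5, 50, replace = TRUE), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)

# Long format: one row per site-species pair, abundance used as link weight.
net_bip <- as.data.frame(as.table(comat), stringsAsFactors = FALSE)
names(net_bip) <- c("Site", "Species", "Weight")

# Keep only the pairs where the species is present in the site.
net_bip <- net_bip[net_bip$Weight > 0, ]
head(net_bip)

if (FALSE) {  # not run: requires the Infomap binaries (see install_binaries)
  com <- netclu_infomap(
    net_bip,
    weight = TRUE,
    bipartite = TRUE,           # treat the bipartite network as unipartite
    site_col = "Site",          # column holding the primary nodes
    species_col = "Species",    # column holding the feature nodes
    return_node_type = "site"   # keep only the site nodes in the output
  )
}
```

Setting bipartite_version = TRUE instead of bipartite = TRUE would request the bipartite version of Infomap described above, with the same site_col and species_col arguments.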
Author
Maxime Lenormand (maxime.lenormand@inrae.fr), Pierre Denelle (pierre.denelle@gmail.com) and Boris Leroy (leroy.boris@gmail.com)
Examples
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)
net <- similarity(comat, metric = "Simpson")
com <- netclu_infomap(net)
#> Infomap 2.8.0 is not installed... Please have a look at
#> https//bioRgeo.github.io/bioregion/articles/a1_install_binary_files.html
#> for more details.
#> It should be located in /tmp/RtmpJiKf7Z/bin/INFOMAP/2.8.0/