This function finds communities in a (un)weighted undirected network based on the Louvain algorithm.

Usage

netclu_louvain(
  net,
  weight = TRUE,
  cut_weight = 0,
  index = names(net)[3],
  lang = "igraph",
  resolution = 1,
  seed = NULL,
  q = 0,
  c = 0.5,
  k = 1,
  bipartite = FALSE,
  site_col = 1,
  species_col = 2,
  return_node_type = "both",
  binpath = "tempdir",
  path_temp = "louvain_temp",
  delete_temp = TRUE,
  algorithm_in_output = TRUE
)

Arguments

net

the output object from similarity() or dissimilarity_to_similarity(). If a data.frame is used, the first two columns represent pairs of sites (or any pair of nodes), and the next column(s) are the similarity indices.

weight

a boolean indicating if the weights should be considered if there are more than two columns.

cut_weight

a minimal weight value. If weight is TRUE, the links between sites with a weight strictly lower than this value will not be considered (0 by default).

index

name or number of the column to use as weight. By default, the third column name of net is used.

lang

a string indicating what version of Louvain should be used (igraph or cpp, see Details).

resolution

a resolution parameter to adjust the modularity (1 is chosen by default, see Details).

seed

the seed for the random number generator (only used when lang = "igraph"; NULL, the default, gives a random seed).

q

the quality function used to compute the partition of the graph (modularity is chosen by default, see Details).

c

the parameter for the Owsinski-Zadrozny quality function (between 0 and 1, 0.5 is chosen by default).

k

the kappa_min value for the Shi-Malik quality function (it must be > 0, 1 is chosen by default).

bipartite

a boolean indicating if the network is bipartite (see Details).

site_col

name or number for the column of site nodes (i.e. primary nodes).

species_col

name or number for the column of species nodes (i.e. feature nodes).

return_node_type

a character indicating what types of nodes (site, species or both) should be returned in the output (return_node_type = "both" by default).

binpath

a character indicating the path to the bin folder (see install_binaries and Details).

path_temp

a character indicating the path to the temporary folder (see Details).

delete_temp

a boolean indicating if the temporary folder should be removed (see Details).

algorithm_in_output

a boolean indicating if the original output of cluster_louvain should be returned in the output (TRUE by default, see Value).

Value

A list of class bioregion.clusters with five slots:

  1. name: character containing the name of the algorithm

  2. args: list of input arguments as provided by the user

  3. inputs: list of characteristics of the clustering process

  4. algorithm: list of all objects associated with the clustering procedure, such as original cluster objects (only if algorithm_in_output = TRUE)

  5. clusters: data.frame containing the clustering results

In the algorithm slot, if algorithm_in_output = TRUE, users can find the output of cluster_louvain if lang = "igraph" and the following elements if lang = "cpp":

  • cmd: the command line used to run Louvain

  • version: the Louvain version

  • web: Louvain's website
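
As a minimal sketch of how the returned object can be inspected, the snippet below rebuilds the toy network from the Examples section and looks at the slots listed above. The seed values are arbitrary and only there to make the run reproducible; the exact column layout of the clusters data.frame is not described on this page, so it is simply printed rather than assumed.

```r
library(bioregion)

# Toy site-by-species matrix (same construction as in the Examples section).
set.seed(1)  # arbitrary seed, only to make the toy matrix reproducible
comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)

# Similarity network, then Louvain with the igraph implementation.
net <- similarity(comat, metric = "Simpson")
com <- netclu_louvain(net, lang = "igraph", seed = 1)

class(com)          # should include "bioregion.clusters"
names(com)          # the slots described in the Value section
com$name            # name of the algorithm
head(com$clusters)  # clustering results as a data.frame
```

With algorithm_in_output = FALSE, the original cluster_louvain output is not returned in the algorithm slot.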

Details

Louvain is a network community detection algorithm proposed by Blondel et al. (2008). This function offers two implementations of the algorithm (argument lang): the igraph implementation (cluster_louvain) and the C++ implementation (https://sourceforge.net/projects/louvain/, version 0.3).

The igraph implementation offers the possibility to adjust the resolution parameter of the modularity function (resolution argument) that the algorithm uses internally. Lower values typically yield fewer, larger clusters. The original definition of modularity is recovered when the resolution parameter is set to 1 (by default).
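
To illustrate the effect of the resolution parameter, here is a small sketch that calls igraph's cluster_louvain directly (the function the igraph implementation relies on, per the paragraph above) on the Zachary karate club graph shipped with igraph. The graph, the seed and the two resolution values are arbitrary choices made for this illustration and are not taken from the bioregion documentation.

```r
library(igraph)

# Small benchmark graph bundled with igraph (34 nodes).
g <- make_graph("Zachary")

# Louvain is stochastic; fix R's random seed so the run is repeatable.
set.seed(42)
low  <- cluster_louvain(g, resolution = 0.2)  # lower resolution
high <- cluster_louvain(g, resolution = 2)    # higher resolution

# Number of communities found at each resolution.
n_low  <- length(low)
n_high <- length(high)
c(low = n_low, high = n_high)

# Community assignment of each node at the lower resolution.
membership(low)
```

In netclu_louvain, the same value would be supplied through the resolution argument together with lang = "igraph".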

The C++ implementation offers a choice among several quality functions, selected with the q argument:

  • 0: the classical Newman-Girvan criterion (also called "Modularity")

  • 1: the Zahn-Condorcet criterion

  • 2: the Owsinski-Zadrozny criterion (you should specify the value of the parameter with the c argument)

  • 3: the Goldberg Density criterion

  • 4: the A-weighted Condorcet criterion

  • 5: the Deviation to Indetermination criterion

  • 6: the Deviation to Uniformity criterion

  • 7: the Profile Difference criterion

  • 8: the Shi-Malik criterion (you should specify the value of kappa_min with the k argument)

  • 9: the Balanced Modularity criterion

The C++ version of Louvain is based on version 0.3 (https://sourceforge.net/projects/louvain/). This function needs binary files to run. They can be installed with install_binaries.

If you changed the default path to the bin folder while running install_binaries, PLEASE MAKE SURE to set binpath accordingly.

The C++ version of Louvain generates temporary folders and/or files that are stored in the path_temp folder (by default, "louvain_temp" with a unique timestamp, located in the bin folder in binpath). This temporary folder is removed by default (delete_temp = TRUE).

Note

Although this algorithm was not primarily designed to deal with bipartite networks, it is possible to treat a bipartite network as a unipartite network (bipartite = TRUE).

Do not forget to indicate which of the first two columns is dedicated to the site nodes (i.e. primary nodes) and which to the species nodes (i.e. feature nodes) using the arguments site_col and species_col. The type of nodes returned in the output can be chosen with the argument return_node_type: both to keep both types of nodes, sites to preserve only the site nodes and species to preserve only the species nodes.
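
The following sketch shows one way the bipartite options might be used. The small site-species table is made up for this illustration, and the seed is arbitrary; only arguments listed in the Usage section are passed.

```r
library(bioregion)

# Hypothetical bipartite network: one row per site-species link,
# with an abundance-like weight in the third column.
net_bip <- data.frame(
  Site = c("A", "A", "B", "B", "B", "C", "C"),
  Species = c("a", "b", "a", "c", "d", "b", "d"),
  Weight = c(10, 100, 1, 20, 50, 10, 20)
)

# Treat the bipartite network as unipartite and keep both node types.
com_bip <- netclu_louvain(
  net_bip,
  lang = "igraph",
  bipartite = TRUE,
  site_col = 1,       # column holding the site nodes
  species_col = 2,    # column holding the species nodes
  return_node_type = "both",
  seed = 1            # arbitrary seed for reproducibility
)

# Same call, but keeping only the species nodes in the output.
com_species <- netclu_louvain(
  net_bip,
  lang = "igraph",
  bipartite = TRUE,
  site_col = 1,
  species_col = 2,
  return_node_type = "species",
  seed = 1
)

com_bip$clusters
com_species$clusters
```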

References

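Blondel VD, Guillaume JL, Lambiotte R & Lefebvre E (2008) Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008, P10008.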

Author

Maxime Lenormand (maxime.lenormand@inrae.fr), Pierre Denelle (pierre.denelle@gmail.com) and Boris Leroy (leroy.boris@gmail.com)

Examples

comat <- matrix(sample(1000, 50), 5, 10)
rownames(comat) <- paste0("Site", 1:5)
colnames(comat) <- paste0("Species", 1:10)

net <- similarity(comat, metric = "Simpson")
com <- netclu_louvain(net, lang = "igraph")