1#   IGraph R package
2#   Copyright (C) 2005-2012  Gabor Csardi <csardi.gabor@gmail.com>
3#   334 Harvard street, Cambridge, MA 02139 USA
4#
5#   This program is free software; you can redistribute it and/or modify
6#   it under the terms of the GNU General Public License as published by
7#   the Free Software Foundation; either version 2 of the License, or
8#   (at your option) any later version.
9#
10#   This program is distributed in the hope that it will be useful,
11#   but WITHOUT ANY WARRANTY; without even the implied warranty of
12#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
13#   GNU General Public License for more details.
14#
15#   You should have received a copy of the GNU General Public License
16#   along with this program; if not, write to the Free Software
17#   Foundation, Inc.,  51 Franklin Street, Fifth Floor, Boston, MA
18#   02110-1301 USA
19#
20###################################################################
21
22###################################################################
23# Community structure
24###################################################################
25
26#' Functions to deal with the result of network community detection
27#'
28#' igraph community detection functions return their results as an object from
29#' the \code{communities} class. This manual page describes the operations of
30#' this class.
31#'
32#' Community structure detection algorithms try to find dense subgraphs in
33#' directed or undirected graphs, by optimizing some criteria, and usually
34#' using heuristics.
35#'
36#' igraph implements a number of community detection methods (see them below),
37#' all of which return an object of the class \code{communities}. Because the
38#' community structure detection algorithms are different, \code{communities}
39#' objects do not always have the same structure. Nevertheless, they have some
40#' common operations, these are documented here.
41#'
42#' The \code{print} generic function is defined for \code{communities}, it
43#' prints a short summary.
44#'
45#' The \code{length} generic function call be called on \code{communities} and
46#' returns the number of communities.
47#'
48#' The \code{sizes} function returns the community sizes, in the order of their
49#' ids.
50#'
51#' \code{membership} gives the division of the vertices, into communities. It
52#' returns a numeric vector, one value for each vertex, the id of its
53#' community. Community ids start from one. Note that some algorithms calculate
54#' the complete (or incomplete) hierarchical structure of the communities, and
55#' not just a single partitioning. For these algorithms typically the
56#' membership for the highest modularity value is returned, but see also the
57#' manual pages of the individual algorithms.
58#'
59#' \code{communities} is also the name of a function, that returns a list of
60#' communities, each identified by their vertices. The vertices will have
61#' symbolic names if the \code{add.vertex.names} igraph option is set, and the
62#' graph itself was named. Otherwise numeric vertex ids are used.
63#'
64#' \code{modularity} gives the modularity score of the partitioning. (See
65#' \code{\link{modularity.igraph}} for details. For algorithms that do not
66#' result a single partitioning, the highest modularity value is returned.
67#'
68#' \code{algorithm} gives the name of the algorithm that was used to calculate
69#' the community structure.
70#'
71#' \code{crossing} returns a logical vector, with one value for each edge,
72#' ordered according to the edge ids. The value is \code{TRUE} iff the edge
73#' connects two different communities, according to the (best) membership
74#' vector, as returned by \code{membership()}.
75#'
76#' \code{is_hierarchical} checks whether a hierarchical algorithm was used to
77#' find the community structure. Some functions only make sense for
78#' hierarchical methods (e.g. \code{merges}, \code{cut_at} and
79#' \code{as.dendrogram}).
80#'
81#' \code{merges} returns the merge matrix for hierarchical methods. An error
82#' message is given, if a non-hierarchical method was used to find the
83#' community structure. You can check this by calling \code{is_hierarchical} on
84#' the \code{communities} object.
85#'
86#' \code{cut_at} cuts the merge tree of a hierarchical community finding method,
87#' at the desired place and returns a membership vector. The desired place can
88#' be expressed as the desired number of communities or as the number of merge
89#' steps to make. The function gives an error message, if called with a
90#' non-hierarchical method.
91#'
92#' \code{as.dendrogram} converts a hierarchical community structure to a
93#' \code{dendrogram} object. It only works for hierarchical methods, and gives
94#' an error message to others. See \code{\link[stats]{dendrogram}} for details.
95#'
96#' \code{as.hclust} is similar to \code{as.dendrogram}, but converts a
97#' hierarchical community structure to a \code{hclust} object.
98#'
99#' \code{as_phylo} converts a hierarchical community structure to a \code{phylo}
100#' object, you will need the \code{ape} package for this.
101#'
102#' \code{show_trace} works (currently) only for communities found by the leading
103#' eigenvector method (\code{\link{cluster_leading_eigen}}), and
104#' returns a character vector that gives the steps performed by the algorithm
105#' while finding the communities.
106#'
107#' \code{code_len} is defined for the InfoMAP method
108#' (\code{\link{cluster_infomap}} and returns the code length of the
109#' partition.
110#'
111#' It is possibly to call the \code{plot} function on \code{communities}
112#' objects. This will plot the graph (and uses \code{\link{plot.igraph}}
113#' internally), with the communities shown. By default it colores the vertices
114#' according to their communities, and also marks the vertex groups
115#' corresponding to the communities. It passes additional arguments to
116#' \code{\link{plot.igraph}}, please see that and also
117#' \code{\link{igraph.plotting}} on how to change the plot.
118#'
119#' @rdname communities
120#' @aliases communities membership algorithm crossing cutat merges sizes cut_at
121#' is.hierarchical print.communities plot.communities length.communities
122#' as.dendrogram.communities as.hclust.communities code_len
123#' asPhylo asPhylo.communities showtrace code.length
124#' as_phylo as_phylo.communities show_trace is_hierarchical
125#' @param communities,x,object A \code{communities} object, the result of an
126#' igraph community detection function.
127#' @param graph An igraph graph object, corresponding to \code{communities}.
128#' @param y An igraph graph object, corresponding to the communities in
129#' \code{x}.
130#' @param no Integer scalar, the desired number of communities. If too low or
131#' two high, then an error message is given. Exactly one of \code{no} and
132#' \code{steps} must be supplied.
133#' @param steps The number of merge operations to perform to produce the
134#' communities. Exactly one of \code{no} and \code{steps} must be supplied.
135#' @param col A vector of colors, in any format that is accepted by the regular
136#' R plotting methods. This vector gives the colors of the vertices explicitly.
137#' @param mark.groups A list of numeric vectors. The communities can be
138#' highlighted using colored polygons. The groups for which the polygons are
139#' drawn are given here. The default is to use the groups given by the
140#' communities. Supply \code{NULL} here if you do not want to highlight any
141#' groups.
142#' @param edge.color The colors of the edges. By default the edges within
143#' communities are colored green and other edges are red.
144#' @param hang Numeric scalar indicating how the height of leaves should be
145#' computed from the heights of their parents; see \code{\link{plot.hclust}}.
146#' @param use.modularity Logical scalar, whether to use the modularity values
147#' to define the height of the branches.
148#' @param \dots Additional arguments. \code{plot.communities} passes these to
149#' \code{\link{plot.igraph}}. The other functions silently ignore
150#' them.
151#' @param membership Numeric vector, one value for each vertex, the membership
152#' vector of the community structure. Might also be \code{NULL} if the
153#' community structure is given in another way, e.g. by a merge matrix.
154#' @param algorithm If not \code{NULL} (meaning an unknown algorithm), then a
155#' character scalar, the name of the algorithm that produced the community
156#' structure.
157#' @param merges If not \code{NULL}, then the merge matrix of the hierarchical
158#' community structure. See \code{merges} below for more information on its
159#' format.
160#' @param modularity Numeric scalar or vector, the modularity value of the
161#' community structure. It can also be \code{NULL}, if the modularity of the
162#' (best) split is not available.
163#' @return \code{print} returns the \code{communities} object itself,
164#' invisibly.
165#'
166#' \code{length} returns an integer scalar.
167#'
168#' \code{sizes} returns a numeric vector.
169#'
170#' \code{membership} returns a numeric vector, one number for each vertex in
171#' the graph that was the input of the community detection.
172#'
173#' \code{modularity} returns a numeric scalar.
174#'
175#' \code{algorithm} returns a character scalar.
176#'
177#' \code{crossing} returns a logical vector.
178#'
179#' \code{is_hierarchical} returns a logical scalar.
180#'
181#' \code{merges} returns a two-column numeric matrix.
182#'
183#' \code{cut_at} returns a numeric vector, the membership vector of the
184#' vertices.
185#'
186#' \code{as.dendrogram} returns a \code{\link[stats]{dendrogram}} object.
187#'
188#' \code{show_trace} returns a character vector.
189#'
190#' \code{code_len} returns a numeric scalar for communities found with the
191#' InfoMAP method and \code{NULL} for other methods.
192#'
193#' \code{plot} for \code{communities} objects returns \code{NULL}, invisibly.
194#'
195#' #' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
196#' @seealso See \code{\link{plot_dendrogram}} for plotting community structure
197#' dendrograms.
198#'
199#' See \code{\link{compare}} for comparing two community structures
200#' on the same graph.
201#'
202#' The different methods for finding communities, they all return a
203#' \code{communities} object: \code{\link{cluster_edge_betweenness}},
204#' \code{\link{cluster_fast_greedy}},
205#' \code{\link{cluster_label_prop}},
206#' \code{\link{cluster_leading_eigen}},
207#' \code{\link{cluster_louvain}}, \code{\link{cluster_leiden}},
208#' \code{\link{cluster_optimal}}, \code{\link{cluster_spinglass}},
209#' \code{\link{cluster_walktrap}}.
210#' @keywords graphs
211#' @export
212#' @examples
213#'
214#' karate <- make_graph("Zachary")
215#' wc <- cluster_walktrap(karate)
216#' modularity(wc)
217#' membership(wc)
218#' plot(wc, karate)
219#'
220
221membership <- function(communities) {
222  if (!is.null(communities$membership)) {
223    res <- communities$membership
224  } else if (!is.null(communities$merges) &&
225             !is.null(communities$modularity)) {
226    res <- community.to.membership2(communities$merges, communities$vcount,
227                                    which.max(communities$modularity))
228  } else {
229    stop("Cannot calculate community membership")
230  }
231  if (igraph_opt("add.vertex.names") && !is.null(communities$names)) {
232    names(res) <- communities$names
233  }
234  class(res) <- "membership"
235  res
236}
237
238#' @method print membership
239#' @export
240
241print.membership <- function(x, ...) print(unclass(x), ...)
242
243#' Declare a numeric vector as a membership vector
244#'
245#' This is useful if you want to use functions defined on
246#' membership vectors, but your membership vector does not
247#' come from an igraph clustering method.
248#'
249#' @param x The input vector.
250#' @return The input vector, with the \code{membership} class added.
251#' @export
252#' @examples
253#' ## Compare to the correct clustering
254#' g <- (make_full_graph(10) + make_full_graph(10)) %>%
255#'   rewire(each_edge(p = 0.2))
256#' correct <- rep(1:2, each = 10) %>% as_membership
257#' fc <- cluster_fast_greedy(g)
258#' compare(correct, fc)
259#' compare(correct, membership(fc))
260
261as_membership <- function(x) add_class(x, "membership")
262
263#' @rdname communities
264#' @method print communities
265#' @export
266
267print.communities <- function(x, ...) {
268
269  noc <- if (!is.null(x$membership)) max(membership(x)) else NA
270  mod <- if (!is.null(x$modularity)) {
271    modularity(x) %>% format(digits = 2)
272  } else {
273    NA_real_
274  }
275  alg <- x$algorithm %||% "unknown"
276
277  cat("IGRAPH clustering ", alg, ", groups: ", noc, ", mod: ", mod, "\n", sep="")
278
279  if (!is.null(x$membership)) {
280    grp <- groups(x)
281    cat("+ groups:\n")
282    hp <- function(o) {
283      head_print(o, max_lines = igraph_opt("auto.print.lines"),
284                 omitted_footer = "+ ... omitted several groups/vertices\n",)
285    }
286    indent_print(grp, .printer = hp, .indent = "  ")
287
288  } else {
289    cat(" + groups not available\n")
290  }
291
292  invisible(x)
293}
294
295#' Creates a communities object.
296#'
297#' This is useful to integrate the results of community finding algorithms
298#' that are not included in igraph.
299#'
300#' @param graph The graph of the community structure.
301#' @param membership The membership vector of the community structure, a
302#'   numeric vector denoting the id of the community for each vertex. It
303#'   might be \code{NULL} for hierarchical community structures.
304#' @param algorithm Character string, the algorithm that generated
305#'   the community structure, it can be arbitrary.
306#' @param merges A merge matrix, for hierarchical community structures (or
307#'   \code{NULL} otherwise.
308#' @param modularity Modularity value of the community structure. If this
309#'   is \code{TRUE} and the membership vector is available, then it the
310#'   modularity values is calculated automatically.
311#' @return A \code{communities} object.
312#'
313#' @aliases create.communities
314#'
315#' @export
316
317make_clusters <- function(graph, membership = NULL, algorithm = NULL,
318                          merges = NULL, modularity = TRUE) {
319
320  stopifnot(is.null(membership) || is.numeric(membership))
321  stopifnot(is.null(algorithm) ||
322            (is.character(algorithm) && length(algorithm)==1))
323  stopifnot(is.null(merges) ||
324              (is.matrix(merges) && is.numeric(merges) && ncol(merges)==2))
325  stopifnot(is.null(modularity) ||
326            (is.logical(modularity) && length(modularity) == 1) ||
327            (is.numeric(modularity) &&
328             length(modularity) %in% c(1, length(membership))))
329
330  if (is.logical(modularity)) {
331    if (modularity && !is.null(membership)) {
332      modularity <- modularity(graph, membership)
333    } else {
334      modularity <- NULL
335    }
336  }
337
338  res <- list(membership=membership,
339              algorithm=if (is.null(algorithm)) "unknown" else algorithm,
340              modularity=modularity)
341  if (!is.null(merges)) {
342    res$merges <- merges
343  }
344  if (!is.null(membership)) {
345    res$vcount <- length(membership)
346  } else if (!is.null(merges)) {
347    res$vcount <- nrow(merges) + 1
348  }
349  class(res) <- "communities"
350  res
351}
352
353#' @export
354
355modularity <- function(x, ...)
356  UseMethod("modularity")
357
358#' Modularity of a community structure of a graph
359#'
360#' This function calculates how modular is a given division of a graph into
361#' subgraphs.
362#'
363#' \code{modularity} calculates the modularity of a graph with respect to the
364#' given \code{membership} vector.
365#'
366#' The modularity of a graph with respect to some division (or vertex types)
367#' measures how good the division is, or how separated are the different vertex
368#' types from each other. It defined as \deqn{Q=\frac{1}{2m} \sum_{i,j}
369#' (A_{ij}-\frac{k_ik_j}{2m})\delta(c_i,c_j),}{Q=1/(2m) * sum( (Aij-ki*kj/(2m)
370#' ) delta(ci,cj),i,j),} here \eqn{m} is the number of edges, \eqn{A_{ij}}{Aij}
371#' is the element of the \eqn{A} adjacency matrix in row \eqn{i} and column
372#' \eqn{j}, \eqn{k_i}{ki} is the degree of \eqn{i}, \eqn{k_j}{kj} is the degree
373#' of \eqn{j}, \eqn{c_i}{ci} is the type (or component) of \eqn{i},
374#' \eqn{c_j}{cj} that of \eqn{j}, the sum goes over all \eqn{i} and \eqn{j}
375#' pairs of vertices, and \eqn{\delta(x,y)}{delta(x,y)} is 1 if \eqn{x=y} and 0
376#' otherwise.
377#'
378#' If edge weights are given, then these are considered as the element of the
379#' \eqn{A} adjacency matrix, and \eqn{k_i}{ki} is the sum of weights of
380#' adjacent edges for vertex \eqn{i}.
381#'
382#' \code{modularity_matrix} calculates the modularity matrix. This is a dense matrix,
383#' and it is defined as the difference of the adjacency matrix and the
384#' configuration model null model matrix. In other words element
385#' \eqn{M_{ij}}{M[i,j]} is given as \eqn{A_{ij}-d_i
386#' d_j/(2m)}{A[i,j]-d[i]d[j]/(2m)}, where \eqn{A_{ij}}{A[i,j]} is the (possibly
387#' weighted) adjacency matrix, \eqn{d_i}{d[i]} is the degree of vertex \eqn{i},
388#' and \eqn{m} is the number of edges (or the total weights in the graph, if it
389#' is weighed).
390#'
391#' @aliases modularity
392#' @param x,graph The input graph.
393#' @param membership Numeric vector, one value for each vertex, the membership
394#' vector of the community structure.
395#' @param weights If not \code{NULL} then a numeric vector giving edge weights.
396#' @param \dots Additional arguments, none currently.
397#' @return For \code{modularity} a numeric scalar, the modularity score of the
398#' given configuration.
399#'
400#' For \code{modularity_matrix} a numeric square matrix, its order is the number of
401#' vertices in the graph.
402#' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
403#' @seealso \code{\link{cluster_walktrap}},
404#' \code{\link{cluster_edge_betweenness}},
405#' \code{\link{cluster_fast_greedy}}, \code{\link{cluster_spinglass}},
406#' \code{\link{cluster_louvain}} and \code{\link{cluster_leiden}} for
407#' various community detection methods.
408#' @references Clauset, A.; Newman, M. E. J. & Moore, C. Finding community
409#' structure in very large networks, \emph{Physical Review E} 2004, 70, 066111
410#' @method modularity igraph
411#' @export
412#' @keywords graphs
413#' @examples
414#'
415#' g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
416#' g <- add_edges(g, c(1,6, 1,11, 6, 11))
417#' wtc <- cluster_walktrap(g)
418#' modularity(wtc)
419#' modularity(g, membership(wtc))
420#'
421
422modularity.igraph <- function(x, membership, weights=NULL, ...) {
423  # Argument checks
424  if (!is_igraph(x)) { stop("Not a graph object") }
425  if (is.null(membership) || (!is.numeric(membership) && !is.factor(membership))) {
426    stop("Membership is not a numerical vector")
427  }
428  membership <- as.numeric(membership)
429  if (!is.null(weights)) weights <- as.numeric(weights)
430
431  on.exit( .Call(C_R_igraph_finalizer) )
432  # Function call
433  res <- .Call(C_R_igraph_modularity, x, membership-1, weights)
434  res
435}
436
437#' @rdname communities
438#' @method modularity communities
439#' @export
440
441modularity.communities <- function(x, ...) {
442  if (!is.null(x$modularity)) {
443    max(x$modularity)
444  } else {
445    stop("Modularity was not calculated")
446  }
447}
448
449#' @rdname modularity.igraph
450#' @aliases mod.matrix
451#' @export
452
453modularity_matrix <- function(graph, membership, weights=NULL) {
454  # Argument checks
455  if (!is_igraph(graph)) { stop("Not a graph object") }
456  if (!missing(membership)) {
457    warning("The membership argument is deprecated; modularity_matrix does not need it")
458  }
459
460  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
461    weights <- E(graph)$weight
462  }
463  if (!is.null(weights) && any(!is.na(weights))) {
464    weights <- as.numeric(weights)
465  } else {
466    weights <- NULL
467  }
468
469  on.exit( .Call(C_R_igraph_finalizer) )
470  # Function call
471  res <- .Call(C_R_igraph_modularity_matrix, graph, weights)
472
473  res
474}
475
476#' @rdname communities
477#' @method length communities
478#' @export
479
480length.communities <- function(x) {
481  m <- membership(x)
482  max(m)
483}
484
485#' @rdname communities
486#' @export
487
488sizes <- function(communities) {
489  m <- membership(communities)
490  table(`Community sizes`=m)
491}
492
493#' @rdname communities
494#' @export
495
496algorithm <- function(communities) {
497  communities$algorithm
498}
499
500#' @rdname communities
501#' @export
502
503merges <- function(communities) {
504  if (!is.null(communities$merges)) {
505    communities$merges
506  } else {
507    stop("Not a hierarchical community structure")
508  }
509}
510
511#' @rdname communities
512#' @export
513
514crossing <- function(communities, graph) {
515  m <- membership(communities)
516  el <- as_edgelist(graph, names=FALSE)
517  m1 <- m[el[,1]]
518  m2 <- m[el[,2]]
519  res <- m1 != m2
520  if (!is.null(names(m1))) {
521    names(res) <- paste(names(m1), names(m2), sep="|")
522  }
523  res
524}
525
526#' @rdname communities
527#' @export
528
529code_len <- function(communities) {
530  communities$codelength
531}
532
533#' @rdname communities
534#' @export
535
536is_hierarchical <- function(communities) {
537  ! is.null(communities$merges)
538}
539
540complete.dend <- function(comm, use.modularity) {
541  merges <- comm$merges
542  if (nrow(merges) < comm$vcount-1) {
543    if (use.modularity) {
544      stop(paste("`use.modularity' requires a full dendrogram,",
545                 "i.e. a connected graph"))
546    }
547    miss <- seq_len(comm$vcount + nrow(merges))[-as.vector(merges)]
548    miss <- c(miss, seq_len(length(miss)-2) + comm$vcount+nrow(merges))
549    miss <- matrix(miss, byrow=TRUE, ncol=2)
550    merges <- rbind(merges, miss)
551  }
552  storage.mode(merges) <- "integer"
553
554  merges
555}
556
557# The following functions were adapted from the stats R package
558
559#' @rdname communities
560#' @importFrom stats as.dendrogram
561#' @method as.dendrogram communities
562#' @export
563
564as.dendrogram.communities <- function(object, hang=-1, use.modularity=FALSE,
565                                      ...) {
566  if (!is_hierarchical(object)) {
567    stop("Not a hierarchical community structure")
568  }
569
570  .memberDend <- function(x) {
571    r <- attr(x,"x.member")
572    if(is.null(r)) {
573      r <- attr(x,"members")
574    if(is.null(r)) r <- 1:1
575    }
576    r
577  }
578
579  ## If multiple components, then we merge them in arbitrary order
580  merges <- complete.dend(object, use.modularity)
581
582  storage.mode(merges) <- "integer"
583
584  if (is.null(object$names)) {
585    object$names <- 1:(nrow(merges)+1)
586  }
587  z <- list()
588  if (!use.modularity || is.null(object$modularity)) {
589    object$height <- 1:nrow(merges)
590  } else {
591    object$height <- object$modularity[-1]
592    object$height <- cumsum(object$height - min(object$height))
593  }
594  nMerge <- length(oHgt <- object$height)
595  if (nMerge != nrow(merges))
596    stop("'merge' and 'height' do not fit!")
597  hMax <- oHgt[nMerge]
598  one <- 1L
599  two <- 2L
600  leafs <- nrow(merges)+1
601  for (k in 1:nMerge) {
602    x <- merges[k, ]# no sort() anymore!
603    if (any(neg <- x < leafs+1))
604      h0 <- if (hang < 0) 0 else max(0, oHgt[k] - hang * hMax)
605    if (all(neg)) {                  # two leaves
606      zk <- as.list(x)
607      attr(zk, "members") <- two
608      attr(zk, "midpoint") <- 0.5 # mean( c(0,1) )
609      objlabels <- object$names[x]
610      attr(zk[[1]], "label") <- objlabels[1]
611      attr(zk[[2]], "label") <- objlabels[2]
612      attr(zk[[1]], "members") <- attr(zk[[2]], "members") <- one
613      attr(zk[[1]], "height") <- attr(zk[[2]], "height") <- h0
614      attr(zk[[1]], "leaf") <- attr(zk[[2]], "leaf") <- TRUE
615    }
616    else if (any(neg)) {            # one leaf, one node
617      X <- as.character(x)
618      ## Originally had "x <- sort(..) above => leaf always left, x[1];
619      ## don't want to assume this
620      isL <- x[1] < leafs+1 ## is leaf left?
621      zk <-
622        if(isL) list(x[1], z[[X[2]]])
623        else    list(z[[X[1]]], x[2])
624      attr(zk, "members") <- attr(z[[X[1 + isL]]], "members") + one
625      attr(zk, "midpoint") <-
626        (.memberDend(zk[[1]]) + attr(z[[X[1 + isL]]], "midpoint"))/2
627      attr(zk[[2 - isL]], "members") <- one
628      attr(zk[[2 - isL]], "height") <- h0
629      attr(zk[[2 - isL]], "label") <- object$names[x[2 - isL]]
630      attr(zk[[2 - isL]], "leaf") <- TRUE
631      }
632    else {                        # two nodes
633      x <- as.character(x)
634      zk <- list(z[[x[1]]], z[[x[2]]])
635      attr(zk, "members") <- attr(z[[x[1]]], "members") +
636        attr(z[[x[2]]], "members")
637      attr(zk, "midpoint") <- (attr(z[[x[1]]], "members") +
638                               attr(z[[x[1]]], "midpoint") +
639                               attr(z[[x[2]]], "midpoint"))/2
640    }
641    attr(zk, "height") <- oHgt[k]
642    z[[k <- as.character(k+leafs)]] <- zk
643  }
644  z <- z[[k]]
645  class(z) <- "dendrogram"
646  z
647}
648
649#' @rdname communities
650#' @importFrom stats as.hclust
651#' @method as.hclust communities
652#' @export
653
654as.hclust.communities <- function(x, hang=-1, use.modularity=FALSE,
655                                  ...) {
656  as.hclust(as.dendrogram(x, hang=hang, use.modularity=use.modularity))
657}
658
659#' @rdname communities
660#' @export
661
662as_phylo <- function(x, ...)
663  UseMethod("as_phylo")
664
665#' @rdname communities
666#' @method as_phylo communities
667#' @export
668
669as_phylo.communities <- function(x, use.modularity=FALSE, ...) {
670
671  if (!is_hierarchical(x)) {
672    stop("Not a hierarchical community structure")
673  }
674
675  ## If multiple components, then we merge them in arbitrary order
676  merges <- complete.dend(x, use.modularity)
677
678  if (!use.modularity || is.null(x$modularity)) {
679    height <- 1:nrow(merges)
680  } else {
681    height <- x$modularity[-1]
682    height <- cumsum(height - min(height))
683  }
684
685  if (is.null(x$names)) {
686    labels <- 1:(nrow(merges)+1)
687  } else {
688    labels <- x$names
689  }
690
691  N <- nrow(merges)
692  edge <- matrix(0L, 2*N, 2)
693  edge.length <- numeric(2*N)
694  node <- integer(N)
695  node[N] <- N + 2L
696  cur.nod <- N + 3L
697  j <- 1L
698  for (i in N:1) {
699    edge[j:(j+1), 1] <- node[i]
700    for (l in 1:2) {
701      k <- j + l -1L
702      y <- merges[i, l]
703      if (y > N+1) {
704        edge[k, 2] <- node[y-N-1] <- cur.nod
705        cur.nod <- cur.nod + 1L
706        edge.length[k] <- height[i] - height[y-N-1]
707      } else {
708        edge[k, 2] <- y
709        edge.length[k] <- height[i]
710      }
711    }
712    j <- j + 2L
713  }
714
715  obj <- list(edge=edge, edge.length=edge.length/2, tip.label=labels,
716              Nnode=N)
717  class(obj) <- "phylo"
718  ape::reorder.phylo(obj)
719}
720
721#' @rdname communities
722#' @export
723
724cut_at <- function(communities, no, steps) {
725
726  if (!inherits(communities, "communities")) {
727    stop("Not a community structure")
728  }
729  if (!is_hierarchical(communities)) {
730    stop("Not a hierarchical communitity structure")
731  }
732
733  if ((!missing(no) && !missing(steps)) ||
734      ( missing(no) &&  missing(steps))) {
735    stop("Please give either `no' or `steps' (but not both)")
736  }
737
738  if (!missing(steps)) {
739    mm <- merges(communities)
740    if (steps > nrow(mm)) {
741      warning("Cannot make that many steps")
742      steps <- nrow(mm)
743    }
744    community.to.membership2(mm, communities$vcount, steps)
745  } else {
746    mm <- merges(communities)
747    noc <- communities$vcount - nrow(mm) # final number of communities
748    if (no<noc) {
749      warning("Cannot have that few communities")
750      no=noc
751    }
752    steps <- communities$vcount-no
753    community.to.membership2(mm, communities$vcount, steps)
754  }
755}
756
757#' @rdname communities
758#' @export
759
760show_trace <- function(communities) {
761
762  if (!inherits(communities, "communities")) {
763    stop("Not a community structure")
764  }
765  if (is.null(communities$history)) {
766    stop("History was not recorded")
767  }
768
769  res <- character()
770  i <- 1
771  while (i <= length(communities$history)) {
772    if (communities$history[i] == 2) {  # IGRAPH_LEVC_HIST_SPLIT
773      resnew <- paste("Splitting community", communities$history[i+1],
774                      "into two.")
775      i <- i + 2
776    } else if (communities$history[i]==3) { # IGRAPH_LEVC_HIST_FAILED
777      resnew <- paste("Failed splitting community",
778                      communities$history[i+1], "into two.")
779      i <- i + 2
780    } else if (communities$history[i]==4) { # IGRAPH_LEVC_START_FULL
781      resnew <- "Starting with the whole graph as a community."
782      i <- i + 1
783    } else if (communities$history[i]==5) { # IGRAPH_LEVC_START_GIVEN
784      resnew <- paste("Starting from the", communities$history[i+1],
785                      "given communities.")
786      i <- i + 2
787    }
788
789    res <- c(res, resnew)
790  }
791  res
792}
793
794#####################################################################
795
796community.to.membership2 <- function(merges, vcount, steps) {
797  mode(merges) <- "numeric"
798  mode(vcount) <- "numeric"
799  mode(steps)  <- "numeric"
800  on.exit( .Call(C_R_igraph_finalizer) )
801  res <- .Call(C_R_igraph_community_to_membership2, merges-1, vcount, steps)
802  res+1
803}
804
805#####################################################################
806
807
808
809#' Finding communities in graphs based on statistical meachanics
810#'
811#' This function tries to find communities in graphs via a spin-glass model and
812#' simulated annealing.
813#'
814#' This function tries to find communities in a graph. A community is a set of
815#' nodes with many edges inside the community and few edges between outside it
816#' (i.e. between the community itself and the rest of the graph.)
817#'
818#' This idea is reversed for edges having a negative weight, ie. few negative
819#' edges inside a community and many negative edges between communities. Note
820#' that only the \sQuote{neg} implementation supports negative edge weights.
821#'
822#' The \code{spinglass.cummunity} function can solve two problems related to
823#' community detection. If the \code{vertex} argument is not given (or it is
824#' \code{NULL}), then the regular community detection problem is solved
825#' (approximately), i.e. partitioning the vertices into communities, by
826#' optimizing the an energy function.
827#'
828#' If the \code{vertex} argument is given and it is not \code{NULL}, then it
829#' must be a vertex id, and the same energy function is used to find the
830#' community of the the given vertex. See also the examples below.
831#'
832#' @aliases spinglass.community
833#' @param graph The input graph, can be directed but the direction of the edges
834#' is neglected.
835#' @param weights The weights of the edges. Either a numeric vector or
836#' \code{NULL}. If it is null and the input graph has a \sQuote{weight} edge
837#' attribute then that will be used. If \code{NULL} and no such attribute is
838#' present then the edges will have equal weights. Set this to \code{NA} if the
839#' graph was a \sQuote{weight} edge attribute, but you don't want to use it for
840#' community detection. A larger edge weight means a stronger connection
841#' for this function.
842#' @param vertex This parameter can be used to calculate the community of a
843#' given vertex without calculating all communities. Note that if this argument
844#' is present then some other arguments are ignored.
845#' @param spins Integer constant, the number of spins to use. This is the upper
846#' limit for the number of communities. It is not a problem to supply a
847#' (reasonably) big number here, in which case some spin states will be
848#' unpopulated.
849#' @param parupdate Logical constant, whether to update the spins of the
850#' vertices in parallel (synchronously) or not. This argument is ignored if the
851#' second form of the function is used (ie. the \sQuote{\code{vertex}} argument
852#' is present). It is also not implemented in the \dQuote{neg} implementation.
853#' @param start.temp Real constant, the start temperature.  This argument is
854#' ignored if the second form of the function is used (ie. the
855#' \sQuote{\code{vertex}} argument is present).
856#' @param stop.temp Real constant, the stop temperature. The simulation
857#' terminates if the temperature lowers below this level.  This argument is
858#' ignored if the second form of the function is used (ie. the
859#' \sQuote{\code{vertex}} argument is present).
860#' @param cool.fact Cooling factor for the simulated annealing.  This argument
861#' is ignored if the second form of the function is used (ie. the
862#' \sQuote{\code{vertex}} argument is present).
863#' @param update.rule Character constant giving the \sQuote{null-model} of the
864#' simulation. Possible values: \dQuote{simple} and \dQuote{config}.
865#' \dQuote{simple} uses a random graph with the same number of edges as the
866#' baseline probability and \dQuote{config} uses a random graph with the same
867#' vertex degrees as the input graph.
868#' @param gamma Real constant, the gamma argument of the algorithm. This
869#' specifies the balance between the importance of present and non-present
870#' edges in a community. Roughly, a comunity is a set of vertices having many
871#' edges inside the community and few edges outside the community. The default
872#' 1.0 value makes existing and non-existing links equally important. Smaller
873#' values make the existing links, greater values the missing links more
874#' important.
875#' @param implementation Character scalar. Currently igraph contains two
876#' implementations for the Spin-glass community finding algorithm. The faster
877#' original implementation is the default. The other implementation, that takes
878#' into account negative weights, can be chosen by supplying \sQuote{neg} here.
879#' @param gamma.minus Real constant, the gamma.minus parameter of the
880#' algorithm. This specifies the balance between the importance of present and
881#' non-present negative weighted edges in a community. Smaller values of
882#' gamma.minus, leads to communities with lesser negative intra-connectivity.
883#' If this argument is set to zero, the algorithm reduces to a graph coloring
884#' algorithm, using the number of spins as the number of colors. This argument
885#' is ignored if the \sQuote{orig} implementation is chosen.
886#' @return If the \code{vertex} argument is not given, ie. the first form is
887#' used then a \code{\link{cluster_spinglass}} returns a
888#' \code{\link{communities}} object.
889#'
890#' If the \code{vertex} argument is present, ie. the second form is used then a
891#' named list is returned with the following components:
892#' \item{community}{Numeric vector giving the ids of the vertices in the same
893#' community as \code{vertex}.} \item{cohesion}{The cohesion score of the
894#' result, see references.} \item{adhesion}{The adhesion score of the result,
895#' see references.} \item{inner.links}{The number of edges within the community
896#' of \code{vertex}.} \item{outer.links}{The number of edges between the
897#' community of \code{vertex} and the rest of the graph. }
898#' @author Jorg Reichardt for the original code and Gabor Csardi
899#' \email{csardi.gabor@@gmail.com} for the igraph glue code.
900#'
901#' Changes to the original function for including the possibility of negative
902#' ties were implemented by Vincent Traag (\url{http://www.traag.net/}).
903#' @seealso \code{\link{communities}}, \code{\link{components}}
904#' @references J. Reichardt and S. Bornholdt: Statistical Mechanics of
905#' Community Detection, \emph{Phys. Rev. E}, 74, 016110 (2006),
906#' \url{https://arxiv.org/abs/cond-mat/0603718}
907#'
908#' M. E. J. Newman and M. Girvan: Finding and evaluating community structure in
909#' networks, \emph{Phys. Rev. E} 69, 026113 (2004)
910#'
911#' V.A. Traag and Jeroen Bruggeman: Community detection in networks with
912#' positive and negative links, \url{https://arxiv.org/abs/0811.2329} (2008).
913#' @export
914#' @keywords graphs
915#' @examples
916#'
917#'   g <- sample_gnp(10, 5/10) %du% sample_gnp(9, 5/9)
918#'   g <- add_edges(g, c(1, 12))
919#'   g <- induced_subgraph(g, subcomponent(g, 1))
920#'   cluster_spinglass(g, spins=2)
921#'   cluster_spinglass(g, vertex=1)
922#'
923cluster_spinglass <- function(graph, weights=NULL, vertex=NULL, spins=25,
924                                parupdate=FALSE, start.temp=1,
925                                stop.temp=0.01, cool.fact=0.99,
926                                update.rule=c("config", "random", "simple"),
927                                gamma=1.0, implementation=c("orig", "neg"),
928                                gamma.minus=1.0) {
929
930  if (!is_igraph(graph)) {
931    stop("Not a graph object")
932  }
933
934  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
935    weights <- E(graph)$weight
936  }
937  if (!is.null(weights) && any(!is.na(weights))) {
938    weights <- as.numeric(weights)
939  } else {
940    weights <- NULL
941  }
942
943  update.rule <- igraph.match.arg(update.rule)
944  update.rule <- switch(update.rule, "simple"=0, "random"=0, "config"=1)
945  implementation <- switch(igraph.match.arg(implementation),
946                                            "orig"=0, "neg"=1)
947
948  on.exit( .Call(C_R_igraph_finalizer) )
949  if (is.null(vertex)) {
950    res <- .Call(C_R_igraph_spinglass_community, graph, weights,
951                 as.numeric(spins), as.logical(parupdate),
952                 as.numeric(start.temp),
953                 as.numeric(stop.temp), as.numeric(cool.fact),
954                 as.numeric(update.rule), as.numeric(gamma),
955                 as.numeric(implementation), as.numeric(gamma.minus))
956    res$algorithm  <- "spinglass"
957    res$vcount     <- vcount(graph)
958    res$membership <- res$membership + 1
959    if (igraph_opt("add.vertex.names") && is_named(graph)) {
960      res$names <- vertex_attr(graph, "name")
961    }
962    class(res) <- "communities"
963  } else {
964    res <- .Call(C_R_igraph_spinglass_my_community, graph, weights,
965                 as.igraph.vs(graph, vertex)-1, as.numeric(spins),
966                 as.numeric(update.rule), as.numeric(gamma))
967    res$community <- res$community + 1
968  }
969  res
970}
971
972#' Finding community structure of a graph using the Leiden algorithm of Traag,
973#' van Eck & Waltman.
974#'
975#' @param graph The input graph, only undirected graphs are supported.
976#' @param objective_function Whether to use the Constant Potts Model (CPM) or
977#'   modularity. Must be either \code{"CPM"} or \code{"modularity"}.
978#' @param weights Optional edge weights to be used. Can be a vector or an edge
979#'   attribute name. If the graph has a \code{weight} edge attribute, then this
980#'   is used by default. Supply \code{NA} here if the graph has a \code{weight}
981#'   edge attribute, but you want to ignore it.
982#' @param resolution_parameter The resolution parameter to use. Higher
983#'   resolutions lead to more smaller communities, while lower resolutions lead
984#'   to fewer larger communities.
985#' @param beta Parameter affecting the randomness in the Leiden algorithm.
986#'   This affects only the refinement step of the algorithm.
987#' @param initial_membership If provided, the Leiden algorithm
988#'   will try to improve this provided membership. If no argument is
989#'   provided, the aglorithm simply starts from the singleton partition.
990#' @param n_iterations the number of iterations to iterate the Leiden
991#'   algorithm. Each iteration may improve the partition further.
992#' @param vertex_weights the vertex weights used in the Leiden algorithm.
993#'   If this is not provided, it will be automatically determined on the
994#'   basis of whether you want to use CPM or modularity. If you do provide
995#'   this, please make sure that you understand what you are doing.
996#' @return \code{cluster_leiden} returns a \code{\link{communities}}
997#' object, please see the \code{\link{communities}} manual page for details.
998#' @author Vincent Traag
999#' @seealso See \code{\link{communities}} for extracting the membership,
1000#' modularity scores, etc. from the results.
1001#'
1002#' Other community detection algorithms: \code{\link{cluster_walktrap}},
1003#' \code{\link{cluster_spinglass}},
1004#' \code{\link{cluster_leading_eigen}},
1005#' \code{\link{cluster_edge_betweenness}},
1006#' \code{\link{cluster_fast_greedy}},
1007#' \code{\link{cluster_label_prop}}
1008#' \code{\link{cluster_louvain}}
1009#' @references Traag, V. A., Waltman, L., & van Eck, N. J. (2019). From Louvain
1010#'   to Leiden: guaranteeing well-connected communities. Scientific
1011#'   reports, 9(1), 5233. doi: 10.1038/s41598-019-41695-z
1012#' @export
1013#' @keywords graphs
1014#' @examples
1015#' g <- graph.famous("Zachary")
1016#' # By default CPM is used
1017#' g <- cluster_leiden(g, resolution_parameter=0.06)
1018#'
1019cluster_leiden <- function(graph, objective_function=c("CPM", "modularity"),
1020                           weights=NULL, resolution_parameter=1, beta=0.01,
1021                           initial_membership=NULL, n_iterations=2, vertex_weights=NULL)
1022{
1023  if (!is_igraph(graph)) {
1024    stop("Not a graph object")
1025  }
1026
1027  # Parse objective function argument
1028  objective_function <- igraph.match.arg(objective_function)
1029  objective_function <- switch(objective_function, "cpm"=0, "modularity"=1)
1030
1031  # Parse edge weights argument
1032  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
1033    weights <- E(graph)$weight
1034  }
1035  if (!is.null(weights) && !any(is.na(weights))) {
1036    weights <- as.numeric(weights)
1037  } else {
1038    weights <- NULL
1039  }
1040
1041  # Parse initial_membership argument
1042  if (!is.null(initial_membership) && !any(is.na(initial_membership))) {
1043    initial_membership <- as.numeric(initial_membership)
1044  } else {
1045    initial_membership <- NULL
1046  }
1047
1048  # Parse node weights argument
1049  if (!is.null(vertex_weights) && !any(is.na(vertex_weights))) {
1050    vertex_weights <- as.numeric(vertex_weights)
1051    if (objective_function == 1) { # Using modularity
1052      warning("Providing node weights contradicts using modularity")
1053    }
1054  } else {
1055    if (objective_function == 1) { # Using modularity
1056      # Set correct node weights
1057      vertex_weights <- strength(graph, weights=weights)
1058      # Also correct resolution parameter
1059      resolution_parameter <- resolution_parameter/sum(vertex_weights)
1060    }
1061  }
1062
1063  on.exit( .Call(C_R_igraph_finalizer) )
1064  membership <- initial_membership
1065  if (n_iterations > 0) {
1066    for (i in 1:n_iterations) {
1067      res <- .Call(C_R_igraph_community_leiden, graph, weights,
1068                    vertex_weights, as.numeric(resolution_parameter),
1069                    as.numeric(beta), !is.null(membership),
1070                    membership)
1071      membership <- res$membership
1072    }
1073  } else {
1074    prev_quality <- -Inf
1075    quality <- 0.0
1076    while (prev_quality < quality) {
1077      prev_quality <- quality;
1078      res <- .Call(C_R_igraph_community_leiden, graph, weights,
1079                    vertex_weights, as.numeric(resolution_parameter),
1080                    as.numeric(beta), !is.null(membership),
1081                    membership)
1082      membership <- res$membership
1083      quality <- res$quality
1084    }
1085  }
1086  res$algorithm <- "leiden"
1087  res$vcount    <- vcount(graph)
1088  res$membership <- res$membership + 1
1089  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1090    res$names <- vertex_attr(graph, "name")
1091  }
1092  class(res) <- "communities"
1093  res
1094}
1095
1096#' Community detection algorithm based on interacting fluids
1097#'
1098#' The algorithm detects communities based on the simple idea of
1099#' several fluids interacting in a non-homogeneous environment
1100#' (the graph topology), expanding and contracting based on their
1101#' interaction and density.
1102#'
1103#' @param graph The input graph. The graph must be simple and connected.
1104#'   Empty graphs are not supported as well as single vertex graphs.
1105#'   Edge directions are ignored. Weights are not considered.
1106#' @param no.of.communities The number of communities to be found. Must be
1107#'   greater than 0 and fewer than number of vertices in the graph.
1108#' @return \code{cluster_fluid_communities} returns a \code{\link{communities}}
1109#' object, please see the \code{\link{communities}} manual page for details.
1110#' @author Ferran Parés
1111#' @seealso See \code{\link{communities}} for extracting the membership,
1112#' modularity scores, etc. from the results.
1113#'
1114#' Other community detection algorithms: \code{\link{cluster_walktrap}},
1115#' \code{\link{cluster_spinglass}},
1116#' \code{\link{cluster_leading_eigen}},
1117#' \code{\link{cluster_edge_betweenness}},
1118#' \code{\link{cluster_fast_greedy}},
1119#' \code{\link{cluster_label_prop}}
1120#' \code{\link{cluster_louvain}},
1121#' \code{\link{cluster_leiden}}
1122#' @references Parés F, Gasulla DG, et. al. (2018) Fluid Communities: A Competitive,
1123#' Scalable and Diverse Community Detection Algorithm. In: Complex Networks
1124#' &amp; Their Applications VI: Proceedings of Complex Networks 2017 (The Sixth
1125#' International Conference on Complex Networks and Their Applications),
1126#' Springer, vol 689, p 229, doi: 10.1007/978-3-319-72150-7_19
1127#' @export
1128#' @keywords graphs
1129#' @examples
1130#' g <- graph.famous("Zachary")
1131#' comms <- cluster_fluid_communities(g, 2)
1132
1133cluster_fluid_communities <- cluster_fluid_communities
1134
1135#' Community strucure via short random walks
1136#'
1137#' This function tries to find densely connected subgraphs, also called
1138#' communities in a graph via random walks. The idea is that short random walks
1139#' tend to stay in the same community.
1140#'
1141#' This function is the implementation of the Walktrap community finding
1142#' algorithm, see Pascal Pons, Matthieu Latapy: Computing communities in large
1143#' networks using random walks, https://arxiv.org/abs/physics/0512106
1144#'
1145#' @aliases walktrap.community
1146#' @param graph The input graph, edge directions are ignored in directed
1147#' graphs.
1148#' @param weights The edge weights. Larger edge weights increase the
1149#' probability that an edge is selected by the random walker. In other
1150#' words, larger edge weights correspond to stronger connections.
1151#' @param steps The length of the random walks to perform.
1152#' @param merges Logical scalar, whether to include the merge matrix in the
1153#' result.
1154#' @param modularity Logical scalar, whether to include the vector of the
1155#' modularity scores in the result. If the \code{membership} argument is true,
1156#' then it will be always calculated.
1157#' @param membership Logical scalar, whether to calculate the membership vector
1158#' for the split corresponding to the highest modularity value.
1159#' @return \code{cluster_walktrap} returns a \code{\link{communities}}
1160#' object, please see the \code{\link{communities}} manual page for details.
1161#' @author Pascal Pons (\url{http://psl.pons.free.fr/}) and Gabor Csardi
1162#' \email{csardi.gabor@@gmail.com} for the R and igraph interface
1163#' @seealso See \code{\link{communities}} on getting the actual membership
1164#' vector, merge matrix, modularity score, etc.
1165#'
1166#' \code{\link{modularity}} and \code{\link{cluster_fast_greedy}},
1167#' \code{\link{cluster_spinglass}},
1168#' \code{\link{cluster_leading_eigen}},
1169#' \code{\link{cluster_edge_betweenness}}, \code{\link{cluster_louvain}},
1170#' and \code{\link{cluster_leiden}} for other community detection
1171#' methods.
1172#' @references Pascal Pons, Matthieu Latapy: Computing communities in large
1173#' networks using random walks, https://arxiv.org/abs/physics/0512106
1174#' @export
1175#' @keywords graphs
1176#' @examples
1177#'
1178#' g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
1179#' g <- add_edges(g, c(1,6, 1,11, 6, 11))
1180#' cluster_walktrap(g)
1181#'
1182cluster_walktrap <- function(graph, weights=E(graph)$weight, steps=4,
1183                               merges=TRUE, modularity=TRUE,
1184                               membership=TRUE) {
1185  if (!is_igraph(graph)) {
1186    stop("Not a graph object!")
1187  }
1188
1189  if (membership && !modularity) {
1190    modularity <- TRUE
1191  }
1192
1193  if (!is.null(weights)) {
1194    weights <- as.numeric(weights)
1195  }
1196
1197  on.exit( .Call(C_R_igraph_finalizer) )
1198  res <- .Call(C_R_igraph_walktrap_community, graph, weights, as.numeric(steps),
1199        as.logical(merges), as.logical(modularity), as.logical(membership))
1200  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1201    res$names <- V(graph)$name
1202  }
1203
1204  res$vcount <- vcount(graph)
1205  res$algorithm <- "walktrap"
1206  res$membership <- res$membership + 1
1207  res$merges <- res$merges + 1
1208  class(res) <- "communities"
1209  res
1210}
1211
1212
1213
1214#' Community structure detection based on edge betweenness
1215#'
1216#' Many networks consist of modules which are densely connected themselves but
1217#' sparsely connected to other modules.
1218#'
1219#' The edge betweenness score of an edge measures the number of shortest paths
1220#' through it, see \code{\link{edge_betweenness}} for details. The idea of the
1221#' edge betweenness based community structure detection is that it is likely
1222#' that edges connecting separate modules have high edge betweenness as all the
1223#' shortest paths from one module to another must traverse through them. So if
1224#' we gradually remove the edge with the highest edge betweenness score we will
1225#' get a hierarchical map, a rooted tree, called a dendrogram of the graph. The
1226#' leafs of the tree are the individual vertices and the root of the tree
1227#' represents the whole graph.
1228#'
1229#' \code{cluster_edge_betweenness} performs this algorithm by calculating the
1230#' edge betweenness of the graph, removing the edge with the highest edge
1231#' betweenness score, then recalculating edge betweenness of the edges and
1232#' again removing the one with the highest score, etc.
1233#'
1234#' \code{edge.betweeness.community} returns various information collected
1235#' through the run of the algorithm. See the return value down here.
1236#'
1237#' @aliases edge.betweenness.community cluster_edge_betweenness
1238#' @param graph The graph to analyze.
1239#' @param weights The edge weights. Supply \code{NULL} to omit edge weights. By
1240#' default the \sQuote{\code{weight}} edge attribute is used, if it is present.
1241#' Edge weights are used to calculate weighted edge betweenness. This means
1242#' that edges are interpreted as distances, not as connection strengths.
1243#' @param directed Logical constant, whether to calculate directed edge
1244#' betweenness for directed graphs. It is ignored for undirected graphs.
1245#' @param edge.betweenness Logical constant, whether to return the edge
1246#' betweenness of the edges at the time of their removal.
1247#' @param merges Logical constant, whether to return the merge matrix
1248#' representing the hierarchical community structure of the network.  This
1249#' argument is called \code{merges}, even if the community structure algorithm
1250#' itself is divisive and not agglomerative: it builds the tree from top to
1251#' bottom. There is one line for each merge (i.e. split) in matrix, the first
1252#' line is the first merge (last split). The communities are identified by
1253#' integer number starting from one. Community ids smaller than or equal to
1254#' \eqn{N}, the number of vertices in the graph, belong to singleton
1255#' communities, ie. individual vertices. Before the first merge we have \eqn{N}
1256#' communities numbered from one to \eqn{N}. The first merge, the first line of
1257#' the matrix creates community \eqn{N+1}, the second merge creates community
1258#' \eqn{N+2}, etc.
1259#' @param bridges Logical constant, whether to return a list the edge removals
1260#' which actually splitted a component of the graph.
1261#' @param modularity Logical constant, whether to calculate the maximum
1262#' modularity score, considering all possibly community structures along the
1263#' edge-betweenness based edge removals.
1264#' @param membership Logical constant, whether to calculate the membership
1265#' vector corresponding to the highest possible modularity score.
1266#' @return \code{cluster_edge_betweenness} returns a
1267#' \code{\link{communities}} object, please see the \code{\link{communities}}
1268#' manual page for details.
1269#' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
1270#' @seealso \code{\link{edge_betweenness}} for the definition and calculation
1271#' of the edge betweenness, \code{\link{cluster_walktrap}},
1272#' \code{\link{cluster_fast_greedy}},
1273#' \code{\link{cluster_leading_eigen}} for other community detection
1274#' methods.
1275#'
1276#' See \code{\link{communities}} for extracting the results of the community
1277#' detection.
1278#' @references M Newman and M Girvan: Finding and evaluating community
1279#' structure in networks, \emph{Physical Review E} 69, 026113 (2004)
1280#' @export
1281#' @keywords graphs
1282#' @examples
1283#'
1284#' g <- sample_pa(100, m = 2, directed = FALSE)
1285#' eb <- cluster_edge_betweenness(g)
1286#'
1287#' g <- make_full_graph(10) %du% make_full_graph(10)
1288#' g <- add_edges(g, c(1,11))
1289#' eb <- cluster_edge_betweenness(g)
1290#' eb
1291#'
1292cluster_edge_betweenness <- function(graph, weights=E(graph)$weight,
1293                                       directed=TRUE,
1294                                       edge.betweenness=TRUE,
1295                                       merges=TRUE, bridges=TRUE,
1296                                       modularity=TRUE,
1297                                       membership=TRUE) {
1298  if (!is_igraph(graph)) {
1299    stop("Not a graph object!")
1300  }
1301
1302  if (!is.null(weights)) {
1303    weights <- as.numeric(weights)
1304  }
1305
1306  on.exit( .Call(C_R_igraph_finalizer) )
1307  res <- .Call(C_R_igraph_community_edge_betweenness, graph, weights,
1308               as.logical(directed),
1309               as.logical(edge.betweenness),
1310               as.logical(merges), as.logical(bridges),
1311               as.logical(modularity), as.logical(membership))
1312  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1313    res$names <- V(graph)$name
1314  }
1315  res$vcount <- vcount(graph)
1316  res$algorithm <- "edge betweenness"
1317  res$membership <- res$membership + 1
1318  res$merges <- res$merges + 1
1319  res$removed.edges <- res$removed.edges + 1
1320  res$bridges <- res$bridges + 1
1321  class(res) <- "communities"
1322  res
1323}
1324
1325#' Community structure via greedy optimization of modularity
1326#'
1327#' This function tries to find dense subgraph, also called communities in
1328#' graphs via directly optimizing a modularity score.
1329#'
1330#' This function implements the fast greedy modularity optimization algorithm
1331#' for finding community structure, see A Clauset, MEJ Newman, C Moore: Finding
1332#' community structure in very large networks,
1333#' http://www.arxiv.org/abs/cond-mat/0408187 for the details.
1334#'
1335#' @aliases fastgreedy.community
1336#' @param graph The input graph
1337#' @param merges Logical scalar, whether to return the merge matrix.
1338#' @param modularity Logical scalar, whether to return a vector containing the
1339#' modularity after each merge.
1340#' @param membership Logical scalar, whether to calculate the membership vector
1341#' corresponding to the maximum modularity score, considering all possible
1342#' community structures along the merges.
1343#' @param weights If not \code{NULL}, then a numeric vector of edge weights.
1344#' The length must match the number of edges in the graph.  By default the
1345#' \sQuote{\code{weight}} edge attribute is used as weights. If it is not
1346#' present, then all edges are considered to have the same weight.
1347#' Larger edge weights correspond to stronger connections.
1348#' @return \code{cluster_fast_greedy} returns a \code{\link{communities}}
1349#' object, please see the \code{\link{communities}} manual page for details.
1350#' @author Tamas Nepusz \email{ntamas@@gmail.com} and Gabor Csardi
1351#' \email{csardi.gabor@@gmail.com} for the R interface.
1352#' @seealso \code{\link{communities}} for extracting the results.
1353#'
1354#' See also \code{\link{cluster_walktrap}},
1355#' \code{\link{cluster_spinglass}},
1356#' \code{\link{cluster_leading_eigen}} and
1357#' \code{\link{cluster_edge_betweenness}}, \code{\link{cluster_louvain}}
1358#' \code{\link{cluster_leiden}} for other methods.
1359#' @references A Clauset, MEJ Newman, C Moore: Finding community structure in
1360#' very large networks, http://www.arxiv.org/abs/cond-mat/0408187
1361#' @export
1362#' @keywords graphs
1363#' @examples
1364#'
1365#' g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
1366#' g <- add_edges(g, c(1,6, 1,11, 6, 11))
1367#' fc <- cluster_fast_greedy(g)
1368#' membership(fc)
1369#' sizes(fc)
1370#'
1371cluster_fast_greedy <- function(graph, merges=TRUE, modularity=TRUE,
1372                                 membership=TRUE,
1373                                 weights=E(graph)$weight) {
1374  if (!is_igraph(graph)) {
1375    stop("Not a graph object")
1376  }
1377
1378  if (!is.null(weights)) {
1379    weights <- as.numeric(weights)
1380  }
1381
1382  on.exit( .Call(C_R_igraph_finalizer) )
1383  res <- .Call(C_R_igraph_community_fastgreedy, graph, as.logical(merges),
1384               as.logical(modularity), as.logical(membership), weights)
1385  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1386    res$names <- V(graph)$name
1387  }
1388  res$algorithm <- "fast greedy"
1389  res$vcount <- vcount(graph)
1390  res$membership <- res$membership + 1
1391  res$merges <- res$merges + 1
1392  class(res) <- "communities"
1393  res
1394}
1395
1396igraph.i.levc.arp <- function(externalP, externalE) {
1397  f <- function(v) {
1398    v <- as.numeric(v)
1399    .Call(C_R_igraph_i_levc_arp, externalP, externalE, v)
1400  }
1401  f
1402}
1403
1404
1405
1406#' Community structure detecting based on the leading eigenvector of the
1407#' community matrix
1408#'
1409#' This function tries to find densely connected subgraphs in a graph by
1410#' calculating the leading non-negative eigenvector of the modularity matrix of
1411#' the graph.
1412#'
1413#' The function documented in these section implements the \sQuote{leading
1414#' eigenvector} method developed by Mark Newman, see the reference below.
1415#'
1416#' The heart of the method is the definition of the modularity matrix,
1417#' \code{B}, which is \code{B=A-P}, \code{A} being the adjacency matrix of the
1418#' (undirected) network, and \code{P} contains the probability that certain
1419#' edges are present according to the \sQuote{configuration model}. In other
1420#' words, a \code{P[i,j]} element of \code{P} is the probability that there is
1421#' an edge between vertices \code{i} and \code{j} in a random network in which
1422#' the degrees of all vertices are the same as in the input graph.
1423#'
1424#' The leading eigenvector method works by calculating the eigenvector of the
1425#' modularity matrix for the largest positive eigenvalue and then separating
1426#' vertices into two community based on the sign of the corresponding element
1427#' in the eigenvector. If all elements in the eigenvector are of the same sign
1428#' that means that the network has no underlying comuunity structure.  Check
1429#' Newman's paper to understand why this is a good method for detecting
1430#' community structure.
1431#'
1432#' @aliases leading.eigenvector.community
1433#' @param graph The input graph. Should be undirected as the method needs a
1434#' symmetric matrix.
1435#' @param steps The number of steps to take, this is actually the number of
1436#' tries to make a step. It is not a particularly useful parameter.
1437#' @param weights An optional weight vector. The \sQuote{weight} edge attribute
1438#' is used if present. Supply \sQuote{\code{NA}} here if you want to ignore the
1439#' \sQuote{weight} edge attribute. Larger edge weights correspond to stronger
1440#' connections between vertices.
1441#' @param start \code{NULL}, or a numeric membership vector, giving the start
1442#' configuration of the algorithm.
1443#' @param options A named list to override some ARPACK options.
1444#' @param callback If not \code{NULL}, then it must be callback function. This
1445#' is called after each iteration, after calculating the leading eigenvector of
1446#' the modularity matrix. See details below.
1447#' @param extra Additional argument to supply to the callback function.
1448#' @param env The environment in which the callback function is evaluated.
1449#' @return \code{cluster_leading_eigen} returns a named list with the
1450#' following members: \item{membership}{The membership vector at the end of the
1451#' algorithm, when no more splits are possible.} \item{merges}{The merges
1452#' matrix starting from the state described by the \code{membership} member.
1453#' This is a two-column matrix and each line describes a merge of two
1454#' communities, the first line is the first merge and it creates community
1455#' \sQuote{\code{N}}, \code{N} is the number of initial communities in the
1456#' graph, the second line creates community \code{N+1}, etc.  }
1457#' \item{options}{Information about the underlying ARPACK computation, see
1458#' \code{\link{arpack}} for details.  }
1459#' @section Callback functions: The \code{callback} argument can be used to
1460#' supply a function that is called after each eigenvector calculation. The
1461#' following arguments are supplied to this function: \describe{
1462#'   \item{membership}{The actual membership vector, with zero-based indexing.}
1463#'   \item{community}{The community that the algorithm just tried to split,
1464#'     community numbering starts with zero here.}
1465#'   \item{value}{The eigenvalue belonging to the leading eigenvector the
1466#'     algorithm just found.}
1467#'   \item{vector}{The leading eigenvector the algorithm just found.}
1468#'   \item{multiplier}{An R function that can be used to multiple the actual
1469#'     modularity matrix with an arbitrary vector. Supply the vector as an
1470#'     argument to perform this multiplication. This function can be used
1471#'     with ARPACK.}
1472#'   \item{extra}{The \code{extra} argument that was passed to
1473#'     \code{cluster_leading_eigen}. }
1474#'   The callback function should return a scalar number. If this number
1475#'   is non-zero, then the clustering is terminated.
1476#' }
1477#' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
1478#' @seealso \code{\link{modularity}}, \code{\link{cluster_walktrap}},
1479#' \code{\link{cluster_edge_betweenness}},
1480#' \code{\link{cluster_fast_greedy}}, \code{\link[stats]{as.dendrogram}}
1481#' @references MEJ Newman: Finding community structure using the eigenvectors
1482#' of matrices, Physical Review E 74 036104, 2006.
1483#' @export
1484#' @keywords graphs
1485#' @examples
1486#'
1487#' g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
1488#' g <- add_edges(g, c(1,6, 1,11, 6, 11))
1489#' lec <- cluster_leading_eigen(g)
1490#' lec
1491#'
1492#' cluster_leading_eigen(g, start=membership(lec))
1493#'
1494cluster_leading_eigen <- function(graph, steps=-1, weights=NULL,
1495                                          start=NULL,
1496                                          options=arpack_defaults,
1497                                          callback=NULL, extra=NULL,
1498                                          env=parent.frame()){
1499
1500  # Argument checks
1501  if (!is_igraph(graph)) { stop("Not a graph object") }
1502  steps <- as.integer(steps)
1503  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
1504    weights <- E(graph)$weight
1505  }
1506  if (!is.null(weights) && any(!is.na(weights))) {
1507    weights <- as.numeric(weights)
1508  } else {
1509    weights <- NULL
1510  }
1511  if (!is.null(start)) { start <- as.numeric(start)-1 }
1512  options.tmp <- arpack_defaults; options.tmp[ names(options) ] <- options ; options <- options.tmp
1513
1514  on.exit( .Call(C_R_igraph_finalizer) )
1515  # Function call
1516  res <- .Call(C_R_igraph_community_leading_eigenvector, graph, steps,
1517               weights, options, start, callback, extra, env,
1518               environment(igraph.i.levc.arp))
1519  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1520    res$names <- V(graph)$name
1521  }
1522  res$algorithm <- "leading eigenvector"
1523  res$vcount <- vcount(graph)
1524  res$membership <- res$membership + 1
1525  res$merges <- res$merges + 1
1526  res$history <- res$history + 1
1527  class(res) <- "communities"
1528  res
1529}
1530
1531#' Finding communities based on propagating labels
1532#'
1533#' This is a fast, nearly linear time algorithm for detecting community
1534#' structure in networks. In works by labeling the vertices with unique labels
1535#' and then updating the labels by majority voting in the neighborhood of the
1536#' vertex.
1537#'
1538#' This function implements the community detection method described in:
1539#' Raghavan, U.N. and Albert, R. and Kumara, S.: Near linear time algorithm to
1540#' detect community structures in large-scale networks. Phys Rev E 76, 036106.
1541#' (2007). This version extends the original method by the ability to take edge
1542#' weights into consideration and also by allowing some labels to be fixed.
1543#'
1544#' From the abstract of the paper: \dQuote{In our algorithm every node is
1545#' initialized with a unique label and at every step each node adopts the label
1546#' that most of its neighbors currently have. In this iterative process densely
1547#' connected groups of nodes form a consensus on a unique label to form
1548#' communities.}
1549#'
1550#' @aliases label.propagation.community
1551#' @param graph The input graph, should be undirected to make sense.
1552#' @param weights An optional weight vector. It should contain a positive
1553#' weight for all the edges. The \sQuote{weight} edge attribute is used if
1554#' present. Supply \sQuote{\code{NA}} here if you want to ignore the
1555#' \sQuote{weight} edge attribute. Larger edge weights correspond to
1556#' stronger connections.
1557#' @param initial The initial state. If \code{NULL}, every vertex will have a
1558#' different label at the beginning. Otherwise it must be a vector with an
1559#' entry for each vertex. Non-negative values denote different labels, negative
1560#' entries denote vertices without labels.
1561#' @param fixed Logical vector denoting which labels are fixed. Of course this
1562#' makes sense only if you provided an initial state, otherwise this element
1563#' will be ignored. Also note that vertices without labels cannot be fixed.
1564#' @return \code{cluster_label_prop} returns a
1565#' \code{\link{communities}} object, please see the \code{\link{communities}}
1566#' manual page for details.
1567#' @author Tamas Nepusz \email{ntamas@@gmail.com} for the C implementation,
1568#' Gabor Csardi \email{csardi.gabor@@gmail.com} for this manual page.
1569#' @seealso \code{\link{communities}} for extracting the actual results.
1570#'
1571#' \code{\link{cluster_fast_greedy}}, \code{\link{cluster_walktrap}},
1572#' \code{\link{cluster_spinglass}}, \code{\link{cluster_louvain}} and
1573#' \code{\link{cluster_leiden}} for other community detection methods.
1574#' @references Raghavan, U.N. and Albert, R. and Kumara, S.: Near linear time
1575#' algorithm to detect community structures in large-scale networks. \emph{Phys
1576#' Rev E} 76, 036106. (2007)
1577#' @export
1578#' @keywords graphs
1579#' @examples
1580#'
1581#'   g <- sample_gnp(10, 5/10) %du% sample_gnp(9, 5/9)
1582#'   g <- add_edges(g, c(1, 12))
1583#'   cluster_label_prop(g)
1584#'
1585cluster_label_prop <- function(graph, weights=NULL, initial=NULL,
1586                                        fixed=NULL) {
1587  # Argument checks
1588  if (!is_igraph(graph)) { stop("Not a graph object") }
1589  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
1590  weights <- E(graph)$weight
1591  }
1592  if (!is.null(weights) && any(!is.na(weights))) {
1593  weights <- as.numeric(weights)
1594  } else {
1595  weights <- NULL
1596  }
1597  if (!is.null(initial)) initial <- as.numeric(initial)
1598  if (!is.null(fixed)) fixed <- as.logical(fixed)
1599
1600  on.exit( .Call(C_R_igraph_finalizer) )
1601  # Function call
1602  res <- .Call(C_R_igraph_community_label_propagation, graph, weights, initial, fixed)
1603  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1604    res$names <- V(graph)$name
1605  }
1606  res$vcount <- vcount(graph)
1607  res$algorithm <- "label propagation"
1608  res$membership <- res$membership + 1
1609  class(res) <- "communities"
1610  res
1611}
1612
1613
1614
1615#' Finding community structure by multi-level optimization of modularity
1616#'
1617#' This function implements the multi-level modularity optimization algorithm
1618#' for finding community structure, see references below. It is based on the
1619#' modularity measure and a hierarchical approach.
1620#'
1621#' This function implements the multi-level modularity optimization algorithm
1622#' for finding community structure, see VD Blondel, J-L Guillaume, R Lambiotte
1623#' and E Lefebvre: Fast unfolding of community hierarchies in large networks,
1624#' \url{https://arxiv.org/abs/0803.0476} for the details.
1625#'
1626#' It is based on the modularity measure and a hierarchical approach.
1627#' Initially, each vertex is assigned to a community on its own. In every step,
1628#' vertices are re-assigned to communities in a local, greedy way: each vertex
1629#' is moved to the community with which it achieves the highest contribution to
1630#' modularity. When no vertices can be reassigned, each community is considered
1631#' a vertex on its own, and the process starts again with the merged
1632#' communities. The process stops when there is only a single vertex left or
1633#' when the modularity cannot be increased any more in a step.
1634#'
1635#' This function was contributed by Tom Gregorovic.
1636#'
1637#' @aliases multilevel.community
1638#' @param graph The input graph.
1639#' @param weights Optional positive weight vector.  If the graph has a
1640#' \code{weight} edge attribute, then this is used by default. Supply \code{NA}
1641#' here if the graph has a \code{weight} edge attribute, but you want to ignore
1642#' it. Larger edge weights correspond to stronger connections.
1643#' @return \code{cluster_louvain} returns a \code{\link{communities}}
1644#' object, please see the \code{\link{communities}} manual page for details.
1645#' @author Tom Gregorovic, Tamas Nepusz \email{ntamas@@gmail.com}
1646#' @seealso See \code{\link{communities}} for extracting the membership,
1647#' modularity scores, etc. from the results.
1648#'
1649#' Other community detection algorithms: \code{\link{cluster_walktrap}},
1650#' \code{\link{cluster_spinglass}},
1651#' \code{\link{cluster_leading_eigen}},
1652#' \code{\link{cluster_edge_betweenness}},
1653#' \code{\link{cluster_fast_greedy}},
1654#' \code{\link{cluster_label_prop}}
1655#' \code{\link{cluster_leiden}}
1656#' @references Vincent D. Blondel, Jean-Loup Guillaume, Renaud Lambiotte,
1657#' Etienne Lefebvre: Fast unfolding of communities in large networks. J. Stat.
1658#' Mech. (2008) P10008
1659#' @export
1660#' @keywords graphs
1661#' @examples
1662#'
1663#' # This is so simple that we will have only one level
1664#' g <- make_full_graph(5) %du% make_full_graph(5) %du% make_full_graph(5)
1665#' g <- add_edges(g, c(1,6, 1,11, 6, 11))
1666#' cluster_louvain(g)
1667#'
1668cluster_louvain <- function(graph, weights=NULL) {
1669  # Argument checks
1670  if (!is_igraph(graph)) { stop("Not a graph object") }
1671  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
1672  weights <- E(graph)$weight
1673  }
1674  if (!is.null(weights) && any(!is.na(weights))) {
1675  weights <- as.numeric(weights)
1676  } else {
1677  weights <- NULL
1678  }
1679
1680  on.exit( .Call(C_R_igraph_finalizer) )
1681  # Function call
1682  res <- .Call(C_R_igraph_community_multilevel, graph, weights)
1683  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1684    res$names <- V(graph)$name
1685  }
1686  res$vcount <- vcount(graph)
1687  res$algorithm <- "multi level"
1688  res$membership <- res$membership + 1
1689  res$memberships <- res$memberships + 1
1690  class(res) <- "communities"
1691  res
1692}
1693
1694
1695
1696#' Optimal community structure
1697#'
1698#' This function calculates the optimal community structure of a graph, by
1699#' maximizing the modularity measure over all possible partitions.
1700#'
1701#' This function calculates the optimal community structure for a graph, in
1702#' terms of maximal modularity score.
1703#'
1704#' The calculation is done by transforming the modularity maximization into an
1705#' integer programming problem, and then calling the GLPK library to solve
1706#' that. Please the reference below for details.
1707#'
1708#' Note that modularity optimization is an NP-complete problem, and all known
1709#' algorithms for it have exponential time complexity. This means that you
1710#' probably don't want to run this function on larger graphs. Graphs with up to
1711#' fifty vertices should be fine, graphs with a couple of hundred vertices
1712#' might be possible.
1713#'
1714#' @section Examples:
1715#' \preformatted{
1716#'
1717#' ## Zachary's karate club
1718#' g <- make_graph("Zachary")
1719#'
1720#' ## We put everything into a big 'try' block, in case
1721#' ## igraph was compiled without GLPK support
1722#'
1723#' ## The calculation only takes a couple of seconds
1724#' oc <- cluster_optimal(g)
1725#'
1726#' ## Double check the result
1727#' print(modularity(oc))
1728#' print(modularity(g, membership(oc)))
1729#'
1730#' ## Compare to the greedy optimizer
1731#' fc <- cluster_fast_greedy(g)
1732#' print(modularity(fc))
1733#' }
1734#'
1735#' @aliases optimal.community
1736#' @param graph The input graph. Edge directions are ignored for directed
1737#' graphs.
1738#' @param weights Optional positive weight vector for optimizing weighted
1739#' modularity. If the graph has a \code{weight} edge attribute, then this is
1740#' used by default. Supply \code{NA} to ignore the weights of a weighted graph.
1741#' Larger edge weights correspond to stronger connections.
1742#' @return \code{cluster_optimal} returns a \code{\link{communities}} object,
1743#' please see the \code{\link{communities}} manual page for details.
1744#' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
1745#' @seealso \code{\link{communities}} for the documentation of the result,
1746#' \code{\link{modularity}}. See also \code{\link{cluster_fast_greedy}} for a
1747#' fast greedy optimizer.
1748#' @references Ulrik Brandes, Daniel Delling, Marco Gaertler, Robert Gorke,
1749#' Martin Hoefer, Zoran Nikoloski, Dorothea Wagner: On Modularity Clustering,
1750#' \emph{IEEE Transactions on Knowledge and Data Engineering} 20(2):172-188,
1751#' 2008.
1752#' @export
1753#' @keywords graphs
1754
1755cluster_optimal <- function(graph, weights=NULL) {
1756  # Argument checks
1757  if (!is_igraph(graph)) { stop("Not a graph object") }
1758  if (is.null(weights) && "weight" %in% edge_attr_names(graph)) {
1759    weights <- E(graph)$weight
1760  }
1761  if (!is.null(weights) && any(!is.na(weights))) {
1762    weights <- as.numeric(weights)
1763  } else {
1764    weights <- NULL
1765  }
1766
1767  on.exit( .Call(C_R_igraph_finalizer) )
1768  # Function call
1769  res <- .Call(C_R_igraph_community_optimal_modularity, graph, weights)
1770  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1771    res$names <- V(graph)$name
1772  }
1773  res$vcount <- vcount(graph)
1774  res$algorithm <- "optimal"
1775  res$membership <- res$membership + 1
1776  class(res) <- "communities"
1777  res
1778}
1779
1780
1781
1782#' Infomap community finding
1783#'
1784#' Find community structure that minimizes the expected description length of a
1785#' random walker trajectory
1786#'
1787#' Please see the details of this method in the references given below.
1788#'
1789#' @aliases infomap.community
1790#' @param graph The input graph.
1791#' @param e.weights If not \code{NULL}, then a numeric vector of edge weights.
1792#' The length must match the number of edges in the graph.  By default the
1793#' \sQuote{\code{weight}} edge attribute is used as weights. If it is not
1794#' present, then all edges are considered to have the same weight.
1795#' Larger edge weights correspond to stronger connections.
1796#' @param v.weights If not \code{NULL}, then a numeric vector of vertex
1797#' weights. The length must match the number of vertices in the graph.  By
1798#' default the \sQuote{\code{weight}} vertex attribute is used as weights. If
1799#' it is not present, then all vertices are considered to have the same weight.
1800#' A larger vertex weight means a larger probability that the random surfer
1801#' jumps to that vertex.
1802#' @param nb.trials The number of attempts to partition the network (can be any
1803#' integer value equal or larger than 1).
1804#' @param modularity Logical scalar, whether to calculate the modularity score
1805#' of the detected community structure.
1806#' @return \code{cluster_infomap} returns a \code{\link{communities}} object,
1807#' please see the \code{\link{communities}} manual page for details.
1808#' @author Martin Rosvall wrote the original C++ code. This was ported to
1809#' be more igraph-like by Emmanuel Navarro.  The R interface and
1810#' some cosmetics was done by Gabor Csardi \email{csardi.gabor@@gmail.com}.
1811#' @seealso Other community finding methods and \code{\link{communities}}.
1812#' @references The original paper: M. Rosvall and C. T. Bergstrom, Maps of
1813#' information flow reveal community structure in complex networks, \emph{PNAS}
1814#' 105, 1118 (2008) \doi{10.1073/pnas.0706851105}, \url{https://arxiv.org/abs/0707.0609}
1815#'
1816#' A more detailed paper: M. Rosvall, D. Axelsson, and C. T. Bergstrom, The map
1817#' equation, \emph{Eur. Phys. J. Special Topics} 178, 13 (2009).
1818#' \doi{10.1140/epjst/e2010-01179-1}, \url{https://arxiv.org/abs/0906.1405}.
1819#' @export
1820#' @keywords graphs
1821#' @examples
1822#'
1823#' ## Zachary's karate club
1824#' g <- make_graph("Zachary")
1825#'
1826#' imc <- cluster_infomap(g)
1827#' membership(imc)
1828#' communities(imc)
1829#'
1830cluster_infomap <- function(graph, e.weights=NULL, v.weights=NULL,
1831                              nb.trials=10, modularity=TRUE) {
1832
1833  # Argument checks
1834  if (!is_igraph(graph)) { stop("Not a graph object") }
1835  if (is.null(e.weights) && "weight" %in% edge_attr_names(graph)) {
1836    e.weights <- E(graph)$weight
1837  }
1838  if (!is.null(e.weights) && any(!is.na(e.weights))) {
1839    e.weights <- as.numeric(e.weights)
1840  } else {
1841    e.weights <- NULL
1842  }
1843  if (is.null(v.weights) && "weight" %in% vertex_attr_names(graph)) {
1844    v.weights <- V(graph)$weight
1845  }
1846  if (!is.null(v.weights) && any(!is.na(v.weights))) {
1847    v.weights <- as.numeric(v.weights)
1848  } else {
1849    v.weights <- NULL
1850  }
1851  nb.trials <- as.integer(nb.trials)
1852
1853  on.exit( .Call(C_R_igraph_finalizer) )
1854  # Function call
1855  res <- .Call(C_R_igraph_community_infomap, graph, e.weights,
1856               v.weights, nb.trials)
1857
1858  if (igraph_opt("add.vertex.names") && is_named(graph)) {
1859    res$names <- V(graph)$name
1860  }
1861  res$vcount <- vcount(graph)
1862  res$algorithm <- "infomap"
1863  res$membership <- res$membership + 1
1864  if (modularity) {
1865    res$modularity <- modularity(graph, res$membership, weights=e.weights)
1866  }
1867  class(res) <- "communities"
1868  res
1869}
1870
1871#' @rdname communities
1872#' @method plot communities
1873#' @export
1874#' @importFrom graphics plot
1875
1876plot.communities <- function(x, y,
1877                             col=membership(x),
1878                             mark.groups=communities(x),
1879                             edge.color=c("black", "red")[crossing(x,y)+1],
1880                             ...) {
1881
1882  plot(y, vertex.color=col, mark.groups=mark.groups,
1883       edge.color=edge.color,
1884       ...)
1885}
1886
1887
1888
1889#' @rdname plot_dendrogram.communities
1890#' @aliases dendPlot
1891#' @export
1892
1893plot_dendrogram <- function(x, mode=igraph_opt("dend.plot.type"), ...)
1894  UseMethod("plot_dendrogram")
1895
1896
1897
1898#' Community structure dendrogram plots
1899#'
1900#' Plot a hierarchical community structure as a dendrogram.
1901#'
1902#' \code{plot_dendrogram} supports three different plotting functions, selected via
1903#' the \code{mode} argument. By default the plotting function is taken from the
1904#' \code{dend.plot.type} igraph option, and it has for possible values:
1905#' \itemize{ \item \code{auto} Choose automatically between the plotting
1906#' functions. As \code{plot.phylo} is the most sophisticated, that is choosen,
1907#' whenever the \code{ape} package is available. Otherwise \code{plot.hclust}
1908#' is used.  \item \code{phylo} Use \code{plot.phylo} from the \code{ape}
1909#' package.  \item \code{hclust} Use \code{plot.hclust} from the \code{stats}
1910#' package.  \item \code{dendrogram} Use \code{plot.dendrogram} from the
1911#' \code{stats} package.  }
1912#'
1913#' The different plotting functions take different sets of arguments. When
1914#' using \code{plot.phylo} (\code{mode="phylo"}), we have the following syntax:
1915#' \preformatted{
1916#'     plot_dendrogram(x, mode="phylo", colbar = palette(),
1917#'             edge.color = NULL, use.edge.length = FALSE, \dots)
1918#' } The extra arguments not documented above: \itemize{
1919#'   \item \code{colbar} Color bar for the edges.
1920#'   \item \code{edge.color} Edge colors. If \code{NULL}, then the
1921#'     \code{colbar} argument is used.
1922#'   \item \code{use.edge.length} Passed to \code{plot.phylo}.
1923#'   \item \code{dots} Attitional arguments to pass to \code{plot.phylo}.
1924#' }
1925#'
1926#' The syntax for \code{plot.hclust} (\code{mode="hclust"}): \preformatted{
1927#'     plot_dendrogram(x, mode="hclust", rect = 0, colbar = palette(),
1928#'             hang = 0.01, ann = FALSE, main = "", sub = "", xlab = "",
1929#'             ylab = "", \dots)
1930#' } The extra arguments not documented above: \itemize{
1931#'   \item \code{rect} A numeric scalar, the number of groups to mark on
1932#'     the dendrogram. The dendrogram is cut into exactly \code{rect}
1933#'     groups and they are marked via the \code{rect.hclust} command. Set
1934#'     this to zero if you don't want to mark any groups.
1935#'   \item \code{colbar} The colors of the rectangles that mark the
1936#'     vertex groups via the \code{rect} argument.
1937#'   \item \code{hang} Where to put the leaf nodes, this corresponds to the
1938#'     \code{hang} argument of \code{plot.hclust}.
1939#'   \item \code{ann}  Whether to annotate the plot, the \code{ann}
1940#'     argument of \code{plot.hclust}.
1941#'   \item \code{main} The main title of the plot, the \code{main} argument
1942#'     of \code{plot.hclust}.
1943#'   \item \code{sub} The sub-title of the plot, the \code{sub} argument of
1944#'     \code{plot.hclust}.
1945#'   \item \code{xlab} The label on the horizontal axis, passed to
1946#'     \code{plot.hclust}.
1947#'   \item \code{ylab} The label on the vertical axis, passed to
1948#'     \code{plot.hclust}.
1949#'   \item \code{dots} Attitional arguments to pass to \code{plot.hclust}.
1950#' }
1951#'
1952#' The syntax for \code{plot.dendrogram} (\code{mode="dendrogram"}):
1953#' \preformatted{
1954#'     plot_dendrogram(x, \dots)
1955#' } The extra arguments are simply passed to \code{as.dendrogram}.
1956#'
1957#' @param x An object containing the community structure of a graph. See
1958#' \code{\link{communities}} for details.
1959#' @param mode Which dendrogram plotting function to use. See details below.
1960#' @param \dots Additional arguments to supply to the dendrogram plotting
1961#' function.
1962#' @param use.modularity Logical scalar, whether to use the modularity values
1963#' to define the height of the branches.
1964#' @param palette The color palette to use for colored plots.
1965#' @return Returns whatever the return value was from the plotting function,
1966#' \code{plot.phylo}, \code{plot.dendrogram} or \code{plot.hclust}.
1967#' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
1968#' @method plot_dendrogram communities
1969#' @export
1970#' @keywords graphs
1971#' @examples
1972#'
1973#' karate <- make_graph("Zachary")
1974#' fc <- cluster_fast_greedy(karate)
1975#' plot_dendrogram(fc)
1976#'
1977plot_dendrogram.communities <- function(x,
1978                                 mode=igraph_opt("dend.plot.type"), ...,
1979                                 use.modularity=FALSE,
1980                                 palette=categorical_pal(8)) {
1981  mode <- igraph.match.arg(mode, c("auto", "phylo", "hclust", "dendrogram"))
1982
1983  old_palette <- palette(palette)
1984  on.exit(palette(old_palette), add = TRUE)
1985
1986  if (mode=="auto") {
1987    have_ape <- requireNamespace("ape", quietly = TRUE)
1988    mode <- if (have_ape) "phylo" else "hclust"
1989  }
1990
1991  if (mode=="hclust") {
1992    dendPlotHclust(x, use.modularity=use.modularity, ...)
1993  } else if (mode=="dendrogram") {
1994    dendPlotDendrogram(x, use.modularity=use.modularity, ...)
1995  } else if (mode=="phylo") {
1996    dendPlotPhylo(x, use.modularity=use.modularity, ...)
1997  }
1998}
1999
2000#' @importFrom grDevices palette
2001#' @importFrom graphics plot
2002#' @importFrom stats rect.hclust
2003
2004dendPlotHclust <- function(communities, rect=length(communities),
2005                           colbar=palette(), hang=-1, ann=FALSE,
2006                           main="", sub="", xlab="", ylab="", ...,
2007                           use.modularity=FALSE) {
2008  hc <- as.hclust(communities, hang=hang, use.modularity=use.modularity)
2009  ret <- plot(hc, hang=hang, ann=ann, main=main, sub=sub, xlab=xlab,
2010              ylab=ylab, ...)
2011  if (rect > 0) {
2012    rect.hclust(hc, k=rect, border=colbar)
2013  }
2014  invisible(ret)
2015}
2016
2017#' @importFrom graphics plot
2018
2019dendPlotDendrogram <- function(communities, hang=-1, ...,
2020                               use.modularity=FALSE) {
2021  plot(as.dendrogram(communities, hang=hang, use.modularity=use.modularity),
2022       ...)
2023}
2024
2025#' @importFrom grDevices palette
2026#' @importFrom graphics plot
2027
2028dendPlotPhylo <- function(communities, colbar=palette(),
2029                          col=colbar[membership(communities)],
2030                          mark.groups=communities(communities),
2031                          use.modularity=FALSE,
2032                          edge.color="#AAAAAAFF",
2033                          edge.lty=c(1,2), ...) {
2034
2035  phy <- as_phylo(communities, use.modularity=use.modularity)
2036
2037  getedges <- function(tip) {
2038    repeat {
2039      ee <- which(! phy$edge[,1] %in% tip & phy$edge[,2] %in% tip)
2040      if (length(ee)<=1) { break }
2041      tip <- c(tip, unique(phy$edge[ee,1]))
2042    }
2043    ed <- which(phy$edge[,1] %in% tip & phy$edge[,2] %in% tip)
2044    eds <- phy$edge[ed, 1]
2045    good <- which(phy$edge[ed,1] %in% which(tabulate(eds) != 1))
2046    ed[good]
2047  }
2048  gredges <- lapply(mark.groups, getedges)
2049
2050  if (length(mark.groups) > 0) {
2051    ecol <- rep(edge.color, nrow(phy$edge))
2052    for (gr in seq_along(gredges)) {
2053      ecol[gredges[[gr]]] <- colbar[gr]
2054    }
2055  } else {
2056    ecol <- edge.color
2057  }
2058
2059  elty <- rep(edge.lty[2], nrow(phy$edge))
2060  elty[ unlist(gredges) ] <- edge.lty[1]
2061
2062  plot(phy, edge.color=ecol, edge.lty=elty, tip.color=col, ...)
2063}
2064
2065#' Compares community structures using various metrics
2066#'
2067#' This function assesses the distance between two community structures.
2068#'
2069#'
2070#' @aliases compare.communities compare.membership compare
2071#' @param comm1 A \code{\link{communities}} object containing a community
2072#' structure; or a numeric vector, the membership vector of the first community
2073#' structure. The membership vector should contain the community id of each
2074#' vertex, the numbering of the communities starts with one.
2075#' @param comm2 A \code{\link{communities}} object containing a community
2076#' structure; or a numeric vector, the membership vector of the second
2077#' community structure, in the same format as for the previous argument.
2078#' @param method Character scalar, the comparison method to use. Possible
2079#' values: \sQuote{vi} is the variation of information (VI) metric of Meila
2080#' (2003), \sQuote{nmi} is the normalized mutual information measure proposed
2081#' by Danon et al. (2005), \sQuote{split.join} is the split-join distance of
2082#' can Dongen (2000), \sQuote{rand} is the Rand index of Rand (1971),
2083#' \sQuote{adjusted.rand} is the adjusted Rand index by Hubert and Arabie
2084#' (1985).
2085#' @return A real number.
2086#' @author Tamas Nepusz \email{ntamas@@gmail.com}
2087#' @seealso See \code{\link{cluster_walktrap}},
2088#' \code{\link{cluster_spinglass}},
2089#' \code{\link{cluster_leading_eigen}},
2090#' \code{\link{cluster_edge_betweenness}},
2091#' \code{\link{cluster_fast_greedy}},
2092#' \code{\link{cluster_label_prop}}
2093#' \code{\link{cluster_louvain}}
2094#' \code{\link{cluster_leiden}}
2095#' for various community detection methods.
2096#' @references Meila M: Comparing clusterings by the variation of information.
2097#' In: Scholkopf B, Warmuth MK (eds.). \emph{Learning Theory and Kernel
2098#' Machines: 16th Annual Conference on Computational Learning Theory and 7th
2099#' Kernel Workshop}, COLT/Kernel 2003, Washington, DC, USA. Lecture Notes in
2100#' Computer Science, vol. 2777, Springer, 2003. ISBN: 978-3-540-40720-1.
2101#'
2102#' Danon L, Diaz-Guilera A, Duch J, Arenas A: Comparing community structure
2103#' identification. \emph{J Stat Mech} P09008, 2005.
2104#'
2105#' van Dongen S: Performance criteria for graph clustering and Markov cluster
2106#' experiments. Technical Report INS-R0012, National Research Institute for
2107#' Mathematics and Computer Science in the Netherlands, Amsterdam, May 2000.
2108#'
2109#' Rand WM: Objective criteria for the evaluation of clustering methods.
2110#' \emph{J Am Stat Assoc} 66(336):846-850, 1971.
2111#'
2112#' Hubert L and Arabie P: Comparing partitions. \emph{Journal of
2113#' Classification} 2:193-218, 1985.
2114#' @export
2115#' @keywords graphs
2116#' @examples
2117#'
2118#' g <- make_graph("Zachary")
2119#' sg <- cluster_spinglass(g)
2120#' le <- cluster_leading_eigen(g)
2121#' compare(sg, le, method="rand")
2122#' compare(membership(sg), membership(le))
2123#'
2124compare <- function(comm1, comm2, method=c("vi", "nmi",
2125                                    "split.join", "rand",
2126                                    "adjusted.rand"))
2127  UseMethod("compare")
2128
2129#' @method compare communities
2130#' @export
2131
2132compare.communities <- function(comm1, comm2,
2133                                method=c("vi", "nmi", "split.join", "rand",
2134                                  "adjusted.rand")) {
2135
2136  i_compare(comm1, comm2, method)
2137}
2138
2139#' @method compare membership
2140#' @export
2141
2142compare.membership <- function(comm1, comm2,
2143                                method=c("vi", "nmi", "split.join", "rand",
2144                                  "adjusted.rand")) {
2145
2146  i_compare(comm1, comm2, method)
2147}
2148
2149#' @method compare default
2150#' @export
2151
2152compare.default <- compare.membership
2153
2154i_compare <- function (comm1, comm2, method=c("vi", "nmi", "split.join",
2155                                       "rand", "adjusted.rand")) {
2156
2157  comm1 <- if (inherits(comm1, "communities")) {
2158    as.numeric(membership(comm1))
2159  } else {
2160    as.numeric(as.factor(comm1))
2161  }
2162  comm2 <- if (inherits(comm2, "communities")) {
2163    as.numeric(membership(comm2))
2164  } else {
2165    as.numeric(as.factor(comm2))
2166  }
2167  method <- switch(igraph.match.arg(method), vi = 0, nmi = 1,
2168                   split.join = 2, rand = 3, adjusted.rand = 4)
2169  on.exit(.Call(C_R_igraph_finalizer) )
2170  res <- .Call(C_R_igraph_compare_communities, comm1, comm2, method)
2171  res
2172}
2173
2174#' Split-join distance of two community structures
2175#'
2176#' The split-join distance between partitions A and B is the sum of the
2177#' projection distance of A from B and the projection distance of B from
2178#' A. The projection distance is an asymmetric measure and it is defined as
2179#' follows:
2180#'
2181#' First, each set in partition A is evaluated against all sets in
2182#' partition B. For each set in partition A, the best matching set in
2183#' partition B is found and the overlap size is calculated. (Matching is
2184#' quantified by the size of the overlap between the two sets). Then, the
2185#' maximal overlap sizes for each set in A are summed together and
2186#' subtracted from the number of elements in A.
2187#'
2188#' The split-join distance will be returned as two numbers, the first is
2189#' the projection distance of the first partition from the
2190#' second, while the second number is the projection distance of the second
2191#' partition from the first. This makes it easier to detect whether a
2192#' partition is a subpartition of the other, since in this case, the
2193#' corresponding distance will be zero.
2194#'
2195#' @param comm1 The first community structure.
2196#' @param comm2 The second community structure.
2197#' @return Two integer numbers, see details below.
2198#'
2199#' @references
2200#' van Dongen S: Performance criteria for graph clustering and Markov
2201#' cluster experiments. Technical Report INS-R0012, National Research
2202#' Institute for Mathematics and Computer Science in the Netherlands,
2203#' Amsterdam, May 2000.
2204#'
2205#' @export
2206
2207split_join_distance <- function(comm1, comm2) {
2208  comm1 <- if (inherits(comm1, "communities")) {
2209    as.numeric(membership(comm1))
2210  } else {
2211    as.numeric(comm1)
2212  }
2213  comm2 <- if (inherits(comm2, "communities")) {
2214    as.numeric(membership(comm2))
2215  } else {
2216    as.numeric(comm2)
2217  }
2218  on.exit(.Call(C_R_igraph_finalizer) )
2219  res <- .Call(C_R_igraph_split_join_distance, comm1, comm2)
2220  unlist(res)
2221}
2222
2223#' Groups of a vertex partitioning
2224#'
2225#' Create a list of vertex groups from some graph clustering or community
2226#' structure.
2227#'
2228#' Currently two methods are defined for this function. The default method
2229#' works on the output of \code{\link{components}}. (In fact it works on any
2230#' object that is a list with an entry called \code{membership}.)
2231#'
2232#' The second method works on \code{\link{communities}} objects.
2233#'
2234#' @aliases groups groups.default groups.communities
2235#' @param x Some object that represents a grouping of the vertices. See details
2236#' below.
2237#' @return A named list of numeric or character vectors. The names are just
2238#' numbers that refer to the groups. The vectors themselves are numeric or
2239#' symbolic vertex ids.
2240#' @seealso \code{\link{components}} and the various community finding
2241#' functions.
2242#' @examples
2243#' g <- make_graph("Zachary")
2244#' fgc <- cluster_fast_greedy(g)
2245#' groups(fgc)
2246#'
2247#' g2 <- make_ring(10) + make_full_graph(5)
2248#' groups(components(g2))
2249#' @export
2250
2251groups <- function(x)
2252  UseMethod("groups")
2253
2254#' @method groups default
2255#' @export
2256
2257groups.default <- function(x) {
2258  vids <- names(x$membership)
2259  if (is.null(vids)) vids <- seq_along(x$membership)
2260  tapply(vids, x$membership, simplify=FALSE, function(x) x)
2261}
2262
2263#' @method groups communities
2264#' @export
2265
2266groups.communities <- function(x) {
2267  m <- membership(x)
2268  groups.default(list(membership = m))
2269}
2270
2271#' @export
2272
2273communities <- groups.communities
2274
2275#' @method "[" communities
2276#' @export
2277
2278`[.communities` <- function(x, i) {
2279  groups(x)[i]
2280}
2281
2282#' @method "[[" communities
2283#' @export
2284
2285`[[.communities` <- function(x, i) {
2286  groups(x)[[i]]
2287}
2288
2289
2290#' Contract several vertices into a single one
2291#'
2292#' This function creates a new graph, by merging several vertices into one. The
2293#' vertices in the new graph correspond to sets of vertices in the input graph.
2294#'
2295#' The attributes of the graph are kept. Graph and edge attributes are
2296#' unchanged, vertex attributes are combined, according to the
2297#' \code{vertex.attr.comb} parameter.
2298#'
2299#' @aliases contract.vertices contract
2300#' @param graph The input graph, it can be directed or undirected.
2301#' @param mapping A numeric vector that specifies the mapping. Its elements
2302#' correspond to the vertices, and for each element the id in the new graph is
2303#' given.
2304#' @param vertex.attr.comb Specifies how to combine the vertex attributes in
2305#' the new graph. Please see \code{\link{attribute.combination}} for details.
2306#' @return A new graph object.
2307#' @author Gabor Csardi \email{csardi.gabor@@gmail.com}
2308#' @keywords graphs
2309#' @examples
2310#'
2311#' g <- make_ring(10)
2312#' g$name <- "Ring"
2313#' V(g)$name <- letters[1:vcount(g)]
2314#' E(g)$weight <- runif(ecount(g))
2315#'
2316#' g2 <- contract(g, rep(1:5, each=2),
2317#'                         vertex.attr.comb=toString)
2318#'
2319#' ## graph and edge attributes are kept, vertex attributes are
2320#' ## combined using the 'toString' function.
2321#' print(g2, g=TRUE, v=TRUE, e=TRUE)
2322#'
2323#' @export
2324
2325contract <- contract
2326