We have created a freely available dataset with the publication histories of nearly all Nobel prize winners from the past century, by combining unstructured data collected from CVs, university websites, and Wikipedia, together with the publication and citation database from Microsoft Academic Graph (MAG). The dataset allows to investigate the career dynamics of scientific elites. It was introduced and described in this paper. You can download it here.
Fast consensus clustering
The first consensus clustering algorithm we introduced has a worst-case time complexity which is quadratic in the number of nodes of the network. This is because it requires the calculation of all entries of the n×n consensus matrix. Such high complexity prevents the applicability of the algorithm to large networks. In Fast consensus clustering in complex networks we presented a new algorithm in which only some elements of the consensus matrix are computed, i.e. those corresponding to the pairs of neighbors of the network, plus a comparable number of node pairs, suitably chosen. Such method has the same complexity as the clustering algorithm adopted for the analysis, so it can get down to linear if a fast clustering method is used, which allows to analyze networks with millions of nodes and links. Performance is the same or close to that of the original consensus technique. The code for our fast consensus clustering method is here.
Thresholding weighted networks
Many real networks of interest have weighted links, indicating the strength or importance of the relationships (interactions) between nodes. However, in many cases networks are too dense for any meaningful analysis, so it is necessary to remove many links and leave only the most important ones. Such process, called graph sparsification, can be implemented in many different ways. The most popular way is linear thresholding, which consists in removing all links with weight below a given threshold. In Weight thresholding on complex networks we have checked how various properties of the network are affected by linear thresholding. While a number of features can be heavily affected, community structure is very robust and holds even when a major fraction of links are removed. In the paper we introduced the Minimum Absolute spectral Similarity (MASS), a measure that allows to quantify the difference in the spectral properties between the original network and its sparsified versions. The code to compute the MASS can be found here.
Multiresolution consensus clustering
Networks often exhibit structure at different scales and multiresolution community detection methods have been developed to capture the different relevant community scales. We have developed a consensus clustering technique that combines partitions at different levels of resolution. Since a single (consensus) may not adequately summarize a sample of partitions, e.g. because of multiple meaningful community scales, we have developed a recursive strategy for generating a hierarchical cluster tree to extract more information. Communities are iteratively split into smaller clusters as long as the split is above a given significance level, indicating the likelihood that pairs of nodes are not coclustered by chance. Our multiresolution consensus clustering technique is available on github.
Detection of timescales in the evolution of complex systems
In Detection of timescales in evolving complex systems we proposed a method that aims at detecting evolutionary changes in the configuration of a complex system, and generates intervals accordingly. The evolutionary timescales can be identified by looking for peaks in the similarity between the sets of events on consecutive time intervals of data. The code can be found on github.
Dynamic benchmarks for community detection
We have introduced a set of dynamic benchmark graphs with communities (see Benchmark model to assess community structure in evolving networks). They are dynamic versions of the planted partition model, which is a special class of stochastic block models. The community structure of the networks is defined at each stage of the evolution and can be used to test dynamic clustering algorithms. The code to build our dynamic benchmark graphs is downloadable from github.
In Consensus clustering in complex networks we have introduced a self-consistent technique to derive consensus partitions out of a set of partitions delivered by stochastic clustering algorithms. Most methods typically return different partitions if different random number sequences are adopted, due to noise. Consensus clustering returns a sort of average of the direct partitions provided by an algorithm, which is often better descriptive of a network's community structure. In this package you find a selected collection of clustering algorithms for networks, along with our consensus clustering method. The program can also find a consensus among partitions computed from different networks.
The package contains the Modularity Optimization Method (via simulated annealing), Infomap (also hierarchical), OSLOM, the Louvain Method, and the Label Propagation Method. They all support the same format file (edge list). Moreover, the package provides a relatively simple algorithm for visualization (cvis), ready to be opened by gephi. The ReadMe will tell you more.
Order Statistics Local Optimization Method (OSLOM)
In the paper Finding statistically significant communities in networks we proposed a novel community detection technique, the Order Statistics Local Optimization Method (OSLOM). The method is based on the local optimization of a score that expresses the statistical significance of a cluster with respect to a random null model and relies on the mathematical tools of order statistics. OSLOM assigns a p-value to each cluster, indicating its significance. The algorithm can be applied on directed and weighted networks. It can handle community overlaps and hierarchy as well. The code of the algorithm can be downloaded here. Instructions on how to use the code are included in the package.
Information filtering in complex weighted networks
In the paper Information filtering in complex weighted networks, we proposed a technique (GloSS) for the computation of the statistical significance of edges in weighted networks. The code of the implementation of the method for the case of weights with real values can be downloaded here. The version of the program for networks with discrete weights can be downloaded here. Instructions on how to compile the code and use the program are included in the packages. We have also produced some videos about the application of the filtering technique to the US airport network (avi, mov) and UK commuting network (avi, mov).
Phys Author Rank Algorithm
Phys Author Rank Algorithm is a website where physicists can check the evolution of their own scientific rank. Scientific rank is calculated using the Science Author Rank Algorithm (SARA) on a weighted author citation network built from papers published on journals of the American Physical Society. SARA is as extension of Google's PageRank algorithm to rank Web pages. It was introduced in the article Diffusion of scientific credits and the ranking of scientists.
LFR benchmark graphs
Methods to detect communities in graphs need to be thoroughly tested. To do that, one needs benchmark graphs with a built-in community structure, that the methods should identify. Standard benchmarks, like that by Girvan and Newman, do not account for important features of real networks, like the fat-tailed distributions of node degree and community size. Therefore, we have designed new classes of benchmark graphs, in which the distributions of node degree and community size are both power laws, with tunable exponents. In the paper Benchmark graphs for testing community detection algorithms, we have proposed a benchmark for the simplest case of undirected and unweighted graphs. In a Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities, we have extended our benchmark to account for the existence of overlapping communities, and to directed and weighted networks. We have also proposed a hierarchical version of the benchmark, in which clusters are embedded in larger clusters.
- Package 1 includes the code to generate undirected and unweighted graphs with overlapping communities. The extent of the overlap can be tuned by input, and it can be set to zero if one is interested in non-overlapping clusters.
- Package 2 includes the code to generate undirected weighted graphs with possibly overlapping communities.
- Package 3 includes the code to generate directed unweighted graphs with possibly overlapping communities.
- Package 4 includes the code to generate directed weighted graphs with possibly overlapping communities.
- Package 5 includes the code to generate the hierarchical benchmarks, with communities inside other communities.
The code is written in C++. In each package there is a Readme file where you can find instructions on how to use the code and simple examples.
Generalized normalized mutual information
In our paper Detecting the overlapping and hierarchical community structure in complex networks (New J. Phys. 11, 033015, 2009) we have introduced a measure of similarity between partitions that can be applied also to compare covers, i.e. divisions of a network into overlapping communities. The measure is based on the normalized mutual information used in information theory and regularly adopted by scholars to compare partitions of a network in communities. The standard normalized mutual information cannot be trivially extended to the case of overlapping communities. The code to compute our new measure for a pair of partitions/covers can be found here. The new measure does not coincide with the standard normalized mutual information when communities do not overlap, but it is quite close. For more information on the measure check this link.