Hierarchical Random Graphs

This page is a companion for the article on hierarchical random graphs (HRGs), written by Aaron Clauset (me), Cris Moore and Mark E.J. Newman. This page hosts our implementation of the basic HRG fitting procedures described in the paper. The code is implemented in ANSI C++ and requires no non-standard libraries. The basic code for (1) running the MCMC procedure, (2) using HRGs to predict missing connections, and (3) constructing the consensus dendrogram, is provided as-is below. (Note: thanks to Gabor Csardi, the igraph library now includes both a C implementation and an R implementation of many of these functions. Nick Dronen also provides a Python implementation.)

Each program is relatively straight-forward to use, and omits some of the more complex experimental things mentioned in the paper. On the other hand, it should be relatively clear to an intermediate-level programmer how to adapt this code for more complex tasks. The code runs as a command-line application, provides no graphical interface, and reads/writes plain text files. The code has been tested under Mac OS X 10.{4,5}.x and Ubuntu Linux 7.04.

Journal References

  1. Hierarchical structure and the prediction of missing links in networks.
    A. Clauset, C. Moore, and M.E.J. Newman.
    Nature 453, 98 - 101 (2008). (preprint version)

  2. Structural Inference of Hierarchies in Networks.
    A. Clauset, C. Moore, and M.E.J. Newman.
    In E. M. Airoldi et al. (Eds.): ICML 2006 Ws, Lecture Notes in Computer Science 4503,
    1 - 13. Springer-Verlag, Berlin Heidelberg (2007).

HRG file format
In order to support the i/o of dendrogram models, the code uses a special text format to describe a hierarchical random graph model. The format is basically an explicit representation of the dendrogram (binary tree) with probability annotations at the internal nodes. An example of such a model, for the Zachary's karate club network is here. The corresponding edge list file is here. Both of these files are included in the packages below.

Fitting HRGs to data using MCMC
This code runs our Markov chain Monte Carlo (MCMC) algorithm for sampling the space of dendrogram models. The input format is a standard one (tab-delimited edge list); this code will take the input graph and run the MCMC forever. Each time it samples a dendrogram model with a new highest likelihood, it writes the corresponding model out as a text file (.hrg file) along with some additional information about the model (.info file). The .hrg file can be passed back to fitHRG to restart the sampling later, or to generate a graph instance drawn from the ensemble defined by the given model (-make flag). A makefile is included with the code, and can be used for standard compilation. Run the program with no arguments to get usage information. If the input text files are not formatted correctly, the program will crash.
fitHRG code (C++)

Predicting missing edges using HRGs
This code runs the MCMC algorithm on a particular input graph and outputs the list of possible missing connections, in descending order of their likelihood under the sampled models. The MCMC has a built-in convergence criteria to decide when to start sampling (easy to modify), and the MCMC can be seeded with an HRG model (derived from fitHRG, perhaps). The basic code is almost identical to fitHRG, but with a few additional methods for making predictions.
predictHRG code (C++)

Consensus dendrograms
This code runs the MCMC algorithm from a seed HRG model (derived from fitHRG, perhaps) and outputs a consensus dendrogram file (-consensus.tree) which contains only the dendrogram features that appear in the majority of the sampled models. The basic code is almost identical to fitHRG, but with a few additional methods for constructing the consensus dendrogram.
consensusHRG code (C++)

Visualizing HRG models
This matlab function draws radial dendrogram figures like the one shown here for the karate club network (generated by fitHRG). I cobbled this together from examining other dendrogram-plotting routines, and it is sufficient for visualizing .hrg files produced by the fitHRG code. This code can also annotate the leaf nodes with group information. Place this file somewhere in your Matlab path and type 'help hrgplot' for more information.
hrgplot.m (Matlab)

Visualizing consensus dendrograms
This matlab function draws radial dendrogram figures generated by the consensusHRG code. Just like the hrgplot function above, this function can also annotate the leaf nodes with group information. Place this file somewhere in your Matlab path and type 'help consensusplot' for more information.
consensusplot.m (Matlab)

A note about bugs and your use of this code
The code located here is provided as-is, with no warranty, etc. (It's under GPL v2.) But, if you do experience problems using the code, please let me know via email. I'm not actively maintaining this code anymore (it's research code, so you're largely on your own both to understand what it does and adapt it to your needs). That being said, there are copious comments in the code files in case you want to modify them, and if you ask a short question very nicely over email, I may be able to help you out.

Finally, if you use this code in an academic publication, it would be courteous of you to thank me in your acknowledgements for providing the code.

A note about network data
The three network data sets we used in the Nature paper were drawn from the literature. For the two we provide here, please use the original citations (given below) if you use them in a paper.

  1. Metabolic network for the spirochaete Triponema pallidum: contact authors for data.

    M. Huss and P. Holme, "Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks." IET Systems Biology 1, 280 (2007).

  2. Network of associations between terrorists (network, vertex labels, node names).

    V. Krebs, "Mapping networks of terrorist cells." Connections 24, 43-52 (2002).

  3. Food web of grassland species (network and vertex labels).

    H. A. Dawah, B. A. Hawkins and M. F. Claridge, "Structure of the parasitoid communities of grass-freeding chalcid wasps." Journal of Animal Ecology 64, 708-720 (1995).

Updates
27 May 2012: added a workaround for predictHRG for when the program seg faults while sorting candidate edges by their likelihoods; workaround writes out the unsorted list before sorting; if sorting is successful, this file is overwritten with the sorted list.
19 August 2011: fixed a small issue with consensusplot.m that would cause it break when vertex labels were not contiguous integers starting at 1.
13 August 2009: added links to and information about the three network data sets used in the Nature paper.
30 January 2009: fixed internal indexing issue in consensusplot.m when input file contained a leaf node with name 0.
16 January 2009: fixed issue with output file from fitHRG when -make is invoked.
5 December 2008: moved all the files from /randomgraphs to /hierarchy.
22 September 2008: fixed a small problem in consensusplot.m that caused it to throw an error when reading in .group files.
18 August 2008: fixed a small problem with the recordPredictions() function in predictHRG that prevented it from writing out the last prediction.
10 June 2008: fixed an error in sizing an array which would sometimes cause a segmentation fault in predictHRG just before writing the predictions to file.
21 May 2008: minor tweaks to fitHRG and predictHRG code (mainly cleaning up the code a little more, making header information standard, etc.). Fixed a memory deallocation issue in predictHRG that effects some users.
21 May 2008: Placed initial version of consensus HRG and consensusplot.m code online.
29 April 2008: placed initial version of hrgplot.m code online.
1 January 2008: placed initial versions (1.0) of fitHRG and predictHRG online.