Hierarchical Random Graphs
This page is a companion for the article on hierarchical random
graphs (HRGs), written by
Aaron Clauset (me),
Cris Moore and
Mark E.J. Newman. This page
hosts our implementation of the basic HRG fitting procedures described in the
paper. The code is implemented in ANSI C++ and requires no non-standard libraries.
The basic code for (1) running the MCMC procedure, (2) using HRGs to predict
missing connections, and (3) constructing the consensus dendrogram, is provided
as-is below. (Note: thanks to
Gabor Csardi, the
igraph library now includes both
a C implementation and an R implementation of many of these functions.
Nick Dronen also provides a Python implementation.)
Each program is relatively straight-forward to use, and omits some of the more complex experimental things mentioned in the paper. On the other hand, it should be relatively clear to an intermediate-level programmer how to adapt this code for more complex tasks. The code runs as a command-line application, provides no graphical interface, and reads/writes plain text files. The code has been tested under Mac OS X 10.{4,5}.x and Ubuntu Linux 7.04.
Journal References
-
Hierarchical structure and the prediction of missing links in networks.
A. Clauset, C. Moore, and M.E.J. Newman.
Nature 453, 98 - 101 (2008). (preprint version) -
Structural Inference of Hierarchies in Networks.
A. Clauset, C. Moore, and M.E.J. Newman.
In E. M. Airoldi et al. (Eds.): ICML 2006 Ws, Lecture Notes in Computer Science 4503,
1 - 13. Springer-Verlag, Berlin Heidelberg (2007).
HRG file format
In order to support the i/o of dendrogram models, the code uses a special text format
to describe a hierarchical random graph model. The format is basically an explicit
representation of the dendrogram (binary tree) with probability annotations at the
internal nodes. An example of such a model, for the Zachary's karate club network
is here. The corresponding edge list file
is here. Both of these files are included in the
packages below.
Fitting HRGs to data using MCMC
This code runs our Markov chain Monte Carlo (MCMC) algorithm for sampling the space
of dendrogram models. The input format is a standard one (tab-delimited edge list);
this code will take the input graph and run the MCMC forever. Each time it samples a
dendrogram model with a new highest likelihood, it writes the corresponding model out
as a text file (.hrg file) along with some additional information about the model
(.info file). The .hrg file can be passed back to fitHRG to restart the sampling later,
or to generate a graph instance drawn from the ensemble defined by the given model
(-make flag). A makefile is included with the code, and can be used for standard
compilation. Run the program with no arguments to get usage information. If the
input text files are not formatted correctly, the program will crash.
fitHRG code (C++)
Predicting missing edges using HRGs
This code runs the MCMC algorithm on a particular input graph and outputs the list
of possible missing connections, in descending order of their likelihood under the
sampled models. The MCMC has a built-in convergence criteria to decide when to start
sampling (easy to modify), and the MCMC can be seeded with an HRG model (derived
from fitHRG, perhaps). The basic code is almost identical to fitHRG, but
with a few additional methods for making predictions.
predictHRG code (C++)
Consensus dendrograms
This code runs the MCMC algorithm from a seed HRG model (derived from fitHRG, perhaps)
and outputs a consensus dendrogram file (-consensus.tree) which contains only the
dendrogram features that appear in the majority of the sampled models. The basic code
is almost identical to fitHRG, but with a few additional methods for constructing the
consensus dendrogram.
consensusHRG code (C++)

Visualizing HRG models
This matlab function draws radial dendrogram figures like the one shown here for the
karate club network (generated by fitHRG). I cobbled this together from examining other
dendrogram-plotting routines, and it is sufficient for visualizing .hrg files produced
by the fitHRG code. This code can also annotate the leaf nodes with group information.
Place this file somewhere in your Matlab path and type 'help hrgplot' for more
information.
hrgplot.m (Matlab)
Visualizing consensus dendrograms
This matlab function draws radial dendrogram figures generated by the consensusHRG
code. Just like the hrgplot function above, this function can also annotate the leaf
nodes with group information. Place this file somewhere in your Matlab path and type
'help consensusplot' for more information.
consensusplot.m (Matlab)
A note about bugs and your use of this code
The code located here is provided as-is, with no warranty, etc. (It's under GPL v2.)
But, if you do experience problems using the code, please let me know via email. I'm
not actively maintaining this code anymore (it's research code, so you're largely on
your own both to understand what it does and adapt it to your needs). That being
said, there are copious comments in the code files in case you want to modify them,
and if you ask a short question very nicely over email, I may be able to help you out.
Finally, if you use this code in an academic publication, it would be courteous of you to thank me in your acknowledgements for providing the code.
A note about network data
The three network data sets we used in the Nature paper were drawn from the
literature. For the two we provide here, please use the original citations (given below)
if you use them in a paper.
- Metabolic network for the spirochaete Triponema pallidum: contact authors for data.
M. Huss and P. Holme, "Currency and commodity metabolites: their identification and relation to the modularity of metabolic networks." IET Systems Biology 1, 280 (2007).
- Network of associations between terrorists (network, vertex labels, node names).
V. Krebs, "Mapping networks of terrorist cells." Connections 24, 43-52 (2002).
- Food web of grassland species (network and vertex labels).
H. A. Dawah, B. A. Hawkins and M. F. Claridge, "Structure of the parasitoid communities of grass-freeding chalcid wasps." Journal of Animal Ecology 64, 708-720 (1995).
Updates
27 May 2012: added a workaround for predictHRG for when the program seg faults while sorting candidate edges by their likelihoods; workaround writes out the unsorted list before sorting; if sorting is successful, this file is overwritten with the sorted list.
19 August 2011: fixed a small issue with consensusplot.m that would cause it break when vertex labels were not contiguous integers starting at 1.
13 August 2009: added links to and information about the three network data sets used in the Nature paper.
30 January 2009: fixed internal indexing issue in consensusplot.m when input
file contained a leaf node with name 0.
16 January 2009: fixed issue with output file from fitHRG when -make is invoked.
5 December 2008: moved all the files from /randomgraphs to /hierarchy.
22 September 2008: fixed a small problem in consensusplot.m that caused it to
throw an error when reading in .group files.
18 August 2008: fixed a small problem with the recordPredictions() function in
predictHRG that prevented it from writing out the last prediction.
10 June 2008: fixed an error in sizing an array which would sometimes cause a
segmentation fault in predictHRG just before writing the predictions to file.
21 May 2008: minor tweaks to fitHRG and predictHRG code (mainly cleaning up the
code a little more, making header information standard, etc.). Fixed a memory
deallocation issue in predictHRG that effects some users.
21 May 2008: Placed initial version of consensus HRG and consensusplot.m code
online.
29 April 2008: placed initial version of hrgplot.m code online.
1 January 2008: placed initial versions (1.0) of fitHRG and predictHRG online.