Faculty Hiring Networks

This page is a companion to the Science Advances article on faculty hiring networks, written by Aaron Clauset (me), Samuel Arbesman, and Daniel B. Larremore. It hosts an implementation of the network ranking methods, the complete faculty hiring data sets analyzed in the paper, and links to interactive visualizations of those data.

Journal reference
A. Clauset, S. Arbesman, and D. B. Larremore, "Systematic inequality and hierarchy in faculty hiring networks." Science Advances 1(1), e1400005 (2015).
Supplementary Materials file (PDF)
Perspective piece in Slate, with Joel Warner

Network ranking code
This function takes as input an adjacency matrix representing a directed network and runs a zero-temperature Markov chain Monte Carlo (MCMC) algorithm to sample the minimum violation rankings (MVRs) of the input network. Usage information is included in the file; type 'help mvrsample' at the Matlab prompt for details.
NOTE: we cannot provide technical support for code written by others, and because we are now busy with other projects, we may not be able to support our own code either.
mvrsample.m (Matlab, by Aaron Clauset)
mvrsample.m (C, by Huanshen Wei)
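To make the idea behind the sampler concrete, here is a minimal Python sketch of a zero-temperature MCMC over rankings. It is not a port of mvrsample.m and does not reproduce its move set, output format, or prestige scores; the names `count_violations` and `zero_temp_mcmc` are hypothetical. It assumes `A[u][v]` counts people who trained at u and are faculty at v, and counts one violation for each such hire where v is ranked above u. Starting from a random ranking, it repeatedly proposes swapping two institutions and, because the temperature is zero, keeps a swap only if the number of violations does not increase.

```python
import random
from typing import Optional, Sequence, Set, Tuple

Matrix = Sequence[Sequence[int]]


def count_violations(rank: Sequence[int], A: Matrix) -> int:
    """Number of hires that point 'up' the ranking.

    rank[x] is the position of vertex x (0 = most prestigious) and A[u][v]
    counts people trained at u who are faculty at v (an assumed orientation).
    A hire u -> v is a violation when v is ranked above u; self-loops never count.
    """
    n = len(A)
    return sum(A[u][v] for u in range(n) for v in range(n) if rank[v] < rank[u])


def zero_temp_mcmc(A: Matrix, steps: int = 10000,
                   seed: Optional[int] = None) -> Tuple[int, Set[Tuple[int, ...]]]:
    """Zero-temperature MCMC over rankings using random pairwise swaps.

    Returns the fewest violations found and the distinct rankings (vertex
    orders, most prestigious first) visited at that violation count.
    """
    n = len(A)
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)                  # order[i] = vertex at position i
    rank = [0] * n
    for pos, x in enumerate(order):
        rank[x] = pos
    current = best = count_violations(rank, A)
    samples = {tuple(order)}
    if n < 2:
        return best, samples
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # two distinct positions to swap
        a, b = order[i], order[j]
        order[i], order[j] = b, a
        rank[a], rank[b] = j, i
        proposed = count_violations(rank, A)
        if proposed > current:          # zero temperature: reject uphill moves
            order[i], order[j] = a, b   # undo the swap
            rank[a], rank[b] = i, j
            continue
        current = proposed
        if current < best:              # new minimum: drop rankings that are now worse
            best, samples = current, set()
        samples.add(tuple(order))
    return best, samples
```

A caveat on the move set: for the 3-cycle `[[0, 1, 0], [0, 0, 1], [1, 0, 0]]`, the three rotations (0, 1, 2), (1, 2, 0), and (2, 0, 1) each have exactly one violation, which is the minimum, but no single swap leads from one rotation to another, so one run of this sketch returns only one of them. Running from several seeds, or adding other kinds of moves, helps cover more of the minimum-violation set.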

Faculty hiring data
Data on 18,924 tenure or tenure-track faculty were collected between May 2011 and August 2013 for the disciplines of Computer Science, Business, and History. All data were collected and validated by hand from primary sources (mainly departmental webpages and faculty homepages). The following table gives a brief sketch of the data included in the study, which comprises the 16,316 faculty at within-sample institutions who received their doctorates from a within-sample institution. (About 14% of faculty received their doctorates from outside our sample, mainly from foreign universities.) At the time of publication in 2015, these data were the largest and most comprehensive survey of faculty placements yet collected.

                            Computer Science   Business         History
institutions                205                112              144
regular faculty             4388               7856             4072
mean size (faculty/inst.)   21.4               70.1             28.3
collection period           5/2011-3/2012      3/2012-12/2012   1/2013-8/2013
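The table's figures are linked by simple arithmetic. The following Python snippet is a sketch that assumes "mean size" means regular faculty divided by institutions (as the Computer Science and Business columns suggest); it recomputes the mean sizes, the 16,316 within-sample total, and the share of the 18,924 surveyed faculty whose doctorates came from outside the sample.

```python
# Figures copied from the table and paragraph above.
TABLE = {
    "Computer Science": {"institutions": 205, "faculty": 4388},
    "Business": {"institutions": 112, "faculty": 7856},
    "History": {"institutions": 144, "faculty": 4072},
}
SURVEYED = 18924  # all tenure or tenure-track faculty collected


def mean_size(discipline: str) -> float:
    """Regular faculty per institution (assumed definition of 'mean size')."""
    row = TABLE[discipline]
    return row["faculty"] / row["institutions"]


# Faculty whose doctorate came from a within-sample institution.
within = sum(row["faculty"] for row in TABLE.values())
# Everyone else trained outside the sample (mainly abroad).
outside_share = (SURVEYED - within) / SURVEYED

for name in TABLE:
    print(f"{name:17s} mean size {mean_size(name):.1f}")
print(f"within-sample total {within}, outside share {outside_share:.1%}")
```

The outside share works out to 2,608 of 18,924 faculty, or roughly 13.8%, which matches the "about 14%" quoted above.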

These data are provided separately for each discipline. One file contains the directed edges (u,v), each of which denotes a person who received their doctorate from institution u and was on the faculty at institution v during the collection period. Each edge is annotated with that person's faculty rank at v and their gender. A second file contains vertex attributes for each institution: the prestige score assigned by the MVR sampling algorithm, its U.S. News rank and National Research Council rank (where available) from the edition most recent as of the collection period, its geographic region (generally following US Census regions), and its name. Missing values are denoted by a period ('.'). All files are plain text. No personally identifying information is included in the files (but these data are by no means anonymous). Additional details of the data can be found in the paper's Supplementary Materials.
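As a starting point for working with these files, here is a minimal Python sketch that loads an edge-list file and builds the directed adjacency matrix that mvrsample takes as input. The column layout is an assumption (whitespace-separated u, v, rank, gender, with lines starting with '#' treated as comments and '.' as a missing value), as is the orientation `A[i][j]` = number of people trained at i who are faculty at j; check the file headers and 'help mvrsample' before relying on either. The function names `parse_edgelist` and `adjacency_matrix` are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (u, v, faculty rank, gender); rank and gender are None when missing.
Edge = Tuple[int, int, Optional[str], Optional[str]]


def parse_edgelist(lines: Iterable[str]) -> List[Edge]:
    """Parse (u, v, rank, gender) records from an edge-list file.

    Assumed layout: whitespace-separated columns u v rank gender, lines
    starting with '#' are comments, and '.' marks a missing value.
    """
    edges: List[Edge] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        u, v = int(fields[0]), int(fields[1])
        extra: List[Optional[str]] = [None if f == "." else f for f in fields[2:4]]
        extra += [None] * (2 - len(extra))  # pad if rank/gender columns are absent
        edges.append((u, v, extra[0], extra[1]))
    return edges


def adjacency_matrix(edges: List[Edge]) -> Tuple[Dict[int, int], List[List[int]]]:
    """Build A[i][j] = number of people trained at i who are faculty at j.

    Institution ids are mapped to 0..n-1 in sorted order; the mapping is returned
    so rows and columns can be matched back to the vertex-attribute file.
    """
    ids = sorted({e[0] for e in edges} | {e[1] for e in edges})
    index = {x: k for k, x in enumerate(ids)}
    n = len(ids)
    A = [[0] * n for _ in range(n)]
    for u, v, _rank, _gender in edges:
        A[index[u]][index[v]] += 1  # self-hires land on the diagonal
    return index, A
```

Typical use would be `with open("ComputerScience_edgelist.txt") as f: index, A = adjacency_matrix(parse_edgelist(f))`. Institutions that appear only in the vertex file, with no edges, are left out of this matrix; building the index from the vertex file instead would keep them.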

Computer Science faculty hiring network and departmental attribute files:
ComputerScience_edgelist.txt (68 KB)
ComputerScience_vertexlist.txt (11 KB)

Business School faculty hiring network and departmental attribute files:
Business_edgelist.txt (119 KB)
Business_vertexlist.txt (5 KB)

History faculty hiring network and departmental attribute files:
History_edgelist.txt (60 KB)
History_vertexlist.txt (7 KB)

All the data:
replicationData_all.zip (51 KB)

Interactive visualizations
Dan Larremore has put together an interactive visualization of these data. Within each discipline, you can explore different aspects of the network, examine individual institutions, and inspect the flows of faculty between specific pairs of institutions within the prestige hierarchy.

A note on the software and data
The Matlab code for mvrsample.m was designed to be compatible with Matlab v7.13 (2011). It may not be compatible with older versions of Matlab, but it should remain compatible with newer versions for the foreseeable future.
The code and data are provided as-is, with no warranty, express or implied, no guarantee of correctness, no guarantees of technical support or maintenance, etc. The code is released under GPLv2 and the data under the CC-BY-NC license attached to the paper. If you experience problems while using the code or data, please let me know via email. I am happy to host (or link to) implementations of the mvrsample program in other programming languages, but cannot provide any technical support for such code. If you are interested in commercialization of these ideas, code, or data, please contact me.
Finally, if you use my code in an academic publication, it would be courteous of you to thank me in your acknowledgements for providing you with implementations of the methods.

Updates
15 October 2017: added the C version of mvrsample, by Huanshen Wei
12 February 2015: initial page created and posted, including version 1.0 of the data, version 1.0 of mvrsample, and links to interactive visualizations.