Faculty Hiring Networks
This page is a companion for the
Science Advances article on faculty hiring networks, written by
Aaron Clauset (me),
Samuel Arbesman, and
Daniel B. Larremore.
This page hosts an implementation of the network ranking methods, the complete
faculty hiring data sets analyzed in the paper, and links to interactive
visualizations of that data.
Journal reference
A. Clauset, S. Arbesman, and D. B. Larremore,
"Systematic inequality and hierarchy in faculty hiring networks."
Science Advances 1(1), e1400005 (2015).
Supplementary Materials file (PDF)
Perspective piece in Slate, with Joel Warner
Network ranking code
This function takes as input an adjacency matrix representing a directed
network and runs a zero-temperature Markov Chain Monte Carlo algorithm to
sample all minimum violation rankings (MVRs) of the input network. Usage
information is included in the file; type 'help mvrsample' at the Matlab
prompt for more information.
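For readers who want a feel for the idea before opening the Matlab file, here is a minimal Python sketch of a zero-temperature search for minimum violation rankings. It is not a port of mvrsample.m. It assumes that a "violation" is an edge (u,v) in which v is ranked better than u, that each proposal swaps the ranks of two randomly chosen vertices, and that a proposal is accepted whenever the violation count does not increase. The names `count_violations` and `sample_mvr` are made up for this illustration.

```python
import random


def count_violations(adj, rank):
    """Total weight of edges (u, v) where v is ranked better than u.

    adj[u][v] is the number of edges from u to v; rank[u] is the position
    of vertex u, with 0 being the best rank. (Assumed definition.)
    """
    n = len(adj)
    return sum(
        adj[u][v]
        for u in range(n)
        for v in range(n)
        if adj[u][v] and rank[v] < rank[u]
    )


def sample_mvr(adj, steps=10000, seed=0):
    """Zero-temperature swap search: never accept a worse ranking."""
    rng = random.Random(seed)
    n = len(adj)
    rank = list(range(n))
    rng.shuffle(rank)  # start from a random ranking
    current = count_violations(adj, rank)
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # pick two distinct vertices
        rank[i], rank[j] = rank[j], rank[i]  # propose swapping their ranks
        trial = count_violations(adj, rank)
        if trial <= current:
            current = trial  # accept neutral or better proposals
        else:
            rank[i], rank[j] = rank[j], rank[i]  # reject: undo the swap
    return rank, current


if __name__ == "__main__":
    # Toy 4-vertex network (hypothetical data, not from the paper).
    toy = [
        [0, 3, 2, 0],
        [0, 0, 3, 0],
        [0, 0, 0, 3],
        [1, 0, 0, 0],
    ]
    ranking, violations = sample_mvr(toy, steps=2000, seed=1)
    print("rank of each vertex:", ranking)
    print("violations:", violations)
```

Accepting neutral swaps is what lets the search wander among rankings with the same violation count rather than stopping at the first one it finds. Recomputing the full count at every step keeps the sketch short, but it is slow on large networks.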
NOTE: we cannot provide technical support for code not written by us. We are also busy with other projects now, and so may not be able to provide support for our own code.
mvrsample.m (Matlab, by Aaron Clauset)
mvrsample.m (C, by Huanshen Wei)
Faculty hiring data
Data on 18,924 tenured or tenure-track faculty were collected between May 2011
and August 2013 for the disciplines of Computer Science, Business, and
History. All data were collected and validated by hand from primary sources
(mainly departmental webpages and faculty homepages). The following table
gives a brief sketch of the data included in the study, which represents the
16,316 faculty at the within-sample institutions who received their doctorates
from another within-sample institution. (About 14% of faculty received their
doctorates from outside our sample, mainly from foreign universities.) These
data represent the largest and most comprehensive survey of faculty placements
collected at the time of publication in 2015.
| | computer science | business | history |
|---|---|---|---|
| institutions | 205 | 112 | 144 |
| regular faculty | 4388 | 7856 | 4072 |
| mean size | 21.4 | 70.1 | 28.6 |
| collection period | 5/2011-3/2012 | 3/2012-12/2012 | 1/2013-8/2013 |
These data are provided separately for each discipline, as two plain text files.
The first file contains the directed edges (u,v), each of which denotes a person who received their doctorate from institution u and was faculty at institution v during the collection period. Each edge is annotated with that person's faculty rank at v and that person's gender.
The second file contains vertex attribute information for each institution: the prestige score assigned by the MVR sampling algorithm, the institution's USNews rank and National Research Council rank (if available; the most recent ones as of the collection period), its geographic region (generally US Census), and its name.
Missing values are denoted by a period '.'. No personally identifying information is included in the files (but these data are by no means anonymous). Additional details of the data can be found in the paper's Supplementary Materials.
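As a rough illustration of how the edge file could be turned into the adjacency matrix that a ranking routine takes as input, here is a short Python sketch. The column layout is an assumption, so check it against the actual files: the sketch expects whitespace-separated fields in the order u, v, faculty rank, gender, with integer institution ids, and it treats lines starting with '#' as comments. The function names and the sample rank labels are hypothetical.

```python
def parse_edgelist(lines):
    """Parse edge-list lines into (u, v, rank, gender) tuples.

    Assumed layout: 'u v rank gender', whitespace separated, integer ids.
    A '.' field is treated as a missing value and returned as None.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and assumed comment/header lines
        fields = line.split()
        u, v = int(fields[0]), int(fields[1])
        # Pad with '.' so lines without annotations still parse.
        extra = (fields[2:4] + [".", "."])[:2]
        rank, gender = [None if f == "." else f for f in extra]
        edges.append((u, v, rank, gender))
    return edges


def adjacency_matrix(edges):
    """Build a dense count matrix: adj[i][j] = number of people from i to j."""
    ids = sorted({x for u, v, _, _ in edges for x in (u, v)})
    index = {x: i for i, x in enumerate(ids)}  # institution id -> row/column
    adj = [[0] * len(ids) for _ in ids]
    for u, v, _, _ in edges:
        adj[index[u]][index[v]] += 1
    return ids, adj


if __name__ == "__main__":
    # Made-up sample lines; the rank labels here are placeholders.
    sample = [
        "# u v rank gender",
        "1 2 Full M",
        "1 2 Asst F",
        "2 3 . F",
        "3 3 Assoc .",
    ]
    ids, adj = adjacency_matrix(parse_edgelist(sample))
    print(ids)
    print(adj)
```

To read a real file, pass an open file handle, for example `parse_edgelist(open("ComputerScience_edgelist.txt"))`. The vertex file is not handled here because institution names may contain spaces, which this whitespace split would break apart.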
Computer Science faculty hiring network and departmental attribute files:
ComputerScience_edgelist.txt (68 KB)
ComputerScience_vertexlist.txt (11 KB)
Business School faculty hiring network and departmental attribute files:
Business_edgelist.txt (119 KB)
Business_vertexlist.txt (5 KB)
History faculty hiring network and departmental attribute files:
History_edgelist.txt (60 KB)
History_vertexlist.txt (7 KB)
All the data:
replicationData_all.zip (51 KB)
Interactive visualizations
Dan Larremore has put together an
interactive visualization of these data.
Within each discipline, you can explore different aspects of the network,
examine individual institutions, and inspect the flows of faculty between
specific pairs of institutions within the prestige hierarchy.
A note on the software and data
The Matlab code for mvrsample.m was designed to be compatible with Matlab
v7.13 (2011). It is not necessarily compatible with older versions of Matlab;
it should be compatible with newer versions (for the foreseeable future).
The code and data are provided as-is, with no warranty, express or implied,
no guarantee of correctness, no guarantees of technical support or
maintenance, etc. The code is released under GPLv2 and the data under the
CC-BY-NC license
attached to the paper. If you experience problems while using the code or
data, please let me know via email. I am happy to host (or link to)
implementations of the mvrsample program in other programming languages, but
cannot provide any technical support for such code. If you are interested
in commercialization of these ideas, code, or data, please contact me.
Finally, if you use my code in an academic publication, it would be
courteous of you to thank me in your acknowledgements for providing you with
implementations of the methods.
Updates
15 October 2017: added the C version of mvrsample, by Huanshen Wei
12 February 2015: initial page created and posted, including version 1.0 of data, version 1.0 of mvrsample, and links to interactive visualizations.