Estimating the probability of rare events
This page is a companion for the
article on estimating the historical and future probability of rare events,
written by
Aaron Clauset (me) and
Ryan Woodard.
This page hosts implementations of the methods we describe in the article.
For now, these are simply the versions we wrote (in Matlab and Python), but
in the future the page may also include implementations by others.
NOTE: we cannot provide support for any code not written by ourselves.
Journal Reference
A. Clauset and R. Woodard,
"Estimating the historical and future probabilities of large terrorist events." Annals of Applied Statistics 7(4), 1838-1865 (2013).
(Subject of a special session at ASA Joint Statistical Meetings, Montreal Canada, 5 August 2013)
Dependencies
The Matlab implementation requires access to the plfit.m procedure and an
implementation of the zeta function, both of which can be found
here.
Python code requires numpy and
mpmath (for the zeta function).
Both are available as standard packages (on Ubuntu, for example).
Scipy is needed for some plotting
niceties, but not for the main tasks.
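As a quick way to confirm that the Python dependencies listed above are in place, a check along the following lines may help. This is a minimal sketch, not part of the distributed code: the function name check_dependencies is hypothetical, and the final call only illustrates that mpmath's zeta accepts a second argument (the Hurwitz form), which is the kind of zeta function a discrete power-law normalization typically needs.

```python
import importlib.util

REQUIRED = ("numpy", "mpmath")  # needed for the main tasks
OPTIONAL = ("scipy",)           # only needed for some plotting niceties


def check_dependencies():
    """Return a dict mapping each package name to True if it is importable."""
    status = {}
    for name in REQUIRED + OPTIONAL:
        status[name] = importlib.util.find_spec(name) is not None
    return status


status = check_dependencies()
for name, found in status.items():
    kind = "required" if name in REQUIRED else "optional"
    print(f"{name:8s} ({kind}): {'found' if found else 'MISSING'}")

if status["mpmath"]:
    import mpmath

    # Hurwitz zeta: sum over k >= 0 of (k + a)^(-s), here with s = 2.5, a = 10.
    # The values 2.5 and 10 are arbitrary illustration values.
    print("zeta(2.5, 10) =", mpmath.zeta(2.5, 10))
```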
Power law outlier detection (single variable)
This function implements both the discrete and continuous power-law
outlier detection algorithms described in the paper. Usage information is
included in the file; type 'help plout' at the Matlab prompt for more
information.
plout.m (Matlab)
plout.py (Python)
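For readers who want a feel for the general idea before opening the files, the following is a rough sketch of a bootstrap estimate for the continuous case. It is not the plout code and does not use its interface. It makes several simplifying assumptions that should be read as ours for illustration only: the tail threshold xmin is held fixed rather than re-estimated for each bootstrap sample, the data are synthetic, and every function name here is hypothetical.

```python
import math
import random


def fit_alpha(tail, xmin):
    """Maximum-likelihood exponent of a continuous power law on x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)


def prob_at_least_one(data, x_big, xmin):
    """Probability that at least one of len(data) events has size >= x_big,
    under a power-law tail fitted to `data` above a FIXED xmin (assumption)."""
    n = len(data)
    tail = [x for x in data if x >= xmin]
    if len(tail) < 2 or all(x == xmin for x in tail):
        return None  # too little tail data to fit an exponent
    alpha = fit_alpha(tail, xmin)
    # Per-event probability: P(in tail) * P(X >= x_big | in tail).
    q = (len(tail) / n) * (x_big / xmin) ** (-(alpha - 1.0))
    return 1.0 - (1.0 - q) ** n


def bootstrap_estimates(data, x_big, xmin, n_boot=200, seed=0):
    """Refit on resampled data; return the sorted ensemble of estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))  # resample with replacement
        p = prob_at_least_one(sample, x_big, xmin)
        if p is not None:
            estimates.append(p)
    return sorted(estimates)


def percentile(sorted_values, q):
    """Simple empirical percentile of an already sorted list, 0 <= q <= 1."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


# Synthetic data (made up for illustration): power law with exponent 2.5.
rng = random.Random(42)
data = [(1.0 - rng.random()) ** (-1.0 / 1.5) for _ in range(500)]

estimates = bootstrap_estimates(data, x_big=1000.0, xmin=2.0)
lo = percentile(estimates, 0.05)
median = percentile(estimates, 0.50)
hi = percentile(estimates, 0.95)
print(f"median {median:.3f}, 90% interval [{lo:.3f}, {hi:.3f}]")
```

The spread of the bootstrap ensemble is what gives an interval around the point estimate. A fuller treatment would also re-estimate xmin inside each bootstrap iteration, which this sketch leaves out.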
Power law outlier detection (with discrete covariates)
This function implements the same method as plout but for integer-valued
event covariates. The function applies the method to each of the marginal
distributions and combines the results across them. Type 'help ploutm' at the
Matlab prompt for more information.
ploutm.m (Matlab)
ploutm.py (Python)
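The "apply to each marginal, then combine" pattern can be sketched as follows. This is an illustration only and not the ploutm code: the combination rule used here (treating the covariate groups as independent, so that the chance of at least one large event is one minus the product of the per-group chances of none) is an assumption on our part, the function names are hypothetical, and the per-group estimator is a toy stand-in rather than a power-law fit.

```python
from collections import defaultdict


def split_by_covariate(sizes, covariates):
    """Group event sizes by their integer-valued covariate."""
    groups = defaultdict(list)
    for size, cov in zip(sizes, covariates):
        groups[cov].append(size)
    return dict(groups)


def combine_independent(probabilities):
    """P(at least one) = 1 - prod(1 - p_k), ASSUMING independent groups."""
    p_none = 1.0
    for p in probabilities:
        p_none *= 1.0 - p
    return 1.0 - p_none


def estimate_with_covariates(sizes, covariates, estimator):
    """Apply `estimator` to each marginal, then combine across covariates."""
    groups = split_by_covariate(sizes, covariates)
    per_group = {cov: estimator(values) for cov, values in groups.items()}
    return per_group, combine_independent(per_group.values())


def toy_estimator(values, threshold=100):
    """Placeholder for a single-variable method that returns a probability.
    This is a smoothed fraction of events >= threshold, NOT a power-law fit."""
    hits = sum(1 for v in values if v >= threshold)
    return (hits + 1) / (len(values) + 2)


# Made-up event sizes and covariate labels, for illustration only.
sizes = [1, 5, 120, 3, 250, 2, 8, 40]
covariates = [0, 0, 0, 1, 1, 1, 2, 2]

per_group, combined = estimate_with_covariates(sizes, covariates, toy_estimator)
print(per_group)
print(combined)
```

In practice the toy estimator would be replaced by whatever single-variable procedure is applied to each marginal distribution.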
Visualizing the ensemble of fitted distributions
This function takes the output of plout (or ploutm) and plots a portion of the
ensemble of fitted models against the empirical data on log-log axes. Usage
information is included in the file; type 'help pleplot' at the Matlab prompt
for more information.
pleplot.m (Matlab)
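Only a Matlab version of pleplot is listed, so a Python user may want a rough equivalent. The sketch below shows one way such a plot could be assembled; it is not a port of pleplot. It assumes matplotlib is installed (it is not among the dependencies listed on this page), assumes each fitted model is available as a hypothetical (alpha, xmin, p_tail) tuple, and uses made-up numbers in the example.

```python
def empirical_ccdf(data):
    """Return sorted values and the empirical P(X >= x) at each of them."""
    xs = sorted(data)
    n = len(xs)
    return xs, [(n - i) / n for i in range(n)]


def model_ccdf(x, alpha, xmin, p_tail):
    """Power-law tail CCDF for x >= xmin, scaled by the tail fraction p_tail."""
    return p_tail * (x / xmin) ** (-(alpha - 1.0))


def plot_ensemble(data, fits, max_curves=50, points=50):
    """Plot empirical CCDF and up to `max_curves` fitted tails, log-log axes.
    `fits` is a list of (alpha, xmin, p_tail) tuples (hypothetical format)."""
    import matplotlib.pyplot as plt  # assumption: matplotlib is available

    xs, ccdf = empirical_ccdf(data)
    xmax = xs[-1]
    fig, ax = plt.subplots()
    for alpha, xmin, p_tail in fits[:max_curves]:
        if xmax <= xmin:
            continue  # nothing to draw above this threshold
        # Logarithmically spaced grid from xmin to the largest observation.
        grid = [xmin * (xmax / xmin) ** (k / (points - 1)) for k in range(points)]
        curve = [model_ccdf(x, alpha, xmin, p_tail) for x in grid]
        ax.loglog(grid, curve, color="gray", alpha=0.3)
    ax.loglog(xs, ccdf, "o", markersize=3)
    ax.set_xlabel("event size x")
    ax.set_ylabel("P(X >= x)")
    return ax


# Made-up data and fits, for illustration only.
data = [1, 1, 2, 2, 3, 5, 8, 13, 40, 120]
fits = [(2.3, 2.0, 0.8), (2.5, 2.0, 0.8), (2.7, 3.0, 0.6)]
xs, ccdf = empirical_ccdf(data)
```

Calling plot_ensemble(data, fits) followed by matplotlib's show function would display the figure. Drawing the model curves with partial transparency lets the density of the ensemble show through where curves overlap.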
Matlab compatibility issues
All of the Matlab functions here were designed to be compatible with Matlab
v7. They are not necessarily compatible with older versions of Matlab.
That being said, it should be possible to make them compatible as the core
functionality does not depend on v7 features.
A note about bugs and alternative implementations
The code here is provided as-is, with no warranty and no guarantee of
technical support or maintenance. If you experience problems while
using the code, please let us know via email. We are also happy to host (or
link to) implementations of any of these functions in other programming
languages. If you have questions about any of the implementations, please
contact the respective function's author; the Matlab functions were written
by Aaron Clauset and the Python functions were written by Ryan Woodard.
Finally, if you use our code in an academic publication, it would be courteous
of you to thank us in your acknowledgements for providing you with
implementations of the methods.
Data for replication purposes
To facilitate the replication of our results, we are providing access to the
empirical data used to make our calculations. (Due to licensing restrictions,
we cannot provide access to the original databases.) If you use these data
in a publication, please cite the original source. The accompanying README
file explains the file format and provides additional information.
National Memorial Institute for the Prevention of Terrorism (2008) "Terrorism Knowledge Base." http://www.tkb.org (accessed 29 January 2008).
Download the data
Updates
9 March 2012: version 1.0.1 of plout and ploutm (Matlab) functions
posted; these versions better document the bootstrap features and describe
how to extract confidence intervals from the results.
3 January 2012: initial version of Matlab functions posted.