Network Analysis and Modeling
CSCI 5352, Fall 2022
Time: Tuesday and Thursday, 11:00am - 12:15pm
Room: ECCS 1B14
Instructor: Aaron Clauset
Office: ECES 118B
Office hours: Thursday, 1:30-2:30pm
Email: zzilm.xozfhvg@xlolizwl.vwf (an Atbash cipher)
Syllabus
Description
Course structure
Schedule and lecture notes
Problem sets
Supplemental readings
Description
Network science is a thriving and increasingly important cross-disciplinary
domain that focuses on the representation, analysis and modeling of complex
social, biological and technological systems as networks or graphs. Modern
data sets often include some kind of network. Nodes can have locations,
directions, memory, demographic characteristics, content, and preferences.
Edges can have lengths, directions, capacities, costs, durations, and types.
And, these variables and the network structure itself can vary, with edges and
nodes appearing, disappearing and changing their characteristics over time.
Capturing, modeling and understanding networks and rich data requires
understanding both the mathematics of networks and the computational tools for
identifying and explaining the patterns they contain.
This graduate-level course will examine modern techniques for analyzing and
modeling the structure and dynamics of complex networks. The focus will be on
statistical algorithms and methods, and both lectures and assignments will
emphasize model interpretability and understanding the processes that
generate real data. Applications will be drawn from computational biology and
computational social science. No biological or social science training is
required. (Note: this is not a scientific computing course, but there will be
plenty of computing for science.)
Prerequisites
Recommended: CSCI 3104 (undergraduate algorithms) and APPM 3570 (applied
probability), or equivalent preparation.
Note: An adequate mathematical and programming background is mandatory. The
concepts and techniques covered in this course depend heavily on basic
statistics (distributions, Monte Carlo techniques), scientific programming,
and calculus (integration and differentiation). Students without sufficient
preparation will struggle to keep up with the lectures and assignments.
Students without proper preparation may audit the course.
Text
Required (available at the CU Bookstore):
Networks: An Introduction by M.E.J. Newman
Supplementary (optional):
1. All of Statistics by L. Wasserman
2. Numerical Recipes
3. Networks, Crowds and Markets by D. Easley and J. Kleinberg
4. Error and the Growth of Experimental Knowledge by D.G. Mayo
5. Pattern Recognition and Machine Learning by C.M. Bishop
Learning Objectives:
- develop network intuition, and understand how to reason about network phenomena
- understand network representations
- learn principles and methods for describing and clustering network data
- learn to predict missing network information
- understand how to conduct and interpret numerical network experiments
- analyze and model real-world network data
Overview:
- lectures 2 times a week, some guest lectures and some class discussions
- problem sets (6 total) due every 2 weeks throughout the semester
- a class project, presentation, and final report
- this will be a challenging and fun course; plan accordingly
See the syllabus for more detail on the structure and requirements of the class.
Tentative Schedule
Week 1 : Fundamentals of networks (Lecture 0 and Lecture 1)
Week 2 : Network representations and summary statistics (Lecture 2)
Week 3 : Random graphs with homogeneous degrees (Lecture 3)
Week 4 : Random graphs with heterogeneous degrees (Lecture 4)
Week 5 : Network prediction: node attributes (Lecture 5)
Week 6 : Network prediction: missing links (Lecture 6)
Week 7 : Community structure and mixing patterns (Lecture 7)
Week 8 : Community structure models (Lecture 8)
Week 9 : Spreading processes and cascades Lecture 9
Week 10 : Spreading processes with structure (Lecture 10)
Week 11 : Sampling network data (Lecture 11a) and
Modeling network growth (Lecture 11b)
Week 12 : Ranking in networks (Lecture 12)
Week 13 : Ethics and Networks (Lecture 13)
Week 14 : Fall break
Weeks 15-16 : Project presentations and Wrap up
Week 1:
- C.T. Butts, "Revisiting the foundations of network analysis." Science 325, 414-416 (2009).
- G. Shmueli, "To explain or to predict?" Statistical Science 25, 289-310 (2010).
Week 2:
- M.E.J. Newman, "The structure and function of complex networks." SIAM Review 45, 167-256 (2003).
- A. Clauset, C.R. Shalizi and M.E.J. Newman, "Power-law distributions in empirical data." SIAM Review 51(4), 661-703 (2009).
Weeks 3-4:
- E. A. Hobson et al., "A guide to choosing and implementing reference models for social network analysis." Biological Reviews (2021).
- D. R. Farine, "A guide to null models for animal social network analysis." Methods in Ecology and Evolution 8, 1309–1320 (2017).
- B.K. Fosdick et al., "Configuring random graph models with fixed degree Sequences." SIAM Review 60, 315–355 (2018).
- D.S. Callaway et al., "Are randomly grown graphs really random?" Physical Review E 64, 041902 (2001).
- B. Karrer and M.E.J. Newman, "Random graphs containing arbitrary distributions of subgraphs." Physical Review E 82, 066118 (2010).
- K. Van Koevering, A R. Benson and J. Kleinberg, "Random graphs with prescribed k-core sequences: a new null model for network analysis." Proc. Web Conference 2021, 367–378 (2021).
Week 5:
- M.E.J. Newman, "Mixing patterns in networks." Physical Review E 67, 026126 (2003).
- A.J. Kavran and A. Clauset, "Denoising large-scale biological data using network filters." BMC Bioinformatics 22, 157 (2021).
- J. Jia and A.R. Benson, "A unifying generative model for graph learning algorithms: label propagation, graph convolutions, and combinations." SIAM Journal on Mathematics of Data Science 4(1) 100-125 (2022).
- Z. Huang, A. Silva, and A. Singh, "A broader picture of random-walk based graph embedding." KDD, 685-695 (2021)
- J. Moore and J. Neville, "Deep collective inference." AAAI 2364-2372 (2017).
Week 6:
- D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks." J. Amer. Soc. Info. Sci. and Tech. 58(7), 1019-1031 (2007).
- A. Clauset, C. Moore, and M.E.J. Newman, "Hierarchical structure and the prediction of missing links in networks." Nature 453, 98 - 101 (2008).
- A. Ghasemian et al., "Stacking models for nearly optimal link prediction in complex networks." Proc. Natl. Acad. Sci. USA 117, 23393-23400 (2020).
- T. Zhou, "Progresses and challenges in link prediction." Preprint, arxiv:2102.11472 (2021).
- F.P. Santos, Y. Lelkes, and S.A. Levin, "Link recommendation algorithms and dynamics of polarization in online social networks." Proc. Natl. Acad. Sci. USA 118, e2102141118 (2021).
Week 7:
- M. McPherson, L. Smith-Lovin and J.M. Cook, "Birds of a Feather: Homophily in Social Networks." Annual Reviews of Sociology 27, 415-444 (2001).
- L. Peel, J.-C. Delvenne, and R. Lambiotte, "Multiscale mixing patterns in networks." Proc. Natl. Acad. Sci. USA, 115(16) 4057-4062 (2018).
- A. Clauset, M.E.J. Newman and C. Moore, "Finding community structure in very large networks." Physical Review E 70, 066111 (2004).
- B.H. Good, Y.-A. de Montjoye and A. Clauset, "Performance of modularity maximization in practical contexts." Physical Review E 81, 046106 (2010).
Week 8:
- L. Peel, D.B. Larremore and A. Clauset, "The ground truth about metadata and community detection in networks." Science Advances 3(5), e1602548 (2017).
- A. Decelle, et al., "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications." Physical Review E 84, 066106 (2011).
- B. Karrer and M.E.J. Newman, "Stochastic blockmodels and community structure in networks." Physical Review E 83, 016107 (2011).
- E.M. Airoldi, et al., "Mixed membership stochastic blockmodels." J. Machine Learning Research 9, 1981-2014 (2008).
- T.P. Peixoto, "Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models." Physical Review E 89, 012804 (2014)
- M.E.J. Newman and A. Clauset, "Structure and inference in annotated networks." Nature Communications 7, 11863 (2016).
- T.P. Peixoto, "Nonparametric weighted stochastic block models." Physical Review E 97, 012306 (2018).
Week 9:
- R. Pastor-Satorras et al., "Epidemic processes in complex networks." Reviews of Modern Physics 87, 925 (2015).
- P.S. Dodds, Slightly generalized Generalized Contagion: Unifying simple models of biological and social spreading. Preprint, arXiv:1708.09697 (2017).
- D.J. Watts and P.S. Dodds, "Influentials, Networks, and Public Opinion Formation." J. Consumer Research 34, 441-458 (2007).
- S. Goel et al., "The Structural Virality of Online Diffusion." Management Science 62(1), 180-196 (2015).
- J.L. Juul and J. Ugander, "Comparing information diffusion mechanisms by matching on cascade size." Proc. Natl. Acad. Sci. USA 118, e2100786118 (2021).
- A.C. Morgan et al., "Prestige drives epistemic inequality in the diffusion of scientific ideas." EPJ Data Science 7, 40 (2018).
Week 10:
- E. Yong, The Deceptively Simple Number Sparking Coronavirus Fears, The Atlantic, 28 January 2020.
- J. Bedson et al., A review and agenda for integrated disease models including social and behavioural factors. Nature Human Behavior 5, 834–846 (2021).
- L.A. Meyers et al., Network theory and SARS: predicting outbreak diversity. J. Theoretical Biology 232(1), 71-81 (2005).
- S.M. Kissler et al., Projecting the transmission dynamics of SARS-CoV-2 through the post-pandemic period. Preprint, medRxiv DOI:10.1101/2020.03.04.20031112 (2020).
- D.H. Morris et al., Optimal, near-optimal, and robust epidemic control. Communications Physics 4, 78 (2021).
- D.J. Watts et al., Multiscale, resurgent epidemics in a hierarchical metapopulation model. Proc. Natl. Acad. Sci. USA 102(32) 11157-11162 (2005).
- D. Brockmann and D. Helbing, The hidden geometry of complex, network-driven contagion phenomena. Science 342, 1337-1342 (2013).
Week 11:
- P. Orbanz, "Subsampling large graphs and invariance in networks." Preprint, arxiv:1710.04217 (2017).
- A.C. Thomas and J.K. Blitztein, "Valued Ties Tell Fewer Lies: Why not to dichotomize network edges with thresholds." Preprint, arXiv:1101.0788 (2011).
- N.K. Ahmed et al., "Graph Sample and Hold: A Framework for Big-Graph Analytics." Proc. KDD (2014).
- B. Ribeiro and D. Towsley, "On the Estimation Accuracy of Degree Distributions from Graph Sampling." IEEE Conference on Decision and Control (2012).
- J. Blumenstock et al., "Predicting poverty and wealth from mobile phone metadata." Science 350(6264), 1073-1076 (2015).
Week 12:
- S. Borgati, "Centrality and network flow." Social Networks 110, 5802-5805 (2013).
- B. Ball and M.E.J. Newman, "Friendship networks and social status." Network Science 1, 16-30 (2013).
- A. Clauset, S. Arbesman and D.B. Larremore, "Systematic inequality and hierarchy in faculty hiring networks." Science Advances 1(1), e1400005 (2015).
- C. De Bacco, D.B. Larremore and C. Moore, "A physical model for efficient ranking in networks." Science Advances 4(7), eaar8260 (2018).
- K. Tsukida and M.R. Gupta, "How to analyze paired comparison data." UWEE Technical Report, Number UWEETR-2011-0004 (2011).
- S. Ragain and J. Ugander, "Pairwise choice Markov chains." Advances in Neural Information Processing Systems (NeurIPS) 29 (2016)
Week 13:
- M. Kosinski, D. Stillwell, and T. Graepel, "Private traits and attributes are predictable from digital records of human behavior." Proc. Natl. Acad. Sci. USA 27, 55-71 (2005).
- A. Birhane et al. "The Values Encoded in Machine Learning Research." Preprint, arXiv:2106.15590 (2021).
- E. Zheleva and L. Getoor, "To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles." Proc. WWW, 531–540 (2009).
- P. Tubaro et al., "Social network analysis: New ethical approaches through collective reflexivity." Social Networks 67, 1-8 (2021).
- A.E. Marwick and d. boyd, "Networked privacy: How teenagers negotiate context in social media." new media & society 16(7), 1051-1067 (2014).
Network Tools
NetworkX, network analysis package (Python)
igraph, network analysis tools (Python, C++, R)
graph-tool, network analysis and visualization software (Python, C++)
graspologic, latent community and spatial network analysis (Python)
NodeXL, network analysis and visualization software
Network Visualization
Cytoscape, network visualization software
CosmoGraph.app, in-browser, larger network visualization
yEd Graph Editor, network visualization software
Graphviz, network visualization software
Gephi, network visualization software
graph-tool, network analysis and visualization software
webweb, network visualization tool joining Matlab and d3
MuxViz, multilayer analysis and visualization platform
Network Data Sets
The Colorado Index of Complex Networks (ICON; more than 698 network data sets)
Netzschleuder (network catalogue, repository and centrifuge; more than 278 network data sets)
US Census Education-Employment network (social, bipartite, weighted)
Other Courses on Networks
Network Theory (University of Michigan)
Statistical Network Analysis (Purdue University)
Networks (Cornell University)
Networks (Harvard University)
Social and Economic Networks: Models and Analysis (Coursera / Stanford)
Social Network Analysis (Coursera / University of Michigan)
Social and Information Network Analysis (Stanford)
Graphs and Networks (Yale)
Spectral Graph Theory (Yale)
The Structure of Social Data (Stanford)
Resources
LaTeX (general) and TeXShop (Mac)
Matlab license for CU staff (includes student employees)
Mathematica license for CU students
NumPy/SciPy libraries for Python (similar to Matlab)
GNU Octave (similar to Matlab)
Wolfram Alpha (Web interface for simple integration and differentiation)
Introduction to the Modeling and Analysis of Complex Systems, by Hiroki Sayama (free online textbook)
Things Worth Reading
Everything you wanted to know about Data Analysis and Fitting but were afraid to ask, by Peter Young
Machine Learning, Statistical Inference and Induction Notebook (by Cosma Shalizi)
Power Law distributions, etc. Notebook (by Cosma Shalizi)
Statistics Done Wrong, The woefully complete guide (by Alex Reinhart)
Some Advice on Process for
[Research Projects]