**Network Analysis and ModelingCSCI 5352, Fall 2022**

Time: Tuesday and Thursday, 11:00am - 12:15pm

Room: ECCS 1B14

Instructor: Aaron Clauset

Office: ECES 118B

Office hours: Thursday, 1:30-2:30pm

Email: zzilm.xozfhvg@xlolizwl.vwf (an Atbash cipher)

Syllabus

**Description**

**Course structure**

**Schedule and lecture notes**

**Problem sets**

**Supplemental readings**

**Description**

Network science is a thriving and increasingly important cross-disciplinary
domain that focuses on the representation, analysis and modeling of complex
social, biological and technological systems as networks or graphs. Modern
data sets often include some kind of network. Nodes can have locations,
directions, memory, demographic characteristics, content, and preferences.
Edges can have lengths, directions, capacities, costs, durations, and types.
And, these variables and the network structure itself can vary, with edges and
nodes appearing, disappearing and changing their characteristics over time.
Capturing, modeling and understanding networks and rich data requires
understanding both the mathematics of networks and the computational tools for
identifying and explaining the patterns they contain.

This graduate-level course will examine modern techniques for analyzing and
modeling the structure and dynamics of complex networks. The focus will be on
statistical algorithms and methods, and both lectures and assignments will
emphasize model interpretability and understanding the processes that
generate real data. Applications will be drawn from computational biology and
computational social science. No biological or social science training is
required. (Note: this is not a scientific computing course, but there will be
plenty of computing for science.)

**Prerequisites**

Recommended: CSCI 3104 (undergraduate algorithms) and APPM 3570 (applied
probability), or equivalent preparation.

Note: An adequate mathematical and programming background is mandatory. The
concepts and techniques covered in this course depend heavily on basic
statistics (distributions, Monte Carlo techniques), scientific programming,
and calculus (integration and differentiation). Students without sufficient
preparation will struggle to keep up with the lectures and assignments.
Students without proper preparation may audit the course.

**Text**

Required (available at the CU Bookstore):

Networks: An Introduction by M.E.J. Newman

Supplementary (optional):

1. All of Statistics by L. Wasserman

2. Numerical Recipes

3. Networks, Crowds and Markets by D. Easley and J. Kleinberg

4. Error and the Growth of Experimental Knowledge by D.G. Mayo

5. Pattern Recognition and Machine Learning by C.M. Bishop

Learning Objectives:

- develop network intuition, and understand how to reason about network phenomena
- understand network representations
- learn principles and methods for describing and clustering network data
- learn to predict missing network information
- understand how to conduct and interpret numerical network experiments
- analyze and model real-world network data

Overview:

- lectures 2 times a week, some guest lectures and some class discussions
- problem sets (6 total) due every 2 weeks throughout the semester
- a class project, presentation, and final report
- this will be a challenging and fun course; plan accordingly

See the syllabus for more detail on the structure and requirements of the class.

**Tentative Schedule**

Week 1 : Fundamentals of networks (Lecture 0 and Lecture 1)

Week 2 : Network representations and summary statistics (Lecture 2)

Week 3 : Random graphs with homogeneous degrees (Lecture 3)

Week 4 : Random graphs with heterogeneous degrees (Lecture 4)

Week 5 : Network prediction: node attributes (Lecture 5)

Week 6 : Network prediction: missing links (Lecture 6)

Week 7 : Community structure and mixing patterns (Lecture 7)

Week 8 : Community structure models (Lecture 8)

Week 9 : Spreading processes and cascades Lecture 9

Week 10 : Spreading processes with structure (Lecture 10)

Week 11 : Sampling network data (Lecture 11a) and

Modeling network growth (Lecture 11b)

Week 12 : Ranking in networks (Lecture 12)

Week 13 : Ethics and Networks (Lecture 13)

Week 14 : Fall break

Weeks 15-16 : Project presentations and Wrap up

Week 1:

- C.T. Butts, "Revisiting the foundations of network analysis."
*Science***325**, 414-416 (2009). - G. Shmueli, "To explain or to predict?"
*Statistical Science***25**, 289-310 (2010).

Week 2:

- M.E.J. Newman, "The structure and function of complex networks."
*SIAM Review***45**, 167-256 (2003). - A. Clauset, C.R. Shalizi and M.E.J. Newman, "Power-law distributions in empirical data."
*SIAM Review***51**(4), 661-703 (2009).

Weeks 3-4:

- E. A. Hobson et al., "A guide to choosing and implementing reference models for social network analysis."
*Biological Reviews*(2021). - D. R. Farine, "A guide to null models for animal social network analysis."
*Methods in Ecology and Evolution***8**, 1309–1320 (2017). - B.K. Fosdick et al., "Configuring random graph models with fixed degree Sequences."
*SIAM Review***60**, 315–355 (2018). - D.S. Callaway et al., "Are randomly grown graphs really random?"
*Physical Review E***64**, 041902 (2001). - B. Karrer and M.E.J. Newman, "Random graphs containing arbitrary distributions of subgraphs."
*Physical Review E***82**, 066118 (2010). - K. Van Koevering, A R. Benson and J. Kleinberg, "Random graphs with prescribed k-core sequences: a new null model for network analysis."
*Proc. Web Conference 2021*, 367–378 (2021).

Week 5:

- M.E.J. Newman, "Mixing patterns in networks."
*Physical Review E***67**, 026126 (2003). - A.J. Kavran and A. Clauset, "Denoising large-scale biological data using network filters."
*BMC Bioinformatics***22**, 157 (2021). - J. Jia and A.R. Benson, "A unifying generative model for graph learning algorithms: label propagation, graph convolutions, and combinations."
*SIAM Journal on Mathematics of Data Science***4**(1) 100-125 (2022). - Z. Huang, A. Silva, and A. Singh, "A broader picture of random-walk based graph embedding."
*KDD*, 685-695 (2021) - J. Moore and J. Neville, "Deep collective inference."
*AAAI*2364-2372 (2017).

Week 6:

- D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks."
*J. Amer. Soc. Info. Sci. and Tech.***58**(7), 1019-1031 (2007). - A. Clauset, C. Moore, and M.E.J. Newman, "Hierarchical structure and the prediction of missing links in networks."
*Nature***453**, 98 - 101 (2008). - A. Ghasemian et al., "Stacking models for nearly optimal link prediction in complex networks."
*Proc. Natl. Acad. Sci. USA***117**, 23393-23400 (2020). - T. Zhou, "Progresses and challenges in link prediction." Preprint, arxiv:2102.11472 (2021).
- F.P. Santos, Y. Lelkes, and S.A. Levin, "Link recommendation algorithms and dynamics of polarization in online social networks."
*Proc. Natl. Acad. Sci. USA***118**, e2102141118 (2021).

Week 7:

- M. McPherson, L. Smith-Lovin and J.M. Cook, "Birds of a Feather: Homophily in Social Networks."
*Annual Reviews of Sociology***27**, 415-444 (2001). - L. Peel, J.-C. Delvenne, and R. Lambiotte, "Multiscale mixing patterns in networks."
*Proc. Natl. Acad. Sci. USA*,**115**(16) 4057-4062 (2018). - A. Clauset, M.E.J. Newman and C. Moore, "Finding community structure in very large networks."
*Physical Review E***70**, 066111 (2004). - B.H. Good, Y.-A. de Montjoye and A. Clauset, "Performance of modularity maximization in practical contexts."
*Physical Review E***81**, 046106 (2010).

Week 8:

- L. Peel, D.B. Larremore and A. Clauset, "The ground truth about metadata and community detection in networks."
*Science Advances***3**(5), e1602548 (2017). - A. Decelle, et al., "Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications."
*Physical Review E***84**, 066106 (2011). - B. Karrer and M.E.J. Newman, "Stochastic blockmodels and community structure in networks."
*Physical Review E***83**, 016107 (2011). - E.M. Airoldi, et al., "Mixed membership stochastic blockmodels."
*J. Machine Learning Research***9**, 1981-2014 (2008). - T.P. Peixoto, "Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models."
*Physical Review E***89**, 012804 (2014) - M.E.J. Newman and A. Clauset, "Structure and inference in annotated networks."
*Nature Communications***7**, 11863 (2016). - T.P. Peixoto, "Nonparametric weighted stochastic block models."
*Physical Review E***97**, 012306 (2018).

Week 9:

- R. Pastor-Satorras et al., "Epidemic processes in complex networks."
*Reviews of Modern Physics***87**, 925 (2015). - P.S. Dodds, Slightly generalized Generalized Contagion: Unifying simple models of biological and social spreading. Preprint, arXiv:1708.09697 (2017).
- D.J. Watts and P.S. Dodds, "Influentials, Networks, and Public Opinion Formation."
*J. Consumer Research***34**, 441-458 (2007). - S. Goel et al., "The Structural Virality of Online Diffusion."
*Management Science***62**(1), 180-196 (2015). - J.L. Juul and J. Ugander, "Comparing information diffusion mechanisms by matching on cascade size."
*Proc. Natl. Acad. Sci. USA***118**, e2100786118 (2021). - A.C. Morgan et al., "Prestige drives epistemic inequality in the diffusion of scientific ideas."
*EPJ Data Science***7**, 40 (2018).

Week 10:

- E. Yong, The Deceptively Simple Number Sparking Coronavirus Fears,
*The Atlantic*, 28 January 2020. - J. Bedson et al., A review and agenda for integrated disease models including social and behavioural factors.
*Nature Human Behavior***5**, 834–846 (2021). - L.A. Meyers et al., Network theory and SARS: predicting outbreak diversity.
*J. Theoretical Biology***232**(1), 71-81 (2005). - S.M. Kissler et al., Projecting the transmission dynamics of SARS-CoV-2 through the post-pandemic period. Preprint, medRxiv DOI:10.1101/2020.03.04.20031112 (2020).
- D.H. Morris et al., Optimal, near-optimal, and robust epidemic control.
*Communications Physics***4**, 78 (2021). - D.J. Watts et al., Multiscale, resurgent epidemics in a hierarchical metapopulation model.
*Proc. Natl. Acad. Sci. USA***102**(32) 11157-11162 (2005). - D. Brockmann and D. Helbing, The hidden geometry of complex, network-driven contagion phenomena.
*Science***342**, 1337-1342 (2013).

Week 11:

- P. Orbanz, "Subsampling large graphs and invariance in networks." Preprint, arxiv:1710.04217 (2017).
- A.C. Thomas and J.K. Blitztein, "Valued Ties Tell Fewer Lies: Why not to dichotomize network edges with thresholds." Preprint, arXiv:1101.0788 (2011).
- N.K. Ahmed et al., "Graph Sample and Hold: A Framework for Big-Graph Analytics." Proc. KDD (2014).
- B. Ribeiro and D. Towsley, "On the Estimation Accuracy of Degree Distributions from Graph Sampling."
*IEEE Conference on Decision and Control*(2012). - J. Blumenstock et al., "Predicting poverty and wealth from mobile phone metadata."
*Science***350**(6264), 1073-1076 (2015).

Week 12:

- S. Borgati, "Centrality and network flow."
*Social Networks***110**, 5802-5805 (2013). - B. Ball and M.E.J. Newman, "Friendship networks and social status."
*Network Science***1**, 16-30 (2013). - A. Clauset, S. Arbesman and D.B. Larremore, "Systematic inequality and hierarchy in faculty hiring networks."
*Science Advances***1**(1), e1400005 (2015). - C. De Bacco, D.B. Larremore and C. Moore, "A physical model for efficient ranking in networks."
*Science Advances***4**(7), eaar8260 (2018). - K. Tsukida and M.R. Gupta, "How to analyze paired comparison data." UWEE Technical Report, Number UWEETR-2011-0004 (2011).
- S. Ragain and J. Ugander, "Pairwise choice Markov chains."
*Advances in Neural Information Processing Systems*(NeurIPS) 29 (2016)

Week 13:

- M. Kosinski, D. Stillwell, and T. Graepel, "Private traits and attributes are predictable from digital records of human behavior."
*Proc. Natl. Acad. Sci. USA***27**, 55-71 (2005). - A. Birhane et al. "The Values Encoded in Machine Learning Research." Preprint, arXiv:2106.15590 (2021).
- E. Zheleva and L. Getoor, "To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles."
*Proc. WWW*, 531–540 (2009). - P. Tubaro et al., "Social network analysis: New ethical approaches through collective reflexivity."
*Social Networks***67**, 1-8 (2021). - A.E. Marwick and d. boyd, "Networked privacy: How teenagers negotiate context in social media."
*new media & society***16**(7), 1051-1067 (2014).

**Network Tools**

NetworkX, network analysis package (Python)

igraph, network analysis tools (Python, C++, R)

graph-tool, network analysis and visualization software (Python, C++)

graspologic, latent community and spatial network analysis (Python)

NodeXL, network analysis and visualization software

**Network Visualization**

Cytoscape, network visualization software

CosmoGraph.app, in-browser, larger network visualization

yEd Graph Editor, network visualization software

Graphviz, network visualization software

Gephi, network visualization software

graph-tool, network analysis and visualization software

webweb, network visualization tool joining Matlab and d3

MuxViz, multilayer analysis and visualization platform

**Network Data Sets**

The Colorado Index of Complex Networks (ICON; more than 698 network data sets)

Netzschleuder (network catalogue, repository and centrifuge; more than 278 network data sets)

US Census Education-Employment network (social, bipartite, weighted)

**Other Courses on Networks**

Network Theory (University of Michigan)

Statistical Network Analysis (Purdue University)

Networks (Cornell University)

Networks (Harvard University)

Social and Economic Networks: Models and Analysis (Coursera / Stanford)

Social Network Analysis (Coursera / University of Michigan)

Social and Information Network Analysis (Stanford)

Graphs and Networks (Yale)

Spectral Graph Theory (Yale)

The Structure of Social Data (Stanford)

**Resources**

LaTeX (general) and TeXShop (Mac)

Matlab license for CU staff (includes student employees)

Mathematica license for CU students

NumPy/SciPy libraries for Python (similar to Matlab)

GNU Octave (similar to Matlab)

Wolfram Alpha (Web interface for simple integration and differentiation)

Introduction to the Modeling and Analysis of Complex Systems, by Hiroki Sayama (free online textbook)

**Things Worth Reading**

Everything you wanted to know about Data Analysis and Fitting but were afraid to ask, by Peter Young

Machine Learning, Statistical Inference and Induction Notebook (by Cosma Shalizi)

Power Law distributions, etc. Notebook (by Cosma Shalizi)

Statistics Done Wrong, The woefully complete guide (by Alex Reinhart)

Some Advice on Process for
[Research Projects]