From social networks to neuroscience, researchers across all disciplines now face the challenge of adapting statistical methods to vast, high-dimensional arrays of data: how can we extract patterns and make sense of such large arrays of numbers? In this setting, graphs have become ubiquitous by offering a versatile modeling framework in which data points are represented as nodes, while edges capture various aspects of the underlying organization of the data. These edges typically encode some flexible notion of proximity, ranging from affinities between users and products in recommendation networks to causal links in directed acyclic graphs or neuronal coactivation patterns in brain datasets. Whereas most of the current literature has focused on using graphs to learn node-level properties (community detection, link prediction, etc.), in a variety of applications the object of interest is the graph itself. In brain connectomics, for instance (one of the areas of application at the center of this thesis), the focus is on understanding the functional and anatomical “wiring” of the brain and its association with cognitive processes and psychiatric diseases. This requires extending traditional statistical notions (mean, variance, etc.) to the graph setting; such extensions are currently missing, albeit crucial, building blocks for principled inference on complex systems.
This PhD thesis focuses on developing methodological tools for extending statistical inference and uncertainty quantification to graph-structured data, whether these graphs are observed or latent. We motivate this problem through its application to the analysis of fMRI data. Chapter 1 describes some of the properties of this extremely rich data source, as well as the many challenges (low signal-to-noise ratio, scalability, multi-resolution behavior, non-stationarity, etc.) posed by its analysis and its representation as a graph. Building on this overview of the challenges that arise when working with real-life graph-structured data, we organize this thesis around the following three themes:
- Inference and variability quantification on observed graphs: Starting with the setting where the graphs are observed and aligned, the first question we tackle is how to quantify their similarity. We divide this task along two axes of variation:
(a) Horizontal: comparing multiple graphs. The first main block of our work (Chapter 2) focuses on defining an appropriate distance for comparing and contrasting aligned networks.
(b) Vertical: comparing multiscale representations of graphs. In some instances, comparing coarsened representations of graphs can be more informative than comparing the original ones. In Chapter 3, we thus turn to the extraction of robust multiscale representations of graphs, which we address by adapting convex clustering.
- Inference for latent graphs: The second main part of our work tackles the case where the graphs are unobserved and need to be simultaneously inferred and contrasted. In particular, Chapter 4 centers on the extraction of reliable brain connectivity networks through the lens of Bayesian Independent Component Analysis, an approach that allows the flexible integration of multiple sources of information while providing Bayesian uncertainty estimates.
- Inference with graphs: Finally, Chapter 5 extends our discussion to the analysis of data and signals on graphs, rather than of the graphs themselves. Indeed, in a number of settings, the organization of a complex system as a graph is crucial to understanding its behavior. In epidemiological studies, for instance, social networks have been shown to influence the outcome of an epidemic, its propagation speed, and the variability in the transmission rate. In this context, it becomes essential to impute characteristics of the network structure and integrate them into the analysis. We focus here on accounting for the potential heterogeneity of the contact network in predictive scenarios for epidemics.
Throughout, rather than proposing a holistic approach, we strive to tackle and motivate each of these aspects through the lens of its application to a real, concrete dataset. While most of our motivation comes from fMRI analysis, we emphasize that the use of graphs transcends neuroscience and extends to many biomedical applications, as we illustrate in Chapter 2 by considering microbial networks, and in Chapter 5 by highlighting the usefulness of graphs as a modeling tool in the study of contagion and infectious diseases.
Recommended citation: Donnat, Claire. “Uncertainty Quantification in Networks with Applications to Brain Connectomics.” PhD diss., Stanford University, 2020.