Random Graph Models

Intro

In the last post we learned about the MLE and applied it to a simple Gaussian model. Here we will introduce two fundamental random graph models and apply the MLE method to these models.

Erdős–Rényi Model

Random graph models define a distribution over the ways a collection of nodes and edges can be arranged into a graph. One of the simplest random graph models is the Erdős–Rényi (ER) model. There are two main ways that graph theorists and network scientists define the ER model: one fixes the number of nodes, \(n\), and the number of edges, \(m\), and treats every such configuration as equally probable; this is denoted \(G\left(n, m\right)\). The other fixes the number of nodes but, instead of fixing the number of edges, fixes the probability \(p\) that any given pair of nodes is connected by an edge; this is denoted \(G\left(n, p\right)\). For the purposes of doing statistical inference we will focus only on the latter; more information about these models can be found here.

The \(G\left(n,p\right)\) model is perhaps the most well studied of random graph models. It was first studied by Solomonoff and Rapoport [4], but is most famously associated with the work of Paul Erdős and Alfréd Rényi, and is thus generally referred to as the Erdős–Rényi model [3]. As mentioned previously, this model is parameterized by the number of nodes \(n\) and a fixed edge probability \(p\), defining an ensemble of simple networks. The probability of a particular network with adjacency matrix \(A\) is given by

$$ \begin{align} P\left(A| n, p\right) &= \prod\limits_{u<v} p^{a_{uv}}\left(1-p\right)^{1-a_{uv}} \tag{1}\\ &= p^{m}\left(1-p\right)^{{n \choose 2}-m}. \end{align} $$

The model does not fix the number of edges, only the probability of each edge, so networks drawn from \(G(n,p)\) can differ in their number of edges. However, the expected number of edges is determined by \(n\) and \(p\) and can be calculated. Each factor in Equation (1) is a Bernoulli distribution with probability \(p\), meaning that the expected value of an edge between a pair of nodes, \(\langle a_{uv}\rangle\), is simply \(p\) itself. So, $$ \begin{align} \langle m \rangle = \Biggl \langle \sum\limits_{u < v} a_{uv} \Biggr \rangle = \sum\limits_{u < v} \langle a_{uv} \rangle = {n \choose 2} p. \end{align} $$
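For example, with \(n = 6\) and \(p = 0.6\) (the values used in the code below), the expected number of edges is \({6 \choose 2}\times 0.6 = 15 \times 0.6 = 9\), even though any single draw may contain more or fewer edges.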

To get a better feel for this model, let's code up a naïve random graph generator using Python and graph-tool.

import graph_tool.all as gt
import numpy as np

def naive_erd_ren(n, p):

    # Start with an empty undirected graph on n vertices
    erd_ren = gt.Graph(directed=False)
    erd_ren.add_vertex(n)

    # Visit each pair of distinct vertices once and place an edge with probability p
    for i in erd_ren.get_vertices():
        for j in erd_ren.get_vertices():
            if i < j:
                if np.random.rand() < p:
                    erd_ren.add_edge(i, j)

    return erd_ren
naive_erd_ren(n=6, p=0.6)
[Figure: a small graph sampled from naive_erd_ren(n=6, p=0.6)]

This implementation isn’t the fastest, but it illustrates nicely how graphs are drawn from this model. You can look at the code in this Jupyter notebook if you are interested in other implementations.
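As a quick sanity check (a small experiment of our own, not something required by the model), we can sample many graphs from this naïve generator and confirm that the average number of edges comes out close to \({n \choose 2}p\):

# Sample many small graphs and compare the mean edge count to the expected value
n, p, samples = 6, 0.6, 1000
edge_counts = [naive_erd_ren(n, p).num_edges() for _ in range(samples)]
print(np.mean(edge_counts))    # should be close to the expected value below
print(n * (n - 1) / 2 * p)     # <m> = C(n, 2) * p = 9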

Erdős–Rényi MLE

Now that we’ve gotten our hands dirty with the ER model, let’s do some MLE. Suppose you are analyzing a network with \(n\) nodes and \(m\) edges, and you’d like to know which value of \(p\) maximizes the probability that the network was generated by the ER model.

Using Bayes’ rule we can express the probability of \(p\) conditional on \(A\) as: $$ P\left(p|A\right)=\frac{P\left(A|p\right)P\left(p\right)}{P\left(A\right)} $$

We can calculate our best estimate for \(p\) by maximizing the log-posterior: $$ \log P\left(p|A\right)=\log P\left(A|p\right) + \log P\left(p\right) - \log P\left(A\right) $$

As we argued in the previous post, \(P\left(A\right)\) is a constant because \(A\) is our observed data. Additionally, since we assume no prior knowledge of how \(p\) is generated, \(P\left(p\right)\) is also a constant (all values of \(p\) are equally likely), so both of these terms disappear in the optimization.

$$ \begin{align} \frac{d}{dp}\log P\left(p|A\right) &= \frac{d}{dp}\left(m\log p + \left[{n\choose 2}-m\right]\log\left(1-p\right)\right)\\ &= \frac{m}{p} -\left[{n\choose 2}-m\right]\frac{1}{1-p} = 0\\ &\Rightarrow \hat{p} = \boxed{\frac{m}{{n \choose 2}}} \end{align} $$

The quantity \(\frac{m}{{n \choose 2}}\) is the fraction of node pairs connected by an edge, so it is unsurprising that this is what maximizes the posterior.
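In code, the estimate is a one-liner. The sketch below (the helper name er_mle is ours) assumes we have a graph-tool Graph at hand:

def er_mle(g):
    # Maximum-likelihood estimate of p for the ER model: m / C(n, 2)
    n = g.num_vertices()
    m = g.num_edges()
    return m / (n * (n - 1) / 2)

# For example, estimate p from a graph sampled with the naive generator above
g = naive_erd_ren(n=100, p=0.3)
print(er_mle(g))  # should land close to 0.3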

It’s important to consider that many real-world networks do not fit the ER model particularly well. One reason for this is that in the large network limit, the degree distribution of an ER graph is Poisson. This can be shown with the following line of reasoning: for an arbitrary node \(u\), the probability of connecting to any one of the other \(n-1\) nodes is given by \(p\). Thus, the probability that it is connected to one particular set of exactly \(k\) nodes is given by \(p^k\left(1-p\right)^{n - 1 - k}\). If we want to know the probability of connecting to any arbitrary set of \(k\) nodes, we need to count the number of ways of choosing \(k\) nodes from \(n - 1\), which is given by \({n-1 \choose k}\). Thus, the total probability of having degree \(k\) is given by $$ \begin{align} P\left(d_u = k \,\middle|\, n, p\right) = {n-1 \choose k} p^k\left(1-p\right)^{n - 1 - k}, \end{align} $$ meaning that the ER model produces networks with binomially distributed degrees. In fact, one usually looks at networks where \(n\) is assumed to be large. If we write \(p = c/(n - 1)\), where \(c\) is the mean degree, then a routine calculation shows that $$ \begin{align} \lim_{n \rightarrow \infty} P\left(d_u = k \,\middle|\, n, p\right) = \frac{e^{-c}c^k}{k!}. \end{align} $$
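To see this convergence numerically (a quick experiment, not part of the derivation), we can sample one moderately sized ER graph and compare its empirical degree distribution with the Poisson limit; the snippet uses scipy only for the Poisson pmf:

from scipy.stats import poisson

# n is kept modest because naive_erd_ren is O(n^2) and therefore slow
n, c = 2000, 5.0
g = naive_erd_ren(n, p=c / (n - 1))

degrees = g.get_out_degrees(g.get_vertices())
ks = np.arange(16)
empirical = np.array([(degrees == k).mean() for k in ks])
print(np.round(empirical, 3))           # empirical degree distribution
print(np.round(poisson.pmf(ks, c), 3))  # Poisson limit; the two rows should roughly agree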

If we assume that the large networks we measure empirically can be well modeled by an ER random graph, then we should expect node degrees to be Poisson distributed, sharply peaked around the mean \(c=(n-1)p\). However, many empirical networks actually display heavy-tailed degree distributions and are therefore not adequately captured by a Poisson distribution. This implies that the ER model is not a very realistic model of real-world networks, limiting its utility as an analytical tool. This may seem obvious from the definition (one normally does not make friends by drawing the name of a random person on earth out of a hat), but now we have some theoretical justification for why this is. Still, it’s a useful model to experiment with due to its simplicity, and it serves as a good illustrative first step for understanding new techniques in network science.

Chung-Lu Model

The Chung-Lu model [2] is an ensemble of random graphs in which one specifies an expected degree sequence, \(\{\theta_1, \theta_2,\dots, \theta_n\}\), giving the expected degree of each vertex; the model allows both self-edges and multiedges. Writing \(2m = \sum_u \theta_u\) for the sum of the expected degrees, the probability of an edge is given by $$ p_{uv} = \begin{cases} \dfrac{\theta_u\theta_v}{2m}, &u\neq v \\[4pt] \dfrac{\theta_u^{2}}{4m}, &u=v. \end{cases} $$ This is in contrast to the ER model, where the network necessarily has a Poisson degree distribution (in the large-\(n\) limit); in this model one specifies the expected degree of each node and can therefore obtain any arbitrary degree distribution.

Let’s get a feel for how this model works:

def chung_lu(exp_degree_seq):

    # Twice the expected number of edges: the expected degrees must sum to 2m
    two_m = exp_degree_seq.sum()
    assert two_m % 2 == 0, 'Expected degree sequence must sum to an even number'

    cl_g = gt.Graph(directed=False)
    cl_g.add_vertex(len(exp_degree_seq))

    for i in cl_g.get_vertices():
        for j in cl_g.get_vertices():
            if i < j:
                # Edge between two distinct nodes with probability theta_i * theta_j / 2m
                p = exp_degree_seq[i] * exp_degree_seq[j] / two_m
                if np.random.rand() < p:
                    cl_g.add_edge(i, j)
            elif i == j:
                # Self-edge with probability theta_i^2 / 4m
                p = exp_degree_seq[i] * exp_degree_seq[i] / (2 * two_m)
                if np.random.rand() < p:
                    cl_g.add_edge(i, j)

    return cl_g

The assert statement in the code is important because if the sum of the expected degrees were odd, then there would be at least one node with an edge that leads to nowhere, sometimes called a stub.
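For instance (a throwaway call just to see the check fire), a sequence that sums to an odd number trips the assertion:

mean_deg_seq = np.array([1, 2, 2])  # sums to 5, which is odd
chung_lu(mean_deg_seq)              # raises AssertionError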

mean_deg_seq = np.array([4,4,4,4,4,4])
chung_lu(mean_deg_seq)
[Figure: graph sampled from chung_lu with every expected degree equal to 4]
mean_deg_seq = np.random.randint(0, 6, size=(10,))
mean_deg_seq
# expected degrees of the nodes (this particular draw happens to sum to an even number)
# array([5, 0, 2, 2, 2, 5, 3, 3, 3, 5])

chung_lu(mean_deg_seq)
[Figure: graph sampled from chung_lu with the random expected degree sequence above]

Notice that the sequence specifies the expected degrees of the nodes, not their actual degrees (fixing the actual degrees would give a related but different model: the configuration model).
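We can check this interpretation directly with a small experiment of our own: sample many graphs from the same expected degree sequence and compare each node's average realized degree to its \(\theta_u\). The helper below is introduced only so that a self-edge counts as two edge endpoints, matching the convention used in the derivation that follows.

def degree_with_self_loops(g):
    # Count edge endpoints per node, so a self-loop adds 2 to its node's degree
    deg = np.zeros(g.num_vertices())
    for e in g.edges():
        deg[int(e.source())] += 1
        deg[int(e.target())] += 1
    return deg

mean_deg_seq = np.array([4, 4, 4, 4, 4, 4])
avg_deg = np.mean([degree_with_self_loops(chung_lu(mean_deg_seq))
                   for _ in range(2000)], axis=0)
print(avg_deg)  # each entry should be close to the corresponding theta_u = 4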

Chung-Lu Model MLE

Now that we know how this model works, let’s do some statistical inference. Consider a graph defined by its adjacency matrix \(A\). The degree of each node in the graph is given by: $$ k_u = \sum\limits_{v = 1}^n A_{uv}, $$ and so such a graph has degree sequence \(\{k_1, k_2, \cdots, k_n\}\). For a node \(u\) with degree \(k_u\), the probability that it connects to a node \(v\) with degree \(k_v\) is \(k_u\cdot\frac{k_v}{2m-1}\). To understand why, recall that the sum of the degrees is \(2m\) (the number of edge endpoints), where \(m\) is the number of edges; the probability of connecting the two nodes is then the number of possible ways of placing an edge between them, \(k_u k_v\), divided by the number of available endpoints, \(2m-1\) (the total number of endpoints minus the one in question). For a sufficiently large network, it’s safe to make the approximation \((2m-1) \rightarrow 2m\). For self-edges, a similar line of thinking leads to a probability of \(p_{uu} = \frac{k_u^2}{4m}\). So, assuming Bernoulli trials for edge placement, we can write the probability of \(A\) conditional on the degree sequence \(\{k_u\} = \{k_1, k_2,\dots, k_n\}\) as: $$ P\left(A|\{k_u\}\right) = \prod\limits_{u<v}\left(\frac{k_uk_v}{2m}\right)^{A_{uv}}\left(1-\frac{k_uk_v}{2m}\right)^{1-A_{uv}}\times \prod\limits_u \left(\frac{k_u^{2}}{4m}\right)^{A_{uu}/2}\left(1-\frac{k_u^2}{4m}\right)^{1-A_{uu}/2}. $$ This is, in fact, (an approximation to) the configuration model [1], but if we relax the requirement of specifying the degree of each vertex exactly and instead introduce a parameter \(\theta_u\) for each vertex that controls its expected degree, then in the large, sparse network limit the number of edges between each pair of nodes becomes Poisson distributed and we recover the Chung-Lu model; the expression can then be rewritten as: $$ P\left(A\,\middle|\,\{\theta_u\}\right) = \prod\limits_{u<v}\frac{\left(\theta_u\theta_v\right)^{A_{uv}}}{A_{uv}!}e^{-\theta_u \theta_v}\times \prod\limits_u \frac{\left(\theta_u^{2}/2\right)^{A_{uu}/2}}{\left(A_{uu}/2\right)!}e^{-\theta_u^2/2}. $$
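As a rough sketch (the function name and the use of scipy.special.gammaln for the log-factorials are our own choices, not part of the model), the Poisson likelihood above can be evaluated for a dense adjacency matrix \(A\) and a parameter vector \(\{\theta_u\}\) as follows:

from scipy.special import gammaln

def chung_lu_log_likelihood(A, theta):
    # log P(A | {theta_u}) under the Poisson formulation above; assumes all theta_u > 0
    A = np.asarray(A, dtype=float)
    lam = np.outer(theta, theta)           # lambda_uv = theta_u * theta_v
    iu = np.triu_indices(len(theta), k=1)  # node pairs with u < v

    # Off-diagonal terms: A_uv ~ Poisson(theta_u * theta_v)
    off_diag = np.sum(A[iu] * np.log(lam[iu]) - gammaln(A[iu] + 1) - lam[iu])

    # Diagonal terms: A_uu / 2 ~ Poisson(theta_u^2 / 2)
    half_diag = np.diag(A) / 2
    diag = np.sum(half_diag * np.log(np.diag(lam) / 2)
                  - gammaln(half_diag + 1) - np.diag(lam) / 2)

    return off_diag + diag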

The MLE question is now: which \(\{\theta_u\}\) maximizes the posterior, giving the best fit of the Chung-Lu model to the data? As before, we assume a uniform prior on the \(\theta_u\) (all \(\theta\) are equally likely) and \(A\) is our observed data, so \(P\left(A\right)\) and \(P\left(\{\theta_u\}\right)\) are constants and we need only consider: $$ P\left(\{\theta_u\}| A\right) \propto P\left(A|\{\theta_u\}\right). $$ Let \(\mathscr{L} = \log P\left(A|\{\theta_u\}\right)\), so the expression we need to optimize is

$$ \begin{align*} \mathscr{L} &= \log\left[\prod\limits_{u<v}^{n}\frac{\left(\theta_u \theta_v\right)^{A_{uv}}}{{A_{uv}}!}e^{-\theta_u \theta_v} \times \prod\limits_{u}^{n}\frac{\left(\frac{1}{2}\theta_u^2\right)^{\frac{1}{2}A_{uu}}}{{\left[\frac{1}{2}A_{uu}\right]}!}e^{-\frac{1}{2}\theta_u^2}\right]\\ &= \sum\limits_{u<v}^n \left[A_{uv}\log\theta_u \theta_v - \log A_{uv}! -\theta_u \theta_v\right] +\frac{1}{2}\sum\limits_{u}^n \left[A_{uu}\log\theta_u^2 + A_{uu}\log\tfrac{1}{2} - 2\log\left[\left(\tfrac{1}{2}A_{uu}\right)!\right] -\theta_{u}^2\right]\\ &= \frac{1}{2}\sum\limits_{u\neq v}^n \left[A_{uv}\log\theta_u \theta_v -\theta_u \theta_v\right] + \frac{1}{2}\sum\limits_{u}^n \left[A_{uu}\log\theta_u^2 -\theta_u^2\right] + \text{const}, \end{align*} $$ where the last line was obtained by letting the indices range over all ordered pairs of nodes and correcting for double counting, and where every term that does not depend on \(\theta\) has been absorbed into the constant. Combining both sums (note that \(\theta_u^2 = \theta_u\theta_u\)), we arrive at $$ \mathscr{L} = \frac{1}{2}\sum\limits_{uv}^n \left(A_{uv}\log\theta_u \theta_v -\theta_u\theta_v\right) + \text{const}. $$

We can simplify further by splitting the \(\log\), using the symmetry of \(A\) (so that the two resulting sums are equal), dropping the additive constant, and noticing that \(\sum\limits_{uv}\theta_u\theta_v = \left(\sum_v\theta_v\right)^2\):

$$ \mathscr{L} = \sum\limits_{uv}A_{uv}\log{\theta_u} - \frac{1}{2}\left(\sum\limits_{v}\theta_v\right)^2. $$ Now the way to optimize this expression is to choose a particular node \(u = l\) and set the derivative with respect to \(\theta_l\) to zero: $$ \begin{align*} \frac{\partial \mathscr{L}}{\partial \theta_l} &= \frac{\partial }{\partial \theta_l} \left(\sum\limits_{uv}A_{uv}\log{\theta_u} - \frac{1}{2}\left(\sum\limits_{v}\theta_v\right)^2\right) = \sum\limits_{v}A_{lv}\frac{1}{\theta_l} - \sum\limits_{v}\theta_v = \frac{k_l}{\theta_l} - \sum\limits_{v}\theta_v = 0 \\ \implies k_l &= \theta_l \sum\limits_{v}\theta_v. \end{align*} $$ Since this holds for every node, we can relabel \(l \rightarrow v\) and use properties of the degree sequence to eliminate the remaining sum over \(\theta\):

$$ \begin{align*} k_v &= \theta_v \sum\limits_l\theta_l \tag{2}\\ \sum\limits_v k_v &= \underbrace{\left(\sum\limits_v \theta_v\right)\left(\sum\limits_l \theta_l\right)}_\textrm{same sum}\\ 2m &= \left(\sum\limits_l \theta_l\right)^2\\ \implies \sum\limits_l \theta_l &= \sqrt{2m} \tag{3} \end{align*} $$ where we summed Equation (2) over all nodes and used \(\sum_v k_v = 2m\). Substituting Equation (3) into Equation (2), $$ \begin{align} k_v &= \theta_v \sqrt{2m} \implies \boxed{\hat{\theta}_v = \frac{k_v}{\sqrt{2m}}},\\ \hat{\theta}_u\hat{\theta}_v &= \frac{k_u}{\sqrt{2m}}\frac{k_v}{\sqrt{2m}} = \frac{k_uk_v}{2m}. \end{align} $$ It’s clear from this expression that the \(\{\theta_u\}\) maximizing the posterior are determined by the degrees of the nodes themselves, and the resulting edge probabilities \(\hat{\theta}_u\hat{\theta}_v = k_uk_v/2m\) are exactly those of the configuration model approximation above.
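In code, the estimate is again simple; this sketch (with our own helper name) uses the adjacency-matrix convention above, in which a self-loop contributes \(A_{uu} = 2\):

def chung_lu_mle(A):
    # MLE of the Chung-Lu parameters: theta_hat_u = k_u / sqrt(2m)
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)    # k_u = sum_v A_uv
    two_m = degrees.sum()      # the degrees sum to 2m
    return degrees / np.sqrt(two_m)

Plugging chung_lu_mle(A) into the chung_lu_log_likelihood sketch above should give a value at least as large as that of any other choice of \(\{\theta_u\}\).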

Conclusion

This last derivation was quite involved, but as mentioned before, doing these calculations on simpler network models is good practice for when we need to work with more sophisticated ones. Understanding these calculations will be very helpful when we eventually discuss community structure and methods for measuring this network property.

References

  1. B. Bollobás, A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics 1, 311–316 (1980)

  2. F. Chung, L. Lu, Connected components in random graphs with given expected degree sequences. Annals of Combinatorics 6, 125–145 (2002)

  3. P. Erdős, A. Rényi, On random graphs. Publicationes Mathematicae Debrecen 6, 290–297 (1959)

  4. R. Solomonoff, A. Rapoport, Connectivity of random nets. Bulletin of Mathematical Biophysics 13, 107–117 (1951)