The Degree Corrected Stochastic Block Model

Intro

This model is a variation on the Stochastic Block Model (SBM) discussed in the previous post. It was introduced by Karrer and Newman to address some of the shortcomings of the standard SBM. Recall that the SBM was characterized by a Bernoulli distribution of edge placements between nodes. This is essentially a set of Erdős–Rényi (ER) random graphs joined together, and in fact, when the probability of an edge is the same within and between all groups, the SBM reduces precisely to the ER model. This means that the standard SBM suffers from the same problem the ER model does: in the large-network limit, the degrees follow a Poisson distribution. What Karrer and Newman have done is modify the SBM so that, like the Chung-Lu model, it can accommodate an arbitrary degree distribution in expectation.

Definition

Just like the standard SBM, this model takes \(n\) nodes and divides them into \(k\) disjoint groups. The difference is that instead of basing the probability of an edge between two nodes solely on the groups to which they belong, we now introduce a new parameter \(\theta_i\) to modify the probability. These \(\{\theta_i\}\) serve the same purpose as they do in the Chung-Lu model: they control the expected degree of node \(i\). So in the Degree Corrected SBM, the probability of an edge between two nodes becomes \(\theta_i \theta_j \Omega_{rs}\) (I will show this below).

Building the DC-Stochastic Block Model

example-graph

$$ A = \begin{pmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} $$

Let’s start with the same example as before. In fact, we will carry over most of the notation too. Recall the group assignment matrix

$$ \begin{align} \Gamma_{ir} = \begin{cases} 1 & \text{if node } i \text{ in group } r\\ 0 & \text{otherwise} \end{cases} \end{align} $$

the edge-adjacency matrix

$$ \begin{align} m_{rs} = \begin{cases} \text{number of edges between groups } r \text{ and } s & r \neq s\\ 2 \times \text{number of edges within group } r & r=s \end{cases} \end{align} $$ and the probability matrix $$ \Omega_{rs}= \begin{pmatrix} \Omega_{11} &\cdots &\Omega_{1k}\\ \vdots & \ddots &\vdots\\ \Omega_{k1} &\cdots &\Omega_{kk} \end{pmatrix} $$

Recall also that \(k_i = \sum\limits_jA_{ij}\) is the degree of vertex \(i\). Using the definition of the group assignment matrix, we can write the equation for the sum of the degrees in group \(r\) as

$$ \begin{align} \Delta_r = \sum\limits_i\Gamma_{ir}k_i = \sum\limits_{ij}\Gamma_{ir}A_{ij} \end{align} $$ Also note that transforming the quantity \(\Delta_r\) by the group assignment matrix gives us $$ \begin{align} \kappa_i = \sum\limits_{r}\Gamma_{ir}\Delta_r \end{align} $$ Where we can interpret each \(\kappa_i\) as being the sum of the degrees in the group to which vertex \(i\) belongs.
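These bookkeeping quantities are easy to check numerically. Here is a short NumPy sketch (my own illustration, not from the Karrer–Newman paper) using the five-node example graph above, with nodes 1–3 in group 1 and nodes 4–5 in group 2:

```python
import numpy as np

# Adjacency matrix of the five-node example above
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])

# Group assignment matrix Gamma: nodes 1-3 in group 1, nodes 4-5 in group 2
Gamma = np.array([[1, 0],
                  [1, 0],
                  [1, 0],
                  [0, 1],
                  [0, 1]])

k = A.sum(axis=1)      # degrees k_i                  -> [2 2 3 2 1]
Delta = Gamma.T @ k    # group degree sums Delta_r    -> [7 3]
kappa = Gamma @ Delta  # degree sum of i's own group  -> [7 7 7 3 3]
```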

I will now introduce a new matrix incorporating the \(\{\theta_i\}\). We denote $$ \begin{align*} \operatorname{diag}\left(\theta\right) = \begin{pmatrix} \theta_1 & 0 & 0\\ 0 & \ddots & 0 \\ 0 & 0 & \theta_n \end{pmatrix} \end{align*} $$

to be a diagonal matrix with the \(\{\theta_i\}\) on the diagonal.

In the Degree Corrected Stochastic Block Model, the expanded probability matrix, \(\Lambda_{ij}\), has the form \(\Lambda_{ij} = \operatorname{diag}\left(\theta\right)\Gamma\Omega\Gamma^{\text{T}}\operatorname{diag}\left(\theta\right)\) $$ \begin{align*} \Lambda_{ij} &= \begin{pmatrix} \theta_1 & 0 & 0 & 0 & 0 \\ 0 & \theta_2 & 0 & 0 & 0 \\ 0 & 0 & \theta_3 & 0 & 0 \\ 0 & 0 & 0 & \theta_4 & 0 \\ 0 & 0 & 0 & 0 & \theta_5 \end{pmatrix}
\begin{pmatrix} \Omega_{11} & \Omega_{11} & \Omega_{11} & \Omega_{12} & \Omega_{12} \\ \Omega_{11} & \Omega_{11} & \Omega_{11} & \Omega_{12} & \Omega_{12} \\ \Omega_{11} & \Omega_{11} & \Omega_{11} & \Omega_{12} & \Omega_{12} \\ \Omega_{12} & \Omega_{12} & \Omega_{12} & \Omega_{22} & \Omega_{22} \\ \Omega_{12} & \Omega_{12} & \Omega_{12} & \Omega_{22} & \Omega_{22} \end{pmatrix} \begin{pmatrix} \theta_1 & 0 & 0 & 0 & 0 \\ 0 & \theta_2 & 0 & 0 & 0 \\ 0 & 0 & \theta_3 & 0 & 0 \\ 0 & 0 & 0 & \theta_4 & 0 \\ 0 & 0 & 0 & 0 & \theta_5 \end{pmatrix}\\ &= \begin{pmatrix} \color{blue}{\theta_1^2\Omega_{11}} & \color{blue}{\theta_1\theta_2\Omega_{11}} & \color{blue}{\theta_1\theta_3\Omega_{11}} & \theta_1\theta_4\Omega_{12} & \theta_1\theta_5\Omega_{12}\\ \color{blue}{\theta_2\theta_1\Omega_{11}} & \color{blue}{\theta_2^2\Omega_{11}} & \color{blue}{\theta_2\theta_3\Omega_{11}} &{\theta_2\theta_4\Omega_{12}} & {\theta_2\theta_5\Omega_{12}} \\ \color{blue}{\theta_3\theta_1\Omega_{11}} & \color{blue}{\theta_3\theta_2\Omega_{11}} & \color{blue}{\theta_3^2\Omega_{11}} &{\theta_3\theta_4\Omega_{12}} & {\theta_3\theta_5\Omega_{12}} \\ {\theta_4\theta_1\Omega_{12}} & {\theta_4\theta_2\Omega_{12}} & {\theta_4\theta_3\Omega_{12}} &\color{red}{\theta_4^2\Omega_{22}} & \color{red}{\theta_4\theta_5\Omega_{22}} \\ {\theta_5\theta_1\Omega_{12}} & {\theta_5\theta_2\Omega_{12}} & {\theta_5\theta_3\Omega_{12}} &\color{red}{\theta_5\theta_4 \Omega_{22}} & \color{red}{\theta_5^2\Omega_{22}} \end{pmatrix} \end{align*} $$
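A quick NumPy check of this factorization, with made-up values for \(\theta\) and \(\Omega\) (only the block structure matters here):

```python
import numpy as np

# Same grouping as the example: nodes 1-3 in group 1, nodes 4-5 in group 2
Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])

# Illustrative parameter values, not fitted to anything
theta = np.array([0.3, 0.3, 0.4, 0.6, 0.4])  # sums to 1 within each group
Omega = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

# Lambda = diag(theta) Gamma Omega Gamma^T diag(theta)
Lambda_ = np.diag(theta) @ Gamma @ Omega @ Gamma.T @ np.diag(theta)

# Entry (i, j) equals theta_i * theta_j * Omega_{rs} for i in group r, j in group s,
# e.g. Lambda_[0, 3] == 0.3 * 0.6 * Omega[0, 1]
```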

As in the SBM, the entries \(\Lambda_{ij}\) give the edge probability for each individual pair of nodes, this time weighted by the expected degree of each node. Unlike in the standard SBM, pairs of nodes in the same group no longer share the same probability, as the degree parameters now make an edge more (or less) likely.

Lastly, it is important to note that with the addition of this parameter, we have made the model unidentifiable. This is because if we scale each \(\theta_i\) in some group \(r\) by a factor \(\gamma_r\) and correspondingly reduce \(\Omega_{rs}\) by a factor of \(\gamma_r\gamma_s\), the \(\Lambda_{ij}\) remain the same. So in order to completely specify the model, we will impose a constraint on the \(\theta_i\)

$$ \begin{align} \sum\limits_{i}\Gamma_{ir}\theta_i = 1 \label{constr} \end{align} $$
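In code, this constraint amounts to dividing each \(\theta_i\) by the total over its own group. A small sketch (the raw values here are arbitrary, chosen only for illustration):

```python
import numpy as np

Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])
theta_raw = np.array([2.0, 2.0, 3.0, 2.0, 1.0])  # arbitrary positive values

# Divide each theta_i by the total within its own group so that
# sum_i Gamma_{ir} theta_i = 1 for every group r
theta = theta_raw / (Gamma @ (Gamma.T @ theta_raw))

print(Gamma.T @ theta)  # -> [1. 1.]
```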

DC-Stochastic Block Model Equation

As before, we can write the probability of observing a network \(G\) with adjacency matrix \(A\), this time conditional on \(\Omega\), \(\Gamma\), and \(\{\theta_i\}\) in the Poisson limit $$ \begin{align*} P\left(A| \Omega, \Gamma, \{\theta_i\}\right) = \prod\limits_{i<j}^{n}\frac{\left(\Lambda_{ij}\right)^{A_{ij}}}{{A_{ij}}!}e^{-\Lambda_{ij}} \times \prod\limits_{i}^{n}\frac{\left(\frac{1}{2}\Lambda_{ii}\right)^{\frac{1}{2}A_{ii}}}{{(\frac{1}{2}A_{ii}})! }e^{-\frac{1}{2}\Lambda_{ii}} \end{align*} $$
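This Poisson form also tells us how to generate a network: draw each off-diagonal \(A_{ij}\) from \(\text{Poisson}(\Lambda_{ij})\), and each diagonal entry as twice a \(\text{Poisson}(\Lambda_{ii}/2)\) draw, since \(A_{ii}\) counts each self-loop twice. A minimal sampler sketch of my own (parameter values again made up):

```python
import numpy as np

def sample_dcsbm(theta, Omega, Gamma, rng):
    """Draw a (multi)graph adjacency matrix from the Poisson DC-SBM."""
    Lambda_ = np.diag(theta) @ Gamma @ Omega @ Gamma.T @ np.diag(theta)
    n = len(theta)
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        # A_ii counts each self-loop twice, hence the factor of 2
        A[i, i] = 2 * rng.poisson(Lambda_[i, i] / 2)
        for j in range(i + 1, n):
            A[i, j] = A[j, i] = rng.poisson(Lambda_[i, j])
    return A

rng = np.random.default_rng(42)
Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])
theta = np.array([0.3, 0.3, 0.4, 0.6, 0.4])
Omega = np.array([[6.0, 1.0],
                  [1.0, 4.0]])
A = sample_dcsbm(theta, Omega, Gamma, rng)
```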

DC-Stochastic Block Model MAP

Again, by Bayes’ rule and assuming uniform priors on \(\Omega_{rs}\), \(\Gamma_{ir}\), and \(\{\theta_i\}\), the log likelihood can be expressed as $$ \begin{align*} P(\Omega, \Gamma, \vec{\theta}| A) &\propto P(A| \Omega, \Gamma, \vec{\theta})\\ \mathscr{L} &= \frac{1}{2}\sum\limits_{ij}^n \left(A_{ij}\log\Lambda_{ij} -\Lambda_{ij}\right) \end{align*} $$

The sum can be expanded as

$$ \begin{align} \sum\limits_{ij}A_{ij}\log\Lambda_{ij} &= \sum\limits_{ij}A_{ij}\log\left(\theta_i \theta_j\right) + \sum\limits_{ij}A_{ij}\log\left(\sum\limits_{rs}\Gamma_{ir}\Omega_{rs}\Gamma_{sj}\right)\\ &= \underbrace{\sum\limits_{ij}A_{ij}\log\left(\theta_i\right) + \sum\limits_{ij}A_{ij}\log\left(\theta_j\right)}_{\text{the same sum}} + \sum\limits_{rs}m_{rs}\log\Omega_{rs}\\ &=2\sum\limits_{i}k_i\log\left(\theta_i\right) + \sum\limits_{rs}m_{rs}\log\Omega_{rs} \end{align} $$

Where I have used the fact that \(k_i = \sum\limits_{j}A_{ij}\) and just like in the Stochastic Block Model, \(\sum\limits_{ij}A_{ij}\log\left[\sum\limits_{rs}\Gamma_{ir}\Omega_{rs}\Gamma_{sj}\right]=\sum\limits_{rs}m_{rs}\log\Omega_{rs}\).
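This identity is easy to verify numerically on the running example. In the NumPy check below (my own sanity check, with made-up positive \(\theta\) and \(\Omega\)), the sum over node pairs on the left matches the degree and group-edge-count sums on the right:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])
theta = np.array([0.3, 0.3, 0.4, 0.6, 0.4])  # illustrative values
Omega = np.array([[4.0, 1.0],
                  [1.0, 2.0]])

Lambda_ = np.diag(theta) @ Gamma @ Omega @ Gamma.T @ np.diag(theta)
k = A.sum(axis=1)        # degrees k_i
m = Gamma.T @ A @ Gamma  # group edge counts m_rs

lhs = np.sum(A * np.log(Lambda_))
rhs = 2 * np.sum(k * np.log(theta)) + np.sum(m * np.log(Omega))
# lhs and rhs agree
```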

The log likelihood can now be re-expressed as

$$ \begin{align*} \mathscr{L} &= \frac{1}{2}\left[2\sum\limits_{i}k_i\log\left(\theta_i\right) + \sum\limits_{rs}m_{rs}\log\Omega_{rs} - \sum\limits_{ij}\Lambda_{ij}\right]\\ &=\frac{1}{2}\left[2\sum\limits_{i}k_i\log\left(\theta_i\right) + \sum\limits_{rs}m_{rs}\log\Omega_{rs} - \sum\limits_{ij}\sum\limits_{rs}\theta_i\theta_j\Gamma_{ir}\Omega_{rs}\Gamma_{sj}\right]\\ &=\frac{1}{2}\left[2\sum\limits_{i}k_i\log\left(\theta_i\right) + \sum\limits_{rs}\left(m_{rs}\log\Omega_{rs}-\Omega_{rs}\right)\right] \end{align*} $$

Where I have used the constraint mentioned earlier to make the model identifiable. We can now optimize for the probabilities by choosing a particular \(\Omega_{\mu\nu}\)

$$ \begin{align*} \frac{\partial\mathscr{L}}{\partial\Omega_{\mu\nu}} &= \frac{m_{\mu\nu}}{\Omega_{\mu\nu}} - 1 = 0\\ \implies \Omega_{\mu\nu} &= m_{\mu\nu} \end{align*} $$ So the optimal probabilities are given by $$ \begin{align*} \hat{\Omega}_{\mu\nu} &= m_{\mu\nu} \label{deg_om} \end{align*} $$
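In matrix form, \(m = \Gamma^{\text{T}} A \Gamma\): its diagonal automatically double-counts within-group edges, matching the definition of \(m_{rs}\). For the running example:

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])

# The optimal Omega is just the group edge-count matrix m_rs
Omega_hat = Gamma.T @ A @ Gamma
# -> [[6, 1],     (three edges within group 1, counted twice)
#     [1, 2]]     (one edge between groups, one edge within group 2)
```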

We can now substitute this back into the log likelihood and optimize for the \(\theta_i\)

$$ \begin{align} \mathscr{L} = \frac{1}{2}\left[2\sum\limits_{i}k_i\log\left(\theta_i\right) + \sum\limits_{rs}m_{rs}\log m_{rs}\right]\tag{$*$} \end{align} $$

Where I have dropped the constant \(\sum\limits_{rs}m_{rs} = 2m\).

To optimize for \(\theta_i\), we summon the method of Lagrange multipliers: using the constraint on \(\theta_i\), we add a factor of “zero” to the log likelihood.

$$ \begin{align*} \mathscr{L} = \frac{1}{2}\left[2\sum\limits_{i}k_i\log\left(\theta_i\right) + \sum\limits_{rs}m_{rs}\log m_{rs} + \lambda\left(\sum\limits_{i}\Gamma_{ir}\theta_i-1\right)\right] \end{align*} $$

Choosing a particular \(\theta_l\), then $$ \begin{align*} \frac{\partial\mathscr{L}}{\partial\theta_l} = \frac{1}{2}\left(2\frac{k_l}{\theta_l} - \lambda\left(\sum\limits_{i}\Gamma_{ir}\right)\right) = 0 \end{align*} $$

Recall that \(\sum\limits_{i}\Gamma_{ir} = n_r\), the number of nodes in group \(r\).

$$ \begin{align*} k_l &= \frac{1}{2}\lambda n_r\theta_l\\ \sum\limits_{r}\Gamma_{lr}\sum\limits_{j}\Gamma_{jr}k_j &=\frac{1}{2}\lambda\sum\limits_{r}\Gamma_{lr}n_r\sum\limits_{j}\Gamma_{jr}\theta_j\\ \kappa_l&=\frac{1}{2}\lambda\sum\limits_{r}\Gamma_{lr}n_r\\ \implies \lambda &= 2\frac{\kappa_l}{\sum\limits_{r}\Gamma_{lr}n_r} \end{align*} $$ Where in the second line I have summed over the group containing node \(l\), and the constraint \(\sum\limits_{j}\Gamma_{jr}\theta_j = 1\) eliminates the \(\theta_j\).

Now substituting this back into the expression for \(k_l\) we arrive at $$ \begin{align*} \require{cancel} \underbrace{\sum\limits_{r}\Gamma_{lr}k_l}_{=k_l} &=\frac{1}{2}\left(2\frac{\kappa_l}{\cancel{\sum\limits_{r}\Gamma_{lr}n_r}}\right)\cancel{\left(\sum\limits_{r}\Gamma_{lr}n_r\right)}\theta_l\\ \implies k_l &=\kappa_l\theta_l \end{align*} $$

So the optimal \(\theta_i\) is given by

$$ \begin{align} \hat{\theta}_i = \frac{k_i}{\kappa_i} \end{align} $$
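Numerically, this is a one-liner. On the example graph, node 3 has degree 3 out of the 7 total in its group, so \(\hat{\theta}_3 = 3/7\):

```python
import numpy as np

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])

k = A.sum(axis=1)
kappa = Gamma @ (Gamma.T @ k)  # degree sum of each node's group
theta_hat = k / kappa          # -> [2/7, 2/7, 3/7, 2/3, 1/3]

# The constraint is satisfied automatically: Gamma.T @ theta_hat == [1, 1]
```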

and now resubstituting into Equation \(\left(*\right)\)

$$ \begin{align*} \mathscr{L} = \frac{1}{2}\left[2\sum\limits_{i}k_i\log\left(\frac{k_i}{\kappa_i}\right) + \sum\limits_{rs}m_{rs}\log m_{rs}\right] \end{align*} $$

The first term in the log likelihood can be simplified as follows $$ \begin{align*} 2\sum\limits_{i}k_i\log\left(\frac{k_i}{\kappa_i}\right) &= 2\sum\limits_{i}k_i\log k_i - 2\underbrace{\sum\limits_{i}k_i\log \kappa_i}_{=\sum\limits_{r}\Delta_r\log\Delta_r}\\ &= 2\sum\limits_{i}k_i\log k_i - \sum\limits_{r}\Delta_r\log\Delta_r - \sum\limits_{s} \Delta_s\log\Delta_s\\ &= 2\sum\limits_{i}k_i\log k_i - \sum\limits_{r}\sum\limits_{s}m_{rs}\log\Delta_r - \sum\limits_{s} \sum\limits_{r}m_{rs}\log\Delta_s\\ &= 2\sum\limits_{i}k_i\log k_i - \sum\limits_{rs}m_{rs}\log\left(\Delta_r\Delta_s\right) \end{align*} $$

Where I have used the fact that \(\Delta_r = \sum\limits_{s}m_{rs}\). Also note that the term \(\sum\limits_{i}k_i\log k_i\) is just a constant (because this quantity can be read directly off the adjacency matrix and is independent of the group assignment), and so it can be ignored.

This brings us to the final, simplified, profile likelihood of the Degree Corrected Stochastic Block Model

$$ \begin{align} \mathscr{L} = \sum\limits_{rs}m_{rs}\log\left(\frac{m_{rs}}{\Delta_r\Delta_s}\right) \end{align} $$
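This objective is simple to evaluate for any candidate grouping. Here is a sketch of a scoring function (my own implementation, using the usual convention \(0\log 0 = 0\)):

```python
import numpy as np

def dcsbm_profile_loglik(A, Gamma):
    """Profile likelihood sum_rs m_rs log(m_rs / (Delta_r Delta_s))."""
    m = Gamma.T @ A @ Gamma
    Delta = m.sum(axis=1)  # Delta_r = sum_s m_rs
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = m * np.log(m / np.outer(Delta, Delta))
    return np.nansum(terms)  # nansum implements the 0 log 0 = 0 convention

A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]])
Gamma = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1]])
score = dcsbm_profile_loglik(A, Gamma)
```

A community-detection routine would then search over group assignment matrices \(\Gamma\) to maximize this score.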

Conclusion

In both the SBM and its degree-corrected variant, I've presented a slightly different derivation than the one discussed by Karrer and Newman; however, the goal is the same. As in the standard SBM, what we have done here is used the MAP to eliminate the \(\Omega_{rs}\) parameters (and, in the degree-corrected version, the \(\{\theta_i\}\) too) so that we are left with a likelihood only in terms of the group assignments. Observe how in the standard SBM, the likelihood depends only on the number of edges in each group and the number of nodes in each group. The DC version still depends on the number of edges in each group; however, rather than simply counting nodes, it considers the degrees of the nodes in each group (the \(\Delta_r\) parameters).

Optimizing over group assignments is precisely community detection: it assigns nodes to groups (a.k.a. communities) in such a way that the likelihood function is maximized. In the next post, we will tackle this problem directly. It will require a bit of programming, since it is a discrete optimization problem that can only be solved computationally.