Recommender Systems and Link Prediction
Introduction
Recommender systems play an integral part in our digital lives. Whether you’re discovering a new movie on your favorite streaming platform or browsing Amazon, recommender systems work behind the scenes, surfacing the most relevant content so you aren’t left searching for a needle in a haystack.
Traditionally, recommender systems have been developed using various approaches, such as content-based filtering, user-based collaborative filtering, and item-based collaborative filtering, among others. In user-based collaborative filtering, recommendations for a user are made based on the preferences of similar users. Item-based collaborative filtering, by contrast, suggests items based on the similarity between the items themselves. Collaborative filtering is a popular method due to its robustness in handling large datasets and its ease of implementation.
However, widespread as they are, traditional collaborative filtering techniques can be enriched by considering an alternative perspective: reframing the problem as a network link prediction problem. This viewpoint not only broadens our understanding but also opens up new avenues for applying network theory to recommender systems. We will focus on item-based collaborative filtering, but the ideas presented in this post generalize.
Item-Based Collaborative Filtering
Let’s begin our discussion with a crash course on item-based collaborative filtering (IBCF). In IBCF, the basic idea is to recommend items (such as movies, music, books, etc.) to users based on their similarity to other items users have interacted with.
There is nothing better than working with a concrete example, so we start by loading some data. Throughout this post, we will be working with the MovieLens dataset (small).
import pandas as pd
# read in ratings and movie data
ratings = pd.read_csv('ml-latest-small/ratings.csv')
movies = pd.read_csv('ml-latest-small/movies.csv')
The `movies` dataframe contains movie IDs, their titles, and genres. The `ratings` dataframe contains the rating each user ID gave to each movie ID, along with a timestamp (which we will not make use of). We will combine these two dataframes so that we have all the data in one place:
movie_user = movies.merge(ratings, on = "movieId")
This `movie_user` dataframe has \(100,836\) rows, representing the total number of interactions between users and movies.
Mean-Centering Ratings
Next, we will perform one common preprocessing step: mean-centering the ratings to adjust for user bias — some users rate more generously while others are more conservative.
To illustrate why we do this, consider the rough analogy of weighing flour with a kitchen scale. Everyone uses a different bowl, so the reading on the scale will be off by a different amount for each person. To get an accurate measurement, you would first tare the scale, setting it to zero with the empty bowl on it. Mean-centering is akin to taring your scale before weighing the flour. By doing this, we adjust for individual biases in rating styles, leading to more robust recommendations. Here’s how you can mean-center the ratings:
# calculate mean centered ratings
user_mean = movie_user.groupby('userId')['rating'].mean()
# Add a column for mean-centered ratings
movie_user['rating_mean_centered'] = movie_user['rating'] - movie_user['userId'].map(user_mean)
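As a quick sanity check of the centering step (sketched here on a tiny hand-made frame rather than the MovieLens data), each user's centered ratings should now average to zero — every scale reads zero with its empty bowl on it:

```python
import pandas as pd

# A tiny stand-in for the movie_user dataframe (hypothetical data)
toy = pd.DataFrame({
    'userId':  [1, 1, 1, 2, 2],
    'movieId': [10, 20, 30, 10, 20],
    'rating':  [5.0, 3.0, 4.0, 2.0, 4.0],
})

# Same mean-centering recipe as above
toy_user_mean = toy.groupby('userId')['rating'].mean()
toy['rating_mean_centered'] = toy['rating'] - toy['userId'].map(toy_user_mean)

# After centering, each user's ratings average out to (numerically) zero
centered_means = toy.groupby('userId')['rating_mean_centered'].mean()
print(centered_means.abs().max())  # ~0.0 up to floating point
```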
To make this example easier to run on a laptop, we will reduce the size of the dataset by considering only movies with at least 100 ratings.
movie_counts = movie_user.groupby('movieId')['rating'].count()
popular_movies = movie_counts[movie_counts >= 100].index
movie_user_pop = movie_user[movie_user['movieId'].isin(popular_movies)]
The resulting dataframe should have \(138\) unique movies, \(597\) unique users, and \(20,188\) ratings.
Utility Matrix
A utility matrix is a fundamental concept in collaborative filtering. It is a matrix where rows represent users, columns represent items (in this case, movies), and the values are the mean-centered ratings given by users to items. This matrix forms the basis for calculating item similarities and generating recommendations.
To construct the utility matrix from our preprocessed `movie_user_pop` dataframe, we use the `pivot_table` function:
# Create the utility matrix
user_movie_mat = movie_user_pop.pivot_table(values='rating_mean_centered', index='userId', columns='movieId')
user_movie_mat.head()
| userId \ movieId | 1 | 2 | … | 68954 | 79132 |
|---|---|---|---|---|---|
| 1 | -0.366379 | NaN | … | NaN | NaN |
| 2 | NaN | NaN | … | NaN | 0.051724 |
| 3 | NaN | NaN | … | NaN | NaN |
| 4 | NaN | NaN | … | NaN | NaN |
| 5 | 0.363636 | NaN | … | NaN | NaN |
This `user_movie_mat` matrix now has users as rows, movies as columns, and the mean-centered ratings as values. Any missing value in the matrix represents a situation where a user hasn’t rated a particular movie. Ultimately, the goal is to predict these missing values.
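Most of the matrix is missing, and that sparsity is easy to quantify. A minimal sketch, shown on a small synthetic matrix rather than the real one:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for user_movie_mat: 3 users x 4 movies, mostly unrated
mat = pd.DataFrame(
    [[0.5,    np.nan, np.nan, np.nan],
     [np.nan, -0.2,   np.nan, 0.1],
     [np.nan, np.nan, np.nan, np.nan]],
    index=[1, 2, 3], columns=[10, 20, 30, 40],
)

# Fraction of (user, movie) pairs with no rating
sparsity = mat.isna().to_numpy().mean()
print(f"{sparsity:.0%} of entries are missing")  # 75% for this toy matrix
```

The same two lines applied to the real matrix give a comparable picture: with \(597\) users, \(138\) movies, and \(20,188\) ratings, about \(75\%\) of the entries are missing (\(1 - 20,188 / (597 \times 138) \approx 0.755\)).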
Cosine Similarity
Now that we have our utility matrix, we can calculate the similarity between items. Cosine similarity is a measure that calculates the cosine of the angle between two vectors, providing a similarity score between -1 and 1. It is widely used in item-based collaborative filtering because it effectively captures the degree of similarity between items based on user ratings. Moreover, doing this calculation on the mean-centered ratings is closely related to the Pearson correlation between the items.
Here’s how we can compute the cosine similarity between the items (movies) in our utility matrix:
from sklearn.metrics.pairwise import cosine_similarity
item_similarity_cosine = cosine_similarity(user_movie_mat.fillna(0).T)
movie_sim_df = pd.DataFrame(item_similarity_cosine, index=user_movie_mat.columns, columns=user_movie_mat.columns)
movie_sim_df.head()
| movieId | 1 | 2 | … | 68954 | 79132 |
|---|---|---|---|---|---|
| 1 | 1.000000 | 0.061511 | … | 0.184418 | 0.042688 |
| 2 | 0.061511 | 1.000000 | … | -0.027452 | -0.053825 |
| 6 | 0.089142 | -0.031387 | … | 0.025957 | 0.105738 |
| 10 | -0.053247 | -0.131687 | … | -0.069963 | 0.034767 |
| 32 | 0.122093 | -0.103307 | … | 0.139529 | 0.117363 |
The resulting `movie_sim_df` dataframe contains a similarity score for each pair of movies. These scores form the basis of our item-based collaborative filtering recommendations, as we can use them to identify movies that are similar to ones a user has already rated highly.
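To connect the `cosine_similarity` call back to the definition, here is a minimal check (on a small synthetic matrix, not the MovieLens data) that it is just the normalized dot product of item columns:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Tiny zero-filled utility matrix: rows are users, columns are items
mat = np.array([
    [1.0, -1.0,  0.0],
    [0.5,  0.0,  0.5],
    [0.0,  1.0, -1.0],
])

# sklearn: pairwise similarity between item (column) vectors
sk_sim = cosine_similarity(mat.T)

# by hand: dot product of two item columns over the product of their norms
a, b = mat[:, 0], mat[:, 1]
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.isclose(sk_sim[0, 1], manual))  # True
```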
Making Recommendations
One way to make predictions (recommendations) is to aggregate similarity scores between movies the user has not watched and the movies they have watched. Thus, the predicted rating user \(u\) might give to an unrated movie \(m\) can be expressed as follows:
$$ \begin{align*} \hat{r}_{u,m} &= \bar{r}_u + \frac{\sum\limits_{m’ \in W_u} s_{m,m’}\cdot \widetilde{r}_{u, m’}}{\sum\limits_{m’ \in W_u} \lvert s_{m,m’}\rvert} \end{align*} \tag{$1$} $$
Where \(\bar{r}_u\) is the mean rating user \(u\) gives, \(W_u\) is the set of movies that user \(u\) has previously rated, \(s_{m,m’}\) is the similarity between an unrated movie \(m\) and a previously rated movie \(m’\), and \(\widetilde{r}_{u, m’}\) is the mean-centered rating given to movie \(m’\) by user \(u\). This formula effectively computes a weighted average of the ratings user \(u\) gives to movies similar to \(m\).
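The actual implementation lives in the repo; below is one possible sketch of Equation (1) under the signature used later in this post (the function body here is my own and may differ from the repo's version):

```python
import pandas as pd

def recommend_movies(user_id, user_mean, user_movie_mat, movie_user_pop, movie_sim_df):
    """Predict ratings for movies the user has not rated, per Equation (1)."""
    user_row = user_movie_mat.loc[user_id]      # mean-centered ratings; NaN = unrated
    rated = user_row.dropna()                   # W_u: movies the user has rated
    preds = {}
    for m in user_row.index[user_row.isna()]:   # candidate (unrated) movies
        sims = movie_sim_df.loc[m, rated.index] # s_{m,m'} for every m' in W_u
        denom = sims.abs().sum()
        if denom > 0:
            # r_hat = r_bar_u + sum(s * r_tilde) / sum(|s|)
            preds[m] = user_mean[user_id] + (sims * rated).sum() / denom
    titles = movie_user_pop[['movieId', 'title']].drop_duplicates()
    out = pd.DataFrame({'movieId': list(preds.keys()),
                        'predicted_rating': list(preds.values())})
    return (out.merge(titles, on='movieId')
               .sort_values('predicted_rating', ascending=False)
               .reset_index(drop=True))
```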
We can use this approach to make recommendations for user with ID 1:
recommended_movies_df = recommend_movies(1, user_mean, user_movie_mat, movie_user_pop, movie_sim_df)
recommended_movies_df.head(5)
| | movieId | title | predicted_rating |
|---|---|---|---|
| 0 | 7153 | Lord of the Rings: The Return of the King, The… | 4.771283 |
| 1 | 3996 | Crouching Tiger, Hidden Dragon (Wo hu cang lon… | 4.732515 |
| 2 | 7361 | Eternal Sunshine of the Spotless Mind (2004) | 4.718561 |
| 3 | 4993 | Lord of the Rings: The Fellowship of the Ring,… | 4.715945 |
| 4 | 1193 | One Flew Over the Cuckoo’s Nest (1975) | 4.714788 |
I’ve left the implementation out of the post, but it can be found in the GitHub repo here.
The Network Approach
Networks are a very versatile representation of data that lends itself particularly well to systems with sparse pairwise interactions. In the context of recommender systems, sparsity refers to the observation that most users will only interact with a small subset of items (in this case, movies). We don’t expect users to rate or even watch most of the available movies. This sparse nature of user-item interactions is a fundamental assumption in recommender systems: since users can’t possibly watch every movie, our goal is to make the best possible recommendations based on the limited interactions they have.
We will create a bipartite graph where users and movies are nodes, and the edges between them represent the ratings users have given to movies. Making a recommendation is akin to predicting an edge (and, in this case, its weight) between a user node and a movie node. In network theory, this is known as link prediction.
Constructing the Network
Let’s construct the bipartite graph from the `movie_user_pop` dataframe. A bipartite graph is a graph whose nodes can be divided into two disjoint sets, such that each edge connects a node from one set to a node in the other (never two nodes of the same set). In our case, one set of nodes represents users and the other represents movies, and the edges between them are weighted by the ratings users have given to the movies.
We will use the graph-tool library to create this graph, where each user and each movie will be a node, and an edge between them will represent a rating, with the rating as the edge weight.
Here is an example of how we can do this:
import graph_tool.all as gt
# Initialize an undirected graph
g = gt.Graph(directed=False)
# add a user node
u1 = g.add_vertex()
# add a movie node
m1 = g.add_vertex()
# add an edge between user and movie
e = g.add_edge(u1, m1)
# We can add metadata to the network using property maps
nodeID = g.new_vertex_property("string") # user or movie ID
mtitle = g.new_vertex_property("string") # movie title
node_type = g.new_vertex_property("string")
eweight = g.new_edge_property("double") # the mean-centered rating
nodeID[u1] = 'u' + str(movie_user_pop['userId'].iloc[0])
nodeID[m1] = 'm' + str(movie_user_pop['movieId'].iloc[0])
node_type[u1] = 'user'
node_type[m1] = 'movie'
mtitle[m1] = movie_user_pop['title'].iloc[0]
eweight[e] = movie_user_pop['rating_mean_centered'].iloc[0]
Doing this for every pair of nodes one-by-one is a bit of an undertaking, so we will automate it using a function that takes in a pandas dataframe and outputs a graph-tool network, the implementation of which we have left in the repo.
mlens_net = construct_network(movie_user_pop)
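The repo’s `construct_network` uses graph-tool as above. Purely as an illustration of the same construction, here is a sketch using networkx instead (a deliberate library swap to keep the example short and self-contained; the function name is my own, while the `'u'`/`'m'` node prefixes mirror the property-map example above):

```python
import networkx as nx
import pandas as pd

def construct_network_nx(df):
    """Build a weighted bipartite user-movie graph from a ratings dataframe.

    Each user and each movie becomes a node; each rating becomes an edge
    weighted by the mean-centered rating, as in the graph-tool example.
    """
    g = nx.Graph()
    for row in df.itertuples(index=False):
        u = 'u' + str(row.userId)
        m = 'm' + str(row.movieId)
        g.add_node(u, node_type='user')
        g.add_node(m, node_type='movie', title=row.title)
        g.add_edge(u, m, weight=row.rating_mean_centered)
    return g
```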
Cosine Similarity Revisited
In network theory, a commonly studied concept is node similarity. In the simplest sense, two nodes are similar if they share many of the same neighbors. How does this intuition hold up with the network we have constructed? Take an arbitrary user node \(u\) and movie node \(m\): any neighbor of a user node is a movie node, and any neighbor of a movie node is a user node, so their similarity must necessarily be 0. This makes sense: users and movies are two different kinds of entities in our model. We can, however, have non-zero similarity between nodes of the same type: users \(u_1\) and \(u_2\) might rate many of the same movies highly, making them similar. So it is natural to expect that cosine similarity can be equivalently reformulated in the network context. As before, we can use the utility matrix \(B\) (usually called an incidence matrix in the context of networks) to calculate the cosine similarity between two nodes.
$$ \begin{align*} s_{m,m’} = \frac{\sum\limits_{u \in \Gamma\left(m\right) \cap \Gamma\left(m’\right)} B_{u,m}B_{u,m’}}{\sqrt{\sum\limits_{u \in \Gamma\left(m\right)} B_{u,m}^2}\sqrt{\sum\limits_{u \in \Gamma\left(m’\right)} B_{u,m’}^2}} \tag{$2$} \end{align*} $$
Where \(\Gamma\left(m\right)\) is the set of neighbors of movie \(m\). This expression is in every way equivalent to the previously discussed formulation; it is still fundamentally a dot product. But by writing it as sums over neighbors, we make the network structure of the data explicit.
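Equation (2) translates almost line-for-line into code. A rough sketch (names my own), representing each movie by a dictionary mapping its user neighbors to the corresponding edge weights:

```python
from math import sqrt

def cosine_sim_neighbors(nbrs_m, nbrs_m2):
    """Cosine similarity between two movies per Equation (2).

    Each argument maps a user u in Gamma(m) to the edge weight B[u, m]
    (the mean-centered rating that user gave the movie).
    """
    common = set(nbrs_m) & set(nbrs_m2)                    # Gamma(m) ∩ Gamma(m')
    num = sum(nbrs_m[u] * nbrs_m2[u] for u in common)      # sum of B[u,m] * B[u,m']
    den = (sqrt(sum(w * w for w in nbrs_m.values()))
           * sqrt(sum(w * w for w in nbrs_m2.values())))
    return num / den if den else 0.0
```

On the zero-filled utility matrix, this reduces to the same dot product computed by `cosine_similarity` earlier; the neighbor dictionaries simply skip the zeros.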
A quick-and-dirty implementation of the above formula can be found in the GitHub repo. As a sanity check, let’s see whether the similarity calculation produces sensible results. Here are the top \(10\) movies most similar to The Lion King (movieId: 364):
| rank | movieId | title | similarity_score |
|---|---|---|---|
| 1 | 588 | Aladdin (1992) | 0.356733 |
| 2 | 595 | Beauty and the Beast (1991) | 0.293266 |
| 3 | 4995 | Beautiful Mind, A (2001) | 0.266820 |
| 4 | 356 | Forrest Gump (1994) | 0.256900 |
| 5 | 4306 | Shrek (2001) | 0.246165 |
| 6 | 2571 | Matrix, The (1999) | 0.241855 |
| 7 | 6377 | Finding Nemo (2003) | 0.236003 |
| 8 | 1 | Toy Story (1995) | 0.235402 |
| 9 | 457 | Fugitive, The (1993) | 0.219770 |
| 10 | 480 | Jurassic Park (1993) | 0.214132 |
From the table, you can see that many of the movies most similar to The Lion King are other children’s movies (in the accompanying hairball visualization, the node corresponding to The Lion King is shown in red). However, we also see some more adult movies in the mix that do not necessarily align with our human intuition about similarity. Why do you think this is? Is this simply a poor metric for similarity, or are we seeing some of the underlying social dynamics of movie watching? Do children normally have their own accounts, or do they generally use the accounts of the adults in the family? These questions demonstrate some of the complexity in real-world data and highlight the challenge of building good recommender systems.
Network Based Recommendations
The last step is to make the actual recommendations. This can be done in exactly the same way using Equation (1); applying it to the user with ID \(1\) gives the following top-5 recommendations with predicted ratings:
Conclusions
Traditional collaborative filtering techniques have been honed over years and are supported by highly optimized libraries that are often best-suited for straightforward recommendation tasks. If your goal is to do traditional collaborative filtering, then you should just do traditional collaborative filtering. However, by viewing our data through the lens of networks, we gain access to a broader array of methodologies that can yield deeper insights and potentially unveil complex patterns that might remain hidden otherwise.
While reframing collaborative filtering as a network link prediction problem may not necessarily offer superior performance over traditional methods, it serves as a compelling motivation for incorporating network-based approaches into machine learning. The exploration of this perspective is a simple yet illustrative example of how considering data as a network can enrich our analytical toolkit. For a non-trivial application of link prediction that also ties into recommender systems, take a look at our paper on link prediction and drug repurposing here, where we discuss and compare various link prediction methods, evaluating their performance on the important problem of drug repurposing.