Recommender Systems, an Introduction#

Intro#

… there was a girl called Marinka


and Marinka was happy again


As the time passes by …


Idea


But


Recommender Engines#

What are Recommenders?#

  • Algorithms that find similar items and recommend them to users

  • Examples: YouTube, Amazon, Netflix, and Spotify

Why Recommenders?#

  • User engagement: keep users on the platform

  • Personalization: tailor content to individual preferences

  • Revenue generation: increase sales through targeted recommendations

  • Data collection: gather user preferences for future improvements

Data Collection#

  • User data: user profiles, preferences, and behavior

    • implicit feedback (views, clicks, purchases)

    • explicit feedback (ratings, reviews)

  • Item data: item features, descriptions, and metadata

Types#

  • Content based

  • Collaborative filtering

  • Hybrid models

Content based model#

If a user liked or bought an item, recommend similar items.

How to find similar Items?#

Common similarity measures:

  • Euclidean distance

  • Pearson correlation

  • Cosine similarity (the most commonly used)
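To make the three measures concrete, here is a minimal sketch using NumPy. The vectors `item_a`, `item_b`, and `item_c` are made-up feature vectors, not data from the slides.

```python
import numpy as np

def euclidean_distance(a, b):
    # Straight-line distance: 0 means identical, larger means less similar.
    return float(np.linalg.norm(a - b))

def pearson_correlation(a, b):
    # Linear correlation between the two vectors, in [-1, 1].
    return float(np.corrcoef(a, b)[0, 1])

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors; ignores their lengths.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical item feature vectors (illustrative only).
item_a = np.array([1.0, 2.0, 3.0])
item_b = np.array([2.0, 4.0, 6.0])   # same direction as item_a, twice as long
item_c = np.array([3.0, 2.0, 1.0])   # item_a reversed

for name, other in [("b", item_b), ("c", item_c)]:
    print(
        f"a vs {name}:",
        f"euclidean={euclidean_distance(item_a, other):.3f}",
        f"pearson={pearson_correlation(item_a, other):.3f}",
        f"cosine={cosine_similarity(item_a, other):.3f}",
    )
```

Note that `item_a` and `item_b` are far apart in Euclidean distance but have a cosine similarity of 1, because cosine similarity only compares directions.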

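Cosine similarity#

Cosine similarity is the dot product of two vectors divided by the product of their lengths, i.e. the cosine of the angle between them.

In general it takes values between -1 and 1. For vectors with only non-negative entries (such as ratings or feature counts) it lies between 0 and 1, where 1 means the vectors point in the same direction.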

Example#


Calculation example#


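Written out, the definition "dot product divided by the product of the lengths" reads as follows. The vectors \(A = (1, 0, 1)\) and \(B = (1, 1, 0)\) are illustrative assumptions, not the numbers from the original slide.

```latex
\[
\cos(\theta)
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_i A_i B_i}{\sqrt{\sum_i A_i^2} \, \sqrt{\sum_i B_i^2}}
\]
\[
A \cdot B = 1 \cdot 1 + 0 \cdot 1 + 1 \cdot 0 = 1,
\qquad
\lVert A \rVert = \lVert B \rVert = \sqrt{2},
\qquad
\cos(\theta) = \frac{1}{\sqrt{2} \cdot \sqrt{2}} = 0.5
\]
```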

Similarity matrix#



Outcome#


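A minimal sketch of the content-based idea: build an item-item cosine similarity matrix from item features, then recommend the items most similar to one the user liked. The item names, the feature columns, and the helper names are hypothetical.

```python
import numpy as np

# Hypothetical item-feature matrix: one row per item, one column per feature
# (for example genre flags). Names and numbers are made up for illustration.
items = ["Movie A", "Movie B", "Movie C", "Movie D"]
features = np.array([
    [1.0, 1.0, 0.0],   # Movie A
    [1.0, 0.0, 0.0],   # Movie B
    [0.0, 0.0, 1.0],   # Movie C
    [1.0, 1.0, 1.0],   # Movie D
])

def cosine_similarity_matrix(x):
    # Normalise every row to unit length; the dot products of unit vectors
    # are exactly the pairwise cosine similarities.
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return unit @ unit.T

def most_similar_items(liked, names, sim, top_n=2):
    idx = names.index(liked)
    scores = sim[idx].copy()
    scores[idx] = -np.inf                 # never recommend the liked item itself
    order = np.argsort(-scores)[:top_n]   # highest similarity first
    return [names[i] for i in order]

similarity = cosine_similarity_matrix(features)
recommendations = most_similar_items("Movie A", items, similarity)
print(np.round(similarity, 2))
print(recommendations)
```

With these made-up features, "Movie D" and "Movie B" come out as the closest matches to "Movie A", while "Movie C" shares no feature with it and scores 0.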

Collaborative filtering#

If user A and user B have similar tastes, recommend items that user B liked to user A.

Types:

  • Item-based

    • similarities between items, based on the ratings users gave them

  • User-based

    • similarities between users, based on the ratings they gave

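User-Item-Rating Matrix#

Rows are users, columns are items, and each entry is the rating a user gave an item.

We usually deal with sparse matrices: most users have rated only a small fraction of the items, so most entries are missing.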
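As a small illustration of such a matrix, the sketch below builds one from (user, item, rating) triples and measures how sparse it is. The user names, item names, and ratings are invented; `NaN` is used here as an assumed marker for "not rated".

```python
import numpy as np

# Hypothetical explicit feedback as (user, item, rating) triples.
ratings = [
    ("anna", "item_1", 5), ("anna", "item_3", 2),
    ("ben", "item_2", 4),
    ("cara", "item_1", 4), ("cara", "item_4", 1),
]

users = sorted({user for user, _, _ in ratings})
items = sorted({item for _, item, _ in ratings})
user_index = {user: row for row, user in enumerate(users)}
item_index = {item: col for col, item in enumerate(items)}

# NaN marks "not rated", which is different from a rating of 0.
matrix = np.full((len(users), len(items)), np.nan)
for user, item, rating in ratings:
    matrix[user_index[user], item_index[item]] = rating

# Sparsity: share of user-item pairs without a rating.
sparsity = float(np.isnan(matrix).mean())
print(matrix)
print(f"sparsity: {sparsity:.2f}")
```

For real datasets a dedicated sparse format is usually preferable to a dense array full of `NaN`; the dense form is used here only because it is easy to read.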

A simple user based Example#

We want to use the most similar user's rating to predict a user's missing rating:

  • calculate the similarity matrix (e.g. cosine, Pearson, Euclidean)

  • use the most similar user to predict the rating
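The two steps above can be sketched as follows. This is a simplified illustration, not a production method: the ratings are made up, missing ratings are treated as 0 when computing cosine similarity (a common shortcut, assumed here), and the function names are hypothetical.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items, NaN = not rated.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],     # user 0
    [4.0, 5.0, 3.0, 1.0],        # user 1
    [1.0, np.nan, 5.0, 4.0],     # user 2
])

def user_similarity(r):
    # Step 1: user-user cosine similarity (missing ratings counted as 0).
    filled = np.nan_to_num(r, nan=0.0)
    unit = filled / np.linalg.norm(filled, axis=1, keepdims=True)
    return unit @ unit.T

def predict_from_nearest_user(r, sim, user, item):
    # Step 2: copy the rating of the most similar user who rated the item.
    scores = sim[user].copy()
    scores[user] = -np.inf                    # exclude the user themself
    scores[np.isnan(r[:, item])] = -np.inf    # neighbour must have rated it
    neighbour = int(np.argmax(scores))
    if not np.isfinite(scores[neighbour]):
        return None, None                     # nobody else rated this item
    return neighbour, float(r[neighbour, item])

sim = user_similarity(ratings)
neighbour, predicted = predict_from_nearest_user(ratings, sim, user=0, item=2)
print(np.round(sim, 3))
print(f"most similar user: {neighbour}, predicted rating: {predicted}")
```

Here user 1 is the closest match to user 0, so user 1's rating of 3 becomes the prediction. Averaging over several neighbours, weighted by similarity, is a common refinement.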

User-User-Similarity Matrix#

First we calculate the similarity matrix. Then, for a given user, we look for the most similar user.

Predict Rating#

We use the rating of the most similar user to predict the missing rating. We then repeat both steps (find the most similar user, take over their rating) for all users.

Recommend Item#

Recommend the item with the highest predicted rating that the user has not rated yet.
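A minimal sketch of this selection step, assuming the observed ratings use `NaN` for "not rated" and that predictions for every user-item pair are already available. Both matrices and the function name `recommend_item` are hypothetical.

```python
import numpy as np

def recommend_item(observed, predicted, user):
    """Return the index of the unrated item with the highest predicted rating."""
    # Keep predictions only where the user has no rating yet.
    candidates = np.where(np.isnan(observed[user]), predicted[user], -np.inf)
    if not np.isfinite(candidates).any():
        return None  # the user has already rated every item
    return int(np.argmax(candidates))

# Hypothetical observed ratings (NaN = not rated) and predicted ratings.
observed = np.array([
    [5.0, np.nan, np.nan, 1.0],
    [4.0, 5.0, 3.0, 1.0],
])
predicted = np.array([
    [4.8, 4.5, 3.0, 1.2],
    [4.1, 4.9, 3.2, 1.0],
])

best = recommend_item(observed, predicted, user=0)
print(best)
```

Item 0 has the highest prediction for user 0 but is skipped because it was already rated, so the unrated item 1 is chosen instead.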

Model Based#

(A not-so-simple example)

SVD - Singular Value Decomposition

  • Approximates the rating matrix by a product of three matrices

  • Latent features can often be interpreted (genre etc.)

  • Matrix factorization models of this kind were a key component of the solution that won the Netflix Prize

SVD#

Singular Value Decomposition

U - User–Feature Matrix:

  • Each row corresponds to a user, each column to a latent feature (hidden factor).

  • Example: A latent feature could represent a hidden preference dimension like “likes action movies” …

  • The values tell you how strongly each user relates to each latent feature.

Σ - Singular Values (Diagonal Matrix):

  • Contains the strengths (weights) of the latent features.

  • Larger singular values = more significant latent dimensions for explaining the data.

\(V^{T}\) - Feature–Item Matrix:

  • Each row corresponds to a latent feature, each column to an item.

  • The values indicate how strongly each item relates to each latent feature.

  • Example: An item might score high on the “action movie” factor and low on the “romantic comedy” factor.
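The three matrices can be inspected with `numpy.linalg.svd`. The sketch below assumes a small, fully observed rating matrix with made-up values; classical SVD needs every entry to be present, so this does not yet handle missing ratings.

```python
import numpy as np

# Hypothetical, fully observed rating matrix (4 users x 3 items).
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

# U: user-feature matrix, singular_values: diagonal of Sigma,
# Vt: feature-item matrix (V transposed).
U, singular_values, Vt = np.linalg.svd(ratings, full_matrices=False)
print(U.shape, singular_values.shape, Vt.shape)
print(np.round(singular_values, 3))   # sorted from largest to smallest

# Keep only the k strongest latent features for a low-rank approximation.
k = 2
approx = U[:, :k] @ np.diag(singular_values[:k]) @ Vt[:k, :]
print(np.round(approx, 2))
```

Dropping the weakest latent feature changes the matrix only slightly when its singular value is small compared with the others.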


SVD#

How to predict the missing ratings

The predicted rating of a user for an item is the corresponding entry in the product of the factor matrices. Because the rating matrix has missing entries, the factors are fitted on the known ratings only.

A good decomposition can be found using gradient descent.
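One common way to do this is stochastic gradient descent on two factor matrices, often called Funk SVD; that specific variant is an assumption here, not something the slides name. In the sketch below the singular values are absorbed into the user factors `P` and item factors `Q`, and the ratings, hyperparameters, and function name are all illustrative.

```python
import numpy as np

def factorize(ratings, n_factors=2, lr=0.02, reg=0.02, n_epochs=1000, seed=0):
    """Fit ratings ~ P @ Q.T on the observed (non-NaN) entries with SGD."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))   # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))   # item factors
    observed = np.argwhere(~np.isnan(ratings))              # (user, item) pairs
    for _ in range(n_epochs):
        rng.shuffle(observed)                                # new order each epoch
        for u, i in observed:
            error = ratings[u, i] - P[u] @ Q[i]
            p_old = P[u].copy()
            # Gradient step on the squared error with L2 regularisation.
            P[u] += lr * (error * Q[i] - reg * P[u])
            Q[i] += lr * (error * p_old - reg * Q[i])
    return P, Q

# Hypothetical ratings, NaN = not rated.
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, np.nan, 1.0, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [np.nan, 1.0, 5.0, 4.0],
])

P, Q = factorize(ratings)
predicted = P @ Q.T                     # filled-in rating matrix
mask = ~np.isnan(ratings)
train_rmse = float(np.sqrt(np.mean((ratings[mask] - predicted[mask]) ** 2)))
print(np.round(predicted, 2))
print(f"RMSE on known ratings: {train_rmse:.3f}")
```

The entries of `predicted` at the positions that were `NaN` in `ratings` are the predicted missing ratings. Gradient descent finds a local optimum, so results depend on the initialisation and the hyperparameters.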

Recommend Item#


Recommend the item with the highest predicted rating that the user has not rated yet.

How to evaluate recommenders?#

Offline evaluation (for algorithm tuning):

  • Rating prediction accuracy:

    • RMSE (Root Mean Squared Error)

    • MAE (Mean Absolute Error)

  • Ranking quality:

    • Precision, Recall

    • MAP (Mean Average Precision)

  • Diversity and novelty (how varied and how unfamiliar the recommended items are)
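The accuracy and ranking metrics listed above can be sketched in a few lines. The function names and the sample data are hypothetical; MAP is the mean of `average_precision` over all users.

```python
import numpy as np

def rmse(y_true, y_pred):
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_true, y_pred):
    diff = np.asarray(y_true) - np.asarray(y_pred)
    return float(np.mean(np.abs(diff)))

def precision_recall_at_k(recommended, relevant, k):
    # recommended: ranked list of items, relevant: set of items the user liked.
    hits = sum(1 for item in recommended[:k] if item in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def average_precision(recommended, relevant):
    # Average of precision@rank over the ranks where a relevant item appears.
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical held-out ratings and predictions for one user.
print(rmse([4, 3, 5], [3.5, 3, 4]), mae([4, 3, 5], [3.5, 3, 4]))

# Hypothetical ranked recommendations and the items the user actually liked.
recommended = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_recall_at_k(recommended, relevant, k=3))
print(average_precision(recommended, relevant))
```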

Online evaluation (for validating the real effect on users and the business):

  • A/B testing approach:

    • Split live users into groups:

      • Control: baseline system (e.g., popularity-based recommendations)

      • Treatment: new algorithm

  • Typical online metrics:

    • CTR (Click-Through Rate)

    • Conversion rate (purchase, subscription, etc.)

    • Engagement time (session duration, items viewed)

    • Retention / churn rate

    • Revenue per user
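As an illustration of comparing control and treatment on one online metric, the sketch below computes CTR for both groups and runs a two-proportion z-test. The choice of test is an assumption (one common option among several), and the click and impression counts are invented.

```python
import math

def click_through_rate(clicks, impressions):
    return clicks / impressions

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for the difference between two click-through rates."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail
    return z, p_value

# Hypothetical counts: control = baseline system, treatment = new algorithm.
control_clicks, control_impressions = 500, 10_000
treatment_clicks, treatment_impressions = 580, 10_000

ctr_control = click_through_rate(control_clicks, control_impressions)
ctr_treatment = click_through_rate(treatment_clicks, treatment_impressions)
z, p_value = two_proportion_z_test(
    control_clicks, control_impressions, treatment_clicks, treatment_impressions
)
print(f"CTR control={ctr_control:.3f}, treatment={ctr_treatment:.3f}")
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

A small p-value suggests the CTR difference is unlikely to be chance alone, but the decision should also consider the other metrics in the list, such as retention and revenue per user.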

Drawbacks of Recommenders#

  • Cold start problem: new users/items have no data

  • Sparsity: many items have few ratings, making it hard to find similar items/users

  • Scalability: large datasets can be computationally expensive

  • Bias: algorithms can reinforce existing biases in data