Note: This repository includes both HTML and PDF versions of my write-up, along with the corresponding code.
This project implements a Book Recommendation System for young adults, focusing on users aged 18-25. The system is built using collaborative filtering techniques and matrix factorisation to provide personalised book suggestions based on users' past ratings.
The goal of this project is to create a recommender system that can accurately predict how a user might rate a particular book, and ultimately, recommend books that align with their preferences. The project uses Kaggle's Book-Crossing Dataset, which contains anonymised ratings of books provided by users, along with metadata about the books themselves.
- Build recommendation systems using:
  - Item-based collaborative filtering
  - User-based collaborative filtering
  - Matrix factorisation (with and without regularisation)
- Ensemble Model:
  - Combine the predictions from the three methods above using an ensemble approach to improve accuracy.
- Assess Model Performance:
  - Evaluate the accuracy of each model using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).
The dataset used in this project is publicly available on Kaggle as the Book Recommendation Dataset. It contains:
- 278 858 users with anonymised user information
- 271 379 books with metadata including titles, authors, publication year, and ISBN
- 1 149 780 ratings, both explicit (1-10 scale) and implicit (indicated by a 0)
- User-based Collaborative Filtering (UBCF): Recommendations are made by identifying users with similar preferences and suggesting books based on what those similar users have rated highly.
- Item-based Collaborative Filtering (IBCF): Recommendations are generated by identifying books that are similar to those the user has already rated positively.
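
To make the two descriptions above concrete, here is a minimal sketch of a similarity-weighted prediction in base R. It is an illustration under stated assumptions, not the project's actual code: the toy `ratings` matrix and the helper names `cosine_sim` and `predict_cf` are hypothetical, and the project itself computes cosine similarities with proxyC rather than by hand.

```r
# Toy ratings matrix (hypothetical data): rows are users, columns are books,
# and NA means "not rated".
ratings <- matrix(
  c(8, 9, NA,  3,
    7, 8,  2, NA,
    2, 3,  9,  8),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"),
                  c("bookA", "bookB", "bookC", "bookD"))
)

# Cosine similarity between two rows, using only the columns both have rated.
cosine_sim <- function(a, b) {
  both <- !is.na(a) & !is.na(b)
  if (!any(both)) return(0)
  sum(a[both] * b[both]) / (sqrt(sum(a[both]^2)) * sqrt(sum(b[both]^2)))
}

# Predict the value at (target, column) as the similarity-weighted mean of the
# values that the other rows hold in that column.
predict_cf <- function(m, target, column) {
  others <- setdiff(rownames(m), target)
  others <- others[!is.na(m[others, column])]
  if (length(others) == 0) return(NA_real_)
  sims <- vapply(others, function(o) cosine_sim(m[target, ], m[o, ]), numeric(1))
  if (sum(sims) == 0) return(NA_real_)
  sum(sims * m[others, column]) / sum(sims)
}

# User-based: compare u1 with other users who rated bookC.
ubcf_pred <- predict_cf(ratings, "u1", "bookC")

# Item-based: transpose so that rows are books, then compare bookC with the
# other books that u1 has rated.
ibcf_pred <- predict_cf(t(ratings), "bookC", "u1")
```

The same function serves both variants because item-based filtering is user-based filtering applied to the transposed matrix. A real implementation would work on sparse matrices and would typically restrict the average to the most similar neighbours.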
Matrix factorisation techniques decompose the user-item rating matrix into smaller matrices, capturing hidden factors that describe the relationships between users and books. This method helps fill in missing ratings by leveraging these latent factors. The model was built using the recosystem package.
- Without regularisation: The model directly decomposes the matrix, with the risk of overfitting.
- With regularisation (L2): Regularisation was applied to prevent overfitting by penalising large parameter values.
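
The difference between the two variants can be written as a single objective. The notation below is an assumption introduced for illustration: $r_{ui}$ is the observed rating of book $i$ by user $u$, $\mathcal{K}$ is the set of observed (user, book) pairs, $p_u$ and $q_i$ are the latent factor vectors for the user and the book, and $\lambda \ge 0$ is the L2 penalty strength.

$$
\min_{P,\,Q} \sum_{(u,i) \in \mathcal{K}} \left[ \left( r_{ui} - p_u^{\top} q_i \right)^2 + \lambda \left( \lVert p_u \rVert_2^2 + \lVert q_i \rVert_2^2 \right) \right]
$$

Setting $\lambda = 0$ gives the unregularised model, while $\lambda > 0$ shrinks the factor vectors towards zero. In recosystem the user and item penalties are set separately through the `costp_l2` and `costq_l2` training options, so a single shared $\lambda$ is a simplification.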
The final model combines the predictions from the user-based, item-based, and matrix factorisation approaches. This method averages predictions to improve overall accuracy, leveraging the strengths of each individual model.
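
A simple average of this kind might look like the sketch below. The prediction values and the column names `ubcf`, `ibcf`, and `mf` are hypothetical, and an unweighted mean is assumed; the write-up and scripts define the exact combination used.

```r
# Hypothetical predictions for the same five test ratings from each model.
preds <- data.frame(
  ubcf = c(7.2, 5.0, 8.1, 6.4, 9.0),
  ibcf = c(6.8, 5.6, 7.5, 6.0, 8.4),
  mf   = c(7.0, 4.8, 7.8, 6.8, 8.7)
)

# Unweighted mean across the three models. na.rm = TRUE lets a row fall back
# to the remaining models when one of them could not produce a prediction.
ensemble <- rowMeans(preds, na.rm = TRUE)
```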
The models were evaluated using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) on a test set. The performance of the models is summarised as follows:
| Model | MSE | RMSE |
|---|---|---|
| Matrix Factorisation (without reg.) | 3.39 | 1.84 |
| Matrix Factorisation (with reg.) | 3.30 | 1.82 |
| User-based Collaborative Filtering | 11.75 | 3.43 |
| Item-based Collaborative Filtering | 3861.38 | 62.14 |
| Ensemble Model | 1.41 | 1.19 |
The Ensemble Model achieved the lowest error (MSE 1.41, RMSE 1.19), ahead of regularised matrix factorisation (RMSE 1.82), the best individual model. This suggests that combining the techniques improves predictive accuracy on this test set.
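
For reference, the two metrics can be computed as in the sketch below. The function names and the toy vectors are illustrative and not taken from the project scripts; RMSE is simply the square root of MSE, which is why the two columns in the table rank the models identically.

```r
# Mean squared error and its square root over paired actual/predicted ratings.
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Toy example: the errors are 1, 0, -2, 2.
actual    <- c(8, 5, 7, 10)
predicted <- c(7, 5, 9, 8)

mse(actual, predicted)   # (1 + 0 + 4 + 4) / 4 = 2.25
rmse(actual, predicted)  # sqrt(2.25) = 1.5
```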
- R: Used for data processing, model building, and evaluation
- recosystem: Matrix factorisation model building
- proxyC: Used for calculating cosine similarities between users and items
- dplyr, tidyr, ggplot2: Data manipulation and visualisation
- Matrix: Efficient handling of sparse matrices
- Clone this repository:
git clone https://github.com/jessicastow/book_recommender.git
- Install the necessary libraries in R:
install.packages(c("dplyr", "tidyr", "ggplot2", "proxyC", "Matrix", "recosystem"))
- Open the R project and run the provided R scripts to preprocess the data, train the models, and evaluate their performance.
This project demonstrates how various collaborative filtering methods and matrix factorisation can be used to build an effective book recommendation system. The ensemble model, which combines the predictions of different techniques, proved to be the most accurate, highlighting the value of integrating diverse approaches to enhance recommendation accuracy.