jessicastow/book_recommender

Book Recommendation System for Young Adults

Note: This repository includes both HTML and PDF versions of my write-up, along with the corresponding code.

This project implements a Book Recommendation System for young adults, focusing on users aged 18-25. The system is built using collaborative filtering techniques and matrix factorisation to provide personalised book suggestions based on users' past ratings.

Project Overview

The goal of this project is to create a recommender system that can accurately predict how a user might rate a particular book, and ultimately, recommend books that align with their preferences. The project uses Kaggle's Book-Crossing Dataset, which contains anonymised ratings of books provided by users, along with metadata about the books themselves.

Objectives

  1. Build recommendation systems using:

    • Item-based collaborative filtering
    • User-based collaborative filtering
    • Matrix factorisation (with and without regularisation)
  2. Ensemble Model:

    • Combine the predictions from the three methods above using an ensemble approach to improve accuracy.
  3. Assess Model Performance:

    • Evaluate the accuracy of each model using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

Dataset

The dataset used in this project is publicly available on Kaggle: Book Recommendation Dataset. It contains:

  • 278 858 users with anonymised user information
  • 271 379 books with metadata including titles, authors, publication year, and ISBN
  • 1 149 780 ratings, both explicit (1-10 scale) and implicit (indicated by a 0).
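Because the project targets users aged 18-25 and the data mixes explicit scores with implicit zeros, one plausible preprocessing step is to filter on both. The sketch below uses toy data frames with hypothetical column names (`user_id`, `age`, `isbn`, `rating`), and it assumes implicit ratings are excluded before modelling; the repository's scripts may handle this differently.

```r
# Toy stand-ins for the users and ratings tables; column names are assumed.
users <- data.frame(
  user_id = c(1, 2, 3, 4),
  age     = c(17, 19, 25, 40)
)
ratings <- data.frame(
  user_id = c(1, 2, 2, 3, 3, 4),
  isbn    = c("A", "A", "B", "B", "C", "C"),
  rating  = c(8, 0, 7, 9, 0, 6)
)

# Keep users aged 18 to 25 (inclusive), the project's target group.
young_users <- users[!is.na(users$age) & users$age >= 18 & users$age <= 25, ]

# Keep explicit ratings only: a 0 marks an implicit interaction, not a score.
explicit <- ratings[ratings$rating > 0, ]

# Restrict the explicit ratings to the young-adult users.
young_ratings <- merge(explicit, young_users, by = "user_id")
print(young_ratings)
```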

Methods Used

1. Collaborative Filtering

  • User-based Collaborative Filtering (UBCF): Recommendations are made by identifying users with similar preferences and suggesting books based on what those similar users have rated highly.
  • Item-based Collaborative Filtering (IBCF): Recommendations are generated by identifying books that are similar to those the user has already rated positively.
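To make the two approaches concrete, here is a minimal base R sketch of cosine similarity and a similarity-weighted prediction. The project itself computes similarities with proxyC; the toy matrix, the helper names (`cosine_sim`, `predict_ubcf`), and the convention that 0 means "not rated" are assumptions made for illustration only.

```r
# Toy user-by-book rating matrix; 0 means "not rated" (an assumption of this sketch).
R <- matrix(
  c(8, 7, 0,
    9, 6, 8,
    2, 0, 9),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("b1", "b2", "b3"))
)

# Cosine similarity between every pair of rows of a matrix.
cosine_sim <- function(M) {
  norms <- sqrt(rowSums(M^2))
  (M %*% t(M)) / (norms %o% norms)
}

user_sim <- cosine_sim(R)      # user-based: compare rows (users)
item_sim <- cosine_sim(t(R))   # item-based: compare columns (books)

# Predict a user's rating for a book as the similarity-weighted mean of the
# ratings given to that book by the other users who rated it.
predict_ubcf <- function(R, sim, user, item) {
  raters <- setdiff(rownames(R)[R[, item] > 0], user)
  w <- sim[user, raters]
  sum(w * R[raters, item]) / sum(w)
}

pred <- predict_ubcf(R, user_sim, "u1", "b3")
print(pred)
```

The item-based variant follows the same pattern with the roles swapped: it weights the target user's own ratings by `item_sim` instead of weighting other users' ratings by `user_sim`.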

2. Matrix Factorisation

Matrix factorisation techniques decompose the user-item rating matrix into smaller matrices, capturing hidden factors that describe the relationships between users and books. This method helps fill in missing ratings by leveraging these latent factors. The model was built using the recosystem package.

  • Without regularisation: The model directly decomposes the matrix, with the risk of overfitting.
  • With regularisation (L2): Regularisation was applied to prevent overfitting by penalising large parameter values.
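The difference between the two variants can be sketched with a small stochastic gradient descent loop in base R. This is not the recosystem implementation used in the project; the function names, learning rate, number of factors, and toy ratings are all illustrative assumptions. Setting `lambda = 0` gives the unregularised model, while a positive `lambda` adds the L2 penalty that shrinks the factors.

```r
set.seed(1)

# Toy ratings in long format: (user index, item index, rating).
obs <- data.frame(
  u = c(1, 1, 2, 2, 3, 3, 4, 4),
  i = c(1, 2, 1, 3, 2, 3, 1, 2),
  r = c(8, 7, 9, 6, 5, 9, 7, 6)
)

# Factorise with stochastic gradient descent; lambda = 0 disables the L2 penalty.
train_mf <- function(obs, k = 2, lambda = 0, lr = 0.01, epochs = 200) {
  P <- matrix(runif(max(obs$u) * k, 0.1, 1), ncol = k)  # user factors
  Q <- matrix(runif(max(obs$i) * k, 0.1, 1), ncol = k)  # item factors
  for (epoch in seq_len(epochs)) {
    for (n in seq_len(nrow(obs))) {
      u <- obs$u[n]
      i <- obs$i[n]
      p_u <- P[u, ]
      err <- obs$r[n] - sum(p_u * Q[i, ])
      # Gradient step; the lambda terms shrink the factors towards zero.
      P[u, ] <- p_u + lr * (err * Q[i, ] - lambda * p_u)
      Q[i, ] <- Q[i, ] + lr * (err * p_u - lambda * Q[i, ])
    }
  }
  list(P = P, Q = Q)
}

# Predicted rating for each (user, item) pair: dot product of their factors.
predict_mf <- function(fit, u, i) {
  rowSums(fit$P[u, , drop = FALSE] * fit$Q[i, , drop = FALSE])
}

train_rmse <- function(fit) {
  sqrt(mean((obs$r - predict_mf(fit, obs$u, obs$i))^2))
}

fit_plain <- train_mf(obs, lambda = 0)    # without regularisation
fit_l2    <- train_mf(obs, lambda = 0.1)  # with L2 regularisation

print(c(plain = train_rmse(fit_plain), l2 = train_rmse(fit_l2)))
```

On real data the comparison that matters is error on held-out ratings, since the unregularised model can fit the training ratings closely and still generalise poorly.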

3. Ensemble Model

The final model combines the predictions from the user-based, item-based, and matrix factorisation approaches. This method averages predictions to improve overall accuracy, leveraging the strengths of each individual model.
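The averaging step might look like the following, assuming each model's predictions for the same user-book pairs have already been collected into one data frame. The prediction values are made up, equal weights are an assumption, and clamping to the 1-10 scale is an optional extra step not described in the write-up above.

```r
# Hypothetical predicted ratings for the same five user-book pairs.
preds <- data.frame(
  ubcf = c(7.2, 5.9, 8.4, 6.1, 9.0),
  ibcf = c(6.8, 6.3, 7.9, 5.5, 8.6),
  mf   = c(7.5, 6.1, 8.1, 6.4, 8.8)
)

# Unweighted ensemble: the mean of the three model predictions per pair.
preds$ensemble <- rowMeans(preds[, c("ubcf", "ibcf", "mf")])

# Clamp to the explicit rating scale (1 to 10); an optional step assumed here.
preds$ensemble <- pmin(pmax(preds$ensemble, 1), 10)

print(preds)
```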

Model Performance

The models were evaluated using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) on a test set. The performance of the models is summarised as follows:

| Model                               | MSE     | RMSE  |
|-------------------------------------|---------|-------|
| Matrix Factorisation (without reg.) | 3.39    | 1.84  |
| Matrix Factorisation (with reg.)    | 3.30    | 1.82  |
| User-based Collaborative Filtering  | 11.75   | 3.43  |
| Item-based Collaborative Filtering  | 3861.38 | 62.14 |
| Ensemble Model                      | 1.41    | 1.19  |

The Ensemble Model achieved the lowest test-set error of all approaches (MSE 1.41, RMSE 1.19), which suggests that combining the techniques yields more accurate predictions than any single model on this data.
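For reference, the two metrics can be computed as below. The helper names and the example vectors are hypothetical; only the reported MSE values in the last step come from the results above, and they are used to check that each reported RMSE is the square root of its MSE.

```r
# MSE: mean of squared prediction errors; RMSE: its square root.
mse  <- function(actual, predicted) mean((actual - predicted)^2)
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

# Hypothetical held-out ratings and model predictions.
actual    <- c(8, 6, 9, 7)
predicted <- c(7, 6, 7, 8)

print(mse(actual, predicted))
print(rmse(actual, predicted))

# RMSE is the square root of MSE, so the reported pairs can be cross-checked.
reported_mse <- c(mf_plain = 3.39, mf_l2 = 3.30, ubcf = 11.75,
                  ibcf = 3861.38, ensemble = 1.41)
print(round(sqrt(reported_mse), 2))
```

RMSE is expressed in the same units as the ratings, which makes it the easier of the two to interpret against the 1-10 scale.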

Tools and Libraries

  • R: Used for data processing, model building, and evaluation
  • recosystem: Matrix factorisation model building
  • proxyC: Used for calculating cosine similarities between users and items
  • dplyr, tidyr, ggplot2: Data manipulation and visualisation
  • Matrix: Efficient handling of sparse matrices

How to Run This Project

  1. Clone this repository:
    git clone https://github.com/jessicastow/book_recommender.git
  2. Install the necessary libraries in R:
    install.packages(c("dplyr", "tidyr", "ggplot2", "proxyC", "Matrix", "recosystem"))
  3. Open the R project and run the provided R scripts to preprocess the data, train the models, and evaluate their performance.

Conclusion

This project demonstrates how various collaborative filtering methods and matrix factorisation can be used to build an effective book recommendation system. The ensemble model, which combines the predictions of different techniques, proved to be the most accurate, highlighting the value of integrating diverse approaches to enhance recommendation accuracy.

About

Book recommendation system for assignment 1 of DSFI.
