Book Recommendation System for Young Adults

Note: This repository includes both HTML and PDF versions of my write-up, along with the corresponding code.

This project implements a Book Recommendation System for young adults, focusing on users aged 18-25. The system is built using collaborative filtering techniques and matrix factorisation to provide personalised book suggestions based on users' past ratings.

Project Overview

The goal of this project is to create a recommender system that can accurately predict how a user might rate a particular book, and ultimately, recommend books that align with their preferences. The project uses Kaggle's Book-Crossing Dataset, which contains anonymised ratings of books provided by users, along with metadata about the books themselves.

Objectives

Build recommendation systems using:
- Item-based collaborative filtering
- User-based collaborative filtering
- Matrix factorisation (with and without regularisation)
Ensemble Model:
- Combine the predictions from the three methods above using an ensemble approach to improve accuracy.
Assess Model Performance:
- Evaluate the accuracy of each model using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE).

Dataset

The dataset used in this project is publicly available on Kaggle: Book Recommendation Dataset. It contains:

- 278 858 users with anonymised user information
- 271 379 books with metadata including titles, authors, publication year, and ISBN
- 1 149 780 ratings, both explicit (1-10 scale) and implicit (indicated by a 0)
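
As a rough illustration of how the 18-25 age focus and the explicit/implicit split could be applied, here is a minimal base R sketch on toy data. The data frames, the column names (`user_id`, `age`, `isbn`, `rating`) and the inclusive age bounds are assumptions made for this example; they are not taken from the project code or the Kaggle files.

```r
# Toy stand-ins for the users and ratings tables (hypothetical column names).
users <- data.frame(user_id = 1:4, age = c(17, 19, 25, 40))
ratings <- data.frame(
  user_id = c(1, 2, 2, 3, 3, 4),
  isbn    = c("A", "A", "B", "B", "C", "C"),
  rating  = c(8, 0, 7, 10, 0, 6)
)

# Keep users aged 18 to 25 (inclusive bounds are an assumption).
young <- users[users$age >= 18 & users$age <= 25, ]

# Keep only explicit ratings (1-10) from those users; a 0 marks an implicit rating.
explicit <- ratings[ratings$user_id %in% young$user_id & ratings$rating > 0, ]
explicit
```

Whether implicit ratings should be dropped or modelled separately is a design choice; this sketch simply drops them.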

Methods Used

1. Collaborative Filtering

- User-based Collaborative Filtering (UBCF): Recommendations are made by identifying users with similar preferences and suggesting books based on what those similar users have rated highly.
- Item-based Collaborative Filtering (IBCF): Recommendations are generated by identifying books that are similar to those the user has already rated positively.
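
To make the idea concrete, here is a minimal sketch of similarity-based prediction on a toy matrix. It uses base R rather than proxyC so that it runs without extra packages, and it treats a 0 as "not rated". The matrix, the helper names `cosine_sim` and `predict_ubcf`, and the weighting scheme are illustrative assumptions, not the project's actual code.

```r
# Toy user-by-book rating matrix; 0 means "not rated" (an assumption of this sketch).
ratings <- matrix(
  c(8, 0, 6, 9,
    7, 5, 0, 8,
    0, 4, 7, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("u1", "u2", "u3"), c("bookA", "bookB", "bookC", "bookD"))
)

# Cosine similarity between every pair of rows of a matrix.
cosine_sim <- function(m) {
  norms <- sqrt(rowSums(m^2))
  (m %*% t(m)) / (norms %o% norms)
}

# Predict one user's rating for one book as the similarity-weighted mean of
# the ratings given by the other users who rated that book.
predict_ubcf <- function(ratings, user, book) {
  sims <- cosine_sim(ratings)[user, ]
  raters <- setdiff(names(which(ratings[, book] > 0)), user)
  weights <- sims[raters]
  if (length(raters) == 0 || sum(weights) == 0) return(NA_real_)
  sum(weights * ratings[raters, book]) / sum(weights)
}

user_sim <- cosine_sim(ratings)     # user-based: compare rows (users)
item_sim <- cosine_sim(t(ratings))  # item-based: compare columns (books)
predict_ubcf(ratings, "u1", "bookB")
```

The same `cosine_sim` helper serves both variants: user-based filtering compares rows, item-based filtering compares columns of the same matrix.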

2. Matrix Factorisation

Matrix factorisation techniques decompose the user-item rating matrix into smaller matrices, capturing hidden factors that describe the relationships between users and books. This method helps fill in missing ratings by leveraging these latent factors. The model was built using the recosystem package.

- Without regularisation: The model directly decomposes the matrix, with the risk of overfitting.
- With regularisation (L2): Regularisation was applied to prevent overfitting by penalising large parameter values.
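
As a point of reference, both variants can be written as a single objective. This is the generic textbook form with one penalty weight λ, which is a simplification and not necessarily the exact objective recosystem optimises. The symbols are introduced here for illustration: r_{ui} is an observed rating, p_u and q_i are the latent factor vectors for user u and book i, and Ω is the set of observed (user, book) pairs.

```latex
\min_{P,\,Q} \sum_{(u,i) \in \Omega} \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \lambda \left( \sum_{u} \lVert p_u \rVert_2^2 + \sum_{i} \lVert q_i \rVert_2^2 \right)
```

Setting λ = 0 gives the unregularised model, while λ > 0 adds the L2 penalty that shrinks the factor vectors.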

3. Ensemble Model

The final model combines the predictions from the user-based, item-based, and matrix factorisation approaches. This method averages predictions to improve overall accuracy, leveraging the strengths of each individual model.
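
A minimal sketch of the averaging step is shown below. The prediction values are made up for illustration, and two details are assumptions of this example and not something the write-up above states: missing predictions are skipped instead of propagated, and the result is clamped to the 1-10 rating scale.

```r
# Hypothetical predictions for the same five user-book pairs from each model.
preds <- data.frame(
  ubcf = c(7.2, 5.1, NA, 8.4, 6.0),
  ibcf = c(6.8, 4.9, 7.5, NA, 6.4),
  mf   = c(7.0, 5.6, 7.1, 8.0, 6.2)
)

# Unweighted ensemble: average whichever models produced a prediction.
ensemble <- rowMeans(preds, na.rm = TRUE)

# Clamp to the explicit rating scale (an assumption of this sketch).
ensemble <- pmin(pmax(ensemble, 1), 10)
ensemble
```

An unweighted mean is the simplest choice; weighting each model by its validation error would be a natural variation.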

Model Performance

The models were evaluated using Mean Squared Error (MSE) and Root Mean Squared Error (RMSE) on a test set. The performance of the models is summarised as follows:

| Model | MSE | RMSE |
|---|---|---|
| Matrix Factorisation (without reg.) | 3.39 | 1.84 |
| Matrix Factorisation (with reg.) | 3.30 | 1.82 |
| User-based Collaborative Filtering | 11.75 | 3.43 |
| Item-based Collaborative Filtering | 3861.38 | 62.14 |
| Ensemble Model | 1.41 | 1.19 |

The Ensemble Model achieved the lowest error of all approaches (MSE 1.41, RMSE 1.19), demonstrating the effectiveness of combining various techniques for more accurate predictions.
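
For reference, the two metrics can be computed as in the sketch below. The function names `mse` and `rmse` and the toy vectors are assumptions for illustration; they are not the project's evaluation code.

```r
# Mean squared error between actual and predicted ratings.
mse <- function(actual, predicted) mean((actual - predicted)^2, na.rm = TRUE)

# Root mean squared error: the square root of the MSE, in rating units.
rmse <- function(actual, predicted) sqrt(mse(actual, predicted))

actual    <- c(8, 5, 7, 9)
predicted <- c(7, 5, 9, 8)

mse(actual, predicted)   # (1 + 0 + 4 + 1) / 4 = 1.5
rmse(actual, predicted)  # sqrt(1.5)
```

Because RMSE is expressed in the same units as the ratings, it is the easier of the two to read against the 1-10 scale.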

Tools and Libraries

- R: Used for data processing, model building, and evaluation
- recosystem: Matrix factorisation model building
- proxyC: Used for calculating cosine similarities between users and items
- dplyr, tidyr, ggplot2: Data manipulation and visualisation
- Matrix: Efficient handling of sparse matrices

How to Run This Project

Clone this repository:

git clone https://github.com/jessicastow/book_recommender.git

Install the necessary libraries in R:

install.packages(c("dplyr", "tidyr", "ggplot2", "proxyC", "Matrix", "recosystem"))

Open the R project and run the provided R scripts to preprocess the data, train the models, and evaluate their performance.

Conclusion

This project demonstrates how various collaborative filtering methods and matrix factorisation can be used to build an effective book recommendation system. The ensemble model, which combines the predictions of different techniques, proved to be the most accurate, highlighting the value of integrating diverse approaches to enhance recommendation accuracy.
