In this article, we will explore how to create, build, and deploy every component behind this bike recommendation website: 🔗 demo

🎯 Goal

High-level

The idea behind this project is to test whether a recommendation system can be built using public data, unsupervised machine learning (ML) models, and only free resources.

To achieve this we will:

Implementation

Garrascobike is a mountain bike (MTB) recommendation system. In other words, you choose a bike brand or model that you like, and the system suggests 3 bikes that are considered interesting and related to your choice.

The idea behind the recommendation system is this: when people talk about several bikes in the same subreddit thread, those bikes should be related in some way. So we can extract the bike names and/or brands from one thread’s comments and intersect that information with other Reddit threads that mention similar bikes.
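To make the idea concrete, here is a minimal sketch of a co-occurrence recommender. It is not the project's actual code: the `threads` dictionary, the `bike_*` names, and the two helper functions are hypothetical, and the sketch simply assumes that "related" means "mentioned together in many threads".

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy data: thread id -> bikes mentioned in that thread's comments.
threads = {
    "t1": {"bike_a", "bike_b", "bike_c"},
    "t2": {"bike_a", "bike_b"},
    "t3": {"bike_a", "bike_c", "bike_d"},
    "t4": {"bike_b", "bike_e"},
}


def build_cooccurrence(threads):
    """Count, for every bike, how many threads it shares with each other bike."""
    counts = defaultdict(Counter)
    for bikes in threads.values():
        for first, second in combinations(sorted(bikes), 2):
            counts[first][second] += 1
            counts[second][first] += 1
    return counts


def recommend(counts, bike, k=3):
    """Return up to k bikes that co-occur most often with `bike`."""
    neighbours = counts.get(bike, Counter())
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


counts = build_cooccurrence(threads)
print(recommend(counts, "bike_a"))  # ['bike_b', 'bike_c', 'bike_d']
```

Raw counts favour bikes that are simply popular everywhere, so a real system would probably normalise them in some way.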

📢 Foreword

The goal of this guide is to cover all the aspects involved in creating a Web App that serves a simple recommendation system, while keeping the complexity as low as possible.

So the technical level of this experiment won’t be too deep, and we won’t follow industrial-level best practices. Nevertheless, this is the guide I would have liked to have one year ago, before starting a simple project: creating a Web App with an ML model at its core.

🗺️ Roadmap

What we will do can be summarized in the following steps:

  1. Download text comments from Reddit 🐍

  2. Extract interesting entities from the comments 🐍🤖

  3. Create a simple recommendation system model 🐍🤖

  4. Deploy the model on a back-end 🐍

  5. Create a front-end that exposes the model predictions 🌐

🐍 = blocks that use Python

🤖 = blocks with Machine Learning topics involved

🌐 = blocks that use HTML, CSS & JavaScript

🚵‍♀️ Garrascobike

The project is structured in five major sections: scraping, entity extraction, ML training, back-end, and front-end (they coincide with the five chapters of this post).

In the image above, the first four sections (the front-end is excluded) are shown with their dependencies and, where necessary, the platform where the code should be executed.

Let’s dive deep into each component.

1. Download text comments from Reddit 🐍

Intro

First of all, we need data, and Reddit is an amazing social network where people talk about any topic. Moreover, Reddit exposes an API that Python packages like praw can use to scrape the data.
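As a rough illustration of what scraping with praw can look like, the sketch below collects the comments of the hottest threads of a subreddit. It is an assumption-laden example, not the project's scraper: the function names, the `"MTB"` subreddit, the `user_agent` string, and the credential placeholders are all hypothetical, and you need your own Reddit API credentials to run `build_client()`.

```python
def collect_comments(reddit, subreddit_name, limit=10):
    """Return one dict per thread with its id, title and comment texts.

    `reddit` is expected to behave like a `praw.Reddit` client.
    """
    rows = []
    for submission in reddit.subreddit(subreddit_name).hot(limit=limit):
        # Drop the "load more comments" placeholders instead of fetching them.
        submission.comments.replace_more(limit=0)
        rows.append(
            {
                "thread_id": submission.id,
                "title": submission.title,
                # .list() flattens the comment tree into a plain list.
                "comments": [c.body for c in submission.comments.list()],
            }
        )
    return rows


def build_client():
    """Create a praw client; the credential values are placeholders."""
    import praw  # third-party package: pip install praw

    return praw.Reddit(
        client_id="YOUR_CLIENT_ID",
        client_secret="YOUR_CLIENT_SECRET",
        user_agent="garrascobike-demo",  # hypothetical user agent
    )


# Example usage (requires network access and valid credentials):
# rows = collect_comments(build_client(), "MTB", limit=5)
```

Keeping the thread id next to each list of comments matters, because the recommendation idea relies on knowing which bikes were mentioned in the same thread.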

Prerequisites

How to

Outcome

2. Extract interesting entities from the comments 🐍🤖

Intro

Prerequisites

How to
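As a hedged sketch of this step, the snippet below keeps only the named entities whose label looks relevant for bikes. It assumes a stock spaCy pipeline such as `en_core_web_sm` and the `PRODUCT` and `ORG` labels; the function names and the label choice are assumptions for illustration, and the project may well use a different model or label set.

```python
def extract_entities(nlp, text, labels=("PRODUCT", "ORG")):
    """Return (text, label) pairs for entities whose label is in `labels`.

    `nlp` is expected to behave like a loaded spaCy pipeline.
    """
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in labels]


def load_model(name="en_core_web_sm"):
    """Load a spaCy pipeline; the model name is an assumption."""
    import spacy  # third-party package: pip install spacy

    return spacy.load(name)


# Example usage (requires spaCy and the model to be installed):
# nlp = load_model()
# print(extract_entities(nlp, "some comment text mentioning a bike brand"))
```

A general-purpose model will miss some bike names and pick up unrelated products, so the extracted entities usually need some filtering before they are used.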

3. Create a simple recommendation system model 🐍🤖

Intro

Prerequisites

How to

Run the 03_correlation_extraction.py script with the following parameters:

$ python garrascobike/03_correlation_extraction.py --es_host localhost \
                                                   --es_port 9200 \
                                                   --es_index_list my_index
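The internals of the script are not shown here, so the following is only a guess at what a correlation-based model could look like. It assumes each bike is represented by a 0/1 vector (one entry per thread, 1 if the bike is mentioned there) and ranks the other bikes by Pearson correlation with the chosen one. The toy `threads` data and all function names are hypothetical.

```python
from math import sqrt

# Hypothetical toy data: thread id -> bikes mentioned in that thread.
threads = {
    "t1": {"bike_a", "bike_b", "bike_d"},
    "t2": {"bike_a", "bike_b"},
    "t3": {"bike_c", "bike_d"},
    "t4": {"bike_c"},
}


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant vector carries no correlation signal
    return cov / sqrt(var_x * var_y)


def presence_vectors(threads):
    """Map each bike to a 0/1 vector marking the threads that mention it."""
    thread_ids = sorted(threads)
    bikes = sorted({bike for mentioned in threads.values() for bike in mentioned})
    return {
        bike: [1 if bike in threads[tid] else 0 for tid in thread_ids]
        for bike in bikes
    }


def top_correlated(threads, bike, k=3):
    """Return the k bikes whose presence vectors correlate most with `bike`."""
    vectors = presence_vectors(threads)
    target = vectors[bike]
    scored = [
        (other, pearson(target, vec)) for other, vec in vectors.items() if other != bike
    ]
    # Highest correlation first; ties are broken by name.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


print(top_correlated(threads, "bike_a", k=1))  # [('bike_b', 1.0)]
```

Unlike a raw count of shared threads, correlation also penalises bikes that appear where the chosen bike is absent, which helps against brands that are simply mentioned everywhere.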

Outcome

4. Deploy the model on a back-end 🐍


Intro

Prerequisites

How to

Outcome

5. Create a front-end that exposes the model predictions 🌐

Intro

Prerequisites

How to

Outcome

🎌 Other languages

💭 Final Thoughts

We have just seen how to build a recommendation system, from the scraping process to a Web App. All the data is scraped from Reddit and processed with unsupervised ML models (spaCy) on the Google Colab platform. This saved us a lot of work and provided a straightforward path that could also be automated.

Finally, a back-end hosted for free on Heroku and a front-end hosted for free on GitHub Pages complete the project with a web app page.

From here, a lot of work could be done to refine each of the components. For example, to move from a proof of concept (POC) like this to a more serious production system, we should:

Finally, I want to say that it was nice to build this project. I had no experience with the front-end world, and the simple website was made after following the free theodinproject.com course.

🍺 And now that every component is complete and the website is up and running, I must admit that I feel satisfied. I hope you’ve enjoyed the journey!

**📧 Found an error or have a question? Let’s [connect](https://www.pistocop.dev/)**

This article was first published here: https://www.pistocop.dev/posts/garrascobike/