Amazon and other online retailers seem to know exactly what you’ll add to your cart next. How? In physical stores, products are organized into sections, aisles, and shelves, making it easy to find related items together. This real-world experience was missing from early digital stores, where breadcrumbs were the only way to navigate categories. E-commerce sites have only recently begun to replicate the convenience of finding similar products grouped together, just like in a physical store.
You're back-to-school shopping and add a notebook to your cart. Suddenly, you're shown pens, glue sticks, backpacks, printer ink, and scissors: all the essentials, right there in one swipeable carousel. So convenient, and way better than hunting them down aisle by aisle at Walmart.
But how do e-commerce sites know exactly what to show you?
🎉 Drumroll... Enter Association Rule Mining: the behind-the-scenes magic of data mining that finds patterns, relationships, and "frequently bought together" combos across massive datasets.
Even though these products are made, shipped, and sold by totally different suppliers, this tech connects the dots — making your shopping experience smarter and smoother.
Association Rule Mining helps computers find these patterns automatically from huge amounts of data, so businesses can use them to make better decisions like showing related products online or organizing store shelves smarter.
Think of it like this: "If X, then Y." That's the heart of association rule mining.
The Apriori Algorithm is a classic method used in Association Rule Mining to find items that frequently appear together in large datasets, such as shopping carts, web clicks, or user behavior logs. In other words, past patterns of products bought together serve as the guide for creating these associations.
The Apriori Algorithm
Explanation
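- Purpose: Identify frequent itemsets and derive association rules.
- Steps:
  - Identify all itemsets that meet a minimum support threshold, starting with single items.
  - Generate larger candidate itemsets from smaller frequent itemsets and keep those that also meet the threshold. An itemset can only be frequent if all of its subsets are frequent, so infrequent itemsets are never extended.
  - Derive association rules from the frequent itemsets, keeping the rules that meet a minimum confidence.
- Mathematical Intuition:
  - Support: Fraction of transactions in the dataset that contain the itemset.
  - Confidence: Likelihood of the consequent given the antecedent.
  - Lift: How much more likely the consequent is, given the antecedent, than it would be if the two were unrelated.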
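The steps above can be sketched in plain Python. This is a minimal illustration, not the implementation used to produce the results in this post: the toy baskets, the function names `apriori_frequent_itemsets` and `derive_rules`, and the thresholds are all assumptions made for the example.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset (frozenset): support} for every frequent itemset."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item in the itemset.
        return sum(itemset <= b for b in baskets) / n

    # Step 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Step 2a (join): combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Step 2b (prune): drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent

def derive_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                consequent = itemset - antecedent
                confidence = supp / frequent[antecedent]
                if confidence >= min_confidence:
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules

# Toy baskets, invented for illustration only.
transactions = [
    {"notebook", "pen"},
    {"notebook", "pen", "glue"},
    {"notebook", "glue"},
    {"pen", "backpack"},
    {"notebook", "pen", "backpack"},
]

frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
rules = derive_rules(frequent, min_confidence=0.7)
for antecedent, consequent, supp, conf, lift in rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

The pruning step is what keeps the search manageable: a candidate is only counted against the data if every smaller subset of it was already found to be frequent.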
Mental map
Term | Meaning | In Plain English |
|---|---|---|
antecedents | The "if" part of the rule | What the customer has bought already |
consequents | The "then" part of the rule | What the customer is likely to buy next |
support | Frequency of this item combination in the whole dataset | How common this rule is overall |
confidence | How often the rule has been true | If people buy A, how often they also buy B |
lift | How much more likely B is given A (vs. just randomly buying B) | Measures strength of the rule (lift > 1 means A and B are bought together more often than chance alone would explain) |
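The three metrics in the table can also be written as formulas. The notation is an assumption introduced here for illustration: T is the set of all transactions, X is the antecedent itemset, and Y is the consequent itemset.

```latex
\begin{align*}
\mathrm{support}(X) &= \frac{\lvert \{\, t \in T : X \subseteq t \,\} \rvert}{\lvert T \rvert} \\
\mathrm{support}(X \Rightarrow Y) &= \mathrm{support}(X \cup Y) \\
\mathrm{confidence}(X \Rightarrow Y) &= \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)} \\
\mathrm{lift}(X \Rightarrow Y) &= \frac{\mathrm{confidence}(X \Rightarrow Y)}{\mathrm{support}(Y)}
  = \frac{\mathrm{support}(X \cup Y)}{\mathrm{support}(X)\,\mathrm{support}(Y)}
\end{align*}
```

Written this way, it is easy to see why lift is the same in both directions of a rule while confidence is not: swapping X and Y leaves the lift formula unchanged but changes the denominator of confidence.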
Data Preprocessing & Visualization
The table below shows the likelihood of items being purchased together. For example, if a customer buys a SPACEBOY LUNCH BOX, they are also likely to buy a DOLLY GIRL LUNCH BOX. This pattern suggests that parents may be purchasing lunch boxes for both their son and daughter.
Similarly, a customer who buys both the ROSES REGENCY TEACUP AND SAUCER and the PINK REGENCY TEACUP AND SAUCER is likely to buy the GREEN REGENCY TEACUP AND SAUCER as well (confidence 0.894), presumably to complete the color combination of the tea set.
antecedents | consequents | support | confidence | lift |
|---|---|---|---|---|
ALARM CLOCK BAKELIKE RED | ALARM CLOCK BAKELIKE GREEN | 0.029 | 0.604 | 14.198 |
ALARM CLOCK BAKELIKE GREEN | ALARM CLOCK BAKELIKE RED | 0.029 | 0.672 | 14.198 |
ALARM CLOCK BAKELIKE RED | ALARM CLOCK BAKELIKE PINK | 0.021 | 0.452 | 13.654 |
ALARM CLOCK BAKELIKE PINK | ALARM CLOCK BAKELIKE RED | 0.021 | 0.646 | 13.654 |
SPACEBOY LUNCH BOX | DOLLY GIRL LUNCH BOX | 0.023 | 0.602 | 18.123 |
DOLLY GIRL LUNCH BOX | SPACEBOY LUNCH BOX | 0.023 | 0.688 | 18.123 |
GARDENERS KNEELING PAD KEEP CALM | GARDENERS KNEELING PAD CUP OF TEA | 0.025 | 0.612 | 17.877 |
GARDENERS KNEELING PAD CUP OF TEA | GARDENERS KNEELING PAD KEEP CALM | 0.025 | 0.729 | 17.877 |
LUNCH BAG SPACEBOY DESIGN | LUNCH BAG BLACK SKULL. | 0.023 | 0.423 | 7.455 |
LUNCH BAG SUKI DESIGN | LUNCH BAG BLACK SKULL. | 0.022 | 0.455 | 8.016 |
LUNCH BAG BLACK SKULL. | LUNCH BAG SUKI DESIGN | 0.022 | 0.389 | 8.016 |
LUNCH BAG RED RETROSPOT | LUNCH BAG APPLE DESIGN | 0.021 | 0.302 | 6.464 |
LUNCH BAG APPLE DESIGN | LUNCH BAG RED RETROSPOT | 0.021 | 0.449 | 6.464 |
LUNCH BAG CARS BLUE | LUNCH BAG PINK POLKADOT | 0.023 | 0.442 | 8.801 |
LUNCH BAG PINK POLKADOT | LUNCH BAG CARS BLUE | 0.023 | 0.459 | 8.801 |
LUNCH BAG CARS BLUE | LUNCH BAG RED RETROSPOT | 0.025 | 0.474 | 6.823 |
LUNCH BAG RED RETROSPOT | LUNCH BAG CARS BLUE | 0.025 | 0.356 | 6.823 |
LUNCH BAG CARS BLUE | LUNCH BAG SPACEBOY DESIGN | 0.021 | 0.405 | 7.594 |
LUNCH BAG SPACEBOY DESIGN | LUNCH BAG CARS BLUE | 0.021 | 0.396 | 7.594 |
LUNCH BAG CARS BLUE | LUNCH BAG SUKI DESIGN | 0.021 | 0.404 | 8.324 |
LUNCH BAG SUKI DESIGN | LUNCH BAG CARS BLUE | 0.021 | 0.434 | 8.324 |
LUNCH BAG PINK POLKADOT | LUNCH BAG RED RETROSPOT | 0.028 | 0.562 | 8.084 |
LUNCH BAG RED RETROSPOT | LUNCH BAG PINK POLKADOT | 0.028 | 0.406 | 8.084 |
LUNCH BAG RED RETROSPOT | LUNCH BAG SPACEBOY DESIGN | 0.025 | 0.363 | 6.802 |
LUNCH BAG SPACEBOY DESIGN | LUNCH BAG RED RETROSPOT | 0.025 | 0.473 | 6.802 |
LUNCH BAG SUKI DESIGN | LUNCH BAG RED RETROSPOT | 0.024 | 0.501 | 7.204 |
LUNCH BAG RED RETROSPOT | LUNCH BAG SUKI DESIGN | 0.024 | 0.349 | 7.204 |
LUNCH BAG RED RETROSPOT | LUNCH BAG WOODLAND | 0.023 | 0.336 | 7.599 |
LUNCH BAG WOODLAND | LUNCH BAG RED RETROSPOT | 0.023 | 0.528 | 7.599 |
LUNCH BAG SUKI DESIGN | LUNCH BAG SPACEBOY DESIGN | 0.020 | 0.420 | 7.888 |
LUNCH BAG SPACEBOY DESIGN | LUNCH BAG SUKI DESIGN | 0.020 | 0.383 | 7.888 |
LUNCH BAG WOODLAND | LUNCH BAG SPACEBOY DESIGN | 0.022 | 0.494 | 9.266 |
LUNCH BAG SPACEBOY DESIGN | LUNCH BAG WOODLAND | 0.022 | 0.410 | 9.266 |
PAPER CHAIN KIT 50'S CHRISTMAS | PAPER CHAIN KIT VINTAGE CHRISTMAS | 0.024 | 0.460 | 12.239 |
PAPER CHAIN KIT VINTAGE CHRISTMAS | PAPER CHAIN KIT 50'S CHRISTMAS | 0.024 | 0.647 | 12.239 |
PARTY BUNTING | SPOTTY BUNTING | 0.021 | 0.282 | 5.209 |
SPOTTY BUNTING | PARTY BUNTING | 0.021 | 0.388 | 5.209 |
ROSES REGENCY TEACUP AND SAUCER | PINK REGENCY TEACUP AND SAUCER | 0.024 | 0.557 | 18.564 |
PINK REGENCY TEACUP AND SAUCER | ROSES REGENCY TEACUP AND SAUCER | 0.024 | 0.784 | 18.564 |
WHITE HANGING HEART T-LIGHT HOLDER | RED HANGING HEART T-LIGHT HOLDER | 0.025 | 0.231 | 6.302 |
RED HANGING HEART T-LIGHT HOLDER | WHITE HANGING HEART T-LIGHT HOLDER | 0.025 | 0.670 | 6.302 |
REGENCY CAKESTAND 3 TIER | ROSES REGENCY TEACUP AND SAUCER | 0.023 | 0.246 | 5.835 |
ROSES REGENCY TEACUP AND SAUCER | REGENCY CAKESTAND 3 TIER | 0.023 | 0.536 | 5.835 |
WOODEN PICTURE FRAME WHITE FINISH | WOODEN FRAME ANTIQUE WHITE | 0.025 | 0.534 | 12.211 |
WOODEN FRAME ANTIQUE WHITE | WOODEN PICTURE FRAME WHITE FINISH | 0.025 | 0.577 | 12.211 |
ROSES REGENCY TEACUP AND SAUCER, PINK REGENCY TEACUP AND SAUCER | GREEN REGENCY TEACUP AND SAUCER | 0.021 | 0.894 | 23.995 |
GREEN REGENCY TEACUP AND SAUCER, PINK REGENCY TEACUP AND SAUCER | ROSES REGENCY TEACUP AND SAUCER | 0.021 | 0.848 | 20.071 |
ROSES REGENCY TEACUP AND SAUCER, GREEN REGENCY TEACUP AND SAUCER | PINK REGENCY TEACUP AND SAUCER | 0.021 | 0.721 | 24.033 |
PINK REGENCY TEACUP AND SAUCER | ROSES REGENCY TEACUP AND SAUCER, GREEN REGENCY TEACUP AND SAUCER | 0.021 | 0.701 | 24.033 |
ROSES REGENCY TEACUP AND SAUCER | GREEN REGENCY TEACUP AND SAUCER, PINK REGENCY TEACUP AND SAUCER | 0.021 | 0.498 | 20.071 |
GREEN REGENCY TEACUP AND SAUCER | ROSES REGENCY TEACUP AND SAUCER, PINK REGENCY TEACUP AND SAUCER | 0.021 | 0.564 | 23.995 |
Interpretation
- antecedents: The item(s) on the left side of the rule (the "if" part). For example, (ALARM CLOCK BAKELIKE RED) means the rule is considering transactions that include this item.
- consequents: The item(s) on the right side of the rule (the "then" part). For example, (ALARM CLOCK BAKELIKE GREEN) means the rule is predicting that this item is also likely to be present.
- support: The proportion of transactions that contain both the antecedent and the consequent. For example, 0.028593 (shown rounded as 0.029 in the table) means about 2.86% of all transactions contain both items.
- confidence: The probability that the consequent is present given the antecedent is present. For example, 0.604333 means that if a transaction contains the antecedent, there is a 60.4% chance it also contains the consequent.
- lift: How much more likely the consequent is to appear with the antecedent than it would be by chance. A lift greater than 1 means the items appear together more often than expected, a lift of 1 means they look independent, and a lift below 1 means they appear together less often than expected. For example, 14.197612 means the items appear together about 14 times more often than if they were independent.
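- Support: Proportion of transactions containing the itemset.
- Confidence: Probability of the consequent given the antecedent.
- Lift: Ratio of the observed support to the support expected if the antecedent and consequent were independent.
- Conviction: Measure of implication strength, computed as (1 - support of the consequent) / (1 - confidence). Values above 1 mean the rule is wrong less often than it would be if the items were independent.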
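The definitions above tie the columns of the table together, which makes it possible to sanity-check one row against another. The sketch below uses only the unrounded values quoted in the interpretation for ALARM CLOCK BAKELIKE RED -> ALARM CLOCK BAKELIKE GREEN; the variable names are made up for the example, and the individual item supports are derived from the rule metrics rather than read from the data.

```python
# Unrounded values quoted above for ALARM CLOCK BAKELIKE RED -> GREEN.
support_both = 0.028593   # share of transactions containing both clocks
confidence = 0.604333     # P(GREEN | RED)
lift = 14.197612

# lift = confidence / support(consequent), so the consequent's support is:
support_green = confidence / lift

# confidence = support(both) / support(antecedent), so the antecedent's support is:
support_red = support_both / confidence

# Confidence of the reverse rule GREEN -> RED, which the table lists as 0.672.
reverse_confidence = support_both / support_green

# conviction = (1 - support(consequent)) / (1 - confidence)
conviction = (1 - support_green) / (1 - confidence)

print(f"support(GREEN)         = {support_green:.4f}")
print(f"support(RED)           = {support_red:.4f}")
print(f"confidence(GREEN->RED) = {reverse_confidence:.3f}")
print(f"conviction(RED->GREEN) = {conviction:.2f}")
```

The derived reverse confidence should land close to the 0.672 listed for the GREEN -> RED row, which is a quick way to confirm that the two rows describe the same pair of items.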
Retailers use association rule mining to optimize product placement and promotions. For example, if analysis shows that customers who buy a SPACEBOY LUNCH BOX often also buy a DOLLY GIRL LUNCH BOX, the store can place these items together on shelves or offer bundle discounts. This increases the likelihood of customers purchasing both items, boosting sales and, online, increasing clicks on the related product pages.
Try Next:
We used the Apriori algorithm to discover associations, but there are other effective options such as FP-Growth and Eclat. All three are exact methods, so for the same minimum support they should return the same frequent itemsets; they differ mainly in speed and memory use. A pipeline that runs the data through all of them is therefore most useful as a cross-check and a benchmark. Adding a scoring system on top lets us keep only the rules that are confirmed by at least two methods and have strong metrics, which increases confidence in the reliability of the discovered associations.
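The selection step of such a pipeline might look like the sketch below. It assumes each algorithm's output has already been collected into a dictionary keyed by (antecedent, consequent); the function name `select_consensus_rules`, the thresholds, and the split of rules across methods are hypothetical. Only the confidence and lift values are copied from the table.

```python
from collections import Counter

def select_consensus_rules(results, min_methods=2, min_lift=1.0, min_confidence=0.5):
    """Keep rules reported by at least `min_methods` methods with strong metrics.

    results: {method_name: {(antecedent, consequent): {"confidence": x, "lift": y}}}
    """
    votes = Counter(rule for rules in results.values() for rule in rules)
    selected = {}
    for rule, count in votes.items():
        if count < min_methods:
            continue
        # Take the metrics from the first method that reported the rule.
        metrics = next(rules[rule] for rules in results.values() if rule in rules)
        if metrics["lift"] >= min_lift and metrics["confidence"] >= min_confidence:
            selected[rule] = metrics
    return selected

LUNCH_BOX = ("SPACEBOY LUNCH BOX", "DOLLY GIRL LUNCH BOX")
BUNTING = ("PARTY BUNTING", "SPOTTY BUNTING")
FRAME = ("WOODEN PICTURE FRAME WHITE FINISH", "WOODEN FRAME ANTIQUE WHITE")

# Hypothetical: which method reported which rule is invented for illustration.
results = {
    "apriori": {
        LUNCH_BOX: {"confidence": 0.602, "lift": 18.123},
        BUNTING: {"confidence": 0.282, "lift": 5.209},
        FRAME: {"confidence": 0.534, "lift": 12.211},
    },
    "fpgrowth": {
        LUNCH_BOX: {"confidence": 0.602, "lift": 18.123},
        BUNTING: {"confidence": 0.282, "lift": 5.209},
    },
    "eclat": {
        LUNCH_BOX: {"confidence": 0.602, "lift": 18.123},
    },
}

selected = select_consensus_rules(results)
for (antecedent, consequent), metrics in selected.items():
    print(f"{antecedent} -> {consequent}: {metrics}")
```

In this made-up input, the bunting rule has enough votes but fails the confidence threshold, and the picture frame rule has strong metrics but only one vote, so only the lunch box rule survives both filters.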
Practical Retail Application: Digital Shelf Optimization
This approach can be used to pre-populate related products in an online system, allowing users to quickly find complementary items when searching for a product. In a physical store, this is similar to placing frequently bought-together items side by side on shelves. By replicating this model digitally, we streamline the shopping experience, saving users time and effort while increasing the likelihood of additional purchases.
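As a rough sketch of that pre-population step, the rules can be turned into a lookup from each product to its top related products. The rows below are copied from the rules table above; the function name `build_related_index`, the choice to rank by lift, and the `top_n` cutoff are assumptions made for this example.

```python
from collections import defaultdict

# A few rows from the rules table: (antecedent, consequent, confidence, lift).
rules = [
    ("LUNCH BAG CARS BLUE", "LUNCH BAG PINK POLKADOT", 0.442, 8.801),
    ("LUNCH BAG CARS BLUE", "LUNCH BAG RED RETROSPOT", 0.474, 6.823),
    ("LUNCH BAG CARS BLUE", "LUNCH BAG SPACEBOY DESIGN", 0.405, 7.594),
    ("LUNCH BAG CARS BLUE", "LUNCH BAG SUKI DESIGN", 0.404, 8.324),
    ("SPACEBOY LUNCH BOX", "DOLLY GIRL LUNCH BOX", 0.602, 18.123),
]

def build_related_index(rules, top_n=3):
    """Map each product to its top_n related products, ranked by lift."""
    index = defaultdict(list)
    for antecedent, consequent, confidence, lift in rules:
        index[antecedent].append((lift, confidence, consequent))
    return {
        product: [name for _, _, name in sorted(candidates, reverse=True)[:top_n]]
        for product, candidates in index.items()
    }

related = build_related_index(rules)
print(related["LUNCH BAG CARS BLUE"])
print(related["SPACEBOY LUNCH BOX"])
```

Because the index is built ahead of time, showing related items on a product page becomes a single dictionary lookup. Ranking by confidence instead of lift would favor products that are popular overall, so the better choice depends on what the carousel is meant to surface.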
References
- UCI Machine Learning Repository: Online Retail
- mlxtend documentation: http://rasbt.github.io/mlxtend/
- Han, J., Kamber, M., & Pei, J. (2012). Data Mining: Concepts and Techniques.