Association Rules Recommendation
Recommendation based on frequent itemset mining and association rules (e.g., 'users who bought X also bought Y'). Great for product bundling.
When to use:
- E-commerce cross-selling and upselling
- Market basket analysis
- Product bundling strategies
- Need highly explainable rules
Strengths: Extremely explainable, discovers strong co-occurrence patterns, good for bundling, simple to understand
Weaknesses: Not personalized, requires frequent co-occurrences, can miss rare but valuable associations
How it Works
Association Rules mining discovers patterns like "users who bought X also bought Y" by analyzing transaction histories. The algorithm:
- Frequent Itemset Mining: Identifies sets of items that appear together frequently
- Rule Generation: Creates rules from frequent itemsets (X -> Y)
- Rule Filtering: Keeps only rules meeting minimum confidence and lift thresholds
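The three steps above can be sketched in a few lines of Python. This is a brute-force illustration, not the product's implementation: the function name `mine_rules`, the `max_size` cap, and the tuple layout of each rule are assumptions made for the example, and real miners (Apriori, FP-Growth) prune candidates far more aggressively.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.01, min_confidence=0.3,
               min_lift=1.0, max_size=3):
    """Brute-force sketch of the three steps; not tuned for large data."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    # Step 1: count every itemset up to max_size, keep the frequent ones.
    counts = {}
    for basket in baskets:
        for size in range(1, min(max_size, len(basket)) + 1):
            for combo in combinations(sorted(basket), size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    support = {s: c / n for s, c in counts.items() if c / n >= min_support}

    rules = []
    for itemset, supp in support.items():
        # Step 2: split each frequent itemset into antecedent X and consequent Y.
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                x = frozenset(combo)
                y = itemset - x
                confidence = supp / support[x]   # P(Y|X)
                lift = confidence / support[y]   # P(Y|X) / P(Y)
                # Step 3: keep only rules that clear both thresholds.
                if confidence >= min_confidence and lift >= min_lift:
                    rules.append((x, y, supp, confidence, lift))
    return rules


# Toy data, chosen only to make the numbers easy to follow.
transactions = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "milk"],
    ["milk"],
]
rules = mine_rules(transactions, min_support=0.5, min_confidence=0.6)
for x, y, supp, conf, lift in rules:
    print(sorted(x), "->", sorted(y), round(supp, 2), round(conf, 2), round(lift, 2))
```

On this toy data, only the bread/butter rules survive: bread -> milk has enough confidence but a lift below 1.0, so the lift filter removes it.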
Key Metrics:
- Support: How often items appear together (P(X ∩ Y))
- Confidence: When X is purchased, how often Y is purchased (P(Y|X))
- Lift: How much more likely Y is purchased when X is purchased vs. baseline (P(Y|X) / P(Y))
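As a minimal sketch, the three metrics can be computed directly from basket counts. The helper name `rule_metrics` and the sample baskets are made up for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    x, y = set(antecedent), set(consequent)
    baskets = [set(t) for t in transactions]
    n = len(baskets)
    n_x = sum(1 for b in baskets if x <= b)          # baskets containing X
    n_y = sum(1 for b in baskets if y <= b)          # baskets containing Y
    n_xy = sum(1 for b in baskets if (x | y) <= b)   # baskets containing X and Y
    support = n_xy / n if n else 0.0                 # P(X and Y)
    confidence = n_xy / n_x if n_x else 0.0          # P(Y|X)
    lift = confidence / (n_y / n) if n_y else 0.0    # P(Y|X) / P(Y)
    return support, confidence, lift


# Ten illustrative baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "butter"],
    ["milk"],
    ["milk", "eggs"],
    ["milk", "bread"],
    ["eggs"],
    ["eggs", "butter"],
    ["bread"],
]
support, confidence, lift = rule_metrics(baskets, {"bread", "butter"}, {"milk"})
print(support, confidence, lift)
```

Here half of the bread+butter baskets contain milk, but half of all baskets contain milk too, so the lift is 1.0: the rule carries no information beyond milk's baseline popularity.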
Example Rule:
{bread, butter} -> {milk}
Support: 0.05 (5% of transactions)
Confidence: 0.65 (65% of bread+butter buyers also buy milk)
Lift: 1.3 (milk is 30% more likely with bread+butter)
Parameters
Feature Configuration
Feature Columns (required) List of columns to use: must include user_id (or transaction_id) and item_id.
User Column (default: "user_id", required) Name of the column containing transaction identifiers. Each unique value represents a transaction or user session.
- Can be transaction_id, session_id, basket_id, or user_id
- Groups items that were purchased/interacted together
Item Column (default: "item_id", required) Name of the column containing item identifiers. Items that share the same transaction identifier are treated as appearing together.
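To illustrate how the two columns work together, the sketch below groups long-format rows (one row per item in a transaction) into baskets. The function `build_transactions` and the sample rows are hypothetical; the tool's own loader may differ.

```python
from collections import defaultdict


def build_transactions(rows, user_col="user_id", item_col="item_id"):
    """Group rows by the transaction identifier into sets of items."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row[user_col]].add(row[item_col])
    return dict(baskets)


# Hypothetical rows where the transaction identifier is an order_id.
rows = [
    {"order_id": 1, "item_id": "bread"},
    {"order_id": 1, "item_id": "butter"},
    {"order_id": 2, "item_id": "milk"},
    {"order_id": 1, "item_id": "bread"},  # duplicate rows collapse into the set
]
orders = build_transactions(rows, user_col="order_id")
print(orders)
```

Baskets with a single item, such as order 2 here, cannot contribute to any rule, which is worth checking before tuning thresholds.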
Model-Specific Parameters
Minimum Support (default: 0.01) Minimum frequency threshold for itemsets (0 to 1). Itemsets appearing less frequently are ignored.
- 0.001-0.01: Very rare patterns (large catalogs)
- 0.01-0.05: Moderate patterns (default range)
- 0.05-0.1: Only common patterns
- Too low: Too many rules, noise
- Too high: Miss interesting patterns
Minimum Confidence (default: 0.3) Minimum confidence threshold for rules (0 to 1). Rules with lower confidence are filtered out.
- 0.1-0.3: Exploratory, capture weak associations
- 0.3-0.5: Balanced (default range)
- 0.5-0.8: Strong associations only
- 0.8+: Very strict, few rules
- Higher = more reliable but fewer recommendations
Minimum Lift (default: 1.0) Minimum lift threshold for rules. A lift of 1.0 means X and Y are independent; a lift below 1.0 indicates a negative association.
- 1.0: Neutral baseline; drops only negatively associated rules (default)
- 1.2-1.5: Slight positive association
- 1.5-2.0: Moderate positive association
- 2.0+: Strong positive association
- Higher = stronger patterns but fewer rules
Top-K Recommendations (default: 10) Number of items to recommend per user/transaction based on discovered rules.
- 3-5: Focused bundling suggestions
- 5-10: Standard cross-sell recommendations
- 10-20: Broader exploration
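One plausible way to turn rules into Top-K recommendations is to keep the rules whose antecedent is contained in the current basket and rank their consequents. The sketch below assumes rules are `(antecedent, consequent, confidence, lift)` tuples and ranks by confidence, then lift; both the layout and the ranking order are assumptions, not documented behavior.

```python
def recommend(basket, rules, top_k=10):
    """Rank items suggested by rules whose antecedent is fully in the basket."""
    basket = set(basket)
    best = {}
    for antecedent, consequent, confidence, lift in rules:
        if not set(antecedent) <= basket:
            continue  # rule does not apply to this basket
        for item in consequent:
            if item in basket:
                continue  # do not recommend what is already in the basket
            score = (confidence, lift)
            if item not in best or score > best[item]:
                best[item] = score
    ranked = sorted(best, key=lambda item: best[item], reverse=True)
    return ranked[:top_k]


# Illustrative rules, not mined from real data.
example_rules = [
    ({"bread"}, {"butter"}, 0.67, 1.33),
    ({"bread", "butter"}, {"milk"}, 0.65, 1.30),
    ({"coffee"}, {"sugar"}, 0.90, 2.00),
]
print(recommend({"bread", "butter", "coffee"}, example_rules, top_k=2))
```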
Configuration Tips
Dataset Size Considerations
- Small (<10k transactions): Support: 0.03-0.05, may not find many patterns
- Medium (10k-100k): Support: 0.01-0.03, ideal range
- Large (100k-1M): Support: 0.005-0.01, many patterns
- Very Large (>1M): Support: 0.001-0.005; raise support if mining becomes too slow
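It can help to translate a support fraction into the absolute number of transactions an itemset must appear in. The helper below is a small sketch; the name `min_support_count` and the dataset sizes are illustrative picks from the ranges above.

```python
import math


def min_support_count(n_transactions, min_support):
    """Smallest number of transactions an itemset needs to reach min_support."""
    # The small epsilon guards against floating-point round-off in the product.
    return math.ceil(n_transactions * min_support - 1e-9)


for n, s in [(5_000, 0.03), (50_000, 0.01), (500_000, 0.005), (2_000_000, 0.001)]:
    print(f"{n} transactions at support {s}: at least {min_support_count(n, s)} baskets")
```

Seen this way, a support of 0.03 on 5,000 transactions still requires 150 co-occurrences, which is why small datasets may yield few patterns.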
Parameter Tuning Guidance
Balancing Support and Confidence:
Too few rules:
- Decrease min_support (find rarer patterns)
- Decrease min_confidence (allow weaker associations)
- Decrease min_lift
Too many rules:
- Increase min_support (only frequent patterns)
- Increase min_confidence (stronger associations)
- Increase min_lift (stronger relationships)
Good starting point:
- Support: 0.01 (1% of transactions)
- Confidence: 0.3 (30% conditional probability)
- Lift: 1.0 (any positive association)
- Adjust based on number and quality of rules
Optimization Process:
- Start with defaults
- Check number of rules generated (aim for 100-1000)
- Review top rules by confidence and lift
- Adjust thresholds to balance quantity and quality
- Validate rules make business sense
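The process above can be partly automated by counting how many rules survive each threshold combination. The sketch assumes the rules were mined once at the loosest thresholds and exported as dictionaries with `support`, `confidence`, and `lift` keys; that layout and the sample values are hypothetical.

```python
from itertools import product


def count_rules(rules, min_support, min_confidence, min_lift):
    """Count the rules that pass all three thresholds."""
    return sum(
        1 for r in rules
        if r["support"] >= min_support
        and r["confidence"] >= min_confidence
        and r["lift"] >= min_lift
    )


def sweep(rules, supports, confidences, lifts):
    """Map every (support, confidence, lift) combination to its rule count."""
    return {
        (s, c, l): count_rules(rules, s, c, l)
        for s, c, l in product(supports, confidences, lifts)
    }


# Made-up rule metrics, for illustration only.
mined = [
    {"support": 0.05, "confidence": 0.65, "lift": 1.3},
    {"support": 0.02, "confidence": 0.40, "lift": 1.6},
    {"support": 0.01, "confidence": 0.30, "lift": 1.0},
    {"support": 0.01, "confidence": 0.80, "lift": 2.4},
]
grid = sweep(mined, supports=[0.01, 0.02], confidences=[0.3, 0.5], lifts=[1.0, 1.5])
for thresholds, n_rules in sorted(grid.items()):
    print(thresholds, n_rules)
```

Filtering after the fact gives the same rule set as re-mining with a higher support, because a rule's support equals the support of its full itemset.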
When to Choose This Over Alternatives
- vs. Item-Based KNN: Choose this for explicit co-purchase patterns and bundling
- vs. Collaborative Filtering: Choose this for non-personalized, general patterns
- vs. Content-Based: Choose this when you don't have item features
- Best for: Cross-selling, bundling, "frequently bought together", market basket analysis
Common Issues and Solutions
Too Few Rules
Issue: Not finding enough association rules.
Solution:
- Lower min_support (try 0.005-0.01)
- Lower min_confidence (try 0.2-0.3)
- Ensure sufficient transaction data (10k+ transactions)
- Check that transactions have multiple items
Too Many Rules
Issue: Overwhelming number of rules, many low-quality.
Solution:
- Increase min_support (try 0.02-0.05)
- Increase min_confidence (try 0.4-0.6)
- Increase min_lift (try 1.5-2.0)
- Filter by number of antecedents (prefer simple rules)
Not Personalized
Issue: Same recommendations for everyone.
Solution:
- This is expected behavior for association rules
- Combine with collaborative filtering for personalization
- Use user's current basket to select relevant rules
- Consider Item-Based KNN or Hybrid model for personalization
Recommendations Too Obvious
Issue: Rules only capture obvious patterns (e.g., batteries with electronics).
Solution:
- Increase min_lift to find surprising associations
- Filter out trivial category-level patterns
- Focus on cross-category recommendations
- Look for rules with high lift but moderate support
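A minimal sketch of the cross-category idea: keep only rules whose antecedent and consequent share no category and whose lift clears a threshold. The `category_of` mapping, the `(antecedent, consequent, lift)` tuple layout, and the sample rules are all hypothetical.

```python
def cross_category_rules(rules, category_of, min_lift=1.5):
    """Keep high-lift rules that link items from different categories."""
    kept = []
    for antecedent, consequent, lift in rules:
        cats_x = {category_of[item] for item in antecedent}
        cats_y = {category_of[item] for item in consequent}
        if lift >= min_lift and cats_x.isdisjoint(cats_y):
            kept.append((antecedent, consequent, lift))
    # Strongest associations first.
    return sorted(kept, key=lambda rule: rule[2], reverse=True)


# Illustrative catalog and rules.
category_of = {
    "phone": "electronics",
    "charger": "electronics",
    "diapers": "baby",
    "wipes": "baby",
    "beer": "beverages",
}
candidate_rules = [
    (("phone",), ("charger",), 3.0),   # same category: treated as obvious
    (("diapers",), ("beer",), 2.1),    # cross-category and high lift
    (("diapers",), ("wipes",), 4.0),   # same category
    (("beer",), ("charger",), 1.2),    # cross-category but weak lift
]
surprising = cross_category_rules(candidate_rules, category_of)
print(surprising)
```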
Seasonal/Temporal Patterns Missed
Issue: Rules don't capture time-based patterns.
Solution:
- Generate separate rules for different time periods
- Weight recent transactions more heavily
- Use sliding time windows
- Consider BERT4Rec for sequential patterns
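Weighting recent transactions more heavily can be sketched as a recency-weighted support, where each transaction's weight halves every `half_life_days`. Exponential decay is one possible weighting scheme chosen for illustration, and the function name and `(day, items)` input layout are assumptions.

```python
def weighted_support(transactions, itemset, now, half_life_days=30.0):
    """Support where each transaction counts less the older it is."""
    itemset = set(itemset)
    total = 0.0
    hits = 0.0
    for day, items in transactions:
        weight = 0.5 ** ((now - day) / half_life_days)  # 1.0 today, 0.5 one half-life ago
        total += weight
        if itemset <= set(items):
            hits += weight
    return hits / total if total else 0.0


# (day number, items) pairs; day 60 is "today" in this toy example.
history = [
    (0, ["sunscreen", "hat"]),
    (30, ["sunscreen"]),
    (60, ["sunscreen", "hat"]),
]
recent = weighted_support(history, {"sunscreen", "hat"}, now=60)
print(round(recent, 3))
```

With these weights (0.25, 0.5, 1.0) the pair's support comes out higher than the unweighted 2/3, because the most recent basket contains it.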
Scalability Issues
Issue: Rule mining too slow on large datasets.
Solution:
- Increase min_support to reduce candidate itemsets
- Sample transactions for initial exploration
- Limit maximum itemset size
- Use FP-Growth algorithm variant (more efficient)
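For initial exploration, a reproducible random sample of transactions is often enough to choose rough thresholds before mining the full dataset. This is a sketch with a hypothetical helper name; it assumes the transactions fit in a list.

```python
import random


def sample_transactions(transactions, fraction=0.1, seed=0):
    """Return a reproducible random subset of the transactions."""
    if not transactions:
        return []
    rng = random.Random(seed)  # fixed seed so repeated runs see the same sample
    k = max(1, round(len(transactions) * fraction))
    return rng.sample(transactions, k)


# Synthetic baskets, only to show the call.
all_baskets = [[f"item_{i % 7}", f"item_{i % 11}"] for i in range(1000)]
subset = sample_transactions(all_baskets, fraction=0.1)
print(len(subset))
```

Support estimated on a sample is noisier for rare itemsets, so thresholds found this way are best rechecked on the full data.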
Example Use Cases
E-commerce Cross-Selling
Scenario: Online retailer wants to suggest complementary products at checkout
Configuration:
- Min Support: 0.02 (2% of transactions)
- Min Confidence: 0.4 (40% likelihood)
- Min Lift: 1.5 (50% more likely than baseline)
- Top-5 recommendations
- Transaction ID: order_id
Why: "Frequently bought together", highly explainable, drives upsells
Grocery Store Bundling
Scenario: Supermarket chain wants to create product bundles and optimize layout
Configuration:
- Min Support: 0.05 (5% of baskets)
- Min Confidence: 0.35
- Min Lift: 1.3
- Top-10 recommendations
- Transaction ID: basket_id
Why: Market basket analysis, discover co-purchase patterns, inform merchandising
Streaming Service Content Bundles
Scenario: Video platform wants to suggest "watch next" based on viewing sessions
Configuration:
- Min Support: 0.01 (1% of sessions)
- Min Confidence: 0.3
- Min Lift: 1.2
- Top-8 recommendations
- Transaction ID: session_id (videos watched in same session)
Why: Discover content that's often watched together, create playlists, binge-watching patterns