
Recommendation

Build personalized recommendation systems using collaborative filtering, content-based methods, and hybrid approaches

Recommendation systems predict items users might like based on past behavior and item characteristics. Use recommendation models to personalize product suggestions, content discovery, and user experiences.

🎓 Learn About Recommendation Systems

New to recommendation systems? Visit our Recommendation Concepts Guide to learn about collaborative filtering, content-based filtering, evaluation metrics (Precision@K, NDCG, Hit Rate), and when to use different recommendation approaches.

Available Models

We support 9 different recommendation algorithms, each suited for different scenarios:

Collaborative Filtering Models

  • Matrix Factorization (SVD) - Latent-factor model, best for explicit rating prediction
  • Matrix Factorization (sklearn) - Efficient baseline, also suited to implicit feedback
  • User-Based KNN - Recommends what similar users enjoyed
  • Item-Based KNN - Fast, scalable item-to-item similarity

Content-Based Models

  • Content-Based TF-IDF - Recommendations from item descriptions and metadata

Hybrid Models

  • Hybrid CF + Content-Based - Combines collaborative and content signals

Pattern Mining Models

  • Association Rules - Frequently-bought-together patterns from transaction data

Deep Learning Models

  • Embeddings Similarity - Semantic recommendations using sentence transformers
  • BERT4Rec - Sequential recommendations using transformer architecture

Common Configuration

Most models share these settings:

Feature Configuration

Feature Columns (required): Select which columns from your dataset to use. At minimum, you need:

  • User ID Column: Unique identifier for each user
  • Item ID Column: Unique identifier for each item

Optional columns based on model:

  • Rating Column: For explicit feedback (ratings, scores)
  • Content Column: For content-based approaches (descriptions, features)
  • Timestamp Column: For sequential models (order of interactions)

Data Formats

Explicit Feedback: User-item-rating triplets where users explicitly rate items

  • Example: Movie ratings (1-5 stars), product reviews
  • Format: user_id, item_id, rating

Implicit Feedback: User-item interactions without explicit ratings

  • Example: Purchases, clicks, views, likes
  • Format: user_id, item_id (presence indicates interaction)

Content Features: Item metadata for content-based filtering

  • Example: Product descriptions, movie genres, article text
  • Format: item_id, description or item_id, features
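As a concrete illustration, the three formats above might look like this as pandas DataFrames (toy values; only the column names match the formats described):

```python
import pandas as pd

# Explicit feedback: user-item-rating triplets
explicit = pd.DataFrame({
    "user_id": [1, 1, 2],
    "item_id": [10, 11, 10],
    "rating":  [5, 3, 4],
})

# Implicit feedback: the presence of a row indicates an interaction
implicit = pd.DataFrame({
    "user_id": [1, 2, 2],
    "item_id": [10, 10, 12],
})

# Content features: item metadata for content-based filtering
content = pd.DataFrame({
    "item_id": [10, 11, 12],
    "description": ["wireless mouse", "mechanical keyboard", "usb-c hub"],
})
```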

Top-K Parameter

K (default: 10): Number of recommendations to generate per user. Common values:

  • 5-10: For focused recommendations
  • 10-20: Standard recommendation lists
  • 20-50: For exploration and diversity

Understanding Recommendation Metrics

Precision@K

Fraction of recommended items that are relevant.

  • Range: 0 to 1
  • Higher is better: 1.0 = all recommendations are relevant
  • Interpretation: How many of your recommendations are correct
  • Use: When false positives are costly

Recall@K

Fraction of relevant items that are recommended.

  • Range: 0 to 1
  • Higher is better: 1.0 = all relevant items are recommended
  • Interpretation: How many relevant items you're finding
  • Use: When you want comprehensive coverage
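Precision@K and Recall@K can be sketched in a few lines of Python (illustrative helper names, not part of the product API):

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

recommended = ["a", "b", "c", "d", "e"]
relevant = {"b", "e", "f"}
precision_at_k(recommended, relevant, 5)  # 2 of 5 recommendations hit -> 0.4
recall_at_k(recommended, relevant, 5)     # 2 of 3 relevant items found -> ~0.67
```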

NDCG (Normalized Discounted Cumulative Gain)

Measures ranking quality with position-based discount.

  • Range: 0 to 1
  • Higher is better: 1.0 = perfect ranking
  • Interpretation: Rewards relevant items at top positions
  • Use: When ranking order matters
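One common formulation of NDCG@K with binary relevance, as a sketch (position i, 0-based, is discounted by log2(i + 2), and the result is normalized by the best achievable DCG):

```python
import math

def ndcg_at_k(recommended, relevant, k):
    """NDCG with binary relevance: hits near the top count more."""
    dcg = sum(
        1.0 / math.log2(i + 2)          # discount grows with position
        for i, item in enumerate(recommended[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)  # best case: all hits at the top
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

ndcg_at_k(["a", "c", "b"], {"a", "c"}, 3)  # hits at positions 1 and 2 -> 1.0
ndcg_at_k(["a", "b", "c"], {"a", "c"}, 3)  # hit at 3 is discounted -> ~0.92
```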

Hit Rate@K

Fraction of users with at least one relevant item in top-K.

  • Range: 0 to 1
  • Higher is better: 1.0 = everyone got at least one good recommendation
  • Interpretation: Success rate per user
  • Use: When any correct recommendation is valuable
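Hit Rate@K aggregates over users rather than items; a minimal sketch (dict-based inputs are illustrative):

```python
def hit_rate_at_k(recommendations, relevant_by_user, k):
    """Fraction of users with at least one relevant item in their top-k."""
    hits = sum(
        1 for user, recs in recommendations.items()
        if any(item in relevant_by_user.get(user, set()) for item in recs[:k])
    )
    return hits / len(recommendations) if recommendations else 0.0

recs = {"u1": ["a", "b", "c"], "u2": ["d", "e", "f"]}
relevant = {"u1": {"b"}, "u2": {"x"}}
hit_rate_at_k(recs, relevant, 3)  # u1 got a hit, u2 did not -> 0.5
```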

Coverage

Fraction of catalog items that appear in recommendations.

  • Range: 0 to 1
  • Higher is better: 1.0 = all items get recommended
  • Interpretation: Diversity of recommendations
  • Use: Avoiding filter bubbles

Diversity

Average dissimilarity between recommended items.

  • Range: 0 to 1
  • Higher is better: More varied recommendations
  • Interpretation: How different recommendations are from each other
  • Use: Ensuring diverse suggestions
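Coverage and diversity can be sketched as follows; the Jaccard tag-set dissimilarity here is only a stand-in for whatever item-similarity measure you actually use:

```python
from itertools import combinations

def coverage(recommendations, catalog):
    """Fraction of the catalog appearing in at least one user's list."""
    recommended = set()
    for recs in recommendations.values():
        recommended.update(recs)
    return len(recommended & set(catalog)) / len(catalog)

def intra_list_diversity(recs, item_tags):
    """Average pairwise dissimilarity (1 - Jaccard on tag sets) in one list."""
    pairs = list(combinations(recs, 2))
    if not pairs:
        return 0.0
    def dissim(a, b):
        ta, tb = item_tags[a], item_tags[b]
        return 1.0 - (len(ta & tb) / len(ta | tb) if ta | tb else 0.0)
    return sum(dissim(a, b) for a, b in pairs) / len(pairs)

recs = {"u1": ["a", "b"], "u2": ["b", "c"]}
coverage(recs, ["a", "b", "c", "d"])        # 3 of 4 catalog items -> 0.75
tags = {"a": {"x"}, "b": {"x", "y"}, "c": {"z"}}
intra_list_diversity(["a", "b"], tags)      # Jaccard 0.5 -> diversity 0.5
```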

Choosing the Right Model

Quick Start Guide

  1. Know your data: Explicit ratings vs. implicit interactions
  2. Start with Matrix Factorization: Great baseline for collaborative filtering
  3. Try KNN: For explainable recommendations
  4. Add content features: Use hybrid or content-based if you have item descriptions
  5. Evaluate: Use multiple metrics and A/B testing
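The Matrix Factorization baseline from step 2 can be sketched with scikit-learn's TruncatedSVD; the toy interaction matrix and masking logic below are illustrative, not the platform's internals:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Toy implicit-feedback matrix: rows = users, cols = items, 1 = interaction.
interactions = csr_matrix(np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
], dtype=float))

svd = TruncatedSVD(n_components=2, random_state=0)
user_factors = svd.fit_transform(interactions)   # shape (n_users, 2)
item_factors = svd.components_.T                 # shape (n_items, 2)

# Score every item for user 0 and rank, masking items already seen.
scores = user_factors[0] @ item_factors.T
seen = interactions[0].toarray().ravel() > 0
scores[seen] = -np.inf
top_k = np.argsort(scores)[::-1][:2]
```

The dot product of user and item factors approximates the original matrix, so unseen items with high scores are the recommendation candidates.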

By Data Type

Have ratings (explicit feedback)

  • Matrix Factorization (SVD) - Best for rating prediction
  • User-Based KNN - Good for smaller datasets
  • Hybrid CF + Content-Based - When you also have item features

Have interactions only (implicit feedback)

  • Item-Based KNN - Fast and scalable
  • Matrix Factorization (sklearn) - Good baseline
  • BERT4Rec - For sequential patterns

Have item descriptions

  • Content-Based TF-IDF - When user data is sparse
  • Embeddings Similarity - For semantic understanding
  • Hybrid CF + Content-Based - Best of both worlds

Have transaction data

  • Association Rules - For product bundling and cross-selling

Have sequential data

  • BERT4Rec - Captures temporal patterns
  • Embeddings Similarity - With timestamp-ordered features

By Dataset Size

Small (<1k users or items)

  • User-Based KNN - Works well on small datasets
  • Content-Based TF-IDF - Doesn't need many users
  • Matrix Factorization - Works, but may overfit on very small datasets

Medium (1k-100k users/items)

  • Item-Based KNN - Scalable and effective
  • Matrix Factorization (SVD or sklearn) - Great baseline
  • Hybrid CF + Content-Based - Best accuracy

Large (>100k users/items)

  • Matrix Factorization (sklearn) - Efficient
  • Item-Based KNN - Scales well
  • BERT4Rec - With sufficient compute

By Business Requirements

Need explainability

  • Item-Based KNN - "Because you liked X"
  • User-Based KNN - "Users like you enjoyed"
  • Association Rules - "Frequently bought together"

Need cold start handling

  • Content-Based TF-IDF - Works for new items
  • Embeddings Similarity - Semantic matching for new items
  • Hybrid CF + Content-Based - Best of both
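Content-based cold start handling works because a brand-new item can be scored from its text alone, with no interaction history. A sketch using scikit-learn's TfidfVectorizer (item names and descriptions are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = {
    "mouse":    "wireless optical mouse with usb receiver",
    "keyboard": "mechanical keyboard with usb cable",
    "lamp":     "led desk lamp with adjustable arm",
}
items = list(descriptions)
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(descriptions.values())

# A new item needs only its description to be matched against the catalog.
new_vec = vectorizer.transform(["usb wireless keyboard"])
sims = cosine_similarity(new_vec, matrix).ravel()
ranked = [items[i] for i in sims.argsort()[::-1]]
```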

Need real-time recommendations

  • Item-Based KNN - Pre-computed similarities
  • Association Rules - Fast lookup
  • Matrix Factorization (sklearn) - Fast inference
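The real-time advantage of Item-Based KNN comes from doing the expensive work offline: item-item similarities are precomputed once, so serving is just lookups and a sum. A minimal sketch (toy matrix, illustrative function name):

```python
import numpy as np

# Offline: item-user matrix (items as rows) -> item-item cosine similarities.
item_user = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
], dtype=float)
normalized = item_user / np.linalg.norm(item_user, axis=1, keepdims=True)
sim = normalized @ normalized.T
np.fill_diagonal(sim, 0.0)          # an item is not its own neighbor

def recommend(seen_items, k=2):
    """Online step: sum the precomputed similarity rows of seen items."""
    scores = sim[seen_items].sum(axis=0)
    scores[seen_items] = -np.inf    # never re-recommend seen items
    return np.argsort(scores)[::-1][:k]
```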

Need diversity

  • Content-Based TF-IDF - Avoids filter bubbles
  • Embeddings Similarity - Semantic diversity
  • Hybrid CF + Content-Based - Balanced approach

Need sequential understanding

  • BERT4Rec - Understands patterns over time
  • Association Rules - Captures co-occurrence

Best Practices

  1. Understand your feedback type - Explicit ratings need different models than implicit interactions
  2. Handle the cold start problem - New users/items need content-based or hybrid approaches
  3. Balance accuracy and diversity - Don't just recommend similar items
  4. Use temporal validation - Train on past, test on future (not random split)
  5. Consider implicit feedback - Even with ratings, use interaction data
  6. Filter by business rules - Remove unavailable, inappropriate, or already-purchased items
  7. A/B test in production - Offline metrics don't always match online performance
  8. Monitor coverage - Ensure recommendations aren't dominated by popular items
  9. Update regularly - User preferences change over time
  10. Combine approaches - Hybrid models often outperform single methods
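Practice 4 (temporal validation) can be sketched as a simple time-ordered split; column names follow the feature configuration above, and the split function itself is illustrative:

```python
import pandas as pd

def temporal_split(interactions, test_fraction=0.2):
    """Train on the earliest interactions, test on the most recent ones,
    mirroring production conditions (unlike a random split)."""
    ordered = interactions.sort_values("timestamp")
    cutoff = int(len(ordered) * (1 - test_fraction))
    return ordered.iloc[:cutoff], ordered.iloc[cutoff:]

interactions = pd.DataFrame({
    "user_id":   [1, 2, 3, 4, 5],
    "item_id":   [1, 1, 2, 2, 3],
    "timestamp": [5, 1, 4, 2, 3],
})
train, test = temporal_split(interactions, test_fraction=0.4)
```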

Common Pitfalls

Cold Start Problems

  • Issue: No recommendations for new users/items
  • Solution: Use content-based or hybrid models, have default popular items

Popularity Bias

  • Issue: Only recommending popular items
  • Solution: Use diversity metrics, apply popularity penalties, ensure coverage
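One way to apply a popularity penalty is to subtract a log-popularity term during re-ranking; the function and the weight alpha below are illustrative, not a fixed part of the product:

```python
import math

def penalize_popularity(scores, popularity, alpha=0.1):
    """Re-rank by subtracting a log-popularity penalty (alpha is tunable)."""
    return {item: s - alpha * math.log1p(popularity.get(item, 0))
            for item, s in scores.items()}

scores = {"a": 1.0, "b": 0.9}          # "a" wins on raw score...
popularity = {"a": 1000, "b": 1}       # ...but is vastly more popular
adjusted = penalize_popularity(scores, popularity)
```

After the penalty, the niche item "b" outranks the blockbuster "a", nudging recommendations toward the long tail.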

Filter Bubbles

  • Issue: Users only see similar items
  • Solution: Add diversity objectives, explore-exploit balance, serendipity

Data Sparsity

  • Issue: Most user-item pairs have no interaction
  • Solution: Matrix factorization, hybrid approaches, implicit feedback

Temporal Drift

  • Issue: User preferences change over time
  • Solution: Weight recent interactions more, retrain regularly, use sequential models

Scalability Issues

  • Issue: Computation too slow for large datasets
  • Solution: Use approximate methods, pre-compute similarities, batch processing

Evaluation Mismatch

  • Issue: Good offline metrics, poor online performance
  • Solution: Use temporal splits, A/B test, measure business metrics

Lack of Diversity

  • Issue: All recommendations too similar
  • Solution: Post-process for diversity, use coverage metrics, hybrid approaches

Next Steps

Ready to build a recommendation system? Head to the Training page and:

  1. Prepare your data with user_id, item_id, and optionally rating/content columns
  2. Choose a model based on your data type and requirements
  3. Configure parameters (or enable hyperparameter tuning)
  4. Evaluate with relevant metrics (Precision@K, NDCG, Coverage)
  5. Test with real users and iterate based on feedback
