Documentation (English)

Item-Based Collaborative Filtering (KNN)

Item-based collaborative filtering computes cosine similarity between items from the user-item interaction matrix and recommends items similar to those the user has already interacted with.

When to use:

  • Need explainable recommendations ("Because you liked X")
  • Have sufficient item interactions
  • Items change less frequently than users
  • Want stable recommendations

Strengths: Explainable, stable over time, scales well, handles sparsity reasonably

Weaknesses: Cannot discover novel patterns, popularity bias, cold start for new items

How it Works

Item-Based KNN computes similarity between items based on users who interacted with them. Items are considered similar if they were liked/purchased/viewed by the same users.

For each user, the algorithm:

  1. Identifies items the user has interacted with
  2. Finds similar items using pre-computed item-item similarities
  3. Ranks candidate items by aggregated similarity scores
  4. Returns top-K recommendations

Key Concept: "Users who liked item A also liked item B" - recommendations are based on item co-occurrence patterns.
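The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation behind this model: the function names (`build_item_vectors`, `item_similarities`, `recommend`) and the toy data are hypothetical, and candidate scores are aggregated by summing similarities, which is only one common choice.

```python
import math
from collections import defaultdict

def build_item_vectors(interactions):
    """Map each item to a sparse vector {user: weight} from (user, item, weight) triples."""
    vectors = defaultdict(dict)
    for user, item, weight in interactions:
        vectors[item][user] = weight
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

def item_similarities(vectors):
    """Pre-compute the similarity of every item pair, keeping only non-zero values."""
    sims = defaultdict(dict)
    items = list(vectors)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            s = cosine(vectors[a], vectors[b])
            if s > 0:
                sims[a][b] = s
                sims[b][a] = s
    return sims

def recommend(user, interactions, sims, top_k=10):
    """Steps 1-4: collect seen items, gather neighbors, sum similarities, return top-K."""
    seen = {item for u, item, _ in interactions if u == user}
    scores = defaultdict(float)
    for item in seen:
        for neighbor, s in sims.get(item, {}).items():
            if neighbor not in seen:
                scores[neighbor] += s
    # Sort by descending score; break ties by item id for deterministic output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]

# Hypothetical toy data: all interactions carry weight 1.0 (implicit feedback).
interactions = [
    ("u1", "A", 1.0), ("u1", "B", 1.0),
    ("u2", "A", 1.0), ("u2", "B", 1.0), ("u2", "C", 1.0),
    ("u3", "B", 1.0), ("u3", "C", 1.0),
    ("u4", "A", 1.0),
]
vectors = build_item_vectors(interactions)
sims = item_similarities(vectors)
recs_u4 = recommend("u4", interactions, sims, top_k=2)
```

Here user u4 has only interacted with item A, so B (which shares two users with A) ranks ahead of C (which shares one). The contributing item A is also what would back a "Because you liked A" explanation.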

Parameters

Feature Configuration

Feature Columns (required) List of columns to use: must include user_id and item_id.

User Column (default: "user_id", required) Name of the column containing user identifiers. Each unique value represents a different user.

Item Column (default: "item_id", required) Name of the column containing item identifiers. Each unique value represents a different item to recommend.

Rating Column (optional) Name of the column containing ratings. If provided, uses weighted similarity. If not provided, treats all interactions equally.
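As an illustration of the difference between the two modes, the sketch below builds item vectors from rows with and without a rating column. The documentation does not specify how ratings enter the similarity computation, so treating the rating as the vector weight is an assumption; the helper `build_item_vectors` and the sample rows are hypothetical.

```python
from collections import defaultdict

def build_item_vectors(rows, user_col="user_id", item_col="item_id", rating_col=None):
    """Return {item: {user: weight}}.

    Without a rating column every interaction gets weight 1.0;
    with one, the rating value is used as the weight (an assumption).
    """
    vectors = defaultdict(dict)
    for row in rows:
        weight = float(row[rating_col]) if rating_col else 1.0
        vectors[row[item_col]][row[user_col]] = weight
    return dict(vectors)

# Hypothetical sample rows.
rows = [
    {"user_id": "u1", "item_id": "A", "rating": 5},
    {"user_id": "u1", "item_id": "B", "rating": 2},
    {"user_id": "u2", "item_id": "A", "rating": 3},
]
binary = build_item_vectors(rows)
weighted = build_item_vectors(rows, rating_col="rating")
```

In the binary case a 2-star and a 5-star interaction count the same; in the weighted case the 5-star interaction contributes more to any dot product it takes part in.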

Model-Specific Parameters

Top-K Recommendations (default: 10) Number of items to recommend for each user.

  • 5-10: Focused, high-confidence recommendations
  • 10-20: Standard recommendation lists
  • 20-50: For exploration and serendipity

Configuration Tips

Dataset Size Considerations

  • Small (<1k items): Works well, but limited recommendation diversity
  • Medium (1k-10k items): Ideal range, good balance of coverage and performance
  • Large (10k-100k items): Good recommendation quality, but similarity computation is expensive
  • Very Large (>100k items): Consider using approximation techniques or Matrix Factorization

Parameter Tuning Guidance

  1. Monitor coverage: Ensure recommendations aren't dominated by popular items
  2. Check diversity: Use diversity metrics to avoid filter bubbles
  3. Validate explainability: Recommendations should make intuitive sense
  4. Track novelty: Balance between safe recommendations and discovery
  5. A/B test: Offline metrics don't always match online engagement
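For the first point, one simple way to monitor coverage is catalog coverage: the share of the catalog that appears in at least one user's recommendation list. The sketch below is an assumption about how this could be measured offline; `catalog_coverage` and the sample data are hypothetical names, not part of this tool.

```python
def catalog_coverage(recommendations, catalog):
    """Fraction of catalog items that appear in at least one recommendation list."""
    catalog = set(catalog)
    if not catalog:
        return 0.0
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & catalog) / len(catalog)

# Hypothetical recommendation lists for two users over a five-item catalog.
recommendations = {"u1": ["A", "B"], "u2": ["A", "C"]}
coverage = catalog_coverage(recommendations, ["A", "B", "C", "D", "E"])
```

A value close to 0 suggests that a small set of popular items dominates the output.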

When to Choose This Over Alternatives

  • vs. User-Based KNN: Choose this for more stable recommendations (items change less than users)
  • vs. Matrix Factorization: Choose this for explainability and when you don't need rating prediction
  • vs. Content-Based: Choose this when you have sufficient interaction data
  • vs. BERT4Rec: Choose this for non-sequential, simpler recommendations
  • vs. Association Rules: Choose this for personalized recommendations vs. general patterns

Common Issues and Solutions

Cold Start Problem (New Items)

Issue: Cannot recommend new items with no interaction history.

Solution:

  • Use content-based features for new items (Hybrid model)
  • Implement "exploration" strategy to gather initial interactions
  • Bootstrap with item metadata similarity
  • Show new items to diverse users initially

Cold Start Problem (New Users)

Issue: No interaction history to base recommendations on.

Solution:

  • Show popular items initially
  • Collect quick preferences through onboarding
  • Use demographic or contextual signals
  • Switch to content-based until sufficient interactions

Popularity Bias

Issue: Only popular items get recommended.

Solution:

  • Apply inverse frequency weighting
  • Use diversity-aware reranking
  • Set minimum interaction threshold
  • Balance popularity with personalization
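"Inverse frequency weighting" can be read in several ways. The sketch below shows one possible interpretation: dividing each candidate's score by a log-scaled count of its interactions before ranking. The function `penalize_popularity`, the `alpha` parameter, and the sample numbers are all hypothetical.

```python
import math

def penalize_popularity(scores, popularity, alpha=1.0):
    """Rank candidates after dividing each score by a log-scaled popularity term.

    alpha=0 disables the penalty; larger values push popular items down harder.
    """
    adjusted = {}
    for item, score in scores.items():
        # log(2 + count) is always positive, so items with count 0 are safe.
        penalty = math.log(2 + popularity.get(item, 0)) ** alpha
        adjusted[item] = score / penalty
    return sorted(adjusted, key=lambda item: (-adjusted[item], item))

# Hypothetical candidate scores and interaction counts.
scores = {"hit": 1.0, "niche": 0.8}
popularity = {"hit": 1000, "niche": 10}
ranked = penalize_popularity(scores, popularity)
unpenalized = penalize_popularity(scores, popularity, alpha=0.0)
```

With the penalty applied, the less popular item moves ahead of the blockbuster even though its raw score is lower. Tuning `alpha` is one way to balance popularity against personalization.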

Limited Diversity

Issue: All recommendations too similar to each other.

Solution:

  • Use diversity-aware selection (e.g., MMR - Maximal Marginal Relevance)
  • Filter out overly similar items
  • Combine with other recommendation signals
  • Apply category diversification
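A minimal sketch of MMR reranking is shown below. It greedily picks the item that maximizes `lam * relevance - (1 - lam) * max_similarity_to_already_selected`. The function name `mmr_rerank`, the `lam` trade-off parameter, and the sample scores are illustrative assumptions, not settings exposed by this model.

```python
def mmr_rerank(candidates, relevance, similarity, top_k=10, lam=0.7):
    """Greedy Maximal Marginal Relevance selection.

    relevance: {item: score}; similarity: callable (item, item) -> float.
    lam=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < top_k:
        def mmr_score(item):
            # Redundancy is the highest similarity to anything already picked.
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * max_sim
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical data: A and B are near-duplicates, C is different but less relevant.
relevance = {"A": 0.9, "B": 0.85, "C": 0.6}
pair_sim = {
    frozenset(("A", "B")): 0.95,
    frozenset(("A", "C")): 0.1,
    frozenset(("B", "C")): 0.1,
}

def similarity(x, y):
    return pair_sim.get(frozenset((x, y)), 0.0)

diverse = mmr_rerank(["A", "B", "C"], relevance, similarity, top_k=2, lam=0.5)
by_relevance = mmr_rerank(["A", "B", "C"], relevance, similarity, top_k=2, lam=1.0)
```

With `lam=0.5` the second slot goes to C rather than the near-duplicate B; with `lam=1.0` the result is the plain relevance ranking.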

Scalability Issues

Issue: Computing item similarities is expensive with many items.

Solution:

  • Pre-compute and cache similarities
  • Use approximate nearest neighbors
  • Limit similarity computation to top-N similar items per item
  • Consider Matrix Factorization for very large catalogs
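Keeping only the top-N neighbors per item bounds both the memory needed for the cached similarities and the work done per recommendation request. The sketch below assumes the similarities are already available as a nested dict; `truncate_neighbors` and the sample values are hypothetical.

```python
import heapq

def truncate_neighbors(sims, top_n):
    """Keep only the top_n most similar neighbors for each item."""
    return {
        item: dict(heapq.nlargest(top_n, neighbors.items(), key=lambda kv: kv[1]))
        for item, neighbors in sims.items()
    }

# Hypothetical pre-computed similarities.
sims = {
    "A": {"B": 0.67, "C": 0.41, "D": 0.05},
    "B": {"A": 0.67, "C": 0.82},
}
pruned = truncate_neighbors(sims, top_n=2)
```

Note that the pruned structure is no longer symmetric in general: an item can be in another item's top-N list without the reverse being true.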

Poor Coverage

Issue: Many items never get recommended.

Solution:

  • Lower similarity thresholds
  • Boost less popular items
  • Use exploration strategies
  • Combine with other recommendation methods

Example Use Cases

E-commerce Product Recommendations

Scenario: Online retailer with 50k products, 100k users

Configuration:

  • Top-10 recommendations
  • Use purchase history as implicit feedback
  • No rating column

Why: Explainable recommendations ("Customers who bought this also bought..."), stable product catalog

Movie Recommendations

Scenario: Streaming service with 10k movies, 500k users

Configuration:

  • Top-15 recommendations
  • Use viewing history
  • Optional rating weights for explicit feedback

Why: Movies don't change, users want similar content, needs explainability

News Article Recommendations

Scenario: News platform with 1M articles, 2M users

Configuration:

  • Top-20 recommendations
  • Use read history (implicit feedback)
  • Recent interactions weighted more

Why: Although the catalog is large, similarity computation can focus on recent articles, and users who read similar articles tend to have similar interests
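One way to weight recent interactions more heavily is an exponential decay on the age of each interaction, with the result supplied as the interaction weight (for example through the rating column). This is a sketch under assumptions: the half-life of 7 days and the names `recency_weight` and `weighted_interactions` are hypothetical, and the documentation does not state how this model applies recency weighting.

```python
def recency_weight(age_days, half_life_days=7.0):
    """Exponential decay: the weight halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def weighted_interactions(events, half_life_days=7.0):
    """Turn (user, item, age_days) events into (user, item, weight) triples."""
    return [
        (user, item, recency_weight(age, half_life_days))
        for user, item, age in events
    ]

# Hypothetical read events: one from today, one from two weeks ago.
events = [("u1", "article-1", 0), ("u1", "article-2", 14)]
triples = weighted_interactions(events)
```

With a 7-day half-life, an article read two weeks ago counts a quarter as much as one read today.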
