Documentation (English)

User-Based Collaborative Filtering (KNN)

User-based collaborative filtering builds recommendations from user-user similarity: it recommends items that similar users have liked.

When to use:

  • You have a stable user base
  • Users have clear preferences
  • You need serendipity (discovering new types of items)
  • Your dataset is small

Strengths: Good for discovery, captures user preferences holistically, explainable ("Users like you also liked")

Weaknesses: Less stable than item-based, doesn't scale as well, cold start for new users

How it Works

User-Based KNN computes similarity between users based on their interaction patterns. Users are considered similar if they liked/purchased/viewed similar items.

For each user, the algorithm:

  1. Finds the most similar users (neighbors) using cosine similarity
  2. Identifies items those similar users liked but the target user hasn't interacted with
  3. Ranks candidates by aggregated similarity scores from neighbors
  4. Returns top-K recommendations

Key Concept: "Users with similar tastes to yours also enjoyed these items." Recommendations draw on the preferences of similar users.
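The four steps can be illustrated with a minimal pure-Python sketch. This is not the implementation behind this model; the function names (`cosine`, `recommend`), the dict-of-dicts data layout, and the tie-breaking rule are assumptions made for illustration only.

```python
from collections import defaultdict
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {item: weight}."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


def recommend(interactions, target, n_neighbors=20, top_k=10):
    """interactions: {user: {item: weight}} (hypothetical layout)."""
    target_items = interactions[target]

    # Step 1: find the most similar users, ignoring users with zero similarity.
    sims = [
        (cosine(target_items, items), user)
        for user, items in interactions.items()
        if user != target
    ]
    neighbors = sorted(
        (s for s in sims if s[0] > 0), key=lambda s: (-s[0], s[1])
    )[:n_neighbors]

    # Steps 2 and 3: collect items the target has not seen and
    # score each one by the summed similarity of the neighbors who have it.
    scores = defaultdict(float)
    for sim, user in neighbors:
        for item in interactions[user]:
            if item not in target_items:
                scores[item] += sim

    # Step 4: return the top-K items (ties broken by item id for determinism).
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


interactions = {
    "alice": {"A": 1.0, "B": 1.0},
    "bob": {"A": 1.0, "B": 1.0, "C": 1.0},
    "carol": {"B": 1.0, "D": 1.0},
    "dave": {"E": 1.0},
}
print(recommend(interactions, "alice", n_neighbors=2, top_k=2))  # ['C', 'D']
```

Here bob shares two items with alice and carol shares one, so bob's item "C" outranks carol's item "D"; dave has no overlap and is never selected as a neighbor.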

Parameters

Feature Configuration

Feature Columns (required) List of columns to use: must include user_id and item_id.

User Column (default: "user_id", required) Name of the column containing user identifiers. Each unique value represents a different user.

Item Column (default: "item_id", required) Name of the column containing item identifiers. Each unique value represents a different item to recommend.

Rating Column (optional) Name of the column containing ratings. If provided, ratings are used as weights in the similarity computation. If omitted, all interactions are treated equally.
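As a rough illustration of what the optional rating column changes, the sketch below builds per-user item vectors from input rows. The helper `build_user_vectors` and the column name "rating" are hypothetical, and giving each unrated interaction a weight of 1.0 is an assumption about what "treats all interactions equally" means.

```python
def build_user_vectors(rows, user_col="user_id", item_col="item_id", rating_col=None):
    """Turn a list of row dicts into {user: {item: weight}}.

    With a rating column, the rating becomes the weight; without one,
    every interaction gets the same weight of 1.0 (assumed behavior).
    """
    vectors = {}
    for row in rows:
        weight = float(row[rating_col]) if rating_col else 1.0
        vectors.setdefault(row[user_col], {})[row[item_col]] = weight
    return vectors


rows = [
    {"user_id": "u1", "item_id": "i1", "rating": 5},
    {"user_id": "u1", "item_id": "i2", "rating": 2},
    {"user_id": "u2", "item_id": "i1", "rating": 4},
]

print(build_user_vectors(rows))
# {'u1': {'i1': 1.0, 'i2': 1.0}, 'u2': {'i1': 1.0}}
print(build_user_vectors(rows, rating_col="rating"))
# {'u1': {'i1': 5.0, 'i2': 2.0}, 'u2': {'i1': 4.0}}
```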

Model-Specific Parameters

Top-K Recommendations (default: 10) Number of items to recommend for each user.

  • 5-10: Focused recommendations
  • 10-20: Standard recommendation lists
  • 20-50: For broad exploration

Number of Neighbors (default: 20) Number of similar users to consider for generating recommendations.

  • 5-10: Very focused, only most similar users
  • 10-20: Good balance (default)
  • 20-50: Broader perspective, more diversity
  • 50+: May include less relevant users

Configuration Tips

Dataset Size Considerations

  • Small (<10k users): Works well, ideal use case
  • Medium (10k-100k users): Acceptable performance, consider item-based instead
  • Large (100k-1M users): Performance issues, use Matrix Factorization
  • Very Large (>1M users): Not recommended, too slow
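One way to see why the size tiers matter is to count the user pairs a naive all-pairs similarity computation would compare. This is a back-of-the-envelope sketch that assumes brute-force comparison, which may not match how this model is implemented; `user_pairs` is a hypothetical helper.

```python
def user_pairs(n_users):
    """Number of distinct user pairs in a brute-force comparison: n(n-1)/2."""
    return n_users * (n_users - 1) // 2


for n in (10_000, 100_000, 1_000_000):
    print(f"{n:>9,} users -> {user_pairs(n):>15,} pairs")
#    10,000 users ->      49,995,000 pairs
#   100,000 users ->   4,999,950,000 pairs
# 1,000,000 users -> 499,999,500,000 pairs
```

A tenfold increase in users means roughly a hundredfold increase in pairs, which is why pre-computation, sampling, or approximate methods become necessary as the user base grows.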

Parameter Tuning Guidance

  1. Adjust neighbors: More neighbors = more diversity, fewer = more precision
  2. Monitor stability: User preferences change, may need frequent retraining
  3. Balance similarity: Too strict = few recommendations, too loose = poor quality
  4. Track serendipity: Measure how often novel items are recommended
  5. Optimize performance: Pre-compute similarities, use approximate methods for large datasets

When to Choose This Over Alternatives

  • vs. Item-Based KNN: Choose this for better discovery and serendipity
  • vs. Matrix Factorization: Choose this for smaller datasets and explainability
  • vs. Content-Based: Choose this when you have sufficient user interaction data
  • vs. BERT4Rec: Choose this for simpler, non-sequential recommendations
  • vs. Hybrid: Choose this when you don't have item content features

Common Issues and Solutions

Cold Start Problem (New Users)

Issue: Cannot recommend to users with no interaction history.

Solution:

  • Use demographic similarity if available
  • Show popular items initially
  • Collect initial preferences through questionnaire
  • Fall back to content-based recommendations
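The "show popular items initially" option can be sketched as a simple fallback wrapper. The names `popular_items` and `recommend_with_fallback`, and the `personalized` callable standing in for the KNN recommender, are hypothetical.

```python
from collections import Counter


def popular_items(interactions, top_k=10):
    """Items ranked by how many users interacted with them."""
    counts = Counter(item for items in interactions.values() for item in items)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_k]]


def recommend_with_fallback(interactions, user, personalized, top_k=10):
    """Use popularity for users without history, the KNN recommender otherwise.

    personalized: callable taking a user id and returning a ranked item list
    (a stand-in for the actual model; assumed for this sketch).
    """
    if not interactions.get(user):
        return popular_items(interactions, top_k)
    return personalized(user)[:top_k]


interactions = {
    "u1": {"A": 1.0, "B": 1.0},
    "u2": {"A": 1.0, "C": 1.0},
    "u3": {"A": 1.0, "B": 1.0},
}
print(recommend_with_fallback(interactions, "new_user", lambda u: [], top_k=2))
# ['A', 'B']
```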

Scalability Issues

Issue: Computing user-user similarities is expensive with many users.

Solution:

  • Use sampling (compute similarities for subset of users)
  • Pre-compute and cache similarities
  • Use approximate nearest neighbors algorithms
  • Switch to Item-Based KNN (more scalable)
  • Consider Matrix Factorization for large datasets

Unstable Recommendations

Issue: Recommendations change frequently as user behavior updates.

Solution:

  • Use more neighbors for stability
  • Weight recent interactions higher
  • Apply smoothing to similarity scores
  • Consider Item-Based KNN (more stable)
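Weighting recent interactions higher is often done with exponential time decay. The sketch below is one possible scheme, not a built-in option of this model; the 30-day half-life, the `(user, item, day)` event layout, and both function names are assumptions.

```python
def decayed_weight(age_days, half_life_days=30.0):
    """Weight that halves every `half_life_days` (1.0 for a brand-new event)."""
    return 0.5 ** (age_days / half_life_days)


def build_decayed_vectors(events, now_day, half_life_days=30.0):
    """events: iterable of (user, item, day) tuples (hypothetical layout).

    Repeated interactions with the same item add up, so a recent repeat
    counts for more than an old one.
    """
    vectors = {}
    for user, item, day in events:
        weight = decayed_weight(now_day - day, half_life_days)
        items = vectors.setdefault(user, {})
        items[item] = items.get(item, 0.0) + weight
    return vectors


events = [("u1", "a", 0), ("u1", "a", 30), ("u1", "b", 30)]
print(build_decayed_vectors(events, now_day=30))
# {'u1': {'a': 1.5, 'b': 1.0}}
```

The resulting weights can be used in place of raw interaction counts or ratings when computing user-user similarity.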

Sparsity Issues

Issue: Users with few interactions get poor recommendations.

Solution:

  • Lower minimum similarity threshold
  • Increase number of neighbors
  • Combine with content-based features
  • Use Matrix Factorization which handles sparsity better

Privacy Concerns

Issue: Recommendations reveal information about other users' preferences.

Solution:

  • Aggregate neighbor preferences without revealing individuals
  • Use item-based approach instead (doesn't expose user similarity)
  • Apply differential privacy techniques
  • Use Matrix Factorization with encoded representations

Popularity Bias

Issue: Recommendations are dominated by popular items liked by many neighbors.

Solution:

  • Apply inverse frequency weighting
  • Use diversity-aware ranking
  • Balance popular and niche recommendations
  • Consider user's unique tastes in scoring
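Inverse frequency weighting can be applied as a re-ranking step on top of the neighbor scores. The sketch below uses `log(1 + n_users / count)` as the weight, which is one common choice among several; the function names and this exact formula are assumptions rather than documented behavior of this model.

```python
from math import log


def inverse_frequency_weights(interactions):
    """Weight per item: larger for items that few users interacted with."""
    n_users = len(interactions)
    counts = {}
    for items in interactions.values():
        for item in items:
            counts[item] = counts.get(item, 0) + 1
    return {item: log(1.0 + n_users / count) for item, count in counts.items()}


def rerank(scores, weights):
    """Multiply each candidate's score by its weight and sort descending."""
    adjusted = {item: score * weights.get(item, 1.0) for item, score in scores.items()}
    return sorted(adjusted, key=lambda item: (-adjusted[item], item))


interactions = {
    "u1": {"hit", "niche"},
    "u2": {"hit"},
    "u3": {"hit"},
    "u4": {"hit"},
}
weights = inverse_frequency_weights(interactions)
print(rerank({"hit": 1.0, "niche": 0.8}, weights))  # ['niche', 'hit']
```

In this example the niche item starts with a lower neighbor score (0.8 versus 1.0) but ends up ranked first because only one of four users has interacted with it.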

Example Use Cases

Small Community Platform

Scenario: Niche interest platform with 5k active users, 10k items

Configuration:

  • 15 neighbors
  • Top-10 recommendations
  • Use interaction history (views, likes, comments)

Why: Small user base, strong community with similar interests, explainability valued

Music Discovery Service

Scenario: Music streaming app with 50k users discovering new artists

Configuration:

  • 25 neighbors (broader discovery)
  • Top-20 recommendations
  • Weight recent listens more heavily

Why: Good for discovering new music through similar users, emphasizes serendipity

Book Recommendations

Scenario: Online bookstore with 30k regular readers, 100k books

Configuration:

  • 20 neighbors
  • Top-15 recommendations
  • Use purchase and rating history

Why: Book preferences are personal and nuanced, readers trust recommendations from similar readers, explainable
