Documentation (English)

Relim

Memory-efficient recursive elimination algorithm

Relim (Recursive Elimination) builds frequent patterns by recursively eliminating items from the transaction database while maintaining frequency counts.

When to Use Relim

  • Memory-constrained environments
  • Need efficient recursive approach
  • Alternative to FP-Growth
  • Balance of speed and memory efficiency

Strengths

  • Memory-efficient compared to other algorithms
  • Simple recursive structure, straightforward to implement
  • Good balance of speed and memory
  • Handles moderate to large datasets well
  • No candidate generation

Weaknesses

  • Less popular than FP-Growth
  • Fewer optimizations available
  • Not as fast as FP-Growth on most datasets
  • Less documentation and community support

How it Works

  1. Build initial item frequency lists
  2. Recursively eliminate items while tracking patterns
  3. Build frequent itemsets through elimination process
  4. Maintain only necessary information at each recursion level

Key Advantage: The recursive elimination approach keeps memory usage lower than building explicit tree structures, while still avoiding candidate generation.

Recursive Process:

  • At each level, eliminate one item from consideration
  • Track which transactions remain relevant
  • Recursively process remaining items
  • Build itemsets bottom-up from elimination
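The steps above can be sketched in Python. This is a minimal illustration of the recursive-elimination idea, not the implementation used here: transactions are represented as sets, `min_count` is an absolute support count, and items are eliminated from least to most frequent.

```python
from collections import Counter

def relim(transactions, min_count, prefix=frozenset(), results=None):
    """Minimal sketch of recursive elimination (RElim).

    transactions: list of item sets; min_count: absolute support count.
    Returns a dict mapping frequent itemsets (frozensets) to counts.
    """
    if results is None:
        results = {}
    counts = Counter(item for t in transactions for item in t)
    # Eliminate items from least to most frequent
    order = sorted(counts, key=counts.get)
    for idx, item in enumerate(order):
        if counts[item] < min_count:
            continue  # infrequent: no superset can be frequent either
        itemset = prefix | {item}
        results[itemset] = counts[item]
        # Project onto transactions containing `item`, keeping only
        # the items not yet eliminated at this level
        remaining = set(order[idx + 1:])
        projected = [t & remaining for t in transactions if item in t]
        relim([t for t in projected if t], min_count, itemset, results)
    return results

baskets = [{"Bread", "Milk", "Butter"}, {"Milk", "Eggs"},
           {"Bread", "Butter"}, {"Bread", "Milk"}]
frequent = relim(baskets, min_count=2)
print(frequent[frozenset({"Bread", "Milk"})])  # → 2
```

Each recursion level keeps only the projected transaction lists, which is what keeps memory usage low compared to materializing a full tree.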

When to Choose Relim

Best for:

  • Embedded systems or limited memory environments
  • Moderate-sized datasets (10k-1M transactions)
  • Need predictable memory usage
  • When FP-Growth uses too much memory

Choose FP-Growth instead when:

  • Performance is critical
  • Memory is not constrained
  • Need fastest possible mining
  • Large community support is important

Choose Apriori instead when:

  • Learning and understanding
  • Very small datasets
  • Interpretability is key

Parameters

All association algorithms share these common parameters:

Data Format

Input Format: 'long' or 'wide'

How your transaction data is structured:

Wide Format:

  • Each column represents one item
  • Each row is a transaction
  • Values are 1 (item present) or 0 (item absent)
  • Example:
    TransactionID | Bread | Milk | Eggs | Butter
    1             | 1     | 1    | 0    | 1
    2             | 0     | 1    | 1    | 0

Long Format:

  • Each row is one item in a transaction
  • Requires Transaction ID column to group items
  • More natural for real-world data
  • Example:
    TransactionID | Item
    1             | Bread
    1             | Milk
    1             | Butter
    2             | Milk
    2             | Eggs
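If your data arrives in long format, it can be pivoted into wide format before mining. A sketch using pandas, with the column names from the example above:

```python
import pandas as pd

long_df = pd.DataFrame({
    "TransactionID": [1, 1, 1, 2, 2],
    "Item": ["Bread", "Milk", "Butter", "Milk", "Eggs"],
})

# One row per transaction, one 0/1 column per item
wide_df = (pd.crosstab(long_df["TransactionID"], long_df["Item"]) > 0).astype(int)
print(wide_df)
```

The `> 0` step clips repeated purchases of the same item within one transaction down to presence/absence, which is what the wide format expects.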

Feature Configuration

Feature Columns (required)

  • Wide format: List all item columns
  • Long format: Select the single column containing item names

Transaction ID Column (required for long format) Column that identifies which transaction each item belongs to.

Contains Multiple Items (long format only) Enable if a single row can list multiple items (e.g., "Bread, Milk, Eggs").

Item Separator (if Contains Multiple Items is enabled) Character separating the items (default: comma).

  • Example: "Bread, Milk, Eggs" uses "," as separator
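Splitting such a cell on the separator is a one-liner; a plain-Python sketch:

```python
row = "Bread, Milk, Eggs"
# Split on the separator and strip surrounding whitespace
items = [item.strip() for item in row.split(",")]
print(items)  # → ['Bread', 'Milk', 'Eggs']
```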

Segmentation (Optional)

Segmentation Column Analyze different customer segments separately:

  • Store locations (downtown vs. suburban)
  • Customer types (premium vs. regular)
  • Time periods (weekday vs. weekend)

Target Segment Value Filter to analyze only specific segment.

Model Parameters

Minimum Support (default: 0.02, required) Threshold for how frequently an itemset must appear.

  • 0.02 = 2% of transactions
  • Lower values: Find rare patterns, but slower and more results
  • Higher values: Only common patterns, faster
  • Recommendations:
    • Large stores (>10k transactions): 0.001-0.01 (0.1%-1%)
    • Medium stores: 0.01-0.05 (1%-5%)
    • Small datasets: 0.05-0.1 (5%-10%)
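To sanity-check a threshold, it helps to convert the relative support into an absolute transaction count (the dataset size below is hypothetical; the parameter itself takes the relative value):

```python
n_transactions = 50_000  # hypothetical dataset size
min_support = 0.01       # 1%

# An itemset must appear in at least this many transactions
min_count = round(min_support * n_transactions)
print(min_count)  # → 500
```

If `min_count` comes out below roughly 5-10 transactions, the threshold is probably too low to yield reliable patterns.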

Maximum Itemset Length (default: 3, required) Maximum number of items in a pattern.

  • 2: Pairs only (A -> B)
  • 3: Triples (A, B -> C)
  • 4+: Complex patterns (slower, harder to interpret)
  • Recommendations:
    • Start with 2-3 for interpretability
    • Increase only if needed

Rule Evaluation Metric (default: "lift", required) How to measure rule strength:

  • lift: Strength of association (recommended)
  • confidence: Reliability of rule
  • leverage: Lift adjusted by item frequencies
  • conviction: Dependency strength

Metric Threshold (default: 1.2, required) Minimum value for the selected metric to keep a rule.

  • For lift: >1.0 (1.2 = 20% more likely)
  • For confidence: 0.5-0.9 (50%-90% probability)

Advanced Filtering (Optional)

Enable Advanced Filtering Set both confidence and lift thresholds simultaneously for stricter rules.

Minimum Confidence (default: 0.6) Probability that Y is purchased given X is purchased.

  • 0.6 = 60% of transactions with X also have Y
  • Range: 0.1-1.0

Minimum Lift (default: 1.1) How much more likely Y is with X versus without X.

  • 1.0 = No association (independent)
  • 1.1 = 10% increase in likelihood
  • 2.0 = 2x more likely
  • Range: >0.0 (typically >1.0 for meaningful rules)
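Applying both thresholds is simply a conjunction over the mined rules. A sketch with hypothetical rule records:

```python
rules = [
    {"rule": "Bread -> Butter", "confidence": 0.60, "lift": 1.50},
    {"rule": "Milk -> Eggs",    "confidence": 0.40, "lift": 1.05},
]

min_confidence = 0.6
min_lift = 1.1

# Keep only rules that pass BOTH thresholds
strong = [r for r in rules
          if r["confidence"] >= min_confidence and r["lift"] >= min_lift]
print([r["rule"] for r in strong])  # → ['Bread -> Butter']
```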

Understanding Association Metrics

Support

Definition: How frequently an itemset appears in the database.

Formula: support(X) = (transactions containing X) / (total transactions)

Example:

  • 100 transactions total
  • [Bread, Milk] appears in 20 transactions
  • support([Bread, Milk]) = 20/100 = 0.2 = 20%

Interpretation:

  • 0.01 (1%): Rare pattern
  • 0.05 (5%): Moderate frequency
  • 0.2 (20%): Very common pattern

Use: Filter out rare, potentially spurious patterns
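The definition translates directly to code; a minimal sketch reproducing the worked numbers above:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# 100 transactions, [Bread, Milk] present in 20 of them
baskets = [{"Bread", "Milk"}] * 20 + [{"Eggs"}] * 80
print(support({"Bread", "Milk"}, baskets))  # → 0.2
```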

Confidence

Definition: Probability of finding Y in transactions that contain X.

Formula: confidence(X -> Y) = support(X U Y) / support(X)

Example:

  • support([Bread]) = 0.5 (50% of transactions)
  • support([Bread, Butter]) = 0.3 (30% of transactions)
  • confidence(Bread -> Butter) = 0.3 / 0.5 = 0.6 = 60%

Interpretation:

  • 0.6 = 60% of customers who buy bread also buy butter
  • Higher confidence = more reliable rule

Limitation: Can be misleading if Y is very common

Lift

Definition: How much more likely Y is with X versus without X.

Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)

Example:

  • confidence(Bread -> Butter) = 0.6
  • support(Butter) = 0.4 (40% buy butter overall)
  • lift(Bread -> Butter) = 0.6 / 0.4 = 1.5

Interpretation:

  • lift = 1.0: No association (X and Y are independent)
  • lift > 1.0: Positive association (Y more likely with X)
    • 1.5 = 50% increase in likelihood
    • 2.0 = 2x more likely (100% increase)
  • lift < 1.0: Negative association (Y less likely with X)

Why Lift is Best for Discovery:

  • Accounts for item popularity
  • Detects true associations vs. coincidence
  • Symmetric: lift(X -> Y) = lift(Y -> X)

Leverage

Definition: Difference between observed and expected co-occurrence.

Formula: leverage(X -> Y) = support(X U Y) - support(X) x support(Y)

Example:

  • support([Bread, Butter]) = 0.3 (observed)
  • support(Bread) x support(Butter) = 0.5 x 0.4 = 0.2 (expected if independent)
  • leverage = 0.3 - 0.2 = 0.1

Interpretation:

  • 0: No association
  • Positive: Items appear together more than expected
  • Negative: Items appear together less than expected
  • Magnitude matters: Higher absolute value = stronger relationship

Conviction

Definition: A measure of how strongly the occurrence of Y depends on X.

Formula: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))

Example:

  • support(Butter) = 0.4
  • confidence(Bread -> Butter) = 0.6
  • conviction = (1 - 0.4) / (1 - 0.6) = 0.6 / 0.4 = 1.5

Interpretation:

  • 1.0: No association (independent)
  • >1.0: Y depends on X (higher = stronger dependency)
  • infinity: Perfect dependency (always Y when X)

Use: Measures how much the rule deviates from independence
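The four rule metrics above share the same inputs. A sketch that computes all of them from the three supports, reproducing the Bread -> Butter worked example:

```python
def rule_metrics(sup_x, sup_y, sup_xy):
    """Compute the rule metrics for X -> Y from the three supports."""
    confidence = sup_xy / sup_x
    lift = confidence / sup_y
    leverage = sup_xy - sup_x * sup_y
    # Conviction diverges as confidence approaches 1 (perfect dependency)
    conviction = (1 - sup_y) / (1 - confidence) if confidence < 1 else float("inf")
    return confidence, lift, leverage, conviction

# support(Bread)=0.5, support(Butter)=0.4, support(Bread, Butter)=0.3
conf, lift, lev, conv = rule_metrics(0.5, 0.4, 0.3)
print(round(conf, 2), round(lift, 2), round(lev, 2), round(conv, 2))
# → 0.6 1.5 0.1 1.5
```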

Configuration Tips

Best Practices for Relim

Memory-Constrained Scenarios:

  • Relim is your best choice when memory is limited
  • Use min_support >= 0.01 for best memory efficiency
  • Monitor memory usage during execution
  • Consider segmentation to process data in chunks

Optimal Settings:

  • min_support = 0.01-0.02 (good balance)
  • max_length = 3 (standard depth)
  • Enable advanced filtering to reduce output size

Performance Characteristics:

  • Typically 20-40% slower than FP-Growth
  • Uses 30-50% less memory than FP-Growth
  • More predictable memory usage than other algorithms
  • Good for consistent performance

When Relim is the Right Choice

Ideal Scenarios:

  • Cloud instances with memory limits
  • Embedded systems
  • Mobile or edge computing
  • Need predictable resource usage
  • Memory is more constrained than CPU

Example Use Cases:

  • IoT devices analyzing local transaction data
  • Mobile apps with on-device mining
  • Cost-optimized cloud deployments
  • Systems with hard memory limits

Common Issues and Solutions

Performance Slower than Expected

Symptom: Relim takes longer than anticipated

Explanation: Relim trades some speed for memory efficiency. This is expected behavior.

Solutions:

  • If speed is critical, switch to FP-Growth
  • Increase min_support to reduce search space
  • Reduce max_length
  • Ensure memory constraints actually require Relim

Memory Usage Higher than Expected

Symptom: Still hitting memory limits

Causes:

  • Very low min_support
  • Very large dataset
  • High max_length setting

Solutions:

  • Increase min_support to 0.02 or higher
  • Reduce max_length to 2-3
  • Use segmentation to process in chunks
  • Pre-filter to fewer items
  • Consider data sampling

Results Differ from Other Algorithms

Symptom: Different itemsets found

Note: All algorithms should find identical frequent itemsets above threshold. If they differ:

  • Verify parameters match exactly
  • Check data preprocessing is identical
  • Ensure min_support is the same
  • Order may differ, but content should match

Recursion Depth Issues

Symptom: Maximum recursion depth exceeded errors

Causes:

  • Extremely low min_support
  • Very high max_length
  • Unusual data characteristics

Solutions:

  • Increase min_support
  • Reduce max_length to 3 or less
  • Switch to FP-Growth if problem persists
  • Check for data quality issues
