Eclat
Vertical data format algorithm using depth-first search
Uses vertical data format (itemset -> transaction list) instead of horizontal format (transaction -> itemset list). Finds frequent itemsets through set intersection operations.
When to Use Eclat
- Sparse datasets (many items, few per transaction)
- Need fast depth-first search
- Memory is available for vertical representation
- Alternative to FP-Growth for sparse data
Strengths
- Fast for sparse data
- Simple set intersection operations
- Good for large number of items
- Depth-first search is memory efficient
- No repeated database scans
Weaknesses
- High memory for dense data
- Less intuitive vertical format
- May be slower than FP-Growth on dense data
- Requires full vertical database in memory
How it Works
- Convert to vertical format: Each item -> list of transactions containing it
- Intersect transaction lists to find co-occurrences
- Use depth-first search to find patterns
- Calculate support from transaction list sizes
Example Vertical Format:
Bread: [1, 3, 5, 7] (appears in transactions 1, 3, 5, 7)
Milk: [1, 2, 5, 8]
Butter: [1, 5]
[Bread, Milk] -> intersection: [1, 5] -> support = 2/8 = 0.25
Key Advantage: Set intersection is very fast, especially for sparse data where transaction lists are short.
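The steps above can be sketched in pure Python. This is a minimal illustration, not the tool's actual implementation: the function name `eclat` and the representation of transactions as sets are assumptions for the example.

```python
from collections import defaultdict

def eclat(transactions, min_support, max_length=3):
    """Depth-first Eclat sketch: mine frequent itemsets via tidlist intersection."""
    n = len(transactions)
    min_count = min_support * n

    # Step 1: convert to vertical format (item -> set of transaction IDs).
    tidlists = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            tidlists[item].add(tid)

    frequent = {}

    def dfs(prefix, prefix_tids, candidates):
        for i, (item, tids) in enumerate(candidates):
            # Step 2: intersect tidlists; support comes from the intersection size.
            new_tids = prefix_tids & tids if prefix else tids
            if len(new_tids) >= min_count:
                itemset = prefix + (item,)
                frequent[itemset] = len(new_tids) / n
                # Step 3: recurse depth-first, extending only with later items.
                if len(itemset) < max_length:
                    dfs(itemset, new_tids, candidates[i + 1:])

    dfs((), set(), sorted(tidlists.items()))
    return frequent

# The vertical-format example above, as 0-indexed sets
# (transactions 4 and 6 are left empty for brevity).
transactions = [
    {"Bread", "Milk", "Butter"},  # t1
    {"Milk"},                     # t2
    {"Bread"},                    # t3
    set(),                        # t4
    {"Bread", "Milk", "Butter"},  # t5
    set(),                        # t6
    {"Bread"},                    # t7
    {"Milk"},                     # t8
]
frequent = eclat(transactions, min_support=0.25)
# frequent[("Bread", "Milk")] == 0.25, matching the worked example.
```

Note how the recursion extends each prefix only with items that sort after it, so every itemset is generated exactly once.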
When to Choose Eclat
Best for:
- Retail data with many SKUs but few items per basket
- Web clickstream data (many pages, few per session)
- Library checkout records
- Any scenario with high item count, low items per transaction
Choose FP-Growth instead when:
- Dense transactions (many items per transaction)
- Limited memory
- Need the fastest possible algorithm
Parameters
All association algorithms share these common parameters:
Data Format
Input Format: 'long' or 'wide'
How your transaction data is structured:
Wide Format:
- Each column represents one item
- Each row is a transaction
- Values are 1 (item present) or 0 (item absent)
- Example:
TransactionID | Bread | Milk | Eggs | Butter
1             | 1     | 1    | 0    | 1
2             | 0     | 1    | 1    | 0
Long Format:
- Each row is one item in a transaction
- Requires Transaction ID column to group items
- More natural for real-world data
- Example:
TransactionID | Item
1             | Bread
1             | Milk
1             | Butter
2             | Milk
2             | Eggs
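Converting long format to wide format can be sketched in plain Python. The function name `long_to_wide`, the input as (transaction ID, item field) pairs, and the comma separator are illustrative assumptions; the tool performs this conversion internally based on the options described below.

```python
def long_to_wide(rows, separator=","):
    """Convert long-format rows to wide-format one-hot rows.
    `rows` is a list of (transaction_id, item_field) pairs; an item
    field may hold several items joined by `separator`."""
    baskets = {}  # transaction_id -> set of items
    for tid, field in rows:
        items = [part.strip() for part in field.split(separator)]
        baskets.setdefault(tid, set()).update(items)

    # One column per distinct item, in sorted order.
    columns = sorted({item for items in baskets.values() for item in items})
    wide = []
    for tid in sorted(baskets):
        wide.append([tid] + [1 if c in baskets[tid] else 0 for c in columns])
    return ["TransactionID"] + columns, wide

# Mixed input: one row holds two comma-separated items.
header, data = long_to_wide([(1, "Bread"), (1, "Milk, Butter"), (2, "Milk"), (2, "Eggs")])
```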
Feature Configuration
Feature Columns (required)
- Wide format: List all item columns
- Long format: Select the single column containing item names
Transaction ID Column (required for long format) Column that identifies which transaction each item belongs to.
Contains Multiple Items (long format only) Check if a single row can contain multiple items (e.g., "Bread, Milk, Eggs").
Item Separator (if multiple items per row) Character separating the items (default: comma).
- Example: "Bread, Milk, Eggs" uses "," as separator
Segmentation (Optional)
Segmentation Column Analyze different customer segments separately:
- Store locations (downtown vs. suburban)
- Customer types (premium vs. regular)
- Time periods (weekday vs. weekend)
Target Segment Value Filter the analysis to a single segment value.
Model Parameters
Minimum Support (default: 0.02, required) Threshold for how frequently an itemset must appear.
- 0.02 = 2% of transactions
- Lower values: Find rare patterns, but slower and more results
- Higher values: Only common patterns, faster
- Recommendations:
- Large stores (>10k transactions): 0.001-0.01 (0.1%-1%)
- Medium stores: 0.01-0.05 (1%-5%)
- Small datasets: 0.05-0.1 (5%-10%)
Maximum Itemset Length (default: 3, required) Maximum number of items in a pattern.
- 2: Pairs only (A -> B)
- 3: Triples (A, B -> C)
- 4+: Complex patterns (slower, harder to interpret)
- Recommendations:
- Start with 2-3 for interpretability
- Increase only if needed
Rule Evaluation Metric (default: "lift", required) How to measure rule strength:
- lift: Strength of association (recommended)
- confidence: Reliability of rule
- leverage: Difference between observed and expected co-occurrence
- conviction: Dependency strength
Metric Threshold (default: 1.2, required) Minimum value for the selected metric to keep a rule.
- For lift: >1.0 (1.2 = 20% more likely)
- For confidence: 0.5-0.9 (50%-90% probability)
Advanced Filtering (Optional)
Enable Advanced Filtering Set both confidence and lift thresholds simultaneously for stricter rules.
Minimum Confidence (default: 0.6) Probability that Y is purchased given X is purchased.
- 0.6 = 60% of transactions with X also have Y
- Range: 0.1-1.0
Minimum Lift (default: 1.1) How much more likely Y is with X versus without X.
- 1.0 = No association (independent)
- 1.1 = 10% increase in likelihood
- 2.0 = 2x more likely
- Range: >0.0 (typically >1.0 for meaningful rules)
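Advanced filtering keeps a rule only if it clears both thresholds at once. A small sketch, using illustrative rule records and the default thresholds from the parameters above:

```python
# Hypothetical mined rules with their computed metrics (illustrative values).
rules = [
    {"rule": "Bread -> Butter", "confidence": 0.60, "lift": 1.5},
    {"rule": "Milk -> Eggs",    "confidence": 0.45, "lift": 1.3},  # fails confidence
    {"rule": "Tea -> Sugar",    "confidence": 0.70, "lift": 0.9},  # fails lift
]

min_confidence = 0.6  # default Minimum Confidence
min_lift = 1.1        # default Minimum Lift

# A rule must satisfy BOTH thresholds to survive.
kept = [r for r in rules if r["confidence"] >= min_confidence and r["lift"] >= min_lift]
# Only "Bread -> Butter" passes both filters.
```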
Understanding Association Metrics
Support
Definition: How frequently an itemset appears in the database.
Formula: support(X) = (transactions containing X) / (total transactions)
Example:
- 100 transactions total
- [Bread, Milk] appears in 20 transactions
- support([Bread, Milk]) = 20/100 = 0.2 = 20%
Interpretation:
- 0.01 (1%): Rare pattern
- 0.05 (5%): Moderate frequency
- 0.2 (20%): Very common pattern
Use: Filter out rare, potentially spurious patterns
Confidence
Definition: Probability of finding Y in transactions that contain X.
Formula: confidence(X -> Y) = support(X U Y) / support(X)
Example:
- support([Bread]) = 0.5 (50% of transactions)
- support([Bread, Butter]) = 0.3 (30% of transactions)
- confidence(Bread -> Butter) = 0.3 / 0.5 = 0.6 = 60%
Interpretation:
- 0.6 = 60% of customers who buy bread also buy butter
- Higher confidence = more reliable rule
Limitation: Can be misleading if Y is very common
Lift
Definition: How much more likely Y is with X versus without X.
Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)
Example:
- confidence(Bread -> Butter) = 0.6
- support(Butter) = 0.4 (40% buy butter overall)
- lift(Bread -> Butter) = 0.6 / 0.4 = 1.5
Interpretation:
- lift = 1.0: No association (X and Y are independent)
- lift > 1.0: Positive association (Y more likely with X)
- 1.5 = 50% increase in likelihood
- 2.0 = 2x more likely (100% increase)
- lift < 1.0: Negative association (Y less likely with X)
Why Lift is Best for Discovery:
- Accounts for item popularity
- Detects true associations vs. coincidence
- Symmetric: lift(X -> Y) = lift(Y -> X)
Leverage
Definition: Difference between observed and expected co-occurrence.
Formula: leverage(X -> Y) = support(X U Y) - support(X) x support(Y)
Example:
- support([Bread, Butter]) = 0.3 (observed)
- support(Bread) x support(Butter) = 0.5 x 0.4 = 0.2 (expected if independent)
- leverage = 0.3 - 0.2 = 0.1
Interpretation:
- 0: No association
- Positive: Items appear together more than expected
- Negative: Items appear together less than expected
- Magnitude matters: Higher absolute value = stronger relationship
Conviction
Definition: Dependency measure - how strongly Y depends on X.
Formula: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))
Example:
- support(Butter) = 0.4
- confidence(Bread -> Butter) = 0.6
- conviction = (1 - 0.4) / (1 - 0.6) = 0.6 / 0.4 = 1.5
Interpretation:
- 1.0: No association (independent)
- > 1.0: Y depends on X (higher = stronger dependency)
- infinity: Perfect dependency (Y always appears when X does)
Use: Measures how much the rule deviates from independence
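All five metrics can be checked against the worked examples with a few lines of Python. The 10-transaction dataset below is constructed so the counts reproduce the numbers used throughout this section (support(Bread) = 0.5, support(Butter) = 0.4, support(Bread, Butter) = 0.3):

```python
# 10 illustrative transactions matching the examples above.
transactions = (
    [{"Bread", "Butter"}] * 3   # both items
    + [{"Bread"}] * 2           # bread only
    + [{"Butter"}]              # butter only
    + [{"Milk"}] * 4            # neither
)
n = len(transactions)

def support(*items):
    """Fraction of transactions containing all of the given items."""
    return sum(1 for t in transactions if set(items) <= t) / n

s_x, s_y, s_xy = support("Bread"), support("Butter"), support("Bread", "Butter")

confidence = s_xy / s_x                    # 0.3 / 0.5 = 0.6
lift = confidence / s_y                    # 0.6 / 0.4 = 1.5
leverage = s_xy - s_x * s_y                # 0.3 - 0.5 * 0.4 = 0.1
conviction = (1 - s_y) / (1 - confidence)  # 0.6 / 0.4 = 1.5
```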
Configuration Tips
Best Practices for Eclat
Optimal Use Cases:
- Sparse transaction data
- Many unique items (thousands of SKUs)
- Few items per transaction (average < 10)
- Need depth-first mining strategy
Memory Considerations:
- Eclat stores transaction IDs for each item
- Sparse data: Small transaction lists, low memory
- Dense data: Large transaction lists, high memory
- Monitor memory usage with large datasets
Performance Tips:
- Works best with min_support >= 0.01
- Increase min_support if memory issues occur
- Consider data characteristics before choosing
When Eclat Performs Best
Ideal Characteristics:
- 1000+ unique items
- Average 5-15 items per transaction
- Sparse transaction matrix
- Need to find rare patterns efficiently
Examples:
- Supermarket with 10,000 products, baskets of 20 items
- E-commerce site with 50,000 products, orders of 3-5 items
- Library with 100,000 books, checkouts of 2-3 books
Common Issues and Solutions
High Memory Usage
Symptom: Out of memory errors or swapping
Causes:
- Dense transaction data
- Very low min_support
- Large number of transactions
Solutions:
- Increase min_support to 0.02 or higher
- Switch to FP-Growth for dense data
- Process data in segments
- Filter to most relevant items first
Slower than Expected
Symptom: Eclat slower than FP-Growth
Causes:
- Dense transactions (many items per transaction)
- Data not actually sparse
- Very low min_support
Solutions:
- Verify data is sparse (check avg items per transaction)
- Switch to FP-Growth if data is dense
- Increase min_support slightly
- Reduce max_length
Results Differ from Other Algorithms
Symptom: Different itemsets found
Note: All algorithms should find identical frequent itemsets above threshold. If they differ:
- Verify parameters match exactly
- Check data preprocessing
- Ensure min_support is identical
- Order may differ, but content should match
Conversion to Vertical Format Takes Long
Symptom: Slow initialization before mining starts
Explanation: Eclat must first convert the data from horizontal to vertical format. This is a one-time cost.
Solutions:
- Normal for large datasets
- Consider caching vertical format if running multiple times
- Switch to FP-Growth if initialization dominates runtime