Association Analysis
Discover patterns and relationships in transaction data
Association analysis discovers relationships between items that frequently occur together in transactions. Use it for market basket analysis, recommendation systems, cross-selling strategies, and pattern discovery in transaction databases.
Understanding Association Analysis
New to association analysis? Check out our Association Analysis AI Task Guide to learn the fundamentals of market basket analysis, key concepts like itemsets and rules, and when to use this technique.
Available Algorithms
We support five algorithms for mining frequent itemsets and generating association rules. The core algorithms all find the same frequent itemsets but use different search strategies with different performance trade-offs; FPMax is the exception, returning only maximal itemsets.
Core Algorithms
- Apriori - Classic breadth-first algorithm, easy to understand, good for learning
- FP-Growth - Fast tree-based algorithm, best for most use cases
- Eclat - Vertical format algorithm, fast for sparse data
Specialized Algorithms
- Relim - Memory-efficient recursive elimination
- FPMax - Finds only maximal itemsets (compact representation)
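To build intuition for what these algorithms compute, here is a minimal pure-Python Apriori sketch. It is for illustration only - far slower than the implementations above - but it shows the breadth-first search and the Apriori pruning property:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Minimal Apriori: breadth-first search over itemset sizes."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1: candidate itemsets are the individual items.
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    while current:
        # Count support for each candidate; keep those above the threshold.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(survivors)
        # Join surviving k-itemsets into (k+1)-candidates, then prune any
        # candidate with an infrequent k-subset (the Apriori property).
        keys = list(survivors)
        current = {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
        current = {c for c in current
                   if all(frozenset(s) in survivors for s in combinations(c, len(c) - 1))}
    return frequent

transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
    ["bread", "butter", "milk"],
]
result = apriori(transactions, min_support=0.4)
# result maps each frequent itemset to its support, e.g.
# frozenset({'bread', 'butter', 'milk'}) -> 0.4
```

FP-Growth and the other algorithms reach the same `result` without materializing candidate sets level by level, which is why they scale better.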
What is Association Analysis?
Association analysis answers questions like:
- "What products do customers buy together?"
- "If someone buys X, what else are they likely to buy?"
- "What items frequently appear in the same transaction?"
Classic Example: "Customers who buy diapers also buy beer" - a famous (and possibly apocryphal) retail anecdote illustrating how association analysis can surface unexpected relationships.
Key Concepts
Itemset: A collection of items (e.g., [Bread, Milk])
Transaction: A set of items purchased together (e.g., one shopping cart)
Association Rule: If X then Y, written as X -> Y
- Example: [Bread, Butter] -> [Milk]
- Meaning: "Customers who buy bread and butter also buy milk"
Understanding Association Metrics
Support
Definition: How frequently an itemset appears in the database.
Formula: support(X) = (transactions containing X) / (total transactions)
Example:
- 100 transactions total
- [Bread, Milk] appears in 20 transactions
- support([Bread, Milk]) = 20/100 = 0.2 = 20%
Interpretation:
- 0.01 (1%): Rare pattern
- 0.05 (5%): Moderate frequency
- 0.2 (20%): Very common pattern
Use: Filter out rare, potentially spurious patterns
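The worked example above can be reproduced in a few lines. This sketch assumes transactions are plain lists of item names (an illustrative format, not a required input schema):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

# 100 transactions; [Bread, Milk] appears in 20 of them.
transactions = [["bread", "milk", "eggs"]] * 20 + [["eggs"]] * 80
print(support(["bread", "milk"], transactions))  # 0.2
```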
Confidence
Definition: Probability of finding Y in transactions that contain X.
Formula: confidence(X -> Y) = support(X U Y) / support(X)
Example:
- support([Bread]) = 0.5 (50% of transactions)
- support([Bread, Butter]) = 0.3 (30% of transactions)
- confidence(Bread -> Butter) = 0.3 / 0.5 = 0.6 = 60%
Interpretation:
- 0.6 = 60% of customers who buy bread also buy butter
- Higher confidence = more reliable rule
Limitation: Can be misleading if Y is very common - a popular consequent produces high confidence even when there is no real association
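Using the same list-of-lists transaction format as above, confidence is just a ratio of two supports (an illustrative sketch):

```python
def support(itemset, transactions):
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(X -> Y) = support(X u Y) / support(X)"""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# 50% of carts contain bread; 30% contain both bread and butter.
transactions = [["bread", "butter"]] * 3 + [["bread"]] * 2 + [["milk"]] * 5
print(confidence(["bread"], ["butter"], transactions))  # 0.6
```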
Lift
Definition: How much more likely Y is with X versus without X.
Formula: lift(X -> Y) = confidence(X -> Y) / support(Y)
Example:
- confidence(Bread -> Butter) = 0.6
- support(Butter) = 0.4 (40% buy butter overall)
- lift(Bread -> Butter) = 0.6 / 0.4 = 1.5
Interpretation:
- lift = 1.0: No association (X and Y are independent)
- lift > 1.0: Positive association (Y more likely with X); 1.5 means a 50% increase in likelihood, 2.0 means twice as likely (100% increase)
- lift < 1.0: Negative association (Y less likely with X)
Why Lift is Best for Discovery:
- Accounts for item popularity
- Detects true associations vs. coincidence
- Symmetric: lift(X -> Y) = lift(Y -> X)
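The lift calculation and its symmetry are easy to verify with the same transaction format (illustrative sketch):

```python
def support(itemset, transactions):
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def lift(antecedent, consequent, transactions):
    """lift(X -> Y) = support(X u Y) / (support(X) * support(Y))"""
    x, y = set(antecedent), set(consequent)
    return support(x | y, transactions) / (
        support(x, transactions) * support(y, transactions))

# 50% buy bread, 40% buy butter, 30% buy both.
transactions = ([["bread", "butter"]] * 3 + [["bread"]] * 2
                + [["butter"]] + [["milk"]] * 4)
print(round(lift(["bread"], ["butter"], transactions), 2))  # 1.5

# Symmetry: swapping antecedent and consequent gives the same value.
assert lift(["bread"], ["butter"], transactions) == lift(["butter"], ["bread"], transactions)
```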
Leverage
Definition: Difference between observed and expected co-occurrence.
Formula: leverage(X -> Y) = support(X U Y) - support(X) x support(Y)
Example:
- support([Bread, Butter]) = 0.3 (observed)
- support(Bread) x support(Butter) = 0.5 x 0.4 = 0.2 (expected if independent)
- leverage = 0.3 - 0.2 = 0.1
Interpretation:
- 0: No association
- Positive: Items appear together more than expected
- Negative: Items appear together less than expected
- Magnitude matters: Higher absolute value = stronger relationship
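The observed-minus-expected calculation above, sketched with the same transaction format:

```python
def support(itemset, transactions):
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def leverage(antecedent, consequent, transactions):
    """leverage(X -> Y) = support(X u Y) - support(X) * support(Y)"""
    x, y = set(antecedent), set(consequent)
    return (support(x | y, transactions)
            - support(x, transactions) * support(y, transactions))

# Observed co-occurrence 0.3 vs. 0.5 * 0.4 = 0.2 expected under independence.
transactions = ([["bread", "butter"]] * 3 + [["bread"]] * 2
                + [["butter"]] + [["milk"]] * 4)
print(round(leverage(["bread"], ["butter"], transactions), 2))  # 0.1
```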
Conviction
Definition: How much more often the rule X -> Y would be wrong if X and Y were independent, compared to how often it is actually wrong.
Formula: conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))
Example:
- support(Butter) = 0.4
- confidence(Bread -> Butter) = 0.6
- conviction = (1 - 0.4) / (1 - 0.6) = 0.6 / 0.4 = 1.5
Interpretation:
- 1.0: No association (independent)
- >1.0: Y depends on X
- infinity: Perfect dependency (always Y when X)
Use: Measures how much the rule deviates from independence
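Conviction follows the same pattern; note the guard for a rule that is never wrong, which matches the "perfect dependency = infinity" case above (illustrative sketch):

```python
def support(itemset, transactions):
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def conviction(antecedent, consequent, transactions):
    """conviction(X -> Y) = (1 - support(Y)) / (1 - confidence(X -> Y))"""
    x, y = set(antecedent), set(consequent)
    conf = support(x | y, transactions) / support(x, transactions)
    if conf == 1.0:  # rule is never wrong -> perfect dependency
        return float("inf")
    return (1 - support(y, transactions)) / (1 - conf)

# support(Butter) = 0.4, confidence(Bread -> Butter) = 0.6
transactions = ([["bread", "butter"]] * 3 + [["bread"]] * 2
                + [["butter"]] + [["milk"]] * 4)
print(round(conviction(["bread"], ["butter"], transactions), 2))  # 1.5
```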
Choosing the Right Algorithm
Quick Decision Guide
Start with FP-Growth unless:
- Learning (use Apriori for intuition)
- Sparse data (try Eclat)
- Limited memory (try Relim)
- Want compact results (use FPMax)
By Dataset Size
- Small (<1k transactions): Any algorithm, Apriori is fine
- Medium (1k-100k): FP-Growth (best), Eclat (for sparse)
- Large (>100k): FP-Growth, Eclat, Relim
By Data Characteristics
Dense transactions (many items per transaction):
- FP-Growth (best)
- Apriori (small datasets)
Sparse transactions (few items per transaction):
- Eclat (best for sparse)
- FP-Growth
Many unique items:
- Eclat (handles many items well)
- FP-Growth
By Goal
Learning / Understanding:
- Apriori (most intuitive)
Production / Performance:
- FP-Growth (fastest, most reliable)
Compact Results:
- FPMax (only longest patterns)
Memory Constraints:
- Relim (memory-efficient)
Best Practices
1. Start with the Right Support
- Don't start too low (<0.001)
- Begin with moderate support (0.01-0.05)
- Lower gradually if needed
- Monitor number of results
2. Focus on Actionable Rules
- High lift (>1.5) for strong associations
- Reasonable confidence (>0.5) for reliability
- Consider support (not too rare)
- Look for surprising patterns (high lift + moderate confidence)
3. Filter and Interpret Results
Good Rules:
- Lift >1.5 (strong association)
- Confidence >0.5 (reliable)
- Support >0.01 (not too rare)
- Make business sense
Suspicious Rules:
- Lift ≈ 1.0 (no real association)
- Very high confidence + low lift (item just popular)
- Very low support (might be noise)
- Contradicts domain knowledge
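The thresholds above are straightforward to apply once rules are mined. This sketch assumes each rule is a dict with precomputed metrics - the field names and sample rules are illustrative, not a fixed output format:

```python
rules = [
    {"rule": "bread -> butter", "support": 0.30, "confidence": 0.60, "lift": 1.8},
    {"rule": "milk -> bags",    "support": 0.45, "confidence": 0.90, "lift": 1.01},  # bags are just popular
    {"rule": "caviar -> vodka", "support": 0.002, "confidence": 0.70, "lift": 8.0},  # too rare, might be noise
]

def is_actionable(rule, min_support=0.01, min_confidence=0.5, min_lift=1.5):
    """Keep rules that are frequent, reliable, and genuinely associated."""
    return (rule["support"] > min_support
            and rule["confidence"] > min_confidence
            and rule["lift"] > min_lift)

good = [r["rule"] for r in rules if is_actionable(r)]
print(good)  # ['bread -> butter']
```

Filtering on all three metrics together is what removes both the "popular consequent" rules (high confidence, lift near 1) and the rare-pattern noise (high lift, negligible support).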
4. Domain Validation
- Validate with domain experts
- Check if patterns make business sense
- Look for actionable insights
- Test recommendations with A/B testing
5. Segment Your Analysis
Analyze different segments separately:
- Store locations
- Customer demographics
- Time periods (seasonal patterns)
- Product categories
6. Practical Applications
Retail / E-commerce:
- Product recommendations ("You might also like...")
- Store layout optimization
- Promotional bundling
- Cross-selling strategies
Healthcare:
- Symptom-disease associations
- Drug interaction patterns
- Treatment combinations
Web Analytics:
- Page navigation patterns
- Feature usage combinations
- User behavior sequences
Common Pitfalls
1. Support Too Low
- Generates too many patterns
- Includes noise and spurious patterns
- Very slow computation
- Fix: Start with 0.01-0.05, lower gradually
2. Ignoring Lift
- Using only confidence can be misleading
- Popular items have high confidence by default
- Fix: Always check lift >1.0, prefer >1.5
3. Too Many Items
- Exponential growth in patterns
- Overwhelming results
- Fix:
- Increase min_support
- Limit max_length to 2-3
- Focus on specific product categories
4. Not Filtering Results
- Raw output is overwhelming
- Many redundant patterns
- Fix:
- Use advanced filters (confidence + lift)
- Focus on high-lift rules
- Sort by interestingness metrics
5. Misinterpreting Causation
- Association ≠ causation
- Correlation might be coincidental
- Fix: Validate with experiments and domain knowledge