Outlier Detection
Identify and visualize outliers in your data
Use me when you want to find the rebels - the data points that don't fit in. I'll highlight the unusual, the extreme, and the potentially problematic values. That 200-year-old customer? The $1 million typo? The measurement that's way off? I'll find them for you. Perfect for data quality checks, fraud detection, or just understanding what's weird in your dataset.
Overview
An outlier detection plot helps identify data points that deviate significantly from the rest of your dataset. These unusual values can represent errors, rare events, or important anomalies worth investigating. The visualization highlights outliers using statistical methods, making it easy to spot and analyze unusual patterns.
Best used for:
- Data quality checks and validation
- Identifying data entry errors
- Detecting anomalies or unusual events
- Understanding data distribution
- Preparing data for analysis or modeling
- Finding extreme values that need attention
Common Use Cases
Data Quality & Cleaning
- Detecting data entry errors
- Finding measurement errors
- Validating data ranges
- Identifying corrupted records
- Pre-processing before analysis
Business Analytics
- Fraud detection
- Unusual transaction identification
- Customer behavior anomalies
- Sales spike or drop detection
- Inventory discrepancies
Scientific Research
- Experimental measurement errors
- Unusual observations worth investigating
- Quality control in manufacturing
- Sensor malfunction detection
- Research data validation
Options
Target Column
Required - Select the column to analyze for outliers.
Choose a numerical or categorical column to examine. The plot will identify values that significantly deviate from the typical pattern.
Accepts: NUMERICAL or CATEGORICAL columns
Settings
Highlight Outliers
Optional - Visually emphasize outlier points.
When enabled, outlier points are highlighted with distinct colors or markers, making them easier to identify and analyze.
Understanding Outliers
What is an Outlier?
An outlier is a data point that differs significantly from other observations. It may indicate:
- Measurement or data entry error
- Natural variation (rare but valid)
- Fraud or anomaly
- Important discovery
- Equipment malfunction
Types of Outliers
Univariate Outliers
- Extreme values in a single variable
- Example: A 200-year-old person in age data
- Detected using methods like IQR or Z-score
Multivariate Outliers
- Unusual combinations of values
- Example: Low income with luxury purchases
- May appear normal individually
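One common way to score unusual combinations is the Mahalanobis distance, which measures how far a point lies from the center of the data after accounting for the correlation between two columns. The sketch below is illustrative only: this page does not say which method the plot uses, and the function name `mahalanobis_2d` and the income and spend figures are made up for the example.

```python
import math
import statistics

def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each (x, y) point from the sample mean."""
    xs, ys = zip(*points)
    n = len(points)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    # Sample variances and covariance (divide by n - 1).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular; columns are perfectly correlated")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for a 2x2 matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances

# Hypothetical (income, luxury spend) pairs in thousands. The last point has
# low income and high spend; each value alone is inside the observed range.
points = [(30, 1), (40, 2), (50, 3), (60, 4), (70, 5),
          (80, 6), (90, 7), (100, 8), (35, 7)]
distances = mahalanobis_2d(points)
most_unusual = points[distances.index(max(distances))]
print(most_unusual)  # (35, 7)
```

Ranking points by distance, rather than applying a fixed cutoff, keeps the example free of distributional assumptions; any threshold you choose should be treated as a judgment call for your data.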
Contextual Outliers
- Unusual only in a specific context
- Example: A near-freezing temperature reading in midsummer
- Depends on time, location, or other factors
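A contextual outlier can be found by applying a detection rule within each context instead of across the whole column. The following is a minimal sketch, assuming the context is a month label and using the 1.5×IQR rule inside each group; the helper names and temperature readings are hypothetical.

```python
import statistics
from collections import defaultdict

def iqr_fences(values, k=1.5):
    """Return (lower, upper) fences for the k×IQR rule."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def contextual_outliers(records, k=1.5):
    """records: iterable of (context, value). Flag values outside their own group's fences."""
    groups = defaultdict(list)
    for context, value in records:
        groups[context].append(value)
    flagged = []
    for context, values in groups.items():
        lower, upper = iqr_fences(values, k)
        flagged += [(context, v) for v in values if v < lower or v > upper]
    return flagged

# Hypothetical temperature readings in degrees Celsius.
readings = (
    [("January", t) for t in [-3, -1, 0, 1, 2, 3, 4]]
    + [("July", t) for t in [2, 24, 25, 26, 27, 28, 29]]
)

# Across all readings, 2 degrees is unremarkable.
global_lower, global_upper = iqr_fences([t for _, t in readings])
print(global_lower <= 2 <= global_upper)  # True

# Within July, the same reading stands out.
print(contextual_outliers(readings))  # [('July', 2)]
```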
Detection Methods
IQR Method (Interquartile Range)
- Most common approach
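- Outliers: values < Q1 - 1.5×IQR or > Q3 + 1.5×IQR, where IQR = Q3 - Q1
- Robust to extreme values, because the quartiles barely move when a few points are extreme
- Does not assume a normal distribution, so it copes with moderately skewed data better than the Z-score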
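The IQR rule above can be sketched in a few lines. This is an illustration using Python's standard library, not the plot's own implementation; the function name `iqr_outliers`, the quartile convention (`method="inclusive"`), and the sample ages are assumptions made for the example. Other quartile conventions give slightly different fences.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using the k×IQR rule."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

# Hypothetical customer ages, including one impossible value.
ages = [23, 25, 27, 29, 31, 33, 35, 38, 41, 200]
lower, upper, outliers = iqr_outliers(ages)
print(lower, upper, outliers)  # 12.875 51.875 [200]
```

Raising `k` (for example to 3) makes the rule stricter and flags only the most extreme values.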
Z-Score Method
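- Based on standard deviations from the mean: Z = (value - mean) / standard deviation
- Outliers: typically |Z| > 3
- Assumes an approximately normal distribution
- Sensitive to extreme outliers: they inflate the mean and standard deviation, which can hide other outliers or even themselves, especially in small samples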
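A minimal sketch of the Z-score rule follows, assuming the population standard deviation and a default threshold of 3; the function name `z_score_outliers` and the sample ages are hypothetical. The example also shows the sensitivity noted above: in a sample of ten values, a single age of 200 inflates the standard deviation so much that its own Z-score stays just under 3.

```python
import statistics

def z_score_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical customer ages, including one impossible value.
ages = [23, 25, 27, 29, 31, 33, 35, 38, 41, 200]

print(z_score_outliers(ages))                 # []  (Z for 200 is about 2.98)
print(z_score_outliers(ages, threshold=2.5))  # [200]
```

This is one reason to compare the Z-score result with the IQR method and with the plot itself before concluding that a column has no outliers.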
Visual Inspection
- Look for points far from main cluster
- Examine distribution tails
- Check for unexpected patterns
- Consider domain knowledge
Interpreting the Plot
Visual Indicators
- Highlighted points: Identified outliers
- Position: How far from typical values
- Clustering: Groups of outliers vs. isolated points
- Patterns: Systematic vs. random outliers
What to Look For
- Isolated extreme values: Potential errors
- Clusters of outliers: Subpopulations or patterns
- Direction: High vs. low outliers
- Frequency: How common are outliers?
Handling Outliers
Investigation Steps
- Verify the value: Is it a data error?
- Check context: Does it make sense?
- Look for patterns: Are there similar cases?
- Consider cause: Why might this occur?
Common Actions
Remove: When outliers are clear errors
- Data entry mistakes
- Measurement failures
- Corrupted records
Keep: When outliers are valid
- Rare but real events
- Important discoveries
- Natural variation
Transform: When appropriate
- Log transformation for right-skewed data
- Winsorization (cap at threshold)
- Separate analysis of outliers
Investigate: When uncertain
- Gather more information
- Check source data
- Consult domain experts
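The Transform options above can be sketched as follows. This is an illustration under stated assumptions: the caps are taken from the 1.5×IQR fences (classic winsorization usually caps at chosen percentiles instead), the log transform uses `log1p` and so requires values greater than -1, and the function names and revenue figures are hypothetical.

```python
import math
import statistics

def cap_at_fences(values, k=1.5):
    """Winsorize-style capping: clamp values to the k×IQR fences instead of dropping them."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, lower), upper) for v in values]

def log_transform(values):
    """Compress a long right tail with log(1 + x); the order of values is preserved."""
    return [math.log1p(v) for v in values]

# Hypothetical daily revenue with one extreme value.
revenue = [120, 135, 150, 160, 175, 190, 210, 230, 260, 5000]

capped = cap_at_fences(revenue)
logged = log_transform(revenue)

print(max(capped))           # 333.75
print(round(max(logged), 2)) # 8.52
```

Both approaches keep every row in the dataset, which is useful when the extreme values are real but would otherwise dominate averages or model fits.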
Tips for Effective Outlier Analysis
Always Investigate Before Removing:
- Never automatically delete outliers
- Understand why they exist
- Document your decisions
- Consider impact on analysis
Use Multiple Methods:
- Different methods for different data types
- Compare IQR and Z-score results
- Visual inspection alongside statistics
- Consider domain-specific rules
Consider Context:
- What's normal for this data?
- Are extreme values possible?
- Check data collection process
- Review time periods and conditions
Document Outliers:
- Record outlier criteria used
- Note removed vs. kept outliers
- Explain reasoning for decisions
- Track impact on results
Be Careful with Automatic Removal:
- May remove important information
- Can bias your analysis
- Test sensitivity to outlier treatment
- Report analysis with and without outliers
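Reporting results with and without outliers can be as simple as computing the same summary twice. The sketch below assumes a domain rule supplied by you decides what counts as an outlier; the function names, the age data, and the "older than 120" rule are hypothetical.

```python
import statistics

def summarize(values):
    """Basic summary statistics for one version of the data."""
    return {
        "n": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def sensitivity_report(values, is_outlier):
    """Summarize the data with and without the points flagged by is_outlier."""
    kept = [v for v in values if not is_outlier(v)]
    return {
        "with_outliers": summarize(values),
        "without_outliers": summarize(kept),
    }

# Hypothetical customer ages and a hypothetical domain rule.
ages = [23, 25, 27, 29, 31, 33, 35, 38, 41, 200]
report = sensitivity_report(ages, is_outlier=lambda age: age > 120)

for label, stats in report.items():
    print(label, {name: round(value, 2) for name, value in stats.items()})
```

In this example the median barely moves (32 versus 31) while the mean drops from about 48.2 to about 31.3, which shows how much a single point can drive a non-robust statistic.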
Common Outlier Patterns
Single Extreme Point
One value far from all others - often a data error or rare event.
Multiple Outliers in Same Direction
Several high or low values - may indicate subgroup or systematic issue.
Outliers on Both Ends
Both very high and very low values - check data range validity.
Clustered Outliers
Groups of unusual values - may represent valid subpopulation.
Related Analyses
After Outlier Detection
- Box plot - See outliers in context of distribution
- Histogram - Understand overall data distribution
- Scatter plot - Check relationships with other variables
- Time series - See if outliers occur at specific times
- Summary statistics - Compare with/without outliers
Troubleshooting
Issue: Too many points marked as outliers
- Solution: The method may be too sensitive for this data. Try a different detection method, consider whether the data naturally has high variance, and check whether it contains multiple subgroups.
Issue: No outliers detected but data looks suspicious
- Solution: Try a different detection method or lower the Z-score threshold. Inspect the plot visually and check for multivariate outliers, which can look normal in any single column.
Issue: Same points always flagged
- Solution: These may be persistent data issues. Investigate the root cause, check whether the values are valid extremes, and consider handling them separately.
Issue: Outliers only in specific groups
- Solution: Analyze the groups separately. The pattern may indicate data quality issues in one subset, so check whether the groups use different scales or units.
Issue: Can't decide if outlier is valid
- Solution: Check the source data and documentation, and consult domain experts. Run the analysis with and without the point and see how much it changes your conclusions.
Issue: Outliers change analysis results significantly
- Solution: Report a sensitivity analysis and consider robust methods. Investigate what causes the outliers; they may be influential points worth studying.