Regression
Train regression models to predict continuous numerical values
Regression models predict continuous numerical values. Use regression when you want to forecast quantities such as house prices, temperature, sales revenue, or customer lifetime value.
🎓 Learn About Regression
New to regression? Visit our Regression Concepts Guide to learn about evaluation metrics (MSE, RMSE, R², MAE), common approaches, and when to use regression for your machine learning tasks.
Available Models
We support 17 different regression algorithms, each with its own strengths:
Linear Models
- Linear Regression - Fast, interpretable baseline
- Ridge Regression - Linear with L2 regularization
- Lasso Regression - Linear with L1 regularization (feature selection)
- ElasticNet Regression - Combines L1 and L2 regularization
- Polynomial Regression - Captures non-linear relationships
- Huber Regressor - Robust to outliers
Tree-Based Models
- Decision Tree - Simple, interpretable rules-based model
- Random Forest - Ensemble of decision trees, robust and accurate
- Extra Trees - Like Random Forest but faster training
Gradient Boosting Models
- XGBoost - Industry standard, excellent performance
- LightGBM - Fast and memory efficient
- CatBoost - Handles categorical features automatically
- Gradient Boosting - Classic boosting algorithm
- AdaBoost - Adaptive boosting for weak learners
Other Models
- Support Vector Regression (SVR) - Effective for complex patterns
- K-Nearest Neighbors - Instance-based learning
- Multi-layer Perceptron - Neural network for complex patterns
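As a quick illustration of how a few of the listed models compare in practice, here is a minimal sketch using scikit-learn on synthetic data (the dataset and model settings are made up for demonstration; they are not this platform's defaults):

```python
# Train a few of the regressors listed above on synthetic data and compare R².
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression dataset: 500 rows, 8 numeric features.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R² = {score:.3f}")
```

On this linear synthetic data the linear models will score near the top; on real non-linear data the ranking often flips, which is why comparing several models is worthwhile.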
Common Configuration
All models share these common settings:
Feature Configuration
Feature Columns (required) Select which columns from your dataset to use as input features for training. These are the variables the model will learn from to make predictions.
Target Column (required) The column containing the continuous values you want to predict. This should be a numerical column (not categorical).
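Conceptually, the Feature Columns / Target Column split maps onto a standard X/y separation. A minimal sketch assuming your data is in a pandas DataFrame (the column names here are invented for illustration):

```python
# Hypothetical house-price dataset: column names are for illustration only.
import pandas as pd

df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 2000],
    "bedrooms": [2, 3, 3, 4],
    "neighborhood": ["A", "B", "A", "C"],  # categorical: fine as a feature, not as a target
    "price": [150_000, 220_000, 260_000, 340_000],  # continuous numeric target
})

feature_columns = ["sqft", "bedrooms"]  # the inputs the model learns from
target_column = "price"                 # the continuous value to predict

X = df[feature_columns]
y = df[target_column]
```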
Hyperparameter Tuning
Enable Hyperparameter Tuning Automatically search for the best model parameters. This improves accuracy but takes longer to train.
- Disabled: Use default parameters (faster)
- Enabled: Search for optimal parameters (better accuracy)
Tuning Method (when tuning is enabled)
- Grid Search: Try all combinations systematically (slow but thorough)
- Random Search: Try random combinations (faster, good results)
- Bayesian Search: Intelligently search the parameter space (most efficient)
CV Folds (when tuning is enabled) Number of cross-validation folds (default: 5). Higher values give more reliable results but take longer.
N Iterations (for Random/Bayesian search) How many parameter combinations to try (default: 10). More iterations may find better parameters but take longer.
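To make the knobs above concrete, here is roughly what Random Search with CV Folds = 5 and N Iterations = 10 corresponds to in scikit-learn. The parameter ranges are assumptions for illustration, not the platform's actual search space:

```python
# Sketch of "Random Search" hyperparameter tuning with cv=5 and n_iter=10.
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 100),   # illustrative range
        "max_depth": [None, 5, 10, 20],
    },
    n_iter=10,                              # N Iterations: combinations to try
    cv=5,                                   # CV Folds
    scoring="neg_mean_squared_error",       # default scoring metric
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```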
Scoring Metric (when tuning is enabled) How to evaluate model performance. The "Neg" metrics are negated error values, so that higher scores always mean better models:
- Neg Mean Squared Error: Penalizes large errors heavily (default)
- Neg Mean Absolute Error: Treats all errors equally
- R² Score: Proportion of variance explained (at most 1, higher is better; can be negative for poor models)
- Explained Variance: Similar to R² but ignores any constant offset (bias) in the predictions
- Neg Root Mean Squared Error: Square root of MSE, in the same units as the target
Understanding Regression Metrics
Mean Squared Error (MSE) Average of squared differences. Penalizes large errors heavily. Lower is better.
Root Mean Squared Error (RMSE) Square root of MSE. In same units as target. More interpretable than MSE.
Mean Absolute Error (MAE) Average of absolute differences. Treats all errors equally. In same units as target.
R² Score (Coefficient of Determination) Proportion of variance explained. 1 = perfect predictions, 0 = as good as predicting the mean; negative values mean worse than predicting the mean.
Explained Variance Similar to R², but it ignores any constant bias in the predictions; if the model's errors share a systematic offset, explained variance will be higher than R².
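All of these metrics are available in scikit-learn, so a small worked example makes the definitions concrete (the prediction vector below is made up):

```python
# Compute the regression metrics above on a toy prediction vector.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, explained_variance_score)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # 0.375: squares the errors first
rmse = np.sqrt(mse)                        # ~0.612: same units as the target
mae = mean_absolute_error(y_true, y_pred)  # 0.5: average absolute error
r2 = r2_score(y_true, y_pred)              # 0.925: fraction of variance explained
ev = explained_variance_score(y_true, y_pred)  # higher than R² here, because the
                                               # errors share a small constant offset
```

Note how MSE weights the single error of 1.0 four times as heavily as the errors of 0.5, while MAE weights them only twice as heavily.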
Choosing the Right Model
Quick Start Guide
- Start simple: Try Linear Regression first
- Add regularization: Ridge or Lasso if overfitting
- Go to trees: Random Forest or XGBoost for better accuracy
- Fine-tune: Use hyperparameter tuning on your best model
- Specialized: Try model-specific features (CatBoost for categories, SVR for high dimensions)
By Dataset Size
- Small (<1k rows): Linear/Ridge/Lasso, SVR, Decision Tree
- Medium (1k-100k): Random Forest, XGBoost, LightGBM
- Large (>100k): LightGBM, XGBoost (with GPU)
By Priority
- Accuracy: XGBoost, LightGBM, CatBoost
- Speed: Linear Regression, Ridge, LightGBM
- Interpretability: Linear Regression, Ridge, Lasso, Decision Tree
- Robustness: Random Forest, Huber, Extra Trees
By Data Characteristics
- Linear relationships: Linear, Ridge, Lasso, ElasticNet
- Non-linear patterns: XGBoost, Random Forest, Neural Network
- Many irrelevant features: Lasso, ElasticNet
- Correlated features: Ridge, ElasticNet
- Categorical features: CatBoost, XGBoost
- Outliers present: Huber, Random Forest
- High-dimensional: SVR, Ridge, Lasso
By Problem Type
- Need coefficients: Linear, Ridge, Lasso (interpretable weights)
- Feature selection: Lasso, ElasticNet (zeros out features)
- Curved relationships: Polynomial, tree-based models
- Unknown relationship: Random Forest (good default for anything)
Best Practices
- Always start with a baseline - Linear Regression trains fast and sets the bar
- Scale your features - Critical for Linear, Ridge, Lasso, SVR, KNN, Neural Networks
- Handle outliers - Use Huber Regressor or tree-based models for robustness
- Use cross-validation - Enable hyperparameter tuning for reliable results
- Monitor for overfitting - Check train vs. validation metrics
- Feature engineering matters - Better features > fancier models
- Check residual plots - Ensure model assumptions are met (especially for linear models)
- Consider ensembles - Average predictions from multiple models often works best
- Start simple, iterate - Don't jump to neural networks immediately
- Understand your metrics:
- MSE: Penalizes large errors heavily (good when large errors are costly)
- MAE: Treats all errors equally (good when all errors equally bad)
- R²: Proportion of variance explained (0-1, intuitive interpretation)
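Two of the practices above, scaling features and using cross-validation, compose naturally. A sketch using a scikit-learn Pipeline, which ensures the scaler is fit only on the training folds rather than leaking test data:

```python
# Scale features inside a Pipeline, then score with 5-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=400, n_features=10, noise=8.0, random_state=0)

# The scaler is refit on each training fold, so no information leaks from the
# held-out fold into the scaling statistics.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Mean CV R²: {scores.mean():.3f}")
```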
Next Steps
Ready to train? Head to the Training page and:
- Select your dataset
- Choose a regression model
- Configure the parameters based on this guide
- Enable hyperparameter tuning for best results
- Compare multiple models to find the winner
- Analyze residuals to ensure model quality
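For the last step, a minimal sketch of residual analysis: after training, the residuals (actual minus predicted) should look like noise centered near zero. A visible trend or a drifting mean suggests the model is missing structure in the data:

```python
# Check that residuals are centered near zero with roughly constant spread.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
residuals = y_test - model.predict(X_test)

print(f"Residual mean: {residuals.mean():.2f}")  # should be close to 0
print(f"Residual std:  {residuals.std():.2f}")   # roughly the noise level
```

In practice you would also plot residuals against predicted values (e.g. with matplotlib) and look for funnels or curves, which indicate heteroscedasticity or non-linearity.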