Regression
Train regression models to predict continuous numerical values
Regression models predict continuous numerical values. Use regression when you want to forecast quantities such as house prices, temperature, sales revenue, or customer lifetime value.
🎓 Learn About Regression
New to regression? Visit our Regression Concepts Guide to learn about evaluation metrics (MSE, RMSE, R², MAE), common approaches, and when to use regression for your machine learning tasks.
Available Models
We support 17 different regression algorithms, each with its own strengths:
Linear Models
- Linear Regression - Fast, interpretable baseline
- Ridge Regression - Linear with L2 regularization
- Lasso Regression - Linear with L1 regularization (feature selection)
- ElasticNet Regression - Combines L1 and L2 regularization
- Polynomial Regression - Captures non-linear relationships
- Huber Regressor - Robust to outliers
Tree-Based Models
- Decision Tree - Simple, interpretable rules-based model
- Random Forest - Ensemble of decision trees, robust and accurate
- Extra Trees - Like Random Forest but faster training
Gradient Boosting Models
- XGBoost - Industry standard, excellent performance
- LightGBM - Fast and memory efficient
- CatBoost - Handles categorical features automatically
- Gradient Boosting - Classic boosting algorithm
- AdaBoost - Adaptive boosting for weak learners
Other Models
- Support Vector Regression (SVR) - Effective for complex patterns
- K-Nearest Neighbors - Instance-based learning
- Multi-layer Perceptron - Neural network for complex patterns
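As a quick illustration of how a few of the listed models compare in practice, here is a minimal sketch using scikit-learn on synthetic data (the dataset and model settings are made up for demonstration; they are not this platform's defaults):

```python
# Train a few of the regressors listed above on synthetic data and compare R².
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression dataset: 500 rows, 8 numeric features.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "Linear Regression": LinearRegression(),
    "Ridge Regression": Ridge(alpha=1.0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    score = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R² = {score:.3f}")
```

On this linear synthetic data the linear models will score near the top; on real non-linear data the ranking often flips, which is why comparing several models is worthwhile.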
Common Configuration
All models share these common settings:
Feature Configuration
Feature Columns (required) Select which columns from your dataset to use as input features for training. These are the variables the model will learn from to make predictions.
Target Column (required) The column containing the continuous values you want to predict. This should be a numerical column (not categorical).
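Conceptually, the Feature Columns / Target Column split maps onto a standard X/y separation. A minimal sketch assuming your data is in a pandas DataFrame (the column names here are invented for illustration):

```python
# Hypothetical house-price dataset: column names are for illustration only.
import pandas as pd

df = pd.DataFrame({
    "sqft": [850, 1200, 1500, 2000],
    "bedrooms": [2, 3, 3, 4],
    "neighborhood": ["A", "B", "A", "C"],  # categorical: fine as a feature, not as a target
    "price": [150_000, 220_000, 260_000, 340_000],  # continuous numeric target
})

feature_columns = ["sqft", "bedrooms"]  # the inputs the model learns from
target_column = "price"                 # the continuous value to predict

X = df[feature_columns]
y = df[target_column]
```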
Hyperparameter Tuning
Enable Hyperparameter Tuning Automatically search for the best model parameters. This improves accuracy but takes longer to train.
- Disabled: Use default parameters (faster)
- Enabled: Search for optimal parameters (better accuracy)
Tuning Method (when tuning is enabled)
- Grid Search: Try all combinations systematically (slow but thorough)
- Random Search: Try random combinations (faster, good results)
- Bayesian Search: Intelligently search the parameter space (most efficient)
CV Folds (when tuning is enabled) Number of cross-validation folds (default: 5). Higher values give more reliable results but take longer.
N Iterations (for Random/Bayesian search) How many parameter combinations to try (default: 10). More iterations may find better parameters but take longer.
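To make the knobs above concrete, here is roughly what Random Search with CV Folds = 5 and N Iterations = 10 corresponds to in scikit-learn. The parameter ranges are assumptions for illustration, not the platform's actual search space:

```python
# Sketch of "Random Search" hyperparameter tuning with cv=5 and n_iter=10.
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=6, noise=5.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": randint(20, 100),   # illustrative range
        "max_depth": [None, 5, 10, 20],
    },
    n_iter=10,                              # N Iterations: combinations to try
    cv=5,                                   # CV Folds
    scoring="neg_mean_squared_error",       # default scoring metric
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```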
Scoring Metric (when tuning is enabled) How to evaluate model performance. The "Neg" metrics are negated error values, so that higher scores always mean better models:
- Neg Mean Squared Error: Penalizes large errors heavily (default)
- Neg Mean Absolute Error: Treats all errors equally
- R² Score: Proportion of variance explained (at most 1, higher is better; can be negative for poor models)
- Explained Variance: Similar to R² but ignores any constant offset (bias) in the predictions
- Neg Root Mean Squared Error: Square root of MSE, in the same units as the target
Understanding Regression Metrics
Mean Squared Error (MSE) Average of squared differences. Penalizes large errors heavily. Lower is better.
Root Mean Squared Error (RMSE) Square root of MSE. In same units as target. More interpretable than MSE.
Mean Absolute Error (MAE) Average of absolute differences. Treats all errors equally. In same units as target.
R² Score (Coefficient of Determination) Proportion of variance explained. 1 = perfect predictions, 0 = as good as predicting the mean; negative values mean worse than predicting the mean.
Explained Variance Similar to R², but it ignores any constant bias in the predictions; if the model's errors share a systematic offset, explained variance will be higher than R².
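All of these metrics are available in scikit-learn, so a small worked example makes the definitions concrete (the prediction vector below is made up):

```python
# Compute the regression metrics above on a toy prediction vector.
import numpy as np
from sklearn.metrics import (mean_absolute_error, mean_squared_error,
                             r2_score, explained_variance_score)

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 8.0])

mse = mean_squared_error(y_true, y_pred)   # 0.375: squares the errors first
rmse = np.sqrt(mse)                        # ~0.612: same units as the target
mae = mean_absolute_error(y_true, y_pred)  # 0.5: average absolute error
r2 = r2_score(y_true, y_pred)              # 0.925: fraction of variance explained
ev = explained_variance_score(y_true, y_pred)  # higher than R² here, because the
                                               # errors share a small constant offset
```

Note how MSE weights the single error of 1.0 four times as heavily as the errors of 0.5, while MAE weights them only twice as heavily.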
Choosing the Right Model
Quick Start Guide
- Start simple: Try Linear Regression first
- Add regularization: Ridge or Lasso if overfitting
- Go to trees: Random Forest or XGBoost for better accuracy
- Fine-tune: Use hyperparameter tuning on your best model
- Specialized: Try model-specific features (CatBoost for categories, SVR for high dimensions)
By Dataset Size
- Small (<1k rows): Linear/Ridge/Lasso, SVR, Decision Tree
- Medium (1k-100k): Random Forest, XGBoost, LightGBM
- Large (>100k): LightGBM, XGBoost (with GPU)
By Priority
- Accuracy: XGBoost, LightGBM, CatBoost
- Speed: Linear Regression, Ridge, LightGBM
- Interpretability: Linear Regression, Ridge, Lasso, Decision Tree
- Robustness: Random Forest, Huber, Extra Trees
By Data Characteristics
- Linear relationships: Linear, Ridge, Lasso, ElasticNet
- Non-linear patterns: XGBoost, Random Forest, Neural Network
- Many irrelevant features: Lasso, ElasticNet
- Correlated features: Ridge, ElasticNet
- Categorical features: CatBoost, XGBoost
- Outliers present: Huber, Random Forest
- High-dimensional: SVR, Ridge, Lasso
By Problem Type
- Need coefficients: Linear, Ridge, Lasso (interpretable weights)
- Feature selection: Lasso, ElasticNet (zeros out features)
- Curved relationships: Polynomial, tree-based models
- Unknown relationship: Random Forest (good default for anything)
Best Practices
- Always start with a baseline - Linear Regression trains fast and sets the bar
- Scale your features - Critical for Linear, Ridge, Lasso, SVR, KNN, Neural Networks
- Handle outliers - Use Huber Regressor or tree-based models for robustness
- Use cross-validation - Enable hyperparameter tuning for reliable results
- Monitor for overfitting - Check train vs. validation metrics
- Feature engineering matters - Better features > fancier models
- Check residual plots - Ensure model assumptions are met (especially for linear models)
- Consider ensembles - Average predictions from multiple models often works best
- Start simple, iterate - Don't jump to neural networks immediately
- Understand your metrics:
- MSE: Penalizes large errors heavily (good when large errors are costly)
- MAE: Treats all errors equally (good when all errors equally bad)
- R²: Proportion of variance explained (0-1, intuitive interpretation)
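Two of the practices above, scaling features and using cross-validation, compose naturally. A sketch using a scikit-learn Pipeline, which ensures the scaler is fit only on the training folds rather than leaking test data:

```python
# Scale features inside a Pipeline, then score with 5-fold cross-validation.
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=400, n_features=10, noise=8.0, random_state=0)

# The scaler is refit on each training fold, so no information leaks from the
# held-out fold into the scaling statistics.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
print(f"Mean CV R²: {scores.mean():.3f}")
```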
Next Steps
Ready to train? Head to the Training page and:
- Select your dataset
- Choose a regression model
- Configure the parameters based on this guide
- Enable hyperparameter tuning for best results
- Compare multiple models to find the winner
- Analyze residuals to ensure model quality
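For the last step, a minimal sketch of residual analysis: after training, the residuals (actual minus predicted) should look like noise centered near zero. A visible trend or a drifting mean suggests the model is missing structure in the data:

```python
# Check that residuals are centered near zero with roughly constant spread.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
residuals = y_test - model.predict(X_test)

print(f"Residual mean: {residuals.mean():.2f}")  # should be close to 0
print(f"Residual std:  {residuals.std():.2f}")   # roughly the noise level
```

In practice you would also plot residuals against predicted values (e.g. with matplotlib) and look for funnels or curves, which indicate heteroscedasticity or non-linearity.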