
Regression

Train regression models to predict continuous numerical values

Regression models predict continuous numerical values. Use regression when you want to forecast quantities such as house prices, temperature, sales revenue, or customer lifetime value.

🎓 Learn About Regression

New to regression? Visit our Regression Concepts Guide to learn about evaluation metrics (MSE, RMSE, R², MAE), common approaches, and when to use regression for your machine learning tasks.

Available Models

We support 17 different regression algorithms, each with its own strengths:

Linear Models

Tree-Based Models

Gradient Boosting Models

Other Models

Common Configuration

All models share these common settings:

Feature Configuration

Feature Columns (required) Select which columns from your dataset to use as input features for training. These are the variables the model will learn from to make predictions.

Target Column (required) The column containing the continuous values you want to predict. This should be a numerical column (not categorical).
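In code terms, this split between feature columns and target column amounts to selecting a feature matrix X and a target vector y. The sketch below uses pandas with invented column names ("sqft", "bedrooms", "price") purely for illustration; your dataset's columns will differ.

```python
import pandas as pd

# Hypothetical dataset: column names and values are placeholders.
df = pd.DataFrame({
    "sqft":     [1400, 1600, 1700, 1875, 1100],
    "bedrooms": [3, 3, 4, 4, 2],
    "price":    [245000, 312000, 279000, 308000, 199000],
})

feature_columns = ["sqft", "bedrooms"]  # input features the model learns from
target_column = "price"                 # continuous numerical target

X = df[feature_columns]  # feature matrix
y = df[target_column]    # target vector (must be numeric, not categorical)
```

Note that the target is a single numeric column, while any number of feature columns can be selected.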

Hyperparameter Tuning

Enable Hyperparameter Tuning Automatically search for the best model parameters. This improves accuracy but takes longer to train.

  • Disabled: Use default parameters (faster)
  • Enabled: Search for optimal parameters (better accuracy)

Tuning Method (when tuning is enabled)

  • Grid Search: Try all combinations systematically (slow but thorough)
  • Random Search: Try random combinations (faster, good results)
  • Bayesian Search: Intelligently search the parameter space (most efficient)

CV Folds (when tuning is enabled) Number of cross-validation folds (default: 5). Higher values give more reliable results but take longer.

N Iterations (for Random/Bayesian search) How many parameter combinations to try (default: 10). More iterations may find better parameters but take longer.

Scoring Metric (when tuning is enabled) How to evaluate model performance:

  • Neg Mean Squared Error: Penalizes large errors heavily (default)
  • Neg Mean Absolute Error: Treats all errors equally
  • R² Score: Proportion of variance explained (1 = perfect; can be negative for poor models; higher is better)
  • Explained Variance: Similar to R² but doesn't account for systematic offset
  • Neg Root Mean Squared Error: Square root of MSE, in same units as target
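The tool's internal tuning implementation isn't shown here, but the settings above map directly onto a scikit-learn-style search. The following sketch shows Random Search with the defaults described (cv=5, n_iter=10, Neg Mean Squared Error); the parameter grid is a made-up example, and RandomForestRegressor stands in for whichever model you select.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data standing in for your dataset.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# Random Search: try n_iter random combinations from the grid,
# scoring each by 5-fold cross-validated negative MSE.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions={                      # hypothetical search space
        "n_estimators": [25, 50, 100, 200],
        "max_depth": [None, 3, 5, 10],
    },
    n_iter=10,                                 # combinations to try
    cv=5,                                      # cross-validation folds
    scoring="neg_mean_squared_error",          # default scoring metric
    random_state=0,
)
search.fit(X, y)
best_params = search.best_params_
```

Grid Search (`GridSearchCV`) would evaluate all 16 combinations instead; Bayesian Search narrows the space adaptively and typically needs fewer iterations.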

Understanding Regression Metrics

Mean Squared Error (MSE) Average of squared differences. Penalizes large errors heavily. Lower is better.

Root Mean Squared Error (RMSE) Square root of MSE. In same units as target. More interpretable than MSE.

Mean Absolute Error (MAE) Average of absolute differences. Treats all errors equally. In same units as target.

R² Score (Coefficient of Determination) Proportion of variance explained. 1 = perfect predictions, 0 = as good as predicting the mean, negative = worse than predicting the mean.

Explained Variance Similar to R² but doesn't account for systematic bias in predictions.
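The platform reports these metrics automatically, but computing them by hand clarifies how they relate. In this sketch (with invented sample numbers), the predictions carry a small systematic offset, so Explained Variance comes out higher than R², illustrating the difference described above.

```python
import numpy as np
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score,
    explained_variance_score,
)

# Invented example values.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 3.0, 8.0])

mse = mean_squared_error(y_true, y_pred)      # average squared error -> 0.375
rmse = np.sqrt(mse)                           # same units as the target
mae = mean_absolute_error(y_true, y_pred)     # average absolute error -> 0.5
r2 = r2_score(y_true, y_pred)                 # penalizes the systematic offset
ev = explained_variance_score(y_true, y_pred) # ignores the mean offset, so ev > r2
```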

Choosing the Right Model

Quick Start Guide

  1. Start simple: Try Linear Regression first
  2. Add regularization: Ridge or Lasso if overfitting
  3. Go to trees: Random Forest or XGBoost for better accuracy
  4. Fine-tune: Use hyperparameter tuning on your best model
  5. Specialized: Try model-specific features (CatBoost for categories, SVR for high dimensions)
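Steps 1 and 3 of this workflow can be sketched with scikit-learn: fit a Linear Regression baseline, then compare a Random Forest against it under the same cross-validation. The synthetic data below is a stand-in for your own dataset, and neither model is guaranteed to win; the point is the comparison.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Stand-in dataset; replace with your own X and y.
X, y = make_regression(n_samples=300, n_features=8, noise=15.0, random_state=1)

# Step 1: baseline sets the bar.
baseline_r2 = cross_val_score(
    LinearRegression(), X, y, cv=5, scoring="r2"
).mean()

# Step 3: tree ensemble to beat the baseline.
forest_r2 = cross_val_score(
    RandomForestRegressor(random_state=1), X, y, cv=5, scoring="r2"
).mean()
```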

By Dataset Size

  • Small (<1k rows): Linear/Ridge/Lasso, SVR, Decision Tree
  • Medium (1k-100k): Random Forest, XGBoost, LightGBM
  • Large (>100k): LightGBM, XGBoost (with GPU)

By Priority

  • Accuracy: XGBoost, LightGBM, CatBoost
  • Speed: Linear Regression, Ridge, LightGBM
  • Interpretability: Linear Regression, Ridge, Lasso, Decision Tree
  • Robustness: Random Forest, Huber, Extra Trees

By Data Characteristics

  • Linear relationships: Linear, Ridge, Lasso, ElasticNet
  • Non-linear patterns: XGBoost, Random Forest, Neural Network
  • Many irrelevant features: Lasso, ElasticNet
  • Correlated features: Ridge, ElasticNet
  • Categorical features: CatBoost, XGBoost
  • Outliers present: Huber, Random Forest
  • High-dimensional: SVR, Ridge, Lasso

By Problem Type

  • Need coefficients: Linear, Ridge, Lasso (interpretable weights)
  • Feature selection: Lasso, ElasticNet (zeros out features)
  • Curved relationships: Polynomial, tree-based models
  • Unknown relationship: Random Forest (good default for anything)
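The feature-selection behavior of Lasso mentioned above can be seen directly in its coefficients: features with no predictive value get a weight of exactly zero. A small sketch, using a synthetic dataset where only 3 of 10 features are informative (alpha=1.0 is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, only 3 of which actually influence the target.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=1.0, random_state=0
)

model = Lasso(alpha=1.0).fit(X, y)

# L1 regularization drives uninformative coefficients to exactly zero.
n_zeroed = int(np.sum(model.coef_ == 0))
```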

Best Practices

  1. Always start with a baseline - Linear Regression trains fast and sets the bar
  2. Scale your features - Critical for Linear, Ridge, Lasso, SVR, KNN, Neural Networks
  3. Handle outliers - Use Huber Regressor or tree-based models for robustness
  4. Use cross-validation - Enable hyperparameter tuning for reliable results
  5. Monitor for overfitting - Check train vs. validation metrics
  6. Feature engineering matters - Better features > fancier models
  7. Check residual plots - Ensure model assumptions are met (especially for linear models)
  8. Consider ensembles - Average predictions from multiple models often works best
  9. Start simple, iterate - Don't jump to neural networks immediately
  10. Understand your metrics:
    • MSE: Penalizes large errors heavily (good when large errors are costly)
    • MAE: Treats all errors equally (good when all errors equally bad)
    • R²: Proportion of variance explained (0-1, intuitive interpretation)
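Practices 2 and 4 combine naturally: wrapping the scaler and the model in a pipeline ensures the scaler is fit only on each training fold during cross-validation, avoiding data leakage. A minimal sketch with Ridge (any scale-sensitive model from the list above works the same way):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in dataset; replace with your own X and y.
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Scaling happens inside the pipeline, so each CV fold
# standardizes using only its own training data.
model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
scores = cross_val_score(model, X, y, cv=5, scoring="r2")
```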

Next Steps

Ready to train? Head to the Training page and:

  1. Select your dataset
  2. Choose a regression model
  3. Configure the parameters based on this guide
  4. Enable hyperparameter tuning for best results
  5. Compare multiple models to find the winner
  6. Analyze residuals to ensure model quality
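For step 6, residual analysis boils down to inspecting prediction errors on held-out data: for a well-specified model they should center on zero and show no pattern against the predictions. A sketch of computing them (the dataset and model choice are placeholders):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Stand-in dataset; replace with your own X and y.
X, y = make_regression(n_samples=200, n_features=4, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Residuals on held-out data: should hover around zero with no
# visible trend when plotted against model.predict(X_test).
residuals = y_test - model.predict(X_test)
```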
