
Classification

Train classification models to predict categorical outcomes

Classification models predict which category or class a data point belongs to. Use classification when you want to sort data into discrete groups, such as spam vs. not spam, disease vs. healthy, or customer segments.

🎓 Learn About Classification

New to classification? Visit our Classification Concepts Guide to learn about evaluation metrics, common approaches, and when to use classification for your machine learning tasks.

Available Models

We support 14 different classification algorithms, each with its own strengths:

Linear Models

Tree-Based Models

Gradient Boosting Models

Other Models

Common Configuration

All models share these common settings:

Feature Configuration

Feature Columns (required) Select which columns from your dataset to use as input features for training. These are the variables the model will learn from to make predictions.

Target Column (required) The column containing the categories you want to predict. This should be a categorical column with discrete class labels.
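As a rough sketch of what this configuration corresponds to in code (assuming a pandas-style workflow; the column names and data here are purely illustrative):

```python
import pandas as pd

# Hypothetical dataset: two numeric feature columns and one
# categorical target column with discrete class labels.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40_000, 55_000, 80_000, 62_000],
    "churned": ["no", "yes", "no", "yes"],
})

feature_columns = ["age", "income"]  # input features the model learns from
target_column = "churned"            # categories to predict

X = df[feature_columns]
y = df[target_column]
```

The key constraint is that the target column holds discrete labels, not continuous values — for continuous targets you would use regression instead.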

Hyperparameter Tuning

Enable Hyperparameter Tuning Automatically search for the best model parameters. This improves accuracy but takes longer to train.

  • Disabled: Use default parameters (faster)
  • Enabled: Search for optimal parameters (better accuracy)

Tuning Method (when tuning is enabled)

  • Grid Search: Try all combinations systematically (slow but thorough)
  • Random Search: Try random combinations (faster, good results)
  • Bayesian Search: Intelligently search the parameter space (most efficient)

CV Folds (when tuning is enabled) Number of cross-validation folds (default: 5). Higher values give more reliable results but take longer.

N Iterations (for Random/Bayesian search) How many parameter combinations to try (default: 10). More iterations may find better parameters but take longer.

Scoring Metric (when tuning is enabled) How to evaluate model performance:

  • Accuracy: Percentage of correct predictions
  • Precision (weighted): Fraction of predicted positives that are actually positive
  • Recall (weighted): Fraction of actual positives that are correctly identified
  • F1 Score (weighted): Harmonic mean of precision and recall
  • ROC AUC (OVR): Area under the ROC curve, computed one-vs-rest for multiclass (good for imbalanced data)
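The tuning settings above map onto a standard cross-validated search. A minimal sketch using scikit-learn's `RandomizedSearchCV` (the parameter grid and model choice here are illustrative, not the app's actual defaults):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy dataset standing in for your own training data
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space for a Random Forest
param_distributions = {
    "n_estimators": [25, 50, 100, 200],
    "max_depth": [3, 5, 10, None],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=10,             # N Iterations: parameter combinations to try
    cv=5,                  # CV Folds (default: 5)
    scoring="f1_weighted", # Scoring Metric
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Swapping `RandomizedSearchCV` for `GridSearchCV` gives the exhaustive Grid Search behavior; Bayesian search requires a separate library such as scikit-optimize.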

Choosing the Right Model

Quick Start Guide

  1. Start simple: Try Logistic Regression first
  2. Move to trees: Random Forest or XGBoost for better accuracy
  3. Fine-tune: Use hyperparameter tuning on your best model
  4. Specialized: Try model-specific features (CatBoost for categories, SVM for high dimensions)
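Steps 1 and 2 amount to comparing a simple baseline against a tree ensemble under the same cross-validation. A quick sketch (using a built-in scikit-learn dataset as a stand-in for yours):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

results = {}
for model in (
    LogisticRegression(max_iter=5000),       # step 1: simple baseline
    RandomForestClassifier(random_state=0),  # step 2: tree ensemble
):
    scores = cross_val_score(model, X, y, cv=5)
    results[type(model).__name__] = scores.mean()
    print(type(model).__name__, round(scores.mean(), 3))
```

Only once this comparison shows a clear winner is it worth spending time on hyperparameter tuning (step 3).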

By Dataset Size

  • Small (<1k rows): Logistic Regression, SVM, Decision Tree
  • Medium (1k-100k): Random Forest, XGBoost, LightGBM
  • Large (>100k): LightGBM, XGBoost (with GPU)

By Priority

  • Accuracy: XGBoost, LightGBM, CatBoost
  • Speed: Naive Bayes, Logistic Regression, LightGBM
  • Interpretability: Logistic Regression, Decision Tree
  • Robustness: Random Forest, Extra Trees

By Data Type

  • Categorical features: CatBoost, XGBoost
  • Text data: Naive Bayes, Logistic Regression
  • High-dimensional: SVM, Logistic Regression
  • Non-linear patterns: XGBoost, Random Forest, Neural Network

Best Practices

  1. Always start with a baseline - Logistic Regression trains fast and sets the bar
  2. Scale your features - Critical for SVM, KNN, and Neural Networks (not needed for tree-based models)
  3. Handle imbalanced data - Use class_weight='balanced' or sampling techniques
  4. Use cross-validation - Enable hyperparameter tuning for reliable results
  5. Monitor for overfitting - Check train vs. validation metrics
  6. Feature engineering matters - Better features > fancier models
  7. Start simple, iterate - Don't jump to neural networks immediately
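Practices 2 and 3 combine naturally in a single pipeline. A sketch with scikit-learn (the 90/10 class split is a made-up example of imbalanced data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced toy data: roughly 90% class 0, 10% class 1
X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

clf = make_pipeline(
    StandardScaler(),                             # practice 2: scale features
    LogisticRegression(class_weight="balanced"),  # practice 3: reweight classes
)

# Practice 4: cross-validate instead of a single train/test split
scores = cross_val_score(clf, X, y, cv=5, scoring="f1_weighted")
print(round(scores.mean(), 3))
```

Putting the scaler inside the pipeline also guards against data leakage: it is refit on each training fold rather than on the full dataset.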

Next Steps

Ready to train? Head to the Training page and:

  1. Select your dataset
  2. Choose a classification model
  3. Configure the parameters based on this guide
  4. Enable hyperparameter tuning for best results
  5. Compare multiple models to find the winner
