Classification
Train classification models to predict categorical outcomes
Classification models predict which category or class a data point belongs to. Use classification when you want to sort data into discrete groups, such as spam vs. not spam, disease vs. healthy, or customer segments.
🎓 Learn About Classification
New to classification? Visit our Classification Concepts Guide to learn about evaluation metrics, common approaches, and when to use classification for your machine learning tasks.
Available Models
We support 14 different classification algorithms, each with its own strengths:
Linear Models
- Logistic Regression - Fast, interpretable baseline for binary and multiclass problems
- Ordinal Logistic Regression - For ordered categories (low, medium, high)
Tree-Based Models
- Decision Tree - Simple, interpretable rules-based model
- Random Forest - Ensemble of decision trees, robust and accurate
- Extra Trees - Like Random Forest but faster to train
Gradient Boosting Models
- XGBoost - Industry standard, excellent performance
- LightGBM - Fast and memory efficient
- CatBoost - Handles categorical features automatically
- Gradient Boosting - Classic boosting algorithm
- AdaBoost - Adaptive boosting for weak learners
Other Models
- Support Vector Machine (SVM) - Effective for high-dimensional spaces
- K-Nearest Neighbors - Instance-based learning
- Naive Bayes - Fast probabilistic classifier
- Multi-layer Perceptron - Neural network for complex patterns
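For orientation, here is a minimal sketch of what a baseline from this list looks like in code, assuming scikit-learn and synthetic data (the platform trains these models for you, so this is purely illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (illustrative only)
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Logistic Regression: a fast, interpretable baseline
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")
```

Swapping in any other model from the list above (e.g. `RandomForestClassifier`) follows the same fit/score pattern.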
Common Configuration
All models share these common settings:
Feature Configuration
Feature Columns (required): Select which columns from your dataset to use as input features for training. These are the variables the model will learn from to make predictions.
Target Column (required): The column containing the categories you want to predict. This should be a categorical column with discrete class labels.
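In code terms, this split maps to selecting feature and target columns from a table. A hedged pandas sketch, with hypothetical column names:

```python
import pandas as pd

# Hypothetical dataset; column names are illustrative
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [40_000, 60_000, 80_000, 52_000],
    "segment": ["basic", "premium", "premium", "basic"],
})

feature_columns = ["age", "income"]  # input features the model learns from
target_column = "segment"            # categorical labels to predict

X = df[feature_columns]
y = df[target_column]
```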
Hyperparameter Tuning
Enable Hyperparameter Tuning: Automatically search for the best model parameters. This improves accuracy but takes longer to train.
- Disabled: Use default parameters (faster)
- Enabled: Search for optimal parameters (better accuracy)
Tuning Method (when tuning is enabled)
- Grid Search: Try all combinations systematically (slow but thorough)
- Random Search: Try random combinations (faster, good results)
- Bayesian Search: Intelligently search the parameter space (most efficient)
CV Folds (when tuning is enabled): Number of cross-validation folds (default: 5). Higher values give more reliable estimates but take longer.
N Iterations (for Random/Bayesian search): How many parameter combinations to try (default: 10). More iterations may find better parameters but take longer.
Scoring Metric (when tuning is enabled): How to evaluate model performance:
- Accuracy: Percentage of correct predictions
- Precision (weighted): Fraction of predicted positives that are actually positive
- Recall (weighted): Fraction of actual positives that are correctly identified
- F1 Score (weighted): Harmonic mean of precision and recall
- ROC AUC (OVR): Area under the ROC curve, computed one-vs-rest (good for imbalanced data)
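A tuning configuration like the one above corresponds to a cross-validated search. As a rough sketch of what Random Search with 5 CV folds, 10 iterations, and a weighted F1 scoring metric looks like in scikit-learn (the parameter grid and data are illustrative assumptions):

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Random Search: sample n_iter parameter combinations, score each with 5-fold CV
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 300),
        "max_depth": [None, 5, 10],
    },
    n_iter=10,              # N Iterations
    cv=5,                   # CV Folds
    scoring="f1_weighted",  # Scoring Metric
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Grid Search (`GridSearchCV`) tries every combination instead of sampling, which is why it is slower but exhaustive.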
Choosing the Right Model
Quick Start Guide
- Start simple: Try Logistic Regression first
- Move to ensembles: Random Forest or XGBoost for better accuracy
- Fine-tune: Use hyperparameter tuning on your best model
- Specialized: Try model-specific features (CatBoost for categories, SVM for high dimensions)
By Dataset Size
- Small (<1k rows): Logistic Regression, SVM, Decision Tree
- Medium (1k-100k): Random Forest, XGBoost, LightGBM
- Large (>100k): LightGBM, XGBoost (with GPU)
By Priority
- Accuracy: XGBoost, LightGBM, CatBoost
- Speed: Naive Bayes, Logistic Regression, LightGBM
- Interpretability: Logistic Regression, Decision Tree
- Robustness: Random Forest, Extra Trees
By Data Type
- Categorical features: CatBoost, XGBoost
- Text data: Naive Bayes, Logistic Regression
- High-dimensional: SVM, Logistic Regression
- Non-linear patterns: XGBoost, Random Forest, Multi-layer Perceptron
Best Practices
- Always start with a baseline - Logistic Regression trains fast and sets the bar
- Scale your features - Critical for SVM, KNN, and Multi-layer Perceptron (not needed for tree-based models)
- Handle imbalanced data - Use class_weight='balanced' or sampling techniques
- Use cross-validation - Enable hyperparameter tuning for reliable results
- Monitor for overfitting - Check train vs. validation metrics
- Feature engineering matters - Better features > fancier models
- Start simple, iterate - Don't jump to neural networks immediately
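Several of these practices combine naturally: scaling inside a pipeline, reweighting imbalanced classes, and evaluating with cross-validation. A minimal sketch, assuming scikit-learn and synthetic imbalanced data:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data (roughly 90/10 class split)
X, y = make_classification(n_samples=400, weights=[0.9, 0.1], random_state=1)

# Scaling inside a pipeline keeps test-fold statistics out of training,
# and class_weight='balanced' upweights the minority class
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
scores = cross_val_score(model, X, y, cv=5, scoring="f1_weighted")
print(f"CV F1 (weighted): {scores.mean():.2f} +/- {scores.std():.2f}")
```

Fitting the scaler per fold (rather than on the whole dataset up front) is what prevents the leakage that inflates validation metrics.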
Next Steps
Ready to train? Head to the Training page and:
- Select your dataset
- Choose a classification model
- Configure the parameters based on this guide
- Enable hyperparameter tuning for best results
- Compare multiple models to find the winner
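If you want to prototype the comparison step locally before running it on the platform, a hedged sketch (models and data are illustrative, not an exhaustive comparison):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=7)

# Candidate models, scored with 5-fold cross-validated accuracy
candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=7),
    "Random Forest": RandomForestClassifier(random_state=7),
}
results = {name: cross_val_score(model, X, y, cv=5).mean()
           for name, model in candidates.items()}

winner = max(results, key=results.get)
for name, score in sorted(results.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```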