CatBoost
Gradient boosting with native categorical feature handling
CatBoost handles categorical features natively using ordered target statistics, so manual encoding (one-hot, label, or target encoding) is not required. Its accuracy is competitive with XGBoost and LightGBM, and it typically needs less preprocessing.
When to use:
- Datasets with many categorical features
- When you want to skip one-hot encoding or target encoding
- Competitive accuracy with minimal data preparation
Input: Tabular data with the feature columns defined during training
Output: Predicted class label and class probabilities
Model Settings (set during training, used at inference)
Iterations (default: 1000) Number of boosting rounds.
Learning Rate (default: auto) Step size for gradient updates. Auto-tuned by default; lower values paired with more iterations often generalize better.
Depth (default: 6) Depth of symmetric trees. Typical range is 4–10.
L2 Leaf Reg (default: 3.0) L2 regularization on leaf values.
Border Count (default: 254) Number of borders (bins) used to quantize numerical features. Higher values capture finer split boundaries but increase training time.
Bagging Temperature (default: 1.0) Controls the randomness of Bayesian bootstrap sample weights. Higher values add more noise to training.
Class Weights (default: null) Per-class weight multipliers for imbalanced datasets.
Inference Settings
No dedicated inference-time settings. CatBoost applies the trained symmetric tree structure at prediction time.