Logistic Regression
Train Logistic Regression to predict categorical outcomes
Fast and interpretable linear model that predicts probabilities. Despite its name, it's used for classification, not regression.
When to use:
- First model to try - serves as a strong baseline
- When you need interpretable results
- Linear relationships between features and outcome
- Limited training data
Strengths: Fast training, interpretable, works well with many features, probabilistic outputs
Weaknesses: Assumes linear relationships, struggles with complex patterns
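A minimal baseline sketch using scikit-learn's LogisticRegression on synthetic data (the dataset and split here are illustrative, not part of this tool):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary classification data for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Defaults match the parameters below: penalty="l2", C=1.0, solver="lbfgs"
clf = LogisticRegression()
clf.fit(X_train, y_train)

proba = clf.predict_proba(X_test)   # per-class probabilities, one row per sample
accuracy = clf.score(X_test, y_test)
```

Because the model is linear, `clf.coef_` gives one weight per feature, which is what makes the results easy to interpret.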
Model Parameters
Penalty Regularization type to prevent overfitting:
- l1: LASSO - drives some coefficients to zero (feature selection)
- l2: Ridge - shrinks all coefficients (default, most common)
- elasticnet: Combination of L1 and L2
- none: No regularization (may overfit)
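A sketch of the practical difference between l1 and l2: on data with few informative features, l1 drives many coefficients exactly to zero while l2 only shrinks them (synthetic data and the chosen C are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# 20 features but only 5 informative -- l1 should zero out many of the rest
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)

# liblinear supports the l1 penalty; lbfgs (the default solver) does not
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
l2 = LogisticRegression(penalty="l2", C=0.1).fit(X, y)

n_zero_l1 = int(np.sum(l1.coef_ == 0))  # features dropped by l1
n_zero_l2 = int(np.sum(l2.coef_ == 0))  # l2 shrinks but rarely zeroes
```

The zeroed coefficients under l1 are effectively a built-in feature selection step.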
C (default: 1.0) Inverse of regularization strength. Smaller values = stronger regularization.
- Low values (0.01-0.1): Strong regularization, simpler model
- Default (1.0): Balanced
- High values (10-100): Weak regularization, more complex model
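One way to see the effect of C: with l2 regularization, the overall size of the learned coefficients grows as C increases (weaker regularization). A sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Coefficient norm at each regularization strength (illustrative C values)
norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, max_iter=1000).fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
```

Small C keeps the model simple and less prone to overfitting; large C lets coefficients grow to fit the training data more closely. In practice C is usually tuned by cross-validation (e.g. LogisticRegressionCV).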
Solver Algorithm for optimization:
- lbfgs: Good default for most cases
- liblinear: Good for small datasets; supports l1 and l2 only
- saga: Fast for large datasets, supports all penalties
- newton-cg: Faster on some problems
- sag: Similar to saga, but supports only l2 (or no) regularization
- newton-cholesky: Newer solver, very fast when samples greatly outnumber features
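Solver and penalty must be compatible: for example, elasticnet is only available with saga. A sketch combining them (scaling first, since saga converges much faster on standardized features):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X = StandardScaler().fit_transform(X)  # saga benefits from scaled features

# elasticnet requires solver="saga"; l1_ratio mixes l1 and l2 (see below)
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, max_iter=5000)
clf.fit(X, y)
```

If an unsupported solver/penalty pair is requested, scikit-learn raises an error rather than silently ignoring the penalty.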
Max Iterations (default: 100) Maximum training iterations. Increase if model doesn't converge.
Tolerance (default: 0.0001) Stopping criterion - lower values train longer for marginal improvements.
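When the iteration budget is too small, scikit-learn emits a ConvergenceWarning rather than failing. A sketch showing that raising max_iter resolves it (the specific iteration counts are illustrative):

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=50, random_state=42)

# Deliberately tiny budget: expect a ConvergenceWarning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(max_iter=5).fit(X, y)
few_iter_warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)

# Generous budget: the solver should converge cleanly
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    LogisticRegression(max_iter=5000).fit(X, y)
many_iter_warned = any(issubclass(w.category, ConvergenceWarning) for w in caught)
```

Scaling the features (e.g. with StandardScaler) is often a better first fix than raising max_iter, since unscaled data is a common cause of slow convergence.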
Class Weight
- None: Treat all classes equally
- Balanced: Weight classes inversely proportional to their frequencies, compensating for imbalance
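A sketch of the balanced option on a 9:1 imbalanced dataset (the imbalance ratio is illustrative): reweighting typically raises recall on the minority class, at some cost in overall accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# ~90% of samples in class 0, ~10% in class 1
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

plain = LogisticRegression().fit(X, y)
balanced = LogisticRegression(class_weight="balanced").fit(X, y)

# Recall on the minority class (label 1)
minority_recall_plain = recall_score(y, plain.predict(X))
minority_recall_balanced = recall_score(y, balanced.predict(X))
```

With class_weight=None, the model can score high accuracy simply by favoring the majority class; the balanced setting trades some of that accuracy for better minority-class detection.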
Fit Intercept (default: true) Whether to calculate intercept term. Keep true unless data is centered.
L1 Ratio (for elasticnet only) Mix of L1 and L2 (0 = pure L2, 1 = pure L1).
Random State (default: 42) Seed for reproducibility.