SARIMAX

Seasonal ARIMA with eXogenous variables for time series influenced by external factors

SARIMAX extends SARIMA with exogenous (external) variables. Use it when you have seasonal patterns and additional features that influence your target variable, such as weather, promotions, holidays, or economic indicators.
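One common way to write the model is as a linear regression on the exogenous variables whose errors follow a SARIMA process. The sketch below uses that form. Whether this tool uses exactly this parameterization is an assumption, and the symbols $x_t$ (vector of exogenous variables at time $t$), $\beta$ (their coefficients), $u_t$ (regression error), $L$ (lag operator), and $\varepsilon_t$ (white noise) are introduced here for illustration only.

```latex
\begin{aligned}
y_t &= \beta^\top x_t + u_t \\
\phi_p(L)\,\Phi_P(L^s)\,(1-L)^d\,(1-L^s)^D\,u_t &= \theta_q(L)\,\Theta_Q(L^s)\,\varepsilon_t
\end{aligned}
```

Here $\phi_p$ and $\theta_q$ are the non-seasonal AR and MA polynomials of orders p and q, $\Phi_P$ and $\Theta_Q$ are their seasonal counterparts of orders P and Q, d and D are the differencing orders, and s is the seasonal period. The first line is why the exogenous effect is linear and why each coefficient in $\beta$ can be read as the change in the target per unit change in that variable.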

When to Use SARIMAX

SARIMAX is best suited for:

  • Time series with seasonal patterns and external influencing factors
  • When you have additional predictors available at prediction time
  • Business forecasting with promotional calendars or marketing spend
  • Weather-dependent forecasting with meteorological features
  • Economic forecasting with leading indicators
  • Scenarios where you need both seasonality and feature engineering
  • When SARIMA alone underfits due to missing explanatory variables

Strengths

  • Combines seasonal patterns with external predictors
  • Leverages both historical patterns and current conditions
  • Interpretable coefficients for exogenous variables
  • Statistical framework with confidence intervals
  • Can improve accuracy by incorporating domain knowledge
  • Flexible: works with continuous or categorical external variables
  • Handles trend, seasonality, and covariates simultaneously

Weaknesses

  • Requires future values of every exogenous variable at forecast time
  • More to specify and estimate than SARIMA: the orders (p,d,q)(P,D,Q,s) plus one coefficient per exogenous variable
  • Assumes linear relationships with exogenous variables
  • Limited to a single seasonal period
  • Computationally more intensive than SARIMA
  • Requires careful feature engineering
  • Can overfit if too many exogenous variables relative to data size

Parameters

Common Time Series Parameters

All time series models share these parameters:

  • Timestamp Column (required): Column containing dates/times
  • Target Column (required): Numeric value to forecast
  • Exogenous Variables (optional): List of external feature columns (e.g., ['temperature', 'promotion', 'holiday'])
    • Critical: These must be available for future forecast periods during inference
  • Frequency (optional): Time spacing of the observations (D = daily, H = hourly, W = weekly, M = monthly). Auto-inferred if not specified
  • Forecast Steps (required, default=1): How many periods to predict

SARIMAX-Specific Parameters

Non-Seasonal Components

AR Order (p)
  • Type: Integer
  • Default: 1
  • Description: Number of autoregressive terms (past values of target)
  • Typical Range: 0-5
Differencing (d)
  • Type: Integer
  • Default: 1
  • Description: Degree of differencing to achieve stationarity
    • 0: No differencing
    • 1: First difference (removes a linear trend)
    • 2: Second difference (removes a quadratic trend; rarely needed)
  • Typical Range: 0-2
MA Order (q)
  • Type: Integer
  • Default: 1
  • Description: Number of moving average terms (past errors)
  • Typical Range: 0-5

Seasonal Components

Seasonal AR (P)
  • Type: Integer
  • Default: 1
  • Description: Seasonal autoregressive order
  • Typical Range: 0-2
Seasonal Diff (D)
  • Type: Integer
  • Default: 1
  • Description: Seasonal differencing order
  • Typical Range: 0-1
Seasonal MA (Q)
  • Type: Integer
  • Default: 1
  • Description: Seasonal moving average order
  • Typical Range: 0-2
Seasonal Period (s)
  • Type: Integer
  • Default: 12
  • Description: Number of observations in one seasonal cycle
  • Common Values:
    • 7 for a weekly cycle in daily data
    • 12 for a yearly cycle in monthly data
    • 24 for a daily cycle in hourly data
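To make the differencing orders concrete, here is a minimal sketch in plain Python of what d=1 and D=1 do to a series. The helper `difference` and the toy series are hypothetical and only illustrate the arithmetic; the tool applies differencing internally.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical daily series: a trend of +1 per day plus a repeating weekly pattern.
weekly_pattern = [0, 0, 0, 0, 5, 10, 8]
series = [t + weekly_pattern[t % 7] for t in range(28)]

first_diff = difference(series, lag=1)     # d=1: removes the trend, keeps the weekly pattern
seasonal_diff = difference(series, lag=7)  # D=1 with s=7: removes the weekly pattern
both = difference(seasonal_diff, lag=1)    # d=1 and D=1 together

print(seasonal_diff[:3])  # constant values: only the weekly trend increment remains
print(both[:3])           # zeros: trend and seasonality are both removed
```

Each difference shortens the series (by 1 for d, by s for D), which is one reason higher differencing orders need more data.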

Configuration Tips

Selecting Exogenous Variables

Good Exogenous Variables:

  • Known in advance or predictable (temperature forecasts, planned promotions)
  • Causally related to target (not just correlated)
  • Consistent relationship over time
  • Available at forecast time

Poor Exogenous Variables:

  • Not available at forecast time (e.g., future stock prices)
  • Derived from or determined by the target (e.g., do not predict sales from revenue)
  • Highly correlated with each other (multicollinearity)
  • Too many relative to sample size (risk of overfitting)
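A quick screen for multicollinearity is to compute pairwise correlations between candidate features before training. The following is a minimal sketch in plain Python; the function names, the sample data, and the 0.9 threshold are all assumptions chosen for illustration, not settings of this tool.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def correlated_pairs(features, threshold=0.9):
    """Return (name_a, name_b, r) for every feature pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(features.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Hypothetical retail features: spend rises almost in lockstep with the discount.
features = {
    "promotion_discount": [0, 10, 20, 0, 15, 30, 0, 5],
    "marketing_spend": [100, 210, 290, 120, 260, 400, 90, 150],
    "competitor_price": [9.9, 9.5, 10.2, 9.8, 10.1, 9.7, 10.0, 9.6],
}
flagged = correlated_pairs(features)
print(flagged)
```

When a pair is flagged, keeping only one of the two features is usually the simplest fix.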

Example Exogenous Variables by Domain

Retail Sales:

  • is_holiday (binary)
  • promotion_discount (numeric)
  • competitor_price (numeric)
  • marketing_spend (numeric)

Energy Consumption:

  • temperature (numeric)
  • is_weekend (binary)
  • is_holiday (binary)
  • day_of_week (categorical → one-hot encoded)

Website Traffic:

  • email_campaign_sent (binary)
  • ad_impressions (numeric)
  • content_posts (count)

Starting Configuration

For seasonal data with exogenous variables, start with:

Non-seasonal: (p=1, d=1, q=1)
Seasonal: (P=1, D=1, Q=1, s=[your period])
Exogenous: 1-5 carefully selected features

Feature Engineering for Exogenous Variables

  1. Binary Indicators: is_holiday, is_weekend, is_sale_period
  2. Lagged Exogenous: If the effect is delayed, add lagged copies of the feature
  3. Interactions: Product of two features (e.g., temperature × is_summer)
  4. Categorical Encoding: One-hot encoding for unordered categories; ordinal encoding only when the categories have a natural order
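The transformations in the list above can be sketched in a few lines of plain Python. The helpers `lag` and `one_hot` and the column names are hypothetical. Dropping the first category in the one-hot encoding is a design choice made here so the indicator columns are not perfectly collinear.

```python
def lag(values, k, fill=None):
    """Shift values k steps later, so row t holds the value from t - k."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return [fill] * k + values[:-k]

def one_hot(values, categories):
    """One 0/1 column per category, except the first, which serves as the baseline."""
    return {f"is_{c}": [int(v == c) for v in values] for c in categories[1:]}

# Hypothetical raw columns, one entry per day.
rows = {
    "temperature": [18.0, 21.5, 30.0, 28.5, 19.0],
    "is_summer": [0, 0, 1, 1, 0],
    "ad_spend": [100, 0, 250, 0, 50],
    "weather": ["sunny", "rain", "sunny", "cloudy", "rain"],
}

features = {
    # Lagged exogenous: yesterday's ad spend affects today's target.
    "ad_spend_lag1": lag(rows["ad_spend"], 1),
    # Interaction: temperature only counts during summer.
    "temp_x_summer": [t * s for t, s in zip(rows["temperature"], rows["is_summer"])],
    # Categorical encoding: "cloudy" is the dropped baseline.
    **one_hot(rows["weather"], ["cloudy", "rain", "sunny"]),
}
```

The first k rows of a lagged column have no value (`None` here) and need to be dropped or filled before training.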

Handling Missing Future Values

Problem: The model needs a value for every exogenous variable at every forecast step, but many future values are not yet known.

Solutions:

  • Use Predictable Features: Calendar features (day_of_week, month, is_holiday)
  • Use Planned Values: Scheduled promotions, planned marketing spend
  • Forecast Exogenous First: Build separate models for weather, prices, etc., then use their forecasts as future values (their errors carry over into the final forecast)
  • Use Scenarios: Create multiple forecasts with different exogenous assumptions (best/worst case)
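As a sketch of the first and last options, calendar features can be generated for any future date, and scenario forecasts reuse the same calendar rows with different assumed values. The function name, the dates, the holiday set, and the scenario temperatures below are all hypothetical.

```python
from datetime import date, timedelta

def future_calendar_features(last_observed, steps, holidays=frozenset()):
    """Build [is_weekend, is_holiday] rows for the `steps` days after last_observed."""
    rows = []
    for i in range(1, steps + 1):
        day = last_observed + timedelta(days=i)
        rows.append([int(day.weekday() >= 5), int(day in holidays)])
    return rows

last_observed = date(2024, 12, 22)   # hypothetical last training day (a Sunday)
holidays = {date(2024, 12, 25)}      # hypothetical holiday calendar
calendar = future_calendar_features(last_observed, steps=7, holidays=holidays)

# Scenario forecasting: same calendar rows, different assumed daily temperatures.
scenarios = {"mild": 8.0, "cold": -2.0}
future_exogenous = {
    name: [[temp] + row for row in calendar] for name, temp in scenarios.items()
}
```

Each entry of `future_exogenous` is then passed to the model separately, which gives one forecast per scenario.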

Model Complexity Management

  • Start with 1-2 most important exogenous variables
  • Add more only if they significantly improve cross-validation performance
  • Use regularization or feature selection if you have many candidates
  • Monitor for overfitting (training error << validation error)

Common Issues and Solutions

Issue: Exogenous Variables Not Available at Forecast Time

Solution:

  • Only use features you can know in advance (holidays, planned events)
  • Build separate forecast models for uncertain exogenous variables
  • Use scenario-based forecasting (multiple forecasts with different assumptions)
  • Consider switching to Prophet or SARIMA if you lack reliable future features

Issue: Model Ignores Exogenous Variables

Solution:

  • Check that exogenous variables have sufficient variation
  • Ensure they're not constant or near-constant
  • Verify they're properly scaled (large range differences can cause issues)
  • Confirm they're actually related to the target (check correlations)
  • Try increasing the number of observations

Issue: Worse Performance Than SARIMA

Solution:

  • Your exogenous variables may be adding noise; compare against plain SARIMA without them
  • Check for multicollinearity (remove highly correlated features)
  • Ensure exogenous variables are properly preprocessed
  • Reduce number of exogenous variables

Issue: Training Fails or Doesn't Converge

Solution:

  • Scale exogenous variables to similar ranges
  • Impute or remove features with missing values
  • Check for perfect multicollinearity (identical features)
  • Simplify model orders (p,d,q)(P,D,Q)s
  • Ensure sufficient data (need more data with more exogenous variables)
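Scaling is one of the easier fixes to apply. A minimal sketch of standardization in plain Python follows, assuming the scaling parameters are learned from training data only and then reused for future values. The function names and the numbers are hypothetical.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and standard deviation from the training data only."""
    std = pstdev(column)
    if std == 0:
        raise ValueError("constant column: drop it instead of scaling it")
    return {"mean": mean(column), "std": std}

def apply_scaler(column, params):
    """Standardize a column using previously learned parameters."""
    return [(v - params["mean"]) / params["std"] for v in column]

train_spend = [1000.0, 3000.0, 2000.0, 4000.0, 5000.0]  # hypothetical marketing spend
params = fit_scaler(train_spend)
scaled_train = apply_scaler(train_spend, params)

# Future values must be scaled with the training parameters, not refit.
scaled_future = apply_scaler([6000.0], params)
```

The `params` dictionary is the kind of information that has to be stored alongside the model so that future exogenous values are transformed the same way at inference time.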

Issue: Need Multiple Seasonalities

Solution:

  • SARIMAX handles only one seasonal period
  • Use Prophet with additional regressors (handles multiple seasonalities)
  • Apply seasonal decomposition, then use SARIMAX on deseasonalized data
  • Consider TBATS if multiple seasonalities are critical

Issue: Coefficients Have Wrong Signs

Solution:

  • Check for multicollinearity between exogenous variables
  • Verify data quality (no data entry errors)
  • Consider interaction effects or non-linear relationships
  • Remove redundant features

Issue: Poor Out-of-Sample Performance

Solution:

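  • Suspect overfitting to the exogenous variables
  • Use time series cross-validation to tune the orders and select features
  • Reduce the number of exogenous variables
  • Ensure the future exogenous values used in evaluation are realistic (forecasts or plans, not actuals known only in hindsight)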
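Time series cross-validation differs from ordinary k-fold in that every training window must end before its test window begins. A minimal sketch of expanding-window splits in plain Python is shown below; the function name and the fold sizes are assumptions for illustration.

```python
def expanding_window_splits(n_obs, n_folds, horizon, min_train):
    """Yield (train_indices, test_indices) with training always before testing."""
    for fold in range(n_folds):
        train_end = min_train + fold * horizon
        test_end = train_end + horizon
        if test_end > n_obs:
            break  # not enough data left for a full test window
        yield range(0, train_end), range(train_end, test_end)

# Hypothetical setup: 60 daily observations, three folds of 7 days each.
splits = list(expanding_window_splits(n_obs=60, n_folds=3, horizon=7, min_train=39))
for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")
```

For an honest evaluation, the exogenous values in each test window should be the ones that would have been available at the end of the corresponding training window.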

Example Use Cases

Retail Sales with Promotions

Target: daily_sales
Exogenous: [is_promotion, discount_percent, is_holiday]
SARIMAX(1,1,1)(1,1,1)7  # weekly seasonality

Captures weekly patterns and the impact of promotions.

Electricity Demand

Target: hourly_demand
Exogenous: [temperature, is_weekend, is_holiday]
SARIMAX(2,0,1)(1,1,0)24  # daily seasonality

Models daily cycles with temperature and calendar effects.

App Downloads with Marketing

Target: daily_downloads
Exogenous: [ad_spend, email_sent, app_store_feature]
SARIMAX(1,1,1)(1,0,1)7  # weekly seasonality

Separates organic weekly patterns from marketing-driven spikes.

Restaurant Revenue

Target: daily_revenue
Exogenous: [temperature, is_raining, local_events]
SARIMAX(1,1,0)(1,1,1)7  # weekly seasonality

Accounts for weather and special events beyond regular weekly patterns.

HVAC Energy Usage

Target: daily_energy
Exogenous: [avg_temperature, humidity, occupancy]
SARIMAX(1,0,1)(1,1,1)7  # weekly seasonality

Models energy use as a function of environmental conditions and weekly patterns.
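The shorthand used in these use cases reads as SARIMAX(p,d,q)(P,D,Q)s. As a small illustration of how it maps to the parameters described earlier, the sketch below splits such a string into its non-seasonal and seasonal parts. The helper `parse_spec` is hypothetical and not part of this tool.

```python
import re

def parse_spec(spec):
    """Split 'SARIMAX(p,d,q)(P,D,Q)s' into (p, d, q) and (P, D, Q, s)."""
    match = re.fullmatch(
        r"SARIMAX\((\d+),(\d+),(\d+)\)\((\d+),(\d+),(\d+)\)(\d+)",
        spec.replace(" ", ""),
    )
    if match is None:
        raise ValueError(f"unrecognized specification: {spec!r}")
    p, d, q, P, D, Q, s = map(int, match.groups())
    return (p, d, q), (P, D, Q, s)

# The electricity demand example: AR order 2, no differencing, MA order 1,
# seasonal AR order 1, one seasonal difference, no seasonal MA, period 24.
order, seasonal_order = parse_spec("SARIMAX(2,0,1)(1,1,0)24")
```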

Inference Requirements

When using a trained SARIMAX model for forecasting, you must provide:

  1. Trained Model: The fitted SARIMAX model
  2. Preprocessing Config: How exogenous variables were scaled/encoded
  3. Training Tail: Recent historical values for lag computation
  4. Future Exogenous Values: Values of all exogenous variables for each forecast step

Example Inference Input: If forecasting 7 days ahead with exogenous variables [temperature, is_holiday]:

forecast_steps = 7
future_exogenous = [  # one row per forecast step: [temperature, is_holiday]
  [25.0, 0],  # day 1
  [26.5, 0],  # day 2
  [24.0, 0],  # day 3
  [23.5, 0],  # day 4
  [25.0, 1],  # day 5 (holiday)
  [27.0, 0],  # day 6
  [28.0, 0],  # day 7
]
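Because a mismatch between forecast steps and future exogenous rows is an easy mistake to make, it can help to check the input before calling the model. The following is a minimal sketch of such a check in plain Python; the function `validate_future_exogenous` is hypothetical and not part of this tool.

```python
def validate_future_exogenous(future_exogenous, forecast_steps, exog_names):
    """Raise ValueError unless there is one complete row per forecast step."""
    if len(future_exogenous) != forecast_steps:
        raise ValueError(
            f"expected {forecast_steps} rows, got {len(future_exogenous)}"
        )
    for step, row in enumerate(future_exogenous, start=1):
        if len(row) != len(exog_names):
            raise ValueError(f"step {step}: expected {len(exog_names)} values")
        if any(value is None for value in row):
            raise ValueError(f"step {step}: missing value")

exog_names = ["temperature", "is_holiday"]  # same order as during training
future_exogenous = [
    [25.0, 0], [26.5, 0], [24.0, 0], [23.5, 0],
    [25.0, 1], [27.0, 0], [28.0, 0],
]
validate_future_exogenous(future_exogenous, forecast_steps=7, exog_names=exog_names)
```

The column order matters as much as the shape: values must appear in the same order as the exogenous variables used in training.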
