Vector Autoregression (VAR)

Multivariate time series model for forecasting multiple interdependent variables simultaneously

Vector Autoregression (VAR) models multiple time series jointly using their past values. It captures interdependencies between variables by modeling each variable as a function of its own lagged values and the lagged values of all other variables in the system.
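In equation form, the description above corresponds to the standard VAR(p) specification. The symbols are notation chosen for this sketch, not names used by the product: y_t is the vector of the K target variables at time t, c is a vector of intercepts, A_i is the K × K coefficient matrix for lag i, p is the lag order, and ε_t is a noise vector.

```latex
\mathbf{y}_t = \mathbf{c} + \sum_{i=1}^{p} A_i \, \mathbf{y}_{t-i} + \boldsymbol{\varepsilon}_t,
\qquad \mathbf{y}_t \in \mathbb{R}^{K},\quad A_i \in \mathbb{R}^{K \times K}
```

Row k of each A_i holds the weights that variable k places on lag i of every variable in the system, including itself.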

When to Use Vector Autoregression

VAR is best suited for:

  • Multivariate time series forecasting (multiple target variables)
  • When variables influence each other (e.g., price and volume, temperature and electricity)
  • Economic and financial systems with interdependencies
  • Forecasting multiple metrics that move together
  • Analyzing impulse responses and Granger-causal (predictive) relationships
  • When you need to forecast multiple correlated series simultaneously

Strengths

  • Captures interdependencies between multiple time series
  • No need to specify which variables are "dependent" vs "independent"
  • Provides forecasts for all variables simultaneously
  • Each variable's forecast uses information from other variables
  • Well-established econometric framework
  • Can analyze Granger causality and impulse responses
  • Symmetric treatment of all variables

Weaknesses

  • Requires multiple target variables (not suitable for univariate forecasting)
  • Parameter count grows quickly with the number of variables and lags (variables × lags × variables)
  • Needs substantial historical data for stable estimates
  • All variables must be stationary, or become stationary after the same order of differencing
  • Does not handle seasonal patterns directly
  • Cannot incorporate exogenous predictors easily
  • Interpretability decreases with many variables
  • Computationally intensive for high-dimensional systems

Parameters

Common Time Series Parameters

All time series models share these parameters:

  • Timestamp Column (required): Column containing dates/times for time series indexing
  • Target Columns (required): List of target variables for multivariate forecasting
    • Important: VAR requires multiple target columns (minimum 2)
    • Example: ['sales', 'inventory', 'price']
  • Frequency (optional): Time spacing, e.g., D (daily), H (hourly), W (weekly), M (monthly). Inferred automatically if not specified
  • Forecast Steps (required, default=1): How many periods to predict ahead
  • Aggregation (default='mean'): Method for duplicate timestamps (mean, sum, first, last, min, max)
  • Fill Method (default='interpolate'): How to handle missing timestamps (interpolate, ffill, bfill)

VAR-Specific Parameters

Differencing Order

  • Type: Integer
  • Default: 0
  • Min: 0
  • Max: 2
  • Description: Number of times each target series is differenced to achieve stationarity
    • 0: Use raw values (series is already stationary)
    • 1: First differences (removes linear trends)
    • 2: Second differences (removes quadratic trends)
  • Guidance: Many economic time series need difference_order=1. Check stationarity with a unit-root test such as the augmented Dickey-Fuller (ADF) test
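To make the effect of the differencing order concrete, here is a minimal pure-Python sketch. The helper name `difference` is made up for illustration and is not part of the product.

```python
def difference(series, order=1):
    """Apply first differencing `order` times to a list of numbers."""
    values = list(series)
    for _ in range(order):
        # Each pass replaces the series with its consecutive changes
        # and shortens it by one observation.
        values = [b - a for a, b in zip(values, values[1:])]
    return values


quadratic = [t * t for t in range(1, 7)]  # 1, 4, 9, 16, 25, 36
first = difference(quadratic, order=1)    # changes still trend upward
second = difference(quadratic, order=2)   # constant: the trend is gone
print(first, second)
```

Each round of differencing costs one observation, which is one reason to use the lowest order that achieves stationarity.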

Lag Order

  • Type: Integer
  • Default: null (auto-selected)
  • Min: 1
  • Description: Exact number of lagged time steps to include for each variable
    • If null, automatically selected using information criteria (AIC/BIC)
    • Higher lag order captures longer memory but increases parameters
  • Example: lag_order=3 means each variable uses its past 3 values and past 3 values of all other variables
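The lag_order=3 example can be illustrated by building the regressor matrix used for least-squares estimation of a VAR. This is a sketch with NumPy under the usual textbook layout (intercept first, then lags from most recent to oldest). The function name `build_lagged_design` is hypothetical and does not describe how the product builds its model internally.

```python
import numpy as np


def build_lagged_design(data, lag_order):
    """Return (X, Y) for least-squares estimation of a VAR(lag_order).

    data: array of shape (T, K), one column per variable.
    X has one row per usable time step: an intercept followed by the
    previous `lag_order` observations of all K variables.
    """
    data = np.asarray(data, dtype=float)
    n_obs = data.shape[0]
    rows = []
    for t in range(lag_order, n_obs):
        # Lags ordered most recent first: y[t-1], y[t-2], ..., y[t-p]
        lags = [data[t - i] for i in range(1, lag_order + 1)]
        rows.append(np.concatenate([[1.0], *lags]))
    return np.array(rows), data[lag_order:]


series = np.arange(30, dtype=float).reshape(10, 3)  # T=10 steps, K=3 variables
X, Y = build_lagged_design(series, lag_order=3)
print(X.shape, Y.shape)
```

With 3 variables and 3 lags, every row of X has 1 + 3 × 3 = 10 entries, and the first 3 time steps are consumed as history.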

Max Lags

  • Type: Integer
  • Default: 10
  • Min: 1
  • Description: Maximum lag order considered when automatically selecting lag_order
  • Visible When: lag_order is null
  • Guidance:
    • For daily data: try 7-14
    • For monthly data: try 12-24
    • For quarterly data: try 4-8
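The constraints listed in this section can be summarized as a small validation sketch. Only `difference_order`, `lag_order`, and `maxlags` are names that appear in this documentation. The other dictionary keys and the function `validate_var_config` are hypothetical, chosen to mirror the parameter labels above.

```python
def validate_var_config(config):
    """Return a list of problems found in a VAR configuration dict."""
    problems = []
    if len(config.get("target_columns", [])) < 2:
        problems.append("VAR needs at least 2 target columns")
    if config.get("difference_order", 0) not in (0, 1, 2):
        problems.append("difference_order must be 0, 1, or 2")
    lag_order = config.get("lag_order")
    if lag_order is not None and lag_order < 1:
        problems.append("lag_order must be >= 1 or None")
    # maxlags only matters when lag_order is left to automatic selection.
    if lag_order is None and config.get("maxlags", 10) < 1:
        problems.append("maxlags must be >= 1")
    return problems


config = {
    "timestamp_column": "date",      # hypothetical key name
    "target_columns": ["sales", "inventory", "price"],
    "frequency": "D",
    "forecast_steps": 7,
    "aggregation": "mean",
    "fill_method": "interpolate",
    "difference_order": 1,
    "lag_order": None,               # None -> auto-select up to maxlags
    "maxlags": 14,
}
problems = validate_var_config(config)
print(problems)
```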

Configuration Tips

Preparing Your Data for VAR

  1. Ensure Stationarity: VAR requires stationary data

    • Check each variable for trends and unit roots
    • Set difference_order=1 if variables have trends
    • Consider logarithmic transformation for exponential growth
  2. Select Related Variables: Include variables that likely influence each other

    • Example: [price, demand, inventory]
    • Avoid unrelated variables that add noise
  3. Sufficient History: VAR needs enough data for stable estimation

    • Rule of thumb: at least (number of variables × lag order × 10) observations
    • Example: 3 variables, lag_order=5 → at least 150 observations
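The history requirement is simple arithmetic and can be checked before fitting. The helper names below are illustrative only.

```python
def min_observations(n_variables, lag_order, per_coefficient=10):
    """Rule-of-thumb minimum history: variables x lag order x 10."""
    return n_variables * lag_order * per_coefficient


def has_enough_history(n_rows, n_variables, lag_order):
    """True if the dataset meets the rule-of-thumb minimum."""
    return n_rows >= min_observations(n_variables, lag_order)


print(min_observations(3, 5))
print(has_enough_history(120, n_variables=3, lag_order=5))
```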

Choosing Lag Order

Automatic Selection (Recommended):

  • Leave lag_order=null
  • Set maxlags based on your frequency (7-14 for daily, 12-24 for monthly)
  • The model will use information criteria to find optimal lag length
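As an illustration of selection by information criterion, the sketch below fits a VAR by ordinary least squares for each candidate lag and keeps the one with the lowest AIC, using the common form log det(Σ) + 2 × parameters / observations. It is a NumPy approximation of the idea under those assumptions, not the product's actual selection routine. The function names and the simulated data are made up for the example.

```python
import numpy as np


def fit_var_ols(data, lag_order):
    """Fit a VAR by least squares; return coefficients and residuals."""
    n_obs = data.shape[0]
    X = np.array([
        np.concatenate([[1.0], *[data[t - i] for i in range(1, lag_order + 1)]])
        for t in range(lag_order, n_obs)
    ])
    Y = data[lag_order:]
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coefs, Y - X @ coefs


def select_lag_by_aic(data, maxlags):
    """Return the lag in 1..maxlags with the lowest AIC."""
    n_vars = data.shape[1]
    best_lag, best_aic = None, np.inf
    for p in range(1, maxlags + 1):
        # Drop the first (maxlags - p) rows so every candidate is scored
        # on the same target observations.
        _, resid = fit_var_ols(data[maxlags - p:], p)
        n_eff = resid.shape[0]
        sigma = resid.T @ resid / n_eff        # residual covariance
        n_params = n_vars * (n_vars * p + 1)   # K^2 * p + K intercepts
        aic = np.log(np.linalg.det(sigma)) + 2.0 * n_params / n_eff
        if aic < best_aic:
            best_lag, best_aic = p, aic
    return best_lag


# Simulate a stable two-variable VAR(2) with made-up coefficients.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.0], [0.3, 0.4]])
A2 = np.array([[-0.3, 0.0], [0.2, -0.4]])
y = np.zeros((400, 2))
for t in range(2, 400):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.normal(size=2)

chosen = select_lag_by_aic(y, maxlags=6)
print(chosen)
```

AIC tends to err on the side of slightly larger lag orders. BIC penalizes parameters more heavily and usually selects a smaller model.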

Manual Selection:

  • Start with lag_order = number of periods in the shortest cycle
    • Daily data with weekly patterns: lag_order=7
    • Monthly data with yearly patterns: lag_order=12
  • Increase if cross-validation shows improvement
  • Decrease if model is overfitting or too slow

Handling Non-Stationarity

If your variables have trends:

  1. Set difference_order=1 to difference all series
  2. The model forecasts in differenced space; forecasts are then inverted back to the original scale
  3. Alternatively, use SARIMAX or Prophet for series with complex trends
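Inverting first differences amounts to accumulating the predicted changes on top of the last observed level. A minimal sketch for a single variable, with a hypothetical helper name and made-up numbers:

```python
def invert_first_difference(last_observed, diff_forecasts):
    """Rebuild level forecasts from first-difference forecasts."""
    levels = []
    current = last_observed
    for change in diff_forecasts:
        current = current + change   # accumulate predicted changes
        levels.append(current)
    return levels


# Last observed level of one variable and its forecast changes.
level_forecast = invert_first_difference(100.0, [2.0, -1.0, 3.5])
print(level_forecast)
```

Because the changes are summed, errors in the differenced forecasts also accumulate, so uncertainty in the level forecast grows with the horizon.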

Model Validation

  • Use time series cross-validation to assess forecast accuracy
  • Check that residuals are uncorrelated (white noise)
  • Compare forecast accuracy across all target variables
  • Consider multivariate metrics like trace RMSE
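Time series cross-validation must keep training data strictly before the test window. The sketch below shows one common scheme (an expanding training window) together with a per-variable RMSE. Both function names are illustrative and not part of the product.

```python
import math


def expanding_window_splits(n_obs, initial_train, horizon, step=None):
    """Yield (train_indices, test_indices) with training data always
    preceding the test window."""
    step = step or horizon
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step


def rmse_per_variable(actual, predicted):
    """RMSE for each column; inputs are lists of rows (one row per step)."""
    n_rows, n_vars = len(actual), len(actual[0])
    return [
        math.sqrt(sum((actual[r][k] - predicted[r][k]) ** 2
                      for r in range(n_rows)) / n_rows)
        for k in range(n_vars)
    ]


splits = [(list(tr), list(te)) for tr, te in expanding_window_splits(10, 6, 2)]
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

Reporting RMSE per variable rather than a single average makes it easier to spot a target that the joint model forecasts poorly.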

Common Issues and Solutions

Issue: Model Doesn't Converge or Produces Unstable Forecasts

Solution:

  • Check that all variables are stationary (set difference_order=1)
  • Reduce lag_order or maxlags
  • Ensure sufficient data (at least 10× as many data points, counted as time steps × variables, as parameters)
  • Normalize variables with very different scales, or remove them if they are not essential

Issue: Too Many Parameters

Solution:

  • Reduce the number of target variables (focus on most important)
  • Decrease lag_order or maxlags
  • Use a simpler model (ARIMA for univariate, or separate models per variable)
  • Consider dimensionality reduction techniques before VAR

Issue: Poor Forecast for Some Variables

Solution:

  • Check if those variables are truly interdependent with others
  • Consider separate models for weakly related variables
  • Ensure those variables don't have different stationarity properties
  • Verify data quality for poorly forecasted variables

Issue: Need Seasonal Patterns

Solution:

  • VAR doesn't handle seasonality directly
  • Apply seasonal differencing before VAR (e.g., difference at lag=12 for monthly)
  • Use seasonal decomposition, then apply VAR to deseasonalized data
  • Consider SARIMA or Prophet for strongly seasonal data
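Seasonal differencing subtracts the value from one full season earlier. A minimal sketch on made-up monthly data, with a hypothetical helper name:

```python
def seasonal_difference(series, period):
    """Subtract the value one full season earlier: y[t] - y[t - period]."""
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Two years of monthly data: a repeating yearly pattern, shifted up by 5
# in the second year.
pattern = [10, 12, 15, 20, 26, 30, 31, 29, 24, 18, 13, 11]
monthly = pattern + [v + 5 for v in pattern]
deseasonalized = seasonal_difference(monthly, period=12)
print(deseasonalized)
```

The repeating pattern cancels out and only the year-over-year change remains. The first `period` observations are lost, and forecasts made on the differenced data have to be added back to the values one season earlier.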

Issue: Want to Include Exogenous Variables

Solution:

  • VAR doesn't easily handle exogenous predictors
  • Use SARIMAX (extends ARIMA with exogenous variables)
  • Or use multivariate models like VECM if variables are cointegrated
  • Alternatively, include the exogenous variables as additional targets; VAR will then model and forecast them jointly with the other series

Issue: Forecasts Revert to Mean Too Quickly

Solution:

  • This is expected for a stationary VAR: multi-step forecasts converge to the series mean
  • Ensure difference_order is appropriate
  • For persistent dynamics, consider VECM if variables are cointegrated
  • Increase lag_order to capture longer-term dependencies
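The mean reversion can be reproduced with a small NumPy sketch. It iterates the noise-free forecast recursion of a stable VAR(1) and compares it with the long-run mean (I − A)⁻¹ c. The coefficient values are made up for illustration.

```python
import numpy as np

A = np.array([[0.5, 0.1],
              [0.2, 0.3]])          # illustrative VAR(1) coefficients
c = np.array([1.0, 2.0])            # intercepts

# Long-run mean of a stable VAR(1): mu = (I - A)^-1 c
mu = np.linalg.solve(np.eye(2) - A, c)

y = np.array([10.0, -5.0])          # last observed values, far from the mean
gaps = []
for _ in range(30):
    y = c + A @ y                   # iterate the forecast recursion
    gaps.append(float(np.linalg.norm(y - mu)))

print(mu, gaps[0], gaps[-1])
```

The distance to the mean shrinks geometrically, at a rate set by the largest eigenvalue of A in absolute value. Eigenvalues close to 1 indicate persistent dynamics and slower reversion.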

Issue: High Computational Cost

Solution:

  • Reduce number of target variables
  • Decrease maxlags to reduce search space
  • Specify lag_order manually to skip automatic selection
  • Aggregate high-frequency data to a coarser frequency (e.g., hourly to daily)

Example Use Cases

  • Retail forecasting: Forecast [sales, inventory, returns] simultaneously to capture their interactions
  • Energy markets: Model [electricity_price, demand, temperature] jointly
  • Financial markets: Predict [stock_price, volume, volatility] considering their interdependencies
  • Macroeconomic modeling: Forecast [GDP, inflation, unemployment] in an integrated framework
  • Supply chain: Model [orders, shipments, inventory_levels] to understand system dynamics
  • Marketing: Forecast [impressions, clicks, conversions] to capture funnel effects

Technical Notes

Granger Causality

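A fitted VAR can be used to test whether one variable "Granger-causes" another, that is, whether past values of X help predict Y beyond Y's own history. A positive result indicates predictive value, not necessarily true causation.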
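The idea can be sketched for two variables as an F-test comparing a regression of Y on its own lags against one that also includes lags of X. This is a simplified bivariate NumPy illustration on simulated data. The function names are hypothetical, and in a larger VAR the lags of all other variables would also be included in both regressions.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares from a least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f_stat(cause, effect, lag_order):
    """F statistic for H0: lags of `cause` add nothing to predicting `effect`."""
    rows = range(lag_order, len(effect))
    own = np.array([[effect[t - i] for i in range(1, lag_order + 1)] for t in rows])
    other = np.array([[cause[t - i] for i in range(1, lag_order + 1)] for t in rows])
    ones = np.ones((len(rows), 1))
    target = np.asarray(effect[lag_order:], dtype=float)
    rss_restricted = rss(np.hstack([ones, own]), target)
    rss_full = rss(np.hstack([ones, own, other]), target)
    dof = len(rows) - (2 * lag_order + 1)
    return ((rss_restricted - rss_full) / lag_order) / (rss_full / dof)


# Simulated data in which x drives y, but y does not drive x.
rng = np.random.default_rng(1)
x = np.zeros(500)
y = np.zeros(500)
for t in range(1, 500):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f_stat(x, y, lag_order=2)
f_y_to_x = granger_f_stat(y, x, lag_order=2)
print(f_x_to_y, f_y_to_x)
```

A large F statistic for x → y alongside a small one for y → x matches how the data were generated. Converting the statistic to a p-value requires the F distribution with (lag_order, dof) degrees of freedom.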

Impulse Response Analysis

VAR allows analyzing how a shock to one variable propagates through the system over time.
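For a VAR(1), the response h steps after a one-unit shock is given by the h-th power of the coefficient matrix. The sketch below computes these non-orthogonalized responses with NumPy. The coefficients are made up, and the function name is illustrative.

```python
import numpy as np


def impulse_responses_var1(A, horizon):
    """Responses of a VAR(1) to unit shocks: Phi_h = A^h for h = 0..horizon.

    Phi_h[i, j] is the effect on variable i, h steps after a one-unit
    shock to variable j (non-orthogonalized).
    """
    responses = [np.eye(A.shape[0])]
    for _ in range(horizon):
        responses.append(A @ responses[-1])
    return np.array(responses)


A = np.array([[0.5, 0.1],
              [0.2, 0.3]])          # illustrative coefficients
irf = impulse_responses_var1(A, horizon=10)
print(irf[1])   # one step after the shock, the response equals A
```

In a stable system the responses die out as the horizon grows. Orthogonalized responses, which account for correlated shocks, additionally depend on the residual covariance and on the ordering of the variables.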

Model Order

Total parameters = (number of variables)² × lag_order + intercepts

  • 3 variables, lag=5: 3² × 5 + 3 = 48 parameters
  • Aim for at least 10× as many data points (time steps × variables) as parameters for stable estimation; for this example, 480 data points, or about 160 time steps
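The parameter formula is easy to evaluate for different system sizes. The helper name below is illustrative.

```python
def var_parameter_count(n_variables, lag_order):
    """Coefficients in a VAR with intercepts: K^2 * p + K."""
    return n_variables ** 2 * lag_order + n_variables


# The count grows quadratically with the number of variables.
for k in (2, 3, 5, 10):
    print(k, var_parameter_count(k, lag_order=5))
```

Going from 3 to 10 variables at lag 5 raises the count from 48 to 510, which is why reducing the number of targets is often the most effective way to shrink the model.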
