Vector Autoregression (VAR)
Multivariate time series model for forecasting multiple interdependent variables simultaneously
Vector Autoregression (VAR) models multiple time series jointly using their past values. It captures interdependencies between variables by modeling each variable as a function of its own lagged values and the lagged values of all other variables in the system.
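In conventional textbook notation (the symbols below are standard in econometrics and are not taken from this tool), a VAR with lag order p for k variables can be written as:

```latex
\mathbf{y}_t = \mathbf{c} + A_1 \mathbf{y}_{t-1} + A_2 \mathbf{y}_{t-2} + \dots + A_p \mathbf{y}_{t-p} + \boldsymbol{\varepsilon}_t
```

Here \mathbf{y}_t is the k × 1 vector of all target variables at time t, \mathbf{c} is a k × 1 vector of intercepts, each A_i is a k × k coefficient matrix, and \boldsymbol{\varepsilon}_t is the error term. Row j of each A_i holds the effect of every variable's i-th lag on variable j.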
When to Use Vector Autoregression
VAR is best suited for:
- Multivariate time series forecasting (multiple target variables)
- When variables influence each other (e.g., price and volume, temperature and electricity)
- Economic and financial systems with interdependencies
- Forecasting multiple metrics that move together
- Analyzing impulse responses and causal relationships
- When you need to forecast multiple correlated series simultaneously
Strengths
- Captures interdependencies between multiple time series
- No need to specify which variables are "dependent" vs "independent"
- Provides forecasts for all variables simultaneously
- Each variable's forecast uses information from other variables
- Well-established econometric framework
- Can analyze Granger causality and impulse responses
- Symmetric treatment of all variables
Weaknesses
- Requires multiple target variables (not suitable for univariate forecasting)
- Parameter count grows quadratically with the number of variables and linearly with the number of lags (variables² × lags, plus intercepts)
- Needs substantial historical data for stable estimates
- All variables must be stationary or integrated of the same order
- Does not handle seasonal patterns directly
- Cannot incorporate exogenous predictors easily
- Interpretability decreases with many variables
- Computationally intensive for high-dimensional systems
Parameters
Common Time Series Parameters
All time series models share these parameters:
- Timestamp Column (required): Column containing dates/times for time series indexing
- Target Columns (required): List of target variables for multivariate forecasting
  - Important: VAR requires multiple target columns (minimum 2)
  - Example: ['sales', 'inventory', 'price']
- Frequency (optional): Time spacing (D, H, W, M). Auto-inferred if not specified
- Forecast Steps (required, default=1): How many periods to predict ahead
- Aggregation (default='mean'): Method for duplicate timestamps (mean, sum, first, last, min, max)
- Fill Method (default='interpolate'): How to handle missing timestamps (interpolate, ffill, bfill)
VAR-Specific Parameters
Differencing Order
- Type: Integer
- Default: 0
- Min: 0
- Max: 2
- Description: Number of times each target series is differenced to achieve stationarity
  - 0: Use raw values (series is already stationary)
  - 1: First differences (removes linear trends)
  - 2: Second differences (removes quadratic trends)
- Guidance: Most economic time series need first differencing (d=1). Check stationarity with a unit-root test such as the Augmented Dickey-Fuller (ADF) or KPSS test
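To illustrate what the differencing order does, here is a minimal sketch in plain Python. The `difference` helper and the sample series are made up for illustration and are not part of this tool.

```python
def difference(series, order=1):
    """Apply first differencing `order` times to a list of numbers."""
    values = list(series)
    for _ in range(order):
        # Each pass replaces the series with its step-to-step changes.
        values = [b - a for a, b in zip(values, values[1:])]
    return values

linear = [2 * t + 5 for t in range(8)]   # linear trend
quadratic = [t * t for t in range(8)]    # quadratic trend

print(difference(linear, order=1))     # constant: the linear trend is gone
print(difference(quadratic, order=1))  # still increasing
print(difference(quadratic, order=2))  # constant after a second difference
```

Each round of differencing shortens the series by one observation, which is one reason to avoid a higher order than the data needs.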
Lag Order
- Type: Integer
- Default: null (auto-selected)
- Min: 1
- Description: Exact number of lagged time steps to include for each variable
  - If null, automatically selected using information criteria (AIC/BIC)
  - Higher lag order captures longer memory but increases parameters
  - Example: lag_order=3 means each variable uses its past 3 values and past 3 values of all other variables
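The lag_order=3 example can be made concrete by listing the inputs used to predict a single time step. This is a sketch with a hypothetical helper (`lagged_regressors`) and made-up numbers. It shows how the regressors are laid out, not how this tool builds them internally.

```python
def lagged_regressors(rows, lag_order):
    """For each time step t >= lag_order, return (target, regressors).

    `rows` is a list of observations, each a list with one value per variable.
    The regressors are the values of every variable at t-1, t-2, ..., t-lag_order.
    """
    samples = []
    for t in range(lag_order, len(rows)):
        regressors = []
        for lag in range(1, lag_order + 1):
            regressors.extend(rows[t - lag])
        samples.append((rows[t], regressors))
    return samples

# Two variables observed over six time steps (made-up numbers).
rows = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]
samples = lagged_regressors(rows, lag_order=3)
target, regressors = samples[0]
print(target)        # [4, 40]
print(regressors)    # [3, 30, 2, 20, 1, 10]
print(len(samples))  # 3
```

With two variables and three lags, every prediction uses six lagged inputs, and the first three observations are consumed as history.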
Max Lags
- Type: Integer
- Default: 10
- Min: 1
- Description: Maximum lag order considered when automatically selecting lag_order
- Visible When: lag_order is null
- Guidance:
  - For daily data: try 7-14
  - For monthly data: try 12-24
  - For quarterly data: try 4-8
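As a rough sketch of how a lag order can be chosen by an information criterion, the NumPy code below fits a VAR by least squares for each candidate lag up to maxlags and keeps the one with the lowest AIC. The function names, the simulated data, and the exact AIC form (log determinant of the residual covariance plus a parameter penalty) are assumptions for illustration. The tool's own selection procedure may differ.

```python
import numpy as np

def fit_var_ols(data, p):
    """Fit a VAR(p) by least squares; return (coefficients, residuals)."""
    n, k = data.shape
    # Each row of X: intercept, then y_{t-1}, ..., y_{t-p} side by side.
    X = np.hstack([np.ones((n - p, 1))] +
                  [data[p - lag:n - lag] for lag in range(1, p + 1)])
    Y = data[p:]
    coefs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coefs, Y - X @ coefs

def select_lag_by_aic(data, maxlags):
    """Return the lag order in 1..maxlags with the lowest AIC."""
    n, k = data.shape
    best_lag, best_aic = None, np.inf
    for p in range(1, maxlags + 1):
        # Drop the first (maxlags - p) rows so every candidate is
        # scored on the same effective sample.
        _, resid = fit_var_ols(data[maxlags - p:], p)
        t_eff = resid.shape[0]
        sigma = resid.T @ resid / t_eff
        aic = np.log(np.linalg.det(sigma)) + 2 * (k * k * p + k) / t_eff
        if aic < best_aic:
            best_lag, best_aic = p, aic
    return best_lag

# Simulate a two-variable VAR(2) with made-up coefficients to exercise the search.
rng = np.random.default_rng(0)
A1 = np.array([[0.5, 0.1], [0.0, 0.3]])
A2 = np.array([[-0.4, 0.0], [0.0, -0.4]])
data = np.zeros((500, 2))
for t in range(2, 500):
    data[t] = A1 @ data[t - 1] + A2 @ data[t - 2] + rng.normal(size=2)

print(select_lag_by_aic(data, maxlags=6))
```

A larger maxlags widens the search but also shrinks the sample available to every candidate, so it is worth keeping it tied to the data frequency as suggested above.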
Configuration Tips
Preparing Your Data for VAR
- Ensure Stationarity: VAR requires stationary data
  - Check each variable for trends and unit roots
  - Set difference_order=1 if variables have trends
  - Consider logarithmic transformation for exponential growth
- Select Related Variables: Include variables that likely influence each other
  - Example: [price, demand, inventory]
  - Avoid unrelated variables that add noise
- Sufficient History: VAR needs enough data for stable estimation
  - Minimum: (number of variables × lag order × 10) observations
  - Example: 3 variables, lag=5 → need at least 150 observations
Choosing Lag Order
Automatic Selection (Recommended):
- Leave lag_order=null
- Set maxlags based on your frequency (7-14 for daily, 12 for monthly)
- The model will use information criteria to find optimal lag length
Manual Selection:
- Start with lag_order = number of periods in the shortest cycle
- Daily data with weekly patterns: lag_order=7
- Monthly data with yearly patterns: lag_order=12
- Increase if cross-validation shows improvement
- Decrease if model is overfitting or too slow
Handling Non-Stationarity
If your variables have trends:
- Set difference_order=1 to difference all series
- Forecasts will be in differenced space, then inverted back to original scale
- Alternatively, use SARIMAX or Prophet for series with complex trends
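Inverting a first-differenced forecast back to the original scale amounts to a cumulative sum starting from the last observed level. The sketch below uses a hypothetical helper and made-up numbers. It assumes difference_order=1 and one series at a time.

```python
from itertools import accumulate

def invert_first_difference(last_observed, differenced_forecast):
    """Cumulatively add forecast differences to the last observed level."""
    levels = accumulate(differenced_forecast, initial=last_observed)
    return list(levels)[1:]  # drop the starting level itself

history = [100, 103, 107, 112]
diff_forecast = [4, 3, 5]  # hypothetical model output in differenced space
print(invert_first_difference(history[-1], diff_forecast))  # [116, 119, 124]
```

Because each level builds on the previous one, forecast errors in differenced space accumulate over the horizon when mapped back to levels.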
Model Validation
- Use time series cross-validation to assess forecast accuracy
- Check that residuals are uncorrelated (white noise)
- Compare forecast accuracy across all target variables
- Consider multivariate metrics like trace RMSE
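Time series cross-validation differs from ordinary k-fold in that every test window must come after its training window. One common scheme, sketched here with a hypothetical helper, grows the training set and slides the test window forward. The parameter names are illustrative and not options of this tool.

```python
def expanding_window_splits(n_obs, initial_train, horizon, step=1):
    """Yield (train_indices, test_indices) pairs that never look ahead."""
    train_end = initial_train
    while train_end + horizon <= n_obs:
        yield range(0, train_end), range(train_end, train_end + horizon)
        train_end += step

splits = list(expanding_window_splits(n_obs=10, initial_train=6, horizon=2, step=2))
for train, test in splits:
    print(len(train), list(test))
# 6 [6, 7]
# 8 [8, 9]
```

Accuracy can then be averaged per target variable across the folds, which makes it easier to see whether one series is consistently harder to forecast than the others.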
Common Issues and Solutions
Issue: Model Doesn't Converge or Produces Unstable Forecasts
Solution:
- Check that all variables are stationary (set difference_order=1)
- Reduce lag_order or maxlags
- Ensure sufficient data (at least 10× the number of parameters)
- Normalize variables with very different scales, or remove them if normalization does not help
Issue: Too Many Parameters
Solution:
- Reduce the number of target variables (focus on most important)
- Decrease lag_order or maxlags
- Use a simpler model (ARIMA for univariate, or separate models per variable)
- Consider dimensionality reduction techniques before VAR
Issue: Poor Forecast for Some Variables
Solution:
- Check if those variables are truly interdependent with others
- Consider separate models for weakly related variables
- Ensure those variables don't have different stationarity properties
- Verify data quality for poorly forecasted variables
Issue: Need Seasonal Patterns
Solution:
- VAR doesn't handle seasonality directly
- Apply seasonal differencing before VAR (e.g., difference at lag=12 for monthly)
- Use seasonal decomposition, then apply VAR to deseasonalized data
- Consider SARIMA or Prophet for strongly seasonal data
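Seasonal differencing subtracts the value from one full season earlier, which removes a pattern that repeats every period. A minimal sketch with made-up monthly data follows. The helper name is hypothetical, and this step would be applied to each target series before fitting.

```python
def seasonal_difference(series, period):
    """Subtract the value one full season earlier from each observation."""
    return [series[t] - series[t - period] for t in range(period, len(series))]

# Three years of made-up monthly data: a repeating yearly pattern
# that shifts up by 2 each year.
pattern = [5, 3, 4, 6, 9, 12, 15, 14, 10, 7, 5, 4]
monthly = [value + 2 * year for year in range(3) for value in pattern]

print(seasonal_difference(monthly, period=12))  # 24 values, all equal to 2
```

The first `period` observations are lost, and forecasts made on the differenced data need the corresponding seasonal values added back to return to the original scale.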
Issue: Want to Include Exogenous Variables
Solution:
- VAR doesn't easily handle exogenous predictors
- Use SARIMAX (extends ARIMA with exogenous variables)
- Or use multivariate models like VECM if variables are cointegrated
- Alternatively, include them as additional target variables; the model then forecasts them jointly with the other targets instead of using known future values
Issue: Forecasts Revert to Mean Too Quickly
Solution:
- This is expected for stationary VAR (forecasts converge to mean)
- Ensure difference_order is appropriate
- For persistent dynamics, consider VECM if variables are cointegrated
- Increase lag_order to capture longer-term dependencies
Issue: High Computational Cost
Solution:
- Reduce number of target variables
- Decrease maxlags to reduce search space
- Specify lag_order manually to skip automatic selection
- Use daily aggregation instead of hourly if frequency is high
Example Use Cases
- Retail forecasting: Forecast [sales, inventory, returns] simultaneously to capture their interactions
- Energy markets: Model [electricity_price, demand, temperature] jointly
- Financial markets: Predict [stock_price, volume, volatility] considering their interdependencies
- Macroeconomic modeling: Forecast [GDP, inflation, unemployment] in an integrated framework
- Supply chain: Model [orders, shipments, inventory_levels] to understand system dynamics
- Marketing: Forecast [impressions, clicks, conversions] to capture funnel effects
Technical Notes
Granger Causality
VAR models can test whether one variable "Granger-causes" another, that is, whether past values of X help predict Y beyond Y's own history.
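One way to picture the test is as a comparison of two regressions for Y: one on Y's own lags only, and one that also includes lags of X. The NumPy sketch below computes the resulting F statistic on simulated data in which X drives Y but not the reverse. The function name and data are hypothetical, and a full test would also convert the statistic to a p-value.

```python
import numpy as np

def granger_f_stat(y, x, p):
    """F statistic for H0: lags of x add nothing to predicting y beyond y's own lags."""
    n = len(y)
    ones = np.ones((n - p, 1))
    y_lags = np.column_stack([y[p - lag:n - lag] for lag in range(1, p + 1)])
    x_lags = np.column_stack([x[p - lag:n - lag] for lag in range(1, p + 1)])
    target = y[p:]

    def rss(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    rss_restricted = rss(np.hstack([ones, y_lags]))
    unrestricted = np.hstack([ones, y_lags, x_lags])
    rss_unrestricted = rss(unrestricted)
    dof = (n - p) - unrestricted.shape[1]
    return ((rss_restricted - rss_unrestricted) / p) / (rss_unrestricted / dof)

# Simulated data: y depends on the previous value of x, x is pure noise.
rng = np.random.default_rng(1)
n = 400
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

f_x_to_y = granger_f_stat(y, x, p=2)  # does x help predict y?
f_y_to_x = granger_f_stat(x, y, p=2)  # does y help predict x?
print(f_x_to_y, f_y_to_x)
```

A large statistic in one direction and a small one in the other indicates predictive precedence only. It does not establish a causal mechanism.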
Impulse Response Analysis
VAR allows analyzing how a shock to one variable propagates through the system over time.
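For a VAR with a single lag, the response h steps after a one-unit shock is given by the h-th power of the coefficient matrix. The sketch below uses a made-up 2 × 2 matrix and computes plain (non-orthogonalized) responses. It assumes lag order 1, and higher orders need the companion-matrix form.

```python
import numpy as np

# Hypothetical VAR(1) coefficient matrix for two variables.
# Row i holds the effect of each variable's previous value on variable i.
A = np.array([[0.5, 0.2],
              [0.0, 0.4]])

def impulse_responses(A, horizons):
    """Responses to a one-unit shock in a VAR(1): the h-step response is A to the power h."""
    return [np.linalg.matrix_power(A, h) for h in range(horizons + 1)]

responses = impulse_responses(A, horizons=3)
# responses[h][i, j] is the effect on variable i, h steps after a unit shock to variable j.
for h, response in enumerate(responses):
    print(h, np.round(response[0, 1], 3))
```

In this example a shock to the second variable reaches the first with a one-step delay and then fades, because all eigenvalues of `A` are below one in magnitude.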
Model Order
Total parameters = (number of variables)² × lag_order + intercepts
- 3 variables, lag=5: 3² × 5 + 3 = 48 parameters
- Ensure you have at least 10× data points for stable estimation
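The formula above can be wrapped in a small helper to see how quickly the parameter count grows as variables are added. The function name is hypothetical.

```python
def var_parameter_count(n_variables, lag_order):
    """Coefficients in all lag matrices plus one intercept per variable."""
    return n_variables ** 2 * lag_order + n_variables

print(var_parameter_count(3, 5))   # 48
print(var_parameter_count(6, 5))   # 186
print(var_parameter_count(10, 5))  # 510
```

Doubling the number of variables from 3 to 6 at the same lag order roughly quadruples the number of parameters to estimate.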