←back to Blog

A Coding Guide to Build Flexible Multi-Model Workflows in GluonTS with Synthetic Data, Evaluation, and Advanced Visualizations

«`html

Understanding the Target Audience

The target audience for this coding guide includes data scientists, machine learning engineers, and business analysts who are looking to enhance their forecasting capabilities using GluonTS. They are typically familiar with Python programming and have a basic understanding of time series analysis. Their pain points include:

  • Difficulty in managing and comparing multiple forecasting models.
  • Challenges in generating synthetic datasets for testing and validation.
  • Need for effective evaluation metrics to assess model performance.
  • Desire for clear visualizations to interpret forecasting results.

Their goals include:

  • Building robust forecasting workflows that can handle various models.
  • Improving accuracy and reliability in predictions.
  • Gaining insights from data through advanced visualizations.

Interests often lie in the latest advancements in machine learning techniques and their applications in business contexts. They prefer clear, concise communication with practical examples that can be directly applied to their work.

A Coding Guide to Build Flexible Multi-Model Workflows in GluonTS with Synthetic Data, Evaluation, and Advanced Visualizations

This tutorial explores GluonTS from a practical perspective, focusing on generating complex synthetic datasets, preparing them, and applying multiple models in parallel. We emphasize how to work with diverse estimators in the same pipeline, handle missing dependencies gracefully, and produce usable results. By incorporating evaluation and visualization steps, we create a workflow that highlights how models can be trained, compared, and interpreted in a seamless process.

Importing Required Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from gluonts.evaluation import make_evaluation_predictions, Evaluator
from gluonts.dataset.artificial import ComplexSeasonalTimeSeries

try:
   from gluonts.torch import DeepAREstimator
   TORCH_AVAILABLE = True
except ImportError:
   TORCH_AVAILABLE = False

try:
   from gluonts.mx import DeepAREstimator as MXDeepAREstimator
   from gluonts.mx import SimpleFeedForwardEstimator
   MX_AVAILABLE = True
except ImportError:
   MX_AVAILABLE = False

We begin by importing the core libraries for data handling, visualization, and GluonTS utilities. We also set up conditional imports for PyTorch and MXNet estimators, allowing us to flexibly use whichever backend is available in our environment.

Creating Synthetic Datasets

def create_synthetic_dataset(num_series=50, length=365, prediction_length=30):
   """Generate synthetic multi-variate time series with trends, seasonality, and noise"""
   np.random.seed(42)
   series_list = []
  
   for i in range(num_series):
       trend = np.cumsum(np.random.normal(0.1 + i*0.01, 0.1, length))
      
       daily_season = 10 * np.sin(2 * np.pi * np.arange(length) / 7) 
       yearly_season = 20 * np.sin(2 * np.pi * np.arange(length) / 365.25) 
      
       noise = np.random.normal(0, 5, length)
       values = np.maximum(trend + daily_season + yearly_season + noise + 100, 1)
      
       dates = pd.date_range(start='2020-01-01', periods=length, freq='D')
      
       series_list.append(pd.Series(values, index=dates, name=f'series_{i}'))
  
   return pd.concat(series_list, axis=1)

We create a synthetic dataset where each series combines trend, seasonality, and noise. This design ensures consistent results with every run, returning a clean multi-series DataFrame ready for experimentation.

Initializing Forecasting Models

print("Creating synthetic multi-series dataset...")
df = create_synthetic_dataset(num_series=10, length=200, prediction_length=30)

dataset = PandasDataset(df, target=df.columns.tolist())

training_data, test_gen = split(dataset, offset=-60)
test_data = test_gen.generate_instances(prediction_length=30)

print("Initializing forecasting models...")

models = {}

if TORCH_AVAILABLE:
   try:
       models['DeepAR_Torch'] = DeepAREstimator(
           freq='D',
           prediction_length=30
       )
       print("PyTorch DeepAR loaded")
   except Exception as e:
       print(f"PyTorch DeepAR failed to load: {e}")

if MX_AVAILABLE:
   try:
       models['DeepAR_MX'] = MXDeepAREstimator(
           freq='D',
           prediction_length=30,
           trainer=dict(epochs=5)
       )
       print("MXNet DeepAR loaded")
   except Exception as e:
       print(f"MXNet DeepAR failed to load: {e}")
  
   try:
       models['FeedForward'] = SimpleFeedForwardEstimator(
           freq='D',
           prediction_length=30,
           trainer=dict(epochs=5)
       )
       print("FeedForward model loaded")
   except Exception as e:
       print(f"FeedForward failed to load: {e}")

if not models:
   print("Using artificial dataset with built-in models...")
   artificial_ds = ComplexSeasonalTimeSeries(
       num_series=10,
       prediction_length=30,
       freq='D',
       length_low=150,
       length_high=200
   ).generate()
  
   training_data, test_gen = split(artificial_ds, offset=-60)
   test_data = test_gen.generate_instances(prediction_length=30)

We generate a 10-series dataset, wrap it into a GluonTS PandasDataset, and split it into training and test windows. We then initialize multiple estimators (PyTorch DeepAR, MXNet DeepAR, and FeedForward) when available, and fall back to a built-in artificial dataset if no backends load.

Training Models and Evaluating Performance

trained_models = {}
all_forecasts = {}

if models:
   for name, estimator in models.items():
       print(f"Training {name} model...")
       try:
           predictor = estimator.train(training_data)
           trained_models[name] = predictor
          
           forecasts = list(predictor.predict(test_data.input))
           all_forecasts[name] = forecasts
           print(f"{name} training completed!")
          
       except Exception as e:
           print(f"{name} training failed: {e}")
           continue

print("Evaluating model performance...")
evaluator = Evaluator(quantiles=[0.1, 0.5, 0.9])
evaluation_results = {}

for name, forecasts in all_forecasts.items():
   if forecasts: 
       try:
           agg_metrics, item_metrics = evaluator(test_data.label, forecasts)
           evaluation_results[name] = agg_metrics
           print(f"\n{name} Performance:")
           print(f"  MASE: {agg_metrics['MASE']:.4f}")
           print(f"  sMAPE: {agg_metrics['sMAPE']:.4f}")
           print(f"  Mean wQuantileLoss: {agg_metrics['mean_wQuantileLoss']:.4f}")
       except Exception as e:
           print(f"Evaluation failed for {name}: {e}")

We train each available estimator, collect probabilistic forecasts, and store the fitted predictors for reuse. We then evaluate results using MASE, sMAPE, and weighted quantile loss, providing a consistent, comparative view of model performance.

Advanced Visualizations of Forecasts

def plot_advanced_forecasts(test_data, forecasts_dict, series_idx=0):
   """Advanced plotting with multiple models and uncertainty bands"""
   fig, axes = plt.subplots(2, 2, figsize=(15, 10))
   fig.suptitle('Advanced GluonTS Forecasting Results', fontsize=16, fontweight='bold')
  
   if not forecasts_dict:
       fig.text(0.5, 0.5, 'No successful forecasts to display',
               ha='center', va='center', fontsize=20)
       return fig
  
   if series_idx < len(test_data.label):
       ts_label = test_data.label[series_idx]
       ts_input = test_data.input[series_idx]['target']
      
       colors = ['blue', 'red', 'green', 'purple', 'orange']
      
       ax1 = axes[0, 0]
       ax1.plot(range(len(ts_input)), ts_input, 'k-', label='Historical', alpha=0.8, linewidth=2)
       ax1.plot(range(len(ts_input), len(ts_input) + len(ts_label)),
               ts_label, 'k--', label='True Future', alpha=0.8, linewidth=2)
      
       for i, (name, forecasts) in enumerate(forecasts_dict.items()):
           if series_idx < len(forecasts):
               forecast = forecasts[series_idx]
               forecast_range = range(len(ts_input), len(ts_input) + len(forecast.mean))
              
               color = colors[i % len(colors)]
               ax1.plot(forecast_range, forecast.mean,
                       color=color, label=f'{name} Mean', linewidth=2)
              
               try:
                   ax1.fill_between(forecast_range,
                                  forecast.quantile(0.1), forecast.quantile(0.9),
                                  alpha=0.2, color=color, label=f'{name} 80% CI')
               except:
                   pass 
      
       ax1.set_title('Multi-Model Forecasts Comparison', fontsize=12, fontweight='bold')
       ax1.legend()
       ax1.grid(True, alpha=0.3)
       ax1.set_xlabel('Time Steps')
       ax1.set_ylabel('Value')
      
       ax2 = axes[0, 1]
       if all_forecasts:
           first_model = list(all_forecasts.keys())[0]
           if series_idx < len(all_forecasts[first_model]):
               forecast = all_forecasts[first_model][series_idx]
               ax2.scatter(ts_label, forecast.mean, alpha=0.7, s=60)
              
               min_val = min(min(ts_label), min(forecast.mean))
               max_val = max(max(ts_label), max(forecast.mean))
               ax2.plot([min_val, max_val], [min_val, max_val], 'r--', alpha=0.8)
              
               ax2.set_title(f'Prediction vs Actual - {first_model}', fontsize=12, fontweight='bold')
               ax2.set_xlabel('Actual Values')
               ax2.set_ylabel('Predicted Values')
               ax2.grid(True, alpha=0.3)
      
       ax3 = axes[1, 0]
       if all_forecasts:
           first_model = list(all_forecasts.keys())[0]
           if series_idx < len(all_forecasts[first_model]):
               forecast = all_forecasts[first_model][series_idx]
               residuals = ts_label - forecast.mean
               ax3.hist(residuals, bins=15, alpha=0.7, color='skyblue', edgecolor='black')
               ax3.axvline(x=0, color='r', linestyle='--', linewidth=2)
               ax3.set_title(f'Residuals Distribution - {first_model}', fontsize=12, fontweight='bold')
               ax3.set_xlabel('Residuals')
               ax3.set_ylabel('Frequency')
               ax3.grid(True, alpha=0.3)
      
       ax4 = axes[1, 1]
       if evaluation_results:
           metrics = ['MASE', 'sMAPE'] 
           model_names = list(evaluation_results.keys())
           x = np.arange(len(metrics))
           width = 0.35
          
           for i, model_name in enumerate(model_names):
               values = [evaluation_results[model_name].get(metric, 0) for metric in metrics]
               ax4.bar(x + i*width, values, width,
                      label=model_name, color=colors[i % len(colors)], alpha=0.8)
          
           ax4.set_title('Model Performance Comparison', fontsize=12, fontweight='bold')
           ax4.set_xlabel('Metrics')
           ax4.set_ylabel('Value')
           ax4.set_xticks(x + width/2 if len(model_names) > 1 else x)
           ax4.set_xticklabels(metrics)
           ax4.legend()
           ax4.grid(True, alpha=0.3)
       else:
           ax4.text(0.5, 0.5, 'No evaluation\nresults available',
                   ha='center', va='center', transform=ax4.transAxes, fontsize=14)
  
   plt.tight_layout()
   return fig

If all forecasts and test data are available, we create advanced visualizations to compare the results. This includes plotting the mean forecasts, actual values, residuals distribution, and model performance metrics.

Conclusion

In conclusion, we have established a robust setup that balances data creation, model experimentation, and performance analysis. This guide demonstrates how to adapt flexibly, test multiple options, and visualize results in a way that makes comparison intuitive. This foundation allows for experimentation with GluonTS and application of the same principles to real datasets while keeping the process modular and easy to extend.

Check out the FULL CODES here. Feel free to check out our GitHub Page for Tutorials, Codes, and Notebooks. Also, feel free to follow us on Twitter and don’t forget to join our 100k+ ML SubReddit and Subscribe to our Newsletter.

«`