Dynamic Ensembles: Comparing Several Approaches

This notebook explores how to combine nixtla’s neuralforecast models using several ensemble approaches.

  1. Loading M3’s monthly dataset

  2. Preparing and running cross-validation with 7 neural networks

  3. Fitting several combination rules using out-of-sample data from cross-validation

  4. Getting forecasts and combining them

  5. Evaluating all models

[1]:
import warnings

warnings.filterwarnings("ignore")

If necessary, install the package using pip:

[2]:
# !pip install metaforecast=='0.1.3'

1. Data preparation

Let’s start by loading the dataset. This tutorial uses the M3 monthly dataset available on datasetsforecast.

We also set the forecasting horizon and input size (number of lags) to 12.

[3]:
import pandas as pd

from datasetsforecast.m3 import M3

horizon = 12
n_lags = 12

df, *_ = M3.load('.',group='Monthly')

Split the dataset into training and testing sets:

[4]:
df_by_unq = df.groupby('unique_id')

train_l, test_l = [], []
for g, df_ in df_by_unq:
    df_ = df_.sort_values('ds')

    train_df_g = df_.head(-horizon)
    test_df_g = df_.tail(horizon)

    train_l.append(train_df_g)
    test_l.append(test_df_g)

train_df = pd.concat(train_l).reset_index(drop=True)
test_df = pd.concat(test_l).reset_index(drop=True)

train_df.query('unique_id=="M1000"').tail()
[4]:
unique_id ds y
286 M1000 1992-10-31 4563.4
287 M1000 1992-11-30 4551.8
288 M1000 1992-12-31 4577.4
289 M1000 1993-01-31 4592.4
290 M1000 1993-02-28 4632.2
[5]:
test_df.query('unique_id=="M1000"').head()
[5]:
unique_id ds y
36 M1000 1993-03-31 4625.6
37 M1000 1993-04-30 4668.2
38 M1000 1993-05-31 4598.0
39 M1000 1993-06-30 4619.4
40 M1000 1993-07-31 4640.4

2. Model setup and cross-validation

Let’s start by setting up the ensemble members: 7 neural networks based on neuralforecast.

We carry out a cross-validation procedure to get out-of-sample forecasts that will be used to fit the combination rule

[ ]:
from neuralforecast import NeuralForecast
from neuralforecast.models import (NHITS,
                                   LSTM,
                                   GRU,
                                   NBEATS,
                                   MLP,
                                   TiDE)

CONFIG = {
    'input_size': n_lags,
    'h': horizon,
    'accelerator': 'cpu',
    'max_steps':1000,
}

models = [
    NBEATS(**CONFIG, stack_types=3*["identity"]),
    NHITS(**CONFIG),
    MLP( **CONFIG),
    MLP(num_layers=3, **CONFIG),
    LSTM(**CONFIG),
    GRU(**CONFIG),
    TiDE( **CONFIG),
]


nf = NeuralForecast(models=models, freq='M')
[ ]:
%%capture

# should try nf.predict_insample
n_windows = train_df['unique_id'].value_counts().min() - n_lags - horizon
n_windows = int(n_windows // 2)

fcst_cv = nf.cross_validation(df=train_df, n_windows=n_windows, step_size=1)
fcst_cv = fcst_cv.reset_index()
fcst_cv = fcst_cv.groupby(['unique_id', 'cutoff']).head(1).drop(columns='cutoff')
[8]:
fcst_cv.head()
[8]:
unique_id ds NBEATS NHITS MLP MLP1 LSTM GRU TiDE y
0 M1 1992-07-31 4270.607422 4316.121094 3988.292969 4269.679199 3331.872803 3312.583252 4209.511719 5880.0
12 M1 1992-08-31 5383.976562 5088.954102 4664.032227 4608.907227 3408.262695 3532.260254 4589.873047 1680.0
24 M1 1992-09-30 3962.103271 3926.756348 3292.554443 3432.223633 3158.096924 3155.144531 6003.843750 6720.0
36 M1 1992-10-31 3850.518799 3936.579346 4416.983398 4314.682617 2995.153809 3089.166016 4937.783691 2040.0
48 M1 1992-11-30 4311.153320 4394.302734 3661.406250 3534.437988 3103.114746 3088.328125 3061.584473 6480.0

3. Ensemble combination rule

In this example, we’ll compare several dynamic ensembles

[9]:
from metaforecast.ensembles import (MLpol, MLewa, ADE, LossOnTrain, BestOnTrain, EqAverage, Windowing)
[10]:
TRIM_R = 0.8

combiners = {
    'ADE': ADE(freq='M', meta_lags=list(range(1, 4)), trim_ratio=TRIM_R, trim_by_uid=False),
    'MLpol': MLpol(loss_type='square', gradient=True, trim_ratio=TRIM_R, weight_by_uid=False),
    'MLewa': MLewa(loss_type='square', gradient=True, trim_ratio=TRIM_R, weight_by_uid=False),
    'LossOnTrain': LossOnTrain(trim_ratio=TRIM_R, weight_by_uid=False),
    'BestOnTrain': BestOnTrain(select_by_uid=False),
    'EqAverage': EqAverage(select_by_uid=False, trim_ratio=TRIM_R),
    'Windowing': Windowing(freq='M', trim_ratio=TRIM_R, select_best=False, weight_by_uid=False),
    'BLAST': Windowing(freq='M', trim_ratio=TRIM_R, select_best=True, weight_by_uid=False),
}
[11]:
for k in combiners:
    print(f'Fitting {k}')
    combiners[k].fit(fcst_cv)
Fitting ADE
Fitting MLpol
Fitting MLewa
Fitting LossOnTrain
Fitting BestOnTrain
Fitting EqAverage
Fitting Windowing
Fitting BLAST

Let’s look at the weights of different combiners

[12]:
combiners['MLpol'].uid_coefficient['M1000'].plot.bar()
[12]:
<Axes: >
../_images/notebooks_02_ensembles_cv_18_1.png
[13]:
combiners['MLewa'].uid_coefficient['M1000'].plot.bar()
[13]:
<Axes: >
../_images/notebooks_02_ensembles_cv_19_1.png
[14]:
combiners['Windowing'].weights.loc['M1000',:].plot.bar()
[14]:
<Axes: >
../_images/notebooks_02_ensembles_cv_20_1.png

Then, we refit the neural networks are get the test forecasts

[ ]:
%%capture

nf.fit(df=train_df)
[ ]:
%%capture

fcst = nf.predict(step_size=1)
[17]:
fcst.query('unique_id=="M1"')
[17]:
ds NBEATS NHITS MLP MLP1 LSTM GRU TiDE
unique_id
M1 1994-09-30 2820.708008 2837.058838 2383.298828 2454.077393 2507.238525 2390.672607 2070.891602
M1 1994-10-31 2353.479248 2515.124512 2158.801270 2129.693848 2583.255859 2305.683838 2451.634033
M1 1994-11-30 2303.218750 2391.572754 2291.510254 2139.453857 2939.735107 2787.014160 1721.748047
M1 1994-12-31 2779.379883 2526.065918 2591.097412 2590.750244 2814.711182 2776.601318 2240.189941
M1 1995-01-31 2188.668457 2063.863525 2234.466553 1887.351929 2838.986328 2785.913818 1796.146118
M1 1995-02-28 1911.853271 2175.461670 1985.965088 1816.956787 2691.198242 2710.279541 2631.498779
M1 1995-03-31 2451.522461 2249.096436 2160.844971 1911.716309 2619.605469 2653.789551 1920.423096
M1 1995-04-30 1735.337402 1839.155762 1614.420044 1335.228027 2504.666992 2538.148926 2000.088135
M1 1995-05-31 1939.377686 2237.283691 2295.252930 2200.254150 2474.093994 2538.217285 3073.768555
M1 1995-06-30 1589.561279 1889.871582 2036.288086 1789.682861 2385.954834 2385.965332 1627.879883
M1 1995-07-31 2011.243408 2133.244385 1954.133423 1680.693481 2248.914551 2293.520508 1596.402466
M1 1995-08-31 2416.882568 2366.746582 2237.609131 2308.385254 2482.302979 2661.940918 1933.988281
[18]:
ensemble_fcst = {}
for k in combiners:
    print(k)
    if k == 'ADE':
        # need to make sure unique_id is in columns
        fc = combiners[k].predict(fcst.reset_index(), train=train_df, h=horizon)
    else:
        fc = combiners[k].predict(fcst.reset_index())

    ensemble_fcst[k] = fc

ensemble_fcst_df = pd.DataFrame(ensemble_fcst)
ADE
MLpol
MLewa
LossOnTrain
BestOnTrain
EqAverage
Windowing
BLAST
[19]:
ensemble_fcst_df
[19]:
ADE MLpol MLewa LossOnTrain BestOnTrain EqAverage Windowing BLAST
0 2734.990919 2633.876117 2687.956142 2647.597933 2820.708008 2600.476318 2609.517685 2820.708008
1 2413.129298 2346.027112 2375.503276 2416.961861 2353.479248 2348.070947 2372.301306 2353.479248
2 2368.810476 2263.482370 2296.720479 2512.772930 2303.218750 2413.098145 2457.888033 2303.218750
3 2615.789141 2581.755320 2583.917473 2726.531104 2779.379883 2660.400928 2688.874069 2779.379883
4 2119.322474 2066.855810 2072.368257 2353.839662 2188.668457 2242.667358 2297.588398 2188.668457
... ... ... ... ... ... ... ... ...
17131 5235.239195 5263.440157 5260.213604 5228.020998 5255.608398 5242.863867 5228.022853 5255.608398
17132 5236.594895 5274.892725 5272.110740 5228.778586 5260.446777 5251.575293 5228.879529 5260.446777
17133 5233.346570 5275.373552 5271.142153 5224.094460 5230.098633 5253.171680 5223.840454 5230.098633
17134 5247.344425 5305.818907 5300.775081 5238.732074 5271.788086 5278.508789 5238.902449 5271.788086
17135 5254.623441 5304.463775 5299.146796 5242.637530 5259.853516 5277.595410 5242.359317 5259.853516

17136 rows × 8 columns

[20]:
fcst_all = pd.concat([fcst.reset_index(),ensemble_fcst_df],axis=1)

fcst_all.head()
[20]:
unique_id ds NBEATS NHITS MLP MLP1 LSTM GRU TiDE ADE MLpol MLewa LossOnTrain BestOnTrain EqAverage Windowing BLAST
0 M1 1994-09-30 2820.708008 2837.058838 2383.298828 2454.077393 2507.238525 2390.672607 2070.891602 2734.990919 2633.876117 2687.956142 2647.597933 2820.708008 2600.476318 2609.517685 2820.708008
1 M1 1994-10-31 2353.479248 2515.124512 2158.801270 2129.693848 2583.255859 2305.683838 2451.634033 2413.129298 2346.027112 2375.503276 2416.961861 2353.479248 2348.070947 2372.301306 2353.479248
2 M1 1994-11-30 2303.218750 2391.572754 2291.510254 2139.453857 2939.735107 2787.014160 1721.748047 2368.810476 2263.482370 2296.720479 2512.772930 2303.218750 2413.098145 2457.888033 2303.218750
3 M1 1994-12-31 2779.379883 2526.065918 2591.097412 2590.750244 2814.711182 2776.601318 2240.189941 2615.789141 2581.755320 2583.917473 2726.531104 2779.379883 2660.400928 2688.874069 2779.379883
4 M1 1995-01-31 2188.668457 2063.863525 2234.466553 1887.351929 2838.986328 2785.913818 1796.146118 2119.322474 2066.855810 2072.368257 2353.839662 2188.668457 2242.667358 2297.588398 2188.668457

4. Evaluation

Finally, we compare all approaches

[21]:
test_df = test_df.merge(fcst_all, on=['unique_id','ds'], how="left")
[22]:
from neuralforecast.losses.numpy import smape
from datasetsforecast.evaluation import accuracy

evaluation_df = accuracy(test_df, [smape], agg_by=['unique_id'])
evaluation_df
[22]:
metric unique_id NBEATS NHITS MLP MLP1 LSTM GRU TiDE ADE MLpol MLewa LossOnTrain BestOnTrain EqAverage Windowing BLAST
0 smape M1 0.557906 0.565255 0.534834 0.545832 0.581718 0.577866 0.542147 0.553334 0.548903 0.550354 0.555674 0.557906 0.550354 0.551796 0.557906
1 smape M10 0.203316 0.192507 0.188595 0.183235 0.187489 0.188297 0.199679 0.192563 0.192023 0.192340 0.187226 0.203316 0.189806 0.186955 2.000000
2 smape M100 0.092943 0.084622 0.090799 0.095776 0.118231 0.119362 0.115861 0.098916 0.091347 0.089711 0.090521 0.092943 0.096197 0.090546 0.092943
3 smape M1000 0.011336 0.011435 0.013824 0.010244 0.010340 0.004813 0.006411 0.006530 0.010227 0.010599 0.006014 0.011336 0.007341 0.006490 0.011336
4 smape M1001 0.019537 0.019699 0.024808 0.019278 0.015281 0.007963 0.011028 0.013953 0.018969 0.019145 0.014497 0.019537 0.015982 0.014861 0.019537
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1423 smape M995 0.007859 0.005549 0.012240 0.006474 0.026410 0.007585 0.010029 0.007985 0.004580 0.004588 0.008676 0.007859 0.006321 0.009893 0.007859
1424 smape M996 0.022319 0.022942 0.029199 0.026535 0.009965 0.007247 0.018761 0.017294 0.024090 0.023890 0.017470 0.022319 0.019209 0.017903 0.022319
1425 smape M997 0.081531 0.080787 0.077392 0.075628 0.065652 0.060192 0.080346 0.075875 0.079193 0.079543 0.076511 0.081531 0.075707 0.076070 0.081531
1426 smape M998 0.008090 0.007945 0.010729 0.010613 0.026844 0.018760 0.022788 0.012541 0.008733 0.008367 0.012279 0.008090 0.011400 0.012000 0.008090
1427 smape M999 0.009194 0.009976 0.018110 0.011560 0.013838 0.008884 0.012119 0.006620 0.010070 0.009821 0.007000 0.009194 0.008707 0.007049 0.009194

1428 rows × 17 columns

[23]:
evaluation_df.drop(columns=['metric','unique_id']).mean().sort_values()
[23]:
MLewa          0.126739
MLpol          0.126860
Windowing      0.126917
LossOnTrain    0.127034
EqAverage      0.127254
NBEATS         0.127429
ADE            0.127547
NHITS          0.127779
MLP1           0.130285
MLP            0.130502
GRU            0.135435
TiDE           0.139523
LSTM           0.140415
BestOnTrain    0.183605
BLAST          0.188772
dtype: float64