# Building Multi-Step Forecasting Models

November 29, 2020 · 2021-01-02 19:01

Both the single-output and multiple-output models in the previous sections made single-time-step predictions, i.e., one hour into the future. In this lesson, we will go over how to build several multi-step time-series forecasting models using TensorFlow 2.0.

In multi-step prediction, the model needs to learn to predict a range of future values. Unlike a single-step model, which predicts only one future point, a multi-step model predicts a sequence of future values.

Multi-step forecasting can be done using either of the following two approaches:

- Direct method, where the entire sequence of future values is predicted at once.
- Recursive method, where the model only makes single-step predictions and each prediction is fed back into the model as input for the next step.
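To make the difference between the two approaches concrete, here is a minimal sketch in plain Python. The two toy "models" (`one_step_model` and `direct_model`) are hypothetical stand-ins for trained networks: the direct model maps the history to the whole horizon in one call, while `recursive_forecast` calls a one-step model repeatedly and appends each prediction to the context it conditions on.

```python
from typing import Callable, List


def one_step_model(history: List[float]) -> float:
    """Hypothetical single-step model: predicts the mean of the last 3 values."""
    window = history[-3:]
    return sum(window) / len(window)


def direct_model(history: List[float], horizon: int) -> List[float]:
    """Hypothetical direct model: returns all `horizon` values in one call
    by extending the last observed change linearly."""
    slope = history[-1] - history[-2]
    return [history[-1] + slope * (k + 1) for k in range(horizon)]


def recursive_forecast(step_fn: Callable[[List[float]], float],
                       history: List[float], horizon: int) -> List[float]:
    context = list(history)  # copy, so the caller's list is not modified
    predictions = []
    for _ in range(horizon):
        next_value = step_fn(context)
        predictions.append(next_value)
        context.append(next_value)  # feed the prediction back in as input
    return predictions


history = [1.0, 2.0, 3.0, 4.0]
print(direct_model(history, 3))  # [5.0, 6.0, 7.0]
# The recursive forecast works for any horizon, since it just loops longer.
print(len(recursive_forecast(one_step_model, history, 5)))  # 5
```

The recursive version reuses one small model for every step, at the cost of feeding its own (possibly wrong) predictions back in; the direct version has to learn the whole horizon at once.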

This chapter covers both approaches to multi-step forecasting.

For the multi-step models, the training data will again consist of hourly samples. Here, however, the models will learn to predict the next 24 hours, given the past 24 hours.

Here is a `WindowGenerator` object that generates these slices from the dataset:

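```python
# WindowGenerator, defined in an earlier lesson, slices the data
# into (inputs, labels) windows.
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

# Plotting the window
multi_window.plot()
multi_window
```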

```
Total window size: 48
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Label column name(s): None
```

In the plot above, the blue dots represent the input features, whereas the green dots represent the labels to be predicted.
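The index lists printed above follow from simple arithmetic on the three constructor arguments. The NumPy sketch below reproduces them and then cuts a toy series into (input, label) pairs. It assumes, consistent with the printed output, that the total window size is `input_width + shift` and that the labels are the last `label_width` steps of each window; `window_indices` and `make_windows` are hypothetical helpers, not part of `WindowGenerator`.

```python
import numpy as np


def window_indices(input_width, label_width, shift):
    # Assumption: total window = input_width + shift, labels at the end.
    total = input_width + shift
    all_idx = np.arange(total)
    input_idx = all_idx[:input_width]
    label_idx = all_idx[total - label_width:]
    return total, input_idx, label_idx


def make_windows(series, input_width, label_width, shift):
    """Hypothetical helper: slice a (time, features) array into windows."""
    total, input_idx, label_idx = window_indices(input_width, label_width, shift)
    starts = range(len(series) - total + 1)
    windows = np.stack([series[s:s + total] for s in starts])  # (n, total, features)
    return windows[:, input_idx, :], windows[:, label_idx, :]


OUT_STEPS = 24
total, input_idx, label_idx = window_indices(24, OUT_STEPS, OUT_STEPS)
print(total)                        # 48
print(input_idx[0], input_idx[-1])  # 0 23
print(label_idx[0], label_idx[-1])  # 24 47

# Toy series: 100 hourly steps with 2 features each.
series = np.arange(200, dtype=float).reshape(100, 2)
inputs, labels = make_windows(series, 24, OUT_STEPS, OUT_STEPS)
print(inputs.shape, labels.shape)   # (53, 24, 2) (53, 24, 2)
```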

#### Creating a baseline

A simple baseline for this task is to repeat the last input time step for the required number of output timesteps:

```python
class MultiStepLastBaseline(tf.keras.Model):
  def call(self, inputs):
    # Repeat the last input time step OUT_STEPS times along the time axis.
    return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.losses.MeanSquaredError(),
                      metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
# Test-set performance is stored separately from validation performance.
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)
```

```
437/437 [==============================] - 1s 2ms/step - loss: 0.6285 - mean_absolute_error: 0.5007
```

The plots show that this baseline simply returns the last input value as its output for each of the next 24 hours.
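`tf.tile` copies the last time step `OUT_STEPS` times along the time axis. The sketch below performs the same operation with NumPy's `np.tile` on a small random batch, only to make the shapes explicit. Note that slicing with `-1:` (rather than `-1`) keeps the time axis, so the result can be tiled along it.

```python
import numpy as np

OUT_STEPS = 24
batch, time, features = 2, 24, 3
inputs = np.random.default_rng(0).normal(size=(batch, time, features))

last_step = inputs[:, -1:, :]                     # (2, 1, 3): time axis kept
forecast = np.tile(last_step, (1, OUT_STEPS, 1))  # (2, 24, 3)

print(forecast.shape)  # (2, 24, 3)
# Every predicted step equals the last observed step.
print(np.allclose(forecast, inputs[:, -1:, :]))  # True
```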

Another baseline is to repeat the previous 24 hours of input unchanged, assuming that the next day will look like the last one:

```python
class RepeatBaseline(tf.keras.Model):
  def call(self, inputs):
    # Predict that the next 24 hours repeat the previous 24 hours.
    return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.losses.MeanSquaredError(),
                        metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(repeat_baseline)
```

```
437/437 [==============================] - 1s 2ms/step - loss: 0.4270 - mean_absolute_error: 0.3959
```

The plots show that the predictions of this baseline (orange crosses) exactly repeat the input data (blue dots), shifted 24 hours into the future.
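Why can repeating the previous day beat repeating the last hour? Hourly data with a strong daily cycle favors the repeat baseline. The sketch below uses a synthetic, purely periodic series (an assumption made for illustration; real data is noisier) to compare the mean absolute error (MAE) of the two baselines over one 24-step window.

```python
import numpy as np

OUT_STEPS = 24
hours = np.arange(2 * OUT_STEPS)
series = np.sin(2 * np.pi * hours / 24)  # synthetic signal with a 24-step period

inputs, labels = series[:OUT_STEPS], series[OUT_STEPS:]
last_pred = np.repeat(inputs[-1], OUT_STEPS)  # "Last" baseline
repeat_pred = inputs                          # "Repeat" baseline


def mae(pred):
    return np.mean(np.abs(pred - labels))


print(f"Last  : {mae(last_pred):.4f}")
print(f"Repeat: {mae(repeat_pred):.4f}")  # 0.0000: the cycle repeats exactly
```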

#### Single-shot models for multi-step forecasting

One high-level approach to this problem is to use a “single-shot” model, where the model makes the entire sequence prediction in a single step. Such a forecasting method is also sometimes referred to as a direct method.

This can be implemented efficiently as a `layers.Dense` with `OUT_STEPS*features` output units. The model then just needs to reshape that output to the required `(OUT_STEPS, features)`.
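In shape terms, the single-shot head is one matrix multiply followed by a reshape. This NumPy sketch uses random weights and `num_features = 19` (the feature count seen in the prediction shapes later in this lesson) purely to trace the shapes; it is not a trained model.

```python
import numpy as np

OUT_STEPS, num_features = 24, 19
batch, time = 32, 24
rng = np.random.default_rng(0)

inputs = rng.normal(size=(batch, time, num_features))
last_step = inputs[:, -1:, :]                         # (32, 1, 19)

# A Dense layer computes x @ W + b over the last axis.
W = rng.normal(size=(num_features, OUT_STEPS * num_features))
b = np.zeros(OUT_STEPS * num_features)
flat = last_step @ W + b                              # (32, 1, 456)

# Reshape the flat output into one row of features per future step.
predictions = flat.reshape(batch, OUT_STEPS, num_features)
print(flat.shape, predictions.shape)  # (32, 1, 456) (32, 24, 19)
```

The Keras `Reshape([OUT_STEPS, num_features])` layer in the models below expresses the same reshape without the batch dimension, which Keras handles implicitly.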

###### 1. Using a linear model

A simple linear model based on the last input time step does better than either baseline, but it is underpowered. The model needs to predict `OUT_STEPS` time steps from a single input time step using a linear projection, so it can only capture a low-dimensional slice of the behavior, likely based mainly on the time of day and time of year.

```python
multi_linear_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_linear_model)
```

```
437/437 [==============================] - 1s 2ms/step - loss: 0.2556 - mean_absolute_error: 0.3057
```

###### 2. Using a dense model

Adding a `layers.Dense` between the input and output gives the linear model more power, but the model is still based on only a single input time step.
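The extra layer inserts a ReLU nonlinearity between the last time step and the output projection. A NumPy sketch of the forward pass, with random weights, no biases, and the 512 hidden units used in the model below, shows that the forecast still depends on only one time step:

```python
import numpy as np

OUT_STEPS, num_features, hidden_units = 24, 19, 512
rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.1, size=(num_features, hidden_units))
W2 = rng.normal(scale=0.1, size=(hidden_units, OUT_STEPS * num_features))


def dense_forward(inputs):
    x = inputs[:, -1:, :]                # keep only the last time step
    hidden = np.maximum(x @ W1, 0.0)     # hidden layer with ReLU (biases omitted)
    out = hidden @ W2                    # (batch, 1, OUT_STEPS*num_features)
    return out.reshape(inputs.shape[0], OUT_STEPS, num_features)


inputs = rng.normal(size=(4, 24, num_features))
print(dense_forward(inputs).shape)  # (4, 24, 19)

# Perturbing an earlier time step does not change the forecast.
perturbed = inputs.copy()
perturbed[:, 0, :] += 100.0
print(np.array_equal(dense_forward(inputs), dense_forward(perturbed)))  # True
```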

```python
multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
```

```
437/437 [==============================] - 1s 2ms/step - loss: 0.2202 - mean_absolute_error: 0.2803
```

###### 3. Using a Convolutional Neural Network (CNN)

A convolutional model makes predictions based on a fixed-width history, which may lead to better performance than the dense model since it can see how things are changing over time:
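With `kernel_size=CONV_WIDTH` and Keras' default `padding='valid'`, a convolution over exactly `CONV_WIDTH` steps produces `CONV_WIDTH - CONV_WIDTH + 1 = 1` output step, which is why the Dense head in the model below again sees a single time step. The hand-rolled valid 1-D convolution in NumPy below (random filters, no bias or activation; an illustration, not Keras' implementation) makes that shape arithmetic visible:

```python
import numpy as np


def conv1d_valid(x, kernels):
    """Valid 1-D cross-correlation (no bias, no activation).
    x: (batch, time, in_ch); kernels: (k, in_ch, out_ch)."""
    k = kernels.shape[0]
    steps = x.shape[1] - k + 1  # output length with 'valid' padding
    out = np.stack(
        [np.einsum("bki,kio->bo", x[:, t:t + k, :], kernels) for t in range(steps)],
        axis=1,
    )
    return out  # (batch, steps, out_ch)


CONV_WIDTH, num_features, filters = 3, 19, 256
rng = np.random.default_rng(2)
inputs = rng.normal(size=(8, 24, num_features))
kernels = rng.normal(size=(CONV_WIDTH, num_features, filters))

last = inputs[:, -CONV_WIDTH:, :]           # what the Lambda layer passes on
print(conv1d_valid(last, kernels).shape)    # (8, 1, 256)
print(conv1d_valid(inputs, kernels).shape)  # (8, 22, 256) without the slice
```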

```python
CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()

multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_conv_model)
```

```
437/437 [==============================] - 1s 2ms/step - loss: 0.2142 - mean_absolute_error: 0.2798
```

###### 4. Using a Recurrent Neural Network (RNN)

A recurrent model can learn to use a long history of inputs, if it’s relevant to the predictions the model is making. Here the model will accumulate internal state for 24h, before making a single prediction for the next 24h.

In this single-shot format, the LSTM only needs to produce an output at the last time step, so set `return_sequences=False`.
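The difference between `return_sequences=True` and `False` is only which hidden states are returned. To keep it short, the NumPy sketch below uses a plain tanh recurrent cell with random weights rather than an LSTM (a deliberate simplification); `simple_rnn` is a hypothetical helper, not a Keras API.

```python
import numpy as np


def simple_rnn(inputs, W_x, W_h, return_sequences=False):
    batch, time, _ = inputs.shape
    h = np.zeros((batch, W_h.shape[0]))
    outputs = []
    for t in range(time):
        # Update the hidden state one time step at a time.
        h = np.tanh(inputs[:, t, :] @ W_x + h @ W_h)
        outputs.append(h)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, time, units): every step
    return h                              # (batch, units): final state only


rng = np.random.default_rng(3)
units, num_features = 32, 19
x = rng.normal(size=(4, 24, num_features))
W_x = rng.normal(scale=0.1, size=(num_features, units))
W_h = rng.normal(scale=0.1, size=(units, units))

seq = simple_rnn(x, W_x, W_h, return_sequences=True)
final = simple_rnn(x, W_x, W_h, return_sequences=False)
print(seq.shape, final.shape)  # (4, 24, 32) (4, 32)
```

The final state is just the last entry of the full sequence, so a single-shot head only needs that one vector.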

```python
multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units]
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()

multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
# Evaluate on the test split (not the training split) for a fair comparison.
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_lstm_model)
```

```
437/437 [==============================] - 1s 3ms/step - loss: 0.2133 - mean_absolute_error: 0.2833
```

#### Auto-regressive models for multi-step forecasting

All of the models above predict the entire output sequence in a single step.

In some cases, it may be helpful for the model to decompose this prediction into individual time steps. The model's output at each step can then be fed back into it as input, so that each prediction is conditioned on the previous one. Such a method is also sometimes referred to as a recursive method.

One clear advantage of this style of model is that it can produce output of varying length, because it simply keeps feeding its own output back in for as many steps as needed.

You could take any of the single-step multi-output models trained in the first half of this tutorial and run it in an autoregressive feedback loop, but here we'll focus on building a model that has been explicitly trained to do that.

###### Using a Recurrent Neural Network

This course only builds an autoregressive RNN model, but this pattern could be applied to any model that was designed to output a single timestep.

The model will have the same basic form as the single-step `LSTM` models we studied in the previous chapter: an `LSTM` followed by a `layers.Dense` that converts the `LSTM` outputs to model predictions.

In this case, the model has to manage the inputs for each step manually, so it uses `layers.LSTMCell` directly for the lower-level, single-time-step interface.
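`LSTMCell` processes one time step and returns its output together with a two-part state, the hidden state `h` and the cell state `c`. For reference, the NumPy sketch below writes out one step of the standard LSTM equations. The weight layout (one stacked matrix split into input, forget, candidate, and output blocks) is an assumption made for compactness and is not claimed to match how Keras stores its weights; `lstm_cell_step` is a hypothetical helper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x, state, W, U, b):
    """One LSTM step. x: (batch, in); state: [h, c], each (batch, units).
    W: (in, 4*units), U: (units, 4*units), b: (4*units,)."""
    h, c = state
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4, axis=-1)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # input, forget, output gates
    g = np.tanh(g)                                # candidate cell values
    c_new = f * c + i * g                         # update the cell memory
    h_new = o * np.tanh(c_new)                    # new hidden state (the output)
    return h_new, [h_new, c_new]


units, num_features, batch = 32, 19, 4
rng = np.random.default_rng(4)
W = rng.normal(scale=0.1, size=(num_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)

state = [np.zeros((batch, units)), np.zeros((batch, units))]
x = rng.normal(size=(batch, num_features))  # e.g. the previous prediction
out, state = lstm_cell_step(x, state, W, U, b)
print(out.shape, len(state), state[1].shape)  # (4, 32) 2 (4, 32)
```

This mirrors the `x, state = self.lstm_cell(x, states=state, training=training)` step in the model's `call` method: the input at each step has `num_features` values because it is the previous prediction.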

```python
class FeedBack(tf.keras.Model):
  def __init__(self, units, out_steps):
    super().__init__()
    self.out_steps = out_steps
    self.units = units
    self.lstm_cell = tf.keras.layers.LSTMCell(units)
    # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
    self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
    self.dense = tf.keras.layers.Dense(num_features)

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)
```

The first method this model needs is a `warmup` method that initializes its internal state based on the inputs. Once trained, this state will capture the relevant parts of the input history. This is equivalent to the single-step `LSTM` model from earlier:

```python
def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)

  # predictions.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

FeedBack.warmup = warmup
```

This method returns a single time-step prediction and the internal state of the LSTM. We can check the shape of the prediction:

```python
prediction, state = feedback_model.warmup(multi_window.example[0])
prediction.shape
```

```
TensorShape([32, 19])
```

With the `RNN`'s state and an initial prediction, we can now continue iterating the model, feeding the prediction at each step back in as the input. Repeating this, we can make predictions much further into the future.

The simplest approach to collecting the output predictions is to use a Python list and call `tf.stack` after the loop.
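Stacking a list of per-step `(batch, features)` tensors along a new leading axis gives a time-major `(time, batch, features)` result, which is why the code below transposes it afterwards. The same bookkeeping in NumPy, with random arrays standing in for the per-step predictions:

```python
import numpy as np

batch, out_steps, num_features = 32, 24, 19
rng = np.random.default_rng(5)

predictions = []
for _ in range(out_steps):
    # Stand-in for one step's prediction: (batch, features)
    predictions.append(rng.normal(size=(batch, num_features)))

stacked = np.stack(predictions)                 # (time, batch, features)
batch_major = np.transpose(stacked, [1, 0, 2])  # (batch, time, features)
print(stacked.shape, batch_major.shape)  # (24, 32, 19) (32, 24, 19)

# Stacking along axis 1 directly gives the same result without a transpose.
print(np.array_equal(batch_major, np.stack(predictions, axis=1)))  # True
```

`tf.stack` also accepts an `axis` argument, so `tf.stack(predictions, axis=1)` would be an equivalent alternative to stacking and then transposing.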

```python
def call(self, inputs, training=None):
  # Collect the per-step predictions in a Python list.
  predictions = []
  # Initialize the LSTM state.
  prediction, state = self.warmup(inputs)

  # Insert the first prediction.
  predictions.append(prediction)

  # Run the rest of the prediction steps.
  for n in range(1, self.out_steps):
    # Use the last prediction as input.
    x = prediction
    # Execute one LSTM step.
    x, state = self.lstm_cell(x, states=state, training=training)
    # Convert the LSTM output to a prediction.
    prediction = self.dense(x)
    # Add the prediction to the output.
    predictions.append(prediction)

  # predictions.shape => (time, batch, features)
  predictions = tf.stack(predictions)
  # predictions.shape => (batch, time, features)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions

FeedBack.call = call
```

We can now test-run this model on the example inputs:

```python
print('Output shape (batch, time, features): ', feedback_model(multi_window.example[0]).shape)
```

```
Output shape (batch, time, features): (32, 24, 19)
```

Now train the model:

```python
history = compile_and_fit(feedback_model, multi_window)

IPython.display.clear_output()

multi_val_performance['AR LSTM'] = feedback_model.evaluate(multi_window.val)
multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(feedback_model)
```

```
437/437 [==============================] - 3s 6ms/step - loss: 0.2250 - mean_absolute_error: 0.2998
```

#### Performance

In this final section of the chapter, we compare the performance of all the models built in this lesson.

```python
x = np.arange(len(multi_performance))
width = 0.3

metric_name = 'mean_absolute_error'
# Look up the position of the MAE metric in the evaluate() results.
metric_index = multi_lstm_model.metrics_names.index(metric_name)
val_mae = [v[metric_index] for v in multi_val_performance.values()]
test_mae = [v[metric_index] for v in multi_performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=multi_performance.keys(), rotation=45)
plt.ylabel('MAE (average over all times and outputs)')
_ = plt.legend()
```

The metrics for the multi-output models in the first half of this tutorial show the performance averaged across all output features. These performances are similar, but are also averaged across output time steps.
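Concretely, a single MAE number for a multi-step model averages the absolute error over every batch element, output time step, and feature. The NumPy sketch below (random stand-in predictions and labels, not results from the models above) computes that single number alongside a per-time-step breakdown that the single number hides:

```python
import numpy as np

batch, out_steps, num_features = 32, 24, 19
rng = np.random.default_rng(6)
labels = rng.normal(size=(batch, out_steps, num_features))
preds = labels + rng.normal(scale=0.3, size=labels.shape)  # stand-in predictions

abs_err = np.abs(preds - labels)
overall_mae = abs_err.mean()              # mean over batch, time, and features
mae_per_step = abs_err.mean(axis=(0, 2))  # (24,): one value per forecast hour
print(f"{overall_mae:.4f}", mae_per_step.shape)

# With equal-sized groups, the overall MAE is the mean of the per-step MAEs.
print(np.isclose(overall_mae, mae_per_step.mean()))  # True
```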

```python
for name, value in multi_performance.items():
  print(f'{name:8s}: {value[1]:0.4f}')
```

```
Last    : 0.5007
Repeat  : 0.3774
Linear  : 0.2986
Dense   : 0.2758
Conv    : 0.2749
LSTM    : 0.2727
AR LSTM : 0.2927
```

The gains achieved going from a dense model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on **this** problem, but there was no way to know without trying, and these models could be helpful for your problem.

## End of the Course

With this, we have come to the end of our course on time-series forecasting with TensorFlow 2.0. We hope this course serves as a stepping stone on your journey in deep learning with TensorFlow 2.0. If you have any questions or feedback, please let us know in the comments section.

Also, now that you are able to code in Python, you can browse our intermediate and expert courses and enroll in a domain-specific one.

(As a reminder, we are constantly updating our courses, so make sure to check back in the future!)