Both the single-output and multiple-output models in the previous sections made single-time-step predictions, i.e., one hour into the future. In this lesson, we will go over how to build different multi-step time-series forecasting models using TensorFlow 2.0.
In a multi-step prediction, the model needs to learn to predict a range of future values. Thus, unlike a single-step model, where only a single future point is predicted, a multi-step model predicts a sequence of future values.
Multi-step forecasting can be done using either of the following two approaches:
- Direct method, where the entire sequence of future values is predicted at once.
- Recursive method, where the model only makes single-step predictions, and each prediction is fed back into the model as the input for the next step.
This chapter will cover both approaches for multi-step forecasting.
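To make the distinction concrete, here is a minimal, self-contained sketch of the two strategies. The single_step_model and direct_model below are hypothetical toy stand-ins for illustration only, not the models built later in this lesson:

import numpy as np

# Hypothetical stand-ins for trained models, for illustration only:
# a "single-step model" that predicts the mean of its input window, and a
# "direct model" that repeats the last observed value over the whole horizon.
single_step_model = lambda window: window.mean()
direct_model = lambda window: np.repeat(window[-1], 24)

history = np.sin(np.linspace(0, 10, 100))  # a made-up series

# Direct method: predict the whole 24-step horizon in one call.
direct_forecast = direct_model(history[-24:])

# Recursive method: predict one step at a time, feeding each prediction back in.
window = list(history[-24:])
recursive_forecast = []
for _ in range(24):
    next_value = single_step_model(np.array(window[-24:]))
    recursive_forecast.append(next_value)
    window.append(next_value)

print(direct_forecast.shape, len(recursive_forecast))  # (24,) 24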
For the multi-step model, the training data will again consist of hourly samples. However, here, the models will learn to predict 24 hours into the future, given 24 hours of the past.
Here is a WindowGenerator object that generates these slices from the dataset:
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

# Plotting the window
multi_window.plot()
multi_window

Total window size: 48
Input indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Label column name(s): None
In the above diagram, the blue dots represent the input features whereas the green dots represent the labels to be predicted.
Creating a baseline
A simple baseline for this task is to repeat the last input time step for the required number of output timesteps:
class MultiStepLastBaseline(tf.keras.Model):
  def call(self, inputs):
    return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.losses.MeanSquaredError(),
                      metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)
437/437 [==============================] - 1s 2ms/step - loss: 0.6285 - mean_absolute_error: 0.5007
From the above-plotted graphs, we can clearly see that our baseline model simply repeats the last input value as its prediction for the next 24 hours.
Another approach to creating a baseline model is to repeat the exact same pattern of the previous 24 hours (the input data), as shown below:
class RepeatBaseline(tf.keras.Model):
  def call(self, inputs):
    return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.losses.MeanSquaredError(),
                        metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(repeat_baseline)
437/437 [==============================] - 1s 2ms/step - loss: 0.4270 - mean_absolute_error: 0.3959
From the above figures, we can clearly observe that the predictions from our baseline model (represented by the orange crosses) are exactly the same as the input data (represented by the blue dots).
Single-shot models for multi-step forecasting
One high-level approach to this problem is to use a “single-shot” model, where the model makes the entire sequence prediction in a single step. Such a forecasting method is also sometimes referred to as a direct method.
This can be implemented efficiently as a layers.Dense with OUT_STEPS*features output units. The model just needs to reshape that output to the required (OUT_STEPS, features) shape.
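To make the shape bookkeeping concrete, here is a minimal shape-check sketch of that Dense-plus-Reshape idea on a dummy batch (assuming the 19 features and batch size of 32 used elsewhere in this lesson):

import tensorflow as tf

OUT_STEPS = 24
num_features = 19  # number of features in this course's dataset

# A dummy batch of 32 input windows, each 24 time steps long.
dummy_inputs = tf.zeros([32, 24, num_features])

last_step = dummy_inputs[:, -1:, :]                                  # (32, 1, 19)
flat = tf.keras.layers.Dense(OUT_STEPS * num_features)(last_step)    # (32, 1, 456)
seq = tf.keras.layers.Reshape([OUT_STEPS, num_features])(flat)       # (32, 24, 19)
print(seq.shape)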
1. Using a linear model
A simple linear model based on the last input time step does better than either baseline, but it is underpowered. The model needs to predict OUT_STEPS time steps from a single input time step with a linear projection. It can only capture a low-dimensional slice of the behavior, likely based mainly on the time of day and time of year.
multi_linear_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_linear_model)
437/437 [==============================] - 1s 2ms/step - loss: 0.2556 - mean_absolute_error: 0.3057
2. Using a dense model
Adding a layers.Dense between the input and output gives the linear model more power, but it is still based on only a single input time step.
multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
437/437 [==============================] - 1s 2ms/step - loss: 0.2202 - mean_absolute_error: 0.2803
3. Using a Convolutional Neural Network (CNN)
A convolutional model makes predictions based on a fixed-width history, which may lead to better performance than the dense model since it can see how things are changing over time:
CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=CONV_WIDTH),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_conv_model)
437/437 [==============================] - 1s 2ms/step - loss: 0.2142 - mean_absolute_error: 0.2798
4. Using a Recurrent Neural Network (RNN)
A recurrent model can learn to use a long history of inputs, if it’s relevant to the predictions the model is making. Here the model will accumulate internal state for 24h, before making a single prediction for the next 24h.
In this single-shot format, the LSTM only needs to produce an output at the last time step, so set return_sequences=False.
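As a quick check of what this flag controls, here is a small sketch on a dummy batch showing the output shapes with and without return_sequences:

import tensorflow as tf

x = tf.zeros([32, 24, 19])  # dummy batch: [batch, time, features]

# return_sequences=False: only the output at the last time step.
print(tf.keras.layers.LSTM(32, return_sequences=False)(x).shape)  # (32, 32)

# return_sequences=True: an output for every input time step.
print(tf.keras.layers.LSTM(32, return_sequences=True)(x).shape)   # (32, 24, 32)

With that in place, the single-shot LSTM model is built as follows: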
multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units]
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()
multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_lstm_model)
437/437 [==============================] - 1s 3ms/step - loss: 0.2133 - mean_absolute_error: 0.2833
Auto-regressive models for multi-step forecasting
The above models all predict the entire output sequence in a single shot.
In some cases, it may be helpful for the model to decompose this prediction into individual time steps. Then the model's output at each step can be fed back into itself as input, so that each prediction is conditioned on the previous one. Such a method is also sometimes referred to as a recursive method.
One clear advantage of this style of model is that it can produce output of varying length, since the output itself is fed back into the model.
You could take any of the single-step, multi-output models trained in the first half of this tutorial and run them in an autoregressive feedback loop, but here we'll focus on building a model that has been explicitly trained to do that.
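For intuition, here is a rough sketch of such a feedback loop around a generic one-step model. The untrained one_step_model below is only a hypothetical placeholder, not the FeedBack model built in this section; note how the steps argument lets us pick the forecast length at call time, which is the varying-length advantage mentioned above:

import tensorflow as tf

num_features = 19  # number of features in this course's dataset

# A hypothetical, untrained single-step model: maps a (batch, 24, features)
# window to one (batch, features) prediction. For illustration only.
one_step_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_features),
])

def autoregressive_forecast(model, inputs, steps=24):
  """Repeatedly predict one step and append it to the input window."""
  window = inputs
  predictions = []
  for _ in range(steps):
    next_step = model(window)                        # (batch, features)
    predictions.append(next_step)
    # Drop the oldest time step and append the newest prediction.
    window = tf.concat([window[:, 1:, :], next_step[:, tf.newaxis, :]], axis=1)
  return tf.stack(predictions, axis=1)               # (batch, steps, features)

example_inputs = tf.zeros([32, 24, num_features])
print(autoregressive_forecast(one_step_model, example_inputs).shape)  # (32, 24, 19)

The FeedBack model built below follows the same idea, but uses an LSTM cell and is trained end to end.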
Using a Recurrent Neural Network
This course only builds an autoregressive RNN model, but this pattern could be applied to any model that was designed to output a single timestep.
The model will have the same basic form as the single-step LSTM models we studied in the previous chapter: an LSTM followed by a layers.Dense that converts the LSTM outputs to model predictions.
In this case, the model has to manually manage the inputs for each step, so it uses layers.LSTMCell directly for the lower-level, single-time-step interface.
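As a quick look at that lower-level interface (a sketch assuming the TF 2.x Keras API), an LSTMCell is called with the input for a single time step plus the previous state, and returns the new output and updated state:

import tensorflow as tf

cell = tf.keras.layers.LSTMCell(32)

# One time step of inputs for a batch of 8 examples with 19 features.
x_t = tf.zeros([8, 19])
state = cell.get_initial_state(batch_size=8, dtype=tf.float32)

# The cell consumes one step and returns the output plus the updated state.
output, state = cell(x_t, states=state)
print(output.shape)  # (8, 32)

Using this cell, the FeedBack model is defined as follows: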
class FeedBack(tf.keras.Model):
  def __init__(self, units, out_steps):
    super().__init__()
    self.out_steps = out_steps
    self.units = units
    self.lstm_cell = tf.keras.layers.LSTMCell(units)
    # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
    self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
    self.dense = tf.keras.layers.Dense(num_features)

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)
The first method this model needs is a warmup method to initialize its internal state based on the inputs. Once trained, this state will capture the relevant parts of the input history. This is equivalent to the single-step LSTM model from earlier:
def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)

  # predictions.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

FeedBack.warmup = warmup
This method returns a single time-step prediction and the internal state of the LSTM. We can check the shape of the prediction made by the model as follows:
prediction, state = feedback_model.warmup(multi_window.example[0])
prediction.shape
TensorShape([32, 19])
With the RNN's state and an initial prediction, we can now continue iterating the model, feeding the prediction at each step back in as the input. Using this approach iteratively, we can make predictions much further into the future.
The simplest approach to collecting the output predictions is to use a Python list and tf.stack after the loop.
def call(self, inputs, training=None):
  # Collect the dynamically unrolled outputs in a Python list.
  predictions = []
  # Initialize the LSTM state.
  prediction, state = self.warmup(inputs)

  # Insert the first prediction.
  predictions.append(prediction)

  # Run the rest of the prediction steps.
  for n in range(1, self.out_steps):
    # Use the last prediction as input.
    x = prediction
    # Execute one lstm step.
    x, state = self.lstm_cell(x, states=state, training=training)
    # Convert the lstm output to a prediction.
    prediction = self.dense(x)
    # Add the prediction to the output.
    predictions.append(prediction)

  # predictions.shape => (time, batch, features)
  predictions = tf.stack(predictions)
  # predictions.shape => (batch, time, features)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions

FeedBack.call = call

We can now do a test run of this model on the example inputs:
print('Output shape (batch, time, features): ', feedback_model(multi_window.example[0]).shape)
Output shape (batch, time, features): (32, 24, 19)
Now train the model:
history = compile_and_fit(feedback_model, multi_window)

IPython.display.clear_output()
multi_val_performance['AR LSTM'] = feedback_model.evaluate(multi_window.val)
multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(feedback_model)
437/437 [==============================] - 3s 6ms/step - loss: 0.2250 - mean_absolute_error: 0.2998
Performance
In this final section of the chapter, we are going to evaluate the performance of all the models built in this lesson.
x = np.arange(len(multi_performance))
width = 0.3

metric_name = 'mean_absolute_error'
# Look up the position of the MAE metric in the evaluate() output.
metric_index = multi_lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in multi_val_performance.values()]
test_mae = [v[metric_index] for v in multi_performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=multi_performance.keys(), rotation=45)
plt.ylabel('MAE (average over all times and outputs)')
_ = plt.legend()
The metrics for the multi-output models in the first half of this tutorial show the performance averaged across all output features. These performances are similar, but they are also averaged across output time steps.
for name, value in multi_performance.items():
  print(f'{name:8s}: {value[1]:0.4f}')

Last    : 0.5007
Repeat  : 0.3774
Linear  : 0.2986
Dense   : 0.2758
Conv    : 0.2749
LSTM    : 0.2727
AR LSTM : 0.2927
The gains achieved going from a dense model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on this problem, but there was no way to know without trying, and these models could be helpful for your problem.
End of the Course
With this, we have come to the end of our course on time-series forecasting with TensorFlow 2.0. We hope that this course served as a stepping stone on your journey into deep learning with TensorFlow 2.0. If you have any questions or feedback, please feel free to let us know in the comment section.
Also, you can continue learning by enrolling in a domain-specific course from our collection of data science courses.
(As a reminder, we are constantly updating our courses, so make sure to check back in the future as well!)
Do you want to learn Python, Data Science, and Machine Learning while getting certified? Here are some best-selling Datacamp courses that we recommend you enroll in:
- Introduction to Python (Free Course) - 1,000,000+ students already enrolled!
- Introduction to Data Science in Python - 400,000+ students already enrolled!
- Introduction to TensorFlow for Deep Learning with Python - 90,000+ students already enrolled!
- Data Science and Machine Learning Bootcamp with R - 70,000+ students already enrolled!