Both the single-output and multiple-output models in the previous sections made single-time-step predictions, i.e., one hour into the future. In this lesson, we will go over how to build different multi-step time-series forecasting models using TensorFlow 2.0.
In a multi-step prediction, the model needs to learn to predict a range of future values. Thus, unlike a single-step model, where only a single future point is predicted, a multi-step model predicts a sequence of future values.
Multi-step forecasting can be done using either of the following two approaches:
- Direct method, where the entire sequence of future values is predicted at once.
- Recursive method, where the model only makes single-step predictions, and each prediction is fed back into the model as the input for the next step.
This chapter will cover both approaches for multi-step forecasting.
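To make the distinction concrete, here is a minimal, self-contained sketch of the two strategies. The single_step_model and direct_model below are hypothetical toy stand-ins for illustration only, not the models built later in this lesson:

import numpy as np

# Hypothetical stand-ins for trained models, for illustration only:
# a "single-step model" that predicts the mean of its input window, and a
# "direct model" that repeats the last observed value over the whole horizon.
single_step_model = lambda window: window.mean()
direct_model = lambda window: np.repeat(window[-1], 24)

history = np.sin(np.linspace(0, 10, 100))  # a made-up series

# Direct method: predict the whole 24-step horizon in one call.
direct_forecast = direct_model(history[-24:])

# Recursive method: predict one step at a time, feeding each prediction back in.
window = list(history[-24:])
recursive_forecast = []
for _ in range(24):
    next_value = single_step_model(np.array(window[-24:]))
    recursive_forecast.append(next_value)
    window.append(next_value)

print(direct_forecast.shape, len(recursive_forecast))  # (24,) 24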
For the multi-step model, the training data will again consist of hourly samples. However, here, the models will learn to predict 24 hours into the future, given 24 hours of the past.
Here is a WindowGenerator object that generates these slices from the dataset:
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

# Plotting the window
multi_window.plot()
multi_window

Total window size: 48
Input indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Label column name(s): None
In the above diagram, the blue dots represent the input features whereas the green dots represent the labels to be predicted.
Creating a baseline
A simple baseline for this task is to repeat the last input time step for the required number of output timesteps:
class MultiStepLastBaseline(tf.keras.Model):
  def call(self, inputs):
    return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.losses.MeanSquaredError(),
                      metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)
437/437 [==============================] - 1s 2ms/step - loss: 0.6285 - mean_absolute_error: 0.5007
From the above-plotted graphs, we can clearly see that our baseline model simply repeats the last input value as its prediction for the next 24 hours.
Another approach to creating a baseline model is to repeat the exact same pattern of the previous 24 hours (the input data), as shown below:
class RepeatBaseline(tf.keras.Model):
  def call(self, inputs):
    return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.losses.MeanSquaredError(),
                        metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(repeat_baseline)
437/437 [==============================] - 1s 2ms/step - loss: 0.4270 - mean_absolute_error: 0.3959
From the above figures, we can clearly observe that the predictions from our baseline model (represented by the orange crosses) are exactly the same as the input data (represented by the blue dots).
Single-shot models for multi-step forecasting
One high-level approach to this problem is to use a “single-shot” model, where the model makes the entire sequence prediction in a single step. Such a forecasting method is also sometimes referred to as a direct method.
This can be implemented efficiently as a layers.Dense with OUT_STEPS*features output units. The model just needs to reshape that output to the required (OUT_STEPS, features) shape.
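To make the shape bookkeeping concrete, here is a minimal shape-check sketch of that Dense-plus-Reshape idea on a dummy batch (assuming the 19 features and batch size of 32 used elsewhere in this lesson):

import tensorflow as tf

OUT_STEPS = 24
num_features = 19  # number of features in this course's dataset

# A dummy batch of 32 input windows, each 24 time steps long.
dummy_inputs = tf.zeros([32, 24, num_features])

last_step = dummy_inputs[:, -1:, :]                                  # (32, 1, 19)
flat = tf.keras.layers.Dense(OUT_STEPS * num_features)(last_step)    # (32, 1, 456)
seq = tf.keras.layers.Reshape([OUT_STEPS, num_features])(flat)       # (32, 24, 19)
print(seq.shape)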
1. Using a linear model
A simple linear model based on the last input time step does better than either baseline, but it is underpowered. The model needs to predict OUT_STEPS time steps from a single input time step with a linear projection. It can only capture a low-dimensional slice of the behavior, likely based mainly on the time of day and time of year.
multi_linear_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_linear_model)
437/437 [==============================] - 1s 2ms/step - loss: 0.2556 - mean_absolute_error: 0.3057
2. Using a dense model
Adding a layers.Dense between the input and output gives the linear model more power, but it is still based on only a single input time step.
multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
437/437 [==============================] - 1s 2ms/step - loss: 0.2202 - mean_absolute_error: 0.2803
3. Using a Convolutional Neural Network (CNN)
A convolutional model makes predictions based on a fixed-width history, which may lead to better performance than the dense model since it can see how things are changing over time:
CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=CONV_WIDTH),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_conv_model)
437/437 [==============================] - 1s 2ms/step - loss: 0.2142 - mean_absolute_error: 0.2798
4. Using a Recurrent Neural Network (RNN)
A recurrent model can learn to use a long history of inputs, if it’s relevant to the predictions the model is making. Here the model will accumulate internal state for 24h, before making a single prediction for the next 24h.
In this single-shot format, the LSTM only needs to produce an output at the last time step, so set return_sequences=False.
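As a quick check of what this flag controls, here is a small sketch on a dummy batch showing the output shapes with and without return_sequences:

import tensorflow as tf

x = tf.zeros([32, 24, 19])  # dummy batch: [batch, time, features]

# return_sequences=False: only the output at the last time step.
print(tf.keras.layers.LSTM(32, return_sequences=False)(x).shape)  # (32, 32)

# return_sequences=True: an output for every input time step.
print(tf.keras.layers.LSTM(32, return_sequences=True)(x).shape)   # (32, 24, 32)

With that in place, the single-shot LSTM model is built as follows: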
multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units]
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()
multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_lstm_model)
437/437 [==============================] - 1s 3ms/step - loss: 0.2133 - mean_absolute_error: 0.2833
Auto-regressive models for multi-step forecasting
The above models all predict the entire output sequence in a single shot.
In some cases, it may be helpful for the model to decompose this prediction into individual time steps. Then the model's output at each step can be fed back into itself as input, so that each prediction is conditioned on the previous one. Such a method is also sometimes referred to as a recursive method.
One clear advantage of this style of model is that it can produce output of varying length, since the output itself is fed back into the model.
You could take any of the single-step, multi-output models trained in the first half of this tutorial and run them in an autoregressive feedback loop, but here we'll focus on building a model that has been explicitly trained to do that.
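For intuition, here is a rough sketch of such a feedback loop around a generic one-step model. The untrained one_step_model below is only a hypothetical placeholder, not the FeedBack model built in this section; note how the steps argument lets us pick the forecast length at call time, which is the varying-length advantage mentioned above:

import tensorflow as tf

num_features = 19  # number of features in this course's dataset

# A hypothetical, untrained single-step model: maps a (batch, 24, features)
# window to one (batch, features) prediction. For illustration only.
one_step_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(num_features),
])

def autoregressive_forecast(model, inputs, steps=24):
  """Repeatedly predict one step and append it to the input window."""
  window = inputs
  predictions = []
  for _ in range(steps):
    next_step = model(window)                        # (batch, features)
    predictions.append(next_step)
    # Drop the oldest time step and append the newest prediction.
    window = tf.concat([window[:, 1:, :], next_step[:, tf.newaxis, :]], axis=1)
  return tf.stack(predictions, axis=1)               # (batch, steps, features)

example_inputs = tf.zeros([32, 24, num_features])
print(autoregressive_forecast(one_step_model, example_inputs).shape)  # (32, 24, 19)

The FeedBack model built below follows the same idea, but uses an LSTM cell and is trained end to end.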
Using a Recurrent Neural Network
This course only builds an autoregressive RNN model, but this pattern could be applied to any model that was designed to output a single timestep.
The model will have the same basic form as the single-step LSTM models we studied in the previous chapter: an LSTM followed by a layers.Dense that converts the LSTM outputs to model predictions.
In this case, the model has to manually manage the inputs for each step, so it uses layers.LSTMCell directly for the lower-level, single-time-step interface.
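As a quick look at that lower-level interface (a sketch assuming the TF 2.x Keras API), an LSTMCell is called with the input for a single time step plus the previous state, and returns the new output and updated state:

import tensorflow as tf

cell = tf.keras.layers.LSTMCell(32)

# One time step of inputs for a batch of 8 examples with 19 features.
x_t = tf.zeros([8, 19])
state = cell.get_initial_state(batch_size=8, dtype=tf.float32)

# The cell consumes one step and returns the output plus the updated state.
output, state = cell(x_t, states=state)
print(output.shape)  # (8, 32)

Using this cell, the FeedBack model is defined as follows: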
class FeedBack(tf.keras.Model):
  def __init__(self, units, out_steps):
    super().__init__()
    self.out_steps = out_steps
    self.units = units
    self.lstm_cell = tf.keras.layers.LSTMCell(units)
    # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
    self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
    self.dense = tf.keras.layers.Dense(num_features)

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)
The first method this model needs is a warmup method to initialize its internal state based on the inputs. Once trained, this state will capture the relevant parts of the input history. This is equivalent to the single-step LSTM model from earlier:
def warmup(self, inputs):
  # inputs.shape => (batch, time, features)
  # x.shape => (batch, lstm_units)
  x, *state = self.lstm_rnn(inputs)

  # predictions.shape => (batch, features)
  prediction = self.dense(x)
  return prediction, state

FeedBack.warmup = warmup
This method returns a single time-step prediction and the internal state of the LSTM. We can check the shape of the prediction made by the model as follows:
prediction, state = feedback_model.warmup(multi_window.example[0])
prediction.shape
TensorShape([32, 19])
With the RNN's state and an initial prediction, we can now continue iterating the model, feeding the prediction at each step back in as the input. Using this approach iteratively, we can make predictions much further into the future.
The simplest approach to collecting the output predictions is to use a Python list and tf.stack after the loop.
def call(self, inputs, training=None):
  # Collect the dynamically unrolled outputs in a Python list.
  predictions = []
  # Initialize the LSTM state.
  prediction, state = self.warmup(inputs)

  # Insert the first prediction.
  predictions.append(prediction)

  # Run the rest of the prediction steps.
  for n in range(1, self.out_steps):
    # Use the last prediction as input.
    x = prediction
    # Execute one lstm step.
    x, state = self.lstm_cell(x, states=state, training=training)
    # Convert the lstm output to a prediction.
    prediction = self.dense(x)
    # Add the prediction to the output.
    predictions.append(prediction)

  # predictions.shape => (time, batch, features)
  predictions = tf.stack(predictions)
  # predictions.shape => (batch, time, features)
  predictions = tf.transpose(predictions, [1, 0, 2])
  return predictions

FeedBack.call = call

We can now do a test run of this model on the example inputs:
print('Output shape (batch, time, features): ', feedback_model(multi_window.example[0]).shape)
Output shape (batch, time, features): (32, 24, 19)
Now train the model:
history = compile_and_fit(feedback_model, multi_window)

IPython.display.clear_output()
multi_val_performance['AR LSTM'] = feedback_model.evaluate(multi_window.val)
multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(feedback_model)
437/437 [==============================] - 3s 6ms/step - loss: 0.2250 - mean_absolute_error: 0.2998
Performance
In this final section of the chapter, we are going to evaluate the performance of all the models built in this lesson.
x = np.arange(len(multi_performance))
width = 0.3

metric_name = 'mean_absolute_error'
# Look up the position of the MAE metric in the evaluate() output.
metric_index = multi_lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in multi_val_performance.values()]
test_mae = [v[metric_index] for v in multi_performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=multi_performance.keys(), rotation=45)
plt.ylabel('MAE (average over all times and outputs)')
_ = plt.legend()
The metrics for the multi-output models in the first half of this tutorial show the performance averaged across all output features. These performances are similar, but they are also averaged across output time steps.
for name, value in multi_performance.items():
  print(f'{name:8s}: {value[1]:0.4f}')

Last    : 0.5007
Repeat  : 0.3774
Linear  : 0.2986
Dense   : 0.2758
Conv    : 0.2749
LSTM    : 0.2727
AR LSTM : 0.2927
The gains achieved going from a dense model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on this problem, but there was no way to know without trying, and these models could be helpful for your problem.
End of the Course
With this, we have come to the end of our course on time-series forecasting with TensorFlow 2.0. We hope that this course served as a stepping stone on your journey into deep learning with TensorFlow 2.0. If you have any questions or feedback, please feel free to let us know in the comment section.
Also, you can continue learning by enrolling in a domain-specific course from our collection of data science courses.
(As a reminder, we are constantly updating our courses, so make sure to check back in the future as well!)
Do you want to learn Python, Data Science, and Machine Learning while getting certified? Here are some best-selling Datacamp courses that we recommend you enroll in:
- Introduction to Python (Free Course) - 1,000,000+ students already enrolled!
- Introduction to Data Science in Python - 400,000+ students already enrolled!
- Introduction to TensorFlow for Deep Learning with Python - 90,000+ students already enrolled!
- Data Science and Machine Learning Bootcamp with R - 70,000+ students already enrolled!