# Time-Series Forecasting with TensorFlow 2.0

### Building Multi-Step Forecasting Models

Both the single-output and multiple-output models in the previous sections made single-time-step predictions, i.e., one hour into the future. In this lesson, we will build several multi-step time-series forecasting models using TensorFlow 2.0.

In a multi-step prediction, the model needs to learn to predict a range of future values. Thus, unlike a single-step model, where only a single future point is predicted, a multi-step model predicts a sequence of future values.

There are two broad approaches to multi-step forecasting:

1. Direct method, where the entire sequence of future values is predicted at once.
2. Recursive method, where the model makes only single-step predictions and each prediction is fed back into the model as input for the next step.

This chapter covers both approaches to multi-step forecasting.
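
The difference between the two approaches can be sketched without TensorFlow. The following is only an illustration, not one of the models built later in this lesson: `recursive_forecast`, `direct_forecast`, and the two toy predictors are hypothetical names introduced here, and the "models" are plain linear extrapolations.

```python
def recursive_forecast(step_fn, history, steps):
    """Recursive method: predict one step, feed it back, repeat."""
    window = list(history)
    predictions = []
    for _ in range(steps):
        next_value = step_fn(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest value, append the prediction.
        window = window[1:] + [next_value]
    return predictions


def direct_forecast(sequence_fn, history, steps):
    """Direct method: predict the whole horizon in one call."""
    return sequence_fn(list(history), steps)


def one_step_trend(window):
    # Toy single-step predictor: continue the most recent slope.
    return window[-1] + (window[-1] - window[-2])


def multi_step_trend(window, steps):
    # Toy multi-step predictor: extrapolate the most recent slope.
    slope = window[-1] - window[-2]
    return [window[-1] + slope * (k + 1) for k in range(steps)]


history = [1.0, 2.0, 3.0]
recursive = recursive_forecast(one_step_trend, history, steps=4)
direct = direct_forecast(multi_step_trend, history, steps=4)
print(recursive)
print(direct)
```

On this noise-free toy series both methods agree. With a learned model they generally differ, because the recursive method consumes its own (imperfect) predictions as inputs.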

For the multi-step model, the training data will again consist of hourly samples. However, here, the models will learn to predict 24h of the future, given 24h of the past.

Here is a `WindowGenerator` object that generates these slices from the dataset:

```python
OUT_STEPS = 24
multi_window = WindowGenerator(input_width=24,
                               label_width=OUT_STEPS,
                               shift=OUT_STEPS)

# Plotting the window
multi_window.plot()
multi_window
```
```
Total window size: 48
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47]
Label column name(s): None
```

In the above diagram, the blue dots represent the input features whereas the green dots represent the labels to be predicted.
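
The index arithmetic behind the printed summary can be reproduced in a few lines. This is a sketch under the assumption that a window spans `input_width + shift` time steps and that the labels occupy its last `label_width` positions, which is consistent with the output above. `window_indices` is a hypothetical helper, not part of `WindowGenerator`.

```python
def window_indices(input_width, label_width, shift):
    """Return (total window size, input indices, label indices)."""
    total_window_size = input_width + shift
    input_indices = list(range(input_width))
    # Labels are assumed to be the final `label_width` steps of the window.
    label_start = total_window_size - label_width
    label_indices = list(range(label_start, total_window_size))
    return total_window_size, input_indices, label_indices


total, input_idx, label_idx = window_indices(input_width=24,
                                             label_width=24,
                                             shift=24)
print("Total window size:", total)
print("Input indices:", input_idx)
print("Label indices:", label_idx)
```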

#### Creating a baseline

A simple baseline for this task is to repeat the last input time step for the required number of output timesteps:

```python
class MultiStepLastBaseline(tf.keras.Model):
    def call(self, inputs):
        # Repeat the last input time step OUT_STEPS times along the time axis.
        return tf.tile(inputs[:, -1:, :], [1, OUT_STEPS, 1])

last_baseline = MultiStepLastBaseline()
last_baseline.compile(loss=tf.losses.MeanSquaredError(),
                      metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance = {}
multi_performance = {}

# Validation metrics go in multi_val_performance, test metrics in multi_performance.
multi_val_performance['Last'] = last_baseline.evaluate(multi_window.val)
multi_performance['Last'] = last_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(last_baseline)
```
`437/437 [==============================] - 1s 2ms/step - loss: 0.6285 - mean_absolute_error: 0.5007`

From the above-plotted graphs, we can clearly see that our baseline model is returning the last input value as its output for the next 24h.

Another baseline is to repeat the pattern of the previous 24 hours (the input data) unchanged, on the assumption that tomorrow will look like today:

```python
class RepeatBaseline(tf.keras.Model):
    def call(self, inputs):
        # The forecast for the next 24h is simply the last 24h of inputs.
        return inputs

repeat_baseline = RepeatBaseline()
repeat_baseline.compile(loss=tf.losses.MeanSquaredError(),
                        metrics=[tf.metrics.MeanAbsoluteError()])

multi_val_performance['Repeat'] = repeat_baseline.evaluate(multi_window.val)
multi_performance['Repeat'] = repeat_baseline.evaluate(multi_window.test, verbose=0)
multi_window.plot(repeat_baseline)
```
`437/437 [==============================] - 1s 2ms/step - loss: 0.4270 - mean_absolute_error: 0.3959`

From the above figures, we can clearly observe that the predictions from our baseline model (represented by orange crosses) are exactly the same as the input data (represented by blue dots).
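
To see why repeating the previous day can beat repeating the last value, it may help to compare the two baselines on a synthetic signal. The sketch below assumes a noise-free sine wave with a 24-step period, which is far cleaner than real weather data. The names `signal` and `mae` are introduced here for illustration only.

```python
import math

OUT_STEPS = 24
PERIOD = 24  # assumed daily cycle, in hourly steps


def signal(t):
    # Synthetic, perfectly periodic "daily" pattern.
    return math.sin(2 * math.pi * t / PERIOD)


def mae(predictions, targets):
    # Mean absolute error over all forecast steps.
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


inputs = [signal(t) for t in range(24)]       # the past 24h
labels = [signal(t) for t in range(24, 48)]   # the next 24h

last_forecast = [inputs[-1]] * OUT_STEPS      # "Last" baseline
repeat_forecast = list(inputs)                # "Repeat" baseline

last_mae = mae(last_forecast, labels)
repeat_mae = mae(repeat_forecast, labels)
print(f"Last   MAE: {last_mae:.4f}")
print(f"Repeat MAE: {repeat_mae:.4f}")
```

On this idealized signal the repeat baseline is essentially exact, while the last-value baseline misses the whole cycle. On real data the gap is smaller, as the evaluation results above show.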

#### Single-shot models for multi-step forecasting

One high-level approach to this problem is to use a “single-shot” model, where the model makes the entire sequence prediction in a single step. Such a forecasting method is also sometimes referred to as a direct method.

This can be implemented efficiently as a `layers.Dense` with `OUT_STEPS*features` output units. The model then just needs to reshape that output to the required `(OUT_STEPS, features)`.
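
The project-then-reshape idea can be checked with plain NumPy. This is a shape-only sketch with random weights, not a trained layer. The batch size of 32 and the 19 features are taken from the shapes printed later in this lesson, and `W` and `b` are stand-ins for the dense layer's kernel and bias.

```python
import numpy as np

OUT_STEPS = 24
NUM_FEATURES = 19
BATCH = 32

rng = np.random.default_rng(0)

# The last input time step: [batch, 1, features]
last_step = rng.normal(size=(BATCH, 1, NUM_FEATURES))

# Stand-ins for a dense layer with OUT_STEPS*features output units.
W = rng.normal(size=(NUM_FEATURES, OUT_STEPS * NUM_FEATURES))
b = np.zeros(OUT_STEPS * NUM_FEATURES)

# Linear projection: [batch, 1, out_steps*features]
flat = last_step @ W + b

# Reshape: [batch, out_steps, features]
forecast = flat.reshape(BATCH, OUT_STEPS, NUM_FEATURES)

print(flat.shape)
print(forecast.shape)
```

Because the reshape is row-major, output unit `t * NUM_FEATURES + f` of the dense layer becomes feature `f` at forecast step `t`.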

###### 1. Using a linear model

A simple linear model based on the last input time step does better than either baseline, but is underpowered. The model needs to predict `OUT_STEPS` time steps from a single input time step with a linear projection. It can only capture a low-dimensional slice of the behavior, likely based mainly on the time of day and time of year.

```python
multi_linear_model = tf.keras.Sequential([
    # Take the last time-step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_linear_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Linear'] = multi_linear_model.evaluate(multi_window.val)
multi_performance['Linear'] = multi_linear_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_linear_model)
```
`437/437 [==============================] - 1s 2ms/step - loss: 0.2556 - mean_absolute_error: 0.3057`

###### 2. Using a dense model

Adding a `layers.Dense` between the input and output gives the linear model more power, but the model is still based on only a single input time step.

```python
multi_dense_model = tf.keras.Sequential([
    # Take the last time step.
    # Shape [batch, time, features] => [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # Shape => [batch, 1, dense_units]
    tf.keras.layers.Dense(512, activation='relu'),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_dense_model, multi_window)

IPython.display.clear_output()
multi_val_performance['Dense'] = multi_dense_model.evaluate(multi_window.val)
multi_performance['Dense'] = multi_dense_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_dense_model)
```
`437/437 [==============================] - 1s 2ms/step - loss: 0.2202 - mean_absolute_error: 0.2803`

###### 3. Using a Convolutional Neural Network (CNN)

A convolutional model makes predictions based on a fixed-width history, which may lead to better performance than the dense model since it can see how things are changing over time:

```python
CONV_WIDTH = 3
multi_conv_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, CONV_WIDTH, features]
    tf.keras.layers.Lambda(lambda x: x[:, -CONV_WIDTH:, :]),
    # Shape => [batch, 1, conv_units]
    tf.keras.layers.Conv1D(256, activation='relu', kernel_size=(CONV_WIDTH)),
    # Shape => [batch, 1, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_conv_model, multi_window)

IPython.display.clear_output()

multi_val_performance['Conv'] = multi_conv_model.evaluate(multi_window.val)
multi_performance['Conv'] = multi_conv_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_conv_model)
```
`437/437 [==============================] - 1s 2ms/step - loss: 0.2142 - mean_absolute_error: 0.2798`
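
The `[batch, 1, conv_units]` shape comment in the convolutional model follows from the length rule for an unpadded 1-D convolution. Keras `Conv1D` uses `padding='valid'` by default, so a kernel as wide as the input produces exactly one output step. The helper below is a hypothetical illustration of that rule, not a Keras function.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Output length of a 1-D convolution with 'valid' (no) padding."""
    if input_length < kernel_size:
        raise ValueError("input is shorter than the kernel")
    return (input_length - kernel_size) // stride + 1


CONV_WIDTH = 3

# The model first slices the last CONV_WIDTH steps, so the time axis collapses to 1.
print(conv1d_output_length(CONV_WIDTH, CONV_WIDTH))

# Without the slice, the full 24-step input would yield 22 output steps.
print(conv1d_output_length(24, CONV_WIDTH))
```

This is why the model slices the input to the last `CONV_WIDTH` steps before the convolution: the following `Dense` and `Reshape` layers expect a single time step.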
###### 4. Using a Recurrent Neural Network (RNN)

A recurrent model can learn to use a long history of inputs, if it’s relevant to the predictions the model is making. Here the model will accumulate internal state for 24h, before making a single prediction for the next 24h.

In this single-shot format, the LSTM only needs to produce an output at the last time step, so set `return_sequences=False`.

```python
multi_lstm_model = tf.keras.Sequential([
    # Shape [batch, time, features] => [batch, lstm_units]
    # Adding more `lstm_units` just overfits more quickly.
    tf.keras.layers.LSTM(32, return_sequences=False),
    # Shape => [batch, out_steps*features]
    tf.keras.layers.Dense(OUT_STEPS*num_features,
                          kernel_initializer=tf.initializers.zeros),
    # Shape => [batch, out_steps, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])

history = compile_and_fit(multi_lstm_model, multi_window)

IPython.display.clear_output()

multi_val_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.val)
# Report test performance on the test split, not the training split.
multi_performance['LSTM'] = multi_lstm_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(multi_lstm_model)
```
`437/437 [==============================] - 1s 3ms/step - loss: 0.2133 - mean_absolute_error: 0.2833`
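
To make the role of `return_sequences` in the model above concrete, here is a small NumPy sketch of a recurrent layer that returns either every hidden state or only the last one. It uses a plain tanh RNN rather than an LSTM, with random weights, so it illustrates only the output shapes. `simple_rnn`, `W_x`, and `W_h` are names introduced for this example.

```python
import numpy as np


def simple_rnn(inputs, W_x, W_h, return_sequences):
    """Plain tanh RNN over inputs shaped [batch, time, features]."""
    batch, time, _ = inputs.shape
    h = np.zeros((batch, W_h.shape[0]))
    outputs = []
    for t in range(time):
        # Update the hidden state from the current input and previous state.
        h = np.tanh(inputs[:, t, :] @ W_x + h @ W_h)
        outputs.append(h)
    if return_sequences:
        # One output per time step: [batch, time, units]
        return np.stack(outputs, axis=1)
    # Only the final state: [batch, units]
    return h


rng = np.random.default_rng(0)
x = rng.normal(size=(32, 24, 19))
W_x = rng.normal(size=(19, 32)) * 0.1
W_h = rng.normal(size=(32, 32)) * 0.1

all_steps = simple_rnn(x, W_x, W_h, return_sequences=True)
last_only = simple_rnn(x, W_x, W_h, return_sequences=False)
print(all_steps.shape)
print(last_only.shape)
```

The last-step output is the final slice of the full sequence. It summarizes the whole 24h input, which is all the single-shot model needs before the dense projection.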

#### Auto-regressive models for multi-step forecasting

The above models all predict the entire output sequence in a single shot.

In some cases, it may be helpful for the model to decompose this prediction into individual time steps. The model's output can then be fed back into itself at each step, so that each prediction is conditioned on the previous one. Such a method is also sometimes referred to as a recursive method.

One clear advantage of this style of model is that it can produce output of varying length, because the output itself is fed back into the model.

You could take any of the single-step multi-output models trained in the first half of this tutorial and run it in an autoregressive feedback loop, but here we’ll focus on building a model that has been explicitly trained to do that.

###### Using a Recurrent Neural Network

This course only builds an autoregressive RNN model, but this pattern could be applied to any model that was designed to output a single timestep.

The model will have the same basic form as the single-step `LSTM` models we studied in the previous chapter: An `LSTM` followed by a `layers.Dense` that converts the `LSTM` outputs to model predictions.

In this case the model has to manually manage the inputs for each step so it uses `layers.LSTMCell` directly for the lower level, single time step interface.

```python
class FeedBack(tf.keras.Model):
    def __init__(self, units, out_steps):
        super().__init__()
        self.out_steps = out_steps
        self.units = units
        self.lstm_cell = tf.keras.layers.LSTMCell(units)
        # Also wrap the LSTMCell in an RNN to simplify the `warmup` method.
        self.lstm_rnn = tf.keras.layers.RNN(self.lstm_cell, return_state=True)
        self.dense = tf.keras.layers.Dense(num_features)

feedback_model = FeedBack(units=32, out_steps=OUT_STEPS)
```

The first method this model needs is a `warmup` method to initialize its internal state based on the inputs. Once trained, this state will capture the relevant parts of the input history. This is equivalent to the single-step `LSTM` model from earlier:

```python
def warmup(self, inputs):
    # inputs.shape => (batch, time, features)
    # x.shape => (batch, lstm_units)
    x, *state = self.lstm_rnn(inputs)

    # prediction.shape => (batch, features)
    prediction = self.dense(x)
    return prediction, state

FeedBack.warmup = warmup
```

This method returns a single time-step prediction and the internal state of the LSTM. We can check the shape of the prediction made by the model:

```python
prediction, state = feedback_model.warmup(multi_window.example[0])
prediction.shape
```
`TensorShape([32, 19])`

With the `RNN`'s state and an initial prediction, we can now keep iterating the model, feeding the prediction at each step back in as the next input. Repeating this step lets us predict arbitrarily far into the future.

The simplest approach to collecting the output predictions is to use a Python list and call `tf.stack` after the loop.

```python
def call(self, inputs, training=None):
    # Use a Python list to collect the per-step predictions.
    predictions = []
    # Initialize the lstm state
    prediction, state = self.warmup(inputs)

    # Insert the first prediction
    predictions.append(prediction)

    # Run the rest of the prediction steps
    for n in range(1, self.out_steps):
        # Use the last prediction as input.
        x = prediction
        # Execute one lstm step.
        x, state = self.lstm_cell(x, states=state,
                                  training=training)
        # Convert the lstm output to a prediction.
        prediction = self.dense(x)
        # Add the prediction to the output
        predictions.append(prediction)

    # predictions.shape => (time, batch, features)
    predictions = tf.stack(predictions)
    # predictions.shape => (batch, time, features)
    predictions = tf.transpose(predictions, [1, 0, 2])
    return predictions

FeedBack.call = call
```

We can now test-run this model on the example inputs:

`print('Output shape (batch, time, features): ', feedback_model(multi_window.example[0]).shape)`
`Output shape (batch, time, features): (32, 24, 19)`
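
The control flow of `warmup` followed by the feedback loop can also be traced in plain Python. The sketch below replaces the LSTM cell and dense layer with a hypothetical `toy_cell` (a damped-trend rule whose state is just the previous input), so the numbers mean nothing. It only shows how state and predictions are threaded through the loop and why the output length is a free parameter.

```python
def toy_cell(x, state):
    """Stand-in for one LSTM step plus the dense layer.

    `state` is the previous input; the output continues half of the
    most recent change. Returns (prediction, new_state).
    """
    prediction = x + 0.5 * (x - state)
    return prediction, x


def warmup(inputs):
    # Run the cell over the input history to build up the state.
    state = inputs[0]
    prediction = inputs[0]
    for x in inputs[1:]:
        prediction, state = toy_cell(x, state)
    return prediction, state


def autoregressive_forecast(inputs, out_steps):
    predictions = []
    prediction, state = warmup(inputs)
    predictions.append(prediction)
    for _ in range(1, out_steps):
        # Feed the last prediction back in as the next input.
        prediction, state = toy_cell(prediction, state)
        predictions.append(prediction)
    return predictions


history = [0.0, 2.0, 4.0]
short = autoregressive_forecast(history, out_steps=4)
longer = autoregressive_forecast(history, out_steps=10)
print(short)
print(len(longer))
```

Changing `out_steps` changes only the number of loop iterations. The first four values of the longer forecast are identical to the short one.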

Now train the model:

```python
history = compile_and_fit(feedback_model, multi_window)

IPython.display.clear_output()

multi_val_performance['AR LSTM'] = feedback_model.evaluate(multi_window.val)
multi_performance['AR LSTM'] = feedback_model.evaluate(multi_window.test, verbose=0)
multi_window.plot(feedback_model)
```
`437/437 [==============================] - 3s 6ms/step - loss: 0.2250 - mean_absolute_error: 0.2998`

#### Performance

In this final section of the chapter, we evaluate the performance of every model built in this lesson.

```python
x = np.arange(len(multi_performance))
width = 0.3

metric_name = 'mean_absolute_error'
# Look up the metric's position using a model compiled in this lesson.
metric_index = multi_lstm_model.metrics_names.index(metric_name)
val_mae = [v[metric_index] for v in multi_val_performance.values()]
test_mae = [v[metric_index] for v in multi_performance.values()]

plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=multi_performance.keys(),
           rotation=45)
plt.ylabel('MAE (average over all times and outputs)')
_ = plt.legend()
```

The metrics for the multi-output models in the first half of this tutorial show the performance averaged across all output features. The metrics here are similar, but are also averaged across output time steps.

```python
for name, value in multi_performance.items():
    print(f'{name:8s}: {value[1]:0.4f}')
```
```
Last    : 0.5007
Repeat  : 0.3774
Linear  : 0.2986
Dense   : 0.2758
Conv    : 0.2749
LSTM    : 0.2727
AR LSTM : 0.2927
```

The gains achieved going from a dense model to convolutional and recurrent models are only a few percent (if any), and the autoregressive model performed clearly worse. So these more complex approaches may not be worthwhile on this problem, but there was no way to know without trying, and these models could be helpful for your problem.
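
The size of these gains can be checked directly from the test MAE values printed above. The sketch below copies those numbers into a dictionary and reports each model's relative change against the dense model. `relative_gain` is a helper introduced here, and a positive value means a lower (better) MAE than the dense model.

```python
# Test MAE values as printed in the results above.
test_mae = {
    "Last": 0.5007,
    "Repeat": 0.3774,
    "Linear": 0.2986,
    "Dense": 0.2758,
    "Conv": 0.2749,
    "LSTM": 0.2727,
    "AR LSTM": 0.2927,
}


def relative_gain(reference, candidate):
    """Percentage reduction in MAE relative to `reference`."""
    return 100.0 * (reference - candidate) / reference


reference = test_mae["Dense"]
gains = {name: relative_gain(reference, test_mae[name])
         for name in ("Conv", "LSTM", "AR LSTM")}

for name, gain in gains.items():
    print(f"{name:8s}: {gain:+.2f}% vs Dense")
```

With these figures, the convolutional and LSTM models improve on the dense model by roughly 0.3% and 1.1%, while the autoregressive LSTM is about 6% worse.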