# Single-Step Time Series Forecasting

November 29, 2020 (2021-11-29 9:48)

In this lesson, we will be going over how to build a single-step time series forecasting model using TensorFlow 2.0. We will start by predicting the ‘T (degC)’ feature one step into the future.

Single-step forecasting models are those which predict the observation at the next time step, i.e., only one time-step is to be predicted. For example, given the weather information of the past 6 days, a single step forecasting model will only predict the weather of the 7th day, i.e., only one time-step into the future.

We will start by creating a single-step window using the `WindowGenerator` class we defined in the previous chapter. We will use only the `'T (degC)'` column of our data as the label.

    single_step_window = WindowGenerator(
        input_width=1, label_width=1, shift=1,
        label_columns=['T (degC)'])
    single_step_window

    Total window size: 2
    Input indices: [0]
    Label indices: [1]
    Label column name(s): ['T (degC)']
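The summary printed above follows from simple index arithmetic. The sketch below is not the `WindowGenerator` code itself; it assumes that the total window size is `input_width + shift` and that the labels occupy the last `label_width` positions, which agrees with every window summary printed in this lesson. The helper name `window_indices` is hypothetical.

```python
def window_indices(input_width, label_width, shift):
    """Return (total_size, input_indices, label_indices) for one window."""
    # Assumption: the window spans the inputs plus the shift.
    total_size = input_width + shift
    input_indices = list(range(0, input_width))
    # Assumption: labels are the last `label_width` positions of the window.
    label_start = total_size - label_width
    label_indices = list(range(label_start, total_size))
    return total_size, input_indices, label_indices


total, inputs, labels = window_indices(input_width=1, label_width=1, shift=1)
print(total, inputs, labels)  # 2 [0] [1]
```

The same helper can be used to check any of the other window configurations used later in this lesson before building them.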

The `single_step_window` object creates `tf.data.Dataset` objects from the training, validation, and test sets, which allow us to iterate over batches of data. The following example iterates over the first batch of the training set:

    for example_inputs, example_labels in single_step_window.train.take(1):
        print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
        print(f'Labels shape (batch, time, features): {example_labels.shape}')

    Inputs shape (batch, time, features): (32, 1, 19)
    Labels shape (batch, time, features): (32, 1, 1)
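To see where these two shapes come from, here is a small NumPy sketch of splitting a batch of windows into inputs and labels. It is only an illustration on random data, not the `WindowGenerator` implementation, and the column position `LABEL_COLUMN` is a hypothetical value chosen for the example.

```python
import numpy as np

BATCH, INPUT_WIDTH, SHIFT, NUM_FEATURES = 32, 1, 1, 19
LABEL_WIDTH = 1
LABEL_COLUMN = 1  # hypothetical position of 'T (degC)' among the 19 features

total_size = INPUT_WIDTH + SHIFT
# A fake batch of consecutive rows: (batch, total window size, features).
windows = np.random.default_rng(0).normal(size=(BATCH, total_size, NUM_FEATURES))

# Inputs keep every feature; labels keep only the label column.
example_inputs = windows[:, :INPUT_WIDTH, :]
example_labels = windows[:, total_size - LABEL_WIDTH:, LABEL_COLUMN:LABEL_COLUMN + 1]

print(example_inputs.shape)  # (32, 1, 19)
print(example_labels.shape)  # (32, 1, 1)
```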

Next, we will build forecasting models that predict the temperature 1h into the future given the current value of all features.

## 1. Creating a baseline model

Before building a trainable model, it helps to have a simple baseline for comparison: a model that takes the input values and just returns the current temperature as its prediction.

In other words, the model we will be building will predict “No change”. This is a reasonable baseline since temperature changes slowly. Of course, this baseline will work less well if we make a prediction further in the future.

    class Baseline(tf.keras.Model):
        def __init__(self, label_index=None):
            super().__init__()
            self.label_index = label_index

        def call(self, inputs):
            if self.label_index is None:
                return inputs
            result = inputs[:, :, self.label_index]
            return result[:, :, tf.newaxis]

    # Instantiate and compile the model
    baseline = Baseline(label_index=column_indices['T (degC)'])

    baseline.compile(loss=tf.losses.MeanSquaredError(),
                     metrics=[tf.metrics.MeanAbsoluteError()])

    val_performance = {}
    performance = {}
    val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
    performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)

    439/439 [==============================] - 2s 5ms/step - loss: 0.0128 - mean_absolute_error: 0.0785

The code above defines a `Baseline` model that selects the label column from the inputs and uses `tf.newaxis` to restore the trailing feature axis, so the current temperature is returned as the prediction. We then instantiated the model and compiled it with Mean Squared Error as the loss function and Mean Absolute Error as the performance metric. Finally, we evaluated it on the validation and test sets. Because the baseline has no trainable weights, these numbers say nothing about learning; they serve as a reference point that the trained models below should improve on.
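To make the "no change" idea and the two metrics concrete, here is a minimal pure-Python sketch on a made-up temperature series. The numbers are invented for illustration, and the two helper functions are plain re-implementations of the usual definitions, not the Keras classes.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up, slowly changing hourly temperatures (illustration only).
temperatures = [10.0, 10.5, 11.0, 10.0, 9.5]

# "No change": the prediction for hour t+1 is the value observed at hour t.
predictions = temperatures[:-1]
labels = temperatures[1:]

mse = mean_squared_error(labels, predictions)
mae = mean_absolute_error(labels, predictions)
print(f"MSE: {mse:.4f}  MAE: {mae:.4f}")  # MSE: 0.4375  MAE: 0.6250
```

The slower the series changes from one step to the next, the smaller both errors become, which is why "no change" is a sensible reference for hourly temperature.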

The `WindowGenerator` has a plot method, but the plots won’t be very interesting with only a single sample. So, we will create a wider `WindowGenerator` named `wide_window` that generates windows of 24h of consecutive inputs and labels at a time.

    wide_window = WindowGenerator(
        input_width=24, label_width=24, shift=1,
        label_columns=['T (degC)'])
    wide_window

    Total window size: 25
    Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
    Label indices: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
    Label column name(s): ['T (degC)']

The `wide_window` doesn’t change the way the model operates. The model still makes predictions 1h into the future based on a single input time step. Here the `time` axis acts like the `batch` axis: each prediction is made independently, with no interaction between time steps.

This expanded window can be passed directly to the same `baseline` model without any code changes. This is possible because the inputs and labels have the same number of time steps, and the baseline just forwards the input to the output.

When we plot the baseline model’s predictions, we can clearly see that the predictions are simply the labels, shifted right by 1h.

    wide_window.plot(baseline)

In the three plots above, the single-step model is run over the course of 24h. Here is a brief explanation of what they show:

- The blue “Inputs” line shows the input temperature at each time step. Though the model receives all features, this plot only shows the temperature feature.
- The green “Labels” dots show the target prediction value. These dots are shown at the prediction time, not the input time. That is why the range of labels is shifted 1 step relative to the inputs.
- The orange “Predictions” crosses are the model’s predictions for each output time step. If the model was predicting perfectly, the predictions would have landed directly on the “labels” (green dots).
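The "shifted right by 1h" effect can be checked without any plotting. The sketch below uses a made-up series and a plain list copy in place of the baseline model, so it only illustrates the indexing.

```python
INPUT_WIDTH = 24
SHIFT = 1

# Made-up temperature readings for one 25-hour window (illustration only).
series = [float(t % 7) for t in range(INPUT_WIDTH + SHIFT)]

inputs = series[:INPUT_WIDTH]   # hours 0..23
labels = series[SHIFT:]         # hours 1..24
predictions = list(inputs)      # the baseline forwards its input unchanged

# The prediction plotted at hour t+1 is the true value from hour t,
# so the prediction curve is the label curve delayed by one step.
assert predictions[1:] == labels[:-1]
print(len(inputs), len(labels), len(predictions))  # 24 24 24
```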

## 2. Using a single-step time series forecasting linear model

A linear model applies a linear transformation to the input variables to obtain the output. It is one of the simplest trainable models we can apply. In a single-step forecasting model, the output for a time step depends only on that step.

To create our linear model, we use a TensorFlow `Sequential` model with a single dense layer and no activation function. The dense layer transforms only the last axis of the data, from `(batch, time, inputs)` to `(batch, time, units)`. The transformation is applied independently to every item across the `batch` and `time` axes.

    linear = tf.keras.Sequential([
        tf.keras.layers.Dense(units=1)
    ])

    print('Input shape:', single_step_window.example[0].shape)
    print('Output shape:', linear(single_step_window.example[0]).shape)

    Input shape: (32, 1, 19)
    Output shape: (32, 1, 1)

Thus, we can see that our model takes in an input shape of (batch, time, inputs) and transforms it to (batch, time, units).
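A dense layer without an activation computes `x @ kernel + bias` on the last axis. The NumPy sketch below imitates that with randomly initialised weights; it is a stand-in for illustration, not the Keras layer, and `dense_no_activation` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_FEATURES, UNITS = 19, 1

# Randomly initialised stand-ins for the layer's kernel and bias.
kernel = rng.normal(size=(NUM_FEATURES, UNITS))
bias = np.zeros(UNITS)


def dense_no_activation(x):
    """Apply y = x @ kernel + bias to the last axis of x."""
    return x @ kernel + bias


single_step = rng.normal(size=(32, 1, NUM_FEATURES))
wide = rng.normal(size=(32, 24, NUM_FEATURES))

print(dense_no_activation(single_step).shape)  # (32, 1, 1)
print(dense_no_activation(wide).shape)         # (32, 24, 1)

# Each time step is transformed on its own: slicing one step out first
# gives the same numbers as slicing it out of the full result.
assert np.allclose(dense_no_activation(wide)[:, 5], dense_no_activation(wide[:, 5]))
```

Because only the last axis is involved, the same weights work for any number of time steps, which is why the linear model can also be called on wider windows.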

Now, we will define a function for compiling and training the model. Packaging the compiling and training process into a single function will also be useful for training other models later in this course.

    MAX_EPOCHS = 20

    def compile_and_fit(model, window, patience=2):
        early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                          patience=patience,
                                                          mode='min')

        model.compile(loss=tf.losses.MeanSquaredError(),
                      optimizer=tf.optimizers.Adam(),
                      metrics=[tf.metrics.MeanAbsoluteError()])

        history = model.fit(window.train, epochs=MAX_EPOCHS,
                            validation_data=window.val,
                            callbacks=[early_stopping])
        return history
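The `EarlyStopping` callback is the reason training can end before `MAX_EPOCHS`. The sketch below is a simplified illustration of patience-based stopping, not the Keras implementation (it ignores options such as `min_delta`), and the validation losses are made up. The helper name `epochs_until_stop` is hypothetical.

```python
def epochs_until_stop(val_losses, patience=2):
    """Return how many epochs run before patience-based early stopping."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch  # training stops after this epoch
    return len(val_losses)  # never triggered: all epochs run


# Made-up validation losses (illustration only).
print(epochs_until_stop([0.50, 0.40, 0.35, 0.36, 0.37, 0.30], patience=2))  # 5
```

With `patience=2`, two consecutive epochs without a new best validation loss are enough to stop, so the later improvement to 0.30 in this toy sequence is never reached.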

Finally, we train the model and evaluate its performance:

    history = compile_and_fit(linear, single_step_window)

    val_performance['Linear'] = linear.evaluate(single_step_window.val)
    performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)

    Epoch 1/20
    1534/1534 [==============================] - 5s 3ms/step - loss: 0.4313 - mean_absolute_error: 0.3443 - val_loss: 0.0185 - val_mean_absolute_error: 0.1006
    Epoch 2/20
    1534/1534 [==============================] - 4s 3ms/step - loss: 0.0145 - mean_absolute_error: 0.0891 - val_loss: 0.0107 - val_mean_absolute_error: 0.0760
    Epoch 3/20
    1534/1534 [==============================] - 4s 3ms/step - loss: 0.0101 - mean_absolute_error: 0.0739 - val_loss: 0.0089 - val_mean_absolute_error: 0.0692
    Epoch 4/20
    1534/1534 [==============================] - 5s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0700 - val_loss: 0.0086 - val_mean_absolute_error: 0.0680
    Epoch 5/20
    1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0682
    Epoch 6/20
    1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0086 - val_mean_absolute_error: 0.0675
    Epoch 7/20
    1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error: 0.0685
    Epoch 8/20
    1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0677
    439/439 [==============================] - 1s 2ms/step - loss: 0.0087 - mean_absolute_error: 0.0677

Like the `baseline` model, the linear model can be called on batches of wide windows. Used this way, the model makes a set of independent predictions on consecutive time steps. The time axis acts like another batch axis. There are no interactions between the predictions at each time step.

    wide_window.plot(linear)

One advantage of linear models is that they’re relatively simple to interpret. You can pull out the layer’s weights, and see the weight assigned to each input:

    plt.bar(x=range(len(train_df.columns)),
            height=linear.layers[0].kernel[:, 0].numpy())
    axis = plt.gca()
    axis.set_xticks(range(len(train_df.columns)))
    _ = axis.set_xticklabels(train_df.columns, rotation=90)

## 3. Using a single-step time series forecasting dense model

In the previous section, the model was a linear model with only a single dense layer. In this section, we will build a deeper, more powerful neural network.

The Dense model is similar to the linear model, except that it stacks several more `Dense` layers between the input and the output, as shown below:

    dense = tf.keras.Sequential([
        tf.keras.layers.Dense(units=64, activation='relu'),
        tf.keras.layers.Dense(units=64, activation='relu'),
        tf.keras.layers.Dense(units=1)
    ])

As we can see, the Dense model has 3 sequential dense layers as opposed to a single dense layer in the Linear model. Now, we will compile, train and evaluate the Dense model:

    history = compile_and_fit(dense, single_step_window)

    val_performance['Dense'] = dense.evaluate(single_step_window.val)
    performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)

    Epoch 1/20
    1534/1534 [==============================] - 6s 4ms/step - loss: 0.0173 - mean_absolute_error: 0.0803 - val_loss: 0.0082 - val_mean_absolute_error: 0.0669
    Epoch 2/20
    1534/1534 [==============================] - 7s 4ms/step - loss: 0.0077 - mean_absolute_error: 0.0632 - val_loss: 0.0084 - val_mean_absolute_error: 0.0691
    Epoch 3/20
    1534/1534 [==============================] - 6s 4ms/step - loss: 0.0073 - mean_absolute_error: 0.0615 - val_loss: 0.0069 - val_mean_absolute_error: 0.0586
    Epoch 4/20
    1534/1534 [==============================] - 6s 4ms/step - loss: 0.0070 - mean_absolute_error: 0.0598 - val_loss: 0.0067 - val_mean_absolute_error: 0.0575
    Epoch 5/20
    1534/1534 [==============================] - 5s 4ms/step - loss: 0.0069 - mean_absolute_error: 0.0592 - val_loss: 0.0064 - val_mean_absolute_error: 0.0561
    Epoch 6/20
    1534/1534 [==============================] - 5s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0584 - val_loss: 0.0064 - val_mean_absolute_error: 0.0563
    Epoch 7/20
    1534/1534 [==============================] - 6s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0582 - val_loss: 0.0065 - val_mean_absolute_error: 0.0574
    439/439 [==============================] - 1s 2ms/step - loss: 0.0065 - mean_absolute_error: 0.0574
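To put the size difference between the linear and dense models in perspective, we can count their trainable parameters by hand. The sketch assumes 19 input features (the last axis of the input shape printed earlier) and uses the usual count for a fully connected layer, `inputs * units + units`. The helper name `dense_params` is hypothetical.

```python
def dense_params(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units  # kernel + bias
        n_inputs = units
    return total


NUM_FEATURES = 19  # last axis of the input shape printed earlier

print(dense_params(NUM_FEATURES, [1]))          # 20
print(dense_params(NUM_FEATURES, [64, 64, 1]))  # 5505
```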

## 4. Using a multi-step input dense model

A single-time-step model has no context for the current values of its inputs: it cannot see how the input features change over time. To address this issue, the model needs access to multiple time steps when making predictions.

The `baseline`, `linear` and `dense` models we studied earlier handled each time step independently. Now our model will take multiple time steps as input to produce a single output.

Let us start by creating the data for the model. We will create a `WindowGenerator` that produces batches of 3h of inputs and 1h of labels:

    CONV_WIDTH = 3
    conv_window = WindowGenerator(
        input_width=CONV_WIDTH,
        label_width=1,
        shift=1,
        label_columns=['T (degC)'])
    conv_window

    Total window size: 4
    Input indices: [0 1 2]
    Label indices: [3]
    Label column name(s): ['T (degC)']

Note that the `WindowGenerator`'s `shift` parameter is measured between the end of the input window and the end of the label window. Now we will plot the window we created:

    conv_window.plot()
    plt.title("Given 3h as input, predict 1h into the future.")
    plt.tight_layout()

As we can see from the above figure, we need to predict 1h into the future given the past 3 hours of data as input.

We can train a `dense` model on a multiple-input-step window by adding a `tf.keras.layers.Flatten` layer as the first layer of the model. This layer flattens each example's `(time, features)` input matrix into a 1-D array, which can then be passed to the dense layers. For the dense layers, we will use the ReLU activation function.

    multi_step_dense = tf.keras.Sequential([
        # Shape: (time, features) => (time*features)
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(units=32, activation='relu'),
        tf.keras.layers.Dense(units=32, activation='relu'),
        tf.keras.layers.Dense(units=1),
        # Add back the time dimension.
        # Shape: (outputs) => (1, outputs)
        tf.keras.layers.Reshape([1, -1]),
    ])

    print('Input shape:', conv_window.example[0].shape)
    print('Output shape:', multi_step_dense(conv_window.example[0]).shape)

    Input shape: (32, 3, 19)
    Output shape: (32, 1, 1)

After creating the model, we will use the `compile_and_fit` function we defined earlier to compile the model and fit it to the data.

    history = compile_and_fit(multi_step_dense, conv_window)

    IPython.display.clear_output()
    val_performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.val)
    performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test, verbose=0)

    438/438 [==============================] - 1s 2ms/step - loss: 0.0068 - mean_absolute_error: 0.0596

Finally, we plot the model's predictions:

    conv_window.plot(multi_step_dense)

The main downside of this approach is that the resulting model can only be executed on input windows of exactly this shape. For example, if we pass it a window with a different number of input time steps, we get an error, as shown below:

    print('Input shape:', wide_window.example[0].shape)
    try:
        print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
    except Exception as e:
        print(f'\n{type(e).__name__}:{e}')

    Input shape: (32, 24, 19)

    InvalidArgumentError:Matrix size-incompatible: In[0]: [32,456], In[1]: [57,32] [Op:MatMul]
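The numbers in the error message can be reproduced by hand: flattening a 24-step window gives 24 * 19 = 456 values per example, while the first dense kernel was built for 3 * 19 = 57. The NumPy sketch below imitates this with zero-filled arrays. It is an illustration of the shape mismatch only, not the Keras model, and NumPy raises a `ValueError` rather than TensorFlow's `InvalidArgumentError`.

```python
import numpy as np

NUM_FEATURES, CONV_WIDTH, UNITS = 19, 3, 32

# First dense kernel, sized for the flattened 3-step window: (3 * 19, 32).
first_kernel = np.zeros((CONV_WIDTH * NUM_FEATURES, UNITS))


def flatten(x):
    """(batch, time, features) -> (batch, time * features)."""
    return x.reshape(x.shape[0], -1)


narrow = flatten(np.zeros((32, 3, NUM_FEATURES)))
wide = flatten(np.zeros((32, 24, NUM_FEATURES)))
print(narrow.shape, wide.shape)  # (32, 57) (32, 456)

print((narrow @ first_kernel).shape)  # (32, 32)
try:
    wide @ first_kernel
except ValueError as error:
    print("Shape mismatch:", type(error).__name__)
```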

This problem can be resolved by using another type of Neural Network called the Convolutional Neural Network (CNN) that we will study in the next section.

## 5. Using a multi-step input Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of neural network architecture that is widely used for deep learning on image data. It has convolutional layers followed by dense layers. In time series forecasting, the convolutional layer takes multiple time steps as input to each prediction.

    conv_model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(filters=32,
                               kernel_size=(CONV_WIDTH,),
                               activation='relu'),
        tf.keras.layers.Dense(units=32, activation='relu'),
        tf.keras.layers.Dense(units=1),
    ])

The above block of code creates a Convolutional Neural Network (CNN) with a convolutional layer as the input layer, followed by two dense layers. If you are not familiar with CNNs, you can refer to our other course on CNNs.

Now we run the model on an example batch of data to print out the input shape and output shape of the Conv model created,

    print("Conv model on `conv_window`")
    print('Input shape:', conv_window.example[0].shape)
    print('Output shape:', conv_model(conv_window.example[0]).shape)

    Conv model on `conv_window`
    Input shape: (32, 3, 19)
    Output shape: (32, 1, 1)

Finally, we compile and train the model:

    history = compile_and_fit(conv_model, conv_window)

    IPython.display.clear_output()
    val_performance['Conv'] = conv_model.evaluate(conv_window.val)
    performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)

    438/438 [==============================] - 1s 2ms/step - loss: 0.0070 - mean_absolute_error: 0.0607

The difference between this `conv_model` and the `multi_step_dense` model is that the `conv_model` can be run on inputs of any length (of at least `CONV_WIDTH` time steps), because the convolutional layer is applied to a sliding window of inputs. If you run it on a wider input, it produces a wider output, as shown below:

    print("Wide window")
    print('Input shape:', wide_window.example[0].shape)
    print('Labels shape:', wide_window.example[1].shape)
    print('Output shape:', conv_model(wide_window.example[0]).shape)

    Wide window
    Input shape: (32, 24, 19)
    Labels shape: (32, 24, 1)
    Output shape: (32, 22, 1)

Note that the output is shorter than the input. To make training or plotting work, the labels and predictions need to have the same length. So we build a `WindowGenerator` that produces wide windows with a few extra input time steps, so that the label and prediction lengths match:

    LABEL_WIDTH = 24
    INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)
    wide_conv_window = WindowGenerator(
        input_width=INPUT_WIDTH,
        label_width=LABEL_WIDTH,
        shift=1,
        label_columns=['T (degC)'])
    wide_conv_window

    Total window size: 27
    Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25]
    Label indices: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]
    Label column name(s): ['T (degC)']

    print("Wide conv window")
    print('Input shape:', wide_conv_window.example[0].shape)
    print('Labels shape:', wide_conv_window.example[1].shape)
    print('Output shape:', conv_model(wide_conv_window.example[0]).shape)

    Wide conv window
    Input shape: (32, 26, 19)
    Labels shape: (32, 24, 1)
    Output shape: (32, 24, 1)
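The output lengths seen so far (1, 22 and 24) follow from how a sliding window moves over the time axis. The pure-Python sketch below assumes the `Conv1D` defaults used by `conv_model`, a stride of 1 and no padding, and only does the index arithmetic. Both helper names are hypothetical.

```python
def conv1d_output_length(input_length, kernel_size):
    """Output steps of a stride-1 convolution without padding."""
    return input_length - kernel_size + 1


def sliding_windows(input_length, kernel_size):
    """Input indices seen by each output step."""
    return [list(range(start, start + kernel_size))
            for start in range(conv1d_output_length(input_length, kernel_size))]


CONV_WIDTH = 3
LABEL_WIDTH = 24

print(conv1d_output_length(3, CONV_WIDTH))   # 1
print(conv1d_output_length(24, CONV_WIDTH))  # 22
print(conv1d_output_length(LABEL_WIDTH + CONV_WIDTH - 1, CONV_WIDTH))  # 24
print(sliding_windows(5, CONV_WIDTH))  # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

Each output step loses `CONV_WIDTH - 1` steps of context at the start, which is exactly the number of extra inputs added in `INPUT_WIDTH`.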

Now you can plot the model’s predictions on a wider window.

Note the 3 input time steps before the first prediction. Every prediction here is based on the 3 preceding timesteps:

    wide_conv_window.plot(conv_model)

## 6. Using a multi-step input Recurrent Neural Network

A Recurrent Neural Network (RNN) is a type of neural network well-suited for time series data. RNNs process a time series step-by-step, maintaining an internal state from time-step to time-step.

In this tutorial, we will use an RNN layer called Long Short Term Memory (LSTM).

An important constructor argument for Keras RNN layers is the `return_sequences` argument. This setting can configure the layer in one of two ways:

1. If `False` (the default), the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.
2. If `True`, the layer returns an output for each input time step. This is useful for:
   - Stacking RNN layers.
   - Training a model on multiple time steps simultaneously.
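The two settings differ only in which states are returned. The NumPy sketch below uses a toy tanh recurrent cell, deliberately swapped in for the LSTM to keep the code short, with random weights. It illustrates the output shapes and is not how Keras implements its RNN layers; `simple_rnn` is a hypothetical helper.

```python
import numpy as np


def simple_rnn(inputs, units, return_sequences=False, seed=0):
    """Toy tanh RNN over inputs shaped (batch, time, features)."""
    rng = np.random.default_rng(seed)
    batch, time, features = inputs.shape
    w_in = rng.normal(scale=0.1, size=(features, units))
    w_state = rng.normal(scale=0.1, size=(units, units))

    state = np.zeros((batch, units))
    outputs = []
    for t in range(time):
        # The new state depends on the current input and the previous state.
        state = np.tanh(inputs[:, t] @ w_in + state @ w_state)
        outputs.append(state)

    if return_sequences:
        return np.stack(outputs, axis=1)  # one output per time step
    return outputs[-1]                    # only the final time step


x = np.zeros((32, 24, 19))
print(simple_rnn(x, units=32).shape)                         # (32, 32)
print(simple_rnn(x, units=32, return_sequences=True).shape)  # (32, 24, 32)
```

With `return_sequences=True` the time axis is preserved, so a following `Dense(units=1)` layer can produce one prediction per time step.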

    lstm_model = tf.keras.models.Sequential([
        # Shape [batch, time, features] => [batch, time, lstm_units]
        tf.keras.layers.LSTM(32, return_sequences=True),
        # Shape => [batch, time, features]
        tf.keras.layers.Dense(units=1)
    ])

With `return_sequences=True`, the model can be trained on 24h of data at a time.

    print('Input shape:', wide_window.example[0].shape)
    print('Output shape:', lstm_model(wide_window.example[0]).shape)

    Input shape: (32, 24, 19)
    Output shape: (32, 24, 1)

    history = compile_and_fit(lstm_model, wide_window)

    IPython.display.clear_output()
    val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
    performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)

    438/438 [==============================] - 1s 3ms/step - loss: 0.0057 - mean_absolute_error: 0.0523

    wide_window.plot(lstm_model)

## Evaluating the performance of single step models

In the final section of this chapter, we will compare the performance of all the models built in this lesson.

    x = np.arange(len(performance))
    width = 0.3
    metric_name = 'mean_absolute_error'
    metric_index = lstm_model.metrics_names.index('mean_absolute_error')
    val_mae = [v[metric_index] for v in val_performance.values()]
    test_mae = [v[metric_index] for v in performance.values()]

    plt.ylabel('mean_absolute_error [T (degC), normalized]')
    plt.bar(x - 0.17, val_mae, width, label='Validation')
    plt.bar(x + 0.17, test_mae, width, label='Test')
    plt.xticks(ticks=x, labels=performance.keys(), rotation=45)
    _ = plt.legend()

    for name, value in performance.items():
        print(f'{name:12s}: {value[1]:0.4f}')

    Baseline    : 0.0852
    Linear      : 0.0667
    Dense       : 0.0580
    Multi step dense: 0.0612
    Conv        : 0.0553
    LSTM        : 0.0532

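On the test set, every trained model beats the baseline (MAE 0.0852), and the LSTM has the lowest error (0.0532). The gains from one model to the next are small, and the multi-step dense model (0.0612) actually does slightly worse than the single-step dense model (0.0580). Also, note that the Baseline is not a trained model; its score is only a reference point for the others.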
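One way to read these numbers is as a relative reduction in error compared with the baseline. The sketch below copies the test MAE values from the printed results into a plain dictionary and ranks the models. It also pads names to the longest one, since `'Multi step dense'` is longer than the 12 characters used in the format string above.

```python
# Test-set MAE values copied from the printed results above.
test_mae = {
    'Baseline': 0.0852,
    'Linear': 0.0667,
    'Dense': 0.0580,
    'Multi step dense': 0.0612,
    'Conv': 0.0553,
    'LSTM': 0.0532,
}

baseline_mae = test_mae['Baseline']
best_model = min(test_mae, key=test_mae.get)
width = max(len(name) for name in test_mae)  # pad to the longest name

for name, mae in sorted(test_mae.items(), key=lambda item: item[1]):
    reduction = (baseline_mae - mae) / baseline_mae
    print(f'{name:{width}s}: {mae:0.4f}  ({reduction:.1%} below baseline)')

print('Best model:', best_model)
```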

That is it for building single-step time series forecasting models using TensorFlow 2.0. In the next chapter, ‘Multi-Step Time Series Forecasting’, we will learn about building multi-step forecasting models using TensorFlow 2.0.
