
# Single-Step Time Series Forecasting


In this lesson, we will be going over how to build a single-step time series forecasting model using TensorFlow 2.0. We will start by predicting the ‘T (degC)’ feature one step into the future.

Single-step forecasting models are those which predict the observation at the next time step, i.e., only one time-step is to be predicted. For example, given the weather information of the past 6 days, a single step forecasting model will only predict the weather of the 7th day, i.e., only one time-step into the future.

We start by creating a single-step window using the `WindowGenerator` class we defined in the previous chapter. The window uses `'T (degC)'` as its only label column; the inputs still contain every feature.

```python
single_step_window = WindowGenerator(
    input_width=1, label_width=1, shift=1,
    label_columns=['T (degC)'])
single_step_window
```
```
Total window size: 2
Input indices: [0]
Label indices: [1]
Label column name(s): ['T (degC)']
```
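The window summary above follows from the three width arguments. As a rough sketch, assuming the `WindowGenerator` from the previous chapter computes the total window size as `input_width + shift` and places the labels at the end of the window (the helper `window_indices` below is hypothetical and not part of that class), the same numbers can be reproduced in plain Python:

```python
def window_indices(input_width, label_width, shift):
    """Return (total_size, input_indices, label_indices) for one window.

    Assumes inputs start at index 0 and labels occupy the last
    `label_width` positions of a window of size input_width + shift.
    """
    total_size = input_width + shift
    input_indices = list(range(input_width))
    label_start = total_size - label_width
    label_indices = list(range(label_start, total_size))
    return total_size, input_indices, label_indices


# One input step, one label step, one step ahead.
total, inputs, labels = window_indices(input_width=1, label_width=1, shift=1)
print(total, inputs, labels)  # 2 [0] [1]
```

The same helper can be used to check the wider windows that appear later in this lesson.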

The `WindowGenerator` object builds `tf.data.Dataset` objects from the training, validation, and test sets, which lets us iterate over batches of data. The following example takes one batch from the training set:

```python
for example_inputs, example_labels in single_step_window.train.take(1):
    print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
    print(f'Labels shape (batch, time, features): {example_labels.shape}')
```
```
Inputs shape (batch, time, features): (32, 1, 19)
Labels shape (batch, time, features): (32, 1, 1)
```
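To make these shapes concrete, here is a small NumPy sketch of how one batch of windows could be split into inputs and labels. The array sizes mirror the output above; the random data, the `label_index` value, and the slicing are illustrative assumptions rather than the `WindowGenerator` implementation.

```python
import numpy as np

batch_size, total_window_size, num_features = 32, 2, 19
label_index = 1  # hypothetical position of the label column

rng = np.random.default_rng(0)
windows = rng.normal(size=(batch_size, total_window_size, num_features))

# Inputs: the first time step, with every feature.
example_inputs = windows[:, :1, :]
# Labels: the last time step, keeping only the label column.
example_labels = windows[:, 1:, label_index:label_index + 1]

print(example_inputs.shape)  # (32, 1, 19)
print(example_labels.shape)  # (32, 1, 1)
```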

Now, we will move on to build forecasting models that could predict temperature 1h in the future given the current value of all features.

## 1. Creating a baseline model

Before building a trainable model, it helps to have a simple reference point: a model that takes the input values and returns the current temperature as its prediction.

In other words, the model we will be building will predict “No change”. This is a reasonable baseline since temperature changes slowly. Of course, this baseline will work less well if we make a prediction further in the future.

```python
import tensorflow as tf


class Baseline(tf.keras.Model):
    def __init__(self, label_index=None):
        super().__init__()
        self.label_index = label_index

    def call(self, inputs):
        # With no label index, pass every feature through unchanged.
        if self.label_index is None:
            return inputs
        # Select the label feature, then restore the trailing feature axis.
        result = inputs[:, :, self.label_index]
        return result[:, :, tf.newaxis]


# Instantiate and compile the model
# (`column_indices` comes from the previous chapter).
baseline = Baseline(label_index=column_indices['T (degC)'])

baseline.compile(loss=tf.losses.MeanSquaredError(),
                 metrics=[tf.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)
```
`439/439 [==============================] - 2s 5ms/step - loss: 0.0128 - mean_absolute_error: 0.0785`

The above block of code defines a model named `baseline` that selects the temperature column from the inputs and uses `tf.newaxis` to restore the feature axis, so the output has the same `(batch, time, features)` layout as the labels. We then instantiated and compiled the model using Mean Squared Error as our loss function and Mean Absolute Error as our performance metric. Finally, we evaluated it on the validation and test sets. The baseline has no trainable weights, so these metrics do not measure any learning; they give the reference error that the trainable models below need to beat.
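The idea behind the baseline can also be sketched without TensorFlow. The snippet below is an illustration on a made-up temperature series (the values are hypothetical, not taken from the dataset): the forecast for each step is simply the previous observation, and the two metrics are computed by hand.

```python
import numpy as np

# Hypothetical normalized temperatures at consecutive hours.
series = np.array([0.10, 0.12, 0.15, 0.14, 0.18])

predictions = series[:-1]  # "no change": predict the current value
labels = series[1:]        # the value one step later

errors = predictions - labels
mse = np.mean(errors ** 2)     # analogous to the loss
mae = np.mean(np.abs(errors))  # analogous to the metric

print(f'MSE: {mse:.5f}, MAE: {mae:.5f}')
```

The slower the series changes from one step to the next, the smaller both numbers become, which is why "no change" is a reasonable reference for hourly temperature.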

The `WindowGenerator` has a plot method, but the plots won’t be very interesting with only a single sample. So, we will create a wider `WindowGenerator` named `wide_window` that generates windows of 24h of consecutive inputs and labels at a time.

```python
wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])

wide_window
```
```
Total window size: 25
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
Label column name(s): ['T (degC)']
```

The `wide_window` doesn’t change the way the model operates. The model still makes predictions 1h into the future based on a single input time step. Here the `time` axis acts like the `batch` axis: Each prediction is made independently with no interaction between time steps.

This expanded window can be passed directly to the same `baseline` model without any code changes. This is possible because the inputs and labels have the same number of time steps, and the baseline just forwards the input to the output.

When we plot the baseline model’s predictions, we can clearly see that the predictions are simply the labels, shifted right by 1h.

`wide_window.plot(baseline)`

In the three plots produced above, the single-step model is run over the course of 24h. Each plot contains the following elements:

• The blue “Inputs” line shows the input temperature at each time step. Though the model receives all features, this plot only shows the temperature feature.
• The green “Labels” dots show the target prediction value. These dots are shown at the prediction time, not the input time. That is why the range of labels is shifted 1 step relative to the inputs.
• The orange “Predictions” crosses are the model’s predictions for each output time step. If the model was predicting perfectly, the predictions would have landed directly on the “labels” (green dots).

## 2. Using a single step time series forecasting linear model

A linear model applies a linear transformation to the input variables to obtain the output. It is one of the simplest trainable models that we can apply. In a single-time-step forecasting model, the output for a time step depends only on that step.

To create our linear model, we use TensorFlow’s sequential model having only a single dense layer and no activation function. The dense layer only transforms the last axis of the data from `(batch, time, inputs)` to `(batch, time, units)`. The transformation is applied independently to every item across the `batch` and `time` axes.

```python
linear = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1)
])

# `example` is assumed to hold an (inputs, labels) pair; index 0 is the inputs.
print('Input shape:', single_step_window.example[0].shape)
print('Output shape:', linear(single_step_window.example[0]).shape)
```
```
Input shape: (32, 1, 19)
Output shape: (32, 1, 1)
```

Thus, we can see that our model takes in an input shape of (batch, time, inputs) and transforms it to (batch, time, units).
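This shape behaviour can be checked with a NumPy sketch. It assumes that a dense layer without an activation computes `inputs @ kernel + bias` over the last axis; the random weights stand in for a trained layer.

```python
import numpy as np

rng = np.random.default_rng(0)
num_features, units = 19, 1
kernel = rng.normal(size=(num_features, units))
bias = np.zeros(units)


def dense(inputs):
    # Matrix-multiply the last axis; leading axes are left untouched.
    return inputs @ kernel + bias


single = rng.normal(size=(32, 1, num_features))
wide = rng.normal(size=(32, 24, num_features))
print(dense(single).shape)  # (32, 1, 1)
print(dense(wide).shape)    # (32, 24, 1)

# Each time step is transformed independently of the others.
step_5 = dense(wide[:, 5:6, :])
print(np.allclose(dense(wide)[:, 5:6, :], step_5))  # True
```

Because only the last axis is transformed, the same weights work for a window of any length.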

Now, we will define a function for compiling and training the model. Packaging the compiling and training process into a single function will also be useful for training other models later in this course.

```python
MAX_EPOCHS = 20


def compile_and_fit(model, window, patience=2):
    # Stop training once the validation loss stops improving.
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                      patience=patience,
                                                      mode='min')

    model.compile(loss=tf.losses.MeanSquaredError(),
                  metrics=[tf.metrics.MeanAbsoluteError()])

    history = model.fit(window.train, epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=[early_stopping])
    return history
```
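The effect of `patience` can be illustrated with a simplified sketch. This is not Keras's implementation, only the basic rule it is built around: stop once the monitored value has failed to improve for `patience` consecutive epochs. The function name and the loss values are hypothetical.

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop."""
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses)  # ran through every epoch


# Validation loss improves for three epochs, then stalls twice.
print(stopping_epoch([0.50, 0.40, 0.35, 0.36, 0.37, 0.30]))  # 5
```

This is why training can end well before `MAX_EPOCHS` is reached.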

Finally, we train the model and evaluate its performance,

```python
history = compile_and_fit(linear, single_step_window)

val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)
```
```
Epoch 1/20 1534/1534 [==============================] - 5s 3ms/step - loss: 0.4313 - mean_absolute_error: 0.3443 - val_loss: 0.0185 - val_mean_absolute_error: 0.1006
Epoch 2/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0145 - mean_absolute_error: 0.0891 - val_loss: 0.0107 - val_mean_absolute_error: 0.0760
Epoch 3/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0101 - mean_absolute_error: 0.0739 - val_loss: 0.0089 - val_mean_absolute_error: 0.0692
Epoch 4/20 1534/1534 [==============================] - 5s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0700 - val_loss: 0.0086 - val_mean_absolute_error: 0.0680
Epoch 5/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0682
Epoch 6/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0086 - val_mean_absolute_error: 0.0675
Epoch 7/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error: 0.0685
Epoch 8/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0677
439/439 [==============================] - 1s 2ms/step - loss: 0.0087 - mean_absolute_error: 0.0677
```

Like the `baseline` model, the linear model can be called on batches of wide windows. Used this way the model makes a set of independent predictions on consecutive time steps. The time axis acts like another batch axis. There are no interactions between the predictions at each time step.

`wide_window.plot(linear)`

One advantage of linear models is that they’re relatively simple to interpret. You can pull out the layer’s weights, and see the weight assigned to each input:

```python
import matplotlib.pyplot as plt

# The single Dense layer's kernel has shape (features, 1).
plt.bar(x=range(len(train_df.columns)),
        height=linear.layers[0].kernel[:, 0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)
```

## 3. Using a single step time series forecasting dense model

In the previous section, the model was a linear model with only a single dense layer. In this section, we will build a deeper, more powerful neural network.

The Dense model is similar to the linear model, except that it stacks several `Dense` layers between the input and the output, as shown below,

```python
dense = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=1)
])
```

As we can see, the Dense model has 3 sequential dense layers as opposed to a single dense layer in the Linear Model. Now, we will compile, train and evaluate the Dense model,

```python
history = compile_and_fit(dense, single_step_window)

val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)
```
```
Epoch 1/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0173 - mean_absolute_error: 0.0803 - val_loss: 0.0082 - val_mean_absolute_error: 0.0669
Epoch 2/20 1534/1534 [==============================] - 7s 4ms/step - loss: 0.0077 - mean_absolute_error: 0.0632 - val_loss: 0.0084 - val_mean_absolute_error: 0.0691
Epoch 3/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0073 - mean_absolute_error: 0.0615 - val_loss: 0.0069 - val_mean_absolute_error: 0.0586
Epoch 4/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0070 - mean_absolute_error: 0.0598 - val_loss: 0.0067 - val_mean_absolute_error: 0.0575
Epoch 5/20 1534/1534 [==============================] - 5s 4ms/step - loss: 0.0069 - mean_absolute_error: 0.0592 - val_loss: 0.0064 - val_mean_absolute_error: 0.0561
Epoch 6/20 1534/1534 [==============================] - 5s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0584 - val_loss: 0.0064 - val_mean_absolute_error: 0.0563
Epoch 7/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0582 - val_loss: 0.0065 - val_mean_absolute_error: 0.0574
439/439 [==============================] - 1s 2ms/step - loss: 0.0065 - mean_absolute_error: 0.0574
```
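One way to quantify how much larger the Dense model is than the linear one is to count trainable parameters. The sketch below assumes 19 input features (the last axis of the input shape shown earlier) and the usual dense-layer count of `inputs * units + units`; the helper name is hypothetical.

```python
def dense_params(num_inputs, units):
    # One weight per (input, unit) pair plus one bias per unit.
    return num_inputs * units + units


num_features = 19

linear_total = dense_params(num_features, 1)
dense_total = (dense_params(num_features, 64)
               + dense_params(64, 64)
               + dense_params(64, 1))

print(linear_total)  # 20
print(dense_total)   # 5505
```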

## 4. Using a multi-step input dense model

A single-time-step model has no context for the current values of its inputs: it cannot see how the input features change over time. To address this, the model needs access to multiple time steps when making predictions.

The `baseline`, `linear` and `dense` models we studied earlier handled each time step independently. Now, our model will take multiple time steps as input to produce a single output.

Let us start by creating the data for the model. We will create a `WindowGenerator` that produces batches of 3h of inputs and 1h of labels:

```python
CONV_WIDTH = 3
conv_window = WindowGenerator(
    input_width=CONV_WIDTH,
    label_width=1,
    shift=1,
    label_columns=['T (degC)'])

conv_window
```
```
Total window size: 4
Input indices: [0 1 2]
Label indices: [3]
Label column name(s): ['T (degC)']
```

Note that the `WindowGenerator`’s `shift` parameter is measured from the end of the input window to the end of the label window. Now we will plot the window created,

```python
conv_window.plot()
plt.title("Given 3h as input, predict 1h into the future.")
plt.tight_layout()
```

As we can see from the above figure, we need to predict 1h into the future given past 3 hours of data as input.

We could train a `dense` model on a multiple-input-step window by adding a `layers.Flatten` as the first layer of the model. This layer will flatten the input matrix into a 1-D array which can then be passed to other dense layers. For the dense layers, we will be using the ReLU activation function.

```python
multi_step_dense = tf.keras.Sequential([
    # Shape: (time, features) => (time*features)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
    # Add back the time dimension.
    # Shape: (outputs) => (1, outputs)
    tf.keras.layers.Reshape([1, -1]),
])

# `example` is assumed to hold an (inputs, labels) pair; index 0 is the inputs.
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', multi_step_dense(conv_window.example[0]).shape)
```
```
Input shape: (32, 3, 19)
Output shape: (32, 1, 1)
```

After creating the model, we will use the `compile_and_fit` function we defined earlier to compile the model created and fit the model to data.

```python
import IPython.display

history = compile_and_fit(multi_step_dense, conv_window)

# Clear the per-epoch training log from the notebook output.
IPython.display.clear_output()
val_performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.val)
performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test, verbose=0)
```
`438/438 [==============================] - 1s 2ms/step - loss: 0.0068 - mean_absolute_error: 0.0596`

Finally, we plot the prediction from the model as,

`conv_window.plot(multi_step_dense)`

The main downside of this approach is that the resulting model can only be executed on input windows of exactly this shape. For example, if we pass it a window of any other width, we get an error, as shown below,

```python
print('Input shape:', wide_window.example[0].shape)
try:
    print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
    print(f'\n{type(e).__name__}:{e}')
```
```
Input shape: (32, 24, 19)

InvalidArgumentError:Matrix size-incompatible: In[0]: [32,456], In[1]: [57,32] [Op:MatMul]
```
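The numbers in the error message can be traced with a NumPy sketch. It assumes that `Flatten` merges the time and feature axes into one, so the first dense kernel is sized for 3 × 19 = 57 values, while a 24-step window flattens to 24 × 19 = 456 values.

```python
import numpy as np

num_features, units = 19, 32
kernel = np.zeros((3 * num_features, units))  # sized for 3 time steps

conv_batch = np.zeros((32, 3, num_features))
wide_batch = np.zeros((32, 24, num_features))

flat_conv = conv_batch.reshape(len(conv_batch), -1)
flat_wide = wide_batch.reshape(len(wide_batch), -1)
print(flat_conv.shape, flat_wide.shape)  # (32, 57) (32, 456)

print((flat_conv @ kernel).shape)  # (32, 32)
try:
    flat_wide @ kernel
except ValueError as error:
    # NumPy rejects the same mismatch: 456 columns against 57 rows.
    print(type(error).__name__)  # ValueError
```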

This problem can be resolved by using another type of Neural Network called the Convolutional Neural Network (CNN) that we will study in the next section.

## 5. Using a multi-step input Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of neural network architecture that is widely used for deep learning on image data. It typically has convolutional layers followed by dense layers. In time series forecasting, a convolution layer also takes multiple time steps as input to each prediction.

```python
conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=32,
                           kernel_size=(CONV_WIDTH,),
                           activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
])
```

The above block of code creates a Convolutional Neural Network (CNN) with a convolutional layer as the input layer, followed by two dense layers. If you are not familiar with CNNs, you can refer to our other course on CNNs.

Now we run the model on an example batch of data to print out the input shape and output shape of the Conv model created,

```python
print("Conv model on `conv_window`")
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', conv_model(conv_window.example[0]).shape)
```
```
Conv model on `conv_window`
Input shape: (32, 3, 19)
Output shape: (32, 1, 1)
```
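On a window that is exactly `CONV_WIDTH` steps long, a convolution with that kernel width computes the same weighted sum as flattening the window and applying a dense layer. The NumPy sketch below illustrates this with random weights; `conv1d_valid` is a hypothetical helper written for this example (no padding, stride 1, no bias or activation), not the Keras layer.

```python
import numpy as np


def conv1d_valid(inputs, kernel):
    """inputs: (batch, time, features); kernel: (width, features, filters)."""
    width = kernel.shape[0]
    steps = inputs.shape[1] - width + 1
    outputs = []
    for start in range(steps):
        window = inputs[:, start:start + width, :]
        # Weighted sum over the window's time and feature axes.
        outputs.append(np.einsum('bwf,wfo->bo', window, kernel))
    return np.stack(outputs, axis=1)  # (batch, steps, filters)


rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 19, 32))
batch = rng.normal(size=(32, 3, 19))

conv_out = conv1d_valid(batch, kernel)
dense_out = batch.reshape(32, -1) @ kernel.reshape(-1, 32)

print(conv_out.shape)                             # (32, 1, 32)
print(np.allclose(conv_out[:, 0, :], dense_out))  # True
```

The practical difference is that the convolution reuses the same kernel at every position, so it is not tied to one input length.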

Finally, we compile and train the model as,

```python
history = compile_and_fit(conv_model, conv_window)

IPython.display.clear_output()
val_performance['Conv'] = conv_model.evaluate(conv_window.val)
performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)
```
`438/438 [==============================] - 1s 2ms/step - loss: 0.0070 - mean_absolute_error: 0.0607`

The difference between this `conv_model` and the `multi_step_dense` model is that the `conv_model` can be run on inputs of any length, because the convolutional layer is applied to a sliding window of inputs.

If you run it on wider input, it produces wider output as shown below,

```python
print("Wide window")
# Index 0 of `example` holds the inputs, index 1 the labels.
print('Input shape:', wide_window.example[0].shape)
print('Labels shape:', wide_window.example[1].shape)
print('Output shape:', conv_model(wide_window.example[0]).shape)
```
```
Wide window
Input shape: (32, 24, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 22, 1)
```

Note that the output is shorter than the input. To make training or plotting work, the labels and predictions need to have the same length. So we build a `WindowGenerator` that produces wide windows with a few extra input time steps, so that the label and prediction lengths match,

```python
LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)
wide_conv_window = WindowGenerator(
    input_width=INPUT_WIDTH,
    label_width=LABEL_WIDTH,
    shift=1,
    label_columns=['T (degC)'])

wide_conv_window
```
```
Total window size: 27
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25]
Label indices: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]
Label column name(s): ['T (degC)']
```
```python
print("Wide conv window")
print('Input shape:', wide_conv_window.example[0].shape)
print('Labels shape:', wide_conv_window.example[1].shape)
print('Output shape:', conv_model(wide_conv_window.example[0]).shape)
```
```
Wide conv window
Input shape: (32, 26, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 24, 1)
```
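The lengths above follow from simple arithmetic. Assuming the convolution uses no padding and a stride of 1 (the Keras `Conv1D` defaults), each output step consumes `kernel_size` input steps, so the output is `kernel_size - 1` steps shorter than the input. The helper names below are hypothetical.

```python
def conv_output_length(input_length, kernel_size):
    # Number of positions a kernel can occupy without padding.
    return input_length - kernel_size + 1


def required_input_width(label_width, kernel_size):
    # Extra steps needed so the output has one step per label.
    return label_width + (kernel_size - 1)


CONV_WIDTH = 3
print(conv_output_length(24, CONV_WIDTH))    # 22
print(required_input_width(24, CONV_WIDTH))  # 26
print(conv_output_length(26, CONV_WIDTH))    # 24
```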

Now you can plot the model’s predictions on a wider window.

Note the 3 input time steps before the first prediction. Every prediction here is based on the 3 preceding timesteps:

`wide_conv_window.plot(conv_model)`

## 6. Using a multi-step input Recurrent Neural Network

A Recurrent Neural Network (RNN) is a type of neural network well-suited for time series data. RNNs process a time series step-by-step, maintaining an internal state from time-step to time-step.

In this tutorial, we will use an RNN layer called Long Short-Term Memory (LSTM).

An important constructor argument for Keras RNN layers is `return_sequences`. It configures the layer in one of two ways:

1. If `False` (the default), the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.

2. If `True`, the layer returns an output for each input. This is useful for:

• Stacking RNN layers.
• Training a model on multiple time steps simultaneously.

```python
lstm_model = tf.keras.models.Sequential([
    # Shape [batch, time, features] => [batch, time, lstm_units]
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Shape => [batch, time, features]
    tf.keras.layers.Dense(units=1)
])
```
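The two `return_sequences` modes can be illustrated with a bare-bones recurrent loop in NumPy. Note that this is a plain tanh recurrence, not an LSTM, and `simple_rnn` is a hypothetical helper; it only shows how the output shape depends on the flag.

```python
import numpy as np


def simple_rnn(inputs, w_in, w_state, return_sequences=False):
    """inputs: (batch, time, features) -> hidden states of size `units`."""
    batch, time, _ = inputs.shape
    state = np.zeros((batch, w_state.shape[0]))
    outputs = []
    for t in range(time):
        # The new state depends on the current input and the previous state.
        state = np.tanh(inputs[:, t, :] @ w_in + state @ w_state)
        outputs.append(state)
    if return_sequences:
        return np.stack(outputs, axis=1)  # (batch, time, units)
    return state                          # (batch, units)


rng = np.random.default_rng(0)
w_in = rng.normal(size=(19, 32)) * 0.1
w_state = rng.normal(size=(32, 32)) * 0.1
batch = rng.normal(size=(32, 24, 19))

last_only = simple_rnn(batch, w_in, w_state)
every_step = simple_rnn(batch, w_in, w_state, return_sequences=True)
print(last_only.shape)   # (32, 32)
print(every_step.shape)  # (32, 24, 32)
```

In this sketch the final step of the full sequence is the same state that the `return_sequences=False` call returns.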

With `return_sequences=True` the model can be trained on 24h of data at a time.

```python
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', lstm_model(wide_window.example[0]).shape)
```
```
Input shape: (32, 24, 19)
Output shape: (32, 24, 1)
```
```python
history = compile_and_fit(lstm_model, wide_window)

IPython.display.clear_output()
val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)
```
```
438/438 [==============================] - 1s 3ms/step - loss: 0.0057 - mean_absolute_error: 0.0523
```
`wide_window.plot(lstm_model)`

## Evaluating the performance of single step models

In this final section of the chapter, we are going to compare the performance of every model built in this lesson.

```python
import numpy as np

x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

plt.ylabel('mean_absolute_error [T (degC), normalized]')
plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=list(performance.keys()),
           rotation=45)
_ = plt.legend()
```
```python
# Each entry is the list returned by `evaluate`; pick out the MAE.
for name, value in performance.items():
    print(f'{name:12s}: {value[metric_index]:0.4f}')
```
```
Baseline    : 0.0852
Linear      : 0.0667
Dense       : 0.0580
Multi step dense: 0.0612
Conv        : 0.0553
LSTM        : 0.0532
```

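Judging by the test-set mean absolute error printed above, every trained model improves on the Baseline (0.0852), and the LSTM achieves the lowest error (0.0532). The gains are not strictly monotonic, though: the multi-step dense model (0.0612) scores slightly worse than the single-step Dense model (0.0580). Also, note that the Baseline is not a trained model; its value lies in providing the reference error for this comparison.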
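To read the comparison more directly, the printed test errors can be ranked and expressed relative to the baseline. The dictionary below simply copies the values printed above.

```python
test_mae = {
    'Baseline': 0.0852,
    'Linear': 0.0667,
    'Dense': 0.0580,
    'Multi step dense': 0.0612,
    'Conv': 0.0553,
    'LSTM': 0.0532,
}

baseline_mae = test_mae['Baseline']
ranking = sorted(test_mae, key=test_mae.get)  # lowest error first

for name in ranking:
    reduction = 100 * (baseline_mae - test_mae[name]) / baseline_mae
    print(f'{name:16s}: {test_mae[name]:.4f} ({reduction:.1f}% below baseline)')
```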

This is it for building single-step time series forecasting models using TensorFlow 2.0. In the next chapter, ‘Multi-Step Time Series Forecasting’, we will learn about building multi-step forecasting models using TensorFlow 2.0.