In this lesson, we will be going over how to build a single-step time series forecasting model using TensorFlow 2.0. We will start by predicting the 'T (degC)' feature one step into the future.
Single-step forecasting models predict the observation at the next time step only, i.e., only one time step is to be predicted. For example, given the weather information of the past 6 days, a single-step forecasting model will predict only the weather of the 7th day, i.e., one time step into the future.
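As a quick, hypothetical illustration (the values below are made up and are not from our weather dataset), a single-step setup with a shift of one pairs each observation with the one that follows it, and a model is asked to predict only that next value:

import numpy as np

# Hypothetical series of consecutive observations (made-up values).
series = np.array([21.3, 21.1, 20.8, 20.9, 21.4])

# With input_width=1, label_width=1 and shift=1, each input at time t is
# paired with the single label at time t+1 -- one step into the future.
inputs = series[:-1]
labels = series[1:]
for t, (x, y) in enumerate(zip(inputs, labels)):
    print(f'input (t={t}): {x}  ->  label (t={t + 1}): {y}')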
We will first start by creating a single step window using the WindowGenerator
class we defined in the previous chapter. We will use only the 'T (degC)' column as the label to be predicted.
single_step_window = WindowGenerator(
    input_width=1, label_width=1, shift=1,
    label_columns=['T (degC)'])
single_step_window
Total window size: 2
Input indices: [0]
Label indices: [1]
Label column name(s): ['T (degC)']
The window
object creates tf.data.Datasets
from the training, validation, and test sets, which allow us to iterate over batches of data. The following example illustrates how we can iterate over the training set,
for example_inputs, example_labels in single_step_window.train.take(1):
    print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
    print(f'Labels shape (batch, time, features): {example_labels.shape}')
Inputs shape (batch, time, features): (32, 1, 19)
Labels shape (batch, time, features): (32, 1, 1)
Now, we will move on to building forecasting models that predict the temperature 1h into the future, given the current values of all features.
Before building an actual trainable model, it is helpful to start with a simple model: one that takes in the input values and just returns the current temperature as its prediction.
In other words, the model we will be building predicts "No change". This is a reasonable baseline since temperature changes slowly. Of course, this baseline will work less well if we predict further into the future.
class Baseline(tf.keras.Model):
    def __init__(self, label_index=None):
        super().__init__()
        self.label_index = label_index

    def call(self, inputs):
        if self.label_index is None:
            return inputs
        result = inputs[:, :, self.label_index]
        return result[:, :, tf.newaxis]

# Instantiate and compile the model
baseline = Baseline(label_index=column_indices['T (degC)'])
baseline.compile(loss=tf.losses.MeanSquaredError(),
                 metrics=[tf.metrics.MeanAbsoluteError()])

val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)
439/439 [==============================] - 2s 5ms/step - loss: 0.0128 - mean_absolute_error: 0.0785
The above block of code created our model named baseline, which simply returns the selected input column after adding another axis to it using tf.newaxis. We then instantiated and compiled the model using Mean Squared Error as the loss function and Mean Absolute Error as the performance metric. Finally, we printed some metrics, but since we are not training the model, these numbers by themselves don't give us a feel for how well it is doing.
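To make the baseline concrete, here is a tiny sketch (with hypothetical temperature values, not taken from our dataset) of what "predicting no change" means and how its error would be measured:

import numpy as np

# The "no change" baseline: the forecast for the next hour is simply
# the current temperature (hypothetical values).
temps = np.array([10.2, 10.5, 10.9, 11.3])    # T (degC) at four consecutive hours
predictions = temps[:-1]                      # baseline forecast for hours 2..4
labels = temps[1:]                            # actual temperatures at hours 2..4
print('MAE:', np.abs(predictions - labels).mean())   # mean absolute error of the baseline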
The WindowGenerator
has a plot method, but the plots won't be very interesting with only a single sample. So, we will create a wider WindowGenerator
named wide_window
that generates windows of 24 consecutive hours of inputs and labels at a time.
wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])
wide_window
Total window size: 25
Input indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
Label column name(s): ['T (degC)']
The wide_window
doesn't change the way the model operates. The model still makes predictions 1h into the future based on a single input time step. Here the time
axis acts like the batch
axis: Each prediction is made independently with no interaction between time steps.
This expanded window can be passed directly to the same baseline
model without any code changes. This is possible because the inputs and labels have the same number of time steps, and the baseline simply forwards the input to the output, as illustrated below.
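As a quick check (following the same pattern as the shape prints used earlier in this lesson), the baseline accepts the wider window without modification:

print('Input shape:', wide_window.example[0].shape)        # (32, 24, 19)
print('Output shape:', baseline(wide_window.example[0]).shape)  # (32, 24, 1)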
When we plot the baseline model's predictions, we can clearly see that the predictions are simply the labels, shifted right by 1h.
wide_window.plot(baseline)
In the above plot of three example windows, the single-step model is run over the course of 24h: the inputs line shows the input temperature at each time step, the label points mark the target values, and the prediction markers show the model's output for each step.
A linear model is one that applies a linear transformation to the input variables to obtain the output. It is one of the simplest trainable models we can apply. In a single-time-step forecasting model, the output for a time step depends only on that step.
To create our linear model, we use TensorFlow's Sequential model with a single Dense layer and no activation function. The Dense layer only transforms the last axis of the data from (batch, time, inputs)
to (batch, time, units)
. The transformation is applied independently to every item across the batch
and time
axes.
linear = tf.keras.Sequential([
    tf.keras.layers.Dense(units=1)
])

print('Input shape:', single_step_window.example[0].shape)
print('Output shape:', linear(single_step_window.example[0]).shape)
Input shape: (32, 1, 19)
Output shape: (32, 1, 1)
Thus, we can see that our model takes in an input shape of (batch, time, inputs) and transforms it to (batch, time, units).
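For intuition, the following sketch (not part of the original lesson; it assumes the linear model above has already been built by the call in the previous snippet) reproduces the Dense layer's output by hand: the layer applies the same matrix multiplication and bias, y = xW + b, independently to the last axis of every batch and time step:

import tensorflow as tf

# Pull the (randomly initialised or learned) weights out of the single Dense layer.
W = linear.layers[0].kernel      # shape (19, 1)
b = linear.layers[0].bias        # shape (1,)

x = single_step_window.example[0]                # shape (32, 1, 19)
manual = tf.einsum('bti,io->bto', x, W) + b      # apply y = xW + b to the last axis
print(tf.reduce_max(tf.abs(manual - linear(x))).numpy())  # ~0.0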
Now, we will define a function for compiling and training the model. Packaging the compiling and training process into a single function will also be useful for training other models later in this course.
MAX_EPOCHS = 20

def compile_and_fit(model, window, patience=2):
    early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss',
                                                      patience=patience,
                                                      mode='min')
    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(),
                  metrics=[tf.metrics.MeanAbsoluteError()])
    history = model.fit(window.train, epochs=MAX_EPOCHS,
                        validation_data=window.val,
                        callbacks=[early_stopping])
    return history
Finally, we train the model and evaluate its performance,
history = compile_and_fit(linear, single_step_window)

val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)
Epoch 1/20
1534/1534 [==============================] - 5s 3ms/step - loss: 0.4313 - mean_absolute_error: 0.3443 - val_loss: 0.0185 - val_mean_absolute_error: 0.1006
Epoch 2/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0145 - mean_absolute_error: 0.0891 - val_loss: 0.0107 - val_mean_absolute_error: 0.0760
Epoch 3/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0101 - mean_absolute_error: 0.0739 - val_loss: 0.0089 - val_mean_absolute_error: 0.0692
Epoch 4/20
1534/1534 [==============================] - 5s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0700 - val_loss: 0.0086 - val_mean_absolute_error: 0.0680
Epoch 5/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0682
Epoch 6/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0086 - val_mean_absolute_error: 0.0675
Epoch 7/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error: 0.0685
Epoch 8/20
1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0677
439/439 [==============================] - 1s 2ms/step - loss: 0.0087 - mean_absolute_error: 0.0677
Like the baseline
model, the linear model can be called on batches of wide windows. Used this way the model makes a set of independent predictions on consecutive time steps. The time axis acts like another batch axis. There are no interactions between the predictions at each time step.
wide_window.plot(linear)
One advantage of linear models is that they're relatively simple to interpret. You can pull out the layer's weights, and see the weight assigned to each input:
plt.bar(x=range(len(train_df.columns)),
        height=linear.layers[0].kernel[:, 0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)
In the previous section, the model was a linear model with only a single Dense layer. In this section, we will build a deeper, more powerful neural network model.
The Dense model is similar to the linear model, except that it stacks more Dense layers between the input and the output, as shown below,
dense = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=1)
])
As we can see, the Dense model has 3 sequential dense layers as opposed to a single dense layer in the Linear Model. Now, we will compile, train and evaluate the Dense model,
history = compile_and_fit(dense, single_step_window)

val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)
Epoch 1/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0173 - mean_absolute_error: 0.0803 - val_loss: 0.0082 - val_mean_absolute_error: 0.0669
Epoch 2/20
1534/1534 [==============================] - 7s 4ms/step - loss: 0.0077 - mean_absolute_error: 0.0632 - val_loss: 0.0084 - val_mean_absolute_error: 0.0691
Epoch 3/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0073 - mean_absolute_error: 0.0615 - val_loss: 0.0069 - val_mean_absolute_error: 0.0586
Epoch 4/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0070 - mean_absolute_error: 0.0598 - val_loss: 0.0067 - val_mean_absolute_error: 0.0575
Epoch 5/20
1534/1534 [==============================] - 5s 4ms/step - loss: 0.0069 - mean_absolute_error: 0.0592 - val_loss: 0.0064 - val_mean_absolute_error: 0.0561
Epoch 6/20
1534/1534 [==============================] - 5s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0584 - val_loss: 0.0064 - val_mean_absolute_error: 0.0563
Epoch 7/20
1534/1534 [==============================] - 6s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0582 - val_loss: 0.0065 - val_mean_absolute_error: 0.0574
439/439 [==============================] - 1s 2ms/step - loss: 0.0065 - mean_absolute_error: 0.0574
A single-time-step model has no context for the current values of its inputs: it cannot see how the input features change over time. To address this issue, the model needs access to multiple time steps when making each prediction.
The baseline
, linear
and dense
models we studied earlier handled each time step independently. However, now our model will take multiple time steps as input to produce a single output.
Let us first start by creating our data for the model. We will create a WindowGenerator
that will produce batches of 3h of inputs and 1h of labels:
CONV_WIDTH = 3
conv_window = WindowGenerator(
    input_width=CONV_WIDTH,
    label_width=1,
    shift=1,
    label_columns=['T (degC)'])
conv_window
Total window size: 4
Input indices: [0 1 2]
Label indices: [3]
Label column name(s): ['T (degC)']
Note that the Window
's shift
parameter is relative to the end of the two windows. Now we will plot the window we created,
conv_window.plot()
plt.title("Given 3h as input, predict 1h into the future.")
plt.tight_layout()
As we can see from the above figure, the model must predict 1h into the future given the past 3 hours of data as input.
We could train a dense
model on a multiple-input-step window by adding a layers.Flatten
as the first layer of the model. This layer flattens each example's (time, features) input into a single 1-D vector, which can then be passed to the Dense layers. For the hidden Dense layers, we use the ReLU activation function.
multi_step_dense = tf.keras.Sequential([
    # Shape: (time, features) => (time*features)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
    # Add back the time dimension.
    # Shape: (outputs) => (1, outputs)
    tf.keras.layers.Reshape([1, -1]),
])

print('Input shape:', conv_window.example[0].shape)
print('Output shape:', multi_step_dense(conv_window.example[0]).shape)
Input shape: (32, 3, 19)
Output shape: (32, 1, 1)
After creating the model, we will use the compile_and_fit
function we defined earlier to compile the model and fit it to the data.
history = compile_and_fit(multi_step_dense, conv_window)
IPython.display.clear_output()

val_performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.val)
performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test, verbose=0)
438/438 [==============================] - 1s 2ms/step - loss: 0.0068 - mean_absolute_error: 0.0596
Finally, we plot the prediction from the model as,
conv_window.plot(multi_step_dense)
The main downside of using a dense neural network here is that the resulting model can only be executed on input windows of exactly this shape. If we feed it a window of any other shape, we get an error as shown below,
print('Input shape:', wide_window.example[0].shape)
try:
    print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
    print(f'\n{type(e).__name__}:{e}')
Input shape: (32, 24, 19)

InvalidArgumentError:Matrix size-incompatible: In[0]: [32,456], In[1]: [57,32] [Op:MatMul]
The error occurs because Flatten turns the (24, 19) input into a vector of 24 × 19 = 456 values, while the first Dense layer's weights were built for 3 × 19 = 57 inputs. This problem can be resolved by using another type of neural network, the Convolutional Neural Network (CNN), which we will study in the next section.
A Convolutional Neural Network (CNN) is a type of neural network architecture widely used for deep learning on image data. It consists of convolutional layers followed by dense layers. In time series forecasting, the convolutional layer takes multiple time steps as input to each prediction.
conv_model = tf.keras.Sequential([
    tf.keras.layers.Conv1D(filters=32,
                           kernel_size=(CONV_WIDTH,),
                           activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
])
The above block of code creates a Convolutional Neural Network (CNN) with a convolutional layer as the input layer, followed by two dense layers. If you are not familiar with CNNs, you can refer to our course on CNNs.
Now we run the model on an example batch of data to print out the input shape and output shape of the Conv model created,
print("Conv model on `conv_window`") print('Input shape:', conv_window.example[0].shape) print('Output shape:', conv_model(conv_window.example[0]).shape)
Conv model on `conv_window`
Input shape: (32, 3, 19)
Output shape: (32, 1, 1)
Finally, we compile and train the model as,
history = compile_and_fit(conv_model, conv_window)
IPython.display.clear_output()

val_performance['Conv'] = conv_model.evaluate(conv_window.val)
performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)
438/438 [==============================] - 1s 2ms/step - loss: 0.0070 - mean_absolute_error: 0.0607
The difference between this conv_model
and the multi_step_dense
model is that the conv_model
can be run on inputs of any length. The convolutional layer is applied to a sliding window of inputs:
If you run it on wider input, it produces wider output as shown below,
print("Wide window") print('Input shape:', wide_window.example[0].shape) print('Labels shape:', wide_window.example[1].shape) print('Output shape:', conv_model(wide_window.example[0]).shape)
Wide window
Input shape: (32, 24, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 22, 1)
Note that the output is shorter than the input: with a kernel size of CONV_WIDTH = 3 and no padding, the convolution drops the first two time steps, leaving 24 - (3 - 1) = 22 predictions for 24 inputs. To make training or plotting work, the labels and predictions need to have the same length. So we build a WindowGenerator that produces wide windows with a few extra input time steps, so that the label and prediction lengths match,
LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)
wide_conv_window = WindowGenerator(
    input_width=INPUT_WIDTH,
    label_width=LABEL_WIDTH,
    shift=1,
    label_columns=['T (degC)'])
wide_conv_window
Total window size: 27
Input indices: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25]
Label indices: [ 3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]
Label column name(s): ['T (degC)']
print("Wide conv window") print('Input shape:', wide_conv_window.example[0].shape) print('Labels shape:', wide_conv_window.example[1].shape) print('Output shape:', conv_model(wide_conv_window.example[0]).shape)
Wide conv window
Input shape: (32, 26, 19)
Labels shape: (32, 24, 1)
Output shape: (32, 24, 1)
Now you can plot the model's predictions on a wider window.
Note the 3 input time steps before the first prediction. Every prediction here is based on the 3 preceding timesteps:
wide_conv_window.plot(conv_model)
A Recurrent Neural Network (RNN) is a type of neural network well-suited for time series data. RNNs process a time series step-by-step, maintaining an internal state from time-step to time-step.
In this lesson, we will use an RNN layer called Long Short-Term Memory (LSTM).
An important constructor argument for Keras RNN layers is the return_sequences argument. This setting can configure the layer in one of two ways:
1. If False (the default), the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.
2. If True, the layer returns an output for each input time step. This is useful for stacking RNN layers and for training a model on multiple time steps simultaneously, which is what we do below.
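To make the difference concrete, here is a small standalone sketch (not part of the original lesson; it only assumes a random batch shaped like our wide windows) comparing the output shapes of the two settings:

import tensorflow as tf

# A random batch shaped like our wide windows: (batch, time, features).
x = tf.random.normal([32, 24, 19])

lstm_last = tf.keras.layers.LSTM(32)                        # return_sequences=False (default)
lstm_seq = tf.keras.layers.LSTM(32, return_sequences=True)  # one output per time step

print(lstm_last(x).shape)   # (32, 32)     -> only the final time step's output
print(lstm_seq(x).shape)    # (32, 24, 32) -> an output for every input time step

With return_sequences=True, the layer's output keeps the time axis, so a Dense layer placed after it produces one prediction per time step. We now build our LSTM model with that setting: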
lstm_model = tf.keras.models.Sequential([
    # Shape [batch, time, features] => [batch, time, lstm_units]
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Shape => [batch, time, features]
    tf.keras.layers.Dense(units=1)
])
With return_sequences=True
the model can be trained on 24h of data at a time.
print('Input shape:', wide_window.example[0].shape)
print('Output shape:', lstm_model(wide_window.example[0]).shape)
Input shape: (32, 24, 19)
Output shape: (32, 24, 1)
history = compile_and_fit(lstm_model, wide_window)
IPython.display.clear_output()

val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)
438/438 [==============================] - 1s 3ms/step - loss: 0.0057 - mean_absolute_error: 0.0523
wide_window.plot(lstm_model)
In this final section of the chapter, we are going to compare the performance of every model built in this lesson.
x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

plt.ylabel('mean_absolute_error [T (degC), normalized]')
plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(), rotation=45)
_ = plt.legend()
for name, value in performance.items():
    print(f'{name:12s}: {value[1]:0.4f}')
Baseline    : 0.0852
Linear      : 0.0667
Dense       : 0.0580
Multi step dense: 0.0612
Conv        : 0.0553
LSTM        : 0.0532
Thus, we can observe that, on the whole, each successive model performs slightly better than the previous one (the Multi step dense model being the exception on the test set). Also, note that the Baseline model does no learning at all, so its score mainly serves as a reference point for the trained models.
This is it for building single-step time series forecasting models using TensorFlow 2.0. Now in the next chapter on 'Multi-Step Time Series Forecasting', we will learn about building multi-step forecasting models using TensorFlow 2.0.