Single-Step Time Series Forecasting


In this lesson, we will be going over how to build a single-step time series forecasting model using TensorFlow 2.0. We will start by predicting the ‘T (degC)’ feature one step into the future.

Single-step forecasting models are those which predict the observation at the next time step, i.e., only one time-step is to be predicted. For example, given the weather information of the past 6 days, a single step forecasting model will only predict the weather of the 7th day, i.e., only one time-step into the future.

[Figure: Single-step time series forecasting using TensorFlow]

We will first create a single-step window using the WindowGenerator class we defined in the previous chapter. We will use only the 'T (degC)' column as the label. The inputs still include all the features of our data.

single_step_window = WindowGenerator(
    input_width=1, label_width=1, shift=1,
    label_columns=['T (degC)'])
Total window size: 2 
Input indices: [0] 
Label indices: [1] 
Label column name(s): ['T (degC)']
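The printed indices follow from simple arithmetic on input_width, label_width and shift. Here is a minimal sketch that reproduces them. It assumes the WindowGenerator from the previous chapter computes indices the way the TensorFlow time series tutorial does. The window_indices helper is hypothetical and is not part of that class.

```python
def window_indices(input_width, label_width, shift):
    """Hypothetical helper mirroring the index arithmetic WindowGenerator prints."""
    # The whole window spans the inputs plus the offset to the last label.
    total_window_size = input_width + shift
    input_indices = list(range(0, input_width))
    # Labels occupy the last `label_width` positions of the window.
    label_start = total_window_size - label_width
    label_indices = list(range(label_start, total_window_size))
    return total_window_size, input_indices, label_indices


size, inputs, labels = window_indices(input_width=1, label_width=1, shift=1)
print(size, inputs, labels)  # 2 [0] [1]
```

The same helper gives a total size of 25 for input_width=24, label_width=24, shift=1. It gives 4 for input_width=3, label_width=1, shift=1. Both match the windows built later in this lesson.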

The window object creates tf.data datasets from the training, validation, and test sets, which allow us to iterate over batches of data. The following example shows how to iterate over the training set:

for example_inputs, example_labels in single_step_window.train.take(1):
    print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
    print(f'Labels shape (batch, time, features): {example_labels.shape}')
Inputs shape (batch, time, features): (32, 1, 19) 
Labels shape (batch, time, features): (32, 1, 1)

Next, we will build models that predict the temperature 1 hour into the future, given the current values of all features.

1. Creating a baseline model

Before building a trainable model, it is useful to have a performance baseline to compare against. This is a simple model that takes in the inputs and returns the current temperature as its prediction.

In other words, the model we will be building will predict “No change”. This is a reasonable baseline since temperature changes slowly. Of course, this baseline will work less well if we make a prediction further in the future.

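import tensorflow as tf

class Baseline(tf.keras.Model):
  def __init__(self, label_index=None):
    super().__init__()  # required before setting attributes on a Keras Model
    self.label_index = label_index

  def call(self, inputs):
    # With no label index, forward every feature unchanged
    if self.label_index is None:
      return inputs
    # Select the label column, then restore the feature axis: (batch, time) -> (batch, time, 1)
    result = inputs[:, :, self.label_index]
    return result[:, :, tf.newaxis]

# Instantiate and compile the model (column_indices maps column names to positions, defined earlier in the course)
baseline = Baseline(label_index=column_indices['T (degC)'])
baseline.compile(loss=tf.keras.losses.MeanSquaredError(),
                 metrics=[tf.keras.metrics.MeanAbsoluteError()])
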
val_performance = {}
performance = {}
val_performance['Baseline'] = baseline.evaluate(single_step_window.val)
performance['Baseline'] = baseline.evaluate(single_step_window.test, verbose=0)
439/439 [==============================] - 2s 5ms/step - loss: 0.0128 - mean_absolute_error: 0.0785

The code above defines a model named baseline. It selects the 'T (degC)' column from its inputs and restores the feature axis with tf.newaxis, so its output has the same shape as the labels. We then instantiated the model and compiled it. Mean Squared Error is the loss function and Mean Absolute Error is the performance metric. Finally, we evaluated it on the validation and test sets. The baseline has no trainable weights, so these metrics do not reflect any learning. They are a reference point that the trainable models below should beat.
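The loss and metric reported for the baseline are ordinary Mean Squared Error and Mean Absolute Error of a "no change" forecast. Here is a pure-Python sketch of that computation on a made-up toy series, not the weather data. The helper names persistence_forecast, mse and mae are hypothetical.

```python
def persistence_forecast(series):
    """'No change' baseline: predict each next value as the current value."""
    predictions = series[:-1]  # value at time t is the forecast for t + 1
    targets = series[1:]       # the values actually observed at t + 1
    return predictions, targets


def mse(y_true, y_pred):
    # Mean Squared Error: average of squared differences
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    # Mean Absolute Error: average of absolute differences
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


temps = [1.0, 2.0, 4.0, 7.0]  # toy, made-up readings
preds, targets = persistence_forecast(temps)
print(preds, targets)                            # [1.0, 2.0, 4.0] [2.0, 4.0, 7.0]
print(mse(targets, preds), mae(targets, preds))  # 4.666666666666667 2.0
```

The errors here are 1, 2 and 3. MSE squares them, which penalizes the largest miss most. MAE keeps them in the original units, which is why MAE is used to compare the models at the end of this lesson.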

The WindowGenerator has a plot method, but the plots won’t be very interesting with only a single sample. So, we will create a wider WindowGenerator named wide_window. It generates windows of 24 hours of consecutive inputs and labels at a time.

wide_window = WindowGenerator(
    input_width=24, label_width=24, shift=1,
    label_columns=['T (degC)'])

Total window size: 25 
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] 
Label indices: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
Label column name(s): ['T (degC)']

The wide_window doesn’t change the way the model operates. The model still makes predictions 1h into the future based on a single input time step. Here the time axis acts like the batch axis: Each prediction is made independently with no interaction between time steps.

This expanded window can be passed directly to the same baseline model without any code changes. This is possible because the inputs and labels have the same number of timesteps, and the baseline just forwards the input to the output as illustrated below,

[Figure: Single-step time series forecasting baseline model]

When we plot the baseline model’s predictions with wide_window.plot(baseline), we can clearly see that the predictions are simply the labels, shifted right by 1 hour.

[Figure: Single-step time series line plot]

Each of the three plots above shows the single-step model run over the course of 24 hours. Here is a brief explanation of each element:

  • The blue “Inputs” line shows the input temperature at each time step. Though the model receives all features, this plot only shows the temperature feature.
  • The green “Labels” dots show the target prediction value. These dots are shown at the prediction time, not the input time. That is why the range of labels is shifted 1 step relative to the inputs.
  • The orange “Predictions” crosses are the model’s predictions for each output time step. If the model were predicting perfectly, the predictions would land directly on the “Labels” (green dots).
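The one-step lag in the plot follows directly from the window layout. This minimal sketch uses plain Python lists and a made-up toy series in place of the real temperature column. It shows that with shift=1, the baseline's predictions are the labels delayed by one step.

```python
# Toy series standing in for the normalized temperature column (made-up values).
series = [0.1 * i * i for i in range(30)]

input_width, shift = 24, 1
window = series[:input_width + shift]        # one 25-step window
inputs = window[:input_width]                # time steps 0..23
labels = window[shift:shift + input_width]   # time steps 1..24
predictions = inputs                         # the baseline echoes its input

# Each prediction equals the previous step's label: predictions lag by one step.
assert all(predictions[t + 1] == labels[t] for t in range(input_width - 1))
print(len(inputs), len(labels))  # 24 24
```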

2. Using a single step time series forecasting linear model

A linear model applies a linear transformation to the input variables to produce the output. It is one of the simplest trainable models we can apply. In a single-time-step forecasting model, the output at each time step depends only on the input at that step.

To create our linear model, we use a Keras Sequential model with a single Dense layer and no activation function. The Dense layer only transforms the last axis of the data, from (batch, time, inputs) to (batch, time, units). The transformation is applied independently to every item across the batch and time axes.

linear = tf.keras.Sequential([
    # A single Dense layer with one output unit and no activation
    tf.keras.layers.Dense(units=1)
])

print('Input shape:', single_step_window.example[0].shape)
print('Output shape:', linear(single_step_window.example[0]).shape)
Input shape: (32, 1, 19) 
Output shape: (32, 1, 1)

Thus, we can see that our model takes in an input shape of (batch, time, inputs) and transforms it to (batch, time, units).
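This NumPy sketch shows what "transforms only the last axis" means. It is not Keras code: it reproduces the computation of a Dense layer with no activation. Random weights stand in for trained ones, and the shapes come from this lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, time, features, units = 32, 1, 19, 1

x = rng.normal(size=(batch, time, features))  # (batch, time, inputs)
kernel = rng.normal(size=(features, units))   # Dense weights
bias = np.zeros(units)

# A Dense layer contracts only the last axis: (batch, time, inputs) -> (batch, time, units)
y = x @ kernel + bias
print(y.shape)  # (32, 1, 1)

# Same result as applying the layer to every (batch, time) position separately.
looped = np.empty_like(y)
for b in range(batch):
    for t in range(time):
        looped[b, t] = x[b, t] @ kernel + bias
assert np.allclose(y, looped)
```

The same weights work unchanged on a 24-step window. This is why the linear model can also be run on wide_window.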

Now, we will define a function for compiling and training the model. Packaging the compiling and training process into a single function will also be useful for training other models later in this course.


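MAX_EPOCHS = 20  # the training logs below run for at most 20 epochs

def compile_and_fit(model, window, patience=2):
  # Stop training once validation loss has not improved for `patience` epochs
  early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=patience, mode='min')
  model.compile(loss=tf.keras.losses.MeanSquaredError(), optimizer=tf.keras.optimizers.Adam(),
                metrics=[tf.keras.metrics.MeanAbsoluteError()])
  history = model.fit(window.train, epochs=MAX_EPOCHS,
                      validation_data=window.val, callbacks=[early_stopping])
  return history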
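The EarlyStopping callback monitors val_loss with a patience of 2 epochs, which is why the training runs below stop before 20 epochs. This pure-Python sketch is a simplified version of that rule with the default min_delta=0. The epochs_until_early_stop helper is hypothetical, and the loss values are made up. By default EarlyStopping does not restore the best weights: restore_best_weights defaults to False. The model therefore keeps the weights from the last epoch it trained.

```python
def epochs_until_early_stop(val_losses, patience=2):
    """Hypothetical helper: the 1-based epoch at which training would stop
    under a simplified EarlyStopping(monitor='val_loss', mode='min') rule."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:           # strict improvement resets the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:  # no improvement for `patience` epochs
                return epoch
    return len(val_losses)        # ran out of epochs (MAX_EPOCHS) first


print(epochs_until_early_stop([0.5, 0.4, 0.41, 0.39, 0.40, 0.42, 0.3]))  # 6
```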

Finally, we train the model and evaluate its performance,

history = compile_and_fit(linear, single_step_window)

val_performance['Linear'] = linear.evaluate(single_step_window.val)
performance['Linear'] = linear.evaluate(single_step_window.test, verbose=0)
Epoch 1/20 1534/1534 [==============================] - 5s 3ms/step - loss: 0.4313 - mean_absolute_error: 0.3443 - val_loss: 0.0185 - val_mean_absolute_error: 0.1006 
Epoch 2/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0145 - mean_absolute_error: 0.0891 - val_loss: 0.0107 - val_mean_absolute_error: 0.0760 
Epoch 3/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0101 - mean_absolute_error: 0.0739 - val_loss: 0.0089 - val_mean_absolute_error: 0.0692 
Epoch 4/20 1534/1534 [==============================] - 5s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0700 - val_loss: 0.0086 - val_mean_absolute_error: 0.0680 
Epoch 5/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0682 
Epoch 6/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0086 - val_mean_absolute_error: 0.0675 
Epoch 7/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0091 - mean_absolute_error: 0.0697 - val_loss: 0.0087 - val_mean_absolute_error: 0.0685 
Epoch 8/20 1534/1534 [==============================] - 4s 3ms/step - loss: 0.0090 - mean_absolute_error: 0.0695 - val_loss: 0.0087 - val_mean_absolute_error: 0.0677
439/439 [==============================] - 1s 2ms/step - loss: 0.0087 - mean_absolute_error: 0.0677

Like the baseline model, the linear model can be called on batches of wide windows. Used this way, the model makes a set of independent predictions on consecutive time steps. The time axis acts like another batch axis, so there are no interactions between the predictions at each time step. Plotting its predictions with wide_window.plot(linear) gives the result below.

[Figure: Single-step linear time series forecasting model]
[Figure: Single-step linear time series forecasting plot]

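One advantage of linear models is that they’re relatively simple to interpret. You can pull out the layer’s weights and see the weight assigned to each input:

# One bar per input feature; the Dense kernel has shape (features, 1)
plt.bar(x=range(len(train_df.columns)),
        height=linear.layers[0].kernel[:, 0].numpy())
axis = plt.gca()
axis.set_xticks(range(len(train_df.columns)))
_ = axis.set_xticklabels(train_df.columns, rotation=90)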
[Figure: Feature weights bar plot]
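The bar plot is simply a picture of the Dense layer's kernel. This illustrative NumPy sketch is not the lesson's Keras code. It fits a linear model to synthetic data with known, made-up weights, showing how each learned weight measures a feature's influence on the prediction. It uses closed-form least squares instead of gradient descent. The three feature names only label the synthetic columns. Comparing weight magnitudes this way is meaningful only when the inputs share a similar scale, as the normalized features in this lesson do.

```python
import numpy as np

rng = np.random.default_rng(42)
feature_names = ['p (mbar)', 'T (degC)', 'rh (%)']  # labels for synthetic columns
true_weights = np.array([0.0, 0.9, -0.2])           # made-up "ground truth"

X = rng.normal(size=(500, 3))
y = X @ true_weights                                # noiseless linear target

# Fit a linear model with closed-form least squares (instead of gradient descent).
X_b = np.hstack([X, np.ones((500, 1))])             # append a bias column
coef, *_ = np.linalg.lstsq(X_b, y, rcond=None)
weights, bias = coef[:3], coef[3]

# Larger |weight| means the feature moves the prediction more.
for name, w in zip(feature_names, weights):
    print(f'{name:10s} {w:+.3f}')
```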

3. Using a single step time series forecasting dense model

In the previous section, the model was a linear model with only a single Dense layer. In this section, we will build a deeper, more powerful neural network model.

[Figure: Single-step time series forecasting dense model]

The Dense model is similar to the linear model, except that it stacks more Dense layers between the input and the output, as shown below:

dense = tf.keras.Sequential([
    tf.keras.layers.Dense(units=64, activation='relu'),
    tf.keras.layers.Dense(units=64, activation='relu'),
    # Output layer: one value (the temperature) per time step
    tf.keras.layers.Dense(units=1)
])

As we can see, the Dense model has 3 sequential dense layers as opposed to a single dense layer in the Linear Model. Now, we will compile, train and evaluate the Dense model,

history = compile_and_fit(dense, single_step_window)

val_performance['Dense'] = dense.evaluate(single_step_window.val)
performance['Dense'] = dense.evaluate(single_step_window.test, verbose=0)
Epoch 1/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0173 - mean_absolute_error: 0.0803 - val_loss: 0.0082 - val_mean_absolute_error: 0.0669 
Epoch 2/20 1534/1534 [==============================] - 7s 4ms/step - loss: 0.0077 - mean_absolute_error: 0.0632 - val_loss: 0.0084 - val_mean_absolute_error: 0.0691 
Epoch 3/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0073 - mean_absolute_error: 0.0615 - val_loss: 0.0069 - val_mean_absolute_error: 0.0586 
Epoch 4/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0070 - mean_absolute_error: 0.0598 - val_loss: 0.0067 - val_mean_absolute_error: 0.0575 
Epoch 5/20 1534/1534 [==============================] - 5s 4ms/step - loss: 0.0069 - mean_absolute_error: 0.0592 - val_loss: 0.0064 - val_mean_absolute_error: 0.0561 
Epoch 6/20 1534/1534 [==============================] - 5s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0584 - val_loss: 0.0064 - val_mean_absolute_error: 0.0563 
Epoch 7/20 1534/1534 [==============================] - 6s 4ms/step - loss: 0.0068 - mean_absolute_error: 0.0582 - val_loss: 0.0065 - val_mean_absolute_error: 0.0574
439/439 [==============================] - 1s 2ms/step - loss: 0.0065 - mean_absolute_error: 0.0574

4. Using a multi-step input dense model

A single-time-step model has no context for the current values of its inputs: it cannot see how the input features change over time. To address this, the model needs access to multiple time steps when making predictions.

[Figure: Single-step forecasting dense model]

The baseline, linear, and dense models we studied earlier handled each time step independently. Now, our model will take multiple time steps as input to produce a single output.

Let us start by creating the data for the model. We will create a WindowGenerator that produces batches of 3 hours of inputs and 1 hour of labels:

CONV_WIDTH = 3  # number of input time steps
conv_window = WindowGenerator(
    input_width=CONV_WIDTH, label_width=1, shift=1,
    label_columns=['T (degC)'])

Total window size: 4 
Input indices: [0 1 2] 
Label indices: [3] 
Label column name(s): ['T (degC)']

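Note that the WindowGenerator’s shift parameter is measured between the ends of the two windows: the label window ends shift steps after the input window ends. Now we will plot the window we created:

conv_window.plot()
plt.title("Given 3h as input, predict 1h into the future.")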
[Figure: Time series forecasting line plot]

As the figure above shows, the task is to predict the temperature 1 hour into the future, given the past 3 hours of data as input.

We can train a dense model on a multiple-input-step window by adding tf.keras.layers.Flatten as the first layer of the model. This layer flattens each example’s (time, features) matrix into a 1-D vector, which can then be passed to the Dense layers. The hidden Dense layers use the ReLU activation function.

multi_step_dense = tf.keras.Sequential([
    # Shape: (time, features) => (time*features)
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
    # Add back the time dimension.
    # Shape: (outputs) => (1, outputs)
    tf.keras.layers.Reshape([1, -1]),
])

print('Input shape:', conv_window.example[0].shape)
print('Output shape:', multi_step_dense(conv_window.example[0]).shape)
Input shape: (32, 3, 19) 
Output shape: (32, 1, 1)
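Flatten and Reshape only rearrange values; they learn nothing. This NumPy sketch makes the same shape changes, using the conv_window batch shape (32, 3, 19) and dummy values. It relies on NumPy's default row-major reshape, which is assumed here to match Keras's flattening order.

```python
import numpy as np

batch, time, features = 32, 3, 19
x = np.arange(batch * time * features, dtype=float).reshape(batch, time, features)

# Flatten: (batch, time, features) -> (batch, time * features)
flat = x.reshape(batch, -1)
print(flat.shape)  # (32, 57)

# Each row is the example's 3 time steps laid end to end.
assert np.array_equal(flat[0], np.concatenate([x[0, 0], x[0, 1], x[0, 2]]))

# After the Dense layers produce (batch, 1), Reshape([1, -1]) restores a time axis.
outputs = np.zeros((batch, 1))
print(outputs.reshape(batch, 1, -1).shape)  # (32, 1, 1)
```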

After creating the model, we use the compile_and_fit function defined earlier to compile the model and fit it to the data.

history = compile_and_fit(multi_step_dense, conv_window)

val_performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.val)
performance['Multi step dense'] = multi_step_dense.evaluate(conv_window.test, verbose=0)
438/438 [==============================] - 1s 2ms/step - loss: 0.0068 - mean_absolute_error: 0.0596

Finally, we plot the model’s predictions with conv_window.plot(multi_step_dense):

[Figure: Time series forecasting prediction]

The main downside of using a dense neural network is that the resulting model can only be executed on input windows of exactly this shape. For example, if we pass it an input window of any other shape, we get an error:

print('Input shape:', wide_window.example[0].shape)
try:
  print('Output shape:', multi_step_dense(wide_window.example[0]).shape)
except Exception as e:
  print(f'\n{type(e).__name__}:{e}')
Input shape: (32, 24, 19)
InvalidArgumentError:Matrix size-incompatible: In[0]: [32,456], In[1]: [57,32] [Op:MatMul]
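The numbers in the error message come straight from the flattened shapes. The first Dense layer's kernel was built for 3 * 19 = 57 inputs, but a flattened 24-step window has 24 * 19 = 456 values. This NumPy sketch reproduces the same mismatch. NumPy raises ValueError where TensorFlow raises InvalidArgumentError.

```python
import numpy as np

features, units = 19, 32
# The first Dense kernel was built for 3 time steps: 3 * 19 = 57 inputs.
kernel = np.zeros((3 * features, units))

wide_batch = np.zeros((32, 24 * features))  # flattened wide window: 24 * 19 = 456
print(wide_batch.shape, kernel.shape)       # (32, 456) (57, 32)

try:
    wide_batch @ kernel
except ValueError as e:
    print('ValueError:', e)
```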

This problem can be resolved by using another type of Neural Network called the Convolutional Neural Network (CNN) that we will study in the next section.

5. Using a multi-step input Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of neural network architecture widely used for deep learning on image data. It typically stacks convolutional layers followed by dense layers. In time series forecasting, a convolutional layer takes multiple time steps as input to each prediction.

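conv_model = tf.keras.Sequential([
    # Each output step sees CONV_WIDTH consecutive input steps
    tf.keras.layers.Conv1D(filters=32, kernel_size=(CONV_WIDTH,), activation='relu'),
    tf.keras.layers.Dense(units=32, activation='relu'),
    tf.keras.layers.Dense(units=1),
])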

The code above creates a Convolutional Neural Network (CNN) with a 1-D convolutional layer as its first layer, followed by two Dense layers. The convolution spans CONV_WIDTH (3) time steps, so it sees the same inputs as multi_step_dense. If you are not familiar with CNNs, you can refer to our other course on CNNs.

Now we run the model on an example batch of data and print the input and output shapes of the conv model:

print("Conv model on `conv_window`")
print('Input shape:', conv_window.example[0].shape)
print('Output shape:', conv_model(conv_window.example[0]).shape)
Conv model on `conv_window` 
Input shape: (32, 3, 19) 
Output shape: (32, 1, 1)

Next, we compile, train, and evaluate the model:

history = compile_and_fit(conv_model, conv_window)

val_performance['Conv'] = conv_model.evaluate(conv_window.val)
performance['Conv'] = conv_model.evaluate(conv_window.test, verbose=0)
438/438 [==============================] - 1s 2ms/step - loss: 0.0070 - mean_absolute_error: 0.0607

The difference between this conv_model and the multi_step_dense model is that the conv_model can be run on inputs of any length, as long as there are at least CONV_WIDTH time steps. The convolutional layer is applied to a sliding window of inputs:

[Figure: Multi-step input Convolutional Neural Network (CNN)]
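The sliding window also explains the output lengths that follow. Keras Conv1D uses 'valid' padding by default, so a kernel of width CONV_WIDTH fits into a sequence time - CONV_WIDTH + 1 times. This NumPy sketch computes one example with random weights and no bias or activation. The conv1d_valid helper is hypothetical.

```python
import numpy as np

def conv1d_valid(x, kernel):
    """Hypothetical 'valid' 1-D convolution (cross-correlation, as in Keras Conv1D).

    x: (time, features); kernel: (kernel_size, features, filters).
    Returns an array of shape (time - kernel_size + 1, filters).
    """
    kernel_size = kernel.shape[0]
    steps = x.shape[0] - kernel_size + 1
    out = np.empty((steps, kernel.shape[2]))
    for t in range(steps):
        window = x[t:t + kernel_size]  # (kernel_size, features)
        out[t] = np.einsum('kf,kfo->o', window, kernel)
    return out

CONV_WIDTH, features, filters = 3, 19, 32
rng = np.random.default_rng(0)
kernel = rng.normal(size=(CONV_WIDTH, features, filters))

for time in (3, 24, 26):
    print(time, conv1d_valid(rng.normal(size=(time, features)), kernel).shape)
# 3 (1, 32)
# 24 (22, 32)
# 26 (24, 32)
```

The lengths 1, 22 and 24 match the conv_window, wide_window and wide_conv_window outputs shown in this section.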

If you run it on wider input, it produces wider output as shown below,

print("Wide window")
print('Input shape:', wide_window.example[0].shape)
print('Labels shape:', wide_window.example[1].shape)
print('Output shape:', conv_model(wide_window.example[0]).shape)
Wide window 
Input shape: (32, 24, 19) 
Labels shape: (32, 24, 1) 
Output shape: (32, 22, 1)

Note that the output is shorter than the input. Without padding, a kernel spanning 3 time steps produces 24 - 3 + 1 = 22 outputs from 24 inputs. To make training or plotting work, the labels and predictions must have the same length. So we build a WindowGenerator that produces wide windows with a few extra input time steps, so that the label and prediction lengths match:

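LABEL_WIDTH = 24
INPUT_WIDTH = LABEL_WIDTH + (CONV_WIDTH - 1)  # 26 input steps for 24 predictions
wide_conv_window = WindowGenerator(
    input_width=INPUT_WIDTH, label_width=LABEL_WIDTH, shift=1,
    label_columns=['T (degC)'])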

Total window size: 27 
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25] 
Label indices: [ 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26] 
Label column name(s): ['T (degC)']

print("Wide conv window")
print('Input shape:', wide_conv_window.example[0].shape)
print('Labels shape:', wide_conv_window.example[1].shape)
print('Output shape:', conv_model(wide_conv_window.example[0]).shape)
Wide conv window 
Input shape: (32, 26, 19) 
Labels shape: (32, 24, 1) 
Output shape: (32, 24, 1)

Now we can plot the model’s predictions on the wider window with wide_conv_window.plot(conv_model).

Note the 3 input time steps before the first prediction. Every prediction here is based on the 3 preceding timesteps:

[Figure: Multi-step input Convolutional Neural Network (CNN) plot]

6. Using a multi-step input Recurrent Neural Network

A Recurrent Neural Network (RNN) is a type of neural network well-suited for time series data. RNNs process a time series step-by-step, maintaining an internal state from time-step to time-step.

In this lesson, we will use an RNN layer called Long Short-Term Memory (LSTM).

An important constructor argument for Keras RNN layers is return_sequences. This setting can configure the layer in one of two ways:

  1. If False (the default), the layer only returns the output of the final time step, giving the model time to warm up its internal state before making a single prediction.
     [Figure: Single-step forecasting RNN model with multi-step input]
  2. If True, the layer returns an output for each input time step. This is useful for:
     • Stacking RNN layers.
     • Training a model on multiple time steps simultaneously.
     [Figure: Single-step forecasting RNN model with multi-step input 2]
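The two settings differ only in which outputs are kept. This minimal NumPy sketch uses a plain (Elman) RNN cell in place of the LSTM. The LSTM's gates are omitted, and simple_rnn is a hypothetical helper, not the Keras API. The state h is carried from one time step to the next. With return_sequences=True every intermediate state is returned; otherwise only the final one is.

```python
import numpy as np

def simple_rnn(x, W_x, W_h, b, return_sequences=False):
    """Plain Elman RNN (not an LSTM): h_t = tanh(x_t W_x + h_{t-1} W_h + b)."""
    h = np.zeros(W_h.shape[0])  # initial state
    outputs = []
    for x_t in x:               # step through time, carrying h forward
        h = np.tanh(x_t @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs) if return_sequences else h

rng = np.random.default_rng(0)
time, features, units = 24, 19, 8
x = rng.normal(size=(time, features))
W_x = rng.normal(size=(features, units)) * 0.1
W_h = rng.normal(size=(units, units)) * 0.1
b = np.zeros(units)

seq = simple_rnn(x, W_x, W_h, b, return_sequences=True)
last = simple_rnn(x, W_x, W_h, b)
print(seq.shape, last.shape)       # (24, 8) (8,)
assert np.allclose(seq[-1], last)  # False mode keeps only the final state
```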
lstm_model = tf.keras.models.Sequential([
    # Shape [batch, time, features] => [batch, time, lstm_units]
    tf.keras.layers.LSTM(32, return_sequences=True),
    # Shape => [batch, time, features]
    tf.keras.layers.Dense(units=1)
])

With return_sequences=True the model can be trained on 24h of data at a time.

print('Input shape:', wide_window.example[0].shape)
print('Output shape:', lstm_model(wide_window.example[0]).shape)
Input shape: (32, 24, 19) 
Output shape: (32, 24, 1)

history = compile_and_fit(lstm_model, wide_window)

val_performance['LSTM'] = lstm_model.evaluate(wide_window.val)
performance['LSTM'] = lstm_model.evaluate(wide_window.test, verbose=0)
438/438 [==============================] - 1s 3ms/step - loss: 0.0057 - mean_absolute_error: 0.0523

[Figure: Single-step forecasting RNN model with multi-step input plot]

Evaluating the performance of single step models

In this final section of the chapter, we compare the performance of all the models built in this lesson.

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(len(performance))
width = 0.3
metric_name = 'mean_absolute_error'
metric_index = lstm_model.metrics_names.index('mean_absolute_error')
val_mae = [v[metric_index] for v in val_performance.values()]
test_mae = [v[metric_index] for v in performance.values()]

# Side-by-side bars: validation MAE on the left, test MAE on the right
plt.ylabel('mean_absolute_error [T (degC), normalized]')
plt.bar(x - 0.17, val_mae, width, label='Validation')
plt.bar(x + 0.17, test_mae, width, label='Test')
plt.xticks(ticks=x, labels=performance.keys(),
           rotation=45)
_ = plt.legend()
[Figure: Time series model evaluation]

for name, value in performance.items():
  print(f'{name:12s}: {value[1]:0.4f}')
Baseline    : 0.0852
Linear      : 0.0667
Dense       : 0.0580
Multi step dense: 0.0612
Conv        : 0.0553
LSTM        : 0.0532

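Overall, the more complex models tend to perform better than the simpler ones, and the LSTM achieves the lowest test MAE (0.0532). The ranking is not strictly monotonic, though. The multi-step dense model (0.0612) does slightly worse than the single-step dense model (0.0580). Also, note that the Baseline model does not learn anything. Its metrics serve only as a reference point that the trainable models should beat.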
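To put these numbers in perspective, we can compute each model's improvement over the "no change" baseline directly from the printed test MAEs. This small worked calculation uses only the values shown above.

```python
# Test MAE per model, copied from the output printed above (normalized units).
test_mae_by_model = {
    'Baseline': 0.0852,
    'Linear': 0.0667,
    'Dense': 0.0580,
    'Multi step dense': 0.0612,
    'Conv': 0.0553,
    'LSTM': 0.0532,
}

baseline_mae = test_mae_by_model['Baseline']
for name, score in test_mae_by_model.items():
    # Relative reduction in test MAE compared with the "no change" baseline
    improvement = 100 * (baseline_mae - score) / baseline_mae
    print(f'{name:16s} {score:.4f}  {improvement:5.1f}% lower MAE than Baseline')

best = min(test_mae_by_model, key=test_mae_by_model.get)
print('Lowest test MAE:', best)  # Lowest test MAE: LSTM
```

For example, the LSTM's test MAE is roughly 37.6% below the baseline's, while the linear model's is roughly 21.7% below.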

This is it for building single-step time series forecasting models using TensorFlow 2.0. In the next chapter, ‘Multi-Step Time Series Forecasting’, we will learn how to build multi-step forecasting models using TensorFlow 2.0.
