Since we will be building many time-series forecasting models, it is a good idea to create helper functions to make our workflow smooth. In this lesson, we will be developing helper functions for creating data windows, splitting data and for creating visualizations.
A helper function is a function that performs part of the computation of another function following the DRY (Don't repeat yourself) concept. This means that when you write a helper function, you can re-use it in various parts of your code and focus more on the larger objectives with all the smaller blocks of code in an implementable stage.
Let us create some helper functions for time-series forecasting in this lesson.
In time-series forecasting, our model will be looking at a certain number of previous consecutive data to make a forecast. For example, we can predict one hour into the future by using consecutive data of the past 6 hours.

This consecutive number of data taken for time-series forecasting is known as data window. Similarly, we can generate a prediction for 24 hours into the future by using a data window of the past 24 hours.

So, let us construct a python class that can create a data window as per our requirements given the training, validation and testing dataset.
class WindowGenerator():
def __init__(self, input_width, label_width, shift,
train_df=train_df, val_df=val_df, test_df=test_df,
label_columns=None):
# Store the raw data. Refer to the previous chapter for the DataFrames.
self.train_df = train_df
self.val_df = val_df
self.test_df = test_df
# Work out the label column indices.
self.label_columns = label_columns
if label_columns is not None:
self.label_columns_indices = {name: i for i, name in
enumerate(label_columns)}
self.column_indices = {name: i for i, name in
enumerate(train_df.columns)}
# Work out the window parameters.
self.input_width = input_width
self.label_width = label_width
self.shift = shift
self.total_window_size = input_width + shift
self.input_slice = slice(0, input_width)
self.input_indices = np.arange(self.total_window_size)[self.input_slice]
self.label_start = self.total_window_size - self.label_width
self.labels_slice = slice(self.label_start, None)
self.label_indices = np.arange(self.total_window_size)[self.labels_slice]
def __repr__(self):
return '\n'.join([
f'Total window size: {self.total_window_size}',
f'Input indices: {self.input_indices}',
f'Label indices: {self.label_indices}',
f'Label column name(s): {self.label_columns}'])
Great! Let us test if our window generator is working as intended.
# Predicting one hour into the future by using a data window of the past 6 hours
w1 = WindowGenerator(input_width=6, label_width=1, shift=1,
label_columns=['T (degC)'])
print(f'First Window: \n{w1}')
# Predicting 24 hours into the future by using a data window of the past 24 hours.
w2 = WindowGenerator(input_width=24, label_width=1, shift=24,
label_columns=['T (degC)'])
print(f'\nSecond Window: \n{w2}')
First Window: Total window size: 7 Input indices: [0 1 2 3 4 5] Label indices: [6] Label column name(s): ['T (degC)'] Second Window: Total window size: 48 Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23] Label indices: [47] Label column name(s): ['T (degC)']
After creating a data window, we need to need to split the window into two windows: a window of inputs and another window of labels.
In the above example, the first window (w1) can be split into two separate windows as follows,

For this purpose, we will be creating a helper function split_window() that will take a list of consecutive inputs, the and convert them to a window of inputs and a window of labels.
def split_window(self, features):
inputs = features[:, self.input_slice, :]
labels = features[:, self.labels_slice, :]
if self.label_columns is not None:
labels = tf.stack(
[labels[:, :, self.column_indices[name]] for name in self.label_columns],
axis=-1)
# Slicing doesn't preserve static shape information, so set the shapes
# manually. This way the `tf.data.Datasets` are easier to inspect.
inputs.set_shape([None, self.input_width, None])
labels.set_shape([None, self.label_width, None])
return inputs, labels
WindowGenerator.split_window = split_window
Let us try using the function for the above example (w1).
# Stack three slices, the length of the total window:
example_window = tf.stack([np.array(train_df[:w1.total_window_size]),
np.array(train_df[100:100+w1.total_window_size]),
np.array(train_df[200:200+w1.total_window_size])])
example_inputs, example_labels = w1.split_window(example_window)
print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'labels shape: {example_labels.shape}')
All shapes are: (batch, time, features) Window shape: (3, 7, 19) Inputs shape: (3, 6, 19) labels shape: (3, 1, 1)
Now, let us create a helper function to visualize the dataset.
def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
inputs, labels = self.example
plt.figure(figsize=(12, 8))
plot_col_index = self.column_indices[plot_col]
max_n = min(max_subplots, len(inputs))
for n in range(max_n):
plt.subplot(3, 1, n+1)
plt.ylabel(f'{plot_col} [normed]')
plt.plot(self.input_indices, inputs[n, :, plot_col_index],
label='Inputs', marker='.', zorder=-10)
if self.label_columns:
label_col_index = self.label_columns_indices.get(plot_col, None)
else:
label_col_index = plot_col_index
if label_col_index is None:
continue
plt.scatter(self.label_indices, labels[n, :, label_col_index],
edgecolors='k', label='Labels', c='#2ca02c', s=64)
if model is not None:
predictions = model(inputs)
plt.scatter(self.label_indices, predictions[n, :, label_col_index],
marker='X', edgecolors='k', label='Predictions',
c='#ff7f0e', s=64)
if n == 0:
plt.legend()
plt.xlabel('Time [h]')
# Creating an example plot
w1.example = example_inputs, example_labels
WindowGenerator.plot = plot
w1.plot()

The last step that we need to go through is to build a helper function for creating a tf.data.Dataset using a pandas DataFrame. Creating a tf.data.Dataset will be useful later in this course while building time-series forecasting models.
def make_dataset(self, data):
data = np.array(data, dtype=np.float32)
ds = tf.keras.preprocessing.timeseries_dataset_from_array(
data=data,
targets=None,
sequence_length=self.total_window_size,
sequence_stride=1,
shuffle=True,
batch_size=32,)
ds = ds.map(self.split_window)
return ds
WindowGenerator.make_dataset = make_dataset
@property
def train(self):
return self.make_dataset(self.train_df)
@property
def val(self):
return self.make_dataset(self.val_df)
@property
def test(self):
return self.make_dataset(self.test_df)
@property
def example(self):
"""Get and cache an example batch of `inputs, labels` for plotting."""
result = getattr(self, '_example', None)
if result is None:
# No example batch was found, so get one from the `.train` dataset
result = next(iter(self.train))
# And cache it for next time
self._example = result
return result
WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
Now the WindowGenerator object gives us access to the tf.data.Dataset objects, so you can easily iterate over the data.
w1.train.element_spec
(TensorSpec(shape=(None, 6, 19), dtype=tf.float32, name=None), TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))
Iterating over a Dataset yields concrete batches:
for example_inputs, example_labels in w1.train.take(1):
print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
print(f'Labels shape (batch, time, features): {example_labels.shape}')
Inputs shape (batch, time, features): (32, 6, 19) Labels shape (batch, time, features): (32, 1, 1)
Head on to the next lesson on 'Single-Step Time Series Forecasting' to understand the process of building a single-step time-series forecasting model using TensorFlow 2.0.
Do you want to learn Python, Data Science, and Machine Learning while getting certified? Here are some best selling Datacamp courses that we recommend you enroll in: