Creating Helper Functions


Since we will be building many time-series forecasting models, it is a good idea to create helper functions that keep our workflow smooth. In this lesson, we will develop helper functions for creating data windows, splitting windows into inputs and labels, visualizing data, and building tf.data.Dataset objects.


What is a helper function in Python?

A helper function performs part of the computation of another function, following the DRY (Don't Repeat Yourself) principle. Once you have written a helper function, you can reuse it in different parts of your code and focus on the larger objectives, because the smaller building blocks are already in place.
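As a minimal illustration (the function names here are hypothetical and not part of this course's code), a small mean() helper can be written once and reused by two other functions instead of repeating the averaging logic in each:

```python
def mean(values):
    """Helper: arithmetic mean of a sequence of numbers."""
    return sum(values) / len(values)


def center(values):
    """Subtract the mean from every value (reuses the mean() helper)."""
    m = mean(values)
    return [v - m for v in values]


def variance(values):
    """Population variance (also reuses the mean() helper)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


temps = [2.0, 4.0, 6.0]
print(center(temps))    # [-2.0, 0.0, 2.0]
print(variance(temps))  # 2.6666666666666665
```

If the averaging logic ever needs to change, only mean() has to be edited.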

Let us create some helper functions for time-series forecasting in this lesson.

1. Creating a data window

In time-series forecasting, our model looks at a fixed number of consecutive past data points to make a forecast. For example, we can predict the value one hour into the future by using consecutive data from the past 6 hours.

Figure: Creating a data window

This block of consecutive data points used for forecasting is known as a data window. Similarly, we can generate a prediction for 24 hours into the future by using a data window of the past 24 hours.

Figure: Creating a time series data offset
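To make the idea concrete, here is a minimal pure-Python sketch (the function make_windows and the toy series are illustrative assumptions, not part of the lesson's code) that pairs each block of input_width consecutive values with the single value shift steps after the block's last element:

```python
def make_windows(series, input_width, shift):
    """Pair each block of `input_width` consecutive values with the value
    `shift` steps after the block's last element."""
    total_window_size = input_width + shift
    pairs = []
    # Every start position that still leaves room for a full window.
    for start in range(len(series) - total_window_size + 1):
        window = series[start:start + total_window_size]
        inputs = window[:input_width]
        label = window[-1]  # `shift` steps past the last input
        pairs.append((inputs, label))
    return pairs


hourly = list(range(10))  # stand-in for 10 hourly readings
pairs = make_windows(hourly, input_width=6, shift=1)
print(len(pairs))   # 4
print(pairs[0])     # ([0, 1, 2, 3, 4, 5], 6)
```

With input_width=24 and shift=24, each window spans 48 steps and the label sits 24 steps after the last input, which is the second case described above.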

So, let us construct a Python class that creates a data window to our requirements from the training, validation and test datasets.

import numpy as np

class WindowGenerator:
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # Store the raw data. Refer to the previous chapter for the DataFrames.
        # train_df, val_df and test_df must already exist when this class is
        # defined, because they are used as default argument values.
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # Work out the label column indices.
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # Work out the window parameters.
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        # The window covers the inputs plus the steps up to the last label.
        self.total_window_size = input_width + shift

        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        # The labels are the last `label_width` steps of the window.
        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])

Great! Let us test if our window generator is working as intended.

# Predicting one hour into the future by using a data window of the past 6 hours
w1 = WindowGenerator(input_width=6, label_width=1, shift=1,
                     label_columns=['T (degC)'])
print(f'First Window: \n{w1}')

# Predicting 24 hours into the future by using a data window of the past 24 hours.
w2 = WindowGenerator(input_width=24, label_width=1, shift=24,
                     label_columns=['T (degC)'])
print(f'\nSecond Window: \n{w2}')
First Window: 
Total window size: 7
Input indices: [0 1 2 3 4 5]
Label indices: [6]
Label column name(s): ['T (degC)']

Second Window: 
Total window size: 48
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [47]
Label column name(s): ['T (degC)']
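The label indices follow from two lines of arithmetic: the window spans input_width + shift steps, and the labels are the last label_width of them. A plain-Python mirror of that logic (window_indices is a hypothetical helper for illustration only) reproduces the indices printed above and also shows a multi-step case in which 24 labels are predicted at once:

```python
def window_indices(input_width, label_width, shift):
    """Mirror the index arithmetic in WindowGenerator.__init__."""
    total_window_size = input_width + shift
    input_indices = list(range(total_window_size))[0:input_width]
    # Labels are the last `label_width` positions of the window.
    label_start = total_window_size - label_width
    label_indices = list(range(total_window_size))[label_start:]
    return total_window_size, input_indices, label_indices


print(window_indices(6, 1, 1))
# (7, [0, 1, 2, 3, 4, 5], [6])
total, _, labels = window_indices(24, 24, 24)
print(total, labels[0], labels[-1])  # 48 24 47
```

Note that if label_width were larger than shift, the label indices would start inside the input range, so inputs and labels would overlap.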

2. Splitting a window

After creating a data window, we need to split it into two parts: a window of inputs and a window of labels.

For example, the first window (w1) can be split into two separate windows as follows:

Figure: Splitting the input and label data

For this purpose, we will create a helper function, split_window(), that takes a batch of consecutive data windows and converts each one into a window of inputs and a window of labels.

import tensorflow as tf

def split_window(self, features):
    # `features` has shape (batch, time, features); slice along the time axis.
    inputs = features[:, self.input_slice, :]
    labels = features[:, self.labels_slice, :]
    if self.label_columns is not None:
        # Keep only the requested label columns, in the order given.
        labels = tf.stack(
            [labels[:, :, self.column_indices[name]] for name in self.label_columns],
            axis=-1)

    # Slicing doesn't preserve static shape information, so set the shapes
    # manually. This way the `tf.data.Datasets` are easier to inspect.
    inputs.set_shape([None, self.input_width, None])
    labels.set_shape([None, self.label_width, None])

    return inputs, labels

# Attach the function to the class as a method.
WindowGenerator.split_window = split_window
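The line WindowGenerator.split_window = split_window attaches a plain function to the class after the class was defined. Because Python looks up methods on the class at call time, instances created earlier, such as w1, gain the new method too. A minimal sketch with a hypothetical Window class:

```python
class Window:
    """Hypothetical stand-in for WindowGenerator with a single attribute."""

    def __init__(self, input_width):
        self.input_width = input_width


def describe(self):
    # `self` is bound automatically once the function is attached to the class.
    return f'input width = {self.input_width}'


w = Window(6)               # created *before* the method is attached
Window.describe = describe  # same pattern as WindowGenerator.split_window
print(w.describe())         # input width = 6
```

This is why each helper in this lesson can be defined on its own and then bolted onto WindowGenerator without recreating w1 or w2.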

Let us try using the function for the above example (w1).

# Stack three slices, the length of the total window:
example_window = tf.stack([np.array(train_df[:w1.total_window_size]),
                           np.array(train_df[100:100+w1.total_window_size]),
                           np.array(train_df[200:200+w1.total_window_size])])


example_inputs, example_labels = w1.split_window(example_window)

print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'labels shape: {example_labels.shape}')
All shapes are: (batch, time, features) 
Window shape: (3, 7, 19) 
Inputs shape: (3, 6, 19) 
labels shape: (3, 1, 1)
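For a single window, the same logic can be written with plain Python lists: take the first input_width time steps as inputs, take the last label_width time steps as labels, and keep only the requested label columns. In this hypothetical sketch, split_one_window is an illustrative name, the column names are an illustrative subset, and the values are made up:

```python
def split_one_window(window, input_width, label_width, column_indices, label_columns):
    """Split one window (a list of time steps, each a list of features)
    into inputs and labels, mimicking split_window without TensorFlow."""
    inputs = window[:input_width]
    label_rows = window[len(window) - label_width:]
    # Keep only the requested label columns, in the order they were given.
    labels = [[row[column_indices[name]] for name in label_columns]
              for row in label_rows]
    return inputs, labels


columns = ['p (mbar)', 'T (degC)', 'rh (%)']  # illustrative subset of columns
column_indices = {name: i for i, name in enumerate(columns)}
# 7 time steps x 3 features, with made-up values
window = [[1000 + t, 10.0 + t, 50 + t] for t in range(7)]

inputs, labels = split_one_window(window, 6, 1, column_indices, ['T (degC)'])
print(len(inputs), len(inputs[0]))  # 6 3
print(labels)                       # [[16.0]]
```

The per-window (time, features) shapes are (6, 3) and (1, 1), matching the shapes printed above apart from the smaller feature count.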

3. Visualizing data

Now, let us create a helper function that plots the inputs, the labels and, optionally, a model's predictions for a few example windows.

import matplotlib.pyplot as plt

def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
    inputs, labels = self.example
    plt.figure(figsize=(12, 8))
    plot_col_index = self.column_indices[plot_col]
    max_n = min(max_subplots, len(inputs))
    for n in range(max_n):
        # One subplot per example window in the batch.
        plt.subplot(max_n, 1, n+1)
        plt.ylabel(f'{plot_col} [normed]')
        plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                 label='Inputs', marker='.', zorder=-10)

        # Find plot_col within the labels; skip the labels if it isn't one.
        if self.label_columns:
            label_col_index = self.label_columns_indices.get(plot_col, None)
        else:
            label_col_index = plot_col_index

        if label_col_index is None:
            continue

        plt.scatter(self.label_indices, labels[n, :, label_col_index],
                    edgecolors='k', label='Labels', c='#2ca02c', s=64)
        if model is not None:
            # Overlay the model's predictions at the label positions.
            predictions = model(inputs)
            plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                        marker='X', edgecolors='k', label='Predictions',
                        c='#ff7f0e', s=64)

        if n == 0:
            plt.legend()

    plt.xlabel('Time [h]')

# Creating an example plot
w1.example = example_inputs, example_labels
WindowGenerator.plot = plot
w1.plot()
Figure: Visualizing time series data and labels
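The one subtle step in plot() is locating plot_col inside the labels. When label_columns is set, the labels contain only those columns, so the index must come from label_columns_indices rather than column_indices, and a column that is not a label is skipped. A hypothetical stand-alone version of that lookup (the function name and column names are illustrative):

```python
def label_column_for_plot(plot_col, column_indices, label_columns):
    """Return the index of `plot_col` inside the labels, or None if that
    column is not one of the labels (mirrors the lookup in plot())."""
    if label_columns:
        label_columns_indices = {name: i for i, name in enumerate(label_columns)}
        return label_columns_indices.get(plot_col, None)
    # Without label_columns, the labels keep every feature, so indices match.
    return column_indices[plot_col]


column_indices = {'p (mbar)': 0, 'T (degC)': 1, 'rh (%)': 2}
print(label_column_for_plot('T (degC)', column_indices, ['T (degC)']))  # 0
print(label_column_for_plot('rh (%)', column_indices, ['T (degC)']))    # None
print(label_column_for_plot('rh (%)', column_indices, None))            # 2
```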

4. Creating a tf.data.Dataset

The last step is to build a helper function that converts a pandas DataFrame into a tf.data.Dataset of (inputs, labels) windows. Creating a tf.data.Dataset will be useful later in this course while building time-series forecasting models.

def make_dataset(self, data):
    # Convert the DataFrame to a float32 NumPy array.
    data = np.array(data, dtype=np.float32)
    # Build shuffled batches of 32 windows, each total_window_size rows long,
    # with consecutive windows starting one row apart. (Newer TensorFlow
    # versions also expose this as tf.keras.utils.timeseries_dataset_from_array.)
    ds = tf.keras.preprocessing.timeseries_dataset_from_array(
        data=data,
        targets=None,
        sequence_length=self.total_window_size,
        sequence_stride=1,
        shuffle=True,
        batch_size=32,)

    # Split every batch of windows into (inputs, labels).
    ds = ds.map(self.split_window)

    return ds

WindowGenerator.make_dataset = make_dataset

@property
def train(self):
    return self.make_dataset(self.train_df)

@property
def val(self):
    return self.make_dataset(self.val_df)

@property
def test(self):
    return self.make_dataset(self.test_df)

@property
def example(self):
    """Get and cache an example batch of `inputs, labels` for plotting."""
    result = getattr(self, '_example', None)
    if result is None:
        # No example batch was found, so get one from the `.train` dataset
        result = next(iter(self.train))
        # And cache it for next time
        self._example = result
    return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
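The example property caches its batch so that repeated calls to plot() show the same windows instead of drawing a fresh batch from the shuffled training set each time. The pattern can be seen in isolation with a hypothetical class whose _load() method stands in for next(iter(self.train)):

```python
class Example:
    """Hypothetical class showing the cache-on-first-access pattern."""

    def __init__(self):
        self.loads = 0  # counts how often the expensive step runs

    def _load(self):
        # Stand-in for `next(iter(self.train))`, which builds a whole batch.
        self.loads += 1
        return ('inputs', 'labels')

    @property
    def example(self):
        result = getattr(self, '_example', None)
        if result is None:
            result = self._load()
            self._example = result  # cache for the next access
        return result


e = Example()
first = e.example
second = e.example
print(first is second, e.loads)  # True 1
```

Python 3.8+ also offers functools.cached_property for the same purpose; the explicit getattr version keeps the cached value in a plainly named _example attribute. Because example is a read-only property, assigning to w1.example after this point raises AttributeError, so a custom batch would have to be stored in w1._example instead.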

Now the WindowGenerator object gives us access to tf.data.Dataset objects through its train, val and test properties, so we can easily iterate over the data.

w1.train.element_spec
(TensorSpec(shape=(None, 6, 19), dtype=tf.float32, name=None), 
TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))

Iterating over a Dataset yields concrete batches:

for example_inputs, example_labels in w1.train.take(1):
  print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
  print(f'Labels shape (batch, time, features): {example_labels.shape}')
Inputs shape (batch, time, features): (32, 6, 19) 
Labels shape (batch, time, features): (32, 1, 1)
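Because make_dataset uses sequence_stride=1, every row that can start a full window does so: a DataFrame with N rows yields N - total_window_size + 1 windows, grouped into batches of 32 with a possibly smaller final batch (one reason the batch dimension appears as None in element_spec). A small arithmetic sketch, where count_windows_and_batches is a hypothetical helper and the 1,000-row size is an assumed example rather than the course dataset:

```python
import math


def count_windows_and_batches(num_rows, total_window_size, batch_size=32, stride=1):
    """Estimate how many windows and batches a windowed dataset yields."""
    # Number of start positions that leave room for a full window.
    num_windows = (num_rows - total_window_size) // stride + 1
    num_batches = math.ceil(num_windows / batch_size)
    # The final batch holds whatever windows are left over.
    last_batch = num_windows - (num_batches - 1) * batch_size
    return num_windows, num_batches, last_batch


# Hypothetical 1,000-row DataFrame with w1's total window size of 7
print(count_windows_and_batches(1000, 7))  # (994, 32, 2)
```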

Head on to the next lesson on ‘Single-Step Time Series Forecasting’ to understand the process of building a single-step time-series forecasting model using TensorFlow 2.0.

