Since we will be building many time-series forecasting models, it is a good idea to create helper functions that keep our workflow smooth. In this lesson, we will develop helper functions for creating data windows, splitting data, and creating visualizations.
What is a helper function in Python?
A helper function is a small function that performs part of another function's computation, following the DRY (Don't Repeat Yourself) principle. Once you have written a helper function, you can reuse it throughout your code, so you can concentrate on the larger objective while the smaller building blocks are already in working form.
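As a small illustration of the idea, the sketch below moves a repeated scaling step into one helper and reuses it. The function name `scale_to_unit_range` and the sample readings are hypothetical and are not part of this lesson's dataset.

```python
def scale_to_unit_range(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(values), max(values)
    span = highest - lowest
    if span == 0:
        # All values are identical, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / span for v in values]


# Hypothetical readings, made up for this example.
temperatures = [10.0, 15.0, 20.0]
pressures = [990.0, 1000.0, 1010.0]

# The same helper is reused instead of repeating the arithmetic twice.
scaled_temperatures = scale_to_unit_range(temperatures)
scaled_pressures = scale_to_unit_range(pressures)
print(scaled_temperatures)
print(scaled_pressures)
```

If the scaling rule ever changes, only the helper needs to be edited.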
Let us create some helper functions for time-series forecasting in this lesson.
1. Creating a data window
In time-series forecasting, a model looks at a fixed number of consecutive past observations to make a forecast. For example, we can predict one hour into the future using the data from the past 6 hours.
This run of consecutive observations is known as a data window. Similarly, we can generate a prediction 24 hours into the future using a data window of the past 24 hours.
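The index arithmetic behind such a window can be sketched with NumPy alone. The function name `window_indices` is hypothetical; the parameters `input_width`, `label_width` and `shift` are assumed to mean the number of input steps, the number of label steps, and how far ahead the last label lies from the last input.

```python
import numpy as np


def window_indices(input_width, label_width, shift):
    """Return the total window size plus the input and label positions in it."""
    total_window_size = input_width + shift
    positions = np.arange(total_window_size)
    # Inputs occupy the first `input_width` positions of the window.
    input_indices = positions[:input_width]
    # Labels occupy the last `label_width` positions of the window.
    label_indices = positions[total_window_size - label_width:]
    return total_window_size, input_indices, label_indices


# Past 6 hours used to predict 1 hour ahead.
total_1h, inputs_1h, labels_1h = window_indices(input_width=6, label_width=1, shift=1)
print(total_1h, inputs_1h, labels_1h)

# Past 24 hours used to predict 24 hours ahead.
total_24h, inputs_24h, labels_24h = window_indices(input_width=24, label_width=1, shift=24)
print(total_24h, labels_24h)
```

In the second case the window spans 48 steps even though only 24 are inputs, because the single label sits 24 steps after the last input.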
So, let us construct a Python class that creates a data window to our requirements, given the training, validation and testing datasets.
# Imports used by the helper functions in this lesson.
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt


class WindowGenerator():
    def __init__(self, input_width, label_width, shift,
                 train_df=train_df, val_df=val_df, test_df=test_df,
                 label_columns=None):
        # Store the raw data. Refer to the previous chapter for the DataFrames.
        self.train_df = train_df
        self.val_df = val_df
        self.test_df = test_df

        # Work out the label column indices.
        self.label_columns = label_columns
        if label_columns is not None:
            self.label_columns_indices = {name: i for i, name in
                                          enumerate(label_columns)}
        self.column_indices = {name: i for i, name in
                               enumerate(train_df.columns)}

        # Work out the window parameters.
        self.input_width = input_width
        self.label_width = label_width
        self.shift = shift

        self.total_window_size = input_width + shift

        self.input_slice = slice(0, input_width)
        self.input_indices = np.arange(self.total_window_size)[self.input_slice]

        self.label_start = self.total_window_size - self.label_width
        self.labels_slice = slice(self.label_start, None)
        self.label_indices = np.arange(self.total_window_size)[self.labels_slice]

    def __repr__(self):
        return '\n'.join([
            f'Total window size: {self.total_window_size}',
            f'Input indices: {self.input_indices}',
            f'Label indices: {self.label_indices}',
            f'Label column name(s): {self.label_columns}'])
Great! Let us test if our window generator is working as intended.
# Predicting one hour into the future by using a data window of the past 6 hours
w1 = WindowGenerator(input_width=6, label_width=1, shift=1,
                     label_columns=['T (degC)'])
print(f'First Window: \n{w1}')

# Predicting 24 hours into the future by using a data window of the past 24 hours.
w2 = WindowGenerator(input_width=24, label_width=1, shift=24,
                     label_columns=['T (degC)'])
print(f'\nSecond Window: \n{w2}')
First Window:
Total window size: 7
Input indices: [0 1 2 3 4 5]
Label indices: [6]
Label column name(s): ['T (degC)']

Second Window:
Total window size: 48
Input indices: [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
Label indices: [47]
Label column name(s): ['T (degC)']
2. Splitting a window
After creating a data window, we need to split it into two windows: a window of inputs and a window of labels.
In the above example, the first window (w1) can be split into an input window covering indices 0 to 5 and a label window containing index 6.
For this purpose, we will create a helper function split_window() that takes a batch of consecutive windows and converts it into a window of inputs and a window of labels.
def split_window(self, features):
    inputs = features[:, self.input_slice, :]
    labels = features[:, self.labels_slice, :]
    if self.label_columns is not None:
        labels = tf.stack(
            [labels[:, :, self.column_indices[name]] for name in self.label_columns],
            axis=-1)

    # Slicing doesn't preserve static shape information, so set the shapes
    # manually. This way the `tf.data.Datasets` are easier to inspect.
    inputs.set_shape([None, self.input_width, None])
    labels.set_shape([None, self.label_width, None])

    return inputs, labels

WindowGenerator.split_window = split_window
Let us try using the function for the above example (w1).
# Stack three slices, the length of the total window:
example_window = tf.stack([np.array(train_df[:w1.total_window_size]),
                           np.array(train_df[100:100+w1.total_window_size]),
                           np.array(train_df[200:200+w1.total_window_size])])

example_inputs, example_labels = w1.split_window(example_window)

print('All shapes are: (batch, time, features)')
print(f'Window shape: {example_window.shape}')
print(f'Inputs shape: {example_inputs.shape}')
print(f'labels shape: {example_labels.shape}')
All shapes are: (batch, time, features)
Window shape: (3, 7, 19)
Inputs shape: (3, 6, 19)
labels shape: (3, 1, 1)
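To see what the slicing does to actual values, here is a NumPy-only sketch of the same split on a tiny made-up batch. The array contents and the choice of column 1 as the label column are assumptions for illustration, and `np.stack` stands in for `tf.stack`.

```python
import numpy as np

input_width, label_width, shift = 6, 1, 1
total_window_size = input_width + shift

# One window of 7 time steps and 2 features (made-up data).
# Each value encodes its position as 10 * time_step + feature_index.
window = np.array([[10 * t + f for f in range(2)]
                   for t in range(total_window_size)])
batch = window[np.newaxis, ...]  # shape (batch, time, features) = (1, 7, 2)

# Inputs are the first `input_width` steps; labels are the last `label_width` steps.
inputs = batch[:, :input_width, :]
labels = batch[:, total_window_size - label_width:, :]

# Keep only the assumed label column (feature index 1) as the last axis.
label_column_positions = [1]
labels = np.stack([labels[:, :, i] for i in label_column_positions], axis=-1)

print(inputs.shape)     # (1, 6, 2)
print(labels.shape)     # (1, 1, 1)
print(labels[0, 0, 0])  # 61, i.e. time step 6, feature 1
```

The label is the value of the chosen column at the final time step of the window, while the inputs keep every feature for the earlier steps.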
3. Visualizing data
Now, let us create a helper function to visualize the dataset.
def plot(self, model=None, plot_col='T (degC)', max_subplots=3):
    inputs, labels = self.example
    plt.figure(figsize=(12, 8))
    plot_col_index = self.column_indices[plot_col]
    max_n = min(max_subplots, len(inputs))
    for n in range(max_n):
        # Use max_n rows so the grid matches the number of subplots drawn.
        plt.subplot(max_n, 1, n+1)
        plt.ylabel(f'{plot_col} [normed]')
        plt.plot(self.input_indices, inputs[n, :, plot_col_index],
                 label='Inputs', marker='.', zorder=-10)

        if self.label_columns:
            label_col_index = self.label_columns_indices.get(plot_col, None)
        else:
            label_col_index = plot_col_index

        if label_col_index is None:
            continue

        plt.scatter(self.label_indices, labels[n, :, label_col_index],
                    edgecolors='k', label='Labels', c='#2ca02c', s=64)
        if model is not None:
            predictions = model(inputs)
            plt.scatter(self.label_indices, predictions[n, :, label_col_index],
                        marker='X', edgecolors='k', label='Predictions',
                        c='#ff7f0e', s=64)

        if n == 0:
            plt.legend()

    plt.xlabel('Time [h]')

# Creating an example plot
w1.example = example_inputs, example_labels
WindowGenerator.plot = plot
w1.plot()
4. Creating a tf.data.Dataset
The last step that we need to go through is to build a helper function for creating a tf.data.Dataset using a pandas DataFrame. Creating a tf.data.Dataset will be useful later in this course while building time-series forecasting models.
def make_dataset(self, data):
    data = np.array(data, dtype=np.float32)
    ds = tf.keras.preprocessing.timeseries_dataset_from_array(
        data=data,
        targets=None,
        sequence_length=self.total_window_size,
        sequence_stride=1,
        shuffle=True,
        batch_size=32,)

    ds = ds.map(self.split_window)

    return ds

WindowGenerator.make_dataset = make_dataset

@property
def train(self):
    return self.make_dataset(self.train_df)

@property
def val(self):
    return self.make_dataset(self.val_df)

@property
def test(self):
    return self.make_dataset(self.test_df)

@property
def example(self):
    """Get and cache an example batch of `inputs, labels` for plotting."""
    result = getattr(self, '_example', None)
    if result is None:
        # No example batch was found, so get one from the `.train` dataset
        result = next(iter(self.train))
        # And cache it for next time
        self._example = result
    return result

WindowGenerator.train = train
WindowGenerator.val = val
WindowGenerator.test = test
WindowGenerator.example = example
Now the WindowGenerator object gives us access to the tf.data.Dataset objects, so you can easily iterate over the data.
w1.train.element_spec
(TensorSpec(shape=(None, 6, 19), dtype=tf.float32, name=None),
 TensorSpec(shape=(None, 1, 1), dtype=tf.float32, name=None))
Iterating over a Dataset yields concrete batches:
for example_inputs, example_labels in w1.train.take(1):
    print(f'Inputs shape (batch, time, features): {example_inputs.shape}')
    print(f'Labels shape (batch, time, features): {example_labels.shape}')
Inputs shape (batch, time, features): (32, 6, 19)
Labels shape (batch, time, features): (32, 1, 1)
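How a table of readings is cut into overlapping windows can be sketched without TensorFlow. The sketch below assumes a stride of 1 between windows, leaves out shuffling and batching, and uses a made-up array; the function name `sliding_windows` is hypothetical, and this illustrates the idea rather than the Keras implementation.

```python
import numpy as np


def sliding_windows(data, total_window_size):
    """Stack every run of `total_window_size` consecutive rows, moving one row at a time."""
    n_windows = len(data) - total_window_size + 1
    if n_windows <= 0:
        raise ValueError('data is shorter than one window')
    return np.stack([data[start:start + total_window_size]
                     for start in range(n_windows)])


# Made-up data: 10 time steps, 2 features.
data = np.arange(20, dtype=np.float32).reshape(10, 2)

windows = sliding_windows(data, total_window_size=7)
print(windows.shape)  # (4, 7, 2): 10 - 7 + 1 = 4 windows
```

Each window overlaps the previous one by all but one row, which is why a long series yields almost as many windows as it has time steps.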
Head over to the next lesson, 'Single-Step Time Series Forecasting', to learn how to build a single-step time-series forecasting model using TensorFlow 2.0.