Causal Convolution

Overview of Causal Convolution

Causal convolutions are a type of convolution used for temporal data that ensures the model does not violate the ordering of the data. For instance, the prediction made at timestep t must not depend on any future timesteps, such as x_{t+1}, x_{t+2}, and so on.

This article explains what causal convolutions are, how they work, and why they are beneficial to use. Additionally, we will look at masked convolutions used for images and shift convolutions used for audio files.

What is Causal Convolution?

Causal convolution is the process of filtering temporal data, ensuring that the model does not depend on future data. In simpler words, the convolution only processes the data up to the current point in time (timestep), preventing the convolution from leaking future data to the model. Causal convolutions are commonly used in many applications, including image and audio processing.

Causal convolutions are especially useful when we observe a sequence of events, such as stock prices, weather data, or music, and want to predict future trends using only the data available up to the current time-point. The future data must not be visible to the model while it is making predictions.

How does Causal Convolution Work?

Causal convolutions operate by restricting the input accessible at each timestep to the data up to and including that timestep. For example, suppose we have a sequence of length t, {x_1, x_2, ..., x_t}. If we want to use a causal convolution to predict the next element x_{t+1}, we must arrange the convolution so that its output at each position never reads beyond that position.

A conventional convolutional layer centres its kernel on the current position, so the output at timestep t mixes in values from both the past and the future. A causal convolution removes the future half of that window. In practice there are two equivalent ways to achieve this: pad the input on the left with kernel_size − 1 zeros, so that the kernel never extends past the current timestep, or apply a kernel mask, a binary tensor matching the dimensions of the convolutional kernel whose zero entries correspond to the future values we do not want to use. Either way, the output at timestep t is computed only from the current and past values, preventing data from leaking from the future into the model.
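As a concrete illustration, here is a minimal sketch of a causal 1D convolution in NumPy using the left-padding approach (the function name and the toy kernel are our own choices for this example):

```python
import numpy as np

def causal_conv1d(x, kernel):
    """Causal 1D convolution: output[t] depends only on x[0..t].

    Implemented by left-padding the input with kernel_size - 1 zeros,
    so the sliding window never reaches past the current timestep.
    kernel[0] multiplies the oldest sample in each window.
    """
    k = len(kernel)
    padded = np.concatenate([np.zeros(k - 1), x])
    return np.array([np.dot(kernel, padded[t:t + k]) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
y = causal_conv1d(x, np.array([0.5, 0.5]))  # average of previous and current value
print(y)  # [0.5 1.5 2.5 3.5]
```

Changing a future sample, say x[3], leaves all earlier outputs untouched, which is exactly the causality guarantee.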

Applications of Causal Convolution

Causal convolutions have several applications in machine learning, particularly for temporal data such as speech, audio, music, and video processing. Additionally, they are commonly used in social media recommendation systems, time-series forecasting, text analysis, and more.

Consider a time-series forecasting scenario in which we analyse the stock prices of a particular company. Tomorrow's price may well depend on the previous days' prices, but the future prices are, by definition, unavailable at prediction time. Using causal convolutions, we can build a model that considers only previous stock prices when predicting the next one, so the training setup matches the information actually available when the model is used.
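To make this concrete, a simple 3-day moving average of a price series is itself a causal convolution; the prices below are made-up numbers purely for illustration:

```python
import numpy as np

# Hypothetical daily closing prices (made-up values for illustration).
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4])

# A 3-day moving average is a causal convolution: each output uses
# only the current day and the two days before it.
w = np.ones(3) / 3
# Pad on the left with the first price (a choice, not a rule) so the
# earliest outputs stay on the same scale as the data.
padded = np.concatenate([np.full(2, prices[0]), prices])
trend = np.array([np.dot(w, padded[t:t + 3]) for t in range(len(prices))])
print(np.round(trend, 3))
```

No value of `trend` ever depends on a later day's price, so the same code could run live, day by day, without change.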

Masked Convolutions for Images

Causal convolutions as described above apply to one-dimensional sequences. For higher-dimensional data such as images, we first need to define what 'past' and 'future' mean, which is usually done with a raster-scan ordering of the pixels: left to right within a row, rows from top to bottom. In these cases we can use a masked convolution. Much like a causal convolution, the masked convolutional kernel is constrained, but rather than a 1D kernel we apply a 2D kernel together with a 2D mask tensor.

The 2D mask tensor has dimensions matching the kernel tensor, and its zeroed-out entries correspond to the future pixels under the raster-scan order: the positions to the right of the kernel's centre in the same row, and all positions in the rows below it. Thus, the masked convolution processes only the pixels above the current position or to its left in the same row, preventing the current pixel from accessing future data.
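Such a mask can be built in a few lines; this sketch (function name ours) constructs it for a k × k kernel:

```python
import numpy as np

def raster_mask(k, include_center=False):
    """Binary mask for a k x k convolution kernel under raster-scan order.

    Entries are 1 for all rows above the centre row and for the positions
    strictly left of the centre in the centre row; everything after the
    centre (the 'future' in raster order) is 0. include_center=True also
    lets the kernel see the current pixel.
    """
    mask = np.zeros((k, k))
    c = k // 2
    mask[:c, :] = 1   # all rows above the centre row
    mask[c, :c] = 1   # centre row, strictly left of the centre
    if include_center:
        mask[c, c] = 1
    return mask

print(raster_mask(3))
# [[1. 1. 1.]
#  [1. 0. 0.]
#  [0. 0. 0.]]
```

Multiplying the kernel elementwise by this mask before every convolution is all it takes to make a 2D convolution respect the raster-scan ordering.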

Shift Convolutions for Audio Processing

For audio signals, it is common to implement causality with a shift of a few timesteps rather than by masking the kernel: the output of an ordinary convolution is delayed by k steps, where k is the number of future taps in the kernel, so that each output depends only on samples the model has already seen. This shifted formulation is equivalent to the left-padding approach described earlier.
For example, suppose we want to detect whether an audio signal contains a specific word. Shifting the convolution output by k steps preserves the shape of the output while guaranteeing that the detector's decision at each moment uses only audio up to that moment, which is what makes such a detector usable on a live, streaming signal.
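The shift trick can be sketched in NumPy as follows (the kernel and signal are arbitrary illustrative values):

```python
import numpy as np

w = np.array([0.25, 0.5, 0.25])  # centred smoothing kernel: one future tap
x = np.arange(8, dtype=float)    # toy audio signal

# An ordinary "same" convolution peeks one sample into the future.
noncausal = np.convolve(x, w, mode="same")

# Shift trick: delay the output by k steps, where k is the number of
# future taps in the kernel, so each output uses only past samples.
k = (len(w) - 1) // 2
causal = np.concatenate([np.zeros(k), noncausal[:-k]])

# Causality check: changing a future sample must not change past outputs.
x2 = x.copy()
x2[6] = 100.0
causal2 = np.concatenate([np.zeros(k), np.convolve(x2, w, mode="same")[:-k]])
assert np.allclose(causal[:6], causal2[:6])
```

The delayed output trades a latency of k samples for the guarantee that nothing downstream can read the future, which is usually an acceptable bargain in streaming audio.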

Causal convolutions are a vital tool for machine learning on temporal data, allowing us to prevent data 'leakage' from the future and to make predictions using only the information that would genuinely be available. With a constrained kernel (or a shifted output), causal convolutions select only the admissible information, making models more reliable and effective in many applications, including image and audio processing, natural language processing, time-series prediction, and more.
