The Convolution Theorem
Mix.install([
{:nx_signal, path: __DIR__ |> Path.join("..") |> Path.expand()},
{:kino_vega_lite, "~> 0.1"},
{:tucan, "~> 0.5"}
])
The most important theorem in DSP
Convolution in the time domain is equivalent to pointwise multiplication in the frequency domain, and vice versa:
$$ x[n] * h[n] \;\longleftrightarrow\; X(f) \cdot H(f) $$
$$ x[n] \cdot h[n] \;\longleftrightarrow\; X(f) * H(f) $$
This single fact underpins almost everything in signal processing:
- Filtering: applying a filter $h$ to signal $x$ is just multiplication in the frequency domain, then transform back.
- Fast convolution: direct convolution costs $O(N^2)$; via FFT it costs $O(N \log N)$.
- Spectral leakage: multiplying by a window in time convolves your spectrum with the window’s own Fourier transform (the dual).
What is convolution?
The cleanest way to understand convolution is through the impulse response: a linear time-invariant (LTI) system is completely described by what it does to a unit impulse $\delta[n]$. Call that response $h[n]$. For any input $x[n]$, the output is:
$$ y[n] = \sum_{k} x[k] \, h[n - k] = (x * h)[n] $$
Every sample of $x$ contributes a scaled, delayed copy of $h$; the output is their sum.
# A simple 5-tap box (moving average) filter
h = Nx.broadcast(1.0 / 5, {5}) |> Nx.as_type(:f32)
# An input with impulses at positions 3, 10, and 16 (amplitudes 1, 0.75, 0.5)
x =
Nx.broadcast(0.0, {30})
|> Nx.indexed_put(Nx.tensor([[3], [10], [16]]), Nx.tensor([1.0, 0.75, 0.5]))
y = NxSignal.Convolution.convolve(x, h, mode: :full)
impulse_data =
Enum.zip_with(Enum.to_list(0..29), Nx.to_flat_list(x), fn i, v ->
%{sample: i, value: v, signal: "Input x[n]"}
end)
filter_data =
Enum.zip_with(Enum.to_list(0..4), Nx.to_flat_list(h), fn i, v ->
%{sample: i, value: v, signal: "Filter h[n]"}
end)
output_data =
Enum.zip_with(Enum.to_list(0..(Nx.size(y) - 1)), Nx.to_flat_list(y), fn i, v ->
%{sample: i, value: v, signal: "Output y[n] = x * h"}
end)
all_data = impulse_data ++ filter_data ++ output_data
Tucan.lineplot(all_data, "sample", "value")
|> Tucan.color_by("signal")
|> Tucan.facet_by(:row, "signal")
|> Tucan.Axes.set_x_title("Sample")
|> Tucan.Axes.set_y_title("Amplitude")
|> Tucan.set_title("Convolution as scaled, delayed copies of the impulse response")
|> Tucan.set_width(640)
|> Tucan.set_height(80)
Each impulse in $x$ produces a copy of $h$ at that position, scaled by the impulse’s amplitude. The output $y$ is their superposition.
The theorem in action
The theorem states that two routes exist to compute $y = x * h$:
Route A - time domain: direct convolution (flip-and-slide).
Route B - frequency domain: $Y = \text{IFFT}(X \cdot H)$.
Both must give the same answer. Let’s verify with a real filter.
fs = 8_000
dur = 0.5
n = trunc(fs * dur)
t = Nx.linspace(0, dur, n: n, endpoint: false, type: :f32)
# Signal: 440 Hz tone + 1 800 Hz tone + Gaussian noise
key = Nx.Random.key(42)
{noise, _key} = Nx.Random.normal(key, shape: {n}, type: :f32)
signal =
Nx.sin(Nx.multiply(t, 2 * :math.pi() * 440))
|> Nx.add(Nx.multiply(0.6, Nx.sin(Nx.multiply(t, 2 * :math.pi() * 1800))))
|> Nx.add(Nx.multiply(0.2, noise))
# 51-tap low-pass FIR filter, cutoff at 1 000 Hz
cutoff_norm = 1000.0 / (fs / 2.0)
h_fir = NxSignal.Filters.firwin(51, [cutoff_norm])
# Route A: direct time-domain convolution
route_a = NxSignal.Convolution.convolve(signal, h_fir, mode: :same, method: :direct)
# Route B: FFT-based convolution
# Under the hood NxSignal zero-pads both signals to the next power of two,
# multiplies their FFTs, IFFTs the result, and trims to the requested length:
#
# Y = IFFT(FFT(x, L) · FFT(h, L)), L = next_pow2(N + M - 1)
#
route_b = NxSignal.Convolution.convolve(signal, h_fir, mode: :same, method: :fft)
# Confirm both routes agree
max_diff =
Nx.subtract(route_a, route_b)
|> Nx.abs()
|> Nx.reduce_max()
IO.inspect(Nx.to_number(max_diff), label: "max |Route A − Route B|")
# Visualise: input spectrum, filter response, and output spectrum side by side
# rfft returns div(n, 2) + 1 bins (DC through Nyquist inclusive)
half = div(n, 2) + 1
freqs_hz = Nx.linspace(0, fs / 2.0, n: half, endpoint: true, type: :f32) |> Nx.to_flat_list()
to_amp_list = fn sig ->
sig
|> Nx.rfft()
|> Nx.abs()
|> Nx.to_flat_list()
end
h_padded = Nx.pad(h_fir, 0.0, [{0, n - Nx.size(h_fir), 0}])
h_spectrum = h_padded |> Nx.rfft() |> Nx.abs() |> Nx.to_flat_list()
input_spec =
Enum.zip_with(freqs_hz, to_amp_list.(signal), fn f, a ->
%{frequency: f, amplitude: a, panel: "Input |X(f)|"}
end)
filter_spec =
Enum.zip_with(freqs_hz, h_spectrum, fn f, a ->
%{frequency: f, amplitude: a, panel: "Filter |H(f)|"}
end)
output_spec =
Enum.zip_with(freqs_hz, to_amp_list.(route_a), fn f, a ->
%{frequency: f, amplitude: a, panel: "Output |X(f)·H(f)|"}
end)
spectrum_data = input_spec ++ filter_spec ++ output_spec
Tucan.lineplot(spectrum_data, "frequency", "amplitude")
|> Tucan.facet_by(:row, "panel")
|> Tucan.Axes.set_x_title("Frequency (Hz)")
|> Tucan.Axes.set_y_title("Amplitude")
|> Tucan.set_title("The convolution theorem: three views of filtering")
|> Tucan.set_width(640)
|> Tucan.set_height(100)
The 1 800 Hz component visible in the input spectrum is absent from the output. This is because the filter zeroed it in the frequency domain.
Cross-correlation and delay estimation
Cross-correlation is like convolution but without the time-reversal of $h$:
$$ (x \star h)[n] = \sum_k x[k] \, h[k - n] $$
It measures how similar $x$ and $h$ are as a function of lag $n$. The peak of the cross-correlation identifies the delay between two signals. This technique is used in sonar, seismology, and audio alignment.
fs_corr = 8_000
burst_len = trunc(0.04 * fs_corr) # 40 ms reference burst
total_len = trunc(0.4 * fs_corr) # 400 ms total window
true_delay = 80 # samples ≈ 10 ms
t_burst = Nx.linspace(0, burst_len / fs_corr, n: burst_len, endpoint: false, type: :f32)
reference = Nx.sin(Nx.multiply(t_burst, 2 * :math.pi() * 600))
# Received signal: noise + attenuated, delayed copy of the reference
key2 = Nx.Random.key(7)
{noise2, _} = Nx.Random.normal(key2, shape: {total_len}, type: :f32)
noise2 = Nx.multiply(noise2, 0.4)
rx =
Nx.pad(reference, 0.0, [{true_delay, total_len - burst_len - true_delay, 0}])
|> Nx.multiply(0.7)
|> Nx.add(noise2)
# Cross-correlate received signal with reference
corr = NxSignal.Convolution.correlate(rx, reference, mode: :full)
corr_amps = Nx.abs(corr)
# Lag axis: negative lags first, zero at index burst_len - 1
n_corr = Nx.size(corr_amps)
lags = Enum.map(0..(n_corr - 1), fn i -> i - (burst_len - 1) end)
{_val, peak_idx} = Nx.top_k(corr_amps, k: 1)
detected_delay = Nx.to_number(peak_idx[0]) - (burst_len - 1)
IO.puts("True delay: #{true_delay} samples")
IO.puts("Detected delay: #{detected_delay} samples")
corr_data =
Enum.zip(lags, Nx.to_flat_list(corr_amps))
|> Enum.filter(fn {lag, _} -> lag >= -20 and lag <= 200 end)
|> Enum.map(fn {lag, a} -> %{lag: lag, correlation: a} end)
peak_data = [%{lag: detected_delay, correlation: Nx.to_number(corr_amps[detected_delay + burst_len - 1])}]
VegaLite.new(width: 680, height: 220, title: "Cross-correlation: peak at delay = #{detected_delay} samples")
|> VegaLite.layers([
VegaLite.new()
|> VegaLite.data_from_values(corr_data)
|> VegaLite.mark(:line, color: "steelblue")
|> VegaLite.encode_field(:x, "lag", type: :quantitative, title: "Lag (samples)")
|> VegaLite.encode_field(:y, "correlation", type: :quantitative, title: "|Correlation|"),
VegaLite.new()
|> VegaLite.data_from_values(peak_data)
|> VegaLite.mark(:point, color: "tomato", size: 120, filled: true)
|> VegaLite.encode_field(:x, "lag", type: :quantitative)
|> VegaLite.encode_field(:y, "correlation", type: :quantitative)
])
The correlation peak (red dot) is at exactly the true delay, even though the echo is buried in noise and invisible to the eye in the raw received signal.
Windowing and spectral leakage
The second form of the theorem states that multiplication in time equals convolution in frequency. This is why window functions matter.
When we multiply a signal by a rectangular window (i.e. we simply observe $N$ samples and discard the rest), we convolve its spectrum with a $\text{sinc}$ function, spreading energy from each spectral line into neighbouring bins. A smooth window has a more compact Fourier transform, so the spectral smearing is less severe.
fs_win = 200
n_win = 200
f_tone2 = 10.5
t_win = Nx.linspace(0, n_win / fs_win, n: n_win, endpoint: false, type: :f32)
x_win = Nx.sin(Nx.multiply(t_win, 2 * :math.pi() * f_tone2))
rect_w = NxSignal.Windows.rectangular(n_win, type: :f32)
hann_w = NxSignal.Windows.hann(n_win, is_periodic: false)
half_win = div(n_win, 2)
freqs_win = NxSignal.fft_frequencies(fs_win, fft_length: n_win)[0..half_win] |> Nx.to_flat_list()
dual_data =
[{rect_w, "Rectangular (multiply by 1)"}, {hann_w, "Hann (smooth taper)"}]
|> Enum.flat_map(fn {w, name} ->
amps =
Nx.multiply(x_win, w)
|> Nx.rfft()
|> Nx.abs()
peak = Nx.reduce_max(amps)
db =
Nx.divide(amps, peak)
|> Nx.log10()
|> Nx.multiply(20)
|> Nx.max(-100.0)
|> Nx.to_flat_list()
Enum.zip_with(freqs_win, db, fn f, d -> %{frequency: f, amplitude_db: d, window: name} end)
end)
Tucan.lineplot(dual_data, "frequency", "amplitude_db")
|> Tucan.color_by("window")
|> VegaLite.encode_field(:y, "amplitude_db", type: :quantitative, title: "Amplitude (dB)", scale: [domain: [-100, 5]])
|> Tucan.Axes.set_x_title("Frequency (Hz)")
|> Tucan.set_title("Dual theorem: multiplication in time = convolution in frequency (spectral leakage)")
|> Tucan.set_width(680)
|> Tucan.set_height(260)
Summary
| Function | Method | When to use |
|---|---|---|
Convolution.convolve(x, h, method: :direct) |
$O(N \cdot M)$ | Short kernels ($M \lesssim 50$) |
Convolution.convolve(x, h, method: :fft) |
$O(N \log N)$ | Long kernels or large signals |
Convolution.fftconvolve(x, h) |
$O(N \log N)$ | Convenience alias for FFT method |
Convolution.correlate(x, h) |
Direct or FFT | Delay estimation, matched filtering |
The convolution theorem is the reason method: :fft exists: instead of
$O(N \cdot M)$ multiply-accumulates, you pay three FFTs at $O(N \log N)$ each.
For any kernel longer than roughly 50 taps the FFT method wins.