# Soundwave plotting example
```elixir
File.cd(__DIR__)
Logger.configure(level: :error)

Mix.install([
  {:membrane_core, "~> 1.0"},
  {:membrane_raw_audio_parser_plugin, "~> 0.4.0"},
  {:membrane_portaudio_plugin, "~> 0.19.2"},
  {:vega_lite, "~> 0.1.9"},
  {:kino_vega_lite, "~> 0.1.11"}
])
```
## Introduction
This livebook example shows how to perform real-time soundwave plotting with the use of the Membrane Framework and Vega-Lite.
By following this example, you will learn how to read audio from a microphone, how audio is represented, and how to create a custom Membrane element that plots a soundwave using the Elixir bindings to Vega-Lite.
## Soundwave plotting sink

Since there is no plugin in the Membrane Framework that already provides an element capable of plotting a soundwave, we need to write one on our own.

The element, called `Visualizer`, is a sink, placed at the end of the pipeline. The element has a single `:input` pad, on which raw audio is expected to appear.
> Raw audio is represented as an array of samples, with each sample describing the amplitude of the sound at a given time. There may be several samples (from so-called different channels) for the same point in time. In such a case, the samples from different channels (e.g. samples `A` from the first channel and samples `B` from the second channel) might be either interleaved (`ABABABAB`) or put one sequence after the other (`AAAABBBB`).
>
> Each sample is of a particular format, and the format is defined by:
>
> * the type of number - e.g. `f` might stand for a `float` and `s` might stand for a `signed` integer,
> * the number of bits used to represent a number,
> * endianness (order of bytes) - specifies the significance of the bytes in the byte sequence (little-endian or big-endian).
>
> An exemplary sample format might be `s16le`, which stands for a signed integer written on 16 bits, with little-endian byte order.
>
> For some intuition about the formats, you can take a look at the `Membrane.RawAudio.SampleFormat` module.
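To build some intuition, here is a minimal sketch (not part of the pipeline we are building) of how a single `s16le` sample can be decoded with Elixir's binary pattern matching; the two input bytes are arbitrary example values:

```elixir
# Decode one s16le sample: 16 bits, signed integer, little-endian.
<<sample::16-signed-little>> = <<0x34, 0x12>>
sample
# => 4660 (0x1234 - with little endianness the second byte is the most significant)
```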
## Buffers handling

Once a buffer is received, its payload is split into samples, based on the `sample_format` of the `Membrane.RawAudio` stream format. The amplitudes of the sound from different channels, measured at the same time, are averaged. As a result, a list of samples is produced, with each sample being the amplitude of the sound at a given time.

That list of samples is appended to the list of unprocessed samples stored in the element's state. Right after that, if there are enough samples, the `plot` function is invoked and the samples are used to produce points that are put on the plot. This per-buffer processing is sketched below on a toy payload.
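The sketch below assumes interleaved `s16le` stereo audio; the payload values are made up, and plain binary matching stands in for `RawAudio.sample_to_value/2`:

```elixir
# Two stereo frames, interleaved: left, right, left, right; each sample is s16le.
payload =
  <<100::16-signed-little, 200::16-signed-little,
    -50::16-signed-little, 50::16-signed-little>>

samples = for <<sample::16-signed-little <- payload>>, do: sample
# => [100, 200, -50, 50]

samples
# group the channels belonging to the same point in time
|> Enum.chunk_every(2)
# average them into a single amplitude per point in time
|> Enum.map(&(Enum.sum(&1) / length(&1)))
# => [150.0, 0.0]
```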
## Plotting of the soundwave

Plotting all the audio samples at a typically used sampling rate (e.g. 44100 Hz) is impossible due to limitations of the plot-displaying system. That is why the list of samples is split into several chunks, and for each of these chunks the samples with the maximal and minimal amplitude are found. Only these two samples, representing a given chunk, are later put on the plot, with the `x` value being a given sample's timestamp and the `y` value being the measured amplitude of the audio. A toy version of this min/max decimation is shown below. You can play with the `@visible_points`, `@window_duration`, and `@plot_update_frequency` attributes to customize the plot.
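Here is that decimation on an arbitrary list of amplitudes, with 4 samples per chunk:

```elixir
samples = [0.1, 0.9, -0.3, 0.2, -0.8, 0.4, 0.0, 0.5]

samples
|> Enum.chunk_every(4)
|> Enum.flat_map(fn chunk ->
  # keep only the extreme amplitudes of each chunk
  {min, max} = Enum.min_max(chunk)
  [min, max]
end)
# => [-0.3, 0.9, -0.8, 0.5]
```

The `Visualizer` below does the same with `Enum.min_max_by/2`, additionally keeping each sample's index so that a timestamp can be computed for every point.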
```elixir
defmodule Visualizer do
  use Membrane.Sink

  alias Membrane.RawAudio
  alias VegaLite, as: Vl

  require Membrane.Logger

  # The number of points visible on the chart. The more points, the better
  # the chart resolution, but the higher the CPU consumption.
  @visible_points 1000

  # Last n seconds of audio visible on the chart. Increasing the duration
  # lowers the chart resolution, so you may want to increase @visible_points
  # accordingly.
  @window_duration 3

  # Frequency of plot updates. Doesn't impact the chart resolution.
  @plot_update_frequency 50

  @points_per_update @visible_points / (@window_duration * @plot_update_frequency)

  def_input_pad(:input, accepted_format: %RawAudio{})

  @impl true
  def handle_init(_ctx, _opts) do
    {[], %{chart: nil, pts: nil, initial_pts: nil, samples: []}}
  end

  @impl true
  def handle_setup(_ctx, state) do
    {[], %{state | chart: render_chart()}}
  end

  @impl true
  def handle_buffer(:input, buffer, ctx, state) do
    state = if state.initial_pts == nil, do: %{state | initial_pts: buffer.pts}, else: state
    state = if state.pts == nil, do: %{state | pts: buffer.pts}, else: state

    stream_format = ctx.pads.input.stream_format
    sample_size = RawAudio.sample_size(stream_format)
    sample_max = RawAudio.sample_max(stream_format)

    # split the payload into sample_size-byte chunks and scale each sample
    # to the [-1.0, 1.0] range
    samples =
      for <<sample::binary-size(sample_size) <- buffer.payload>> do
        RawAudio.sample_to_value(sample, stream_format) / sample_max
      end
      # we need to make an average out of the samples for all the channels
      |> Enum.chunk_every(stream_format.channels)
      |> Enum.map(&(Enum.sum(&1) / length(&1)))

    state = %{state | samples: state.samples ++ samples}

    samples_per_update = stream_format.sample_rate / @plot_update_frequency

    if length(state.samples) > samples_per_update do
      plot(state.samples, state.pts - state.initial_pts, stream_format.sample_rate, state.chart)
      {[], %{state | samples: [], pts: nil}}
    else
      {[], state}
    end
  end

  defp plot(samples, pts, sample_rate, chart) do
    samples_per_point = ceil(length(samples) / @points_per_update)
    sample_duration = Ratio.new(1, sample_rate) |> Membrane.Time.seconds()

    points =
      samples
      |> Enum.with_index()
      # `* 2`, because in each loop run we are producing 2 points
      |> Enum.chunk_every(2 * samples_per_point)
      |> Enum.flat_map(fn point_samples ->
        point_samples
        |> Enum.min_max_by(fn {value, _sample_i} -> value end)
        |> Tuple.to_list()
        |> Enum.map(fn {value, sample_i} ->
          x = (pts + sample_i * sample_duration) |> Membrane.Time.as_milliseconds(:round)
          %{x: x, y: value}
        end)
      end)

    Kino.VegaLite.push_many(chart, points, window: @visible_points)
  end

  defp render_chart() do
    Vl.new(width: 600, height: 400, title: "Amplitude in time")
    |> Vl.mark(:line, point: true)
    |> Vl.encode_field(:x, "x", title: "Time [ms]", type: :quantitative)
    |> Vl.encode_field(:y, "y",
      title: "Amplitude",
      type: :quantitative,
      scale: %{domain: [-1.1, 1.1]}
    )
    |> Kino.VegaLite.new()
    |> Kino.render()
  end
end
```
## Pipeline structure

Once we are ready with the `Visualizer` element, we can set the pipeline up.

The pipeline will consist of:

- a microphone input,
- a raw audio parser (we need that element to provide timestamps to the buffers),
- the `Visualizer`.

All the elements are connected linearly.
```elixir
import Membrane.ChildrenSpec

spec =
  child(Membrane.PortAudio.Source)
  |> child(%Membrane.RawAudioParser{overwrite_pts?: true})
  |> child(Visualizer)

:ok
```
## Running the pipeline

Finally, we can start the `Membrane.RCPipeline` (remote-controlled pipeline) and commission the `spec` action execution with the previously created pipeline structure:
```elixir
alias Membrane.RCPipeline

pipeline = RCPipeline.start_link!()
RCPipeline.exec_actions(pipeline, spec: spec)
```
On the plot above, you should be able to see the relation between the audio amplitude and time.

You can terminate the pipeline with the following code:
```elixir
RCPipeline.terminate(pipeline)
```