
Soundwave plotting example


File.cd(__DIR__)
Logger.configure(level: :error)

Mix.install([
  {:membrane_core, "~> 1.0"},
  {:membrane_raw_audio_parser_plugin, "~> 0.4.0"},
  {:membrane_portaudio_plugin, "~> 0.18.3"},
  {:vega_lite, "~> 0.1.8"},
  {:kino_vega_lite, "~> 0.1.11"}
])

Introduction

This livebook example shows how to perform real-time soundwave plotting with the use of the Membrane Framework and Vega-Lite.

By following this example you will learn how to read audio from the microphone, how audio is represented, and how to create your own custom Membrane element that plots the soundwave using the Elixir bindings to Vega-Lite.

Soundwave plotting sink

Since there is no plugin in the Membrane Framework that already provides an element capable of plotting a soundwave, we need to write one on our own. The element, called Visualizer, is a sink placed at the end of the pipeline.

The element has a single :input pad, on which raw audio is expected to appear.

> Raw audio is represented as an array of samples, with each sample describing the amplitude of the sound at a given time. There may be several samples (from different channels) for the same point in time. In such a case, the samples from different channels (e.g. samples A from the first channel and samples B from the second channel) might be either interleaved (ABABABAB) or put one sequence after the other (AAAABBBB).
>
> Each sample is of a particular format, and the format is defined by:
>
> * the type of the number - e.g. f might stand for a float and s might stand for a signed integer,
> * the number of bits used to represent the number,
> * endianness (order of bytes) - specifies the significance of the bytes in the byte sequence (little endian or big endian).
>
> An exemplary sample format is s16le, which stands for a signed integer written on 16 bits, with little-endian byte order.
>
> For some intuition about the formats, you can take a look at the Membrane.RawAudio.SampleFormat module.
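To build some intuition, here is a minimal sketch (not part of the pipeline code, with a made-up payload) of decoding a stereo s16le binary and averaging the channels, similar to what the Visualizer below does with real buffers:

payload = <<100::16-signed-little, -100::16-signed-little, 200::16-signed-little, -200::16-signed-little>>

# Each 16-bit little-endian signed chunk of the binary is one sample.
samples = for <<sample::16-signed-little <- payload>>, do: sample
# => [100, -100, 200, -200] - interleaved stereo (ABAB)

# Average the channels for every point in time.
samples
|> Enum.chunk_every(2)
|> Enum.map(fn channels -> Enum.sum(channels) / length(channels) end)
# => [0.0, 0.0]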

Buffers handling

Once a buffer is received, its payload is split into samples, based on the sample_format of the Membrane.RawAudio stream format. The amplitudes of the sound from different channels, measured at the same time, are averaged. As a result, a list of samples is produced, with each sample being the amplitude of the sound at a given time.

That list of samples is appended to the list of unprocessed samples stored in the element's state. Right after that, if enough samples have accumulated, the plot function is invoked and the samples are used to produce points that are put on the plot.
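As a quick sanity check of that threshold (a back-of-the-envelope calculation assuming a 44100 Hz stream and the @plot_update_frequency of 50 defined below):

samples_per_update = 44_100 / 50
# => 882.0 - about 20 ms of audio accumulates between consecutive plot updates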

Plotting of the soundwave

Plotting all the audio samples at a typically used sampling frequency (e.g. 44100 Hz) is impossible due to limitations of the plot displaying system. That is why the list of samples is split into several chunks, and in each of these chunks the samples with the minimal and maximal amplitudes are found. Only these two samples representing a given chunk are later put on the plot, with the x value being the given sample's timestamp and the y value being the measured amplitude of the audio (see the sketch below). You can play with the @visible_points, @window_duration and @plot_update_frequency attributes to customize the plot.
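Here is a minimal, self-contained sketch of that min/max downsampling, using a made-up list of amplitudes and an arbitrary chunk size of 4 (the Visualizer below derives the chunk size from its module attributes):

samples = [0.1, -0.4, 0.9, 0.2, -0.7, 0.3, 0.5, -0.2]

samples
|> Enum.with_index()
|> Enum.chunk_every(4)
|> Enum.flat_map(fn chunk ->
  # Keep only the two extreme samples of each chunk, together with their indices.
  chunk
  |> Enum.min_max_by(fn {value, _index} -> value end)
  |> Tuple.to_list()
end)
# => [{-0.4, 1}, {0.9, 2}, {-0.7, 4}, {0.5, 6}]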

defmodule Visualizer do
  use Membrane.Sink

  alias Membrane.RawAudio
  alias VegaLite, as: Vl

  require Membrane.Logger

  # The amount of points visible in the chart. The more points, the better chart resolution,
  # but higher CPU consumption.
  @visible_points 1000

  # Last n seconds of audio visible in the chart. Increasing the duration
  # lowers the chart resolution, so you may want to increase @visible_points
  # accordingly.
  @window_duration 3

  # Frequency of plot updates. Doesn't impact the chart resolution.
  @plot_update_frequency 50

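  # With the defaults above: 1000 / (3 * 50) ≈ 6.67 points are pushed per update.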
  @points_per_update @visible_points / (@window_duration * @plot_update_frequency)

  def_input_pad(:input, accepted_format: %RawAudio{})

  @impl true
  def handle_init(_ctx, _opts) do
    {[], %{chart: nil, pts: nil, initial_pts: nil, samples: []}}
  end

  @impl true
  def handle_setup(_ctx, state) do
    {[], %{state | chart: render_chart()}}
  end

  @impl true
  def handle_buffer(:input, buffer, ctx, state) do
    state = if state.initial_pts == nil, do: %{state | initial_pts: buffer.pts}, else: state
    state = if state.pts == nil, do: %{state | pts: buffer.pts}, else: state
    stream_format = ctx.pads.input.stream_format
    sample_size = RawAudio.sample_size(stream_format)
    sample_max = RawAudio.sample_max(stream_format)

    samples =
      for <<sample::binary-size(sample_size) <- buffer.payload>> do
        RawAudio.sample_to_value(sample, stream_format) / sample_max
      end
      # we need to make an average out of the samples for all the channels
      |> Enum.chunk_every(stream_format.channels)
      |> Enum.map(&(Enum.sum(&1) / length(&1)))

    # keep chronological order: older samples first, newly received samples last
    state = %{state | samples: state.samples ++ samples}

    samples_per_update = stream_format.sample_rate / @plot_update_frequency

    if length(state.samples) > samples_per_update do
      plot(state.samples, state.pts - state.initial_pts, stream_format.sample_rate, state.chart)
      {[], %{state | samples: [], pts: nil}}
    else
      {[], state}
    end
  end

  defp plot(samples, pts, sample_rate, chart) do
    samples_per_point = ceil(length(samples) / @points_per_update)
    sample_duration = Ratio.new(1, sample_rate) |> Membrane.Time.seconds()

    points =
      samples
      |> Enum.with_index()
      # `*2`, because in each loop run we are producing 2 points
      |> Enum.chunk_every(2 * samples_per_point)
      |> Enum.flat_map(fn point_samples ->
        point_samples
        |> Enum.min_max_by(fn {value, _sample_i} -> value end)
        |> Tuple.to_list()
        |> Enum.map(fn {value, sample_i} ->
          x = (pts + sample_i * sample_duration) |> Membrane.Time.as_milliseconds(:round)
          %{x: x, y: value}
        end)
      end)

    Kino.VegaLite.push_many(chart, points, window: @visible_points)
  end

  defp render_chart() do
    Vl.new(width: 600, height: 400, title: "Amplitude in time")
    |> Vl.mark(:line, point: true)
    |> Vl.encode_field(:x, "x", title: "Time [ms]", type: :quantitative)
    |> Vl.encode_field(:y, "y",
      title: "Amplitude",
      type: :quantitative,
      scale: %{domain: [-1.1, 1.1]}
    )
    |> Kino.VegaLite.new()
    |> Kino.render()
  end
end

Pipeline structure

Once we are ready with the Visualizer element, we can set the pipeline up. The pipeline will consist of:

  • a microphone input,
  • a raw audio parser (we need that element to provide timestamps to the buffers),
  • the Visualizer.

All the elements are connected linearly.

import Membrane.ChildrenSpec

spec =
  child(Membrane.PortAudio.Source)
  |> child(%Membrane.RawAudioParser{overwrite_pts?: true})
  |> child(Visualizer)

:ok

Running the pipeline

Finally, we can start the Membrane.RCPipeline (remote-controlled pipeline) and request execution of the spec action with the previously created pipeline structure:

alias Membrane.RCPipeline

pipeline = RCPipeline.start_link!()
RCPipeline.exec_actions(pipeline, spec: spec)

On the plot above, you should be able to see the relation between the audio amplitude and time.

You can terminate the pipeline with the following code:

RCPipeline.terminate(pipeline)