Binary construction
Introduction
Erlang provides heavy runtime optimisations for building and matching binaries, as outlined in in the this guide.
Consider the following example:
x = <<0>>
x = <>
x = <>
x = <>
<<0, 1, 2, 3>>
The first time we append something to the binary, a refc binary is created with additional space allocated. This way, subsequent appends don’t involve copying the original binary and instead use the preallocated space.
Another possible way of accumulating the binary would be by prepending the bytes, as in:
x = <<3>>
x = <<2, x::binary>>
x = <<1, x::binary>>
x = <<0, x::binary>>
<<0, 1, 2, 3>>
However, since the additional buffer is allocated on the right of the original binary, this approach doesn’t leverage the same optimisations.
Relevance
This difference matters when defining a recursive function that builds a binary result. There are two primary ways of defining such function.
The first approach is to use an accumulator and tail recursion:
defp build(acc, ...), do: build(<>, ...)
On the other hand, as with lists, it may be tempting to use body recursion instead, which possibly makes the code more intuitive:
defp build(acc, ...), do: ... <> build(...)
In the first approach we append to a binary, while in the second we keep prepending while folding the recursive result.
Goal
In this notebook we compare how the two approaches differ in performance.
Setup
We will use benchee
for our
measurements.
Mix.install([
{:benchee, "~> 1.1.0"}
])
:ok
Benchmark
As our benchmark task, we want to build a binary of the given size by adding bytes one by one.
Below we define four functions, which combine the following properties:
-
the byte order in the final result, either left-to-right (
_lr
) or right-to-left (_rl
) -
whether the function uses tail recursion with an accumulator (
_acc
) or body recursion (_noacc
)
We are primarily interested in the difference between build_lr_acc
and build_lr_noacc
, since they directly represent the approaches
outlined above. The _rl
alternatives are included for cases where the
binary needs to be built right-to-left
defmodule BinaryTest do
def build_lr_acc(size), do: build_lr_acc(<<>>, size)
defp build_lr_acc(acc, 0), do: acc
defp build_lr_acc(acc, size), do: build_lr_acc(<>, size - 1)
def build_lr_noacc(0), do: <<>>
def build_lr_noacc(size), do: <> <> build_lr_noacc(size - 1)
def build_rl_acc(size), do: build_rl_acc(<<>>, size)
defp build_rl_acc(acc, 0), do: acc
defp build_rl_acc(acc, size), do: build_rl_acc(<>, size - 1)
def build_rl_noacc(0), do: <<>>
def build_rl_noacc(size), do: build_rl_noacc(size - 1) <> <>
end
{:module, BinaryTest, <<70, 79, 82, 49, 0, 0, 9, ...>>, {:build_rl_noacc, 1}}
results =
Benchee.run(
%{
"left-to-right with acc" => fn size -> BinaryTest.build_lr_acc(size) end,
"left-to-right without acc" => fn size -> BinaryTest.build_lr_noacc(size) end,
"right-to-left with acc" => fn size -> BinaryTest.build_rl_acc(size) end,
"right-to-left without acc" => fn size -> BinaryTest.build_rl_noacc(size) end
},
inputs: %{
"1. size 10" => 10,
"2. size 30" => 30,
"3. size 60" => 60,
"4. size 70" => 70,
"5. size 100" => 100,
"6. size 1_000" => 1_000,
"7. size 100_000" => 100_000
}
)
:ok
Operating System: Linux
CPU Information: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
Number of Available Cores: 4
Available memory: 30.79 GB
Elixir 1.13.0
Erlang 24.1.2
Benchmark suite executing with the following configuration:
warmup: 2 s
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: 1. size 10, 2. size 30, 3. size 60, 4. size 70, 5. size 100, 6. size 1_000, 7. size 100_000
Estimated total run time: 3.27 min
Benchmarking left-to-right with acc with input 1. size 10 ...
Benchmarking left-to-right with acc with input 2. size 30 ...
Benchmarking left-to-right with acc with input 3. size 60 ...
Benchmarking left-to-right with acc with input 4. size 70 ...
Benchmarking left-to-right with acc with input 5. size 100 ...
Benchmarking left-to-right with acc with input 6. size 1_000 ...
Benchmarking left-to-right with acc with input 7. size 100_000 ...
Benchmarking left-to-right without acc with input 1. size 10 ...
Benchmarking left-to-right without acc with input 2. size 30 ...
Benchmarking left-to-right without acc with input 3. size 60 ...
Benchmarking left-to-right without acc with input 4. size 70 ...
Benchmarking left-to-right without acc with input 5. size 100 ...
Benchmarking left-to-right without acc with input 6. size 1_000 ...
Benchmarking left-to-right without acc with input 7. size 100_000 ...
Benchmarking right-to-left with acc with input 1. size 10 ...
Benchmarking right-to-left with acc with input 2. size 30 ...
Benchmarking right-to-left with acc with input 3. size 60 ...
Benchmarking right-to-left with acc with input 4. size 70 ...
Benchmarking right-to-left with acc with input 5. size 100 ...
Benchmarking right-to-left with acc with input 6. size 1_000 ...
Benchmarking right-to-left with acc with input 7. size 100_000 ...
Benchmarking right-to-left without acc with input 1. size 10 ...
Benchmarking right-to-left without acc with input 2. size 30 ...
Benchmarking right-to-left without acc with input 3. size 60 ...
Benchmarking right-to-left without acc with input 4. size 70 ...
Benchmarking right-to-left without acc with input 5. size 100 ...
Benchmarking right-to-left without acc with input 6. size 1_000 ...
Benchmarking right-to-left without acc with input 7. size 100_000 ...
##### With input 1. size 10 #####
Name ips average deviation median 99th %
left-to-right without acc 624.73 K 1.60 μs ±2021.43% 1.02 μs 3.40 μs
right-to-left with acc 621.02 K 1.61 μs ±2303.97% 0.99 μs 3.39 μs
right-to-left without acc 562.83 K 1.78 μs ±1964.14% 1.09 μs 4.35 μs
left-to-right with acc 515.46 K 1.94 μs ±1976.17% 1.10 μs 4.68 μs
Comparison:
left-to-right without acc 624.73 K
right-to-left with acc 621.02 K - 1.01x slower +0.00957 μs
right-to-left without acc 562.83 K - 1.11x slower +0.176 μs
left-to-right with acc 515.46 K - 1.21x slower +0.34 μs
##### With input 2. size 30 #####
Name ips average deviation median 99th %
right-to-left with acc 394.82 K 2.53 μs ±1258.75% 1.66 μs 5.79 μs
left-to-right without acc 356.26 K 2.81 μs ±1142.61% 1.82 μs 6.05 μs
left-to-right with acc 327.81 K 3.05 μs ±1483.31% 1.74 μs 7.77 μs
right-to-left without acc 327.50 K 3.05 μs ±1193.91% 1.81 μs 8.87 μs
Comparison:
right-to-left with acc 394.82 K
left-to-right without acc 356.26 K - 1.11x slower +0.27 μs
left-to-right with acc 327.81 K - 1.20x slower +0.52 μs
right-to-left without acc 327.50 K - 1.21x slower +0.52 μs
##### With input 3. size 60 #####
Name ips average deviation median 99th %
right-to-left with acc 260.50 K 3.84 μs ±731.31% 2.60 μs 13.21 μs
left-to-right with acc 242.14 K 4.13 μs ±614.73% 2.73 μs 18.29 μs
left-to-right without acc 240.02 K 4.17 μs ±694.06% 2.90 μs 15.76 μs
right-to-left without acc 238.66 K 4.19 μs ±607.83% 2.82 μs 20.16 μs
Comparison:
right-to-left with acc 260.50 K
left-to-right with acc 242.14 K - 1.08x slower +0.29 μs
left-to-right without acc 240.02 K - 1.09x slower +0.33 μs
right-to-left without acc 238.66 K - 1.09x slower +0.35 μs
##### With input 4. size 70 #####
Name ips average deviation median 99th %
right-to-left without acc 215.12 K 4.65 μs ±592.67% 3.16 μs 20.55 μs
left-to-right with acc 211.87 K 4.72 μs ±615.95% 3.06 μs 20.38 μs
right-to-left with acc 167.12 K 5.98 μs ±502.01% 3.52 μs 23.25 μs
left-to-right without acc 156.37 K 6.40 μs ±475.24% 3.86 μs 24.04 μs
Comparison:
right-to-left without acc 215.12 K
left-to-right with acc 211.87 K - 1.02x slower +0.0713 μs
right-to-left with acc 167.12 K - 1.29x slower +1.34 μs
left-to-right without acc 156.37 K - 1.38x slower +1.75 μs
##### With input 5. size 100 #####
Name ips average deviation median 99th %
right-to-left without acc 170.82 K 5.85 μs ±458.18% 4.18 μs 22.33 μs
left-to-right with acc 169.88 K 5.89 μs ±466.18% 4.03 μs 22.37 μs
right-to-left with acc 75.00 K 13.33 μs ±312.55% 6.97 μs 102.85 μs
left-to-right without acc 74.50 K 13.42 μs ±288.92% 7.42 μs 103.32 μs
Comparison:
right-to-left without acc 170.82 K
left-to-right with acc 169.88 K - 1.01x slower +0.0323 μs
right-to-left with acc 75.00 K - 2.28x slower +7.48 μs
left-to-right without acc 74.50 K - 2.29x slower +7.57 μs
##### With input 6. size 1_000 #####
Name ips average deviation median 99th %
left-to-right with acc 24.48 K 40.84 μs ±36.41% 34.34 μs 90.51 μs
right-to-left without acc 23.38 K 42.78 μs ±36.03% 36.44 μs 89.42 μs
right-to-left with acc 4.49 K 222.76 μs ±39.60% 195.33 μs 507.30 μs
left-to-right without acc 4.19 K 238.94 μs ±37.44% 204.73 μs 513.32 μs
Comparison:
left-to-right with acc 24.48 K
right-to-left without acc 23.38 K - 1.05x slower +1.94 μs
right-to-left with acc 4.49 K - 5.45x slower +181.92 μs
left-to-right without acc 4.19 K - 5.85x slower +198.10 μs
##### With input 7. size 100_000 #####
Name ips average deviation median 99th %
left-to-right with acc 250.56 3.99 ms ±23.36% 3.70 ms 7.13 ms
right-to-left without acc 100.25 9.98 ms ±22.40% 9.45 ms 16.96 ms
right-to-left with acc 3.78 264.74 ms ±3.33% 262.14 ms 287.95 ms
left-to-right without acc 0.36 2767.05 ms ±1.68% 2767.05 ms 2799.94 ms
Comparison:
left-to-right with acc 250.56
right-to-left without acc 100.25 - 2.50x slower +5.98 ms
right-to-left with acc 3.78 - 66.33x slower +260.75 ms
left-to-right without acc 0.36 - 693.31x slower +2763.06 ms
:ok
For smaller binaries there is no significant difference between any of the functions. The difference starts to manifest itself when we compare the results for size-60 and size-70 binaries, and this is not a coincidence! Binaries over 64 bytes are always allocated outside the process heap and involve reference counting, so it seems fair that copying those has more overhead.
For size-1000 binaries we can see that functions appending to the binary turned out 5 times faster and the difference escalate as the binary grows.
tl;dr
As a rule of thumb,
-
when building binaries left-to-right use a tail-recursive function with an accumulator
defp build(acc, ...), do: build(<>, ...)
-
when building binaries right-to-left use a body-recursive function
defp build(acc, ...), do: build(...) <> ...
When accumulating binaries below 64 bytes, the approach doesn’t really matter.
Final notes
When constructing large binaries it is crucial to leverage the Erlang runtime optimisations to avoid copying the binary over and over. In this notebook we compared append- and prepend- based approaches and came up with a guideline to ensure the best performance.
References: