Data Types & Messaging - Exercises
Mix.install([{:kino, "~> 0.17.0"}])
Code.require_file("quiz.ex", __DIR__)
Code.require_file("process_viz.ex", __DIR__)
Introduction
Welcome to the hands-on exercises for Data Types & Messaging!
Each section has runnable code cells. Execute them, experiment, and observe what happens!
Term Size and Memory
Exercise 1: Measuring Term Sizes
Goal: Understand the memory footprint of different BEAM data types and predict term sizes.
Task 1.1: Measure Immediate Terms
Immediates fit in one word and require no heap allocation.
Measure = fun(Term) ->
Size = erts_debug:size(Term),
io:format("~p: ~p words~n", [Term, Size]),
{Term, Size}
end,
Measure(ok),
Measure(42),
Measure(error),
Measure(nil).
Observe: All immediate terms show 0 words because they fit in the word itself.
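Not every integer is an immediate, though: small integers fit in the tagged word, but integers too large for one word (bignums) are boxed on the heap. A quick check, reusing the same idea as Measure above:

```erlang
%% Small integers are immediates (0 heap words); bignums are boxed
%% (header word + digit words, exact count depends on word size).
SmallSize = erts_debug:size(42),
BigSize = erts_debug:size(1 bsl 100),
io:format("42: ~p words, 1 bsl 100: ~p words~n", [SmallSize, BigSize]).
```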
Task 1.2: Measure Boxed Terms
Boxed terms require heap space.
Measure({ok, 42}),
Measure([1, 2, 3]),
Measure(#{a => 1, b => 2}).
Observe: Tuple has overhead (header + elements). Lists use 2 words per element. Maps have additional overhead for keys.
Task 1.3: Compare Large Data Structures
SmallList = lists:seq(1, 10),
LargeList = lists:seq(1, 1000),
SmallMap = maps:from_list([{I, I} || I <- lists:seq(1, 5)]),
LargeMap = maps:from_list([{I, I} || I <- lists:seq(1, 50)]),
io:format("Small list (10 elements): ~p words~n", [erts_debug:size(SmallList)]),
io:format("Large list (1000 elements): ~p words~n", [erts_debug:size(LargeList)]),
io:format("Small map (5 keys): ~p words~n", [erts_debug:size(SmallMap)]),
io:format("Large map (50 keys): ~p words~n", [erts_debug:size(LargeMap)]).
Observe: Lists scale linearly (2 words per element). Maps have overhead that changes at 33 keys (flatmap to HAMT transition).
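You can see the flatmap-to-HAMT boundary directly by measuring maps just below and just above 32 keys:

```erlang
%% Maps with up to 32 keys use the compact flatmap representation;
%% at 33 keys the VM switches to a HAMT, with different per-key overhead.
Flat = maps:from_list([{I, I} || I <- lists:seq(1, 32)]),
Hamt = maps:from_list([{I, I} || I <- lists:seq(1, 33)]),
io:format("32 keys: ~p words~n33 keys: ~p words~n",
          [erts_debug:size(Flat), erts_debug:size(Hamt)]).
```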
Discussion: When would you choose a tuple over a list for a fixed collection of 5 items? What about 500 items?
Binary Storage Modes
Exercise 2: Heap vs Reference-Counted Binaries
Goal: Understand when binaries are stored on-heap vs off-heap and the impact on message passing.
Task 2.1: Create Binaries of Different Sizes
SmallBinary = <<"hello">>,
MediumBinary = list_to_binary(lists:duplicate(64, $x)),
LargeBinary = list_to_binary(lists:duplicate(65, $x)),
HugeBinary = crypto:strong_rand_bytes(1024),
io:format("Small (5 bytes): ~p words~n", [erts_debug:size(SmallBinary)]),
io:format("Medium (64 bytes): ~p words~n", [erts_debug:size(MediumBinary)]),
io:format("Large (65 bytes): ~p words~n", [erts_debug:size(LargeBinary)]),
io:format("Huge (1024 bytes): ~p words~n", [erts_debug:size(HugeBinary)]).
Observe: The jump just past 64 bytes. Binaries of up to 64 bytes live on the process heap; larger binaries are reference-counted and stored off-heap, so only a small ProcBin wrapper is counted.
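You can also inspect which off-heap binaries a process currently holds. process_info(Pid, binary) returns a list of entries (each carrying the binary's byte size) for reference-counted binaries only; heap binaries never appear there:

```erlang
%% The 1024-byte binary is refc (off-heap) and shows up in the
%% process_info binary list; the 5-byte heap binary does not.
SmallBin = <<"hello">>,
HugeBin = crypto:strong_rand_bytes(1024),
{binary, Held} = process_info(self(), binary),
io:format("small: ~p bytes, held off-heap sizes: ~p~n",
          [byte_size(SmallBin), [S || {_, S, _} <- Held]]).
```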
Task 2.2: Measure Message Sending Cost
Parent = self(),
SendAndMeasure = fun(Payload) ->
{Time, _} = timer:tc(fun() ->
Receiver = spawn(fun() ->
receive
_ -> Parent ! done
end
end),
Receiver ! Payload,
receive done -> ok end
end),
io:format("~p bytes: ~p μs~n", [byte_size(Payload), Time])
end,
SendAndMeasure(<<"small">>),
SendAndMeasure(list_to_binary(lists:duplicate(100, $x))),
SendAndMeasure(list_to_binary(lists:duplicate(10000, $x))).
Observe: Large binaries are cheap to send because only the ProcBin is copied, not the payload.
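The flip side of cheap sharing is that a tiny sub-binary can keep an entire large payload alive. A sketch showing the effect, and how binary:copy/1 detaches the slice:

```erlang
%% binary:part/3 returns a sub-binary that references the full
%% 100000-byte payload; binary:copy/1 makes an independent copy.
Huge = crypto:strong_rand_bytes(100000),
Part = binary:part(Huge, 0, 10),
io:format("part refs ~p bytes~n", [binary:referenced_byte_size(Part)]),
Copy = binary:copy(Part),
io:format("copy refs ~p bytes~n", [binary:referenced_byte_size(Copy)]).
```

This is why long-lived processes that extract small fields from large binaries often copy the field before storing it.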
Discussion: When would you intentionally convert a large data structure to binary before sending it to multiple processes?
Message Copying Costs
Exercise 3: Understanding Message Copying
Goal: Measure the cost of sending different data structures as messages.
Task 3.1: Compare Message Types
Receiver = spawn(fun() ->
Loop = fun LoopFun() ->
receive
stop -> ok;
_ -> LoopFun()
end
end,
Loop()
end),
register(receiver, Receiver),
SendMany = fun(Payload, Count) ->
{Time, _} = timer:tc(fun() ->
[receiver ! Payload || _ <- lists:seq(1, Count)]
end),
Size = erts_debug:size(Payload),
io:format("Sent ~p x ~p words in ~p μs (~.2f μs/msg)~n",
[Count, Size, Time, Time / Count])
end,
SendMany({ok, 42}, 1000),
SendMany(lists:seq(1, 100), 1000),
SendMany(maps:from_list([{I, I} || I <- lists:seq(1, 50)]), 1000),
SendMany(crypto:strong_rand_bytes(1024), 1000).
Observe: Small tuples are fast. Lists and maps copy the entire structure. Large binaries remain fast (only ProcBin copied).
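When many processes read the same large, rarely-changing term, one way to avoid paying the copy per message is persistent_term: reads return the stored term without a per-process copy (writes are expensive, so it suits near-constant data). A sketch:

```erlang
%% Store once; any process can read without receiving a copied message.
%% persistent_term:put/2 is costly, so reserve this for stable data.
Config = maps:from_list([{I, I} || I <- lists:seq(1, 50)]),
persistent_term:put(shared_config, Config),
_Reader = spawn(fun() ->
    C = persistent_term:get(shared_config),
    io:format("reader sees ~p keys~n", [map_size(C)])
end),
io:format("stored ~p keys~n",
          [map_size(persistent_term:get(shared_config))]).
```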
Task 3.2: Deep Nesting Impact
ShallowData = [1, 2, 3, 4, 5],
DeepData = [{user, I, [{email, <<"user@example.com">>}, {status, active}]} || I <- lists:seq(1, 100)],
io:format("Shallow: ~p words~n", [erts_debug:size(ShallowData)]),
io:format("Deep: ~p words~n", [erts_debug:size(DeepData)]),
SendMany(ShallowData, 1000),
SendMany(DeepData, 1000).
Observe: Deeply nested structures cost more to send because the copy must walk every sub-term recursively.
Discussion: How would you redesign a message protocol that currently sends large nested maps to each worker?
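One common redesign (a sketch, not the only answer): put the shared payload in a public ETS table once and send each worker only a key, so the sender never copies the full structure into every mailbox:

```erlang
%% Workers receive a small {job, Key} message and fetch on demand.
%% Note: ets:lookup/2 still copies the term to the worker's heap, but
%% only workers that actually need the data pay that cost.
Tab = ets:new(jobs, [set, public, {read_concurrency, true}]),
Payload = [{user, I, [{email, <<"user@example.com">>}, {status, active}]}
           || I <- lists:seq(1, 100)],
ets:insert(Tab, {job_1, Payload}),
JobWorker = spawn(fun() ->
    receive
        {job, Key} ->
            [{Key, Data}] = ets:lookup(Tab, Key),
            io:format("worker fetched ~p entries~n", [length(Data)])
    end
end),
JobWorker ! {job, job_1}.
```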
Mailbox Scanning
Exercise 4: Selective Receive and Mailbox Performance
Goal: Understand how mailbox scanning works and the importance of message tagging.
Task 4.1: Create Messages with Different Tags
Worker = spawn(fun() ->
Loop = fun LoopFun() ->
receive
{request, Ref, From} ->
From ! {response, Ref, done},
LoopFun();
stop -> ok
end
end,
Loop()
end),
register(worker, Worker).
Task 4.2: Send Tagged Request
Ref = make_ref(),
worker ! {request, Ref, self()},
Response = receive
{response, Ref, Result} -> Result
after
1000 -> timeout
end,
io:format("Received: ~p~n", [Response]).
Observe: The unique reference ensures we match only the intended reply, even if other messages exist.
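This tag-with-a-ref pattern is the heart of gen_server:call. A hand-rolled sketch adds a monitor so the caller also learns when the server dies before replying, instead of waiting out the full timeout:

```erlang
%% A minimal call/reply helper with a monitor (sketch of the pattern
%% gen_server:call/3 uses internally).
Server = spawn(fun() ->
    receive {request, Ref, From} -> From ! {response, Ref, done} end
end),
CallFun = fun(Pid, _Request) ->
    MRef = erlang:monitor(process, Pid),
    Pid ! {request, MRef, self()},
    receive
        {response, MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, _, Reason} ->
            {error, Reason}
    after 1000 ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end
end,
CallResult = CallFun(Server, ping),
io:format("call result: ~p~n", [CallResult]).
```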
Task 4.3: Test with Mailbox Pollution
% Add noise to mailbox
[worker ! {request, make_ref(), self()} || _ <- lists:seq(1, 100)],
% Our request should still work
Ref2 = make_ref(),
worker ! {request, Ref2, self()},
Response2 = receive
{response, Ref2, Result} -> Result
after
1000 -> timeout
end,
io:format("Still received: ~p~n", [Response2]).
Observe: Reference tagging allows precise matching despite mailbox clutter.
Discussion: What happens if you use a sequential counter instead of make_ref() for request IDs in a distributed system?
Message Queue Data Modes
Exercise 5: On-Heap vs Off-Heap Queues
Goal: Compare on-heap and off-heap message queue behavior under load.
Task 5.1: Create On-Heap Receiver
OnHeapWorker = spawn(fun() ->
Loop = fun LoopFun() ->
receive
{work, _N} ->
timer:sleep(10),
LoopFun();
stop -> ok
end
end,
Loop()
end),
register(on_heap_worker, OnHeapWorker).
Task 5.2: Create Off-Heap Receiver
OffHeapWorker = spawn(fun() ->
process_flag(message_queue_data, off_heap),
Loop = fun LoopFun() ->
receive
{work, _N} ->
timer:sleep(10),
LoopFun();
stop -> ok
end
end,
Loop()
end),
register(off_heap_worker, OffHeapWorker).
Task 5.3: Send Burst of Messages
SendBurst = fun(Name, Count) ->
[Name ! {work, I} || I <- lists:seq(1, Count)]
end,
SendBurst(on_heap_worker, 500),
SendBurst(off_heap_worker, 500),
timer:sleep(100),
OnHeapInfo = process_info(whereis(on_heap_worker), [memory, message_queue_len]),
OffHeapInfo = process_info(whereis(off_heap_worker), [memory, message_queue_len]),
io:format("On-heap: ~p~n", [OnHeapInfo]),
io:format("Off-heap: ~p~n", [OffHeapInfo]).
Observe: On-heap processes may show higher memory as messages live on the heap. Off-heap processes keep messages separate.
Discussion: When would you configure a GenServer to use off_heap message queues in production?
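The queue mode can also be chosen at spawn time rather than inside the process, which is the usual approach in production (for a GenServer, the same option can be passed via start_link's spawn_opt). A sketch using spawn_opt/2:

```erlang
%% message_queue_data given as a spawn option instead of process_flag/2.
QueueWorker = spawn_opt(fun() ->
    receive stop -> ok end
end, [{message_queue_data, off_heap}]),
Mode = process_info(QueueWorker, message_queue_data),
io:format("queue mode: ~p~n", [Mode]),
QueueWorker ! stop.
```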
Binary Construction
Exercise 6: Efficient Binary Building
Goal: Learn to construct binaries efficiently using single allocation.
Task 6.1: Inefficient Concatenation
BuildInefficient = fun(Data) ->
Header = <<1, 2, 3>>,
DataSize = byte_size(Data),
Length = <<DataSize:32>>,
Header1 = binary:list_to_bin([Header, Length]),
binary:list_to_bin([Header1, Data])
end,
TestData = <<"test payload">>,
{Time1, Result1} = timer:tc(fun() ->
[BuildInefficient(TestData) || _ <- lists:seq(1, 10000)]
end),
io:format("Inefficient: ~p μs~n", [Time1]).
Task 6.2: Efficient Single Allocation
BuildEfficient = fun(Data) ->
DataSize = byte_size(Data),
<<1, 2, 3, DataSize:32, Data/binary>>
end,
{Time2, Result2} = timer:tc(fun() ->
[BuildEfficient(TestData) || _ <- lists:seq(1, 10000)]
end),
io:format("Efficient: ~p μs~n", [Time2]),
io:format("Speedup: ~.2fx~n", [Time1 / Time2]).
Observe: Single allocation is significantly faster than multiple concatenations.
Task 6.3: Using IOLists
BuildIOList = fun(Data) ->
DataSize = byte_size(Data),
IOList = [<<1, 2, 3>>, <<DataSize:32>>, Data],
iolist_to_binary(IOList)
end,
{Time3, Result3} = timer:tc(fun() ->
[BuildIOList(TestData) || _ <- lists:seq(1, 10000)]
end),
io:format("IOList: ~p μs~n", [Time3]).
Observe: IOLists defer the final binary construction, allowing efficient building.
Discussion: When building HTTP responses with headers and body, which approach would you use and why?
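For the HTTP case, an iolist is usually the answer: ports and sockets accept iodata directly (gen_tcp:send/2 takes an iolist), so the response often never needs flattening at all. A sketch:

```erlang
%% Headers and body assembled as an iolist; no intermediate binaries
%% are built, and gen_tcp:send(Socket, Response) would accept this
%% as-is. Flatten only if an API demands a binary.
Body = <<"{\"status\":\"ok\"}">>,
Response = [
    <<"HTTP/1.1 200 OK\r\n">>,
    <<"Content-Type: application/json\r\n">>,
    "Content-Length: ", integer_to_list(byte_size(Body)), "\r\n",
    "\r\n",
    Body
],
io:format("response is ~p bytes~n", [iolist_size(Response)]).
```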
Module 2 Review
Quiz.render_from_file(__DIR__ <> "/module-2-exercises.livemd", quiz: 1)