Data Types & Messaging - Exercises


Mix.install([{:kino, "~> 0.17.0"}])
Code.require_file("quiz.ex", __DIR__)
Code.require_file("process_viz.ex", __DIR__)

Introduction

Welcome to the hands-on exercises for Data Types & Messaging!

Each section has runnable code cells. Execute them, experiment, and observe what happens!

Term Size and Memory

Exercise 1: Measuring Term Sizes

Goal: Understand the memory footprint of different BEAM data types and predict term sizes.

Task 1.1: Measure Immediate Terms

Immediates fit in one word and require no heap allocation.

Measure = fun(Term) ->
    Size = erts_debug:size(Term),
    io:format("~p: ~p words~n", [Term, Size]),
    {Term, Size}
end,

Measure(ok),
Measure(42),
Measure(error),
Measure(nil).

Observe: All immediate terms report 0 words because the value is encoded in the machine word itself; erts_debug:size/1 counts heap words only.

Task 1.2: Measure Boxed Terms

Boxed terms require heap space.

Measure({ok, 42}),
Measure([1, 2, 3]),
Measure(#{a => 1, b => 2}).

Observe: The tuple costs a header word plus one word per element. Lists cost two words (one cons cell) per element. Maps add per-key bookkeeping on top of the keys and values themselves.
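A related subtlety: erts_debug:size/1 counts shared sub-terms only once, while erts_debug:flat_size/1 reports the flattened size, which is closer to what a message copy costs (sharing is generally not preserved when a term is copied to another process). A quick sketch, with names of our own choosing:

Shared = lists:seq(1, 100),
Pair = {Shared, Shared},  % the same list referenced twice
io:format("size: ~p words~n", [erts_debug:size(Pair)]),            % counts the list once
io:format("flat_size: ~p words~n", [erts_debug:flat_size(Pair)]).  % counts it twice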

Task 1.3: Compare Large Data Structures
SmallList = lists:seq(1, 10),
LargeList = lists:seq(1, 1000),

SmallMap = maps:from_list([{I, I} || I <- lists:seq(1, 5)]),
LargeMap = maps:from_list([{I, I} || I <- lists:seq(1, 50)]),

io:format("Small list (10 elements): ~p words~n", [erts_debug:size(SmallList)]),
io:format("Large list (1000 elements): ~p words~n", [erts_debug:size(LargeList)]),
io:format("Small map (5 keys): ~p words~n", [erts_debug:size(SmallMap)]),
io:format("Large map (50 keys): ~p words~n", [erts_debug:size(LargeMap)]).

Observe: Lists scale linearly (2 words per element). Map overhead changes once a map grows beyond 32 keys, where the representation switches from a flatmap to a HAMT.

Discussion: When would you choose a tuple over a list for a fixed collection of 5 items? What about 500 items?
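To ground the discussion, a minimal comparison sketch:

Tuple5 = {1, 2, 3, 4, 5},
List5 = [1, 2, 3, 4, 5],
io:format("5-tuple: ~p words~n", [erts_debug:size(Tuple5)]),  % 1 header + 5 elements
io:format("5-list: ~p words~n", [erts_debug:size(List5)]).    % 2 words per cons cell

Tuples are also O(1) to index but O(n) to update; at 500 items that update cost usually dominates the choice.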

Binary Storage Modes

Exercise 2: Heap vs Reference-Counted Binaries

Goal: Understand when binaries are stored on-heap vs off-heap and the impact on message passing.

Task 2.1: Create Binaries of Different Sizes
SmallBinary = <<"hello">>,
MediumBinary = list_to_binary(lists:duplicate(64, $x)),
LargeBinary = list_to_binary(lists:duplicate(65, $x)),
HugeBinary = crypto:strong_rand_bytes(1024),

io:format("Small (5 bytes): ~p words~n", [erts_debug:size(SmallBinary)]),
io:format("Medium (63 bytes): ~p words~n", [erts_debug:size(MediumBinary)]),
io:format("Large (64 bytes): ~p words~n", [erts_debug:size(LargeBinary)]),
io:format("Huge (1024 bytes): ~p words~n", [erts_debug:size(HugeBinary)]).

Observe: The jump between 64 and 65 bytes. Binaries of up to 64 bytes live on the process heap; anything larger is stored off-heap in reference-counted memory, and only the small ProcBin wrapper is counted against the process.
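Off-heap storage has a sharing consequence worth seeing: a small sub-binary keeps its large parent alive. binary:referenced_byte_size/1 exposes this, and binary:copy/1 breaks the reference. A short sketch, with our own variable names:

Big = crypto:strong_rand_bytes(100000),
Slice = binary:part(Big, 0, 10),  % 10-byte sub-binary into Big
io:format("byte_size: ~p~n", [byte_size(Slice)]),
io:format("referenced: ~p~n", [binary:referenced_byte_size(Slice)]),  % still ~100000
Copied = binary:copy(Slice),      % detach from the parent
io:format("after copy: ~p~n", [binary:referenced_byte_size(Copied)]).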

Task 2.2: Measure Message Sending Cost
Parent = self(),

SendAndMeasure = fun(Payload) ->
    {Time, _} = timer:tc(fun() ->
        Receiver = spawn(fun() ->
            receive
                _ -> Parent ! done
            end
        end),
        Receiver ! Payload,
        receive done -> ok end
    end),
    io:format("~p bytes: ~p μs~n", [byte_size(Payload), Time])
end,

SendAndMeasure(<<"small">>),
SendAndMeasure(list_to_binary(lists:duplicate(100, $x))),
SendAndMeasure(list_to_binary(lists:duplicate(10000, $x))).

Observe: Large binaries are cheap to send because only the ProcBin is copied, not the payload.

Discussion: When would you intentionally convert a large data structure to binary before sending it to multiple processes?
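One common answer, sketched below under the assumption that every receiver needs the same read-only snapshot: encode the structure once with term_to_binary/1, fan the resulting refc binary out cheaply, and let each worker decode onto its own heap only if it actually needs the term:

Snapshot = maps:from_list([{I, I * I} || I <- lists:seq(1, 1000)]),
Encoded = term_to_binary(Snapshot),  % one large refc binary, encoded once
Pids = [spawn(fun() ->
    receive
        {snapshot, Bin} ->
            _Decoded = binary_to_term(Bin)  % decoding copies onto this heap
    end
end) || _ <- lists:seq(1, 10)],
[P ! {snapshot, Encoded} || P <- Pids].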

Message Copying Costs

Exercise 3: Understanding Message Copying

Goal: Measure the cost of sending different data structures as messages.

Task 3.1: Compare Message Types
Receiver = spawn(fun() ->
    Loop = fun LoopFun() ->
        receive
            stop -> ok;
            _ -> LoopFun()
        end
    end,
    Loop()
end),
register(receiver, Receiver),

SendMany = fun(Payload, Count) ->
    {Time, _} = timer:tc(fun() ->
        [receiver ! Payload || _ <- lists:seq(1, Count)]
    end),
    Size = erts_debug:size(Payload),
    io:format("Sent ~p x ~p words in ~p μs (~.2f μs/msg)~n",
              [Count, Size, Time, Time / Count])
end,

SendMany({ok, 42}, 1000),
SendMany(lists:seq(1, 100), 1000),
SendMany(maps:from_list([{I, I} || I <- lists:seq(1, 50)]), 1000),
SendMany(crypto:strong_rand_bytes(1024), 1000).

Observe: Small tuples are fast. Lists and maps copy the entire structure. Large binaries remain fast (only ProcBin copied).

Task 3.2: Deep Nesting Impact
ShallowData = [1, 2, 3, 4, 5],
DeepData = [{user, I, [{email, <<"user@example.com">>}, {status, active}]} || I <- lists:seq(1, 100)],

io:format("Shallow: ~p words~n", [erts_debug:size(ShallowData)]),
io:format("Deep: ~p words~n", [erts_debug:size(DeepData)]),

SendMany(ShallowData, 1000),
SendMany(DeepData, 1000).

Observe: Deeply nested structures cost more to send because the copier must walk and copy every nested sub-term.

Discussion: How would you redesign a message protocol that currently sends large nested maps to each worker?
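One possible redesign, sketched here with hypothetical names: put the shared, read-mostly structure in persistent_term (persistent_term:get/1 returns the term without copying, though updates are expensive and affect all processes) and send workers only the key:

SharedUsers = [{user, I, [{email, <<"user@example.com">>}, {status, active}]}
               || I <- lists:seq(1, 100)],
persistent_term:put({demo, users}, SharedUsers),

PtWorker = spawn(fun() ->
    receive
        {work, Key} ->
            _Data = persistent_term:get(Key)  % lookup, no message copy of the payload
    end
end),
PtWorker ! {work, {demo, users}}.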

Mailbox Scanning

Exercise 4: Selective Receive and Mailbox Performance

Goal: Understand how mailbox scanning works and the importance of message tagging.

Task 4.1: Create Messages with Different Tags
Worker = spawn(fun() ->
    Loop = fun LoopFun() ->
        receive
            {request, Ref, From} ->
                From ! {response, Ref, done},
                LoopFun();
            stop -> ok
        end
    end,
    Loop()
end),
register(worker, Worker).
Task 4.2: Send Tagged Request
Ref = make_ref(),
worker ! {request, Ref, self()},

Response = receive
    {response, Ref, Result} -> Result
after
    1000 -> timeout
end,

io:format("Received: ~p~n", [Response]).

Observe: The unique reference ensures we match only the intended reply, even if other messages exist.

Task 4.3: Test with Mailbox Pollution
% Add noise to mailbox
[worker ! {request, make_ref(), self()} || _ <- lists:seq(1, 100)],

% Our request should still work
Ref2 = make_ref(),
worker ! {request, Ref2, self()},

Response2 = receive
    {response, Ref2, Result} -> Result
after
    1000 -> timeout
end,

io:format("Still received: ~p~n", [Response2]).

Observe: Reference tagging allows precise matching despite mailbox clutter. (Recent OTP releases go further: when the compiler sees a receive that can only match a freshly created reference, the runtime skips messages that arrived before make_ref() was called.)

Discussion: What happens if you use a sequential counter instead of make_ref() for request IDs in a distributed system?
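For context, the pattern these exercises build toward is roughly what gen_server:call does under the hood: the unique reference doubles as a request tag and a monitor, so a dead server produces an immediate error instead of a silent timeout. A sketch with our own message shapes (a 4-tuple rather than the worker's 3-tuple above):

Call = fun(Pid, Request, Timeout) ->
    MRef = erlang:monitor(process, Pid),
    Pid ! {request, MRef, self(), Request},
    receive
        {response, MRef, Reply} ->
            erlang:demonitor(MRef, [flush]),
            {ok, Reply};
        {'DOWN', MRef, process, Pid, Reason} ->
            {error, Reason}
    after Timeout ->
        erlang:demonitor(MRef, [flush]),
        {error, timeout}
    end
end.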

Message Queue Data Modes

Exercise 5: On-Heap vs Off-Heap Queues

Goal: Compare on-heap and off-heap message queue behavior under load.

Task 5.1: Create On-Heap Receiver
OnHeapWorker = spawn(fun() ->
    Loop = fun LoopFun() ->
        receive
            {work, _N} ->
                timer:sleep(10),
                LoopFun();
            stop -> ok
        end
    end,
    Loop()
end),
register(on_heap_worker, OnHeapWorker).
Task 5.2: Create Off-Heap Receiver
OffHeapWorker = spawn(fun() ->
    process_flag(message_queue_data, off_heap),
    Loop = fun LoopFun() ->
        receive
            {work, _N} ->
                timer:sleep(10),
                LoopFun();
            stop -> ok
        end
    end,
    Loop()
end),
register(off_heap_worker, OffHeapWorker).
Task 5.3: Send Burst of Messages
SendBurst = fun(Name, Count) ->
    [Name ! {work, I} || I <- lists:seq(1, Count)]
end,

SendBurst(on_heap_worker, 500),
SendBurst(off_heap_worker, 500),

timer:sleep(100),

OnHeapInfo = process_info(whereis(on_heap_worker), [memory, message_queue_len]),
OffHeapInfo = process_info(whereis(off_heap_worker), [memory, message_queue_len]),

io:format("On-heap: ~p~n", [OnHeapInfo]),
io:format("Off-heap: ~p~n", [OffHeapInfo]).

Observe: The on-heap worker may report higher memory because queued messages are allocated on its heap and scanned by the garbage collector. The off-heap worker keeps queued messages in separately allocated memory outside the heap, which the GC does not have to scan.

Discussion: When would you configure a GenServer to use off_heap message queues in production?
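Note that setting the flag inside the spawned fun (as in Task 5.2) leaves a brief window where early messages can still land on-heap. To avoid it, set the option at spawn time; a minimal sketch:

OffHeap2 = spawn_opt(fun() ->
    receive stop -> ok end
end, [{message_queue_data, off_heap}]),
io:format("~p~n", [process_info(OffHeap2, message_queue_data)]),  % confirm the mode
OffHeap2 ! stop.

In Elixir, a GenServer can request the same mode at start, e.g. GenServer.start_link(Mod, arg, spawn_opt: [message_queue_data: :off_heap]).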

Binary Construction

Exercise 6: Efficient Binary Building

Goal: Learn to construct binaries efficiently using single allocation.

Task 6.1: Inefficient Concatenation
BuildInefficient = fun(Data) ->
    Header = <<1, 2, 3>>,
    DataSize = byte_size(Data),
    Length = <<DataSize:32>>,
    Header1 = binary:list_to_bin([Header, Length]),
    binary:list_to_bin([Header1, Data])
end,

TestData = <<"test payload">>,
{Time1, Result1} = timer:tc(fun() ->
    [BuildInefficient(TestData) || _ <- lists:seq(1, 10000)]
end),

io:format("Inefficient: ~p μs~n", [Time1]).
Task 6.2: Efficient Single Allocation
BuildEfficient = fun(Data) ->
    DataSize = byte_size(Data),
    <<1, 2, 3, DataSize:32, Data/binary>>
end,

{Time2, Result2} = timer:tc(fun() ->
    [BuildEfficient(TestData) || _ <- lists:seq(1, 10000)]
end),

io:format("Efficient: ~p μs~n", [Time2]),
io:format("Speedup: ~.2fx~n", [Time1 / Time2]).

Observe: Single allocation is significantly faster than multiple concatenations.

Task 6.3: Using IOLists
BuildIOList = fun(Data) ->
    DataSize = byte_size(Data),
    IOList = [<<1, 2, 3>>, <<DataSize:32>>, Data],
    iolist_to_binary(IOList)
end,

{Time3, Result3} = timer:tc(fun() ->
    [BuildIOList(TestData) || _ <- lists:seq(1, 10000)]
end),

io:format("IOList: ~p μs~n", [Time3]).

Observe: IOLists defer the final binary construction, allowing efficient building.

Discussion: When building HTTP responses with headers and body, which approach would you use and why?
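For the discussion above, one hedged answer: an iolist usually wins for HTTP responses, because gen_tcp:send/2 and file:write/2 accept iodata directly, so the final flatten disappears entirely. A sketch with a made-up response:

RespBody = <<"{\"ok\":true}">>,
Resp = [<<"HTTP/1.1 200 OK\r\n">>,
        <<"Content-Type: application/json\r\n">>,
        "Content-Length: ", integer_to_binary(byte_size(RespBody)), <<"\r\n\r\n">>,
        RespBody],
%% gen_tcp:send(Socket, Resp) would take this iodata as-is;
%% here we flatten once just to display it:
io:format("~s~n", [iolist_to_binary(Resp)]).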

Module 2 Review

Quiz.render_from_file(__DIR__ <> "/module-2-exercises.livemd", quiz: 1)