
Module 3 – Memory & Garbage Collection - Exercises

livebooks/module-3-exercises.livemd


Mix.install([{:kino, "~> 0.17.0"}])
Code.require_file("quiz.ex", __DIR__)
Code.require_file("process_viz.ex", __DIR__)

Introduction

Welcome to the hands-on exercises for Module 3 – Memory & Garbage Collection!

Each section has runnable code cells. Execute them, experiment, and observe what happens!

Term Size and Memory

Process snapshot

:erlang.process_info(self(), [:memory, :heap_size, :total_heap_size, :stack_size, :message_queue_len])

Create a tuple-heavy workload vs list-heavy workload; watch :heap_size and :total_heap_size shift.
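The comparison above can be sketched in a Livebook cell (a minimal sketch; sizes are in words and the exact numbers vary by VM build):

```elixir
# Start from a freshly collected heap
:erlang.garbage_collect()
{:total_heap_size, base} = :erlang.process_info(self(), :total_heap_size)

# Tuple-heavy workload: one flat boxed term (header word + 100_000 element words)
tuple = List.to_tuple(Enum.to_list(1..100_000))
{:total_heap_size, with_tuple} = :erlang.process_info(self(), :total_heap_size)

# List-heavy workload: 100_000 cons cells (2 words each)
list = Enum.to_list(1..100_000)
{:total_heap_size, with_list} = :erlang.process_info(self(), :total_heap_size)

IO.inspect({base, with_tuple, with_list}, label: "total_heap_size (words)")

# Keep both terms live so they aren't collected before measurement
{tuple_size(tuple), length(list)}
```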

Mailbox growth & GC

# Enqueue many small messages
pid = self()
spawn(fn -> for i <- 1..100_000, do: send(pid, {:ping, i}) end)

:erlang.process_info(self(), :message_queue_len)
# Drain them with a loop
drain = fn drain ->
  receive do
    _ -> drain.(drain)
  after
    0 -> :ok
  end
end

drain.(drain)
# Then force a GC and re-check memory
:erlang.garbage_collect(self())
:erlang.process_info(self(), :memory)

X/Y registers in disassembly

If you have dev tools available:

% Erlang shell:
erts_debug:df(your_module).  % Writes your_module.dis; look for x(0)..x(N) registers and y(N) stack slots

This shows BEAM instructions using X (arguments/temporaries) and Y (stack locals).
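To see why Y slots exist, consider a hypothetical module (names are illustrative) where a value must survive a function call. X registers are clobbered by calls, so disassembling this with erts_debug:df/1 would show `a` being saved to a stack slot:

```elixir
defmodule YSlotDemo do
  # `a` is still needed after the call to double/1, so the compiler
  # must save it in a stack (Y) slot across the call; X registers
  # do not survive function calls.
  def demo(a) do
    b = double(a)
    a + b
  end

  defp double(x), do: x * 2
end

YSlotDemo.demo(3)
# => 9
```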

Process Memory Inspection

Exercise 1: Process Memory Layout Exploration

Goal: Understand how heap, stack, and mailbox memory are measured and tracked

Task 1.1: Baseline Memory Snapshot
% Get initial memory state
InitialInfo = process_info(self(), [memory, heap_size, stack_size,
                                     total_heap_size, message_queue_len]),
io:format("Initial state: ~p~n", [InitialInfo]).
Task 1.2: Allocate Heap Data
% Create heap data
TupleData = list_to_tuple(lists:seq(1, 1000)),
ListData = lists:seq(1, 1000),
MapData = maps:from_list([{I, I*2} || I <- lists:seq(1, 100)]),

% Measure heap growth
AfterAlloc = process_info(self(), [memory, heap_size]),
io:format("After allocation: ~p~n", [AfterAlloc]).

Observe: Heap size grows to accommodate boxed terms. Memory includes both heap and other process overhead.

Task 1.3: Message Queue Impact
% Send messages to self
[self() ! {msg, I} || I <- lists:seq(1, 100)],

WithMessages = process_info(self(), [memory, message_queue_len]),
io:format("With 100 messages: ~p~n", [WithMessages]),

% Drain mailbox
FlushMessages = fun F() ->
    receive _ -> F()
    after 0 -> ok
    end
end,
FlushMessages(),

AfterDrain = process_info(self(), [memory, message_queue_len]),
io:format("After drain: ~p~n", [AfterDrain]).

Observe: Messages add to process memory. Draining the mailbox doesn’t free that memory immediately; it is reclaimed the next time GC runs.

Discussion: In a GenServer with 10,000 queued messages, what happens to memory? When does it get reclaimed?

Binary Storage and Leaks

Exercise 2: Binary Threshold and Sub-Binary Retention

Goal: Observe the 64-byte threshold and demonstrate sub-binary leaks

Task 2.1: Heap vs Refc Binary Transition
% Test binaries around the 64-byte threshold
% (up to 64 bytes lives on the heap; anything larger becomes refc)
HeapBin = list_to_binary(lists:duplicate(64, $x)),
RefcBin = list_to_binary(lists:duplicate(65, $x)),

HeapSize = erts_debug:size(HeapBin),
RefcSize = erts_debug:size(RefcBin),

io:format("64 bytes: ~p words~n", [HeapSize]),
io:format("65 bytes: ~p words~n", [RefcSize]),
io:format("Size jump: ~p -> ~p~n", [HeapSize, RefcSize]).

Observe: Once a binary exceeds 64 bytes it becomes reference-counted, so the reported size drops: only the ProcBin wrapper is counted, not the off-heap payload.

Task 2.2: Simulate Sub-Binary Leak
% Create large binary
HugeBinary = crypto:strong_rand_bytes(1000000),  % 1MB

% Extract tiny slice (creates sub-binary)
<<Header:100/binary, _/binary>> = HugeBinary,

% Check binary memory
BeforeCopy = erlang:memory(binary),
io:format("Binary memory with sub-binary: ~p bytes~n", [BeforeCopy]),

% The sub-binary keeps the entire 1MB alive!
% Fix by copying
IndependentHeader = binary:copy(Header),

% Forget the original and the sub-binary so only the copy survives
% (f/1 is a shell command that unbinds a variable)
f(HugeBinary),
f(Header),
erlang:garbage_collect(),

AfterCopy = erlang:memory(binary),
io:format("Binary memory after copy+GC: ~p bytes~n", [AfterCopy]),
io:format("Reclaimed: ~p bytes~n", [BeforeCopy - AfterCopy]).

Observe: The sub-binary keeps the entire 1MB payload alive. Copying the slice gives you an independent binary, but the payload is only reclaimed once the sub-binary itself is dropped and GC runs.

Discussion: When processing large files line-by-line, should you copy each line extracted from the file binary? What’s the trade-off?
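One way to see the retention directly from Elixir is a sketch using `:binary.referenced_byte_size/1`, which reports the size of the payload a binary actually points at (the 80-byte "line" length here is arbitrary):

```elixir
file_bin = :crypto.strong_rand_bytes(1_000_000)

# A "line" extracted by matching is a sub-binary into file_bin
<<line::binary-size(80), _rest::binary>> = file_bin

# The 80-byte slice still pins the whole 1 MB payload
IO.inspect(:binary.referenced_byte_size(line), label: "sub-binary refs")

# Copying gives the slice its own small payload
copied = :binary.copy(line)
IO.inspect(:binary.referenced_byte_size(copied), label: "copy refs")
```

The trade-off: copying costs an allocation and a memcpy per line, but it lets the large file binary be reclaimed as soon as you finish scanning it.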

Generational GC Triggers

Exercise 3: Minor vs Major Garbage Collection

Goal: Observe when minor and major GCs trigger and their costs

Task 3.1: Baseline GC Stats
% Get initial GC counts
% Per-process minor GC count from process_info/2
{garbage_collection, GCInfo} = process_info(self(), garbage_collection),
InitialMinor = proplists:get_value(minor_gcs, GCInfo),

% System-wide totals: {TotalGCs, WordsReclaimed, 0}
% (the VM reports a combined count here, not a separate major-GC counter)
{InitialTotal, _, _} = erlang:statistics(garbage_collection),

io:format("Initial GCs - Minor (process): ~p, Total (system): ~p~n", [InitialMinor, InitialTotal]).
Task 3.2: Trigger Minor GCs with Transient Data
% Allocate and immediately discard (triggers minor GCs)
[begin
    Temp = lists:seq(1, 1000),
    length(Temp)
end || _ <- lists:seq(1, 100)],

% Get process-specific minor GCs
{garbage_collection, GCInfoAfter} = process_info(self(), garbage_collection),
MinorAfter = proplists:get_value(minor_gcs, GCInfoAfter),

% Get system-wide GC totals
{TotalAfter, _, _} = erlang:statistics(garbage_collection),

io:format("After transient work - Minor (process): ~p (+~p), Total (system): ~p (+~p)~n",
          [MinorAfter, MinorAfter - InitialMinor,
           TotalAfter, TotalAfter - InitialTotal]).

Observe: Several minor GCs are triggered. The transient data dies young, so nothing is promoted to the old generation and no full sweep is needed.

Task 3.3: Force Major GC
% Accumulate long-lived data
Accumulated = lists:foldl(fun(I, Acc) ->
    [{I, lists:seq(1, 100)} | Acc]
end, [], lists:seq(1, 1000)),

% Process-specific minor GCs before
{garbage_collection, GCInfoBefore} = process_info(self(), garbage_collection),
MinorBefore = proplists:get_value(minor_gcs, GCInfoBefore),

% System-wide GC totals before: {TotalGCs, WordsReclaimed, 0}
{TotalBefore, _, _} = erlang:statistics(garbage_collection),

% Force a full-sweep (major) collection
erlang:garbage_collect(),

% Process-specific minor GCs after
{garbage_collection, GCInfoFinal} = process_info(self(), garbage_collection),
MinorFinal = proplists:get_value(minor_gcs, GCInfoFinal),

% System-wide GC totals after
{TotalFinal, _, _} = erlang:statistics(garbage_collection),

io:format("After major GC - Minor (process): ~p (~p before), Total (system): ~p (+~p)~n",
          [MinorFinal, MinorBefore, TotalFinal, TotalFinal - TotalBefore]),

% Keep the reference so the data isn't collected early
length(Accumulated).

Observe: The forced collection is a full sweep: both the young and old generations are collected, and the system-wide GC count increments.

Discussion: Why does BEAM default to running a major GC every 65535 minor GCs? What would happen if you never ran major GCs?
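You can experiment with the knob behind this default: `fullsweep_after` is settable per process at spawn time. A sketch (0 makes every collection a full sweep, trading CPU for the smallest possible heap):

```elixir
pid =
  :erlang.spawn_opt(
    fn ->
      receive do
        :stop -> :ok
      end
    end,
    [{:fullsweep_after, 0}]
  )

# The per-process setting shows up in the garbage_collection info
{:garbage_collection, gc_info} = :erlang.process_info(pid, :garbage_collection)
IO.inspect(gc_info[:fullsweep_after], label: "fullsweep_after")
send(pid, :stop)
```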

Memory Leak Patterns

Exercise 4: Common Memory Leak Scenarios

Goal: Identify and fix typical memory leak patterns in BEAM

Task 4.1: Mailbox Bloat Simulation
% Slow consumer process
SlowConsumer = spawn(fun Loop() ->
    receive
        {work, _Data} ->
            timer:sleep(100),  % Slow processing
            Loop();
        {stats, From} ->
            Info = process_info(self(), [message_queue_len, memory]),
            From ! {stats, Info},
            Loop();
        stop -> ok
    end
end),

% Fast producer
[SlowConsumer ! {work, lists:seq(1, 100)} || _ <- lists:seq(1, 500)],

timer:sleep(100),

SlowConsumer ! {stats, self()},
receive
    {stats, Stats} ->
        io:format("Slow consumer stats: ~p~n", [Stats])
after 1000 -> timeout
end,

SlowConsumer ! stop.

Observe: Message queue grows faster than consumer can process. Memory keeps climbing.
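One defensive option is the `max_heap_size` spawn option (a sketch; the 1_000_000-word limit is an arbitrary illustration, and queued messages count toward the limit when message queue data is kept on-heap, the default). The VM can kill the process instead of letting it consume the whole node:

```elixir
pid =
  :erlang.spawn_opt(
    fn ->
      receive do
        :never -> :ok
      end
    end,
    [{:max_heap_size, %{size: 1_000_000, kill: true, error_logger: false}}]
  )

# Inspect the configured limit
{:max_heap_size, limits} = :erlang.process_info(pid, :max_heap_size)
IO.inspect(limits, label: "heap limit")
Process.exit(pid, :kill)
```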

Task 4.2: Binary Reference Accumulation
% Relay process that touches binaries without storing them
% Relay process that touches binaries without storing them
% (spawn/1 needs an arity-0 fun, so start the arity-1 loop inside it)
RelayWithoutGC = spawn(fun() ->
    Loop = fun L(Count) ->
        receive
            {relay, Bin} ->
                % Just forward it without storing it;
                % the ProcBin reference still stays until GC
                byte_size(Bin),  % Touch the binary
                L(Count + 1);
            {count, From} ->
                BinInfo = process_info(self(), [binary, memory]),
                From ! {relay_stats, Count, BinInfo},
                L(Count)
        end
    end,
    Loop(0)
end),

% Send many large binaries through relay
[begin
    BigBin = crypto:strong_rand_bytes(10000),
    RelayWithoutGC ! {relay, BigBin}
end || _ <- lists:seq(1, 100)],

timer:sleep(50),

RelayWithoutGC ! {count, self()},
receive
    {relay_stats, Count, BinInfo} ->
        io:format("Relay relayed ~p messages~n", [Count]),
        io:format("Binary memory held: ~p~n", [BinInfo])
after 1000 -> timeout
end.

Observe: Even though relay doesn’t store binaries, it accumulates ProcBin references until GC.

Discussion: How would you fix this relay pattern? When should you call erlang:garbage_collect() or use hibernate?
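A sketch of one fix (the module name and GC cadence are hypothetical; the right interval depends on binary sizes and message rate):

```elixir
defmodule RelaySketch do
  # Hypothetical relay that sheds its accumulated ProcBin references
  # by forcing a collection every @gc_every messages.
  @gc_every 1_000

  def loop(count \\ 0) do
    receive do
      {:relay, bin} ->
        _ = byte_size(bin)
        count = count + 1
        if rem(count, @gc_every) == 0, do: :erlang.garbage_collect()
        loop(count)

      :stop ->
        :ok
    end
  end
end
```

Hibernating instead (`:erlang.hibernate/3`, or `{:noreply, state, :hibernate}` in a GenServer) also drops the references and compacts the heap, at the cost of a full sweep and a cold start on the next message.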

GC Tuning and Hibernation

Exercise 5: Tuning GC Behavior

Goal: Learn to control GC behavior with process flags

Task 5.1: Set Minimum Heap Size
% Spawn with small initial heap
SmallHeap = spawn_opt(fun() ->
    Data = lists:seq(1, 10000),
    receive after 5000 -> length(Data) end
end, [{min_heap_size, 100}]),

% Spawn with large initial heap
LargeHeap = spawn_opt(fun() ->
    Data = lists:seq(1, 10000),
    receive after 5000 -> length(Data) end
end, [{min_heap_size, 10000}]),

timer:sleep(100),

% Get heap_size and garbage_collection info
[{heap_size, SmallHeapSize}, {garbage_collection, SmallGCInfo}] =
    process_info(SmallHeap, [heap_size, garbage_collection]),
SmallMinorGCs = proplists:get_value(minor_gcs, SmallGCInfo),

[{heap_size, LargeHeapSize}, {garbage_collection, LargeGCInfo}] =
    process_info(LargeHeap, [heap_size, garbage_collection]),
LargeMinorGCs = proplists:get_value(minor_gcs, LargeGCInfo),

io:format("Small heap process - heap_size: ~p, minor_gcs: ~p~n",
          [SmallHeapSize, SmallMinorGCs]),
io:format("Large heap process - heap_size: ~p, minor_gcs: ~p~n",
          [LargeHeapSize, LargeMinorGCs]).

Observe: Larger initial heap reduces early GC pressure.

Task 5.2: Hibernation for Memory Reduction
% Demonstrate hibernation memory reduction in a spawned process
Parent = self(),

% Spawn a process that will hibernate - store the PID
TestPid = spawn(fun() ->
    % Allocate significant data
    BigData = lists:seq(1, 10000),
    Map = maps:from_list([{I, lists:seq(1, 100)} || I <- lists:seq(1, 100)]),

    % Force the data to be used (prevents optimization)
    _ = length(BigData),
    _ = map_size(Map),

    % Report memory before hibernation
    BeforeStats = process_info(self(), [heap_size, total_heap_size, memory]),
    Parent ! {before_hibernate, BeforeStats},

    % Wait for hibernate command
    receive
        hibernate_now ->
            % This function will be called when process wakes up
            WakeupFun = fun() ->
                AfterStats = process_info(self(), [heap_size, total_heap_size, memory]),
                Parent ! {after_hibernate, AfterStats},
                % Keep alive to allow inspection
                receive done -> ok end
            end,
            % Hibernate - this never returns, it replaces the call stack
            erlang:hibernate(erlang, apply, [WakeupFun, []])
    end
end),

% Collect before stats
receive
    {before_hibernate, BeforeStats} ->
        io:format("Before hibernate: ~p~n", [BeforeStats])
after 2000 ->
    io:format("Timeout waiting for before stats~n")
end,

% Tell process to hibernate
TestPid ! hibernate_now,

% Wake it up by sending any message (hibernating processes wake on any message)
timer:sleep(100),
TestPid ! wakeup,

% Collect after stats
receive
    {after_hibernate, AfterStats} ->
        io:format("After hibernate: ~p~n", [AfterStats])
after 2000 ->
    io:format("Timeout waiting for after stats~n")
end,

% Clean up
TestPid ! done.

Observe: Hibernation compacts heap and releases old generation, dramatically reducing memory.

Discussion: When should a GenServer use {:noreply, state, :hibernate}? What’s the cost?
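For reference, the Elixir shape (a hypothetical server; hibernating after every reply only pays off when requests are infrequent and the per-request garbage is large):

```elixir
defmodule SleepyServer do
  use GenServer

  def start_link(state), do: GenServer.start_link(__MODULE__, state)

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call(:get, _from, state) do
    # Reply, then compact the heap while waiting for the next request
    {:reply, state, state, :hibernate}
  end
end

{:ok, pid} = SleepyServer.start_link(42)
GenServer.call(pid, :get)
# => 42
```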

Module 3 Review

Quiz.render_from_file(__DIR__ <> "/module-3-exercises.livemd", quiz: 1)