ExZarr 01.01 — Your first Zarr array (create → write → read → stream → save → open)


This Livebook is designed to run inside the ExZarr repo.

If you opened it from livebooks/01_core_zarr/, it will use the local path dependency:

Mix.install([{:ex_zarr, path: ".."}])

Setup

Mix.install([{:ex_zarr, path: ".."}])

alias ExZarr.Array
# Repo-local gallery helpers: Pack (binary packing), SampleData (test data),
# Metrics (simple timing)
alias ExZarr.Gallery.{Pack, SampleData, Metrics}

1) Create a 2D array in memory

We’ll create a 1000x1000 :int32 array, chunked as 100x100.

{:ok, a} =
  Array.create(
    shape: {1000, 1000},
    chunks: {100, 100},
    dtype: :int32,
    compressor: :zstd,
    storage: :memory
  )

%{shape: a.shape, chunks: a.chunks, dtype: a.dtype, compressor: a.compressor}
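
This chunking splits the array into a 10x10 grid of chunks. A quick sanity check of the geometry, using plain arithmetic and no ExZarr calls:

# 1000/100 chunks per dimension; 100 * 100 values * 4 bytes each per chunk
chunks_per_dim = div(1000, 100)
raw_bytes_per_chunk = 100 * 100 * 4
%{chunk_grid: {chunks_per_dim, chunks_per_dim}, raw_bytes_per_chunk: raw_bytes_per_chunk}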

2) Write a 10x10 slice

ExZarr expects slice data as a flat, row-major binary. For :int32, each value is 4 bytes, so a 10x10 slice is 400 bytes.
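
Pack.pack/2 is a repo gallery helper. Here is a minimal sketch of the equivalent manual packing, assuming the array stores values little-endian (Zarr's common "<i4" default; an assumption about ExZarr's layout):

# Pack 1..100 as consecutive little-endian signed 32-bit integers
manual = for v <- 1..100, into: <<>>, do: <<v::little-signed-32>>
byte_size(manual)
# => 400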

We’ll write values 1..100 into the top-left 10x10 region.

data = Pack.pack(Enum.to_list(1..100), :int32)

:ok =
  Array.set_slice(a, data,
    start: {0, 0},
    stop: {10, 10}
  )

:ok

3) Read the slice back

{:ok, bin} =
  Array.get_slice(a,
    start: {0, 0},
    stop: {10, 10}
  )

vals = Pack.unpack(bin, :int32)

# show the first 20 values
Enum.take(vals, 20)
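
The slice comes back as a flat list. To view it as a matrix, chunk it into rows of 10:

# Rebuild 10 rows of 10 values each from the row-major list
Enum.chunk_every(vals, 10)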

4) Write a larger region with a pattern (for chunk demos)

We’ll write a 200x200 region whose value at row r, column c is r * 1000 + c, so each cell encodes its own coordinates.
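
SampleData.matrix/2 is another gallery helper; a minimal sketch of an equivalent row-major list, assuming that's what it produces:

# Row-major list where each entry encodes row * 1000 + col
sketch = for r <- 0..199, c <- 0..199, do: r * 1000 + c
Enum.take(sketch, 5)
# => [0, 1, 2, 3, 4]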

rows = 200
cols = 200

matrix = SampleData.matrix(rows, cols)
bin2 = Pack.pack(matrix, :int32)

:ok =
  Array.set_slice(a, bin2,
    start: {0, 0},
    stop: {rows, cols}
  )

:ok

5) Chunk streaming (sequential)

Now that the written data spans multiple chunks, we can stream them one at a time without loading the whole array into memory.

{result, us} =
  Metrics.time(fn ->
    Array.chunk_stream(a)
    |> Stream.take(5)
    |> Enum.map(fn {chunk_index, chunk_bin} ->
      {chunk_index, byte_size(chunk_bin)}
    end)
  end)

%{first_5: result, took: Metrics.human_us(us)}
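
Chunk indices also let you filter the stream, for example to visit only the four chunks covering the 200x200 region from step 4. This sketch assumes chunk_index is a {row, col} tuple, which the shapes above suggest but which is an assumption:

# Keep only chunks whose {row, col} index falls in the top-left 2x2 block
Array.chunk_stream(a)
|> Stream.filter(fn {{r, c}, _bin} -> r < 2 and c < 2 end)
|> Enum.map(fn {idx, bin} -> {idx, byte_size(bin)} end)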

6) Chunk streaming (parallel)

For remote stores, parallel chunk reads often help hide per-request latency. Here we just demonstrate the API on the in-memory array.

progress =
  fn done, total ->
    if rem(done, 10) == 0 or done == total do
      IO.puts("Progress: #{done}/#{total}")
    end
  end

{count, us} =
  Metrics.time(fn ->
    Array.chunk_stream(a, parallel: 4, ordered: false, progress_callback: progress)
    |> Stream.take(50)
    |> Enum.count()
  end)

%{chunks_seen: count, took: Metrics.human_us(us)}

7) Save to disk and reopen

base = Path.join(System.tmp_dir!(), "exzarr_livebook")
path = Path.join(base, "array_2d")
File.rm_rf!(path)
File.mkdir_p!(path)

:ok = Array.save(a, path: path)

{:ok, reopened} = Array.open(path: path)

%{saved_to: path, reopened_shape: reopened.shape, reopened_chunks: reopened.chunks}
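
You can peek at what was written. With a Zarr v2 layout you would typically see a .zarray metadata file plus one file per stored chunk (an assumption about ExZarr's on-disk format):

# List the first few entries in the array directory
path |> File.ls!() |> Enum.sort() |> Enum.take(10)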

8) Verify the persisted data

{:ok, bin} =
  Array.get_slice(reopened,
    start: {0, 0},
    stop: {3, 6}
  )

Pack.unpack(bin, :int32)
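
Since step 4 wrote r * 1000 + c, the 3x6 corner should decode to exactly those values. A quick check, relying only on the pattern written above:

# Expected values for rows 0..2, cols 0..5
expected = for r <- 0..2, c <- 0..5, do: r * 1000 + c
Pack.unpack(bin, :int32) == expected
# => true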

Next

  • AI / GenAI: livebooks/04_ai_genai/04_01_embeddings_in_zarr.livemd
  • Finance: livebooks/05_finance/05_01_tick_data_cube.livemd