Notesclub

created by hec & contributors

terms privacy

Introduction to Systolic Arrays

introduction_to_systolic_arrays.livemd

thanos vassilakis

@thanos

ex_systolic

Share to X

Share to Bluesky

More notebooks

Introduction to Systolic Arrays

Systolic Arrays in 60 Seconds

A systolic array is a grid of simple processors (called processing elements or PEs) connected by communication channels (links). Data flows through the array like blood through a heart – hence the name “systolic.”

Every PE executes the same simple operation every clock tick. Data arrives from neighbours, is processed, and is forwarded onward. There is no global state, no shared memory, and no coordination beyond the clock.

This makes systolic arrays:

Deterministic – same inputs, same outputs, always
Predictable – execution time is known in advance
Composable – the same PE can tile across arbitrarily large grids

The Key Abstractions

Concept	What it is	ex_systolic module
Grid	The coordinate geometry	`ExSystolic.Grid`
Space	Pluggable topology / neighbour semantics	`ExSystolic.Space` + implementations like `ExSystolic.Space.Grid2D`
PE	A stateful step function	`ExSystolic.PE` (behaviour)
Link	A FIFO buffer between ports	`ExSystolic.Link`
Array	A grid/space + PEs + links	`ExSystolic.Array`
Clock	Drives tick-by-tick execution	`ExSystolic.Clock`
Trace	Records every PE step	`ExSystolic.Trace`

Tick Semantics

Each tick follows this strict order:

INJECT – external input streams push one value into each boundary link
READ – every PE reads its input links (FIFO, from previous tick)
EXECUTE – every PE runs its step/4 function (pure)
WRITE – every PE writes outputs to its output links (for next tick)
RECORD – trace events are appended (if tracing is enabled)

This order guarantees that no PE ever reads data produced in the same tick. Reads always see the state left by the previous tick.

Building a 2x2 Systolic Array

Let us build a 2x2 array of multiply-accumulate (MAC) PEs and use it for matrix multiplication.

By default the array uses a 2D rectangular space (ExSystolic.Space.Grid2D). You can either specify rows/cols (backwards compatible) or provide the space explicitly:

alias ExSystolic.{Array, Clock, PE.MAC}
alias ExSystolic.Space.Grid2D

array =
  Array.new(rows: 2, cols: 2)
  |> Array.fill(MAC)
  |> Array.connect(:west_to_east)
  |> Array.connect(:north_to_south)

# Equivalent explicit-space construction:
array =
  Array.new(space: {Grid2D, rows: 2, cols: 2})
  |> Array.fill(MAC)
  |> Array.connect(:west_to_east)
  |> Array.connect(:north_to_south)

Step-by-Step Tick Visualization

We feed row data from the west and column data from the north. For a 2x2 GEMM with:

A = [[1, 2], [3, 4]] (west input)
B = [[5, 6], [7, 8]] (north input)

The input streams are skewed so data arrives at each PE at the right time:

West streams (rows of A, entering at column 0):

Stream target	Tick 0	Tick 1	Tick 2	Tick 3	Tick 4
(0,0):west	1	2	0	0	0
(1,0):west	0	3	4	0	0

North streams (columns of B, entering at row 0):

Stream target	Tick 0	Tick 1	Tick 2	Tick 3	Tick 4
(0,0):north	5	7	0	0	0
(0,1):north	0	6	8	0	0

Tick-by-tick execution:

Tick 0:

PE(0,0): west=1, north=5 -> acc = 0 + 1*5 = 5; forward east=1, south=5
PE(0,1): west=:empty, north=:empty -> acc = 0
PE(1,0): west=:empty, north=:empty -> acc = 0
PE(1,1): west=:empty, north=:empty -> acc = 0

Tick 1:

PE(0,0): west=2, north=7 -> acc = 5 + 2*7 = 19; forward east=2, south=7
PE(0,1): west=1 (from PE(0,0)), north=:empty -> acc = 0; forward east=1
PE(1,0): west=3, north=5 (from PE(0,0)) -> acc = 0 + 3*5 = 15; forward east=3, south=5
PE(1,1): west=:empty, north=:empty -> acc = 0

Tick 2:

PE(0,0): west=0, north=0 -> acc = 19 (no change)
PE(0,1): west=2, north=6 -> acc = 0 + 2*6 = 12; forward east=2, south=6
PE(1,0): west=4, north=7 -> acc = 15 + 4*7 = 43; forward east=4, south=7
PE(1,1): west=3 (from PE(1,0)), north=5 (from PE(0,1)) -> acc = 0 + 3*5 = 15

Tick 3:

PE(1,1): west=4, north=6 -> acc = 15 + 4*6 = 39; forward east=4, south=6

Wait – this doesn’t look right for 2x2 GEMM. Let me just run it:

alias ExSystolic.Examples.GEMM

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]

GEMM.run(A, B)

The result should be:

[[19, 22], [43, 50]]

Verifying by hand

C = A * B where:

C[0][0] = 15 + 27 = 5 + 14 = 19
C[0][1] = 16 + 28 = 6 + 16 = 22
C[1][0] = 35 + 47 = 15 + 28 = 43
C[1][1] = 36 + 48 = 18 + 32 = 50

Running with Tracing

To see exactly what each PE does at every tick:

alias ExSystolic.{Array, Clock, PE.MAC}

array =
  Array.new(rows: 2, cols: 2)
  |> Array.fill(MAC)
  |> Array.connect(:west_to_east)
  |> Array.connect(:north_to_south)
  |> Array.input(:west, ExSystolic.Examples.GEMM.west_streams([[1,2],[3,4]], 2, 2, 2))
  |> Array.input(:north, ExSystolic.Examples.GEMM.north_streams([[5,6],[7,8]], 2, 2, 2))
  |> Array.trace(true)

result = Clock.run(array, ticks: 5)

# Inspect the trace
for event <- result.trace.events do
  IO.puts("Tick #{event.tick} PE#{inspect(event.coord)}: " <>
    "inputs=#{inspect(event.inputs)} " <>
    "acc: #{event.state_before} -> #{event.state_after}")
end

The GraphBLAS Connection

Matrix multiplication over the arithmetic semi-ring (multiply: *, add: +) is the foundational operation in GraphBLAS. A systolic array of MAC PEs implements exactly this semi-ring product.

GraphBLAS generalises this to arbitrary semi-rings:

Boolean semi-ring: (multiply: AND, add: OR) – reachability
Tropical semi-ring: (multiply: +, add: min) – shortest paths
Arithmetic semi-ring: (multiply: *, add: +) – standard matrix multiply

The ex_systolic PE behaviour is designed so that future phases can implement PEs for any of these semi-rings without changing the array, clock, or link infrastructure.

What’s Next?

Phase 1 delivers correctness and traceability. Future phases may add:

Custom semi-rings beyond arithmetic
Larger arrays with performance optimisation
Alternative backends (multi-process, native)
Sparse data support (GraphBLAS core feature)

For now, the interpreted backend proves that the semantics are sound, deterministic, and testable.

Other notebooks:

Alembic
@team-alembic

warragamba-blogpost

warragamba.livebook

warragamba.livemd

tutorial advanced data-science httpoison jason vega_lite kino kino_vega_lite explorer axon exla nx table_rex

2023-1-6
@DockYard-Academy

curriculum

Battle Map

battle_map.livemd

tutorial advanced intermediate gen-server jason kino youtube hidden_cell

2023-3-21
Mohsin Khan
@CodeByMohsin

Basic-of-Elixir

Basics of Data Types and Operations in Elixir

basic.livemd

tutorial beginner jason kino youtube hidden_cell

2024-5-10
Ryo Wakabayashi
@RyoWakabayashi

elixir-learning

Amazon Textract

textract.livemd

tutorial advanced apis aws hackney kino

2023-12-11
@DeSchoel

Elixir_Curriculum

Sublist

deprecated_sublist.livemd

algorithms intermediate jason kino youtube hidden_cell

2026-1-10
Chris Martin
@trbngr

elixir_cqrs_tools

Using cqrs_tools with Commanded

commanded.livemd

tutorial advanced gen-server otp commanded cqrs_tools ecto etso jason

2022-8-18
@DeSchoel

Elixir_Curriculum

Timer

timer.livemd

tutorial advanced gen-server jason kino youtube hidden_cell

2026-1-10

Back