Powered by AppSignal & Oban Pro

Graph Analytics

guides/graph-analytics.livemd

Graph Analytics

Mix.install([{:dux, "~> 0.2.0"}])

Graphs Are Two DataFrames

In Dux, a graph is two %Dux{} structs — one for vertices, one for edges. Graph algorithms are verb compositions that compile to SQL, just like everything else.

Let’s explore with Zachary’s Karate Club — a famous social network dataset from a 1977 study of friendships in a university karate club that eventually split into two factions.

require Dux

datasets_dir = Application.app_dir(:dux, "priv/datasets")

# Load edges (undirected — we need both directions)
edges =
  Dux.from_query("""
    SELECT src, dst FROM read_csv('#{datasets_dir}/karate_club.csv')
    UNION ALL
    SELECT dst AS src, src AS dst FROM read_csv('#{datasets_dir}/karate_club.csv')
  """)

vertices = Dux.from_list(Enum.map(1..34, &%{id: &1}))

graph = Dux.Graph.new(vertices: vertices, edges: edges)
IO.puts("Vertices: #{Dux.Graph.vertex_count(graph)}")
IO.puts("Edges: #{Dux.Graph.edge_count(graph)} (directed, both directions)")

Who Has the Most Connections?

Degree functions count edges per vertex:

graph
|> Dux.Graph.out_degree()
|> Dux.sort_by(desc: :out_degree)
|> Dux.head(5)
|> Dux.compute()

> #### Graph results are DataFrames {: .info} > > Every graph algorithm returns a %Dux{} struct. You can pipe results > into any Dux verb — filter, join, sort, export.

PageRank: Who Is Most Influential?

PageRank identifies important vertices based on the structure of incoming edges, not just the count:

graph
|> Dux.Graph.pagerank(iterations: 20)
|> Dux.sort_by(desc: :rank)
|> Dux.head(10)
|> Dux.compute()

Node 34 (the club “Officer”) and node 1 (the “Instructor”) are the two leaders who eventually caused the club to split.

Shortest Paths

Find the distance from any node to all reachable nodes via BFS:

# Distances from node 1 (the Instructor)
graph
|> Dux.Graph.shortest_paths(1)
|> Dux.sort_by(:dist)
|> Dux.head(10)
|> Dux.compute()

Connected Components

Label each vertex with its component ID. The karate club is a single connected component:

graph
|> Dux.Graph.connected_components()
|> Dux.to_columns()
|> then(fn cols -> cols["component"] |> Enum.uniq() |> length() end)

Triangle Counting

Count the number of triangles (three mutually connected nodes) in the network. This measures clustering density:

Dux.Graph.triangle_count(graph)

The karate club has 45 triangles — a tightly knit social network.

Composability: Graph Results Meet Verbs

Since graph algorithms return %Dux{} structs, you can join them with other data:

# Create faction labels (from the known split)
factions = Dux.from_list([
  %{id: 1, faction: "Instructor"},
  %{id: 34, faction: "Officer"}
])

# Join PageRank with faction labels
graph
|> Dux.Graph.pagerank(iterations: 20)
|> Dux.join(factions, on: :id)
|> Dux.sort_by(desc: :rank)
|> Dux.compute()

Building Graphs from Scratch

You can create graphs from any edge list:

# A simple directed graph
edges = Dux.from_list([
  %{from: "A", to: "B"},
  %{from: "B", to: "C"},
  %{from: "C", to: "A"},
  %{from: "A", to: "D"}
])

vertices = Dux.from_list([%{name: "A"}, %{name: "B"}, %{name: "C"}, %{name: "D"}])

small_graph = Dux.Graph.new(
  vertices: vertices,
  edges: edges,
  vertex_id: :name,
  edge_src: :from,
  edge_dst: :to
)

small_graph
|> Dux.Graph.pagerank(iterations: 10)
|> Dux.sort_by(desc: :rank)
|> Dux.compute()

What’s Next