Graph Analytics
Mix.install([{:dux, "~> 0.2.0"}])
Graphs Are Two DataFrames
In Dux, a graph is two %Dux{} structs — one for vertices, one for edges. Graph algorithms are verb compositions that compile to SQL, just like everything else.
Let’s explore with Zachary’s Karate Club — a famous social network dataset from a 1977 study of friendships in a university karate club that eventually split into two factions.
require Dux
datasets_dir = Application.app_dir(:dux, "priv/datasets")
# Load edges (undirected — we need both directions)
edges =
Dux.from_query("""
SELECT src, dst FROM read_csv('#{datasets_dir}/karate_club.csv')
UNION ALL
SELECT dst AS src, src AS dst FROM read_csv('#{datasets_dir}/karate_club.csv')
""")
vertices = Dux.from_list(Enum.map(1..34, &%{id: &1}))
graph = Dux.Graph.new(vertices: vertices, edges: edges)
IO.puts("Vertices: #{Dux.Graph.vertex_count(graph)}")
IO.puts("Edges: #{Dux.Graph.edge_count(graph)} (directed, both directions)")
Who Has the Most Connections?
Degree functions count edges per vertex:
graph
|> Dux.Graph.out_degree()
|> Dux.sort_by(desc: :out_degree)
|> Dux.head(5)
|> Dux.compute()
> #### Graph results are DataFrames {: .info}
>
> Every graph algorithm returns a %Dux{} struct. You can pipe results
> into any Dux verb — filter, join, sort, export.
PageRank: Who Is Most Influential?
PageRank identifies important vertices based on the structure of incoming edges, not just the count:
graph
|> Dux.Graph.pagerank(iterations: 20)
|> Dux.sort_by(desc: :rank)
|> Dux.head(10)
|> Dux.compute()
Node 34 (the club “Officer”) and node 1 (the “Instructor”) are the two leaders who eventually caused the club to split.
Shortest Paths
Find the distance from any node to all reachable nodes via BFS:
# Distances from node 1 (the Instructor)
graph
|> Dux.Graph.shortest_paths(1)
|> Dux.sort_by(:dist)
|> Dux.head(10)
|> Dux.compute()
Connected Components
Label each vertex with its component ID. The karate club is a single connected component:
graph
|> Dux.Graph.connected_components()
|> Dux.to_columns()
|> then(fn cols -> cols["component"] |> Enum.uniq() |> length() end)
Triangle Counting
Count the number of triangles (three mutually connected nodes) in the network. This measures clustering density:
Dux.Graph.triangle_count(graph)
The karate club has 45 triangles — a tightly knit social network.
Composability: Graph Results Meet Verbs
Since graph algorithms return %Dux{} structs, you can join them with other data:
# Create faction labels (from the known split)
factions = Dux.from_list([
%{id: 1, faction: "Instructor"},
%{id: 34, faction: "Officer"}
])
# Join PageRank with faction labels
graph
|> Dux.Graph.pagerank(iterations: 20)
|> Dux.join(factions, on: :id)
|> Dux.sort_by(desc: :rank)
|> Dux.compute()
Building Graphs from Scratch
You can create graphs from any edge list:
# A simple directed graph
edges = Dux.from_list([
%{from: "A", to: "B"},
%{from: "B", to: "C"},
%{from: "C", to: "A"},
%{from: "A", to: "D"}
])
vertices = Dux.from_list([%{name: "A"}, %{name: "B"}, %{name: "C"}, %{name: "D"}])
small_graph = Dux.Graph.new(
vertices: vertices,
edges: edges,
vertex_id: :name,
edge_src: :from,
edge_dst: :to
)
small_graph
|> Dux.Graph.pagerank(iterations: 10)
|> Dux.sort_by(desc: :rank)
|> Dux.compute()
What’s Next
- Getting Started — core Dux concepts
- Distributed Execution — distribute queries and graph algorithms across workers