K-means clustering

Mix.install([
  {:scholar, "~> 0.2.0"},
  {:exla, "~> 0.6.0"},
  {:nx, "~> 0.6.0", override: true},
  {:explorer, "~> 0.6.1"},
  {:stb_image, "~> 0.6.1"},
  {:scidata, "~> 0.1.10"},
  {:req, "~> 0.3.9"},
  {:kino, "~> 0.10.0"},
  {:kino_vega_lite, "~> 0.1.9"},
  {:tucan, "~> 0.3.0"}
])

Introduction

The main purpose of this livebook is to introduce the KMeans clustering algorithm. We will explore KMeans in three different use cases.
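Before turning to Scholar, it may help to see the idea in plain code. K-means is usually computed with Lloyd's algorithm, which repeats two steps until nothing changes. First, assign every point to its nearest centroid. Then move each centroid to the mean of the points assigned to it. The sketch below implements that loop in pure Elixir on lists of floats. The module `LloydSketch` and its functions are hypothetical illustrations, not Scholar's API. A library implementation will typically also choose the initial centroids and use a convergence tolerance, and this sketch leaves both out.

```elixir
defmodule LloydSketch do
  # Hypothetical module: a minimal pure-Elixir sketch of Lloyd's algorithm.
  # It is NOT how Scholar.Cluster.KMeans is implemented.

  # Squared Euclidean distance between two points given as lists of numbers.
  def sq_dist(a, b) do
    Enum.zip_with(a, b, fn x, y -> (x - y) * (x - y) end) |> Enum.sum()
  end

  # Index of the centroid closest to `point` (first one wins on ties).
  def nearest(point, centroids) do
    centroids
    |> Enum.with_index()
    |> Enum.min_by(fn {c, _i} -> sq_dist(point, c) end)
    |> elem(1)
  end

  # Component-wise mean of a non-empty list of points.
  def mean(points) do
    n = length(points)
    Enum.zip_with(points, fn coords -> Enum.sum(coords) / n end)
  end

  # One iteration: assign points to their nearest centroid, then move each
  # centroid to the mean of its points. A centroid with no points keeps its
  # previous position (a simple policy chosen for this sketch).
  def step(points, centroids) do
    groups = Enum.group_by(points, &nearest(&1, centroids))

    centroids
    |> Enum.with_index()
    |> Enum.map(fn {c, i} ->
      case Map.get(groups, i) do
        nil -> c
        members -> mean(members)
      end
    end)
  end

  # Repeat step/2 until the centroids stop moving or max_iter is reached.
  def fit(points, centroids, max_iter \\ 100) do
    Enum.reduce_while(1..max_iter, centroids, fn _, current ->
      next = step(points, current)
      if next == current, do: {:halt, next}, else: {:cont, next}
    end)
  end
end

points = [
  [1.0, 1.0], [1.5, 2.0], [1.0, 1.5],
  [8.0, 8.0], [9.0, 8.5], [8.5, 9.0]
]

# Hand-picked starting centroids; library implementations usually pick
# them randomly or with k-means++ instead.
centroids = LloydSketch.fit(points, [[1.0, 1.0], [8.0, 8.0]])
```

The three points near (1, 1) and the three near (8.5, 8.5) each end up in their own cluster. The loop stops on the second pass because the centroids no longer move.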

alias Scholar.Cluster.KMeans
require Explorer.DataFrame, as: DF
Nx.global_default_backend(EXLA.Backend)
key = Nx.Random.key(42)
#Nx.Tensor<
  u32[2]
  EXLA.Backend
  [0, 42]
>

Iris Dataset

The first example uses the Iris dataset, one of the best-known datasets in machine learning. It consists of 150 records describing three iris species: Iris setosa, Iris virginica, and Iris versicolor. Each record has four measurements: sepal length, sepal width, petal length, and petal width. Our task is to predict the species of the given flowers.

First, we load the data. Then we split it into features (x) and target (y) and convert both into Nx tensors.

df = Explorer.Datasets.iris()
x = df |> DF.discard(["species"]) |> Nx.stack(axis: 1)

y =
  df[["species"]]
  |> DF.dummies(["species"])
  |> Nx.stack(axis: 1)
  |> Nx.argmax(axis: 1)

{x, y}
{#Nx.Tensor<
   f64[150][4]
   EXLA.Backend
   [
     [5.1, 3.5, 1.4, 0.2],
     [4.9, 3.0, 1.4, 0.2],
     [4.7, 3.2, 1.3, 0.2],
     [4.6, 3.1, 1.5, 0.2],
     [5.0, 3.6, 1.4, 0.2],
     [5.4, 3.9, 1.7, 0.4],
     [4.6, 3.4, 1.4, 0.3],
     [5.0, 3.4, 1.5, 0.2],
     [4.4, 2.9, 1.4, 0.2],
     [4.9, 3.1, 1.5, 0.1],
     [5.4, 3.7, 1.5, 0.2],
     [4.8, 3.4, 1.6, 0.2],
     [4.8, ...],
     ...
   ]
 >,
 #Nx.Tensor<
   s64[150]
   EXLA.Backend
   [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
 >}
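The target is built in two steps. `DF.dummies` one-hot encodes `species`, and the argmax of each one-hot row then gives an integer class index. The sketch below reproduces that mapping with plain lists so the intermediate step is visible. `LabelSketch` is a hypothetical helper, and the class order is fixed by hand here for illustration.

```elixir
defmodule LabelSketch do
  # Hypothetical helpers mirroring the DF.dummies + Nx.argmax pipeline,
  # using plain lists instead of DataFrames and tensors.

  # One-hot encode `labels` against a fixed, ordered list of `classes`.
  def one_hot(labels, classes) do
    Enum.map(labels, fn label ->
      Enum.map(classes, fn class -> if class == label, do: 1, else: 0 end)
    end)
  end

  # Recover the class index from each one-hot row (the argmax of the row).
  def argmax_rows(rows) do
    Enum.map(rows, fn row ->
      row |> Enum.with_index() |> Enum.max_by(fn {v, _i} -> v end) |> elem(1)
    end)
  end
end

classes = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
labels = ["Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa"]

labels
|> LabelSketch.one_hot(classes)
|> LabelSketch.argmax_rows()
# => [0, 2, 1, 0]
```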

Exploratory Data Analysis

An important part of the data science workflow is so-called Exploratory Data Analysis (EDA). EDA helps us understand the data better and suggests efficient strategies for solving the problem at hand. There is no single recipe for good EDA. It usually includes tabular summaries and plots showing the relations between features.

We start our EDA by computing the mean of each feature per species.

grouped_data = DF.group_by(df, "species")

DF.summarise(
  grouped_data,
  petal_length: mean(petal_length),
  petal_width: mean(petal_width),
  sepal_width: mean(sepal_width),
  sepal_length: mean(sepal_length)
)
#Explorer.DataFrame<
  Polars[3 x 5]
  species string ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
  petal_length float [1.464, 4.26, 5.552]
  petal_width float [0.2439999999999999, 1.3259999999999998, 2.026]
  sepal_width float [3.4180000000000006, 2.7700000000000005, 2.9739999999999998]
  sepal_length float [5.005999999999999, 5.936, 6.587999999999998]
>
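Grouping by `species` and averaging each column amounts to bucketing the rows by species and taking a per-bucket mean. Below is a minimal plain-Elixir sketch of the same idea. It assumes rows are maps with atom keys and uses four records copied from the dataset. `GroupMeanSketch` is a hypothetical name and is not part of Explorer.

```elixir
defmodule GroupMeanSketch do
  # Hypothetical helper: mean of each feature per value of `key`,
  # mimicking DF.group_by/2 followed by DF.summarise/2 with mean/1.
  def summarise_means(rows, key, features) do
    rows
    |> Enum.group_by(&Map.fetch!(&1, key))
    |> Map.new(fn {group, members} ->
      n = length(members)

      means =
        Map.new(features, fn f ->
          {f, (members |> Enum.map(&Map.fetch!(&1, f)) |> Enum.sum()) / n}
        end)

      {group, means}
    end)
  end
end

# Two setosa and two versicolor records taken from the Iris data above.
rows = [
  %{species: "Iris-setosa", petal_length: 1.4, petal_width: 0.2},
  %{species: "Iris-setosa", petal_length: 1.4, petal_width: 0.2},
  %{species: "Iris-versicolor", petal_length: 4.7, petal_width: 1.4},
  %{species: "Iris-versicolor", petal_length: 4.5, petal_width: 1.5}
]

GroupMeanSketch.summarise_means(rows, :species, [:petal_length, :petal_width])
```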

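Relative to their typical values, the species means differ much more for petal_length and petal_width than for the sepal measurements. The petal features therefore appear to be the most distinguishing ones. Let's explore them a bit more.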
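One way to make the "most distinguishing" observation concrete is to compare the largest species mean with the smallest for each feature. The max/min ratio used below is an assumption of this sketch, not a method used in this notebook. It is a crude, scale-free heuristic that only makes sense for positive values. The numbers are the means from the summary table above, rounded to three decimals.

```elixir
# Per-species means copied (rounded to three decimals) from the summary
# table above, ordered setosa, versicolor, virginica.
means = %{
  "petal_length" => [1.464, 4.26, 5.552],
  "petal_width" => [0.244, 1.326, 2.026],
  "sepal_width" => [3.418, 2.77, 2.974],
  "sepal_length" => [5.006, 5.936, 6.588]
}

# Ratio of the largest to the smallest species mean per feature,
# sorted from most to least spread out.
spread =
  means
  |> Enum.map(fn {feature, values} -> {feature, Enum.max(values) / Enum.min(values)} end)
  |> Enum.sort_by(fn {_feature, ratio} -> ratio end, :desc)

# Features come out ordered: petal_width, petal_length, sepal_length, sepal_width
Enum.map(spread, fn {feature, _ratio} -> feature end)
```

Both petal features come out well ahead of the sepal features, which matches what the summary table suggests.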

Tucan.histogram(df, "petal_length", color_by: "species")
|> Tucan.facet_by(:column, "species")
|> Tucan.Scale.set_y_domain(0, 55)
|> Tucan.set_size(200, 200)
|> Tucan.set_title("Histograms of petal_length column by species", offset: 25, anchor: :middle)
[Figure: Histograms of petal_length column by species, one panel per species; x axis: petal_length (binned), y axis: count (domain 0 to 55)]
Tucan.scatter(df, "petal_length", "petal_width", filled: true, color_by: "species")
|> Tucan.set_size(300, 300)
|> Tucan.set_title("Scatterplot of data samples projected on plane petal_width x petal_length",
  offset: 25
)
[Figure: Scatterplot of data samples projected on plane petal_width x petal_length; x axis: petal_length, y axis: petal_width, points colored by species]
Tucan.scatter(df, "petal_length", "petal_width")
|> Tucan.facet_by(:column, "species")
|> Tucan.set_title(
  "Scatterplot of data samples projected on plane petal_width x petal_length by species",
  offset: 25
)
[Figure: Scatterplot of data samples projected on plane petal_width x petal_length by species; one panel per species; x: petal_length, y: petal_width]

Now we have a better understanding of the data. The species differ in petal width and petal length: Iris Setosa has the smallest petals, Versicolor has medium-sized ones, and Virginica has the largest. To check whether three clusters is a sensible choice, we can draw a so-called elbow plot. It shows inertia (the sum of squared distances from each sample to its nearest centroid) against the number of clusters. If the curve has a characteristic elbow, that number of clusters is a strong candidate. An elbow is a point after which inertia decreases much more slowly. Let's train KMeans models for every number of clusters from 1 to 11.
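Inertia is easy to compute by hand, which makes it clear what the elbow plot measures. The sketch below is a minimal pure-Elixir illustration on plain lists. It is not Scholar's implementation. The module name `InertiaSketch` and the toy points are made up for the example. For every sample it takes the squared Euclidean distance to the closest centroid, then sums those distances.

```elixir
defmodule InertiaSketch do
  # Squared Euclidean distance between two points given as lists of numbers.
  def squared_distance(a, b) do
    Enum.zip_with(a, b, fn x, y -> (x - y) * (x - y) end) |> Enum.sum()
  end

  # Inertia: for every sample, take the squared distance to its nearest
  # centroid, then sum over all samples.
  def inertia(points, centroids) do
    points
    |> Enum.map(fn p -> centroids |> Enum.map(&squared_distance(p, &1)) |> Enum.min() end)
    |> Enum.sum()
  end
end

# Four made-up points forming two vertical pairs.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]

InertiaSketch.inertia(points, [[0.0, 1.0], [10.0, 1.0]])
# => 4.0

InertiaSketch.inertia(points, [[5.0, 1.0]])
# => 104.0
```

If every point becomes its own centroid, inertia drops to zero. This is why we look for an elbow instead of simply minimizing inertia.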

clusterings = 1..11

models =
  for num_clusters <- clusterings do
    KMeans.fit(x, num_clusters: num_clusters, key: key)
  end

inertias = for model <- models, do: Nx.to_number(model.inertia)
[680.8244, 152.36870647733906, 78.94084142614602, 57.44028021295475, 46.56163015873016,
 38.95701115711985, 35.15943976939724, 30.324232174688056, 27.927083333333336, 26.371291306519566,
 24.004956137000256]
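`KMeans.fit` hides the training loop. Conceptually, k-means (Lloyd's algorithm) alternates two steps. First, each sample is assigned to its nearest centroid. Then each centroid moves to the mean of its samples. This repeats until the centroids stop moving. Below is a simplified pure-Elixir sketch of that loop, not Scholar's implementation. The module `LloydSketch` is hypothetical. It makes the naive assumption that the initial centroids are the first `k` points, whereas Scholar's KMeans uses its own initialization.

```elixir
defmodule LloydSketch do
  # Simplified k-means (Lloyd's algorithm) on plain lists of floats.
  # Assumption: centroids start at the first k points (illustration only).
  def fit(points, k, max_iterations \\ 100) do
    iterate(points, Enum.take(points, k), max_iterations)
  end

  # Out of iterations: return the current centroids and assignments.
  defp iterate(points, centroids, 0), do: {centroids, assign(points, centroids)}

  defp iterate(points, centroids, iterations_left) do
    # Assignment step: label each point with its nearest centroid.
    labels = assign(points, centroids)

    # Update step: move each centroid to the mean of its points
    # (a centroid with no points keeps its previous position).
    new_centroids =
      centroids
      |> Enum.with_index()
      |> Enum.map(fn {centroid, i} ->
        members = for {p, ^i} <- Enum.zip(points, labels), do: p
        if members == [], do: centroid, else: mean(members)
      end)

    # Converged when the centroids stop moving.
    if new_centroids == centroids do
      {centroids, labels}
    else
      iterate(points, new_centroids, iterations_left - 1)
    end
  end

  defp assign(points, centroids), do: Enum.map(points, &nearest(&1, centroids))

  # Index of the centroid closest to point p.
  defp nearest(p, centroids) do
    centroids
    |> Enum.with_index()
    |> Enum.min_by(fn {c, _i} -> sq_dist(p, c) end)
    |> elem(1)
  end

  # Component-wise mean of a non-empty list of points.
  defp mean(points) do
    n = length(points)
    Enum.zip_with(points, fn coords -> Enum.sum(coords) / n end)
  end

  defp sq_dist(a, b) do
    Enum.zip_with(a, b, fn x, y -> (x - y) * (x - y) end) |> Enum.sum()
  end
end

{centroids, labels} = LloydSketch.fit([[0.0, 0.0], [10.0, 0.0], [0.0, 2.0], [10.0, 2.0]], 2)
# centroids => [[0.0, 1.0], [10.0, 1.0]], labels => [0, 1, 0, 1]

{stuck, _labels} = LloydSketch.fit([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]], 2)
# stuck => [[5.0, 0.0], [5.0, 2.0]]
```

The second call uses the same four points listed in a different order. With this naive initialization, the loop settles on a horizontal split with inertia 100.0 instead of the vertical split with inertia 4.0. This sensitivity to the starting centroids is why practical implementations choose initial centroids more carefully or randomly. `KMeans.fit` above takes a random key, presumably for this purpose.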
Tucan.lineplot([num_clusters: clusterings, inertia: inertias], "num_clusters", "inertia",
  x: [type: :nominal, axis: [label_angle: 0]],
  title: "Elbow Plot"
)
|> Tucan.Axes.set_xy_titles("Number of Clusters", "Inertia")
|> Tucan.set_size(600, 300)
[Figure: Elbow Plot; x: Number of Clusters (1 to 11), y: Inertia]

The elbow appears at three clusters: beyond that point, each additional cluster reduces inertia much less. Three therefore seems to be the best value for `num_clusters`, which agrees with the three species in the dataset.

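K-means numbers its clusters arbitrarily, so we must put the clusters in a matching order before comparing them with the target labels. Here we sort the clusters by the first coordinate of their centroids (sepal length). Mean sepal length increases from Setosa through Versicolor to Virginica, which is also the order of the labels in `y`.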

defmodule Iris.Clusters do
  import Nx.Defn

  defn sort_clusters(model) do
    # Order cluster ids by the first coordinate of their centroids (sepal_length)
    order = Nx.argsort(model.clusters[[.., 0]])
    # Inverse permutation: maps each old cluster id to its new position
    labels_mapping = Nx.argsort(order)

    %{
      model
      | labels: Nx.take(labels_mapping, model.labels),
        clusters: Nx.take(model.clusters, order)
    }
  end
end
{:module, Iris.Clusters, <<70, 79, 82, 49, 0, 0, 10, ...>>, true}
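The double `Nx.argsort` in `sort_clusters` can be hard to read. Below is a plain-list sketch of the same relabeling. The centroid coordinates are made-up values, not the fitted ones. The first argsort gives the cluster ids in sorted order. Its argsort, the inverse permutation, gives the new id for each old id.

```elixir
# Made-up first coordinates of three centroids, indexed by the cluster id
# that k-means happened to assign.
first_coords = [6.8, 5.0, 5.9]

# order: cluster ids sorted by first coordinate (like Nx.argsort(clusters[[.., 0]])).
order =
  first_coords
  |> Enum.with_index()
  |> Enum.sort_by(fn {value, _id} -> value end)
  |> Enum.map(fn {_value, id} -> id end)
# => [1, 2, 0]

# labels_mapping: inverse permutation of order (argsort of order),
# i.e. the new id of each old cluster id.
labels_mapping =
  order
  |> Enum.with_index()
  |> Enum.sort_by(fn {old_id, _new_id} -> old_id end)
  |> Enum.map(fn {_old_id, new_id} -> new_id end)
# => [2, 0, 1]

# Relabel predictions (like Nx.take(labels_mapping, model.labels)).
old_labels = [0, 1, 2, 1]
new_labels = Enum.map(old_labels, &Enum.at(labels_mapping, &1))
# => [2, 0, 1, 0]
```

Old cluster 1 has the smallest first coordinate, so its samples become cluster 0. Old cluster 0 has the largest, so it becomes cluster 2.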
# models follow the order of `clusterings`, so index 2 holds the 3-cluster model
best_model = Enum.at(models, 2)
best_model = Iris.Clusters.sort_clusters(best_model)
accuracy = Scholar.Metrics.Classification.accuracy(best_model.labels, y)
#Nx.Tensor<
  f32
  EXLA.Backend
  0.8933333158493042
>
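The accuracy above is the fraction of samples whose cluster label equals the target label. It is only meaningful after the relabeling step. Here is a minimal pure-Elixir sketch of that definition. The module `AccuracySketch` is hypothetical and is not Scholar's implementation.

```elixir
defmodule AccuracySketch do
  # Fraction of positions where the predicted label equals the true label.
  def accuracy(y_true, y_pred) when length(y_true) == length(y_pred) and y_true != [] do
    matches = Enum.zip(y_true, y_pred) |> Enum.count(fn {t, p} -> t == p end)
    matches / length(y_true)
  end
end

AccuracySketch.accuracy([0, 0, 1, 1, 2], [0, 0, 1, 2, 2])
# => 0.8
```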

Accuracy is about 89%: 134 of the 150 flowers land in the cluster matching their species. That's pretty decent for a method that never saw the labels. Let's look at the results on one of the previous plots.

# Columns 2 and 3 of the centroids correspond to petal_length and petal_width
coords = [
  cluster_petal_length: best_model.clusters[[.., 2]] |> Nx.to_flat_list(),
  cluster_petal_width: best_model.clusters[[.., 3]] |> Nx.to_flat_list()
]

Tucan.layers([
  Tucan.scatter(df, "petal_length", "petal_width", color_by: "species", filled: true),
  Tucan.scatter(coords, "cluster_petal_length", "cluster_petal_width",
    filled: true,
    point_size: 100,
    point_color: "green"
  )
])
|> Tucan.set_size(300, 300)
|> Tucan.set_title(
  "Scatterplot of data samples projected on plane petal_width x petal_length with calculated centroids",
  offset: 25
)
[Figure: Scatterplot of data samples projected on plane petal_width x petal_length with calculated centroids; samples colored by species, centroids shown as large green points]
rginica"},{"petal_length":5.6,"petal_width":2.4,"sepal_length":6.3,"sepal_width":3.4,"species":"Iris-virginica"},{"petal_length":5.5,"petal_width":1.8,"sepal_length":6.4,"sepal_width":3.1,"species":"Iris-virginica"},{"petal_length":4.8,"petal_width":1.8,"sepal_length":6.0,"sepal_width":3.0,"species":"Iris-virginica"},{"petal_length":5.4,"petal_width":2.1,"sepal_length":6.9,"sepal_width":3.1,"species":"Iris-virginica"},{"petal_length":5.6,"petal_width":2.4,"sepal_length":6.7,"sepal_width":3.1,"species":"Iris-virginica"},{"petal_length":5.1,"petal_width":2.3,"sepal_length":6.9,"sepal_width":3.1,"species":"Iris-virginica"},{"petal_length":5.1,"petal_width":1.9,"sepal_length":5.8,"sepal_width":2.7,"species":"Iris-virginica"},{"petal_length":5.9,"petal_width":2.3,"sepal_length":6.8,"sepal_width":3.2,"species":"Iris-virginica"},{"petal_length":5.7,"petal_width":2.5,"sepal_length":6.7,"sepal_width":3.3,"species":"Iris-virginica"},{"petal_length":5.2,"petal_width":2.3,"sepal_length":6.7,"sepal_width":3.0,"species":"Iris-virginica"},{"petal_length":5.0,"petal_width":1.9,"sepal_length":6.3,"sepal_width":2.5,"species":"Iris-virginica"},{"petal_length":5.2,"petal_width":2.0,"sepal_length":6.5,"sepal_width":3.0,"species":"Iris-virginica"},{"petal_length":5.4,"petal_width":2.3,"sepal_length":6.2,"sepal_width":3.4,"species":"Iris-virginica"},{"petal_length":5.1,"petal_width":1.8,"sepal_length":5.9,"sepal_width":3.0,"species":"Iris-virginica"}]},"encoding":{"color":{"field":"species","type":"nominal"},"x":{"field":"petal_length","scale":{"zero":false},"type":"quantitative"},"y":{"field":"petal_width","scale":{"zero":false},"type":"quantitative"}},"mark":{"fillOpacity":1,"filled":true,"type":"point"}},{"data":{"values":[{"cluster_petal_length":1.464,"cluster_petal_width":0.24400000000000005},{"cluster_petal_length":4.393548387096775,"cluster_petal_width":1.4338709677419355},{"cluster_petal_length":5.742105263157896,"cluster_petal_width":2.0710526315789473}]},"encoding":{"x":{"field"
:"cluster_petal_length","scale":{"zero":false},"type":"quantitative"},"y":{"field":"cluster_petal_width","scale":{"zero":false},"type":"quantitative"}},"mark":{"color":"green","fillOpacity":1,"filled":true,"size":100,"type":"point"}}],"title":{"offset":25,"text":"Scatterplot of data samples projected on plane petal_width x petal_length with calculated centroids"},"width":300}

As expected 😎

Clustering of pixel colors

Another interesting use case of KMeans clustering is pixel clustering. This technique replaces all pixels of similar color (similar in terms of the Euclidean distance between their RGB values) with the centroid of the cluster they belong to.
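The replacement step itself is simple. Below is a plain-Elixir sketch (no Nx; the module name PixelRepaint is hypothetical) that assigns each RGB pixel to its nearest centroid by squared Euclidean distance and substitutes that centroid's color. It assumes the centroids are already known; in the notebook they come from KMeans.fit, and Nx.take(model.clusters, model.labels) performs the substitution for the whole image at once.

```elixir
defmodule PixelRepaint do
  # Squared Euclidean distance between two RGB triples
  def sq_dist({r1, g1, b1}, {r2, g2, b2}) do
    dr = r1 - r2
    dg = g1 - g2
    db = b1 - b2
    dr * dr + dg * dg + db * db
  end

  # Index of the centroid closest to `pixel`
  def nearest(pixel, centroids) do
    centroids
    |> Enum.with_index()
    |> Enum.min_by(fn {centroid, _index} -> sq_dist(pixel, centroid) end)
    |> elem(1)
  end

  # Replace every pixel with the color of its nearest centroid
  def repaint(pixels, centroids) do
    Enum.map(pixels, fn pixel -> Enum.at(centroids, nearest(pixel, centroids)) end)
  end
end

centroids = [{20, 20, 20}, {230, 230, 230}]
pixels = [{10, 15, 30}, {250, 240, 200}, {60, 50, 40}]

PixelRepaint.repaint(pixels, centroids)
|> IO.inspect()
# prints [{20, 20, 20}, {230, 230, 230}, {20, 20, 20}]
```

KMeans does the harder part of choosing the centroids; once they exist, repainting is just this nearest-centroid lookup.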

Let us start by loading the reference image.

url =
  "https://pix4free.org/assets/library/2021-01-12/originals/san_francisco_california_golden_gate_bridge_water.jpg"

# Download the JPEG and decode it
%{body: raw_image} = Req.get!(url)
image = StbImage.read_binary!(raw_image)

# Shrink the image 3x in each dimension to reduce the number of pixels to cluster
{height, width, _num_channels} = image.shape
image = StbImage.resize(image, div(height, 3), div(width, 3))
shape = image.shape

# Render the downscaled original for later comparison
image_kino = image |> StbImage.to_binary(:jpg) |> Kino.Image.new(:jpeg)

Now we will try to use only ten colors to represent the same picture.

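# Flatten the image into a {num_pixels, 3} tensor: one RGB row per pixel
x = image |> StbImage.to_nx() |> Nx.reshape({:auto, 3})

# num_runs: 10 fits the model ten times from different initializations
# and keeps the run with the lowest inertia
model =
  KMeans.fit(x,
    num_clusters: 10,
    num_runs: 10,
    max_iterations: 200,
    key: key
  )

# Replace every pixel with the centroid of the cluster it was assigned to
repainted_x = Nx.take(model.clusters, model.labels)

# Convert a {num_pixels, 3} float tensor back into a displayable image
tensor_to_image = fn x ->
  x
  |> Nx.reshape(shape)
  |> Nx.round()
  |> Nx.as_type({:u, 8})
  |> StbImage.from_nx()
  |> StbImage.to_binary(:jpg)
  |> Kino.Image.new(:jpeg)
end

repainted_x = tensor_to_image.(repainted_x)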

Notice that even though we use only ten colors, the image is clearly recognizable. Let’s experiment further: we will try 5, 10, 15, 20, and 40 colors and then compare the processed images with the original one.

clusterings = [5, 10, 15, 20, 40]

models =
  for num_clusters <- clusterings do
    KMeans.fit(x, num_clusters: num_clusters, key: key)
  end
[
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      9
    >,
    clusters: #Nx.Tensor<
      f32[5][3]
      EXLA.Backend
      [
        [61.85893249511719, 60.239295959472656, 58.86780548095703],
        [4.204394340515137, 87.57450866699219, 99.723876953125],
        [136.80120849609375, 136.31080627441406, 133.2758026123047],
        [8.59874153137207, 138.21804809570312, 150.6464385986328],
        [212.19105529785156, 194.9630126953125, 186.4161376953125]
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      609477056.0
    >,
    labels: #Nx.Tensor<
      s64[426400]
      EXLA.Backend
      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      17
    >,
    clusters: #Nx.Tensor<
      f32[10][3]
      EXLA.Backend
      [
        [43.8863410949707, 51.81157302856445, 52.23817825317383],
        [217.59837341308594, 208.80609130859375, 205.50840759277344],
        [151.6479949951172, 163.68496704101562, 166.06430053710938],
        [6.749063968658447, 148.70291137695312, 160.81790161132812],
        [87.08784484863281, 72.5816879272461, 68.8161849975586],
        [111.68388366699219, 126.14994049072266, 125.45569610595703],
        [1.9535311460494995, 76.27875518798828, 88.1100845336914],
        [193.17453002929688, 100.84982299804688, 74.338134765625],
        [5.854827404022217, 107.62191009521484, 120.50939178466797],
        [225.98037719726562, 179.45010375976562, 155.50418090820312]
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      271123392.0
    >,
    labels: #Nx.Tensor<
      s64[426400]
      EXLA.Backend
      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      25
    >,
    clusters: #Nx.Tensor<
      f32[15][3]
      EXLA.Backend
      [
        [37.49226379394531, 49.20851516723633, 49.95775604248047],
        [68.16342163085938, 63.4951057434082, 62.18951416015625],
        [222.685546875, 213.00254821777344, 209.28672790527344],
        [1.7072257995605469, 74.72100830078125, 86.64070892333984],
        [211.2851104736328, 104.6119613647461, 69.65288543701172],
        [8.460535049438477, 161.30209350585938, 172.6630859375],
        [168.0978240966797, 155.88272094726562, 149.5994873046875],
        [80.14752960205078, 115.78617095947266, 120.41868591308594],
        [107.40535736083984, 78.1260757446289, 71.10165405273438],
        [233.2224578857422, 185.8236541748047, 160.95843505859375],
        [1.716829776763916, 128.2496795654297, 141.5491180419922],
        [118.86845397949219, 156.38539123535156, 165.07196044921875],
        [175.82142639160156, 184.47825622558594, 189.05616760253906],
        [132.0281982421875, 125.65474700927734, 118.93122863769531],
        [4.0000224113464355, 101.1719741821289, 113.9442367553711]
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      177419888.0
    >,
    labels: #Nx.Tensor<
      s64[426400]
      EXLA.Backend
      [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      22
    >,
    clusters: #Nx.Tensor<
      f32[20][3]
      EXLA.Backend
      [
        [1.4399189949035645, 68.7367935180664, 80.45700073242188],
        [212.9929962158203, 166.43873596191406, 144.96112060546875],
        [7.347901821136475, 162.13064575195312, 173.42539978027344],
        [2.1622705459594727, 107.44270324707031, 120.4988784790039],
        [134.57351684570312, 77.13945770263672, 64.61695098876953],
        [220.81675720214844, 107.48382568359375, 68.0926742553711],
        [221.37522888183594, 214.39996337890625, 212.31607055664062],
        [180.4493408203125, 186.41665649414062, 190.1817169189453],
        [61.370914459228516, 112.97090148925781, 120.70513916015625],
        [2.7440736293792725, 87.0137710571289, 99.2638931274414],
        [108.83963012695312, 138.17112731933594, 144.1614227294922],
        [147.15538024902344, 131.68154907226562, 123.84530639648438],
        [108.10203552246094, 109.16707611083984, 103.55908203125],
        [155.11891174316406, 159.10829162597656, 159.69961547851562],
        [239.6630096435547, 196.75611877441406, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      131876120.0
    >,
    labels: #Nx.Tensor<
      s64[426400]
      EXLA.Backend
      [15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      26
    >,
    clusters: #Nx.Tensor<
      f32[40][3]
      EXLA.Backend
      [
        [2.26080584526062, 85.23534393310547, 97.10236358642578],
        [217.07537841796875, 176.8404083251953, 158.10992431640625],
        [72.31938934326172, 168.48968505859375, 179.66793823242188],
        [126.88947296142578, 86.30879211425781, 77.26168823242188],
        [60.02167892456055, 54.381187438964844, 52.95188903808594],
        [181.08859252929688, 173.72731018066406, 169.16676330566406],
        [246.27549743652344, 195.74496459960938, 167.2862548828125],
        [197.31752014160156, 200.42007446289062, 203.8845672607422],
        [167.11375427246094, 153.47061157226562, 146.3658905029297],
        [1.486596703529358, 126.65245056152344, 140.52223205566406],
        [97.79022979736328, 103.88321685791016, 99.8821029663086],
        [226.1820831298828, 220.26646423339844, 219.80264282226562],
        [168.97100830078125, 110.16779327392578, 101.84797668457031],
        [230.60598754882812, 206.01318359375, 192.64845275878906],
        [5.403233528137207, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      64511020.0
    >,
    labels: #Nx.Tensor<
      s64[426400]
      EXLA.Backend
      [23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, 19, ...]
    >
  }
]
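To put the inertia values above into more intuitive units, here is a small plain-Elixir calculation that converts each of them into a root-mean-square distance between a pixel and the color that replaced it. It assumes that Scholar's inertia is the sum of squared Euclidean distances to the assigned centroids with unit sample weights, and that the image has 426,400 pixels, as the labels shape s64[426400] indicates.

```elixir
# Number of pixels, read from the labels shape s64[426400] above
num_pixels = 426_400

# {number of colors, inertia} pairs copied from the KMeans.fit output above
inertias = [
  {5, 609_477_056.0},
  {10, 271_123_392.0},
  {15, 177_419_888.0},
  {20, 131_876_120.0},
  {40, 64_511_020.0}
]

# Assumes inertia = sum of squared RGB distances to the assigned centroid,
# so sqrt(inertia / num_pixels) is the RMS pixel error in RGB units (0..255 scale)
rms_errors =
  for {k, inertia} <- inertias do
    {k, :math.sqrt(inertia / num_pixels)}
  end

for {k, rms} <- rms_errors do
  IO.puts("#{k} colors: RMS color error ~ #{Float.round(rms, 1)}")
end
```

Under these assumptions, the typical per-pixel error shrinks from roughly 38 RGB units with 5 colors to roughly 12 with 40 colors.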
image_boxes =
  for {model, num_clusters} <- Enum.zip(models, clusterings) do
    repainted_x = Nx.take(model.clusters, model.labels)

    image_kino = tensor_to_image.(repainted_x)

    Kino.Layout.grid(
      [Kino.Markdown.new("### Number of colors: #{num_clusters}"), image_kino],
      boxed: true
    )
  end

image_box =
  Kino.Layout.grid(
    [Kino.Markdown.new("### Original image"), image_kino],
    boxed: true
  )

Kino.Layout.grid(image_boxes ++ [image_box], columns: 2)

Notice that even with only five colors we can recognize the Golden Gate Bridge in the image. With 40 colors, we keep almost all details except in the sky and the water surface. The sky and water do not map well because their colors change through smooth, gradual gradients, which a small palette can only approximate with discrete bands. Pixel clustering is therefore a good way to compress images drastically with only a small degradation in their appearance.
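The compression claim can be made concrete with a little arithmetic. The sketch below (plain Elixir; the module name PaletteSize is hypothetical) assumes an indexed-color layout: a palette of k RGB colors at 3 bytes each, plus one packed index of ceil(log2(k)) bits per pixel, compared against raw 24-bit RGB.

```elixir
defmodule PaletteSize do
  # Smallest number of bits b such that 2^b >= k, i.e. ceil(log2(k)) for k >= 2
  def bits_per_index(k) when is_integer(k) and k >= 2, do: count_bits(k - 1, 0)

  defp count_bits(0, acc), do: acc
  defp count_bits(n, acc), do: count_bits(div(n, 2), acc + 1)

  # Palette of k RGB colors (3 bytes each) plus one packed index per pixel
  def palette_bytes(num_pixels, k) do
    k * 3 + div(num_pixels * bits_per_index(k) + 7, 8)
  end

  # Uncompressed 24-bit RGB: 3 bytes per pixel
  def raw_bytes(num_pixels), do: num_pixels * 3
end

# Pixel count of the downscaled image (labels shape s64[426400])
num_pixels = 426_400

for k <- [5, 10, 15, 20, 40] do
  ratio = PaletteSize.raw_bytes(num_pixels) / PaletteSize.palette_bytes(num_pixels, k)
  bits = PaletteSize.bits_per_index(k)
  IO.puts("#{k} colors: #{bits} bits per pixel, about #{Float.round(ratio, 1)}x smaller than raw RGB")
end
```

This ignores file-format overhead and any entropy coding. The notebook re-encodes the repainted images as JPEG, which does not store a palette, so these figures describe what an indexed-color format could achieve rather than the size of the displayed images.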

Clustering images from Fashion-MNIST

The last example is clustering images from the Fashion-MNIST dataset. Its training set consists of 60,000 grayscale images of 28 by 28 pixels, each showing one of ten clothing categories. Let’s dive into this clustering problem.

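Before we start, we define the StratifiedSplit module. It trims the input data so that every class keeps the same number of samples, samples_per_class. To do so, it builds a one-hot membership mask, sorts each class column so that the members of that class come first, and keeps the first samples_per_class indices from each column.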
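The same idea can be sketched without tensors. The plain-Elixir module below (StratifiedTrimSketch is a hypothetical name, not part of the notebook) keeps the first samples_per_class samples of each class from a list of {features, label} pairs. Unlike the Nx version, which interleaves the classes when it flattens the sliced index matrix, it returns the samples grouped class by class, and if a class has fewer members than requested it simply keeps all of them.

```elixir
defmodule StratifiedTrimSketch do
  # Keep the first `samples_per_class` samples of every class.
  # `samples` is a list of {features, label} pairs; Enum.group_by preserves
  # the original order of the elements within each class.
  def trim(samples, samples_per_class) do
    samples
    |> Enum.group_by(fn {_features, label} -> label end)
    |> Enum.sort_by(fn {label, _members} -> label end)
    |> Enum.flat_map(fn {_label, members} -> Enum.take(members, samples_per_class) end)
  end
end

data = [{:a, 0}, {:b, 1}, {:c, 0}, {:d, 2}, {:e, 1}, {:f, 0}, {:g, 2}]

StratifiedTrimSketch.trim(data, 2)
|> IO.inspect()
# prints [{:a, 0}, {:c, 0}, {:b, 1}, {:e, 1}, {:d, 2}, {:g, 2}]
```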

defmodule StratifiedSplit do
  import Nx.Defn

  defn trim_samples(x, labels, opts \\ []) do
    opts = keyword!(opts, [:num_classes, :samples_per_class])

    num_classes = opts[:num_classes]
    samples_per_class = opts[:samples_per_class]

    # membership_mask[i][c] is 1 when labels[i] == c, otherwise 0
    membership_mask = Nx.iota({1, num_classes}) == Nx.reshape(labels, {:auto, 1})

    # Sorting each column in descending order moves the members of every class
    # to the top; keep the first samples_per_class rows and flatten them
    indices =
      membership_mask
      |> Nx.argsort(axis: 0, direction: :desc)
      |> Nx.slice_along_axis(0, samples_per_class, axis: 0)
      |> Nx.flatten()

    {Nx.take(x, indices), Nx.take(labels, indices)}
  end
end
{:module, StratifiedSplit, <<70, 79, 82, 49, 0, 0, 13, ...>>, true}

First, we load the data and cast it into Nx tensors. Pixel values are divided by 255 so that they lie in the [0, 1] range.

{image_data, labels_data} = Scidata.FashionMNIST.download()

{images_binary, images_type, images_shape} = image_data
{num_samples, _num_channels = 1, image_height, image_width} = images_shape

images =
  images_binary
  |> Nx.from_binary(images_type)
  |> Nx.reshape({num_samples, :auto})
  |> Nx.divide(255)

{labels_binary, labels_type, _shape} = labels_data
target = Nx.from_binary(labels_binary, labels_type)

num_classes = 10
samples_per_class = 20

{images, target} =
  StratifiedSplit.trim_samples(images, target,
    num_classes: num_classes,
    samples_per_class: samples_per_class
  )

num_images = num_classes * samples_per_class
200

Let’s also define a function that visualizes an image stored as a tensor.

tensor_to_kino = fn x ->
  x
  |> Nx.reshape({image_height, image_width, 1})
  # Replicate the value into 3 channels for PNG
  |> Nx.broadcast({image_height, image_width, 3})
  |> Nx.multiply(255)
  |> Nx.as_type({:u, 8})
  |> StbImage.from_nx()
  |> StbImage.resize(112, 112)
  |> StbImage.to_binary(:png)
  |> Kino.Image.new(:png)
end
#Function<42.3316493/1 in :erl_eval.expr/6>

Here is one of the images.

tensor_to_kino.(images[0])

We will try several numbers of clusters, from 2 to 20, and then measure the quality of each clustering.
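A common quality measure is inertia, which each fitted model reports. The plain-Elixir sketch below (InertiaSketch is a hypothetical module) computes it under the conventional definition: the sum, over all samples, of the squared Euclidean distance to the nearest centroid. It assumes unit sample weights. Inertia generally decreases as clusters are added, so a lower value alone cannot choose the number of clusters; a common heuristic is to look for an "elbow" where it stops dropping sharply.

```elixir
defmodule InertiaSketch do
  # Squared Euclidean distance between two equal-length lists of numbers
  def sq_dist(a, b) do
    Enum.zip(a, b)
    |> Enum.map(fn {x, y} -> (x - y) * (x - y) end)
    |> Enum.sum()
  end

  # Sum over all points of the squared distance to the closest centroid
  def inertia(points, centroids) do
    points
    |> Enum.map(fn p -> centroids |> Enum.map(&sq_dist(p, &1)) |> Enum.min() end)
    |> Enum.sum()
  end
end

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]

# Two well-placed centroids versus a single centroid for the same data
IO.inspect(InertiaSketch.inertia(points, [[0.0, 0.5], [10.0, 10.5]]))
IO.inspect(InertiaSketch.inertia(points, [[5.0, 5.5]]))
# prints 1.0, then 201.0
```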

nums_clusters = 2..20

models =
  for num_clusters <- nums_clusters do
    KMeans.fit(images, num_clusters: num_clusters, key: key)
  end
[
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      4
    >,
    clusters: #Nx.Tensor<
      f32[2][784]
      EXLA.Backend
      [
        [0.0, 3.501400715322234e-5, 1.4005602861288935e-4, 3.851540677715093e-4, 3.501400933600962e-4, 7.002801285125315e-4, 4.901961074210703e-4, 0.009278712794184685, 0.04635854437947273, 0.09737396240234375, 0.22310924530029297, 0.29975488781929016, 0.32121849060058594, 0.3055672347545624, 0.3146008551120758, 0.36162465810775757, 0.32324934005737305, 0.3010154068470001, 0.20304621756076813, 0.06995797902345657, 0.019502801820635796, 0.003641456598415971, 0.003641456598415971, 0.0034313725773245096, 0.0025560224894434214, 0.0010504202218726277, 3.501400715322234e-5, 0.0, 0.0, 0.0, 3.501400715322234e-5, 4.901961074210703e-4, 5.252101109363139e-4, 0.006232493091374636, 0.05105042830109596, 0.13872550427913666, 0.2501050531864166, 0.37622547149658203, 0.533753514289856, 0.6370097994804382, 0.7304272055625916, 0.7347339391708374, 0.7232843637466431, 0.7482843399047852, 0.7121148109436035, 0.6053571105003357, 0.5261555314064026, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      10950.6201171875
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      3
    >,
    clusters: #Nx.Tensor<
      f32[3][784]
      EXLA.Backend
      [
        [0.0, 5.9417710872367024e-5, 1.1883542174473405e-4, 2.376708434894681e-4, 2.376708434894681e-4, 4.753416869789362e-4, 2.376708434894681e-4, 0.013071895577013493, 0.05971479415893555, 0.12269756942987442, 0.27730244398117065, 0.3171122968196869, 0.2941770851612091, 0.2795603275299072, 0.28009507060050964, 0.31200236082077026, 0.2995246648788452, 0.3170528709888458, 0.25864526629447937, 0.07664884626865387, 0.02192513458430767, 1.7825313261710107e-4, 4.1592397610656917e-4, 3.565062361303717e-4, 1.1883542174473405e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.376708434894681e-4, 5.9417710872367024e-5, 0.009566251188516617, 0.05246583744883537, 0.11509210616350174, 0.23559121787548065, 0.3770647943019867, 0.5828877091407776, 0.6433154940605164, 0.7103387117385864, 0.7102198004722595, 0.6955437064170837, 0.7320261001586914, 0.675638735294342, 0.6002377271652222, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      9246.3125
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [0, 0, 2, 0, 1, 1, 2, 1, 1, 1, 0, 0, 2, 0, 0, 1, 2, 1, 1, 2, 0, 0, 2, 0, 2, 1, 1, 1, 0, 1, 0, 0, 1, 0, 2, 1, 2, 1, 1, 1, 0, 0, 2, 0, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[4][784]
      EXLA.Backend
      [
        [0.0, 0.0, 1.8239855126012117e-4, 6.383949075825512e-4, 5.471956683322787e-4, 0.0010943913366645575, 7.295942050404847e-4, 0.004012768156826496, 0.02891017124056816, 0.06675787270069122, 0.19589604437351227, 0.31500229239463806, 0.3976288139820099, 0.38148659467697144, 0.3805745542049408, 0.4316461682319641, 0.3498404026031494, 0.2810761630535126, 0.13807569444179535, 0.05654355138540268, 0.008755129761993885, 5.471956101246178e-4, 8.20793560706079e-4, 0.0014591884100809693, 6.383949657902122e-4, 0.0, 9.119927563006058e-5, 0.0, 0.0, 0.0, 9.119927563006058e-5, 9.119927417486906e-4, 0.0012767899315804243, 0.0015503877075389028, 0.05243958532810211, 0.18467853963375092, 0.29092568159103394, 0.4202462434768677, 0.5493844747543335, 0.6998631954193115, 0.8229821920394897, 0.810123085975647, 0.7939808964729309, 0.8111263513565063, 0.7670770883560181, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      8447.5419921875
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [3, 3, 0, 3, 1, 1, 0, 1, 2, 2, 3, 3, 0, 3, 3, 1, 0, 1, 1, 2, 3, 3, 0, 3, 0, 1, 1, 1, 3, 2, 3, 3, 1, 3, 0, 1, 0, 1, 2, 2, 3, 3, 0, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      6
    >,
    clusters: #Nx.Tensor<
      f32[5][784]
      EXLA.Backend
      [
        [0.0, 1.9607844296842813e-4, 3.9215688593685627e-4, 7.843137718737125e-4, 7.843137718737125e-4, 3.9215688593685627e-4, 3.9215688593685627e-4, 0.036274511367082596, 0.14215686917304993, 0.21843135356903076, 0.31882351636886597, 0.3727450966835022, 0.29549020528793335, 0.2123529464006424, 0.22921571135520935, 0.29098039865493774, 0.3682352900505066, 0.40980392694473267, 0.2998039126396179, 0.1456862837076187, 0.05725490301847458, 1.9607844296842813e-4, 0.001372549100778997, 5.882353289052844e-4, 5.882353289052844e-4, 0.0, 1.9607844296842813e-4, 0.0, 0.0, 0.0, 0.0, 7.843137718737125e-4, 1.9607844296842813e-4, 0.0313725508749485, 0.16862747073173523, 0.32960787415504456, 0.522549033164978, 0.6184313893318176, 0.6737255454063416, 0.7147058844566345, 0.7619606852531433, 0.7511764764785767, 0.6862744688987732, 0.7696077823638916, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      7935.5498046875
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [0, 3, 2, 3, 4, 4, 2, 4, 1, 1, 3, 3, 2, 3, 3, 4, 2, 4, 4, 2, 3, 3, 2, 3, 2, 4, 4, 4, 3, 1, 0, 3, 4, 3, 2, 4, 2, 4, 1, 1, 0, 3, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[6][784]
      EXLA.Backend
      [
        [0.0, 1.9607844296842813e-4, 3.9215688593685627e-4, 7.843137718737125e-4, 7.843137718737125e-4, 3.9215688593685627e-4, 3.9215688593685627e-4, 0.036274511367082596, 0.14215686917304993, 0.21843135356903076, 0.31882351636886597, 0.3727450966835022, 0.29549020528793335, 0.2123529464006424, 0.22921571135520935, 0.29098039865493774, 0.3682352900505066, 0.40980392694473267, 0.2998039126396179, 0.1456862837076187, 0.05725490301847458, 1.9607844296842813e-4, 0.001372549100778997, 5.882353289052844e-4, 5.882353289052844e-4, 0.0, 1.9607844296842813e-4, 0.0, 0.0, 0.0, 0.0, 7.843137718737125e-4, 1.9607844296842813e-4, 0.0313725508749485, 0.16862747073173523, 0.32960787415504456, 0.522549033164978, 0.6184313893318176, 0.6737255454063416, 0.7147058844566345, 0.7619606852531433, 0.7511764764785767, 0.6862744688987732, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      7484.12109375
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [0, 3, 2, 3, 4, 4, 2, 5, 1, 1, 3, 3, 2, 3, 3, 4, 2, 4, 4, 1, 3, 3, 2, 3, 2, 5, 4, 5, 3, 1, 0, 3, 4, 3, 2, 4, 2, 5, 1, 1, 0, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[7][784]
      EXLA.Backend
      [
        [0.0, 0.0, 1.9129604334011674e-4, 6.695361225865781e-4, 5.738881300203502e-4, 0.0011477762600407004, 5.738881300203502e-4, 0.004208512604236603, 0.03032042272388935, 0.0700143575668335, 0.20545193552970886, 0.3070301413536072, 0.3960784375667572, 0.3846963346004486, 0.38326162099838257, 0.43538981676101685, 0.3438546359539032, 0.2696317434310913, 0.12692491710186005, 0.043424203991889954, 0.009182210080325603, 5.738881300203502e-4, 7.65184173360467e-4, 0.001434720354154706, 6.69536180794239e-4, 0.0, 9.564802167005837e-5, 0.0, 0.0, 0.0, 9.564802167005837e-5, 9.564801584929228e-4, 0.001339072361588478, 0.0016260163392871618, 0.05499761179089546, 0.19368726015090942, 0.298995703458786, 0.41300809383392334, 0.5291248559951782, 0.6889526844024658, 0.8161645531654358, 0.8013391494750977, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      7091.55810546875
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [2, 1, 0, 1, 6, 6, 0, 3, 4, 5, 1, 1, 0, 2, 1, 6, 0, 3, 6, 5, 2, 1, 0, 2, 0, 3, 6, 3, 1, 4, 2, 1, 6, 1, 0, 3, 0, 3, 4, 4, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[8][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      6868.54296875
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [3, 4, 1, 6, 6, 6, 1, 7, 0, 5, 6, 4, 1, 4, 2, 6, 2, 7, 6, 5, 4, 4, 1, 4, 2, 7, 6, 7, 6, 0, 3, 4, 6, 6, 2, 7, 1, 7, 0, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      6
    >,
    clusters: #Nx.Tensor<
      f32[9][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      6582.2734375
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [3, 4, 1, 6, 6, 8, 1, 7, 0, 5, 6, 4, 1, 4, 2, 8, 2, 7, 6, 5, 4, 4, 1, 4, 2, 8, 8, 7, 6, 0, 3, 4, 6, 6, 2, 8, 1, 7, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      6
    >,
    clusters: #Nx.Tensor<
      f32[10][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      6426.0517578125
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [3, 4, 9, 6, 6, 8, 9, 7, 0, 5, 6, 4, 1, 4, 2, 8, 9, 7, 2, 5, 4, 4, 9, 4, 9, 8, 6, 7, 2, 0, 3, 4, 6, 6, 2, 8, 9, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      7
    >,
    clusters: #Nx.Tensor<
      f32[11][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      6238.46923828125
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [3, 4, 9, 6, 6, 8, 9, 7, 0, 5, 6, 4, 1, 4, 2, 8, 9, 7, 2, 5, 4, 10, 9, 4, 9, 8, 6, 7, 2, 0, 3, 10, 6, 6, 2, 8, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[12][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 2.3068052541930228e-4, 0.0, 2.3068052541930228e-4, 0.06805074959993362, 0.1568627506494522, 0.17716263234615326, 0.17923875153064728, 0.18846596777439117, 0.18685120344161987, 0.18362168967723846, 0.22260668873786926, 0.21245676279067993, 0.20484431087970734, 0.21453288197517395, 0.07381777465343475, 0.017070358619093895, 2.3068052541930228e-4, 6.920415908098221e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.3068052541930228e-4, 6.920415326021612e-4, 0.04013840854167938, 0.30542102456092834, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      6081.9736328125
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [7, 3, 2, 3, 5, 5, 6, 1, 8, 4, 5, 0, 2, 0, 3, 10, 6, 1, 5, 4, 0, 11, 2, 0, 2, 1, 5, 1, 9, 10, 7, 11, 3, 9, 3, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[13][784]
      EXLA.Backend
      [
        [0.0, 4.6136105083860457e-4, 0.011995386332273483, 0.012687427923083305, 0.012456747703254223, 0.012456747703254223, 0.012456747703254223, 0.011072664521634579, 0.013148789294064045, 0.01361014973372221, 0.012687427923083305, 0.014994233846664429, 0.06528258323669434, 0.058362167328596115, 0.059284891933202744, 0.06920415163040161, 0.044290658086538315, 0.013840830884873867, 0.01407151110470295, 0.015224914066493511, 0.0117647061124444, 0.013840830884873867, 0.01568627543747425, 0.014532871544361115, 0.014763553626835346, 0.003921568859368563, 0.0, 0.0, 0.0, 0.013148789294064045, 0.024221453815698624, 0.02283737063407898, 0.025836216285824776, 0.022606689482927322, 0.021453287452459335, 0.018223760649561882, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5876.912109375
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 10, 2, 1, 1, 5, 2, 3, 4, 12, 1, 10, 2, 10, 11, 5, 11, 0, 6, 7, 10, 10, 2, 10, 2, 0, 0, 3, 1, 12, 8, 10, 1, 1, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[14][784]
      EXLA.Backend
      [
        [0.0, 4.6136105083860457e-4, 0.011995386332273483, 0.012687427923083305, 0.012456747703254223, 0.012456747703254223, 0.012456747703254223, 0.011072664521634579, 0.013148789294064045, 0.01361014973372221, 0.012687427923083305, 0.014994233846664429, 0.06528258323669434, 0.058362167328596115, 0.059284891933202744, 0.06920415163040161, 0.044290658086538315, 0.013840830884873867, 0.01407151110470295, 0.015224914066493511, 0.0117647061124444, 0.013840830884873867, 0.01568627543747425, 0.014532871544361115, 0.014763553626835346, 0.003921568859368563, 0.0, 0.0, 0.0, 0.013148789294064045, 0.024221453815698624, 0.02283737063407898, 0.025836216285824776, 0.022606689482927322, 0.021453287452459335, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5797.251953125
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 10, 13, 1, 1, 5, 2, 3, 4, 12, 1, 10, 13, 10, 11, 5, 11, 0, 6, 7, 10, 10, 13, 10, 13, 0, 0, 3, 1, 12, 8, 10, 1, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      4
    >,
    clusters: #Nx.Tensor<
      f32[15][784]
      EXLA.Backend
      [
        [0.0, 0.0, 2.4509805371053517e-4, 2.4509805371053517e-4, 0.0, 2.4509805371053517e-4, 4.901961074210703e-4, 0.001470588380470872, 4.901961074210703e-4, 0.02401961013674736, 0.05441176891326904, 0.1200980469584465, 0.23112745583057404, 0.2237745076417923, 0.26225489377975464, 0.2404411882162094, 0.20392157137393951, 0.15514707565307617, 0.037254903465509415, 0.030637256801128387, 0.02549019642174244, 0.035784315317869186, 0.0416666679084301, 0.0365196093916893, 0.008333333767950535, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.4509805371053517e-4, 2.4509805371053517e-4, 2.4509805371053517e-4, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5662.34521484375
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 11, 2, 5, 0, 0, 2, 10, 13, 6, 5, 11, 14, 12, 5, 9, 2, 10, 0, 4, 12, 11, 14, 12, 1, 7, 0, 10, 5, 6, 8, 11, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[16][784]
      EXLA.Backend
      [
        [0.0, 0.0, 2.4509805371053517e-4, 2.4509805371053517e-4, 0.0, 2.4509805371053517e-4, 4.901961074210703e-4, 0.001470588380470872, 4.901961074210703e-4, 0.02401961013674736, 0.05441176891326904, 0.1200980469584465, 0.23112745583057404, 0.2237745076417923, 0.26225489377975464, 0.2404411882162094, 0.20392157137393951, 0.15514707565307617, 0.037254903465509415, 0.030637256801128387, 0.02549019642174244, 0.035784315317869186, 0.0416666679084301, 0.0365196093916893, 0.008333333767950535, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.4509805371053517e-4, 2.4509805371053517e-4, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5568.4345703125
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 11, 2, 5, 0, 0, 2, 10, 13, 6, 5, 11, 14, 12, 5, 9, 2, 10, 0, 4, 12, 11, 14, 15, 1, 7, 0, 10, 5, 6, 8, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[17][784]
      EXLA.Backend
      [
        [0.0, 0.0, 2.801120572257787e-4, 2.801120572257787e-4, 0.0, 2.801120572257787e-4, 5.602241144515574e-4, 0.001680672401562333, 5.602241144515574e-4, 0.027450982481241226, 0.06218487769365311, 0.13613446056842804, 0.2641456723213196, 0.24537815153598785, 0.24425771832466125, 0.2731092572212219, 0.23305322229862213, 0.17731094360351562, 0.04257703199982643, 0.020168067887425423, 5.602241144515574e-4, 5.602241144515574e-4, 2.801120572257787e-4, 0.0, 2.801120572257787e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.801120572257787e-4, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5398.72412109375
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 11, 2, 5, 0, 0, 2, 10, 13, 6, 5, 11, 14, 12, 5, 9, 2, 10, 16, 4, 12, 11, 14, 15, 1, 7, 0, 10, 5, 6, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[18][784]
      EXLA.Backend
      [
        [0.0, 0.0, 2.801120572257787e-4, 2.801120572257787e-4, 0.0, 2.801120572257787e-4, 5.602241144515574e-4, 0.001680672401562333, 5.602241144515574e-4, 0.027450982481241226, 0.06218487769365311, 0.13613446056842804, 0.2641456723213196, 0.24537815153598785, 0.24425771832466125, 0.2731092572212219, 0.23305322229862213, 0.17731094360351562, 0.04257703199982643, 0.020168067887425423, 5.602241144515574e-4, 5.602241144515574e-4, 2.801120572257787e-4, 0.0, 2.801120572257787e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5336.95263671875
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 11, 2, 5, 0, 0, 2, 10, 17, 6, 5, 11, 14, 12, 5, 9, 2, 10, 16, 4, 12, 11, 14, 15, 1, 7, 0, 10, 5, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[19][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.22875867318362e-4, 0.001568627660162747, 5.22875867318362e-4, 0.0313725508749485, 0.08000000566244125, 0.10849674046039581, 0.24653595685958862, 0.22901961207389832, 0.22797387838363647, 0.2549019753932953, 0.20392157137393951, 0.15973857045173645, 0.062745101749897, 0.033986929804086685, 0.02718954347074032, 0.0376470610499382, 0.04418300837278366, 0.038954250514507294, 0.00862745102494955, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5233.1396484375
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 11, 2, 5, 0, 0, 2, 10, 17, 6, 0, 11, 14, 12, 18, 9, 2, 10, 18, 4, 12, 11, 14, 15, 1, 7, 0, 10, ...]
    >
  },
  %Scholar.Cluster.KMeans{
    num_iterations: #Nx.Tensor<
      s64
      EXLA.Backend
      5
    >,
    clusters: #Nx.Tensor<
      f32[20][784]
      EXLA.Backend
      [
        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 5.22875867318362e-4, 0.001568627660162747, 5.22875867318362e-4, 0.0313725508749485, 0.08000000566244125, 0.10849674046039581, 0.24653595685958862, 0.22901961207389832, 0.22797387838363647, 0.2549019753932953, 0.20392157137393951, 0.15973857045173645, 0.062745101749897, 0.020130719989538193, 5.22875867318362e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...],
        ...
      ]
    >,
    inertia: #Nx.Tensor<
      f32
      EXLA.Backend
      5173.603515625
    >,
    labels: #Nx.Tensor<
      s64[200]
      EXLA.Backend
      [8, 19, 2, 5, 0, 0, 2, 10, 17, 6, 0, 11, 14, 12, 18, 9, 2, 10, 18, 4, 12, 11, 14, 15, 1, 7, 0, ...]
    >
  }
]
data = [
  num_clusters: nums_clusters,
  inertia: for(model <- models, do: Nx.to_number(model.inertia))
]

Tucan.lineplot(data, "num_clusters", "inertia",
  x: [type: :ordinal, axis: [label_angle: 0]],
  width: 600,
  height: 300
)
|> Tucan.Axes.set_xy_titles("Number of Clusters", "Inertia")
|> Tucan.Scale.set_y_domain(4800, 11500)
|> Tucan.set_title("Elbow Plot")
[Chart: Elbow Plot. Inertia vs Number of Clusters for 2 to 20 clusters; inertia falls steadily from about 10,951 to about 5,174.]
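The absence of an elbow can also be checked numerically. The plain-Elixir sketch below uses a hypothetical helper, ElbowSketch, which is not part of Scholar or Tucan. It takes the inertia values plotted in the Elbow Plot, rounded to two decimals, and computes how much inertia drops, relative to the previous value, each time one more cluster is added. A clear elbow would appear as a sharp switch from large drops to small ones. This is still a heuristic reading, just with numbers instead of a picture.

```elixir
defmodule ElbowSketch do
  # Hypothetical helper, not part of Scholar: for consecutive {k, inertia} pairs,
  # returns {k, drop} where drop = (inertia(k - 1) - inertia(k)) / inertia(k - 1).
  def relative_drops(ks_and_inertias) do
    ks_and_inertias
    |> Enum.chunk_every(2, 1, :discard)
    |> Enum.map(fn [{_prev_k, prev}, {k, current}] -> {k, (prev - current) / prev} end)
  end
end

# Inertia of the fitted models for 2..20 clusters, rounded to two decimals.
elbow_inertias = [
  {2, 10950.62}, {3, 9246.31}, {4, 8447.54}, {5, 7935.55}, {6, 7484.12},
  {7, 7091.56}, {8, 6868.54}, {9, 6582.27}, {10, 6426.05}, {11, 6238.47},
  {12, 6081.97}, {13, 5876.91}, {14, 5797.25}, {15, 5662.34}, {16, 5568.43},
  {17, 5398.72}, {18, 5336.95}, {19, 5233.14}, {20, 5173.60}
]

elbow_drops = ElbowSketch.relative_drops(elbow_inertias)

# Print the relative drop for every step from k - 1 to k clusters.
for {k, drop} <- elbow_drops do
  IO.puts("#{k - 1} -> #{k} clusters: inertia #{Float.round(drop * 100, 1)}% lower")
end
```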

Notice that this time the Elbow Plot shows no clear elbow, so we need a different method to choose the number of clusters. We will use the Silhouette Score, a metric that indicates the quality of a clustering: it ranges from -1 to 1, and the higher the score, the better the clustering. Keep in mind, however, that the Silhouette Score is only a heuristic and does not always work.
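To make the definition concrete, here is a plain-Elixir sketch of the standard mean silhouette coefficient with Euclidean distance. For each point, a is its mean distance to the other points of its own cluster, b is the smallest mean distance to the points of any other cluster, and the point's silhouette is (b - a) / max(a, b); the score is the average over all points. SilhouetteSketch is a hypothetical module written for illustration, not Scholar's implementation, and it assumes at least two clusters.

```elixir
defmodule SilhouetteSketch do
  # Hypothetical illustration of the mean silhouette coefficient (Euclidean distance).
  # Assumes at least two distinct labels; Enum.min/1 raises when there is only one cluster.

  def score(points, labels) do
    pairs = Enum.zip(points, labels)

    pairs
    |> Enum.with_index()
    |> Enum.map(fn {{point, label}, i} -> point_score(point, label, i, pairs) end)
    |> mean()
  end

  defp point_score(point, label, i, pairs) do
    # Distances from `point` to every other point, grouped by the other point's label.
    distances_by_label =
      pairs
      |> Enum.with_index()
      |> Enum.reject(fn {_pair, j} -> j == i end)
      |> Enum.group_by(
        fn {{_other, other_label}, _j} -> other_label end,
        fn {{other, _other_label}, _j} -> distance(point, other) end
      )

    case Map.pop(distances_by_label, label) do
      # A point that is alone in its cluster gets a silhouette of 0 by convention.
      {nil, _rest} ->
        0.0

      {same_cluster, other_clusters} ->
        a = mean(same_cluster)
        b = other_clusters |> Map.values() |> Enum.map(&mean/1) |> Enum.min()
        denominator = max(a, b)
        if denominator == 0, do: 0.0, else: (b - a) / denominator
    end
  end

  defp distance(p, q) do
    Enum.zip_with(p, q, fn x, y -> (x - y) * (x - y) end)
    |> Enum.sum()
    |> :math.sqrt()
  end

  defp mean(values), do: Enum.sum(values) / length(values)
end

toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]

# Labels that match the two well-separated groups give a score close to 1.
SilhouetteSketch.score(toy_points, [0, 0, 1, 1]) |> IO.inspect(label: "well separated")

# Labels that mix the two groups give a negative score.
SilhouetteSketch.score(toy_points, [0, 1, 0, 1]) |> IO.inspect(label: "mixed up")
```

On this toy data the first call gives roughly 0.90 and the second roughly -0.45, which matches the intuition that higher is better.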

silhouette_scores =
  for {model, num_clusters} <- Enum.zip(models, nums_clusters) do
    Scholar.Metrics.Clustering.silhouette_score(images, model.labels, num_clusters: num_clusters)
    |> Nx.to_number()
  end
[0.1867797076702118, 0.19426067173480988, 0.18798942863941193, 0.16196762025356293,
 0.14662104845046997, 0.15014168620109558, 0.1334874927997589, 0.12096332758665085,
 0.12907366454601288, 0.12029680609703064, 0.12559780478477478, 0.12784500420093536,
 0.12478214502334595, 0.1155780702829361, 0.11121585220098495, 0.11069852113723755,
 0.10738851875066757, 0.10977344214916229, 0.10331624001264572]
data = [num_clusters: nums_clusters, silhouette_scores: silhouette_scores]

Tucan.lineplot(data, "num_clusters", "silhouette_scores",
  points: true,
  point_color: "darkBlue",
  x: [type: :ordinal, axis: [label_angle: 0]]
)
|> Tucan.Axes.set_xy_titles("Number of Clusters", "Silhouette score")
|> Tucan.Scale.set_y_domain(0.088, 0.205)
|> Tucan.set_size(600, 300)
|> Tucan.set_title("Silhouette score vs Number of Clusters")
[Chart: Silhouette score vs Number of Clusters, a line plot with points for 2 to 20 clusters; the highest score, about 0.194, is at 3 clusters.]

As we can see, the model with num_clusters equal to 3 has the highest Silhouette Score. Now we will visualize this clustering.
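Reading the maximum off the chart works here, but it can also be picked in code. The sketch below hard-codes the silhouette scores listed above, rounded to four decimals, under hypothetical names (candidate_ks, candidate_scores) so it does not overwrite the notebook's variables. Inside the notebook, Enum.zip(nums_clusters, silhouette_scores) could be used directly instead.

```elixir
# Silhouette scores listed above (rounded to four decimals), for 2 to 20 clusters.
candidate_ks = Enum.to_list(2..20)

candidate_scores = [
  0.1868, 0.1943, 0.1880, 0.1620, 0.1466, 0.1501, 0.1335, 0.1210, 0.1291, 0.1203,
  0.1256, 0.1278, 0.1248, 0.1156, 0.1112, 0.1107, 0.1074, 0.1098, 0.1033
]

# Pick the number of clusters with the highest score.
{best_k, best_score} =
  candidate_ks
  |> Enum.zip(candidate_scores)
  |> Enum.max_by(fn {_k, score} -> score end)

# Position of that model in a list of models fitted in the same order (starting at 2 clusters).
best_index = Enum.find_index(candidate_ks, &(&1 == best_k))

IO.puts("best k = #{best_k} (score #{best_score}), model index #{best_index}")
```

It selects 3 clusters at index 1, which is why the next cell takes Enum.at(models, 1).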

best_num_clusters = 3
# nums_clusters starts at 2, so the 3-cluster model sits at index 1 of models
best_model = Enum.at(models, 1)
%Scholar.Cluster.KMeans{
  num_iterations: #Nx.Tensor<
    s64
    EXLA.Backend
    3
  >,
  clusters: #Nx.Tensor<
    f32[3][784]
    EXLA.Backend
    [
      [0.0, 5.9417710872367024e-5, 1.1883542174473405e-4, 2.376708434894681e-4, 2.376708434894681e-4, 4.753416869789362e-4, 2.376708434894681e-4, 0.013071895577013493, 0.05971479415893555, 0.12269756942987442, 0.27730244398117065, 0.3171122968196869, 0.2941770851612091, 0.2795603275299072, 0.28009507060050964, 0.31200236082077026, 0.2995246648788452, 0.3170528709888458, 0.25864526629447937, 0.07664884626865387, 0.02192513458430767, 1.7825313261710107e-4, 4.1592397610656917e-4, 3.565062361303717e-4, 1.1883542174473405e-4, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.376708434894681e-4, 5.9417710872367024e-5, 0.009566251188516617, 0.05246583744883537, 0.11509210616350174, 0.23559121787548065, 0.3770647943019867, 0.5828877091407776, 0.6433154940605164, 0.7103387117385864, 0.7102198004722595, 0.6955437064170837, 0.7320261001586914, 0.675638735294342, 0.6002377271652222, 0.5515151619911194, 0.35151517391204834, ...],
      ...
    ]
  >,
  inertia: #Nx.Tensor<
    f32
    EXLA.Backend
    9246.3125
  >,
  labels: #Nx.Tensor<
    s64[200]
    EXLA.Backend
    [0, 0, 2, 0, 1, 1, 2, 1, 1, 1, 0, 0, 2, 0, 0, 1, 2, 1, 1, 2, 0, 0, 2, 0, 2, 1, 1, 1, 0, 1, 0, 0, 1, 0, 2, 1, 2, 1, 1, 1, 0, 0, 2, 0, 2, 1, ...]
  >
}
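The next cell relies on a small Enum idiom: Enum.with_index/1 pairs every label with its sample index, and Enum.group_by/3 collects those indices per label. On a toy list (the first eight labels of the model above, under the hypothetical name toy_labels) it produces a map from cluster to sample indices, from which cluster sizes follow directly.

```elixir
# First eight labels of the 3-cluster model, used as a small example.
toy_labels = [0, 0, 2, 0, 1, 1, 2, 1]

toy_indices_by_cluster =
  toy_labels
  # Pair each label with its position: [{0, 0}, {0, 1}, {2, 2}, ...]
  |> Enum.with_index()
  # Key by the label, keep the position; order inside each group is preserved.
  |> Enum.group_by(&elem(&1, 0), &elem(&1, 1))

# Number of samples assigned to each cluster.
toy_cluster_sizes =
  Map.new(toy_indices_by_cluster, fn {cluster, indices} -> {cluster, length(indices)} end)

IO.inspect(toy_indices_by_cluster)
# %{0 => [0, 1, 3], 1 => [4, 5, 7], 2 => [2, 6]}
IO.inspect(toy_cluster_sizes)
# %{0 => 3, 1 => 3, 2 => 2}
```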
predicted_cluster_with_indices =
  best_model.labels
  |> Nx.to_flat_list()
  |> Enum.with_index()
  |> Enum.group_by(&elem(&1, 0), &elem(&1, 1))

for cluster <- 0..(best_num_clusters - 1) do
  indices = predicted_cluster_with_indices[cluster]

  boxes =
    for index <- indices do
      original_cluster = Nx.to_number(target[index])

      Kino.Layout.grid([
        Kino.Markdown.new("Original cluster: #{original_cluster}"),
        tensor_to_kino.(images[index])
      ])
    end

  Kino.Layout.grid(
    [
      Kino.Markdown.new("## Cluster #{cluster}"),
      Kino.Layout.grid(boxes, columns: 5)
    ],
    boxed: true
  )
end
|> Kino.Layout.grid()

Oops, it doesn’t look right! That’s because, with three clusters, the algorithm groups images by color (how dark or light they are) rather than by shape. To see this, let’s plot the average image of each cluster.

for cluster <- 0..(best_num_clusters - 1) do
  indices = predicted_cluster_with_indices[cluster]

  mean_image =
    indices
    |> Enum.map(&images[&1])
    |> Nx.stack()
    |> Nx.mean(axes: [0])

  tensor_to_kino.(mean_image)
end
|> Kino.Layout.grid(columns: 3)

One of the mean images has a vertical line (something like trousers), the next is almost entirely white (similar to a jumper), and the last one is mostly black. This time the Silhouette Score turns out not to be the best indicator. To get a better clustering, try rerunning the code with a higher number of clusters.
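To put a number on "mostly black" versus "almost all white", each mean image can be reduced to its average pixel intensity. The plain-Elixir sketch below uses hypothetical toy data (three 2x2 images flattened to four pixel values in [0, 1]) instead of the 784-pixel images. It computes the per-pixel mean, as Nx.mean(axes: [0]) does on the stacked tensor, and then averages that mean image into a single number. If the clusters differ mainly in this one number, that suggests they are split by brightness rather than by shape.

```elixir
# Hypothetical toy cluster: three 2x2 grayscale images flattened to 4 pixels in [0, 1].
toy_cluster = [
  [0.0, 1.0, 0.0, 1.0],
  [0.0, 0.8, 0.2, 1.0],
  [0.0, 0.6, 0.1, 0.8]
]

# Per-pixel mean across the images of the cluster.
toy_mean_image =
  Enum.zip_with(toy_cluster, fn pixel_values -> Enum.sum(pixel_values) / length(pixel_values) end)

# Average intensity of the mean image: one number for how bright the cluster is overall.
toy_mean_intensity = Enum.sum(toy_mean_image) / length(toy_mean_image)

IO.inspect(toy_mean_image, label: "mean image")
IO.inspect(toy_mean_intensity, label: "mean intensity")
```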