# Iris Classification with Gradient Boosting
```elixir
Mix.install([
  {:exgboost, "~> 0.5"},
  {:nx, "~> 0.5"},
  {:scidata, "~> 0.1"},
  {:scholar, "~> 0.1"}
])
```
## Data
We’ll be working with the Iris flower dataset. The dataset consists of measurements taken from 3 different species of the Iris flower. Overall we have 150 examples, each with 4 features and a numeric label mapping to 1 of the 3 species. We can download this dataset using Scidata:
```elixir
{x, y} = Scidata.Iris.download()
:ok
```
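Scidata returns the features and labels as plain Elixir lists, so it’s easy to peek at a single example before converting anything. A quick, purely illustrative check of the first row:

```elixir
# Each feature row is a list of 4 measurements; each label is 0, 1, or 2
x |> hd() |> IO.inspect(label: "first features")
y |> hd() |> IO.inspect(label: "first label")
```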
Scidata doesn’t provide train-test splits for Iris. Instead, we’ll need to shuffle the original dataset and split manually. We’ll save 20% of the dataset for testing:
```elixir
data = Enum.zip(x, y) |> Enum.shuffle()
{train, test} = Enum.split(data, ceil(length(data) * 0.8))
:ok
```
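With 150 examples and an 80/20 split, we expect 120 training examples and 30 test examples. A quick sanity check:

```elixir
# 150 examples split 80/20 -> {120, 30}
{length(train), length(test)}
```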
EXGBoost requires inputs to be Nx tensors. The conversion for this example is easy: we can just wrap both features and labels in a call to Nx.tensor/1:
```elixir
{x_train, y_train} = Enum.unzip(train)
{x_test, y_test} = Enum.unzip(test)

x_train = Nx.tensor(x_train)
y_train = Nx.tensor(y_train)
x_test = Nx.tensor(x_test)
y_test = Nx.tensor(y_test)
```
```elixir
x_train
```

```elixir
y_train
```
We now have both train and test sets consisting of features and labels. Time to train a booster!
## Training
The simplest way to train a booster is with the top-level EXGBoost.train/2 function. It expects input features and labels, as well as some optional training configuration parameters.

This example is a multi-class classification problem with 3 output classes, so we need to configure EXGBoost with a multi-class training objective and tell it the number of output classes:
```elixir
booster =
  EXGBoost.train(x_train, y_train,
    num_class: 3,
    objective: :multi_softprob,
    num_boost_rounds: 10_000,
    evals: [{x_train, y_train, "training"}]
  )
```
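Note that 10,000 boosting rounds is far more than a dataset of 120 examples needs, so in practice you may want early stopping. A minimal sketch, assuming EXGBoost supports an :early_stopping_rounds option tied to the :evals metric (as upstream XGBoost does); ideally you would evaluate on a held-out validation split rather than the training set:

```elixir
# Sketch only: halt boosting once the evaluation metric hasn't improved
# for 25 consecutive rounds. :early_stopping_rounds is assumed to mirror
# XGBoost's option of the same name.
booster =
  EXGBoost.train(x_train, y_train,
    num_class: 3,
    objective: :multi_softprob,
    num_boost_rounds: 10_000,
    early_stopping_rounds: 25,
    evals: [{x_train, y_train, "training"}]
  )
```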
And that’s it! Now we can test our booster.
## Testing
To get predictions from a trained booster, we can just call EXGBoost.predict/2. You’ll notice for this problem that the booster outputs a tensor of shape {30, 3}, where the second dimension represents output probabilities for each class. We can obtain a discrete prediction for use in our accuracy measurement by computing the argmax along the last dimension:
```elixir
preds = EXGBoost.predict(booster, x_test) |> Nx.argmax(axis: -1)
Scholar.Metrics.Classification.accuracy(y_test, preds)
```
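Accuracy alone doesn’t show which species the model confuses with one another. Scholar can also compute a confusion matrix; a small sketch, assuming Scholar.Metrics.Classification.confusion_matrix/3 with a :num_classes option:

```elixir
# Rows are true classes, columns are predicted classes; off-diagonal
# entries count misclassifications between species.
Scholar.Metrics.Classification.confusion_matrix(y_test, preds, num_classes: 3)
```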
And that’s it! We’ve successfully trained a booster on the Iris dataset with EXGBoost.