Machine Learning in Elixir

machine_learning_in_elixir.livemd

Mix.install([
  # create/train models/neural networks
  {:axon, "~> 0.5"},
  # elixir ML foundation library
  {:nx, "~> 0.5"},
  # work with DataFrames (tables)
  {:explorer, "~> 0.5"},
  # Livebook visualizations
  {:kino, "~> 0.11.0"}
])

Make Machines That Learn

Learning With Elixir

Explorer’s DataFrame query API includes a number of convenience macros, and using them is encouraged. Because they are macros, the module has to be brought in with require rather than just alias.

Terms
  • model: a function.
  • tensor: a multi-dimensional array
  • DataFrame: tabular (i.e. 2D) data structures
  • one-hot encoding: a binary way of encoding a categorical column: each category gets its own position, and a value is represented by a 1 in its category’s position and 0 in all the others.
  • x: commonly indicates model features, e.g. x_train (🙄, why not just prefix with model_?)
require Explorer.DataFrame, as: DF

Working with Data

We’re using sample data included in the Explorer library. Let’s retrieve it.

iris = Explorer.Datasets.iris()
On Polars

🤔 what is a Polar? The book says:

> the DataFrame has 150 rows (examples) by 5 (features).

Polars is not a data structure but a DataFrame library written in Rust, with Python bindings, often pitched as a faster replacement for Python’s “Pandas” library (🤦 developers naming things, yeesh). Explorer uses it as its default backend, which is why the inspected DataFrame is labeled Polars.

> read more here

Things you could do with DataFrames

  • neat visualizations (with Kino?)
  • apply operations on a Series (the set of examples)

Things we will do with DataFrames

  • transform the data into an appropriate scale, using one of:
      • [0, 1] scaling
      • z-score/standardization (highlight outliers)
  • we’ll pick standardization, although both are good(?)
  • to standardize: subtract the mean from each value, and divide by the variance (a textbook z-score divides by the standard deviation instead)
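The two scaling options can be sketched in plain Elixir on a list of numbers. This is only an illustration, not Explorer’s API: the ScalingSketch module and its function names are hypothetical, and z_score/1 follows the textbook definition (dividing by the population standard deviation), whereas the DF.mutate call later in this notebook divides by the variance.

```elixir
defmodule ScalingSketch do
  # Hypothetical helper module for illustration; not part of Explorer or Nx.

  # [0, 1] scaling: the smallest value maps to 0.0 and the largest to 1.0.
  # Assumes the values are not all equal (otherwise max - min is zero).
  def min_max(values) do
    {min, max} = Enum.min_max(values)
    Enum.map(values, fn v -> (v - min) / (max - min) end)
  end

  # Textbook z-score: subtract the mean, then divide by the
  # population standard deviation (the square root of the variance).
  def z_score(values) do
    n = length(values)
    mean = Enum.sum(values) / n

    variance =
      values
      |> Enum.map(fn v -> :math.pow(v - mean, 2) end)
      |> Enum.sum()
      |> Kernel./(n)

    std = :math.sqrt(variance)
    Enum.map(values, fn v -> (v - mean) / std end)
  end
end

ScalingSketch.min_max([2, 4, 6])
# => [0.0, 0.5, 1.0]

ScalingSketch.z_score([2, 4, 4, 4, 5, 5, 7, 9])
# => [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

After a z-score the values have mean 0, and anything far from 0 (say beyond ±2) stands out as a potential outlier, which is the "highlight outliers" property noted above.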

Preparing the Data for Training

Normalize the series in the DataFrame, but exclude species. For species, we’ll formalize/cast it as a categorical feature, so that we can easily convert the DataFrame into a tensor.

feature_columns = ~w(sepal_width sepal_length petal_length petal_width)

normalized_iris =
  DF.mutate(
    iris,
    for column <- across(^feature_columns) do
      {column.name, (column - mean(column)) / variance(column)}
    end
  )

normalized_iris = DF.mutate(normalized_iris, species: Explorer.Series.cast(species, :category))
shuffled_normalized_iris = DF.shuffle(normalized_iris)

# Dave's refactor:
#
# feature_columns = Enum.filter(iris.names, fn (col) -> col != "species" end)
# normalized_iris = iris
#   |> DF.mutate(for column <- across(^feature_columns) do
#     {column.name, (column - mean(column)) / variance(column)}
#   end)
#   |> DF.mutate([species: Explorer.Series.cast(species, :category)])
#   |> DF.shuffle()
Splitting into Train and Test Sets

It’s important to measure the performance of the model on data it hasn’t seen. We’ll slice out some data from the DataFrame for this purpose (i.e. a Holdout Set).

train_df = DF.slice(shuffled_normalized_iris, 0..119)
test_df = DF.slice(shuffled_normalized_iris, 120..149)
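The same holdout idea can be sketched on a plain list, which also shows why the shuffle comes first: the iris data is ordered by species, so slicing without shuffling would leave the last 30 rows all from one class. The HoldoutSketch module and the train_fraction parameter are hypothetical names used only for this illustration.

```elixir
defmodule HoldoutSketch do
  # Hypothetical helper for illustration; not part of Explorer.
  # Shuffles the rows, then keeps the first `train_fraction` of them
  # for training and the remainder as the holdout (test) set.
  def split(rows, train_fraction) do
    train_size = floor(length(rows) * train_fraction)

    rows
    |> Enum.shuffle()
    |> Enum.split(train_size)
  end
end

{train, test} = HoldoutSketch.split(Enum.to_list(1..150), 0.8)
{length(train), length(test)}
# => {120, 30}
```

The 120/30 sizes match the 0..119 and 120..149 slices above. The row order differs on every run because Enum.shuffle/1 is random.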
Preparing Data for Training

Nx.Tensor is the data structure shared by all the modules that make up the Nx library, so we need to convert from a DataFrame to a tensor. In particular, the values of the categorical data are strings, but we need them represented as a one-hot encoding.

x_train = Nx.stack(train_df[feature_columns], axis: -1)

y_train =
  train_df["species"]
  |> Nx.stack(axis: -1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))

x_test = Nx.stack(test_df[feature_columns], axis: -1)

y_test =
  test_df["species"]
  |> Nx.stack(axis: -1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))

# Dave Refactor:

# defmodule Helpers do
#   def stack columns do
#     columns
#       |> Nx.stack(axis: -1)
#   end

#   def equal tensor do
#     tensor
#       |> Nx.equal(Nx.iota({1, 3}, axis: -1))
#   end  
# end

# x_train = train_df[feature_columns] |> Helpers.stack
# y_train = train_df["species"] 
#   |> Helpers.stack 
#   |> Helpers.equal

# x_test = test_df[feature_columns] |> Helpers.stack
# y_test = test_df["species"] 
#   |> Helpers.stack 
#   |> Helpers.equal
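The Nx.equal/Nx.iota pairing above is what produces the one-hot encoding: each integer category code is compared against the row [0, 1, 2], and only the matching position comes out as 1. A rough plain-Elixir equivalent on lists, assuming the species column has already been turned into integer codes 0..2 (the OneHotSketch module is a hypothetical name for illustration):

```elixir
defmodule OneHotSketch do
  # Hypothetical helper for illustration; not part of Nx.
  # Encodes integer category codes (0..num_classes - 1) as rows holding
  # a single 1, mirroring Nx.equal(codes, Nx.iota({1, num_classes}, axis: -1)).
  def encode(codes, num_classes) do
    positions = Enum.to_list(0..(num_classes - 1))

    for code <- codes do
      # Compare this code against every position: 1 where they match, else 0.
      for position <- positions do
        if code == position, do: 1, else: 0
      end
    end
  end
end

OneHotSketch.encode([0, 2, 1], 3)
# => [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
```

In the Nx version the comparison happens in one step through broadcasting: a {120, 1} tensor of codes against a {1, 3} iota gives a {120, 3} result.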

Multinomial Logistic Regression with Axon

Axon training, in 3 steps:

  1. Define the model
  2. Create an input pipeline
  3. Declare and run the training loop

Defining the Model

model =
  Axon.input("iris_features", shape: {nil, 4})
  |> Axon.dense(3, activation: :softmax)

# how is this useful 🫤mm
Axon.Display.as_graph(model, Nx.template({1, 4}, :f32))

Declaring the Input Pipeline