Intro to ML

Preparing data for training

Mix.install([
  {:axon, "~> 0.5"},
  {:nx, "~> 0.5"},
  {:explorer, "~> 0.5"},
  {:kino, "~> 0.8"}
])
require Explorer.DataFrame, as: DF
iris = Explorer.Datasets.iris()
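
Before doing any preprocessing it helps to look at what was just loaded. A quick peek, assuming the dataframe is bound to iris as above:

# Show the first five rows of the raw dataset (150 rows, 4 numeric
# feature columns plus the species label)
DF.head(iris, 5)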

Normalize data

There are several reasons why you should always normalize data. To mention a few:

  • Improved Model Performance: Normalizing data helps in improving the performance of many machine learning algorithms. By bringing all features to the same scale, the optimization algorithms can converge faster, leading to quicker training and better model performance.
  • Prevention of Biased Features: Features with larger scales can dominate over those with smaller scales during the training process, potentially causing the model to be biased towards features with larger scales. Normalization prevents this bias and ensures that each feature contributes proportionally to the learning process.
  • Stability of Algorithms: Some algorithms are sensitive to the scale of the input data. For instance, distance-based algorithms like k-nearest neighbors (KNN) and support vector machines (SVM) can be influenced by the scale of features. Normalizing the data ensures that such algorithms are not biased by the scale and perform consistently across different datasets.
  • Interpretability and Convergence: Normalization can also aid in the interpretation of feature importance. When features are on different scales, it becomes challenging to compare their contributions to the model. Normalization makes it easier to interpret the importance of features relative to each other. Additionally, normalization helps optimization algorithms converge faster, reducing training time.
  • Handling Outliers: Normalization can help in handling outliers by bringing the values of features within a similar range. Outliers, which may have an exaggerated effect on algorithms, can be mitigated by scaling the data.

cols = ~w(sepal_width sepal_length petal_length petal_width)

# Standardize each feature column (z-score): subtract its mean and divide
# by its standard deviation so all features end up on a comparable scale
normalized_iris =
  DF.mutate(
    iris,
    for col <- across(^cols) do
      {col.name, (col - mean(col)) / standard_deviation(col)}
    end
  )
# Cast the label column to a categorical dtype so each species maps
# to an integer class id (0, 1, 2)
normalized_iris =
  DF.mutate(
    normalized_iris,
    species: Explorer.Series.cast(species, :category)
  )
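
A quick sanity check (a sketch, not part of the original notebook): after standardization, each feature column should have a mean close to 0 and a standard deviation close to 1.

# Hypothetical check on one of the scaled columns
Explorer.Series.mean(normalized_iris["sepal_length"])
Explorer.Series.standard_deviation(normalized_iris["sepal_length"])
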
# Shuffle the rows, then split them into 120 training and 30 test examples
shuffled_normalized_iris = DF.shuffle(normalized_iris)
train_df = DF.slice(shuffled_normalized_iris, 0..119)
test_df = DF.slice(shuffled_normalized_iris, 120..149)
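
The iris dataset has 150 rows, so this split keeps 120 rows for training and holds out 30 for testing. A quick check of the sizes:

# Expecting {120, 30}
{DF.n_rows(train_df), DF.n_rows(test_df)}
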
feature_columns = [
  "sepal_length",
  "sepal_width",
  "petal_length",
  "petal_width"
]
# Stack the four feature columns into a {rows, 4} feature tensor
x_train = Nx.stack(train_df[feature_columns], axis: 1)

# One-hot encode the integer class ids into a {rows, 3} target tensor
y_train =
  train_df["species"]
  |> Nx.stack(axis: -1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))

x_test = Nx.stack(test_df[feature_columns], axis: 1)

y_test =
  test_df["species"]
  |> Nx.stack(axis: -1)
  |> Nx.equal(Nx.iota({1, 3}, axis: -1))
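
The Nx.equal/Nx.iota pipeline above is a compact way to one-hot encode the integer class ids produced by the categorical cast. A minimal illustration of the same trick on made-up labels:

# Comparing a {3, 1} column of class ids against iota [[0, 1, 2]]
# broadcasts into a {3, 3} one-hot matrix of 0s and 1s
labels = Nx.tensor([[0], [2], [1]])
Nx.equal(labels, Nx.iota({1, 3}, axis: -1))
# => rows [1, 0, 0], [0, 0, 1], [0, 1, 0]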

Multinomial Logistic Regression (MLRM) with Axon

model =
  Axon.input("iris_features", shape: {nil, 4})
  |> Axon.dense(3, activation: :softmax)
Axon.Display.as_graph(model, Nx.template({1, 4}, :f32))
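
A single dense layer mapping 4 inputs to 3 outputs with a softmax activation is exactly a multinomial logistic regression: a {4, 3} kernel plus a {3} bias, 15 trainable parameters in total. A sketch that builds the model and inspects the freshly initialized parameters:

# Axon.build/1 returns {init_fn, predict_fn}; calling init_fn with the
# input template and an empty map yields the initial parameter map
{init_fn, _predict_fn} = Axon.build(model)
init_fn.(Nx.template({1, 4}, :f32), %{})
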
data_stream =
  Stream.repeatedly(fn ->
    {x_train, y_train}
  end)
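
Stream.repeatedly/1 simply feeds the full training set on every step. An alternative sketch, assuming a batch size of 30 (which divides the 120 training rows evenly): Axon.Loop.run/4 accepts any enumerable of {inputs, targets} tuples, so mini-batches work just as well.

# Hypothetical mini-batch variant: split the tensors into batches of 30
# and zip them into {x, y} tuples for the training loop
batched_stream =
  x_train
  |> Nx.to_batched(30)
  |> Stream.zip(Nx.to_batched(y_train, 30))
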
# Train for 10 epochs of 500 iterations each, using categorical
# cross-entropy loss and SGD, reporting accuracy along the way
trained_model_state =
  model
  |> Axon.Loop.trainer(:categorical_cross_entropy, :sgd)
  |> Axon.Loop.metric(:accuracy)
  |> Axon.Loop.run(data_stream, %{}, iterations: 500, epochs: 10)
data = [{x_test, y_test}]

model
|> Axon.Loop.evaluator()
|> Axon.Loop.metric(:accuracy)
|> Axon.Loop.run(data, trained_model_state)
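
Beyond the aggregate accuracy metric, you can run the trained model directly and compare predicted classes with the targets. A sketch using Axon.predict/3 and Nx.argmax:

# Forward pass on the test set; argmax turns softmax probabilities and
# one-hot targets back into class indices for comparison
probs = Axon.predict(model, trained_model_state, %{"iris_features" => x_test})
predicted = Nx.argmax(probs, axis: -1)
actual = Nx.argmax(y_test, axis: -1)
Nx.mean(Nx.equal(predicted, actual))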

What is the difference between a regression and a classification problem?

  • Classification: The goal is to predict which class (discrete category) a data point belongs to, as in the iris species model above
  • Regression: The goal is to predict a continuous numerical value, as in the sketch below
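
For contrast with the classifier above, a regression version of the same setup would end in a single linear output trained with a regression loss. A hypothetical sketch, not part of the notebook, predicting one measurement (say petal_width) from the other three features:

# Hypothetical regression model: 3 features in, 1 continuous value out,
# and no softmax on the output layer
regression_model =
  Axon.input("iris_features", shape: {nil, 3})
  |> Axon.dense(1)

# It would be trained with a regression loss instead of cross-entropy, e.g.:
# Axon.Loop.trainer(regression_model, :mean_squared_error, :sgd)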