
Introduction to PyTorch (Elixir mimic)


Mix.install(
  [
    {:axon, "~> 0.6"},
    {:nx, "~> 0.7"},
    {:kino, "~> 0.8"},
    {:scidata, "~> 0.1"},
    {:exla, ">= 0.0.0"},
    {:table_rex, "~> 3.1.1"}
  ],
  #config: [nx: [default_backend: {EXLA.Backend, client: :cuda}]]
  config: []
)

Scalars, vectors, matrices, and tensors

# rank-0 tensor (a scalar)
tensor0d = Nx.tensor(1)

# rank-1 tensor (a vector)
tensor1d = Nx.tensor([1, 2, 3])

# rank-2 tensor (a matrix)
tensor2d = Nx.tensor([[1, 2], [3, 4]])

# rank-3 tensor
tensor3d = Nx.tensor([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])

{tensor0d, tensor1d, tensor2d, tensor3d}
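
As a quick check, Nx.rank/1 reports the number of dimensions, so the tensors above should come back as rank 0, 1, 2, and 3:

# rank of each tensor: scalar, vector, matrix, 3-dimensional tensor
{Nx.rank(tensor0d), Nx.rank(tensor1d), Nx.rank(tensor2d), Nx.rank(tensor3d)}
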
# inspect the element type of the integer tensor
tensor1d.type
# tensors built from floats default to 32-bit floats
floatvec = Nx.tensor([1.0, 2.0, 3.0])
# cast the integer tensor to 32-bit float
f32_tensor1d = Nx.as_type(tensor1d, :f32)

{f32_tensor1d, tensor1d}
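
Mixing integer and float tensors in one operation promotes the result to the wider type; for example, adding the integer vector to the float vector should produce an f32 tensor:

# integer + float is promoted to float
Nx.add(tensor1d, floatvec)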

Common PyTorch tensor operations

tensor2d = Nx.tensor([[1, 2, 3], [4, 5, 6]])
# the shape of the tensor: {2, 3}
tensor2d.shape
# reshape into 3 rows and 2 columns
reshaped_t2d = Nx.reshape(tensor2d, {3, 2})
# swap the two axes
transposed_t2d = Nx.transpose(tensor2d)
# matrix product of a {2, 3} and a {3, 2} tensor: a {2, 2} result
Nx.dot(tensor2d, transposed_t2d)
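
Reshaping is not the same as transposing: Nx.reshape/2 re-reads the elements in row-major order, while Nx.transpose/1 swaps the axes, so multiplying by the reshaped tensor gives a different result from the product above:

# still a valid {2, 3} . {3, 2} product, but the values differ from
# Nx.dot(tensor2d, transposed_t2d) because the {3, 2} operands differ
Nx.dot(tensor2d, reshaped_t2d)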

A.3 Seeing models as computation graphs

PyTorch’s autograd system provides functions to compute gradients in dynamic computational graphs automatically.

A computational graph (or computation graph for short) is a directed graph that allows us to express and visualize mathematical expressions. It lays out the sequence of calculations needed to compute the output of a neural network.

# label (true)
y = Nx.tensor([1])
# input feature
x1 = Nx.tensor([1.1])
# weight parameter
w1 = Nx.tensor([2.2])
# bias unit
b = Nx.tensor([0.0])
# net input
# z = x1 * w1 + b
z = 
  x1
  |> Nx.multiply(w1)
  |> Nx.add(b)
# activation & output
a = Axon.Activations.sigmoid(z)
loss = Axon.Losses.binary_cross_entropy(y, a)

# Gradient of sin(x), evaluated at 0.0 (cos(0.0) = 1.0):
# fun = Nx.Defn.grad(fn x -> Nx.sin(x) end)
# fun.(Nx.tensor([0.0]))

A.4 Automatic differentiation made easy

In PyTorch, the computation graph is built internally by default if one of its terminal nodes has the requires_grad attribute set to True. Gradients are required when training neural networks via the popular backpropagation algorithm, which can be thought of as an implementation of the chain rule from calculus for neural networks.

The most common way of computing the loss gradients in a computation graph involves applying the chain rule from right to left, which is also called reverse-mode automatic differentiation or backpropagation.
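
For example, the derivative of sin(x²) is 2x · cos(x²) by the chain rule; here is a minimal sketch of checking this with Nx.Defn.grad (the sample point 1.5 is an arbitrary choice):

x = Nx.tensor(1.5)
# reverse-mode autodiff applies the chain rule for us
grad_fn = Nx.Defn.grad(fn x -> Nx.sin(Nx.multiply(x, x)) end)
# both values should agree
{grad_fn.(x), x |> Nx.multiply(2) |> Nx.multiply(Nx.cos(Nx.multiply(x, x)))}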

PARTIAL DERIVATIVES AND GRADIENTS

Partial derivatives measure the rate at which a function changes with respect to one of its variables. A gradient is a vector containing all of the partial derivatives of a multivariate function, a function with more than one variable as input.
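
For instance, for f(u, v) = u² · v + v the partial derivatives are 2 · u · v and u² + 1, and the gradient is the vector of both. A small sketch with Nx.Defn.grad (the evaluation point {3.0, 2.0} is an arbitrary choice):

f = fn {u, v} ->
  u
  |> Nx.multiply(u)
  |> Nx.multiply(v)
  |> Nx.add(v)
end

# expected gradient at {3.0, 2.0}: {2 * 3 * 2, 3 * 3 + 1} = {12.0, 10.0}
Nx.Defn.grad({Nx.tensor(3.0), Nx.tensor(2.0)}, f)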

By tracking every operation performed on tensors, PyTorch’s autograd engine constructs a computational graph in the background. Then, calling the grad function, we can compute the gradient of the loss with respect to model parameters.

net_fn = 
  fn(w1, b) ->
    forward_pass =
      x1
      |> Nx.multiply(w1)
      |> Nx.add(b)
      |> Axon.Activations.sigmoid()

    Axon.Losses.binary_cross_entropy(y, forward_pass)
  end

# gradients of the loss with respect to w1 and b
net_grad = Nx.Defn.grad({w1, b}, fn {w1, b} -> net_fn.(w1, b) end)
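
With these gradients, one step of gradient descent moves each parameter against its gradient to reduce the loss. A minimal sketch (the learning rate of 0.1 is an arbitrary choice):

{w1_grad, b_grad} = net_grad
learning_rate = 0.1

# update rule: parameter <- parameter - learning_rate * gradient
w1_updated = Nx.subtract(w1, Nx.multiply(learning_rate, w1_grad))
b_updated = Nx.subtract(b, Nx.multiply(learning_rate, b_grad))

# the loss at the updated parameters should be lower than before
{loss, net_fn.(w1_updated, b_updated)}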

A.5 Implementing multilayer neural networks

When implementing a neural network in PyTorch, we typically subclass the torch.nn.Module class to define our own custom network architecture.

It allows us to encapsulate layers and operations and keep track of the model’s parameters.

Within this subclass, we define the network layers in the __init__ constructor and specify how they interact in the forward method. The forward method describes how the input data passes through the network and comes together as a computation graph.

In contrast, the backward method, which we typically do not need to implement ourselves, is used during training to compute the gradients of the loss function with respect to the model parameters.

# Building the Model

model =
  # a model that takes an input shape of {nil, 50}
  Axon.input("images", shape: {nil, 50})
  # hidden layer 1 with 30 neurons
  |> Axon.dense(30, activation: :relu)
  # hidden layer 2 with 20 neurons
  |> Axon.dense(20, activation: :relu)
  # output layer with 3 outputs
  |> Axon.dense(3, activation: :softmax)
# Graphic display:
template = Nx.template({1, 50}, :f32)
Axon.Display.as_table(model, template)
|> IO.puts
{init_fn, predict_fn} = Axon.build(model, compiler: EXLA)

key = Nx.Random.key(123)
{random_input, _new_key} = Nx.Random.uniform(key, shape: {1, 50}, type: :f32)

# initialize the model parameters from the input template
params = init_fn.(template, %{})

# run the forward pass on the random input
predict_fn.(params, random_input)
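
Since the output layer uses a softmax activation, the three values in each output row form a probability distribution, so they should sum to 1:

# sum the class probabilities along the output axis
predict_fn.(params, random_input)
|> Nx.sum(axes: [1])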