
Introduction to PyTorch (Elixir mimic)


Mix.install(
  [
    {:axon, "~> 0.6"},
    {:nx, "~> 0.7"},
    {:kino, "~> 0.8"},
    {:scidata, "~> 0.1"},
    {:exla, ">= 0.0.0"},
    {:table_rex, "~> 3.1.1"}
  ],
  #config: [nx: [default_backend: {EXLA.Backend, client: :cuda}]]
  config: []
)

Scalars, vectors, matrices, and tensors

# rank-0 tensor (a scalar)
tensor0d = Nx.tensor(1)

# rank-1 tensor (a vector)
tensor1d = Nx.tensor([1, 2, 3])

# rank-2 tensor (a matrix)
tensor2d = Nx.tensor([[1, 2], [3, 4]])

# rank-3 tensor
tensor3d = Nx.tensor([[[1, 2], [3, 4]], [[1, 2], [3, 4]]])

{tensor0d, tensor1d, tensor2d, tensor3d}
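
As a quick check, Nx.rank/1 reports the number of dimensions, so the tensors above should come back as rank 0, 1, 2, and 3:

# rank of each tensor: scalar, vector, matrix, 3-dimensional tensor
{Nx.rank(tensor0d), Nx.rank(tensor1d), Nx.rank(tensor2d), Nx.rank(tensor3d)}
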
# inspect the element type of the integer tensor
tensor1d.type
# tensors built from floats default to 32-bit floats
floatvec = Nx.tensor([1.0, 2.0, 3.0])
# cast the integer tensor to 32-bit float
f32_tensor1d = Nx.as_type(tensor1d, :f32)

{f32_tensor1d, tensor1d}
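
Mixing integer and float tensors in one operation promotes the result to the wider type; for example, adding the integer vector to the float vector should produce an f32 tensor:

# integer + float is promoted to float
Nx.add(tensor1d, floatvec)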

Common PyTorch tensor operations

tensor2d = Nx.tensor([[1, 2, 3], [4, 5, 6]])
# the shape of the tensor: {2, 3}
tensor2d.shape
# reshape into 3 rows and 2 columns
reshaped_t2d = Nx.reshape(tensor2d, {3, 2})
# swap the two axes
transposed_t2d = Nx.transpose(tensor2d)
# matrix product of a {2, 3} and a {3, 2} tensor: a {2, 2} result
Nx.dot(tensor2d, transposed_t2d)
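
Reshaping is not the same as transposing: Nx.reshape/2 re-reads the elements in row-major order, while Nx.transpose/1 swaps the axes, so multiplying by the reshaped tensor gives a different result from the product above:

# still a valid {2, 3} . {3, 2} product, but the values differ from
# Nx.dot(tensor2d, transposed_t2d) because the {3, 2} operands differ
Nx.dot(tensor2d, reshaped_t2d)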

A.3 Seeing models as computation graphs

PyTorch’s autograd system provides functions to compute gradients in dynamic computational graphs automatically.

A computational graph (or computation graph for short) is a directed graph that allows us to express and visualize mathematical expressions. It lays out the sequence of calculations needed to compute the output of a neural network.

# label (true)
y = Nx.tensor([1])
# input feature
x1 = Nx.tensor([1.1])
# weight parameter
w1 = Nx.tensor([2.2])
# bias unit
b = Nx.tensor([0.0])
# net input
# z = x1 * w1 + b
z = 
  x1
  |> Nx.multiply(w1)
  |> Nx.add(b)
# activation & output
a = Axon.Activations.sigmoid(z)
loss = Axon.Losses.binary_cross_entropy(y, a)

# Gradient of sin(x), evaluated at 0.0 (cos(0.0) = 1.0):
# fun = Nx.Defn.grad(fn x -> Nx.sin(x) end)
# fun.(Nx.tensor([0.0]))

A.4 Automatic differentiation made easy

In PyTorch, the computation graph is built internally by default if one of its terminal nodes has the requires_grad attribute set to True. Gradients are required when training neural networks via the popular backpropagation algorithm, which can be thought of as an implementation of the chain rule from calculus for neural networks.

The most common way of computing the loss gradients in a computation graph involves applying the chain rule from right to left, which is also called reverse-mode automatic differentiation or backpropagation.
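
For example, the derivative of sin(x²) is 2x · cos(x²) by the chain rule; here is a minimal sketch of checking this with Nx.Defn.grad (the sample point 1.5 is an arbitrary choice):

x = Nx.tensor(1.5)
# reverse-mode autodiff applies the chain rule for us
grad_fn = Nx.Defn.grad(fn x -> Nx.sin(Nx.multiply(x, x)) end)
# both values should agree
{grad_fn.(x), x |> Nx.multiply(2) |> Nx.multiply(Nx.cos(Nx.multiply(x, x)))}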

PARTIAL DERIVATIVES AND GRADIENTS

Partial derivatives measure the rate at which a function changes with respect to one of its variables. A gradient is a vector containing all of the partial derivatives of a multivariate function, a function with more than one variable as input.
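
For instance, for f(u, v) = u² · v + v the partial derivatives are 2 · u · v and u² + 1, and the gradient is the vector of both. A small sketch with Nx.Defn.grad (the evaluation point {3.0, 2.0} is an arbitrary choice):

f = fn {u, v} ->
  u
  |> Nx.multiply(u)
  |> Nx.multiply(v)
  |> Nx.add(v)
end

# expected gradient at {3.0, 2.0}: {2 * 3 * 2, 3 * 3 + 1} = {12.0, 10.0}
Nx.Defn.grad({Nx.tensor(3.0), Nx.tensor(2.0)}, f)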

By tracking every operation performed on tensors, PyTorch’s autograd engine constructs a computational graph in the background. Then, calling the grad function, we can compute the gradient of the loss with respect to model parameters.

net_fn = 
  fn(w1, b) ->
    forward_pass =
      x1
      |> Nx.multiply(w1)
      |> Nx.add(b)
      |> Axon.Activations.sigmoid()

    Axon.Losses.binary_cross_entropy(y, forward_pass)
  end

# gradients of the loss with respect to w1 and b
net_grad = Nx.Defn.grad({w1, b}, fn {w1, b} -> net_fn.(w1, b) end)
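
With these gradients, one step of gradient descent moves each parameter against its gradient to reduce the loss. A minimal sketch (the learning rate of 0.1 is an arbitrary choice):

{w1_grad, b_grad} = net_grad
learning_rate = 0.1

# update rule: parameter <- parameter - learning_rate * gradient
w1_updated = Nx.subtract(w1, Nx.multiply(learning_rate, w1_grad))
b_updated = Nx.subtract(b, Nx.multiply(learning_rate, b_grad))

# the loss at the updated parameters should be lower than before
{loss, net_fn.(w1_updated, b_updated)}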

A.5 Implementing multilayer neural networks

When implementing a neural network in PyTorch, we typically subclass the torch.nn.Module class to define our own custom network architecture.

It allows us to encapsulate layers and operations and keep track of the model’s parameters.

Within this subclass, we define the network layers in the __init__ constructor and specify how they interact in the forward method. The forward method describes how the input data passes through the network and comes together as a computation graph.

In contrast, the backward method, which we typically do not need to implement ourselves, is used during training to compute the gradients of the loss function with respect to the model parameters.

# Building the Model

model =
  # a model that takes an input shape of {nil, 50}
  Axon.input("images", shape: {nil, 50})
  # hidden layer 1 with 30 neurons
  |> Axon.dense(30, activation: :relu)
  # hidden layer 2 with 20 neurons
  |> Axon.dense(20, activation: :relu)
  # output layer with 3 outputs
  |> Axon.dense(3, activation: :softmax)
# Graphic display:
template = Nx.template({1, 50}, :f32)
Axon.Display.as_table(model, template)
|> IO.puts
{init_fn, predict_fn} = Axon.build(model, compiler: EXLA)

key = Nx.Random.key(123)
{random_input, _new_key} = Nx.Random.uniform(key, shape: {1, 50}, type: :f32)

# initialize the model parameters from the input template
params = init_fn.(template, %{})

# run the forward pass on the random input
predict_fn.(params, random_input)
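
Since the output layer uses a softmax activation, the three values in each output row form a probability distribution, so they should sum to 1:

# sum the class probabilities along the output axis
predict_fn.(params, random_input)
|> Nx.sum(axes: [1])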