Modeling XOR with a neural network

books/ai/nx/xor.livemd

@andyl

livebooks


Mix.install([
  # {:axon, github: "elixir-nx/axon"},
  {:axon, "~> 0.2.0"},
  {:nx, "~> 0.3.0", override: true},
  {:exla, "~> 0.3.0"},
  {:kino_vega_lite, "~> 0.1.3"}
])

Nx.Defn.default_options(compiler: EXLA)

alias VegaLite, as: Vl

Introduction

In this notebook we build a neural network and train it to compute logical XOR.

Even though XOR seems like a trivial operation, it cannot be modeled by a single dense layer (a single-layer perceptron). The reason is that the classes in XOR are not linearly separable: no straight line separates the points $(0,0)$, $(1,1)$ from the points $(0,1)$, $(1,0)$. To model XOR we need at least one hidden layer with a non-linear activation. Such multi-layer networks, the basic building block of deep learning, can learn non-linear relationships like XOR.
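The claim that no straight line separates the XOR classes can be checked numerically. The sketch below is our own illustration in plain Elixir with no Nx needed, and `LinearCheck` is a hypothetical module name. It brute-forces a coarse grid of weights for a single step-activated unit and compares XOR with AND, which is linearly separable. A grid search is only a sanity check, not a proof; the geometric argument above is what rules out every real-valued weight.

```elixir
defmodule LinearCheck do
  # The four possible XOR inputs
  @points [{0, 0}, {0, 1}, {1, 0}, {1, 1}]

  # Candidate values for w1, w2 and b: -2.0, -1.75, ..., 2.0
  @grid Enum.map(-8..8, &(&1 * 0.25))

  # A single threshold unit: outputs 1 when w1 * x1 + w2 * x2 + b > 0
  def unit({w1, w2, b}, {x1, x2}) do
    if w1 * x1 + w2 * x2 + b > 0, do: 1, else: 0
  end

  # Every grid point {w1, w2, b} whose unit reproduces `target` on all four inputs
  def solutions(target) do
    for w1 <- @grid,
        w2 <- @grid,
        b <- @grid,
        Enum.all?(@points, fn p -> unit({w1, w2, b}, p) == target.(p) end),
        do: {w1, w2, b}
  end
end

xor = fn {a, b} -> if a != b, do: 1, else: 0 end
and_ = fn {a, b} -> a * b end

IO.inspect(length(LinearCheck.solutions(xor)), label: "XOR solutions")
IO.inspect(length(LinearCheck.solutions(and_)), label: "AND solutions")
```

XOR finds no weights on the grid. AND finds at least one, for example `{1.0, 1.0, -1.5}`.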

The model

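Let’s start with the model. XOR has two operands, so we define two inputs and concatenate them into a single input vector with Axon.concatenate/3. Two dense layers follow. The first is a hidden layer with 8 units and a tanh activation. The second is an output layer with a single unit and a sigmoid activation, which produces a value between 0 and 1.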

Note: the model is a sequential neural network. In Axon, we can conveniently create such a model by using the pipe operator (|>) to add layers one by one.

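# One input per XOR operand; nil leaves the batch dimension unspecified
x1_input = Axon.input("x1", shape: {nil, 1})
x2_input = Axon.input("x2", shape: {nil, 1})

model =
  x1_input
  # Join the two {batch, 1} inputs into a single {batch, 2} tensor
  |> Axon.concatenate(x2_input)
  # Hidden layer: 8 units with a non-linear tanh activation
  |> Axon.dense(8, activation: :tanh)
  # Output layer: 1 unit; sigmoid squashes the output into (0, 1)
  |> Axon.dense(1, activation: :sigmoid)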
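The plain-Elixir sketch below shows what these layers compute. It runs the same forward pass (concatenate, then dense with tanh, then dense with sigmoid) with no Nx or Axon involved. It shrinks the hidden layer to 2 units and uses hypothetical hand-picked weights, not anything Axon would learn. The point is only that one non-linear hidden layer is enough to represent XOR. `TinyXorNet` is an illustrative module name.

```elixir
defmodule TinyXorNet do
  # Hypothetical hand-picked weights, NOT values learned by Axon.
  # Hidden unit 1 is on when x1 + x2 > 0.5 (at least one input is 1);
  # hidden unit 2 is on when x1 + x2 > 1.5 (both inputs are 1).
  @hidden [{[10.0, 10.0], -5.0}, {[10.0, 10.0], -15.0}]
  # The output unit is high only when hidden unit 1 is on and unit 2 is off.
  @output {[5.0, -5.0], -5.0}

  def predict(x1, x2) do
    # Counterpart of Axon.concatenate: a single input vector [x1, x2]
    input = [x1, x2]

    # Dense layer with tanh activation (2 units here instead of 8)
    hidden = for {weights, bias} <- @hidden, do: :math.tanh(dot(weights, input) + bias)

    # Dense output layer with a single sigmoid unit
    {weights, bias} = @output
    sigmoid(dot(weights, hidden) + bias)
  end

  defp dot(ws, xs), do: Enum.zip_with(ws, xs, &(&1 * &2)) |> Enum.sum()

  defp sigmoid(z), do: 1.0 / (1.0 + :math.exp(-z))
end

for {x1, x2} <- [{0, 0}, {0, 1}, {1, 0}, {1, 1}] do
  IO.puts("#{x1} XOR #{x2} -> #{Float.round(TinyXorNet.predict(x1, x2), 3)}")
end
```

Rounding each output gives 0, 1, 1, 0. Training replaces these hand-picked numbers with weights found by gradient descent.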

Training data

The next step is to prepare the training data. Since we are modeling a well-defined operation, we can simply generate random operands and compute the expected XOR result for them.

Training works on batches of examples. We therefore use Stream.repeatedly/1 to build an endless, lazy stream in which each element is a whole batch of inputs together with the expected results.

batch_size = 32

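data =
  Stream.repeatedly(fn ->
    # With integer bounds, Nx.random_uniform/3 samples integers from [0, 2), i.e. 0 or 1
    x1 = Nx.random_uniform({batch_size, 1}, 0, 2)
    x2 = Nx.random_uniform({batch_size, 1}, 0, 2)
    # Expected labels: 1 where exactly one of the operands is 1
    y = Nx.logical_xor(x1, x2)

    # Inputs keyed by the names given to Axon.input/2, paired with the labels
    {%{"x1" => x1, "x2" => x2}, y}
  end)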
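The laziness of this stream is easier to see without tensors. Below is a plain-list counterpart, a hypothetical `XorData` helper that is not part of the notebook. Nothing is generated until a consumer such as `Enum.take/2` asks for batches.

```elixir
defmodule XorData do
  # Hypothetical helper mirroring the Nx stream above, using plain lists:
  # each batch is a map of input columns plus the list of expected XOR labels.
  def batches(batch_size) do
    Stream.repeatedly(fn ->
      # :rand.uniform(2) returns 1 or 2, so subtracting 1 yields a random bit
      x1 = for _ <- 1..batch_size, do: :rand.uniform(2) - 1
      x2 = for _ <- 1..batch_size, do: :rand.uniform(2) - 1
      y = Enum.zip_with(x1, x2, fn a, b -> if a != b, do: 1, else: 0 end)
      {%{"x1" => x1, "x2" => x2}, y}
    end)
  end
end

# The stream is lazy: only the one requested batch is actually generated
[{inputs, labels}] = XorData.batches(4) |> Enum.take(1)
IO.inspect({inputs, labels})
```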

Here’s how a sample batch looks:

Enum.at(data, 0)

Training

It’s time to train our model. We use binary cross-entropy as the loss and stochastic gradient descent (SGD) as the optimizer. Binary cross-entropy fits because computing XOR can be treated as a binary classification problem. Each example has a label of 0 or 1, and the sigmoid output layer produces a probability between 0 and 1, which is what this loss compares against the label. Axon.Loop.trainer builds the training loop, and Axon.Loop.run/4 runs it. Because the data stream is infinite, the iterations: 1000 option caps each epoch at 1000 batches. With 10 epochs of 32-example batches, the model sees 320,000 examples in total.
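For $N$ examples with labels $y_i \in \{0, 1\}$ and predicted probabilities $p_i$, binary cross-entropy is $L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right]$. Below is a minimal plain-Elixir sketch of that formula. `BceSketch` is a hypothetical module, and the clipping constant is our own choice; Axon's implementation may handle numerical edge cases differently.

```elixir
defmodule BceSketch do
  # Small constant that keeps log/1 away from 0; our own choice, Axon's may differ
  @eps 1.0e-7

  # Mean binary cross-entropy between 0/1 labels and predicted probabilities
  def binary_cross_entropy(labels, probs) do
    terms =
      Enum.zip_with(labels, probs, fn y, p ->
        p = p |> max(@eps) |> min(1.0 - @eps)
        y * :math.log(p) + (1 - y) * :math.log(1.0 - p)
      end)

    -(Enum.sum(terms) / length(terms))
  end
end

# A confident correct prediction costs little; a confident wrong one costs a lot
IO.inspect(BceSketch.binary_cross_entropy([1], [0.9]), label: "correct")
IO.inspect(BceSketch.binary_cross_entropy([1], [0.1]), label: "wrong")
```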

epochs = 10

params =
  model
  |> Axon.Loop.trainer(:binary_cross_entropy, :sgd)
  |> Axon.Loop.run(data, %{}, epochs: epochs, iterations: 1000)
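The SGD optimizer used above repeatedly takes a mini-batch, computes the gradient of the loss with respect to every parameter, and moves each parameter a small step against it: $\theta \leftarrow \theta - \eta \nabla_\theta L$. Backpropagating through the 8-unit hidden layer is more than a short example should carry. The plain-Elixir sketch below therefore applies one step to a single sigmoid unit trained with binary cross-entropy. For that unit, the gradient with respect to the pre-activation simplifies to $p - y$. `SgdSketch` and the learning rate of 0.5 are our own assumptions, not Axon's internals or defaults.

```elixir
defmodule SgdSketch do
  # Hypothetical single sigmoid unit: p = sigmoid(w1 * x1 + w2 * x2 + b)
  def predict({w1, w2, b}, {x1, x2}), do: 1.0 / (1.0 + :math.exp(-(w1 * x1 + w2 * x2 + b)))

  # Mean binary cross-entropy over a batch of {{x1, x2}, y} examples
  def loss(params, batch) do
    total =
      Enum.reduce(batch, 0.0, fn {x, y}, acc ->
        p = predict(params, x)
        acc - (y * :math.log(p) + (1 - y) * :math.log(1.0 - p))
      end)

    total / length(batch)
  end

  # One SGD step: for BCE on a sigmoid output, dL/dlogit = p - y, so each
  # parameter moves against (p - y) times its input, averaged over the batch
  def step({w1, w2, b} = params, batch, lr) do
    n = length(batch)

    {g1, g2, gb} =
      Enum.reduce(batch, {0.0, 0.0, 0.0}, fn {{x1, x2} = x, y}, {a1, a2, ab} ->
        err = predict(params, x) - y
        {a1 + err * x1 / n, a2 + err * x2 / n, ab + err / n}
      end)

    {w1 - lr * g1, w2 - lr * g2, b - lr * gb}
  end
end

# The four XOR examples as one mini-batch
batch = [{{0, 0}, 0}, {{0, 1}, 1}, {{1, 0}, 1}, {{1, 1}, 0}]
params = {0.5, -0.3, 0.1}
IO.inspect(SgdSketch.loss(params, batch), label: "loss before")
IO.inspect(SgdSketch.loss(SgdSketch.step(params, batch, 0.5), batch), label: "loss after")
```

On this batch the loss drops after the step. Starting instead from all-zero parameters, the averaged gradient over the four XOR examples is exactly zero. Because this unit's loss is convex, that is already its best setting: it outputs 0.5 everywhere. This is the linear-separability problem again, seen from the optimizer’s side.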

Trying the model

Finally, we can test our model on sample data.

Axon.predict(model, params, %{
  "x1" => Nx.tensor([[0]]),
  "x2" => Nx.tensor([[1]])
})

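The output is a value between 0 and 1 that can be read as the probability that the XOR result is 1. Try other combinations of $x_1$ and $x_2$ and check the output. If the predictions are not close to the expected values, try increasing the number of training epochs.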

Visualizing the model predictions

The original XOR operation only works with the binary values $0$ and $1$, but our model operates in a continuous space. This means we can give it $x_1 = 0.5$, $x_2 = 0.5$ as input and still get some output. We can use this to visualize the non-linear relationship between the inputs $x_1$, $x_2$ and the output that our model has learned.

# Number of points per axis; determines the resolution of the plot
n = 50

# Coordinates of all points in the (n x n) grid, flattened into column vectors;
# x1 follows the row index and x2 the column index, both scaled to 0, 1/n, ..., (n - 1)/n
x1 = Nx.iota({n, n}, axis: 0) |> Nx.divide(n) |> Nx.reshape({:auto, 1})
x2 = Nx.iota({n, n}, axis: 1) |> Nx.divide(n) |> Nx.reshape({:auto, 1})

# The output is a real number between 0 and 1; rounding it assigns each point to one of the two classes
y = Axon.predict(model, params, %{"x1" => x1, "x2" => x2}) |> Nx.round()

Vl.new(width: 300, height: 300)
|> Vl.data_from_values(
  x1: Nx.to_flat_list(x1),
  x2: Nx.to_flat_list(x2),
  y: Nx.to_flat_list(y)
)
|> Vl.mark(:circle)
|> Vl.encode_field(:x, "x1", type: :quantitative)
|> Vl.encode_field(:y, "x2", type: :quantitative)
|> Vl.encode_field(:color, "y", type: :nominal)
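The grid construction can be mirrored with plain lists to check the ordering and range of the points. In this sketch, `GridSketch` is a hypothetical module and `denom` stands for the divisor applied to the iota indices. Dividing by n, as the notebook does, ends the grid at (n - 1)/n. Dividing by n - 1 would make it reach exactly 1.

```elixir
defmodule GridSketch do
  # Plain-list counterpart of Nx.iota({n, n}, axis: 0 / axis: 1), divided by
  # `denom` and reshaped to a column: points come out row by row, with x1 taken
  # from the row index and x2 from the column index.
  def points(n, denom) do
    for i <- 0..(n - 1), j <- 0..(n - 1), do: {i / denom, j / denom}
  end
end

grid = GridSketch.points(50, 50)
IO.inspect(length(grid), label: "points")
IO.inspect(Enum.take(grid, 2), label: "first two")
IO.inspect(List.last(grid), label: "last, divided by n")
IO.inspect(List.last(GridSketch.points(50, 49)), label: "last, divided by n - 1")
```

With n = 50 there are 2,500 points. The second point is {0.0, 0.02}, which shows that x2 varies fastest. The last point is {0.98, 0.98} when dividing by n and {1.0, 1.0} when dividing by n - 1.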

If training succeeded, the plot shows that the model has learned two clean decision boundaries separating $(0,0)$, $(1,1)$ from $(0,1)$, $(1,0)$.
