# Modeling XOR with a neural network

```
Mix.install([
  {:axon, "~> 0.3.0"},
  {:nx, "~> 0.4.0", override: true},
  {:exla, "~> 0.4.0"},
  {:kino_vega_lite, "~> 0.1.6"}
])

Nx.Defn.default_options(compiler: EXLA)

alias VegaLite, as: Vl
```

## Introduction

In this notebook we build a simple model and train it to compute the **logical XOR**.

Even though XOR seems like a trivial operation, it cannot be modeled using a single dense layer (single-layer perceptron). The underlying reason is that the classes in XOR are not linearly separable. We cannot draw a straight line to separate the points $(0,0)$, $(1,1)$ from the points $(0,1)$, $(1,0)$. To model this properly, we need to turn to deep learning methods. Deep learning is capable of learning non-linear relationships like XOR.
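
For reference, here is the XOR truth table. No single straight line in the $(x_1, x_2)$ plane can separate the rows where the output is $1$ from the rows where it is $0$:

| $x_1$ | $x_2$ | $x_1 \oplus x_2$ |
| ----- | ----- | ---------------- |
| 0     | 0     | 0                |
| 0     | 1     | 1                |
| 1     | 0     | 1                |
| 1     | 1     | 0                |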

## The model

Let’s start with the model. We need two inputs, since XOR has two operands. We then concatenate them into a single input vector with `Axon.concatenate/3`. From there, we add one hidden layer and one output layer, both of them dense.

Note: the model is a sequential neural network. In Axon, we can conveniently create such a model by using the pipe operator (`|>`) to add layers one by one.

```
x1_input = Axon.input("x1", shape: {nil, 1})
x2_input = Axon.input("x2", shape: {nil, 1})

model =
  x1_input
  |> Axon.concatenate(x2_input)
  |> Axon.dense(8, activation: :tanh)
  |> Axon.dense(1, activation: :sigmoid)
```

## Training data

The next step is to prepare training data. Since we are modeling a well-defined operation, we can just generate random operands and compute the expected XOR result for them.

Training works in batches of examples, so we *repeatedly* generate a whole batch of inputs along with the expected results.

```
batch_size = 32

data =
  Stream.repeatedly(fn ->
    # Integer bounds 0 and 2 give random 0s and 1s
    x1 = Nx.random_uniform({batch_size, 1}, 0, 2)
    x2 = Nx.random_uniform({batch_size, 1}, 0, 2)
    # Compute the expected XOR result for each pair
    y = Nx.logical_xor(x1, x2)
    {%{"x1" => x1, "x2" => x2}, y}
  end)
```

Here’s how a sample batch looks:

```
Enum.at(data, 0)
```

## Training

It’s time to train our model. We use *binary cross entropy* for the loss and *stochastic gradient descent* as the optimizer. Binary cross entropy is a natural choice here, because we can treat computing XOR as a binary classification problem: we want the output to be a binary label, `0` or `1`, and binary cross entropy is typically used in these cases. Having defined our training loop, we run it with `Axon.Loop.run/4`.
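
For a single example with label $y \in \{0, 1\}$ and predicted probability $\hat{y}$, binary cross entropy is $-\left(y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\right)$, which pushes the sigmoid output towards the correct label.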

```
epochs = 10

params =
  model
  |> Axon.Loop.trainer(:binary_cross_entropy, :sgd)
  |> Axon.Loop.run(data, %{}, epochs: epochs, iterations: 1000)
```

## Trying the model

Finally, we can test our model on sample data.

```
Axon.predict(model, params, %{
  "x1" => Nx.tensor([[0]]),
  "x2" => Nx.tensor([[1]])
})
```

Try other combinations of $x_1$ and $x_2$ and see what the output is; one way to check all four binary combinations at once is sketched below. To improve the model's performance, you can increase the number of training epochs.
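
Here is a minimal sketch that evaluates all four binary input pairs and rounds the sigmoid output to a class (the `for` comprehension and the `Nx.to_flat_list/1` trick are just one convenient way to present the results):

```
for {a, b} <- [{0, 0}, {0, 1}, {1, 0}, {1, 1}] do
  y =
    Axon.predict(model, params, %{
      "x1" => Nx.tensor([[a]]),
      "x2" => Nx.tensor([[b]])
    })

  # Round the sigmoid output to the predicted class
  {a, b, y |> Nx.round() |> Nx.to_flat_list() |> hd()}
end
```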

## Visualizing the model predictions

The original XOR operation only works with the binary values $0$ and $1$; our model, however, operates in a continuous space. This means that we can give it $x_1 = 0.5$, $x_2 = 0.5$ as input and we expect *some* output. We can use this to visualize the non-linear relationship between the inputs $x_1$, $x_2$ and the output that our model has learned.

```
# The number of points per axis; determines the resolution
n = 50

# Generate coordinates of inputs in the (n x n) grid
x1 = Nx.iota({n, n}, axis: 0) |> Nx.divide(n) |> Nx.reshape({:auto, 1})
x2 = Nx.iota({n, n}, axis: 1) |> Nx.divide(n) |> Nx.reshape({:auto, 1})

# The output is also a real number, but we round it into one of the two classes
y = Axon.predict(model, params, %{"x1" => x1, "x2" => x2}) |> Nx.round()

Vl.new(width: 300, height: 300)
|> Vl.data_from_values(
  x1: Nx.to_flat_list(x1),
  x2: Nx.to_flat_list(x2),
  y: Nx.to_flat_list(y)
)
|> Vl.mark(:circle)
|> Vl.encode_field(:x, "x1", type: :quantitative)
|> Vl.encode_field(:y, "x2", type: :quantitative)
|> Vl.encode_field(:color, "y", type: :nominal)
```

From the plot we can clearly see that during training our model learned two clean boundaries to separate $(0,0)$ and $(1,1)$ from $(0,1)$ and $(1,0)$.