IMDB Reviews Classification - Conv1D
Introduction
The Internet Movie Database (IMDB) contains a wealth of data on the movies we all know and love (or hate; you do you). One of the most common ML tasks the dataset is used for involves natural language processing (NLP) to see if we can properly predict moviegoers’ sentiments towards a flick. To this end, Stanford AI Labs has compiled 50k reviews and labeled them as positive or negative for the NLP community to tinker with.
For fun (yes, FUN), we’re going to tackle this challenge using a convolutional network powered by word vectors. We’ll get to see how Elixir is well suited to the task of processing textual data, and how Nx complements this with performant numerical computation. Finally we’ll actually train and test a DL model using the Axon deep learning framework powered by the EXLA backend (which supports CPU, GPU, and yes TPU compute). Visualizations are provided by VegaLite and Kino.
Oh, and for the very observant observer - you might have noticed this isn’t Jupyter. Nope, we’re going to do all this in Livebook Server, the Elixir ecosystem’s crash-tolerant, concurrent, and distributed collaborative experimentation server.
And this is not a “notebook.”
It is a Livebook.
Setup
As of Elixir 1.12 we can install packages on-the-fly instead of having to build a complete Mix project around our livebook (not that you couldn’t do that as well).
Here we install our core ML ecosystem (Nx, EXLA, Axon), a dataset library (Scidata), stub for a native NLP toolkit (NLTEx), and visualization tools (Kino, VegaLite).
Mix.install(
[
# Core ecosystem components
{:nx, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "nx", override: true},
{:exla, "~> 0.1.0-dev", github: "elixir-nx/nx", sparse: "exla", override: true},
{:axon, "~> 0.1.0-dev", github: "elixir-nx/axon", ref: "ff02898", override: true},
# Dataset package
{:scidata, "~> 0.1.2"},
# Natural Language Toolkit for Elixir
{:nltex, "~> 0.1.0-dev", github: "arpieb/nltex", branch: "master", override: true},
# Visualizations
{:kino, "~> 0.3.0"},
{:vega_lite, "~> 0.1.0"}
]#, force: true
)
Now we need to set some livebook-wide config settings that would otherwise be part of a Mix project configuration.
# Explicitly configure EXLA to compile all computation graphs for the CPU by default
Application.put_env(:exla, :clients, default: [platform: :host])
# Macro initialization for Axon
require Axon
# Short name for using VegaLite
alias VegaLite, as: Vl
Retrieve IMDB movie reviews
Now we need to load the IMDB Reviews dataset provided by Stanford AI to get our train/test data. We’re going to use the Scidata package to pull in the labeled review text (labels are 0 for negative reviews, 1 for positive reviews).
Please hold…
reviews = Scidata.IMDBReviews.download()
# Sanity check to make sure we have the same number of reviews and labels
IO.puts(length(reviews.review))
IO.puts(length(reviews.sentiment))
Retrieve Stanford NLP’s GloVe word vectors
Good ol’ Stanford AI Labs… They are also home to one of the leading NLP research organizations, and as such have nice collections of word vectors we can leverage for our model.
Think of word vectors as lists of values assigned to an English word that 1. uniquely identify that word, yet 2. allow it to be mathematically related to other words based on how it is generally used in context. This is the secret sauce that will allow us to train a network quickly to recognize “patterns of meaning” in small snippets of reviews.
We’re going to pull in one of their smaller Global Vectors for Word Representation (GloVe) models, trained on 6B tokens from Wikipedia in 2014, with 50-dimension vectors per word.
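Before the download, here’s a tiny, hypothetical illustration of the idea: two made-up 5-value “word vectors” and their cosine similarity computed with Nx. Words used in similar contexts end up with vectors whose similarity is close to 1.0 (the vector values below are pure fiction, just to show the arithmetic; the real GloVe vectors are 50-dimensional and learned from actual text).
# NOTE: these vectors are invented purely for illustration
king = Nx.tensor([0.12, -0.53, 0.88, 0.04, -0.31])
queen = Nx.tensor([0.10, -0.49, 0.91, 0.01, -0.27])
# Cosine similarity: dot product over the product of the vector norms
dot = Nx.sum(Nx.multiply(king, queen))
norms = Nx.multiply(Nx.sqrt(Nx.sum(Nx.multiply(king, king))), Nx.sqrt(Nx.sum(Nx.multiply(queen, queen))))
Nx.divide(dot, norms)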
(Note there’s some crude local caching going on here; the particular server in question doesn’t honor the If-Modified-Since HTTP request header and always returns the file, so…)
If not already cached - please hold…
# Quick-n-dirty caching since apparently Stanford's data server doesn't honor the
# "If-Modified-Since" request header leveraged by Scidata.
glove_cache_file = "/tmp/glove_6b_50d.bin"
read_result =
File.open(glove_cache_file, [:read, :binary], fn ifs ->
IO.binread(ifs, :all)
|> :erlang.binary_to_term()
end)
w2v =
case read_result do
{:ok, data} ->
data
{:error, _} ->
data = NLTEx.WordVectors.GloVe.download(:glove_6b, 50)
File.open!(glove_cache_file, [:write, :binary], fn ofs ->
IO.binwrite(ofs, data |> :erlang.term_to_binary())
end)
data
end
# Verify we have 400k tokens in our word vector vocabulary
w2v.wordvecs |> Map.keys() |> length()
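Assuming wordvecs is a map keyed by token (which the key count check above suggests), we can also peek at a single entry to get a feel for the data. The token "movie" is an arbitrary choice here; Map.get would simply return nil if it were missing.
# Peek at one word vector entry from the loaded GloVe vocabulary
w2v.wordvecs |> Map.get("movie") |> IO.inspect(label: "vector for \"movie\"")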
Prepare review text data
Now that we have all the raw materials, we need to prep them for use in a convolutional neural network. This means we have to:
- Downcase all text
- Split each review into “tokens” using the age-old whitespace approach (‘cause it’s quick-n-dirty, and in the end doesn’t make that much difference for our use case; a rough sketch of the idea follows this list)
- Convert each list of tokens into a list of vectors
- Pad and aggregate these vectors into a single rank 3 tensor for use in our model
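Here’s roughly what that whitespace tokenization boils down to; the NLTEx tokenizer used in the module below may differ in its details, and the review snippet is made up, so this is just the general idea:
# Quick-n-dirty whitespace tokenization on a made-up review snippet
"Loved it. Best popcorn flick of the year!"
|> String.downcase()
|> String.split(~r/\s+/, trim: true)
# => ["loved", "it.", "best", "popcorn", "flick", "of", "the", "year!"]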
defmodule TransformReviews do
# Define the default Nx backend compiler to use for `defn` numeric functions
@default_defn_compiler EXLA
# Import the macros for defining numeric functions
import Nx.Defn
@doc ~S"""
Preprocess reviews for Conv1D model consumption
"""
def transform(reviews, w2v) do
reviews
|> Stream.map(&String.downcase/1)
|> Stream.map(&NLTEx.Tokenizer.Simple.transform/1)
|> Stream.map(fn tokens -> vectorize_review(tokens, w2v) end)
|> Enum.to_list()
|> Nx.stack()
end
def vectorize_review(tokens, w2v) do
tokens
|> NLTEx.WordVectors.vectorize_tokens(w2v)
|> Nx.stack()
# Pad each review out to a fixed 32-token window (negative trailing padding trims longer reviews)
|> nx_proc(
lead: 0,
trail: 32 - length(tokens)
)
end
# Break out tensor-based numeric computation for backend compilation
defnp nx_proc(t, opts \\ []) do
t
|> Nx.pad(0.0, [{opts[:lead], opts[:trail], 0}, {0, 0, 0}])
# Transpose from {tokens, vec_dims} to {vec_dims, tokens} so word-vector dimensions act as channels
|> Nx.transpose()
end
end
vec_reviews =
reviews.review
|> TransformReviews.transform(w2v)
Prep sentiment labels
Similarly, we need to preprocess our labels to get them into one-hot class vectors.
(Sure, we could have made this a binary labeling example, but where’s the fun in that?)
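To see the one-hot trick in isolation before applying it to the full sentiment list, here’s a tiny standalone example: comparing a column of labels against Nx.iota({2}) broadcasts into one indicator vector per label.
# Three example labels: negative, positive, positive
Nx.tensor([0, 1, 1])
|> Nx.new_axis(-1)
|> Nx.equal(Nx.iota({2}))
# => [[1, 0], [0, 1], [0, 1]] - one row per label, one column per class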
labels =
reviews.sentiment
|> Enum.take(vec_reviews |> Nx.shape() |> elem(0))
|> Nx.tensor(type: {:f, 32})
|> Nx.new_axis(-1)
|> Nx.equal(Nx.iota({2}))
# Sanity check, confirm we have equal numbers of labels per the dataset reference
Nx.sum(labels, axes: [0])
Construct a classification model and train it!
^^ See that branch from “Prep sentiment labels” above?
That means we’ve forked a new process in Elixir, so if anything goes sideways in this section it can’t take down the main livebook process, yet it inherits the environment from the final state in the “from” section.
Recovery is as simple as re-evaluating this section to pick up the parent section’s state and do its thing.
Define model
Here is where the DL magic happens! We’re going to construct a 1D convolutional network (Conv1D for short) where the first dimension is our words (the first 32 from a review) and the “depth” of each feature is 50 (the size of the word vectors we loaded and transformed above).
You might ask: why not just use a Conv2D model here? That kind of model is optimized for 2D spatial patterns such as images, and while it could be used, it’s not really the right model for a one-dimensional dataset like sentences. Additionally, it would require a rank 4 tensor (NCHW), which means you’d have to add yet another layer of padding to the data, which just doesn’t smell right either.
Always trust your nose in sushi restaurants and DL architectures.
# Define convolution options for convenience
conv_opts = [
kernel_size: 3,
activation: :relu
]
# Define pooling options for convenience
max_pool_opts = [
kernel_size: 3,
stride: 3
]
# Collect shape info for model definition
num_words = vec_reviews |> Nx.shape() |> elem(2)
vec_size = vec_reviews |> Nx.shape() |> elem(1)
# Construct the model using a functional pipeline
model =
Axon.input({nil, vec_size, num_words})
|> Axon.conv(16, conv_opts)
|> Axon.max_pool(max_pool_opts)
|> Axon.flatten()
|> Axon.dense(224, activation: :relu)
|> Axon.dense(2, activation: :softmax)
Check out the nice model summary above! Notice that Axon unrolls the layers for you so you can see the compute layer as well as the activation layer for each conceptual “layer” of a network.
The Shape column is the output shape of that layer, and the Parameters column reports the number of trainable parameters that exist in that layer.
Being able to see the shape of what you have constructed is vital, as it 1. guides sizing for your DL environment, and 2. helps you analyze the complexity of your network. Note that this turns out to be a relatively small model in terms of trainable parameters, yet we’ll see it’s still pretty effective!
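If you want to sanity-check one of those numbers yourself, a conv layer’s parameter count is kernel_size × input channels × filters, plus one bias per filter. Here’s a quick back-of-the-envelope check for the first conv layer (assuming the summary reports parameters the same way):
# Back-of-the-envelope parameter count for the first conv layer
kernel_size = 3
in_channels = 50    # word-vector dimension
filters = 16
kernel_size * in_channels * filters + filters
# => 2416 trainable parameters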
Setting up a visualization (not a requirement, but nice to have!)
Before we actually start training our model, we’re going to define a pair of graphs and their related callbacks to plug into the Axon training loop. This is going to leverage two new Elixir packages, VegaLite (as Vl) and Kino.
If we get this right, we’re going to have realtime graphs of our training losses at both the epoch and batch levels. For those who have stared at enough ML training dashboards external to their modeling system, you know just how cool this is. In Livebook this is fairly trivial to add thanks to how Axon, Kino, and VegaLite play together. Ultimately this is made possible by the first-class concurrency capabilities baked into Elixir and the BEAM via the OTP model.
defmodule TrainingVis do
@moduledoc ~S"""
Convenience module to bundle up visualization logic + Axon training callbacks
"""
defstruct [:epoch_widget, :batch_widget]
@doc ~S"""
Registers our custom callbacks with Axon's training step, along with info about our vis widgets
"""
def register_callbacks(step) do
opts = %__MODULE__{
epoch_widget: create_widget("Epoch Losses", "epoch", "loss"),
batch_widget: create_widget("Batch Losses", "batch", "loss")
}
Axon.Training.callback(step, &callback/3, :all, training_vis: opts)
end
# Util method to create consistent widgets
defp create_widget(title, xlabel, ylabel) do
Vl.new(width: 400, height: 300, title: title)
|> Vl.mark(:line)
|> Vl.encode_field(:x, "x", type: :quantitative, title: xlabel)
|> Vl.encode_field(:y, "y", type: :quantitative, title: ylabel)
|> Kino.VegaLite.new()
|> tap(&Kino.render/1)
end
# Callback that fires at the beginning of each epoch to clear the batch widget
def callback(train_state, :before_epoch, opts) do
widget = opts[:training_vis].batch_widget
Kino.VegaLite.clear(widget)
{:cont, train_state}
end
# Callback that fires at the end of each epoch to update epoch loss widget
def callback(train_state, :after_epoch, opts) do
widget = opts[:training_vis].epoch_widget
epoch = Nx.to_scalar(train_state[:epoch]) + 1
epoch_loss = Nx.to_scalar(train_state[:epoch_loss])
point = %{x: epoch, y: epoch_loss}
Kino.VegaLite.push(widget, point)
{:cont, train_state}
end
# Callback that fires at the end of each batch to update batch loss widget
def callback(train_state, :after_batch, opts) do
widget = opts[:training_vis].batch_widget
epoch_step = Nx.to_scalar(train_state[:epoch_step]) + 1
epoch_loss = Nx.to_scalar(train_state[:epoch_loss])
point = %{x: epoch_step, y: epoch_loss}
Kino.VegaLite.push(widget, point)
{:cont, train_state}
end
# Catch-all callback to allow unmatched events to pass through
def callback(train_state, _event, _opts), do: {:cont, train_state}
end
Training Day
Now that we have prepared training samples and their related labels, a model, and a visualization - let’s do something with them! The next code block performs the following:
- Batches up our samples
- Shuffles the batches (way more efficient than shuffling the entire dataset, and the end results are not much different in this case)
- Defines the training step with loss, optimizer, and any optional metrics we want reported
- Registers the visualization callbacks
- Compiles and executes the defined model and step function using EXLA to run on the target platform (CPU in our case, but could be GPU or TPU in the right environment)
# Set common batch size
batch_size = 32
# Split training data and labels into batches
train_x =
vec_reviews
|> Nx.to_batched_list(batch_size)
train_y =
labels
|> Nx.to_batched_list(batch_size)
# Shuffle batches
{train_x, train_y} =
train_x
|> Enum.zip(train_y)
|> Enum.shuffle()
|> Enum.unzip()
# Fit the model and store off result data
final_params =
model
|> Axon.Training.step(
:categorical_cross_entropy,
Axon.Optimizers.adam(0.005),
metrics: [:accuracy]
)
|> TrainingVis.register_callbacks()
|> Axon.Training.train(train_x, train_y, epochs: 10, compiler: EXLA)
Like most other DL frameworks, Axon provides incremental training performance feedback. We can also see that when it finishes up, final_params holds quite a bit more state than just the final model parameters. This allows for a nice post mortem analysis of the training run, whereas most frameworks don’t make that information readily available without jumping through some hoops.
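If you’re curious what else came back, you can peek at the keys of the returned training state; the exact contents depend on the Axon version in use, so treat this as an exploratory poke rather than a stable API.
# List whatever the training run handed back besides the parameters
final_params |> Map.keys() |> IO.inspect(label: "training state keys")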
At this point we should have a model with ~70-80% accuracy. More epochs might have improved performance, but the literature reports mid-80s for this approach in most cases, so… yeah.
Testing our model
For those who caught it, I committed a major ML sin above - I didn’t split out train/test sets. In production that would be a very bad idea but in this case I didn’t want us to get lost in the weeds. Besides, how hard is it to cook up some sample reviews ourselves, amiright?
our_reviews = [
# First three reviews from IMDB; should all be positive (class 1)
"The story centers around Barry McKenzie who must go to England if he wishes to claim his inheritance. Being about the grossest Aussie shearer ever to set foot outside this great Nation of ours there is something of a culture clash and much fun and games ensue. The songs of Barry McKenzie(Barry Crocker) are highlights.",
"'The Adventures Of Barry McKenzie' started life as a satirical comic strip in 'Private Eye', written by Barry Humphries and based on an idea by Peter Cook. McKenzie ( 'Bazza' to his friends ) is a lanky, loud, hat-wearing Australian whose two main interests in life are sex ( despite never having had any ) and Fosters lager. In 1972, he found his way to the big screen for the first of two outings. It must have been tempting for Humphries to cast himself as 'Bazza', but he wisely left the job to Barry Crocker ( later to sing the theme to the television soap opera 'Neighbours'! ). Humphries instead played multiple roles in true Peter Sellers fashion, most notably Bazza's overbearing Aunt 'Edna Everage' ( this was before she became a Dame ).
You know this is not going to be 'The Importance Of Being Ernest' when its censorship classification N.P.A. stands for 'No Poofters Allowed'. Pom-hating Bazza is told by a Sydney solicitor that in order to inherit a share in his father's will he must go to England to absorb British culture. With Aunt Edna in tow, he catches a Quantas flight to Hong Kong, and then on to London. An over-efficient customs officer makes Bazza pay import duties on everything he bought over there, including a suitcase full of 'tubes of Fosters lager'. As he puts it: \"when it comes to fleecing you, the Poms have got the edge on the gyppos!\". A crafty taxi driver ( Bernard Spear ) maximises the fare by taking Bazza and Edna first to Stonehenge, then Scotland. The streets of London are filthy, and their hotel is a hovel run by a seedy landlord ( Spike Milligan ) who makes Bazza put pound notes in the electricity meter every twenty minutes. There is some good news for our hero though; he meets up with other Aussies in Earls Court, and Fosters is on sale in British pubs.
What happens next is a series of comical escapades that take Bazza from starring in his own cigarette commercial, putting curry down his pants in the belief it is some form of aphrodisiac, a bizarre encounter with Dennis Price as an upper-class pervert who loves being spanked while wearing a schoolboy's uniform, a Young Conservative dance in Rickmansworth to a charity rock concert where his song about 'chundering' ( vomiting ) almost makes him an international star, and finally to the B.B.C. T.V. Centre where he pulls his pants down on a live talk-show hosted by the thinking man's crumpet herself, Joan Bakewell. A fire breaks out, and Bazza's friends come to the rescue - downing cans of Fosters, they urinate on the flames en masse.
This is a far cry from Bruce Beresford's later works - 'Breaker Morant' and 'Driving Miss Daisy'. On release, it was savaged by critics for being too 'vulgar'. Well, yes, it is, but it is also great non-P.C. fun. 'Bazza' is a disgusting creation, but his zest for life is unmistakable, you cannot help but like the guy. His various euphemisms for urinating ( 'point Percy at the porcelain' ) and vomiting ( 'the Technicolour yawn' ) have passed into the English language without a lot of people knowing where they came from. Other guest stars include Dick Bentley ( as a detective who chases Bazza everywhere ), Peter Cook, Julie Covington ( later to star in 'Rock Follies' ), and even future arts presenter Russell Davies.
A sequel - the wonderfully-named 'Barry McKenzie Holds His Own - came out two years later. At its premiere, Humphries took the opportunity to blast the critics who had savaged the first film. Good for him.
What must have been of greater concern to him, though, was the release of 'Crocodile Dundee' in 1985. It also featured a lanky, hat-wearing Aussie struggling to come to terms with a foreign culture. And made tonnes more money.
The song on the end credits ( performed by Snacka Fitzgibbon ) is magnificent. You have a love a lyric that includes the line: \"If you want to send your sister in a frenzy, introduce her to Barry McKenzie!\". Time to end this review. I have to go the dunny to shake hands with the unemployed...",
"This film and it's sequel Barry Mckenzie holds his own, are the two greatest comedies to ever be produced. A great story a young Aussie bloke travels to england to claim his inheritance and meets up with his mates, who are just as loveable and innocent as he is.
It's chock a block full of great, sayings , where else could you find someone who needs a drink so bad that he's as dry as a dead dingoes donger? great characters, top acting, and it's got great sheilas and more Fosters consumption then any other three films put together. Top notch.
And some of the funniest songs you'll ever hear, and it's full of great celebrities. Definitely my two favourite films of all time, I watch them at least once a fortnight.",
# Made up reviews - what do you think they should be?
# (class 0 == negative, class 1 == positive)
"This movie was great really enjoyed it loved it",
"Meh, I could take it or leave it",
"this movie sucked, 2 hours of my life i'll never get back"
]
vec_our_reviews =
our_reviews
|> TransformReviews.transform(w2v)
Axon.predict(model, final_params[:params], vec_our_reviews, compiler: EXLA)
|> IO.inspect()
|> Nx.argmax(axis: 1)
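As a final convenience, we can map those predicted class indices back to human-readable sentiments (class 0 is negative, class 1 is positive per the dataset’s convention); this sketch assumes Nx.to_flat_list/1 is available in this Nx version to pull plain integers out of the tensor.
# Re-run the prediction and turn class indices into sentiment atoms
Axon.predict(model, final_params[:params], vec_our_reviews, compiler: EXLA)
|> Nx.argmax(axis: 1)
|> Nx.to_flat_list()
|> Enum.map(fn
  0 -> :negative
  1 -> :positive
end)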