Oban Training—Ready for Production
Mix.install([:oban, :postgrex])
Logger.configure(level: :info)
Application.put_env(:chow_mojo, ChowMojo.Repo, url: "postgres://localhost:5432/chow_mojo_dev")
defmodule ChowMojo.Repo do
  use Ecto.Repo, otp_app: :chow_mojo, adapter: Ecto.Adapters.Postgres
end

defmodule ChowMojo.ObanCase do
  use ExUnit.CaseTemplate

  using do
    quote do
      use Oban.Testing, repo: ChowMojo.Repo
    end
  end
end
ChowMojo.Repo.start_link()
🏅 Goals
There’s one final hurdle before your training is complete—getting ready for production. Throughout the previous exercises we’ve focused on development and testing environments, where job data is short-lived and there’s no scale to contend with. Now we’ll dig into enabling introspection and external observability, and maintaining database health.
Managing Jobs
Job introspection and uniqueness are built on keeping job rows in the database after they have completed. To prevent the oban_jobs table from growing indefinitely, the Pruner plugin provides out-of-band deletion of completed, cancelled, and discarded jobs.
Include Pruner in the list of plugins and configure it to retain jobs for 7 days:
plugins = [{Oban.Plugins.Pruner, max_age: 60 * 60 * 24 * 7}]
# Your turn...
plugins = []
Oban.Config.validate(plugins: plugins)
During deployment or unexpected node restarts, jobs may be left in an executing state indefinitely. We call these jobs “orphans”, but orphaning isn’t a bad thing: it means the job wasn’t lost and may be retried when the system comes back online.
There are two mechanisms to mitigate orphans:
- Increase the shutdown_grace_period to allow the system more time to finish executing before shutdown (see the sketch after this list).
- Use the Lifeline plugin to automatically move those jobs back to available so they can run again.
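For reference, shutdown_grace_period is a top-level Oban option measured in milliseconds. A minimal sketch, mirroring the validation calls used elsewhere in this notebook (the 30 second value is purely illustrative):
# Give running jobs up to 30 seconds to finish before shutdown.
# The option takes milliseconds, so :timer.seconds/1 keeps the intent readable.
Oban.Config.validate(shutdown_grace_period: :timer.seconds(30))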
Add the Lifeline plugin and configure it to rescue after 5 minutes:
plugins = [{Oban.Plugins.Lifeline, rescue_after: :timer.minutes(5)}]
# Your turn...
plugins = []
Oban.Config.validate(plugins: plugins)
Demonstrating pruning and rescuing in a notebook environment is tricky, but let’s give it a shot. Start an Oban instance that prunes jobs after a short period (e.g. 10s) and rescues orphans after an even shorter one (e.g. 1000ms).
Oban.start_link(
repo: ChowMojo.Repo,
queues: [default: 10],
plugins: [{Oban.Plugins.Pruner, max_age: 10}, {Oban.Plugins.Lifeline, rescue_after: 1_000}]
)
defmodule SleepyWorker do
  use Oban.Worker

  @impl Oban.Worker
  def perform(_job) do
    # Sleep up to 3 seconds to simulate a long-running job
    Process.sleep(:rand.uniform(3000))

    :ok
  end
end
# Your turn...
# Oban.start_link()
0..10
|> Enum.map(&SleepyWorker.new(%{id: &1}))
|> Oban.insert_all()
Start evaluating the cell below to watch as jobs complete, get rescued as orphans (erroneously, after only 1s), and are finally deleted after 10s. Reevaluate the cell above to seed more jobs if you’ve missed it.
Oban.Job
|> ChowMojo.Repo.all()
|> Enum.map(&Map.take(&1, [:id, :state, :attempt, :completed_at]))
Telemetry & Logging
Oban heavily utilizes Telemetry for instrumentation at every level. From job execution and plugin activity through to every database call, there’s a telemetry event to hook into.
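For instance, hooking into a single event only requires attaching a handler with :telemetry.attach/4. Here’s a minimal sketch (the handler id and printed message are arbitrary):
:telemetry.attach(
  "job-stop-example",
  [:oban, :job, :stop],
  fn _event, measure, meta, _config ->
    # Duration is reported in native time units; convert it for readability
    ms = System.convert_time_unit(measure.duration, :native, :millisecond)
    IO.puts("#{meta.job.worker} finished in #{ms}ms")
  end,
  nil
)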
The simplest way to leverage Oban’s telemetry is through the default logger. In the test below, attach the default logger and use perform_job/3 to trigger a few log events (test jobs emit the same telemetry as production). Use capture_log/1 to record the log lines and make assertions about them.
Oban.Telemetry.attach_default_logger(encode: false)
logged =
  capture_log(fn ->
    perform_job(FakeWorker, %{action: "ok"})
    perform_job(FakeWorker, %{action: "error"})
  end)
assert logged =~ "job:start"
assert logged =~ "job:stop"
assert logged =~ "job:exception"
ExUnit.start(auto_run: false)

defmodule ChowMojo.TelemetryLoggingTest do
  use ChowMojo.ObanCase

  import ExUnit.CaptureLog

  setup do
    # Capture debug-level output so the default logger's events are visible
    Logger.configure(level: :debug)
  end

  defmodule FakeWorker do
    use Oban.Worker

    @impl Oban.Worker
    def perform(%{args: %{"action" => "ok"}}), do: :ok
    def perform(%{args: %{"action" => "error"}}), do: {:error, :boom}
  end

  test "logging events triggered by executing jobs" do
    # Your turn...
  end
end

ExUnit.run()
🎉 Congratulations, you’ve made it! You’re ready to build robust background job systems with Oban and run them in production with safeguards and essential introspection.
☠️ Extra Challenges
Event spelunking
Browse Oban’s Telemetry docs and see what other types of events are available. How many can you identify? Can you guess where they’re emitted from and how they could be used? Can you build a generic handler that logs some details about all events?
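As a starting point for the generic handler, :telemetry.attach_many/4 fans a single function out over a list of events. A sketch covering just the job events (Oban emits many more):
events = [[:oban, :job, :start], [:oban, :job, :stop], [:oban, :job, :exception]]

:telemetry.attach_many(
  "generic-spelunker",
  events,
  fn event, _measure, _meta, _config ->
    # Print the full event name, e.g. [:oban, :job, :stop]
    IO.inspect(event, label: "oban event")
  end,
  nil
)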
External error reporting
Telemetry events can be used to report issues externally to services like Sentry or AppSignal. Write a handler that sends error notifications to a third party (use a mock, or something that sends a message back to the test process). Choose a subset of the job’s fields to include in the notification, and optionally only deliver on the final attempt (attempt == max_attempts).
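A sketch of the message-passing approach, where the test process registers itself before the job runs (the handler id and the choice of fields are arbitrary):
test_pid = self()

:telemetry.attach(
  "test-error-reporter",
  [:oban, :job, :exception],
  fn _event, _measure, %{job: job}, _config ->
    # Optionally restrict notifications to the final attempt
    if job.attempt == job.max_attempts do
      send(test_pid, {:error_report, Map.take(job, [:id, :worker, :args, :attempt])})
    end
  end,
  nil
)

Back in the test, an assert_receive {:error_report, %{worker: _}} verifies that the notification was delivered.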