Scalar Optimization with Scholar.Optimize


Mix.install([
  {:scholar, path: "."},
  {:nx, "~> 0.9"},
  {:kino_vega_lite, "~> 0.1"}
])

Introduction

Scholar provides two algorithms for scalar (univariate) function minimization:

  • Scholar.Optimize.Brent - Brent’s method (recommended)
  • Scholar.Optimize.GoldenSection - Golden Section search

Both find the minimum of a function of one variable within a specified bracket [a, b].

Recommendation: Use Brent’s method for most applications. It combines the reliability of Golden Section search with the faster convergence of parabolic interpolation, typically requiring 3-5x fewer function evaluations on smooth functions.

Brent’s Method (Recommended)

Brent’s method is the default choice for scalar optimization. It intelligently combines:

  • Parabolic interpolation for fast convergence when the function is smooth
  • Golden section fallback for guaranteed convergence

Simple Parabola

Let’s minimize $f(x) = (x - 3)^2$, which has its minimum at $x = 3$.

alias Scholar.Optimize.Brent

# Define the objective function
fun = fn x -> Nx.pow(Nx.subtract(x, 3), 2) end

# Find the minimum using Brent's method
result = Brent.minimize(0.0, 5.0, fun)

IO.puts("Minimum found at x = #{Nx.to_number(result.x)}")
IO.puts("Function value: #{Nx.to_number(result.fun)}")
IO.puts("Converged: #{Nx.to_number(result.converged) == 1}")
IO.puts("Iterations: #{Nx.to_number(result.iterations)}")
IO.puts("Function evaluations: #{Nx.to_number(result.fun_evals)}")

Minimizing a Trigonometric Function

Find the minimum of $\sin(x)$ in the interval $[3, 5]$ (around $3\pi/2$):

alias Scholar.Optimize.Brent

fun = fn x -> Nx.sin(x) end

result = Brent.minimize(3.0, 5.0, fun)

expected_x = 3 * :math.pi() / 2  # 3π/2 ≈ 4.712

IO.puts("Minimum found at x = #{Nx.to_number(result.x)}")
IO.puts("Expected: #{expected_x}")
IO.puts("sin(x) at minimum: #{Nx.to_number(result.fun)}")

Shifted Parabola

Minimize $f(x) = (x + 2)^2 + 1$, which has minimum at $x = -2$ with value $1$:

alias Scholar.Optimize.Brent

fun = fn x -> Nx.add(Nx.pow(Nx.add(x, 2), 2), 1) end

result = Brent.minimize(-5.0, 5.0, fun)

IO.puts("Minimum found at x = #{Nx.to_number(result.x)}")
IO.puts("Function value: #{Nx.to_number(result.fun)}")
IO.puts("Expected: x = -2, f(x) = 1")

Brent vs Golden Section: Performance Comparison

Brent’s method typically converges much faster than Golden Section:

alias Scholar.Optimize.{Brent, GoldenSection}

fun = fn x -> Nx.pow(Nx.subtract(x, 3), 2) end

brent_result = Brent.minimize(0.0, 5.0, fun)
golden_result = GoldenSection.minimize(0.0, 5.0, fun)

IO.puts("Simple parabola (x-3)^2 on [0, 5]:")
IO.puts("  Brent:         #{Nx.to_number(brent_result.fun_evals)} function evaluations")
IO.puts("  Golden Section: #{Nx.to_number(golden_result.fun_evals)} function evaluations")
IO.puts("")
IO.puts("Both find x ≈ 3.0:")
IO.puts("  Brent x:  #{Nx.to_number(brent_result.x)}")
IO.puts("  Golden x: #{Nx.to_number(golden_result.x)}")

Wider Bracket Comparison

The advantage of Brent becomes more pronounced with wider brackets:

alias Scholar.Optimize.{Brent, GoldenSection}

fun = fn x -> Nx.pow(Nx.subtract(x, 50), 2) end

brent_result = Brent.minimize(0.0, 100.0, fun)
golden_result = GoldenSection.minimize(0.0, 100.0, fun)

IO.puts("Parabola (x-50)^2 on [0, 100]:")
IO.puts("  Brent:         #{Nx.to_number(brent_result.fun_evals)} function evaluations")
IO.puts("  Golden Section: #{Nx.to_number(golden_result.fun_evals)} function evaluations")
IO.puts("")
IO.puts("Speedup: #{Float.round(Nx.to_number(golden_result.fun_evals) / Nx.to_number(brent_result.fun_evals), 1)}x fewer evaluations")

How Brent’s Method Works

Brent’s method tracks six points during optimization:

  • a, b: Current bracket bounds
  • x: Best point found so far (lowest function value)
  • w: Second-best point
  • v: Previous value of w
  • u: Most recently evaluated trial point

At each iteration:

  1. Try parabolic interpolation through x, w, v
  2. If the parabolic step is acceptable (within bounds, making progress), use it
  3. Otherwise, take a golden section step into the larger segment
  4. Update the bracket and tracked points

This gives:

  • Superlinear convergence when the function is smooth (parabolic steps)
  • Guaranteed convergence when it’s not (golden section fallback)

alias Scholar.Optimize.Brent

fun = fn x -> Nx.pow(Nx.subtract(x, 1), 2) end

result = Brent.minimize(-5.0, 5.0, fun, tol: 1.0e-10, maxiter: 100)

IO.puts("Result from Brent.minimize:")
IO.puts("  x = #{Nx.to_number(result.x)}")
IO.puts("  f(x) = #{Nx.to_number(result.fun)}")
IO.puts("  Iterations: #{Nx.to_number(result.iterations)}")
IO.puts("  Function evaluations: #{Nx.to_number(result.fun_evals)}")

Golden Section Search

Golden Section search is included primarily for educational purposes. It uses the inverse golden ratio $\phi = \frac{\sqrt{5} - 1}{2} \approx 0.618$ to place its probe points inside the bracket.

Key properties:

  • Guaranteed convergence for unimodal functions
  • Bracket width decreases by factor $\phi \approx 0.618$ per iteration
  • Linear convergence rate
  • Requires only function evaluations (no derivatives)

alias Scholar.Optimize.GoldenSection

fun = fn x -> Nx.pow(Nx.subtract(x, 1), 2) end

result = GoldenSection.minimize(-5.0, 5.0, fun, tol: 1.0e-10, maxiter: 100)

IO.puts("Result from GoldenSection.minimize:")
IO.puts("  x = #{Nx.to_number(result.x)}")
IO.puts("  f(x) = #{Nx.to_number(result.fun)}")
IO.puts("  Iterations: #{Nx.to_number(result.iterations)}")
IO.puts("  Function evaluations: #{Nx.to_number(result.fun_evals)}")

GPU/JIT Compatibility

Both methods are implemented as pure defn functions, making them fully compatible with JIT compilation and GPU execution:

alias Scholar.Optimize.Brent

fun = fn x -> Nx.pow(Nx.subtract(x, 3), 2) end
opts = [tol: 1.0e-8, maxiter: 500]

# This works with jit_apply for GPU acceleration
result = Nx.Defn.jit_apply(&Brent.minimize/4, [0.0, 5.0, fun, opts])

IO.puts("JIT-compiled result:")
IO.puts("  x = #{Nx.to_number(result.x)}")
IO.puts("  Converged: #{Nx.to_number(result.converged) == 1}")

Options

Tolerance

Control convergence tolerance:

alias Scholar.Optimize.Brent

fun = fn x -> Nx.pow(Nx.subtract(x, 2), 2) end

# Default tolerance (1.0e-5)
result_default = Brent.minimize(0.0, 5.0, fun)

# Looser tolerance (faster)
result_loose = Brent.minimize(0.0, 5.0, fun, tol: 1.0e-3)

IO.puts("Default tolerance (1e-5): #{Nx.to_number(result_default.fun_evals)} function evals")
IO.puts("Loose tolerance (1e-3): #{Nx.to_number(result_loose.fun_evals)} function evals")

Maximum Iterations

Limit the number of iterations:

alias Scholar.Optimize.Brent

fun = fn x -> Nx.pow(Nx.subtract(x, 2), 2) end

result = Brent.minimize(0.0, 100.0, fun, maxiter: 5)

IO.puts("With maxiter: 5")
IO.puts("  Converged: #{Nx.to_number(result.converged) == 1}")
IO.puts("  Iterations used: #{Nx.to_number(result.iterations)}")

Higher Precision with f64

For higher precision, use f64 tensors for bounds:

alias Scholar.Optimize.Brent

fun = fn x -> Nx.pow(Nx.subtract(x, 3), 2) end

# f32 precision (default)
result_f32 = Brent.minimize(0.0, 5.0, fun)

# f64 precision
a = Nx.tensor(0.0, type: :f64)
b = Nx.tensor(5.0, type: :f64)
result_f64 = Brent.minimize(a, b, fun, tol: 1.0e-12)

IO.puts("f32 result: x = #{Nx.to_number(result_f32.x)}")
IO.puts("f64 result: x = #{Nx.to_number(result_f64.x)}")
IO.puts("f64 error:  #{abs(Nx.to_number(result_f64.x) - 3.0)}")

Result Struct

Both Brent.minimize/4 and GoldenSection.minimize/4 return structs with the same fields:

alias Scholar.Optimize.Brent

fun = fn x -> Nx.pow(Nx.subtract(x, 3), 2) end

result = Brent.minimize(0.0, 5.0, fun)

IO.puts("Result struct fields:")
IO.puts("  x:          #{Nx.to_number(result.x)} (solution)")
IO.puts("  fun:        #{Nx.to_number(result.fun)} (function value at solution)")
IO.puts("  converged:  #{Nx.to_number(result.converged)} (1 = yes, 0 = no)")
IO.puts("  iterations: #{Nx.to_number(result.iterations)}")
IO.puts("  fun_evals:  #{Nx.to_number(result.fun_evals)}")

When to Use Which Method

Method          Use Case
Brent           Default choice for all scalar optimization
Golden Section  Educational purposes, or when you need guaranteed linear convergence behavior

Both methods:

  • Work with any continuous unimodal function
  • Require no derivatives
  • Are JIT/GPU compatible
  • Guarantee convergence to a point within the specified bracket, given a sufficient iteration budget