Monocular Depth Estimation by MiDaS v2.1

0.Original work

Intelligent Systems Lab Org:
“Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer”

Thanks a lot!!!


Implementation for Elixir/Nerves using TflInterp

1.Helper module

Create a helper module for tasks such as downloading the model.

defmodule Model do
  @model_file "midas_opt.tflite"

  @wearhouse "https://github.com/isl-org/MiDaS/releases/download/v2_1/model_opt.tflite"
  @local "/data/#{@model_file}"

  def file() do
    @local
  end

  # Download the model and write it to @local.
  # Returns :ok or {:error, reason} from File.write/2.
  def get() do
    body = Req.get!(@wearhouse).body
    File.write(@local, body)
  end

  def rm() do
    File.rm(@local)
  end

  def exists?() do
    File.exists?(@local)
  end
end

Get the tflite model from @wearhouse and store it in @local.

Model.get()
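
Model.get() downloads the file every time it is evaluated, so it can help to skip the download when the file is already present. A minimal sketch, assuming a hypothetical FetchOnce helper (not part of the original notebook) that takes the target path and a zero-arity function returning the file contents:

```elixir
defmodule FetchOnce do
  # Hypothetical helper for illustration; not part of the notebook or TflInterp.

  # Writes the result of fetch_fun to path only when path does not exist yet.
  # Returns {:ok, :cached}, {:ok, :downloaded}, or {:error, reason}.
  def ensure(path, fetch_fun) when is_function(fetch_fun, 0) do
    if File.exists?(path) do
      {:ok, :cached}
    else
      with :ok <- File.write(path, fetch_fun.()) do
        {:ok, :downloaded}
      end
    end
  end
end
```

With the Model module above, the same effect can be had directly with `unless Model.exists?(), do: Model.get()`.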

2.Defining the inference module: Midas

  • Pre-processing:
    Resize the input image to @midas_shape and convert it to a Float32 binary with pixel values normalized to the range {-2.0, 2.0}.

  • Post-processing:
    Rescale the f32 depth image between its minimum and maximum values and map it to a 0-255 grayscale image.
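
The two scaling steps described above can be sketched in plain Elixir. This is an illustration on flat lists rather than the Nx/CImg calls used in the notebook. The module MinMaxSketch and its functions are hypothetical, and the linear mapping of 8-bit pixel values in to_range/2 is an assumption about how the range option behaves:

```elixir
defmodule MinMaxSketch do
  # Hypothetical illustration module; not part of the notebook or of any library.

  # Pre-processing idea: map 8-bit pixel values (0..255) linearly onto {lo, hi}.
  # Assumption: this mirrors what a `range: {lo, hi}` normalization does.
  def to_range(pixels, {lo, hi}) do
    Enum.map(pixels, fn p -> lo + p / 255 * (hi - lo) end)
  end

  # Post-processing idea: min-max rescale depth values onto 0..255 gray levels.
  def to_gray(depths) when is_list(depths) and depths != [] do
    {dmin, dmax} = Enum.min_max(depths)
    span = dmax - dmin

    Enum.map(depths, fn d ->
      # A constant image has span 0; map it to 0 instead of dividing by zero.
      if span == 0, do: 0, else: round((d - dmin) / span * 255)
    end)
  end
end
```

For example, `MinMaxSketch.to_gray([2.0, 4.0, 10.0])` returns `[0, 64, 255]`: the smallest depth becomes black, the largest becomes white, and everything else falls in between.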

defmodule Midas do
  #use TflInterp, model: Model.file()
  use TflInterp

  @midas_shape {256, 256}

  def apply(img) do
    # preprocess
    bin =
      img
      |> CImg.resize(@midas_shape)
      |> CImg.to_binary(range: {-2.0, 2.0})

    # prediction
    outputs =
      __MODULE__
      |> TflInterp.set_input_tensor(0, bin)
      |> TflInterp.invoke()
      |> TflInterp.get_output_tensor(0)
      |> Nx.from_binary({:f, 32})
      |> Nx.reshape({256, 256})

    # postprocess
    # Global minimum and maximum of the depth map, as plain numbers.
    min = outputs |> Nx.reduce_min() |> Nx.to_number()
    max = outputs |> Nx.reduce_max() |> Nx.to_number()

    _result =
      outputs
      |> Nx.subtract(min)
      |> Nx.divide(max - min)
      |> Nx.to_binary()
      |> CImg.from_binary(256, 256, 1, 1, dtype: "<f4")
  end
end

3.Let's try it

# `img` is the input photo, already loaded as a CImg image.
img
|> Midas.apply()
|> CImg.resize({320, 240})
|> CImg.color_mapping(:jet)
|> CImg.display_kino(:jpeg)

4.TIL ;-)

Date: Feb. 5, 2022 / Nerves-livebook rpi3

Quantizing the depth image in post-processing takes a long time.

The heatmap scale has only 256 levels, so fine depth details may not be visible.

License

Copyright 2022 Shozo Fukuda. Apache License Version 2.0