Monocular Depth Estimation: MiDaS v2.1
File.cd!(__DIR__)
# for windows JP
System.shell("chcp 65001")
System.put_env("NNCOMPILED", "YES")
Mix.install([
{:tfl_interp, path: ".."},
{:cimg, "~> 0.1.19"},
{:nx, "~> 0.4.0"},
{:kino, "~> 0.7.0"}
])
0.Original work
Intelligent Systems Lab Org:
“Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer”
Thanks a lot!!!
Implementation with TflInterp in Elixir
1.Defining the inference module: Midas
-
Model
midas.tflite: get from “https://github.com/isl-org/MiDaS/releases/download/v2_1/model_opt.tflite” if not existed.
-
Pre-processing
Resize input image to {@width, @height} and normalize it to a range of {-1.0, 1.0}.
-
Post-processing
The depth image {@width, @height, 1} is outputted directly by this model.
defmodule Midas do
@width 256
@height 256
alias TflInterp, as: NNInterp
use NNInterp,
model: "./model/model_opt.tflite",
url: "https://github.com/isl-org/MiDaS/releases/download/v2_1/model_opt.tflite",
inputs: [f32: {1, @height, @width, 3}],
outputs: [f32: {1, @height, @width, 3}]
def apply(img) do
# preprocess
input0 =
CImg.builder(img)
|> CImg.resize({@width, @height})
|> CImg.to_binary(range: {-1.0, 1.0})
# prediction
output =
session()
|> NNInterp.set_input_tensor(0, input0)
|> NNInterp.invoke()
|> NNInterp.get_output_tensor(0)
|> Nx.from_binary(:f32)
|> Nx.reshape({@height, @width})
# postprocess
[min, max] =
[Nx.window_min(output, {@height, @width}), Nx.window_max(output, {@height, @width})]
|> Enum.map(&Nx.squeeze/1)
|> Enum.map(&Nx.to_number/1)
{w, h, _, _} = CImg.shape(img)
Nx.subtract(output, min)
|> Nx.divide(max - min)
|> Nx.to_binary()
|> CImg.from_binary(@width, @height, 1, 1)
|> CImg.resize({w, h})
end
end
Launch Midas
.
# TflInterp.stop(Midas)
Midas.start_link([])
Display the properties of the Midas
model.
TflInterp.info(Midas)
2.Let’s try it
Load a photo and apply Midas to it.
img = CImg.load("sample.jpg")
result =
Midas.apply(img)
|> CImg.color_mapping(:jet)
Enum.map([img, result], &CImg.display_kino(&1, :jpeg))
|> Kino.Layout.grid(columns: 2)
□