# Imports

```elixir
import DataAggregator.Kino.Helpers
```
## Using this Livebook

ℹ️ NOTE: You need to connect your Livebook instance to the running application:

```shell
iex --sname dagg --cookie secret -S mix
# or
iex --sname dagg --cookie secret -S mix phx.server
```

Then open the Runtime settings in the Livebook sidebar and change the runtime to Attached node, using the node name and cookie from above.
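Once attached, code in this notebook evaluates inside the application's node. A quick sanity check (the node name follows from the `--sname dagg` flag above; the host part depends on your machine):

```elixir
# Should return something like :dagg@your-hostname when attached correctly.
Node.self()
```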
## Intro

Imports are used to create records from a file. Currently only CSV is supported. The main functions are:

- `create_from_path(collection, file_path)` - Creates an import from a file for the given collection
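A minimal sketch of that call (the walkthrough below defines the `Import` alias and `collection`, and adds options such as `tenant:`):

```elixir
# Minimal sketch of the main entry point described above.
{:ok, import} = Import.create_from_path(collection, "path/to/file.csv")
```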
## States

The import implements a state machine and has the following states:

- `pending` - …
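Because the state is exposed as a plain atom on the import, code can branch on it directly. A small sketch (only `:pending` is confirmed above, so the catch-all stands in for the remaining states):

```elixir
# Sketch: branch on the import's current state.
case import.state do
  :pending -> "waiting to be imported"
  other -> "import is in state #{inspect(other)}"
end
```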
The following flow chart visualizes each state and transition:

```elixir
"lib/data_aggregator/records/import/import-mermaid-flowchart.md"
|> File.read!()
|> Kino.Markdown.new()
```
## Create Import

Each `Import` is assigned to a `Collection`, so let's create one first:
```elixir
# Define some aliases
alias DataAggregator.Records.Collection
alias DataAggregator.Records.Import
alias DataAggregator.Records.Record

# Reduce logger noise to :info and above
Logger.configure(level: :info)
```

```elixir
{:ok, collection} =
  Collection.create(%{
    name: "My Collection",
    owner: "John Doe",
    grscicoll_reference: "322ce107-3156-4420-8a2b-7f17efeaa472"
  })

collection
|> render_struct(keys: [:id, :name, :records_count])
```
# example_file = "test/support/fixtures/files/museum-dataset-import-example.csv"
example_file = "test/support/fixtures/files/dataset-10k.csv"
# example_file = "test/support/fixtures/files/dataset-100.csv"
example_file |> Explorer.DataFrame.from_csv!()
```elixir
# create an import using the file
{:ok, import} = Import.create_from_path(collection, example_file, tenant: collection)

# render the struct
import |> render_struct()
```
```text
17:18:22.349 [info] [fat_02uspVGpMsNILUt6pYTD7A] Successfully uploaded file as "dataset-10k.csv"
```
Note that the current `:state` of the import is `:pending`:

```elixir
import.state
```

```text
:pending
```
## Mapping

When creating the import, the column names and types are extracted from the file and stored in the `:columns` attribute of the import, where each column has the following fields:
- `name` - Name of the column in the original file
- `type` - Detected type of the column
- `mapped_to` - Target record attribute when importing the file. When `nil`, the original `name` is used.

This information determines how the columns are mapped when creating the records.
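Put together, a single entry in `import.columns` has roughly this shape (a sketch; the concrete type atom and whether entries are plain maps or structs are assumptions):

```elixir
# Sketch of one column entry, following the fields described above:
%{
  name: "Scientific Name", # name of the column in the original file
  type: :string,           # detected type (atom assumed here)
  mapped_to: nil           # nil means the original name is used
}
```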
```elixir
import.columns
|> Kino.DataTable.new(keys: [:name, :type, :mapped_to])
```
The mapping can be updated using `Import.update_mapping/2` (below, the raising variant `Import.update_mapping!/2` is used):
```elixir
mapping = [
  %{name: "Scientific Name", mapped_to: "tax_scientific_name"},
  %{name: "Numéro scientifique GBIF", mapped_to: "mte_material_entity_id"},
  %{name: "Age", mapped_to: "age"},
  %{name: "DAYCOLLECTED", mapped_to: "day_collected"}
]

# update the column mapping
import = Import.update_mapping!(import, mapping)

# show the updated columns as a table
import.columns
|> Kino.DataTable.new(name: "import.columns", keys: [:name, :type, :mapped_to])
```
## Running imports

```elixir
# run the import synchronously (left commented out; the asynchronous variant below is used instead)
# {:ok, import} = import |> Import.import()
import |> render_struct(keys: [:id, :state, :imported_at, :records_count])
```
### Asynchronously importing records

```elixir
# enqueue the job
{:ok, import} = import |> Import.enqueue()

import |> render_struct(keys: [:id, :state, :imported_at, :records_count])
```
Once the background job has finished, reload the import to pick up the final state and the number of created records:

```elixir
{:ok, import} = Ash.reload(import, load: [:records_count])

import |> render_struct(keys: [:id, :state, :imported_at, :records_count])
```
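As a final sketch (not part of the original walkthrough), the created records can be queried like any other Ash resource. `Ash.Query.filter/2`, `Ash.Query.limit/2`, and `Ash.read!/2` are standard Ash calls; the `import_id` attribute on `Record` and passing the collection as tenant (mirroring the `create_from_path` call above) are assumptions:

```elixir
require Ash.Query

# Sketch: list a few of the records created by this import.
import_id = import.id

Record
|> Ash.Query.filter(import_id == ^import_id)
|> Ash.Query.limit(3)
|> Ash.read!(tenant: collection)
```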