# Imports

```elixir
import DataAggregator.Kino.Helpers
```
## Using this Livebook

ℹ️ NOTE: You need to connect your Livebook instance to the running application:

```shell
iex --sname dagg --cookie secret -S mix
# or
iex --sname dagg --cookie secret -S mix phx.server
```

Then open the Runtime settings in the Livebook sidebar and change the runtime to Attached node, using the node name and cookie from above.
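Once attached, code in this notebook evaluates inside the application's node. A quick sanity check (the node name follows from the `--sname dagg` flag above; the host part depends on your machine):

```elixir
# Should return something like :dagg@your-hostname when attached correctly.
Node.self()
```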
## Intro

Imports are used to create records from a file. Currently only CSV is supported. The main functions are:

- `create_from_path(collection, file_path)` - Creates an import from a file for the given collection
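A minimal sketch of that call (the walkthrough below defines the `Import` alias and `collection`, and adds options such as `tenant:`):

```elixir
# Minimal sketch of the main entry point described above.
{:ok, import} = Import.create_from_path(collection, "path/to/file.csv")
```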
## States

The import implements a state machine and has the following states:

- `pending` - …
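Because the state is exposed as a plain atom on the import, code can branch on it directly. A small sketch (only `:pending` is confirmed above, so the catch-all stands in for the remaining states):

```elixir
# Sketch: branch on the import's current state.
case import.state do
  :pending -> "waiting to be imported"
  other -> "import is in state #{inspect(other)}"
end
```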
The following flow chart visualizes each state and transition:

```elixir
"lib/data_aggregator/records/import/import-mermaid-flowchart.md"
|> File.read!()
|> Kino.Markdown.new()
```
## Create Import

Each `Import` is assigned to a `Collection`, so let's create one first:
```elixir
# Define some aliases
alias DataAggregator.Records.Collection
alias DataAggregator.Records.Import
alias DataAggregator.Records.Record

# Reduce logger noise to :info and above
Logger.configure(level: :info)
```

```elixir
{:ok, collection} =
  Collection.create(%{
    name: "My Collection",
    owner: "John Doe",
    grscicoll_reference: "322ce107-3156-4420-8a2b-7f17efeaa472"
  })

collection
|> render_struct(keys: [:id, :name, :records_count])
```
# example_file = "test/support/fixtures/files/museum-dataset-import-example.csv"
example_file = "test/support/fixtures/files/dataset-10k.csv"
# example_file = "test/support/fixtures/files/dataset-100.csv"
example_file |> Explorer.DataFrame.from_csv!()
```elixir
# create an import using the file
{:ok, import} = Import.create_from_path(collection, example_file, tenant: collection)

# render the struct
import |> render_struct()
```
```text
17:18:22.349 [info] [fat_02uspVGpMsNILUt6pYTD7A] Successfully uploaded file as "dataset-10k.csv"
```
Note that the current `:state` of the import is `:pending`:

```elixir
import.state
```

```text
:pending
```
## Mapping

When creating the import, the column names and types are extracted from the file and stored in the `:columns` attribute of the import, where each column has the following fields:
- `name` - Name of the column in the original file
- `type` - Detected type of the column
- `mapped_to` - Target record attribute when importing the file. When `nil`, the original `name` is used.

This information determines how the columns are mapped when creating the records.
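Put together, a single entry in `import.columns` has roughly this shape (a sketch; the concrete type atom and whether entries are plain maps or structs are assumptions):

```elixir
# Sketch of one column entry, following the fields described above:
%{
  name: "Scientific Name", # name of the column in the original file
  type: :string,           # detected type (atom assumed here)
  mapped_to: nil           # nil means the original name is used
}
```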
```elixir
import.columns
|> Kino.DataTable.new(keys: [:name, :type, :mapped_to])
```
The mapping can be updated using `Import.update_mapping/2` (below, the raising variant `Import.update_mapping!/2` is used):
```elixir
mapping = [
  %{name: "Scientific Name", mapped_to: "tax_scientific_name"},
  %{name: "Numéro scientifique GBIF", mapped_to: "mte_material_entity_id"},
  %{name: "Age", mapped_to: "age"},
  %{name: "DAYCOLLECTED", mapped_to: "day_collected"}
]

# update the column mapping
import = Import.update_mapping!(import, mapping)

# show the updated columns as a table
import.columns
|> Kino.DataTable.new(name: "import.columns", keys: [:name, :type, :mapped_to])
```
## Running imports

```elixir
# run the import synchronously (left commented out; the asynchronous variant below is used instead)
# {:ok, import} = import |> Import.import()
import |> render_struct(keys: [:id, :state, :imported_at, :records_count])
```
### Asynchronously importing records

```elixir
# enqueue the job
{:ok, import} = import |> Import.enqueue()

import |> render_struct(keys: [:id, :state, :imported_at, :records_count])
```
Once the background job has finished, reload the import to pick up the final state and the number of created records:

```elixir
{:ok, import} = Ash.reload(import, load: [:records_count])

import |> render_struct(keys: [:id, :state, :imported_at, :records_count])
```
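As a final sketch (not part of the original walkthrough), the created records can be queried like any other Ash resource. `Ash.Query.filter/2`, `Ash.Query.limit/2`, and `Ash.read!/2` are standard Ash calls; the `import_id` attribute on `Record` and passing the collection as tenant (mirroring the `create_from_path` call above) are assumptions:

```elixir
require Ash.Query

# Sketch: list a few of the records created by this import.
import_id = import.id

Record
|> Ash.Query.filter(import_id == ^import_id)
|> Ash.Query.limit(3)
|> Ash.read!(tenant: collection)
```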