Data exploration

screens/screens.livemd

Etalab

@etalab

transport-site


Resources with a “duplicate resource data gouv id”

Transport.Screens.resources_with_duplicate_datagouv_id(markdown: true)
|> Kino.Markdown.new()

Resources that were never recorded in the history

First, let's count all the datagouv_id values of the resources:

import Ecto.Query

datagouv_ids =
  DB.Resource
  |> where([r], not is_nil(r.datagouv_id))
  |> select([r], map(r, [:datagouv_id]))
  |> DB.Repo.all()
  |> Enum.map(fn x -> x[:datagouv_id] end)
  |> Enum.sort()

[
  count: datagouv_ids |> Enum.count(),
  unique_count: datagouv_ids |> Enum.uniq() |> Enum.count()
]

used_datagouv_ids =
  DB.ResourceHistory
  |> select([:datagouv_id])
  |> DB.Repo.all()
  |> Enum.map(& &1.datagouv_id)
  |> MapSet.new()

[count: used_datagouv_ids |> Enum.count()]

Curious!?

non_duplicate_datagouv_ids =
  datagouv_ids
  |> Enum.group_by(fn x -> x end)
  |> Enum.reject(fn {_a, b} -> b |> Enum.count() > 1 end)
  |> Enum.map(fn {a, _b} -> a end)
  |> MapSet.new()

problematic_non_duplicate_datagouv_ids =
  MapSet.difference(non_duplicate_datagouv_ids, used_datagouv_ids)

[count: problematic_non_duplicate_datagouv_ids |> Enum.count()]
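
The filtering above can be tried on a small in-memory list. This is a minimal sketch with made-up ids ("a", "b", ...) rather than data from the database. It keeps the ids that occur exactly once, then subtracts the ids that have a history entry. It uses `Enum.frequencies/1` in place of the `Enum.group_by/2` call, which gives the same result here.

```elixir
# Hypothetical sample data: "b" appears twice, so it counts as a duplicate.
datagouv_ids = ["a", "b", "b", "c", "d"]

# Hypothetical set of ids that have at least one history entry.
used_datagouv_ids = MapSet.new(["a", "b"])

# Keep only the ids that occur exactly once.
non_duplicate_datagouv_ids =
  datagouv_ids
  |> Enum.frequencies()
  |> Enum.filter(fn {_id, n} -> n == 1 end)
  |> Enum.map(fn {id, _n} -> id end)
  |> MapSet.new()

# Non-duplicate ids with no history entry.
problematic = MapSet.difference(non_duplicate_datagouv_ids, used_datagouv_ids)

IO.inspect(Enum.sort(problematic))
```

In this toy data, "a" is unique but has a history entry and "b" is a duplicate, so only "c" and "d" remain.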

ids = problematic_non_duplicate_datagouv_ids |> Enum.into([])

DB.Resource
|> select([r], %{format: r.format, count: count(r.id)})
|> where([r], r.datagouv_id in ^ids)
|> group_by([r], r.format)
|> DB.Repo.all()
|> Enum.sort_by(fn x -> -x.count end)
|> Kino.DataTable.new()

resource_history_uuids =
  DB.ResourceHistory
  |> select([r], %{uuid: fragment("payload ->> 'uuid'")})
  |> where([r], fragment("payload->>'format' = 'GTFS'"))
  |> DB.Repo.all()
  |> Enum.map(& &1.uuid)
  |> MapSet.new()

geojson_conversion_uuids =
  DB.DataConversion
  |> where([r], r.convert_from == "GTFS" and r.convert_to == "GeoJSON")
  |> select([dc], %{uuid: dc.resource_history_uuid})
  |> DB.Repo.all()
  |> Enum.map(& &1.uuid)
  |> MapSet.new()

# TODO: dry
netex_conversion_uuids =
  DB.DataConversion
  |> where([r], r.convert_from == "GTFS" and r.convert_to == "NeTEx")
  |> select([dc], %{uuid: dc.resource_history_uuid})
  |> DB.Repo.all()
  |> Enum.map(& &1.uuid)
  |> MapSet.new()

missing_netex = MapSet.difference(resource_history_uuids, netex_conversion_uuids)
missing_geojson = MapSet.difference(resource_history_uuids, geojson_conversion_uuids)

[
  missing_netex_per_resource_history: missing_netex |> Enum.count(),
  missing_geojson_per_resource_history: missing_geojson |> Enum.count()
]
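
The two conversion queries differ only in the target format, which is what the `# TODO: dry` comment points at. One possible way to factor that out is sketched here on in-memory rows rather than on `DB.DataConversion`. The rows, the `uuids_for` helper, and the uuids are all hypothetical.

```elixir
# Hypothetical rows standing in for DB.DataConversion records.
conversions = [
  %{convert_from: "GTFS", convert_to: "GeoJSON", resource_history_uuid: "u1"},
  %{convert_from: "GTFS", convert_to: "NeTEx", resource_history_uuid: "u1"},
  %{convert_from: "GTFS", convert_to: "GeoJSON", resource_history_uuid: "u2"}
]

# Hypothetical uuids of GTFS resource history rows.
resource_history_uuids = MapSet.new(["u1", "u2", "u3"])

# One helper, parameterised by the target format (hypothetical name).
uuids_for = fn rows, convert_to ->
  rows
  |> Enum.filter(&(&1.convert_from == "GTFS" and &1.convert_to == convert_to))
  |> MapSet.new(& &1.resource_history_uuid)
end

# For each target format, the history uuids that have no conversion.
missing =
  Map.new(["NeTEx", "GeoJSON"], fn format ->
    {format, MapSet.difference(resource_history_uuids, uuids_for.(conversions, format))}
  end)

IO.inspect(Map.new(missing, fn {format, set} -> {format, MapSet.size(set)} end))
```

With a real query, the same idea would presumably take the target format as a pinned parameter in the `where` clause.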

uuids = geojson_conversion_uuids |> Enum.into([])

existing_resource_datagouv_id =
  DB.Resource
  |> select([r], %{datagouv_id: r.datagouv_id})
  |> DB.Repo.all()
  |> Enum.map(& &1.datagouv_id)
  |> MapSet.new()

gtfs_resources_with_no_netex =
  DB.ResourceHistory
  # select the bare id (not a map) so the result can be intersected with
  # existing_resource_datagouv_id, which holds plain ids
  |> select([r], r.datagouv_id)
  |> where([r], r.datagouv_id not in ^uuids)
  |> where([r], fragment("payload->>'format' = 'GTFS'"))
  |> distinct(:datagouv_id)
  |> DB.Repo.all()
  |> MapSet.new()
  # drop ids of resources that no longer exist
  |> MapSet.intersection(existing_resource_datagouv_id)

uuids = netex_conversion_uuids |> Enum.into([])

gtfs_resources_with_no_geojson =
  DB.ResourceHistory
  # select the bare id (not a map) so the result can be intersected with
  # existing_resource_datagouv_id, which holds plain ids
  |> select([r], r.datagouv_id)
  |> where([r], r.datagouv_id not in ^uuids)
  |> where([r], fragment("payload->>'format' = 'NeTEx'"))
  |> distinct(:datagouv_id)
  |> DB.Repo.all()
  |> MapSet.new()
  # drop ids of resources that no longer exist
  |> MapSet.intersection(existing_resource_datagouv_id)

[
  gtfs_resources_with_no_netex: gtfs_resources_with_no_netex |> Enum.count(),
  gtfs_resources_with_no_geojson: gtfs_resources_with_no_geojson |> Enum.count()
]
