Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Data transform with Explorer

data_transform_with_explorer.livemd

Data transform with Explorer

Mix.install([
  {:kino_explorer, "~> 0.1.11"}
])

Introduction

To explore and transform data in Livebook we use two libraries:

  • The explorer package brings series (one-dimensional) and dataframes (two-dimensional) for fast data exploration to Elixir.

  • The kino_explorer package automatically renders an Explorer.DataFrame or Explorer.Series as a data table.

We will make extensive use of Explorer.DataFrame‘s functions, so it’s handy to alias the module to something shorter. We will also require the Explorer.DataFrame module to use its querying facilities:

alias Explorer.DataFrame, as: DF
require Explorer.DataFrame

All set, let’s go.

Quick introduction to Explorer and data tables

In short, Explorer is the DataFrame library for Elixir. It brings the essential data analysis, exploration, and transformation tools to the Elixir ecosystem, while the integration provided by KinoExplorer makes it effortless to visualize and interact with data through data tables.

Data tables offers a variety of features to make viewing data more convenient. You can easily select cells, columns or even the entire table, switch between paging and infinite scrolling, sort the columns or search for specific data. Keyboard shortcuts, including copy selection, are also available. Let’s render a DataFrame to see an example:

Explorer.Datasets.fossil_fuels()
|> DF.filter(contains(country, "A") and year < 2013)
|> DF.select(["year", "country", "total"])

The Data transform smart cell

Data tables let us quickly view the raw data, without modifying it, while the Data transform cell allows us to transform it, creating insightful and flexible data pipelines and seeing the results on the fly.

Before we explore its features, we need some data to work with.

teams =
  DF.new(
    weekday: [
      "Monday",
      "Tuesday",
      "Wednesday",
      "Thursday",
      "Friday",
      "Monday",
      "Tuesday",
      "Wednesday",
      "Thursday",
      "Friday"
    ],
    team: ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
    hour: [10, 9, 10, 10, 11, 15, nil, 16, 14, 16]
  )
weekdays =
  teams
  |> DF.lazy()
  |> DF.mutate(hour: fill_missing(hour, :forward))
  |> DF.filter(hour > 10 and team != "B")
  |> DF.arrange(asc: weekday)
  |> DF.collect()
  |> DF.pivot_wider("team", "hour")

Let’s break down what happened in the previous Data transform cell.

Currently, the Data transform cell supports several operations, such as sorting, filter, pivot_wider, and more. Each operation has its own colored card and you can move the cards to reorder the operations and see the changes in real time.

Except for pivot_wider, you can have multiple operations of any type. If two similar operations are in a row, they are grouped, and a single query command is generated for them in the code. However, you still have individual control over each.

You can also use the toggle button to enable/disable the operations. This is particularly useful for seeing the implications of each step in your pipeline without having to rewrite it.

The initial state is purely a suggestion. You can easily add and remove operations to get the pipeline that meets your needs.

Finally, the assign to field allows you to save the resulting DataFrame in a variable for later use in the notebook or in conjunction with other Smart Cells. For example, to plot a chart using the Chart cell.

Shortcuts

Key Combo Description
↑ ↓ ← → Moves the currently selected cell and clears other selections
shift + ↑ ↓ ← → Extends the current selection range in the direction pressed
alt + ↑ ↓ ← → Moves the currently selected cell and retains the current selection
ctrl + ↑ ↓ ← → Moves the selection as far as possible in the direction pressed
ctrl + shift + ↑ ↓ ← → Extends the selection as far as possible in the direction pressed
shift + home end Extends the selection as far as possible in the direction pressed
ctrl + A Selects all cells
shift + ␣ Selects the current row
ctrl + ␣ Selects the current col
esc Clear the current selection
ctrl + C Copies the current selection
ctrl + home end Moves the selection to the first/last cell in the data table
ctrl + shift + home end Extends the selection to the first/last cell in the data table