Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Time Series Plots in Tucan

time_series_plots_in_tucan.livemd

Time Series Plots in Tucan

# Comment this out for running from the path
# System.put_env("TUCAN_DEV", "true")

tucan_version =
  case System.get_env("TUCAN_DEV") do
    nil -> "~> 0.3.0"
    _other -> [path: Path.expand("..", __DIR__)]
  end

Mix.install([
  {:tucan, tucan_version},
  {:kino_vega_lite, "~> 0.1.10"}
])

alias VegaLite, as: Vl

Introduction

This tutorial is an adaptation to Tucan of Basic Time Series Plots in Vega-Lite by Jon E. Froehlich

We will explore creating visualizations of Seattle’s daily maximum temperature showing how to use position, color, and size to create multi-dimensional plots. We will create temporal scatter plots, dot/strip plots, heatmaps, and bubble plots.

We will then show how to use Tucan’s functionalities to calculate and plot average monthly weather data.

Our first visualization - scatter

> Throughout this notebook we will use the :weather dataset that comes with Tucan.

We can start by using Tucan.scatter/4 to plot all dataset points. Let’s encode time on the x-axis and temp_max on the y-axis. By default scatter expects two quantitative variables. We have to specify that the x-axis encodes time by explicitly setting the type to temporal through the :x option.

We set the filled option to true in order to have filled points and enable the tooltip in order to add some interactivity.

max_temp_scatter =
  Tucan.scatter(:weather, "date", "temp_max", filled: true, x: [type: :temporal], tooltip: true)
  |> Tucan.set_size(700, 400)
  |> Tucan.set_title("Daily Max Temperatures in Seattle 2012 - 2015")
  |> Tucan.Axes.set_y_title("Max Temperature")

Semantic grouping by color

We can also add weather as an additional encoding channel using color. To control the color palette, we set the color scale.

Since we want to control the visual values, we specify a range. We carefully choose a palette that has semantic meaning: yellow for sun, gray for fog, blue for rain, etc.

You can either pipe the previous plot through Tucan.color_by/3 or set directly the color_by option to the Tucan.scatter/4 call.

color_palette = ["#aec7e8", "#c7c7c7", "#1f77b4", "#9467bd", "#e7ba52"]

Tucan.color_by(max_temp_scatter, "weather")
|> Tucan.Scale.set_color_scheme(color_palette)

Semantic grouping by size

We can also control the size of the points by a fourth variable. Let’s use precipitation for this. Notice that we also set the range of the size encoding in order to ensure that all points are included in the graph.

Notice how the tooltip content changes with respect to the encoded parameters.

max_temp_scatter
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.size_by("precipitation", scale: [range: [5, 350]])

Aggregating around month

To track seasonal patterns, let’s aggregate the temporal dimension around the monthdate time unit, which is sensitive to month and date but not year—and can be useful for binning time values to look at seasonal patterns.

Let’s also specifically format the x-axis date format to print out the abbreviated month name. See Vega-Lite’s text format documentation for more details. We will use the Tucan.Axes.put_options/3 helper which can set any option on the given axis.

Also notice that the :y option (like all encoding options) can be used to set any arbitrary option to the y encoding channel.

aggregated_scatter =
  Tucan.scatter(:weather, "date", "temp_max",
    x: [type: :temporal, time_unit: :monthdate],
    y: [aggregate: :mean],
    width: 700,
    height: 400
  )
  |> Tucan.color_by("weather", scale: [range: color_palette])
  |> Tucan.size_by("precipitation", type: :quantitative, scale: [range: [5, 350]])
  |> Tucan.set_title("Daily Max Temperatures in Seattle 2012 - 2015")
  |> Tucan.Axes.set_y_title("Max Temperature")
  |> Tucan.Axes.set_x_title("Aggregated Months (2012-2015)")
  |> Tucan.Axes.put_options(:x, format: "%b")

Faceting the plot

We can use the Tucan.facet_by/4 function to split the plot into small multiples. Let’s split the above graph by weather in order to highlight weather-based temporal trends.

You can use various helper methods to modify things like the legend position or reset the dimensions of a plot or the axes titles.

aggregated_scatter
|> Tucan.facet_by(:column, "weather")
|> Tucan.set_size(130, 140)
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Legend.set_orientation(:color, "bottom")
|> Tucan.Legend.set_orientation(:size, "bottom")

Combining two plots using concatenation

We can use Tucan‘s concat methods to combine plots vertically or horizontally. This is a form of view composition. We can combine plots horizontally with Tucan.hconcat/2, vertically with Tucan.vconcat/2, or via a general wrappable Tucan.concat/2.

Let’s combine both a bar-based frequency plot with this temporal plot.

frequencies =
  Tucan.countplot(:weather, "weather", orient: :vertical, width: 700)
  |> Tucan.color_by("weather", scale: [range: color_palette])
  |> Tucan.Axes.set_x_title("Num of Days with Weather (2012-2015)")

Tucan.vconcat([aggregated_scatter, frequencies])

Stripplot

A strip plot is another way to explore variations in weather over time. In this case, let’s encode the date by month along the x-axis, the date by year on the y-axis, and weather via color.

strip =
  Tucan.stripplot(:weather, "date",
    group_by: "date",
    x: [time_unit: :monthdate, type: :temporal],
    y: [time_unit: :year, type: :ordinal],
    width: 700,
    fill_opacity: 1
  )
  |> Tucan.color_by("weather", scale: [range: color_palette])
  |> Tucan.Axes.set_y_title("Year")
  |> Tucan.Axes.set_x_title("Month")
  |> Tucan.Axes.put_options(:x, format: "%b")

We could add in a fourth dimension by encoding the tick size as a function of the max_temp field that day.

strip
|> Tucan.size_by("temp_max")
|> Tucan.set_height(140)

You could also change the :mode to :jitter to plot jittered points instead of ticks.

Tucan.stripplot(:weather, "date",
  group_by: "date",
  style: :jitter,
  x: [time_unit: :monthdate, type: :temporal],
  y: [time_unit: :year, type: :ordinal],
  fill_opacity: 1
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.Axes.set_y_title("Year")
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Axes.put_options(:x, format: "%b")
|> Tucan.set_size(700, 300)

Strip plot split by weather

Rather than encoding year on the y-axis, let’s encode the weather.

Tucan.stripplot(:weather, "date",
  group_by: "weather",
  x: [time_unit: :monthdate, type: :temporal],
  width: 700,
  fill_opacity: 1
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.Axes.set_y_title("Weather")
|> Tucan.Axes.set_x_title("Month")
|> Tucan.Axes.put_options(:x, format: "%b")

Distribution plots

Tucan provides several plot types for displaying the distribution of a numerical variable. Let’s start by plotting the boxplots of the temp_max for the various weather types.

Tucan.boxplot(:weather, "temp_max", group_by: "weather")
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.set_width(500)

We could also use Tucan.errorbar/3.

Tucan.errorbar(:weather, "temp_max",
  group_by: "weather",
  points: true,
  ticks: true,
  extent: :ci
)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.set_width(500)

Similarly we could use Tucan.stripplot/3 this time encoding the temp_max on the x-axis. Notice that we use uniform jittering.

Tucan.stripplot(:weather, "temp_max", group_by: "weather", style: :jitter, jitter_mode: :uniform)
|> Tucan.color_by("weather", scale: [range: color_palette])
|> Tucan.set_width(500)
|> Tucan.set_height(300)

Weather heatmaps

A heatmap uses color to encode the magnitude of a value.

Let’s use a heatmap to examine how Seattle’s max temperature changes over the year. For this, we will encode the date by day along the x-axis, the date by month along the y-axis, and the average temp_max as color.

heatmap =
  Tucan.heatmap(:weather, "date", "date", "temp_max",
    x: [time_unit: :date],
    y: [time_unit: :month],
    color: [aggregate: :mean]
  )
  |> Tucan.set_title("Heatmap of Avg Max Temperatures in Seattle (2012-2015)")
  |> Tucan.Axes.set_x_title("Day")
  |> Tucan.Axes.set_y_title("Month")
  |> Tucan.Legend.set_title(:color, "Avg Max Temp")

Changing the heatmap color scheme

Let’s change the color scheme to something more semantic with high temperature values mapped to red and low temperature values mapped to blue. See the Vega-Lite color scheme documentation for more details on available schemes.

The only change to the previous plot is that we will change the :color encoding scheme. We can use the Tucan.Scale.set_color_scheme/3 helper to easily set or change the color scheme of an existing plot. We use the reverse: true option in order to make the low temperatures blue and the high temperatures red.

Tucan.Scale.set_color_scheme(heatmap, :redyellowblue, reverse: true)

Adding in heatmap cells the labels

You can add annotations by setting the :annotate option to true. This will add a :text encoding to the plot which you can configure like all other encodings through the :text option. You can also conditionally color the text using :text_color

Tucan.heatmap(:weather, "date", "date", "temp_max",
  annotate: true,
  x: [time_unit: :date],
  y: [time_unit: :month],
  color: [aggregate: :mean],
  text: [format: ".1f"],
  text_color: [{nil, 9, "white"}, {25, nil, "white"}]
)
|> Tucan.set_title("Heatmap of Avg Max Temperatures in Seattle (2012-2015)")
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:color, "Avg Max Temp")
|> Tucan.Scale.set_color_scheme(:redyellowblue, reverse: true)
|> Tucan.set_width(800)

Punchcard Plots

Similar to the heatmap example, we can also make a punchcard to explore temporal Seattle weather patterns.

We’ll use roughly the same encodings as before but this time map the average temp_max to circle size.

Tucan.punchcard(:weather, "date", "date", "temp_max",
  x: [time_unit: :date],
  y: [time_unit: :month],
  size: [aggregate: :mean]
)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:size, "Avg Max Temp")

Add dual encoding of temp max

We could also set a dual encoding where both color and size encode the average temp_max field. Notice that since this is a layered plot we also need to set recursive: true in the Tucan.color_by/3 call.

Tucan.punchcard(:weather, "date", "date", "temp_max",
  x: [time_unit: :date],
  y: [time_unit: :month],
  size: [aggregate: :mean]
)
|> Tucan.color_by("temp_max", aggregate: :mean, recursive: true)
|> Tucan.Scale.set_color_scheme(:redyellowblue, reverse: true)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_title(:size, "Avg Max Temp")

Encode precipitation as mark size

Alternatively, we could encode precipitation as mark size.

Tucan.punchcard(:weather, "date", "date", "precipitation",
  x: [time_unit: :date],
  y: [time_unit: :month],
  size: [aggregate: :mean]
)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")

Encode both temperature and precipitation

You can encode add both temperature and precipitation to the same punchcard plot by using both color and size encodings.

We are also using Tucan.Legend.set_orientation/3 helper to change the position of the two legends.

Tucan.punchcard(:weather, "date", "date", "precipitation",
  x: [time_unit: :date],
  y: [time_unit: :month],
  size: [aggregate: :mean]
)
|> Tucan.color_by("temp_max", aggregate: :mean, type: :quantitative, recursive: true)
|> Tucan.Scale.set_color_scheme(:redyellowblue, reverse: true)
|> Tucan.Axes.set_x_title("Day")
|> Tucan.Axes.set_y_title("Month")
|> Tucan.Legend.set_orientation(:color, "top")
|> Tucan.Legend.set_orientation(:size, "top")

Lineplots - Plotting average monthly temperatures

While useful to graph the raw data above, aggregating our data along higher-order temporal dimensions like month or year might help highlight additional trends. Let’s try graphing the average maximum temperature in Seattle by month from 2012-2015.

We can do this by combining Vega-Lite’s :time_unit and :aggregate properties. We will specify that we want the x-axis in months and to aggregate using the mean.

We will use Tucan.lineplot/4 to plot the seasonal trend. By setting points: true we include the points to the line plot.

avg_temperature =
  Tucan.lineplot(:weather, "date", "temp_max",
    x: [time_unit: :month, type: :temporal],
    y: [aggregate: :mean],
    points: true,
    tooltip: true,
    width: 700
  )
  |> Tucan.set_title("Average Daily Max Temperatures in Seattle (2012-2015) by Month")

Adding annotations for daily max average

We can add in an average line using Tucan.hruler/3. We can add a line either on a specific y-axis point or on a calculated point.

Tucan.hruler(avg_temperature, "temp_max", line_color: "red")

Lines by weather type

Similarly to all other plots we can use color_by to plot multiple lines for each weather type. Additionally we will use stroke_dash_by to make the plot more accessible.

Tucan.lineplot(:weather, "date", "temp_max",
  x: [time_unit: :month, type: :temporal],
  y: [aggregate: :mean],
  points: true,
  tooltip: true
)
|> Tucan.color_by("weather")
|> Tucan.Scale.set_color_scheme(color_palette)
|> Tucan.stroke_dash_by("weather")
|> Tucan.set_title("Average Daily Max Temperatures in Seattle (2012-2015) by Month & Weather")
|> Tucan.set_size(700, 350)

Showing the confidence interval

We can use Tucan.errorband/4 to plot the confidence interval. We also color by the weather type in order to display the max temperatures intervals by weather type.

Tucan.errorband(:weather, "date", "temp_max", x: [time_unit: :month, type: :temporal])
|> Tucan.color_by("weather")
|> Tucan.Scale.set_color_scheme(color_palette)
|> Tucan.set_title("Daily Max Temperatures in Seattle (2012-2015) by Month & Weather")
|> Tucan.set_size(700, 350)

Usually you want to overlay the confidence intervals with the trend lines. You can achieve this by using the Tucan.layers/2 helper.

errorbands =
  Tucan.errorband(:weather, "date", "temp_max", x: [time_unit: :month, type: :temporal])

trend_lines =
  Tucan.lineplot(:weather, "date", "temp_max",
    x: [time_unit: :month, type: :temporal],
    y: [aggregate: :mean],
    points: true,
    tooltip: true
  )

Tucan.layers([errorbands, trend_lines])
|> Tucan.set_title("Daily Max Temperatures in Seattle (2012-2015) by Month & Weather")
|> Tucan.set_size(700, 350)
|> Tucan.color_by("weather", recursive: true)
|> Tucan.Scale.set_color_scheme(color_palette)

Composite Plots

Tucan.jointplot(:weather, "temp_max", "precipitation",
  color_by: "weather",
  ratio: 0.3,
  fill_opacity: 0.5
)
Tucan.pairplot(:weather, ["temp_min", "temp_max", "precipitation", "wind"], diagonal: :histogram)
|> Tucan.color_by("weather", recursive: true)
|> Tucan.Legend.set_orientation(:color, "top")

# |> Tucan.Scale.set_color_scheme(color_palette)