Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

SCIP 2.2 Hierarchical Data and the Closure Property

2.2_hierarchical_data_and_the_closure_property.livemd

SCIP 2.2 Hierarchical Data and the Closure Property

Mix.install([])

2.2 Hierarchical Data and the Closure Property

Pairs provide a primitive “glue” that we can use to construct compound data structures. We can visualize this using box-and-pointer notation.

graph TD;
  pair_a-->pair_b;
  pair_a-->1;
  pair_b-->pair_c;
  pair_b-->2;
  pair_c-->3
  pair_c-->4

However we will note that pairs can not just contain primitives like numbers or strings, but also other pairs. This allows us to build up complex data structures from simple parts. The ability to create pairs whose elements are pairs is the essence of the list structure’s importance as a representational tool and is referred to as the closure property.

Closure Property: an operation that combines data objects if the result of the combination can themselves be combined using the same operation.

2.2.1 Representing Sequences

One of the useful structures we can build with pairs is a sequence.

Sequence: an ordered collection of data objects.

A straightforward representation could be:

(cons 1
  (cons 2 
    (cons 3
      cons 4 nil)))

A sequce of pairs, formed by nested cons calls, is called a list.

List: a sequence of pairs.

Scheme provides a primitive for constructing lists, where the above could be represented as:

(list 1 2 3 4)

The above would print to (1 2 3 4), but note that this is a convenience and is not representative of the actual data structure.

Since a list is a list of pairs, many list operations can be programmed as recursive procedures:

(define (list-ref items n)
  (if (= n 0)
    (car items)
    (list-ref (cdr items) (- n 1))))

(define (length items)
  (if (null? items)
    0
    (+ 1 (length (cdr items)))))

Another common technique is to “cons up” an answer list while cdring down a list (construct up, tails down):

(define (append list1 list2)
  if (null? list1)
    list 2
    (cons (car list1) (append (cdr list1) list2)))

which would result the following execution:

(append (list 1 2 3) (list 4 5 6))
  (cons (car (list 1 2 3)) (append (cdr (list 1 2 3)) (list 4 5 6)))
    (cons 1 (append (cdr (list 1 2 3)) (list 4 5 6)))
      (cons 1 (append (list 2 3) (list 4 5 6)))
        (cons (car (list 2 3)) (append (cdr (list 2 3)) (list 4 5 6)))
          (cons 2 (append (cdr (list 2 3)) (list 4 5 6)))
            (cons 2 (append (list 3) (list 4 5 6)))
              (cons (car (list 3)) (append (cdr (list 3)) (list 4 5 6)))
                (cons 3 (append (cdr (list 3)) (list 4 5 6)))
                  (cons 3 (append () (list 4 5 6)))
                    (cons 3 (list 4 5 6))
                  (3 4 5 6)
                (cons 2 (3 4 5 6))
              (2 3 4 5 6)
            (cons 1 (2 3 4 5 6))
          (1 2 3 4 5 6)

One of the most common operations is to apply a transformation to each element in a list and then generate a list of results. This higher order procedure is called a map.

(define (map proc items)
  (if (null? items)
    nil
    (cons (proc (car items))
      (map proc (cdr items)))))
defmodule EnumMap do
  def map(_proc, []), do: []

  def map(proc, [head | tail]) do
    [proc.(head) | map(proc, tail)]
  end
end

EnumMap.map(fn x -> x + 1 end, [1, 2, 3, 4])

You could write a procedure that applies map logic itself or calls map under the hood. In either case, the difference is not the computer applying a different process. It is applying the same process. However, it makes us think about the process differently. map establishes an abstraction barrier that isolates the implementation of procedures that transform lists from the details of how the elements of the list are extracted and combined.

We can transform this into an iterative process and apply tail call optimization by using an accumulator.

defmodule EnumMapTail do
  def map(proc, list) do
    run_map(proc, list, [])
  end

  defp run_map(_proc, [], acc), do: Enum.reverse(acc)

  defp run_map(proc, [head | tail], acc) do
    run_map(proc, tail, [proc.(head) | acc])
  end
end

EnumMapTail.map(fn x -> x + 1 end, [1, 2, 3, 4])

2.2.2 Hierarchical Structures

We can generalize sequences whose elements may themselves be sequences. For example, we can construct the object ((1 2) 3 4) by:

(cons (list 1 2) (list 3 4))

Sequences whose elements are sequences can be thought of as trees. Recursion is a natural tool for dealing with tree structures as can reduce operations on trees to operations on their branches, which reduce in turn to operations onf the branches of the branches, and so on, until we reach the leaves of the tree.

(define x (cons (list 1 2) (list 3 4)))

(length x)
; 3

(count-leaves x)
; 4

Let’s lay out some logic:

length

  • the length of a list x is 1 plus the length of the cdr of x
  • the length of an empty list is 0

count-leaves

  • the count of an empty list is 0
    • in the reduction step, where we strip off the car of the list, we must take into account that the car may itself be a tree
  • the count of a tree x is:
    • the count-leaves of the car of x
    • +
    • thecount-leaves of the cdr of x
  • the count of a leaf is 1
# Hmmm...my first pass used the `head | tail` matching that I am used to
# however, this never matched against a leaf directly and thus required 
# three invocations of `count_leaves`
defmodule CountLeaves do
  def count_leaves([]) do
    0
  end

  def count_leaves([head | tail]) when is_list(head) do
    count_leaves(head) + count_leaves(tail)
  end

  def count_leaves([_head | tail]) do
    1 + count_leaves(tail)
  end
end

IO.puts(CountLeaves.count_leaves([1, 2, 3, 4, 5]))
IO.puts(CountLeaves.count_leaves([[1, 2], [3, 4, 5]]))
IO.puts(CountLeaves.count_leaves([1, [2, [3, 4, 5, 6], 7, [8, 9], 10], 11]))

# I can match specifically to SCIP's logic by matching against a (potential) leaf
# and not simply falling back on a `head | tail` match 
defmodule CountLeavesLisp do
  def count_leaves([]), do: 0

  def count_leaves([head | tail] = x) when is_list(x) do
    count_leaves(head) + count_leaves(tail)
  end

  def count_leaves(_), do: 1
end

IO.puts(CountLeavesLisp.count_leaves([1, 2, 3, 4, 5]))
IO.puts(CountLeavesLisp.count_leaves([[1, 2], [3, 4, 5]]))
IO.puts(CountLeavesLisp.count_leaves([1, [2, [3, 4, 5, 6], 7, [8, 9], 10], 11]))

2.2.3 Sequences as Conventional Interfaces

Processes can often be broken down into distinct operations, so much so that a signal-processing engineer would find it natural to conceptualize these processes in terms of signals flowing through a cascade of stages. However procedures will often decompose the computations in different ways so that the enumeration is spread over the program, mingled here and there with operations such as map, filter, and accumulation. While two procedures may under the hood use the same fundamental concepts of maps, filters, and accumulation the actual implementation will obscure their similarity.

Consider the following implementation of summing the odd numbers in a sequence:

import Integer

defmodule SumOddSquares do
  def perform(tree) do
    case tree do
      [] ->
        0

      val when is_integer(val) ->
        if Integer.is_odd(val) do
          val * val
        else
          0
        end

      [head | tail] ->
        perform(head) + perform(tail)
    end
  end
end

SumOddSquares.perform([1, 2, 3, 4, 5])

There is quite a lot going on: matching, filtering, accumulation. These are very common operations, but they are obscured by being spread out in ways specific to this operation.

The key to organizing programs so as to more clearly reflect the signal-flow structure is to concentrate on the “signals” that flow from one stage in the process to the next.

If we represent these signals as lists, then we can use list operations to implement the processing at each of the stages.

Let’s define filter, accumulate, and enumerate_tree operations that expect a list:

defmodule Operations do
  def filter(sequence, predicate) do
    case sequence do
      [] ->
        []

      [head | tail] ->
        case predicate.(head) do
          true -> [head | filter(tail, predicate)]
          false -> filter(tail, predicate)
        end
    end
  end

  def accumulate(sequence, operation, initial) do
    case sequence do
      [] ->
        initial

      [head | tail] ->
        operation.(head, accumulate(tail, operation, initial))
    end
  end

  def enumerate_tree(sequence) do
    case sequence do
      [] ->
        []

      [head | tail] when is_list(head) ->
        enumerate_tree(head) ++ enumerate_tree(tail)

      [head | tail] ->
        [head | enumerate_tree(tail)]
    end
  end
end

Operations.filter([1, 2, 3, 4, 5], &Integer.is_odd/1)
# Operations.accumulate([1, 2, 3, 4, 5], &+/2, 0)
# Operations.enumerate_tree([1, [2, 3], [4, [5]]])

Now we can express the original SumOddSquares as sequence operations:

defmodule BetterSumOddSquares do
  def perform(sequence) do
    sequence
    |> Operations.enumerate_tree()
    |> Operations.filter(&Integer.is_odd/1)
    |> Enum.map(fn x -> x * x end)
    |> Operations.accumulate(&+/2, 0)
  end
end

BetterSumOddSquares.perform([1, [2, 3], [4, [5]]])

The value of expressing programs as sequence operations is that this helps us make program designs that are modular, that is, designs that are constructued by combining relatively independent pieces. Modular construction is a powerful strategy for controlling complexity in engineering design.