SCIP 2.2 Hierarchical Data and the Closure Property
Mix.install([])
2.2 Hierarchical Data and the Closure Property
Pairs provide a primitive “glue” that we can use to construct compound data structures. We can visualize this using box-and-pointer notation.
graph TD;
pair_a-->pair_b;
pair_a-->1;
pair_b-->pair_c;
pair_b-->2;
pair_c-->3
pair_c-->4
However we will note that pairs can not just contain primitives like numbers or strings, but also other pairs. This allows us to build up complex data structures from simple parts. The ability to create pairs whose elements are pairs is the essence of the list structure’s importance as a representational tool and is referred to as the closure property.
Closure Property: an operation that combines data objects if the result of the combination can themselves be combined using the same operation.
2.2.1 Representing Sequences
One of the useful structures we can build with pairs is a sequence.
Sequence: an ordered collection of data objects.
A straightforward representation could be:
(cons 1
(cons 2
(cons 3
cons 4 nil)))
A sequce of pairs, formed by nested cons
calls, is called a list.
List: a sequence of pairs.
Scheme provides a primitive for constructing lists, where the above could be represented as:
(list 1 2 3 4)
The above would print to (1 2 3 4)
, but note that this is a convenience and is not representative of the actual data structure.
Since a list is a list of pairs, many list operations can be programmed as recursive procedures:
(define (list-ref items n)
(if (= n 0)
(car items)
(list-ref (cdr items) (- n 1))))
(define (length items)
(if (null? items)
0
(+ 1 (length (cdr items)))))
Another common technique is to “cons
up” an answer list while cdr
ing down a list (construct up, tails down):
(define (append list1 list2)
if (null? list1)
list 2
(cons (car list1) (append (cdr list1) list2)))
which would result the following execution:
(append (list 1 2 3) (list 4 5 6))
(cons (car (list 1 2 3)) (append (cdr (list 1 2 3)) (list 4 5 6)))
(cons 1 (append (cdr (list 1 2 3)) (list 4 5 6)))
(cons 1 (append (list 2 3) (list 4 5 6)))
(cons (car (list 2 3)) (append (cdr (list 2 3)) (list 4 5 6)))
(cons 2 (append (cdr (list 2 3)) (list 4 5 6)))
(cons 2 (append (list 3) (list 4 5 6)))
(cons (car (list 3)) (append (cdr (list 3)) (list 4 5 6)))
(cons 3 (append (cdr (list 3)) (list 4 5 6)))
(cons 3 (append () (list 4 5 6)))
(cons 3 (list 4 5 6))
(3 4 5 6)
(cons 2 (3 4 5 6))
(2 3 4 5 6)
(cons 1 (2 3 4 5 6))
(1 2 3 4 5 6)
One of the most common operations is to apply a transformation to each element in a list and then generate a list of results. This higher order procedure is called a map.
(define (map proc items)
(if (null? items)
nil
(cons (proc (car items))
(map proc (cdr items)))))
defmodule EnumMap do
def map(_proc, []), do: []
def map(proc, [head | tail]) do
[proc.(head) | map(proc, tail)]
end
end
EnumMap.map(fn x -> x + 1 end, [1, 2, 3, 4])
You could write a procedure that applies map logic itself or calls map
under the hood. In either case, the difference is not the computer applying a different process. It is applying the same process. However, it makes us think about the process differently. map
establishes an abstraction barrier that isolates the implementation of procedures that transform lists from the details of how the elements of the list are extracted and combined.
We can transform this into an iterative process and apply tail call optimization by using an accumulator.
defmodule EnumMapTail do
def map(proc, list) do
run_map(proc, list, [])
end
defp run_map(_proc, [], acc), do: Enum.reverse(acc)
defp run_map(proc, [head | tail], acc) do
run_map(proc, tail, [proc.(head) | acc])
end
end
EnumMapTail.map(fn x -> x + 1 end, [1, 2, 3, 4])
2.2.2 Hierarchical Structures
We can generalize sequences whose elements may themselves be sequences. For example, we can construct the object ((1 2) 3 4)
by:
(cons (list 1 2) (list 3 4))
Sequences whose elements are sequences can be thought of as trees. Recursion is a natural tool for dealing with tree structures as can reduce operations on trees to operations on their branches, which reduce in turn to operations onf the branches of the branches, and so on, until we reach the leaves of the tree.
(define x (cons (list 1 2) (list 3 4)))
(length x)
; 3
(count-leaves x)
; 4
Let’s lay out some logic:
length
-
the length of a list
x
is 1 plus the length of thecdr
ofx
- the length of an empty list is 0
count-leaves
-
the count of an empty list is 0
-
in the reduction step, where we strip off the
car
of the list, we must take into account that thecar
may itself be a tree
-
in the reduction step, where we strip off the
-
the count of a tree
x
is:-
the
count-leaves
of thecar
ofx
-
+
-
the
count-leaves
of thecdr
ofx
-
the
- the count of a leaf is 1
# Hmmm...my first pass used the `head | tail` matching that I am used to
# however, this never matched against a leaf directly and thus required
# three invocations of `count_leaves`
defmodule CountLeaves do
def count_leaves([]) do
0
end
def count_leaves([head | tail]) when is_list(head) do
count_leaves(head) + count_leaves(tail)
end
def count_leaves([_head | tail]) do
1 + count_leaves(tail)
end
end
IO.puts(CountLeaves.count_leaves([1, 2, 3, 4, 5]))
IO.puts(CountLeaves.count_leaves([[1, 2], [3, 4, 5]]))
IO.puts(CountLeaves.count_leaves([1, [2, [3, 4, 5, 6], 7, [8, 9], 10], 11]))
# I can match specifically to SCIP's logic by matching against a (potential) leaf
# and not simply falling back on a `head | tail` match
defmodule CountLeavesLisp do
def count_leaves([]), do: 0
def count_leaves([head | tail] = x) when is_list(x) do
count_leaves(head) + count_leaves(tail)
end
def count_leaves(_), do: 1
end
IO.puts(CountLeavesLisp.count_leaves([1, 2, 3, 4, 5]))
IO.puts(CountLeavesLisp.count_leaves([[1, 2], [3, 4, 5]]))
IO.puts(CountLeavesLisp.count_leaves([1, [2, [3, 4, 5, 6], 7, [8, 9], 10], 11]))
2.2.3 Sequences as Conventional Interfaces
Processes can often be broken down into distinct operations, so much so that a signal-processing engineer would find it natural to conceptualize these processes in terms of signals flowing through a cascade of stages. However procedures will often decompose the computations in different ways so that the enumeration is spread over the program, mingled here and there with operations such as map, filter, and accumulation. While two procedures may under the hood use the same fundamental concepts of maps, filters, and accumulation the actual implementation will obscure their similarity.
Consider the following implementation of summing the odd numbers in a sequence:
import Integer
defmodule SumOddSquares do
def perform(tree) do
case tree do
[] ->
0
val when is_integer(val) ->
if Integer.is_odd(val) do
val * val
else
0
end
[head | tail] ->
perform(head) + perform(tail)
end
end
end
SumOddSquares.perform([1, 2, 3, 4, 5])
There is quite a lot going on: matching, filtering, accumulation. These are very common operations, but they are obscured by being spread out in ways specific to this operation.
The key to organizing programs so as to more clearly reflect the signal-flow structure is to concentrate on the “signals” that flow from one stage in the process to the next.
If we represent these signals as lists, then we can use list operations to implement the processing at each of the stages.
Let’s define filter
, accumulate
, and enumerate_tree
operations that expect a list:
defmodule Operations do
def filter(sequence, predicate) do
case sequence do
[] ->
[]
[head | tail] ->
case predicate.(head) do
true -> [head | filter(tail, predicate)]
false -> filter(tail, predicate)
end
end
end
def accumulate(sequence, operation, initial) do
case sequence do
[] ->
initial
[head | tail] ->
operation.(head, accumulate(tail, operation, initial))
end
end
def enumerate_tree(sequence) do
case sequence do
[] ->
[]
[head | tail] when is_list(head) ->
enumerate_tree(head) ++ enumerate_tree(tail)
[head | tail] ->
[head | enumerate_tree(tail)]
end
end
end
Operations.filter([1, 2, 3, 4, 5], &Integer.is_odd/1)
# Operations.accumulate([1, 2, 3, 4, 5], &+/2, 0)
# Operations.enumerate_tree([1, [2, 3], [4, [5]]])
Now we can express the original SumOddSquares
as sequence operations:
defmodule BetterSumOddSquares do
def perform(sequence) do
sequence
|> Operations.enumerate_tree()
|> Operations.filter(&Integer.is_odd/1)
|> Enum.map(fn x -> x * x end)
|> Operations.accumulate(&+/2, 0)
end
end
BetterSumOddSquares.perform([1, [2, 3], [4, [5]]])
The value of expressing programs as sequence operations is that this helps us make program designs that are modular, that is, designs that are constructued by combining relatively independent pieces. Modular construction is a powerful strategy for controlling complexity in engineering design.