Powered by AppSignal & Oban Pro
Would you like to see your link here? Contact us

Exploring Scitree

examples/exploring_scitree.livemd

Exploring Scitree

Mix.install([
  {:explorer, "~> 0.1.1"},
  {:scitree, "~> 0.1.0"}
])

Settings

  • To facilitate some examples we will use the explorer which is a fantastic library to work with datasets (See the documentation to configure).

  • Scitree is a wrapper over the yggdrasil library and has several decision tree algorithms, with which we can perform training in an optimized way.

Wine Data Set

For our examples we will use the Wine Data Set and then convert this dataset to map in a format accepted by Scitree.

The first is the dataset_train which will be used to train our model and the second is the dataset_predict which will be used in our predictions.

For the predictions, 10 samples were randomly selected.

dataset_train =
  Explorer.Datasets.wine()
  |> Explorer.DataFrame.to_map()

dataset_predict =
  Explorer.Datasets.wine()
  |> Explorer.DataFrame.sample(10)
  |> Explorer.DataFrame.to_map()

To generate a training, the scitree needs to receive a configuration with guidelines on how the training should be done, more details on configuration options can be found in the Scitree.Config module.

Then we pass the settings and our training dataset to train/2 which returns a status and a reference to the generated model.

ref =
  Scitree.Config.init()
  |> Scitree.Config.label("class")
  |> Scitree.Config.learner(:random_forest)
  |> Scitree.Config.task(:classification)
  |> Scitree.train(dataset_train)

A data specification is a list of attribute definitions that indicates how a dataset is semantically understood.

Check the documentation of the inspect_dataspec/1 function for more details

Scitree.inspect_dataspec(ref)

To perform the prediction, just call the function predict/1, passing the reference of our model and the desired dataset. The function must return a status and a list of predictions.

The class column was removed from our dataset because the idea was to simulate a structure without the column with the results. However, if the column with the result is present, internally this removal operation will be performed.

predictions = Scitree.predict(ref, dataset_predict |> Map.delete(:class))

For the calculations it is necessary to work with the data in the form of Nx tensors.

To convert the probabilities predictions to class you can use probabilities_to_class, the index of the first element start with one because zero is reserved to out of dictionar

The result is in tersor format.

Scitree.Predictions.probabilities_to_class(predictions)