Exploring Scitree

examples/exploring_scitree.livemd

Jean Carlos

@jeantux

scitree

Share to X

Share to Bluesky

More notebooks

Exploring Scitree

Mix.install([
  {:explorer, "~> 0.1.1"},
  {:scitree, "~> 0.1.0"}
])

Settings

To facilitate some examples we will use the explorer which is a fantastic library to work with datasets (See the documentation to configure).
Scitree is a wrapper over the yggdrasil library and has several decision tree algorithms, with which we can perform training in an optimized way.

Wine Data Set

For our examples we will use the Wine Data Set and then convert this dataset to map in a format accepted by Scitree.

The first is the dataset_train which will be used to train our model and the second is the dataset_predict which will be used in our predictions.

For the predictions, 10 samples were randomly selected.

dataset_train =
  Explorer.Datasets.wine()
  |> Explorer.DataFrame.to_map()

dataset_predict =
  Explorer.Datasets.wine()
  |> Explorer.DataFrame.sample(10)
  |> Explorer.DataFrame.to_map()

To generate a training, the scitree needs to receive a configuration with guidelines on how the training should be done, more details on configuration options can be found in the Scitree.Config module.

Then we pass the settings and our training dataset to train/2 which returns a status and a reference to the generated model.

ref =
  Scitree.Config.init()
  |> Scitree.Config.label("class")
  |> Scitree.Config.learner(:random_forest)
  |> Scitree.Config.task(:classification)
  |> Scitree.train(dataset_train)

A data specification is a list of attribute definitions that indicates how a dataset is semantically understood.

Check the documentation of the inspect_dataspec/1 function for more details

Scitree.inspect_dataspec(ref)

To perform the prediction, just call the function predict/1, passing the reference of our model and the desired dataset. The function must return a status and a list of predictions.

The class column was removed from our dataset because the idea was to simulate a structure without the column with the results. However, if the column with the result is present, internally this removal operation will be performed.

predictions = Scitree.predict(ref, dataset_predict |> Map.delete(:class))

For the calculations it is necessary to work with the data in the form of Nx tensors.

To convert the probabilities predictions to class you can use probabilities_to_class, the index of the first element start with one because zero is reserved to out of dictionar

The result is in tersor format.

Scitree.Predictions.probabilities_to_class(predictions)

Other notebooks:

Ryan Curtin
@ryancurtin

titanic-machine-learning

Titanic Machine Learning Project

titanic-machine-learning.livemd

tutorial advanced data-science axon exla nx explorer vega_lite kino_vega_lite jason analysis_prep

2022-8-18
Christopher James Eyre
@chriseyre2000

livebooks

Data analysis getting started

data.livemd

tutorial data-science intermediate explorer kino kino_vega_lite

2022-8-20
Numerical Elixir (Nx)
@elixir-nx

axon

Classifying fraudulent transactions

credit_card_fraud.livemd

tutorial advanced data-science axon exla explorer kino nx

2022-9-7
Numerical Elixir (Nx)
@elixir-nx

scholar

k-means clustering

k_means.livemd

tutorial advanced data-science exla explorer stb_image scidata req kino kino_vega_lite scholar nx tucan

2022-9-14
Ryo Wakabayashi
@RyoWakabayashi

elixir-learning

KinoDB Snowflake

kino_db_snowflake.livemd

advanced sql kino_db kino_explorer adbc

2023-11-6
Andrés Alejos
@acalejos

exgboost

Prediction Intervals using Quantile Regression

quantile_prediction_interval.livemd

tutorial advanced data-science exgboost explorer kino_explorer kino_vega_lite nx tucan

2023-12-1
Callum McIntyre
@mcintyre94

solana-transaction-livebo...

Helius Transaction Render

helius.livemd

tutorial advanced req jason kino

2023-12-25

Back