
Predicting Titanic survivors with Explorer and ML 🧊🛳️ (template)

predicting-titanic-survivors-with-ml--template.livemd


Mix.install([
  {:scholar, "~> 0.2.1"},
  {:explorer, "~> 0.7.1"},
  {:exgboost, "~> 0.3"},
  {:kino_explorer, "~> 0.1.12"},
  {:kino_vega_lite, "~> 0.1.10"}
])

✋ Before starting…

📦 Importing Data

> With Livebook you can simply drag a file to import its content…
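
A minimal sketch of loading the imported file with Explorer (the filename titanic.csv is an assumption, use whatever the drag-and-drop step produced):

require Explorer.DataFrame, as: DF

# Load the Titanic CSV into an Explorer DataFrame
df = DF.from_csv!("titanic.csv")

# Quick look at the shape and the available columns
DF.shape(df)
DF.names(df)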

🧭 Exploring the dataset

> Survived distribution (0 = NO, 1 = YES)
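
One way to get the counts behind this chart with Explorer (a sketch; the notebook may use a Kino chart instead):

# Count how many passengers fall in each Survived bucket (0 = NO, 1 = YES)
Explorer.DataFrame.frequencies(df, ["Survived"])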

> Age with regard to Survived

> Class distribution

> Class density (KDE)
>
> It’s a technique that lets you create a smooth curve given a set of data.
>
> https://towardsdatascience.com/kernel-density-estimation-explained-step-by-step-7cc5b5bc4517

> Sex distribution

> Survived with regard to Sex

> Survived density with regard to Class

> Combining Sex and Class
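
A sketch of the grouped summary behind these plots, assuming the classic Kaggle column names (Sex, Pclass, Survived):

require Explorer.DataFrame, as: DF

# Survival rate and group size for every (Sex, Pclass) combination
df
|> DF.group_by(["Sex", "Pclass"])
|> DF.summarise(survival_rate: mean(col("Survived")), passengers: count(col("Survived")))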

🧠 Use intuition to predict survivors

Random

A 50% chance of guessing the right answer
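
A possible way to measure that baseline with Nx (a hypothetical cell, not the notebook’s own code):

alias Explorer.Series

key = Nx.Random.key(42)
y_true = Series.to_tensor(df["Survived"])

# One uniform draw per passenger, turned into a 0/1 coin flip
{u, _key} = Nx.Random.uniform(key, shape: {Series.size(df["Survived"])})
coin = Nx.greater(u, 0.5)

# Fraction of correct random guesses -- hovers around 0.5
Nx.equal(y_true, coin) |> Nx.mean()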

Based on Sex value

The assumption is

  • Women survive
  • Men do not survive
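
A sketch of scoring that heuristic, assuming the standard Sex and Survived columns:

alias Explorer.Series

# Predict 1 (survived) for women, 0 for men, then compare with the real outcome
y_true = Series.to_tensor(df["Survived"])
pred = df["Sex"] |> Series.equal("female") |> Series.to_tensor()

# Fraction of correct guesses -- noticeably better than the coin flip
Nx.equal(y_true, pred) |> Nx.mean()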

📈 Linear and Polynomial Regression to predict survivors

Prepare the data

  • Fill missing values
  • Categorize columns (from string to integer values)
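
A hedged sketch of that preparation with Explorer (the exact fill strategy and encoding used in the notebook may differ):

alias Explorer.{DataFrame, Series}

# Fill missing Age values with the column mean
age = Series.fill_missing(df["Age"], :mean)

# One way to turn a string column into integer codes:
# cast to :category first, then to :integer
sex =
  df["Sex"]
  |> Series.cast(:category)
  |> Series.cast(:integer)

# Embarked (which has a couple of missing values) can be handled the same way
prepared =
  df
  |> DataFrame.put("Age", age)
  |> DataFrame.put("Sex", sex)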

Linear Regression

https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb

> Build target tensor

> Build features tensor
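
A sketch of those two cells, assuming prepared is the DataFrame from the previous step:

alias Explorer.Series

# Target: 1 if the passenger survived, 0 otherwise
y = Series.to_tensor(prepared["Survived"])

# Features: stack the prepared columns into an {n_rows, n_features} tensor
# (add "Embarked" here as well if you categorized it above)
feature_columns = ["Pclass", "Sex", "Age", "SibSp", "Parch", "Fare"]

x =
  feature_columns
  |> Enum.map(&Series.to_tensor(prepared[&1]))
  |> Nx.stack(axis: 1)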

> Classifier

> Check accuracy of our trained classifier (aka model)
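
One hedged way to write the classifier and accuracy cells with Scholar, treating the regression output as a score and thresholding it at 0.5 (the notebook may use a different Scholar module or threshold):

# Ordinary least squares on the 0/1 target
x_f = Nx.as_type(x, :f32)
model = Scholar.Linear.LinearRegression.fit(x_f, Nx.as_type(y, :f32))

# Threshold the continuous prediction at 0.5 to get a class
predictions =
  model
  |> Scholar.Linear.LinearRegression.predict(x_f)
  |> Nx.greater(0.5)

# Accuracy = fraction of passengers classified correctly
Nx.equal(predictions, y) |> Nx.mean()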

Polynomial regression

https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb

> From linear to polynomial
>
> (NOTE: the classifier is still the same, just the “features” have been changed)

> Check accuracy of our trained classifier (aka model)
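
A sketch of the polynomial variant, building degree-2 features by hand with Nx (the notebook may use a library helper instead):

x_f = Nx.as_type(x, :f32)

# Degree-2 polynomial features: each original column plus its square
x_poly = Nx.concatenate([x_f, Nx.pow(x_f, 2)], axis: 1)

# Same classifier as before, only the features changed
model = Scholar.Linear.LinearRegression.fit(x_poly, Nx.as_type(y, :f32))

predictions =
  model
  |> Scholar.Linear.LinearRegression.predict(x_poly)
  |> Nx.greater(0.5)

Nx.equal(predictions, y) |> Nx.mean()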

🌲 Decision Tree to predict survivors

https://eight2late.files.wordpress.com/2016/02/7214525854_733237dd83_z1.jpg

> Build features tensor

> Build target tensor and one-hot encode it
>
> (https://projects.volkamerlab.org/teachopencadd/_images/OneHotEncoding_eg.png)
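
A short Nx sketch of one-hot encoding the 0/1 target (the linked image illustrates the same idea):

# 0 -> [1, 0], 1 -> [0, 1]
y_one_hot = Nx.equal(Nx.new_axis(y, -1), Nx.iota({1, 2}))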

> Build the Decision Tree and check its accuracy
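
A hedged sketch with EXGBoost, using the plain 0/1 target and a binary objective (the option names below are assumptions to double-check against the EXGBoost docs; the notebook may instead train on the one-hot target with a multi-class objective):

# Train a small gradient-boosted tree ensemble on the features
model =
  EXGBoost.train(x, y,
    objective: :binary_logistic,
    max_depth: 3,
    num_boost_rounds: 10
  )

# Predictions come back as probabilities; threshold at 0.5 and measure accuracy
EXGBoost.predict(model, x)
|> Nx.greater(0.5)
|> Nx.equal(y)
|> Nx.mean()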

⚔️ Avoid overfitting with cross-validation

https://notes.club/elixir-nx/scholar/notebooks/cv_gradient_boosting_tree
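
The linked Scholar notebook shows the full recipe; below is a minimal hand-rolled k-fold sketch of the same idea (not the notebook’s code, and it reuses the hedged EXGBoost options from above):

k = 5
n = Nx.axis_size(x, 0)
fold_size = div(n, k)

accuracies =
  for fold <- 0..(k - 1) do
    # Rows in this range are held out for validation, the rest are used for training
    val_range = (fold * fold_size)..((fold + 1) * fold_size - 1)

    val_idx = Nx.tensor(Enum.to_list(val_range))
    train_idx = Nx.tensor(Enum.to_list(0..(n - 1)) -- Enum.to_list(val_range))

    {x_train, y_train} = {Nx.take(x, train_idx), Nx.take(y, train_idx)}
    {x_val, y_val} = {Nx.take(x, val_idx), Nx.take(y, val_idx)}

    model =
      EXGBoost.train(x_train, y_train,
        objective: :binary_logistic,
        max_depth: 3,
        num_boost_rounds: 10
      )

    # Score on the held-out fold only
    EXGBoost.predict(model, x_val)
    |> Nx.greater(0.5)
    |> Nx.equal(y_val)
    |> Nx.mean()
    |> Nx.to_number()
  end

# The mean accuracy over the held-out folds is the number to trust,
# not the score on rows the model has already seen
Enum.sum(accuracies) / k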

🎨 Plot the Decision Tree

# # https://stackoverflow.com/questions/60186747/how-do-i-include-feature-names-in-the-plot-tree-function-from-the-xgboost-librar
# ["Pclass", "Age", "Sex", "SibSp", "Parch", "Fare", "Embarked"]
# |> Enum.with_index(fn element, index -> "#{index} #{element} q" end)
# |> Enum.join("\n")
# |> then(&File.write!("/Users/nicolo.gnudi/fmap.txt", &1))

# EXGBoost.Booster.get_dump(model, fmap: "/Users/nicolo.gnudi/fmap.txt", format: :json)
# |> Jason.Formatter.pretty_print()
# # |> then(& File.write!("/Users/nicolo.gnudi/dt.json", &1))

Export model and import it in Python for plotting

https://github.com/acalejos/exgboost/issues/29

# # Dump the model
# EXGBoost.write_weights(model, "/Users/nicolo.gnudi/dtw")

Then, install the required Python packages

pip3 install xgboost
pip3 install graphviz

And finally plot the Decision Tree

❯ python3

>>> import xgboost as xgb
>>> model = xgb.Booster()
>>> model.load_model("/Users/nicolo.gnudi/dtw.json")
>>> g = xgb.to_graphviz(model, fmap="/Users/nicolo.gnudi/fmap.txt")
>>> g.render(filename="/Users/nicolo.gnudi/dtg")
'/Users/nicolo.gnudi/dtg.pdf'