Predicting Titanic survivors with Explorer and ML 🧊🛳️ (template)
Mix.install([
{:scholar, "~> 0.2.1"},
{:explorer, "~> 0.7.1"},
{:exgboost, "~> 0.3"},
{:kino_explorer, "~> 0.1.12"},
{:kino_vega_lite, "~> 0.1.10"}
])
✋ Before starting…
- Kaggle https://www.kaggle.com/competitions/titanic/data
- No Slides Conf talk by Ju Liu (@arkh4m) https://youtu.be/YhZXU5zUnO0?si=4njVBZJ9q5j0zYRP
📦 Importing Data
> With Livebook you can simply drag a file to import its content…
🧭 Exploring the dataset
> Survived distributions (0 = NO, 1 = YES)
> Age with regard to Survived
> Class distribution
> Class density (KDE) > > it’s a technique that let’s you create a smooth curve given a set of data. > > https://towardsdatascience.com/kernel-density-estimation-explained-step-by-step-7cc5b5bc4517
> Sex distribution
> Survived with regard to Sex
> Survived density with regard to Class
> Combining Sex and Class
🧠 Use intuition to predict survivors
Random
50% chances of guessing the right answer
Based on Sex value
The assumption is
- Women survive
- Men do not survive
📈 Linear and Polynomial Regression to predict survivors
Prepare the data
- Fill missing values
- Categorize columns (from string to integer values)
Linear Regression
> Build target tensor
> Build features tensor
> Classifier
> Check accuracy of our trained classifier (aka model)
Polynomial regression
> From linear to polynomial > (NOTE: the classifier is still the same, just the “features” have been changed)
> Check accuracy of our trained classifier (aka model)
🌲 Decision Tree to predict survivors
https://eight2late.files.wordpress.com/2016/02/7214525854_733237dd83_z1.jpg
> Build features tensor
> Build target tensor and hot-encode it > (https://projects.volkamerlab.org/teachopencadd/_images/OneHotEncoding_eg.png)
> Build the Decision Tree and check its accuracy
⚔️ Avoid overfitting with cross-validation
https://notes.club/elixir-nx/scholar/notebooks/cv_gradient_boosting_tree
🎨 Plot the Decision Tree
# # https://stackoverflow.com/questions/60186747/how-do-i-include-feature-names-in-the-plot-tree-function-from-the-xgboost-librar
# ["Pclass", "Age", "Sex", "SibSp", "Parch", "Fare", "Embarked"]
# |> Enum.with_index(fn element, index -> "#{index} #{element} q" end)
# |> Enum.join("\n")
# |> then(&File.write!("/Users/nicolo.gnudi/fmap.txt", &1))
# EXGBoost.Booster.get_dump(model, fmap: "/Users/nicolo.gnudi/fmap.txt", format: :json)
# |> Jason.Formatter.pretty_print()
# # |> then(& File.write!("/Users/nicolo.gnudi/dt.json", &1))
Export model and import it in Python for plotting
https://github.com/acalejos/exgboost/issues/29
# # Dump the model
# EXGBoost.write_weights(model, "/Users/nicolo.gnudi/dtw")
Then, install the required Python packages
pip3 install xgboost
pip3 install graphviz
And finally plot the Decision Tree
❯ python3
>>> import xgboost as xgb
>>> model = xgb.Booster()
>>> model.load_model("/Users/nicolo.gnudi/dtw.json")
>>> g = xgb.to_graphviz(model, fmap="/Users/nicolo.gnudi/fmap.txt")
>>> g.render(filename="/Users/nicolo.gnudi/dtg")
'/Users/nicolo.gnudi/dtg.pdf'