Predicting Titanic survivors with Explorer and ML 🧊🛳️ (template)

predicting-titanic-survivors-with-ml--template.livemd

Nicolò G.

@nickgnd

kaggle-titanic-livebook


Mix.install([
  {:scholar, "~> 0.2.1"},
  {:explorer, "~> 0.7.1"},
  {:exgboost, "~> 0.3"},
  {:kino_explorer, "~> 0.1.12"},
  {:kino_vega_lite, "~> 0.1.10"}
])

✋ Before starting…

Kaggle https://www.kaggle.com/competitions/titanic/data
No Slides Conf talk by Ju Liu (@arkh4m) https://youtu.be/YhZXU5zUnO0?si=4njVBZJ9q5j0zYRP

📦 Importing Data

> With Livebook you can simply drag a file to import its content…

🧭 Exploring the dataset

> Survived distributions (0 = NO, 1 = YES)

> Age with regard to Survived

> Class distribution

> Class density (KDE)
>
> It's a technique that lets you create a smooth curve from a set of data points.
>
> https://towardsdatascience.com/kernel-density-estimation-explained-step-by-step-7cc5b5bc4517
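
To make the idea of a "smooth curve" concrete, here is a minimal sketch of a Gaussian kernel density estimate in plain Elixir. It is not the notebook's plotting code: the module name `KDESketch`, the bandwidth `5.0`, and the sample values are all assumptions chosen for illustration.

```elixir
defmodule KDESketch do
  # Standard normal density, used as the kernel.
  defp gaussian(u), do: :math.exp(-0.5 * u * u) / :math.sqrt(2 * :math.pi())

  # Estimated density at x: the average of one Gaussian bump per sample,
  # each centered on the sample and widened by the bandwidth h.
  def density(samples, x, h) when h > 0 do
    n = length(samples)

    samples
    |> Enum.map(fn xi -> gaussian((x - xi) / h) end)
    |> Enum.sum()
    |> Kernel./(n * h)
  end
end

# Made-up values, only to exercise the function (hypothetical data).
samples = [22.0, 24.0, 26.0, 35.0, 38.0, 54.0]

# Evaluate the estimate on a grid; these {x, density} points form the curve.
curve = for x <- 0..80//10, do: {x, KDESketch.density(samples, x, 5.0)}
```

A larger bandwidth gives a smoother, flatter curve; a smaller one follows the individual samples more closely.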

> Sex distribution

> Survived with regard to Sex

> Survived density with regard to Class

> Combining Sex and Class

🧠 Use intuition to predict survivors

Random

A random guess has a 50% chance of being right.
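
As a quick check on that figure, assume "random" means guessing "survived" with probability $1/2$ for every passenger, and let $p$ be the (unspecified) fraction of passengers who actually survived. The expected accuracy is then

$$
p \cdot \tfrac{1}{2} + (1 - p) \cdot \tfrac{1}{2} = \tfrac{1}{2}
$$

so under that assumption a coin flip scores 50% on average, whatever the real survival rate is.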

Based on Sex value

The assumption is

Women survive
Men do not survive
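
The rule above can be sketched in plain Elixir as follows. The rows are made up for illustration (they are not real passengers), and the module name `SexBaseline` is hypothetical; only the `Sex` and `Survived` column names and the `"female"`/`"male"` values come from the Kaggle dataset.

```elixir
defmodule SexBaseline do
  # The assumption from the text: women survive (1), men do not (0).
  def predict(%{"Sex" => "female"}), do: 1
  def predict(%{"Sex" => "male"}), do: 0

  # Share of rows where the rule agrees with the recorded outcome.
  def accuracy(rows) do
    hits = Enum.count(rows, fn row -> predict(row) == row["Survived"] end)
    hits / length(rows)
  end
end

# Hypothetical rows, only to exercise the functions.
rows = [
  %{"Sex" => "female", "Survived" => 1},
  %{"Sex" => "male", "Survived" => 0},
  %{"Sex" => "male", "Survived" => 1},
  %{"Sex" => "female", "Survived" => 1}
]

baseline_accuracy = SexBaseline.accuracy(rows)
```

Any later model should be compared against this kind of baseline: if it cannot beat a one-line rule, the extra complexity is not paying off.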

📈 Linear and Polynomial Regression to predict survivors

Prepare the data

Fill missing values
Categorize columns (from string to integer values)
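
The two preparation steps can be sketched on plain lists. This is an illustration only: the notebook works with Explorer data frames, the choice of the mean as the fill value is an assumption (the text does not say which strategy is used), and `PrepSketch` is a hypothetical module name.

```elixir
defmodule PrepSketch do
  # Replace nil entries with the mean of the values that are present.
  def fill_with_mean(values) do
    present = Enum.reject(values, &is_nil/1)
    mean = Enum.sum(present) / length(present)
    Enum.map(values, fn v -> if is_nil(v), do: mean, else: v end)
  end

  # Map each distinct string to an integer code, in order of first appearance.
  def categorize(values) do
    codes = values |> Enum.uniq() |> Enum.with_index() |> Map.new()
    Enum.map(values, &Map.fetch!(codes, &1))
  end
end

filled = PrepSketch.fill_with_mean([22.0, nil, 26.0])
coded = PrepSketch.categorize(["male", "female", "female", "male"])
```

Both steps exist because the models that follow expect purely numeric tensors with no gaps.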

Linear Regression

https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb

> Build target tensor

> Build features tensor

> Classifier

> Check accuracy of our trained classifier (aka model)
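
Since a regression outputs a continuous number, checking accuracy needs a step that turns each output into a 0/1 label. A minimal sketch, assuming a threshold of `0.5` (the text does not state one) and using a hypothetical module name `AccuracySketch` with made-up scores:

```elixir
defmodule AccuracySketch do
  # Turn a continuous regression output into a 0/1 label.
  def to_label(score, threshold \\ 0.5), do: if(score >= threshold, do: 1, else: 0)

  # Fraction of predictions that match the true labels.
  def accuracy(scores, targets) do
    hits =
      scores
      |> Enum.map(&to_label/1)
      |> Enum.zip(targets)
      |> Enum.count(fn {predicted, actual} -> predicted == actual end)

    hits / length(targets)
  end
end

# Hypothetical model outputs and true labels.
model_accuracy = AccuracySketch.accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0])
```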

Polynomial regression

https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb

> From linear to polynomial
> (NOTE: the classifier is still the same, just the “features” have been changed)
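
To illustrate what "changing the features" means, the sketch below expands a row to degree 2 in plain Elixir. The degree, the ordering of the new columns, and the module name `PolySketch` are assumptions; a library transform may order the terms differently or add a constant column.

```elixir
defmodule PolySketch do
  # Expand one row of features to degree 2: the original columns followed by
  # every product x_i * x_j with i <= j (squares included).
  def expand(row) do
    indexed = Enum.with_index(row)

    products =
      for {a, i} <- indexed, {b, j} <- indexed, i <= j, do: a * b

    row ++ products
  end
end

expanded = PolySketch.expand([2.0, 3.0])
```

The same linear model is then fitted on the wider rows, which lets it capture curved relationships between the original columns and the target.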

> Check accuracy of our trained classifier (aka model)

🌲 Decision Tree to predict survivors

https://eight2late.files.wordpress.com/2016/02/7214525854_733237dd83_z1.jpg

> Build features tensor

> Build target tensor and one-hot encode it
> (https://projects.volkamerlab.org/teachopencadd/_images/OneHotEncoding_eg.png)
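
For readers who cannot open the image, one-hot encoding can be sketched on plain lists like this. `OneHotSketch` is a hypothetical module name, and the notebook itself would do this on tensors rather than lists.

```elixir
defmodule OneHotSketch do
  # Turn integer class labels into rows with a single 1 at the label's index.
  def encode(labels, num_classes) do
    Enum.map(labels, fn label ->
      for class <- 0..(num_classes - 1), do: if(class == label, do: 1, else: 0)
    end)
  end
end

# Survived labels (0 = NO, 1 = YES) become two-column rows.
encoded = OneHotSketch.encode([0, 1, 1], 2)
```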

> Build the Decision Tree and check its accuracy

⚔️ Avoid overfitting with cross-validation

https://notes.club/elixir-nx/scholar/notebooks/cv_gradient_boosting_tree
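
The core of cross-validation is the way the data is split. A minimal k-fold sketch in plain Elixir, assuming at least `k` rows and no shuffling (in practice the rows are usually shuffled first); `KFoldSketch` is a hypothetical module name and this is not the code from the linked notebook:

```elixir
defmodule KFoldSketch do
  # Split rows into k folds and return {train, validation} pairs, using each
  # fold exactly once as the validation set.
  def splits(rows, k) when k > 1 do
    folds =
      rows
      |> Enum.with_index()
      |> Enum.group_by(fn {_row, i} -> rem(i, k) end, fn {row, _i} -> row end)

    for held_out <- 0..(k - 1) do
      train =
        folds
        |> Map.delete(held_out)
        |> Map.values()
        |> Enum.concat()

      {train, Map.fetch!(folds, held_out)}
    end
  end
end

pairs = KFoldSketch.splits(Enum.to_list(1..6), 3)
```

A model is trained on each `train` part and scored on the matching validation part; averaging the k scores gives an estimate that does not depend on a single lucky or unlucky split.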

🎨 Plot the Decision Tree

# # https://stackoverflow.com/questions/60186747/how-do-i-include-feature-names-in-the-plot-tree-function-from-the-xgboost-librar
# ["Pclass", "Age", "Sex", "SibSp", "Parch", "Fare", "Embarked"]
# |> Enum.with_index(fn element, index -> "#{index} #{element} q" end)
# |> Enum.join("\n")
# |> then(&File.write!("/Users/nicolo.gnudi/fmap.txt", &1))

# EXGBoost.Booster.get_dump(model, fmap: "/Users/nicolo.gnudi/fmap.txt", format: :json)
# |> Jason.Formatter.pretty_print()
# # |> then(& File.write!("/Users/nicolo.gnudi/dt.json", &1))

Export model and import it in Python for plotting

https://github.com/acalejos/exgboost/issues/29

# # Dump the model
# EXGBoost.write_weights(model, "/Users/nicolo.gnudi/dtw")

Then, install the required Python packages

pip3 install xgboost
pip3 install graphviz

And finally plot the Decision Tree

❯ python3

>>> import xgboost as xgb
>>> model = xgb.Booster()
>>> model.load_model("/Users/nicolo.gnudi/dtw.json")
>>> g = xgb.to_graphviz(model, fmap="/Users/nicolo.gnudi/fmap.txt")
>>> g.render(filename="/Users/nicolo.gnudi/dtg")
'/Users/nicolo.gnudi/dtg.pdf'
