datanalysis

data_analysis.livemd

Mix.install([
  {:explorer, "~> 0.10.0"}
])

Data Analysis with Elixir

What follows is an analysis of over 300,000 road accidents. My goal is to demonstrate some of the important things you must keep in mind when doing data analysis.

The data is included in the repository for this project. You can peruse it at your convenience.

EDA: Exploratory Data Analysis

In this phase we explore the dataset in general and try to get a sense of what the data looks like. This next cell prints a row from the dataset to give us an idea of what is going on.

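# DF is the short name used for Explorer.DataFrame in the rest of the notebook.
require Explorer.DataFrame, as: DF

# Load the accident records from the CSV file that ships with the repository.
data = DF.from_csv!("./road_data.csv")

# Show only the first row to get a feel for the available columns.
DF.head(data, 1)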

I am curious about the Vehicle_Type column. Let's explore it and see what kinds of vehicles are tracked in this dataset.

DF.distinct(data, ["Vehicle_Type"])["Vehicle_Type"]
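The distinct values show which vehicle types exist, but not how often each one appears. A minimal sketch of counting them with `Explorer.Series.frequencies/1` follows. The `sample` dataframe is a made-up stand-in so the cell runs on its own (it assumes Explorer is already installed by the setup cell); on the real data you would pass `data["Vehicle_Type"]` instead.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical stand-in for the real dataset; the values are invented.
sample =
  DF.new(%{
    "Vehicle_Type" => ["Car", "Car", "Ridden horse", "Motorcycle", "Car"]
  })

# frequencies/1 returns a dataframe with a "values" column and a "counts"
# column, ordered from the most frequent value to the least frequent.
counts = Series.frequencies(sample["Vehicle_Type"])
```

Rare categories such as “Ridden horse” are easy to spot this way, which matters later because averages over a handful of rows are not very reliable.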

Apparently, we have “Ridden horse” as a vehicle type, which I find… interesting. Let's see what kind of accident data we have on horses.

DF.filter_with(data, fn d -> Explorer.Series.equal(d["Vehicle_Type"], "Ridden horse") end)
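The same filter can also be written with the `DF.filter/2` macro. One detail to keep in mind: a column name that starts with an uppercase letter, such as `Vehicle_Type`, cannot be written as a bare name inside the query, because Elixir reads it as a module alias. The sketch below uses `col/1` to refer to the column by its string name. The `sample` dataframe is invented so the cell runs on its own, assuming Explorer is already installed by the setup cell.

```elixir
require Explorer.DataFrame, as: DF

# Hypothetical stand-in for the real dataset; the values are invented.
sample =
  DF.new(%{
    "Vehicle_Type" => ["Car", "Ridden horse", "Motorcycle", "Ridden horse"],
    "Number_of_Casualties" => [1, 2, 1, 1]
  })

# col/1 looks the column up by name, which works for any column name,
# including ones that are not valid Elixir variable names.
horses = DF.filter(sample, col("Vehicle_Type") == "Ridden horse")
```

Which form to use is mostly a matter of taste. The callback form with `filter_with` is handy when the condition is built at runtime, while the macro form tends to read more like the question being asked.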

Describing data

DF.describe(data)

The goal here is to figure out the circumstances under which the most casualties happen.

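# Group by vehicle type and average the casualties in each group.
# Piping from data avoids rebinding it to a grouped dataframe.
data
|> DF.group_by("Vehicle_Type")
|> DF.summarise_with(&[averages: Explorer.Series.mean(&1["Number_of_Casualties"])])
|> DF.sort_by(asc: averages)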

So, no particular vehicle type stands out as having more casualties than the others. Perhaps more grouping is needed? We can test those ideas now to find out whether specific conditions are associated with more accidents.
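As one way to take the grouping further, here is a minimal sketch that groups by two columns at once and reports both the number of rows and the mean casualties per group. The `sample` dataframe is invented, and `Weather_Conditions` is an assumed column name used only for illustration; check the actual column names in the dataset before adapting this. It assumes Explorer is already installed by the setup cell.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Hypothetical sample; "Weather_Conditions" is an assumed column name
# and all values are invented.
sample =
  DF.new(%{
    "Vehicle_Type" => ["Car", "Car", "Car", "Motorcycle", "Motorcycle"],
    "Weather_Conditions" => ["Fine", "Rain", "Rain", "Fine", "Rain"],
    "Number_of_Casualties" => [1, 2, 4, 1, 5]
  })

summary =
  sample
  |> DF.group_by(["Vehicle_Type", "Weather_Conditions"])
  |> DF.summarise_with(fn df ->
    [
      # Number of rows in the group, so tiny groups are easy to spot.
      accidents: Series.count(df["Number_of_Casualties"]),
      mean_casualties: Series.mean(df["Number_of_Casualties"])
    ]
  end)
  # Largest averages first, since we are looking for the worst conditions.
  |> DF.sort_by(desc: mean_casualties)
```

Keeping the row count next to the average is a deliberate choice: a high mean that comes from one or two accidents says much less than the same mean over thousands of rows.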

However, this is about the limit of what we can do with this dataset. By grouping the data in different ways and exploring the results, we can find out which circumstances are associated with the most accidents.

But this data in and of itself will not help us actually prevent accidents. For that, we may need to do other things, such as asking Subject Matter Experts, exploring particular crashes in detail, and analyzing what caused them.