datanalysis
Mix.install([
{:explorer, "~> 0.10.0"}
])
Data Analysis with Elixir
What follows is an analysis of over 300,000 road accidents. My goal is to demonstrate some of the important things you must keep in mind when doing data analysis.
The data is included in the repository for this project. You can peruse it at your convenience.
EDA: Exploratory Data Analysis
In this phase we explore the dataset in general and try to get a sense of what the data looks like. The next cell prints a single row from the dataset to give us an idea of what is going on.
# Require Explorer.DataFrame (its query functions are macros) and alias it as DF
require Explorer.DataFrame, as: DF
# Load the CSV shipped with the project, then show only the first row
data = DF.from_csv!("./road_data.csv")
DF.head(data, 1)
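Beyond a single row, it can help to check how large the frame is and which type Explorer inferred for each column. The following is a minimal sketch on a tiny made-up frame rather than the real CSV; it assumes the setup cell above has already installed Explorer, and only the two column names come from the dataset.

```elixir
require Explorer.DataFrame, as: DF

# Tiny stand-in frame: the column names appear in the dataset,
# the rows are invented for illustration only.
sample =
  DF.new(%{
    "Vehicle_Type" => ["Car", "Ridden horse", "Car"],
    "Number_of_Casualties" => [1, 2, 1]
  })

# {number_of_rows, number_of_columns}
shape = DF.shape(sample)

# Column names, and a map from column name to inferred dtype
names = DF.names(sample)
dtypes = DF.dtypes(sample)
```

Running the same three calls on `data` shows how many rows were loaded and whether any column was read with an unexpected type.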
I am curious about the Vehicle_Type column. Let's explore it and see what kinds of vehicles are tracked in this dataset.
DF.distinct(data, ["Vehicle_Type"])["Vehicle_Type"]
Apparently, we have “Ridden horse” as a vehicle type, which I find… interesting. Let's see what kind of accident data we have on horses.
DF.filter_with(data, fn d -> Explorer.Series.equal(d["Vehicle_Type"], "Ridden horse") end)
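Listing the distinct values does not say how common each one is. As a sketch, `Explorer.Series.frequencies/1` counts how often each value occurs; the frame below is made up for illustration and assumes Explorer is already installed by the setup cell.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Invented rows; only the column name comes from the dataset.
sample = DF.new(%{"Vehicle_Type" => ["Car", "Ridden horse", "Car", "Car"]})

# frequencies/1 returns a data frame with a "values" column and a
# "counts" column, one row per distinct value.
counts = Series.frequencies(sample["Vehicle_Type"])
```

Applied to `data["Vehicle_Type"]`, this would show whether a category such as “Ridden horse” covers a handful of rows or a meaningful share of the dataset.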
Describing data
DF.describe(data)
The goal here is to figure out under which circumstances the most casualties happen.
# Group in a pipeline so that `data` itself stays ungrouped for later cells
data
|> DF.group_by("Vehicle_Type")
|> DF.summarise_with(&[averages: Explorer.Series.mean(&1["Number_of_Casualties"])])
|> DF.sort_by(asc: averages)
So no particular vehicle type has more casualties per accident than the others. Perhaps more grouping is needed? We can test those ideas now to find out whether specific conditions trigger more accidents.
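One way to test such ideas is to group by more than one column and to report the group size next to the mean, since an average over very few rows says little. This is only a sketch on invented rows: `Weather_Conditions` is a hypothetical column name used for illustration, and the real dataset may name its columns differently.

```elixir
require Explorer.DataFrame, as: DF
alias Explorer.Series

# Invented rows. "Weather_Conditions" is a hypothetical column name;
# the other two column names appear in the dataset.
sample =
  DF.new(%{
    "Vehicle_Type" => ["Car", "Car", "Car", "Bus"],
    "Weather_Conditions" => ["Rain", "Rain", "Fine", "Fine"],
    "Number_of_Casualties" => [1, 3, 1, 5]
  })

summary =
  sample
  |> DF.group_by(["Vehicle_Type", "Weather_Conditions"])
  |> DF.summarise_with(fn d ->
    [
      # Average casualties per accident within each group
      mean_casualties: Series.mean(d["Number_of_Casualties"]),
      # Number of rows behind that average
      accidents: Series.count(d["Number_of_Casualties"])
    ]
  end)
  # Largest averages first
  |> DF.sort_by(desc: mean_casualties)
```

Keeping the `accidents` column visible makes it easier to tell a genuinely risky combination from one that looks extreme only because it has one or two rows.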
However, this is about the limit of what we can do with this dataset. By grouping the data in different ways and exploring the results, we can find out which circumstances go along with the most accidents.
But this data in and of itself will not help us actually prevent accidents. For that, we may need to do other things, such as consulting subject matter experts, or exploring particular crashes in detail and analyzing what caused them.