Road Accident Severity Classification using US Accidents Dataset

Published in My academic website, 2021

At the beginning of the COVID-19 pandemic, most employees started working from home because of social-distancing measures imposed by public health authorities to prevent workplace exposure. As a result, gridlocked roads emptied out and congestion declined sharply [1]. To predict accident-induced congestion severity levels, I used a large US accident dataset of 1.5 million observations. After feature selection based on correlation coefficients and mutual information criteria, I predicted the accident severity classes using a Random Forest (RF) with bootstrap aggregation and a heuristic Support Vector Machine (SVM) in both one-vs-one and one-vs-rest configurations. I then assessed classifier performance using credible intervals and binomial significance tests. The RF with bootstrap aggregation outperforms both the base model (logistic regression) [2] and the heuristic SVM in overall prediction accuracy and in confusion-matrix metrics. The study also shows that accident-induced traffic congestion has been less severe during the pandemic than at pre-pandemic levels.
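The evaluation step mentioned above (credible intervals and binomial significance tests) can be illustrated with a minimal sketch. It is not the code used in the study. It assumes a uniform Beta(1, 1) prior on accuracy and a one-sided exact binomial test against a baseline accuracy. The function names and all counts below are hypothetical and chosen only for illustration.

```python
import math
import random


def beta_credible_interval(correct, total, level=0.95, draws=100_000, seed=0):
    """Equal-tailed credible interval for accuracy under a uniform Beta(1, 1) prior."""
    rng = random.Random(seed)
    # The posterior is Beta(correct + 1, incorrect + 1); its quantiles are
    # approximated here by sorting Monte Carlo draws.
    samples = sorted(
        rng.betavariate(correct + 1, total - correct + 1) for _ in range(draws)
    )
    tail = (1.0 - level) / 2.0
    lower = samples[int(tail * draws)]
    upper = samples[int((1.0 - tail) * draws) - 1]
    return lower, upper


def binomial_p_value(correct, total, baseline_accuracy):
    """One-sided exact test: P(X >= correct) if the true accuracy equals the baseline."""
    log_p = math.log(baseline_accuracy)
    log_q = math.log(1.0 - baseline_accuracy)
    p_value = 0.0
    for k in range(correct, total + 1):
        # Binomial probability mass computed in log space to avoid overflow.
        log_pmf = (
            math.lgamma(total + 1)
            - math.lgamma(k + 1)
            - math.lgamma(total - k + 1)
            + k * log_p
            + (total - k) * log_q
        )
        p_value += math.exp(log_pmf)
    return p_value


# Hypothetical counts, NOT results from the study.
n_test = 1000             # held-out test observations
rf_correct = 870          # test observations the classifier labels correctly
baseline_accuracy = 0.82  # assumed accuracy of the base model

low, high = beta_credible_interval(rf_correct, n_test)
p_value = binomial_p_value(rf_correct, n_test, baseline_accuracy)
print(f"95% credible interval for accuracy: ({low:.3f}, {high:.3f})")
print(f"one-sided p-value vs. baseline: {p_value:.2e}")
```

Under these assumptions, a credible interval that lies entirely above the baseline accuracy, together with a small p-value, would support the claim that one classifier is more accurate than the base model on the test set.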

Figure: Machine learning models used in this study.

Recommended citation: Dahir A (2021). "Road Accident Severity Classification using US Accidents Dataset." My academic website.