Fault Prediction in the Crowd
Context and Role: Dissertation/Thesis for Big Data master's at Warwick. Sole contributor.
Background: The nice people at Cisco let me have some network device event data. There is a GitHub page, a PDF of the thesis/dissertation, and a Prezi.
Abstract: An investigation was conducted into a 40 GB, 326 million record event dataset. This dataset contained anonymized event information representing performance, availability, and security issues of 172,000 network devices from approximately 150 Cisco Systems customers. It was hypothesized that network device event data gathered from one customer environment could be used to predict events in another customer environment. After analysis of the dataset, a binary classification model was developed to predict when a process might request too many compute resources on a device. The model was developed on one set of customer data and tested on a separate, unseen set of customer data. On the unseen test data, the Matthews correlation coefficient for the model was 0.66, the F1 score was 0.72, and the false negative rate was 27%. This was a substantial improvement over a model with no skill, which would be expected to score a Matthews correlation coefficient of about 0.
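For readers unfamiliar with the three metrics quoted in the abstract, here is a minimal Python sketch of how each is derived from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage example are hypothetical and chosen for illustration only. They are not taken from the thesis, and this is not the code used to produce the reported figures.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Return MCC, F1 and false negative rate from confusion-matrix counts.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives. Any ratio with a zero
    denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    # False negative rate: share of real positives the model missed.
    fnr = fn / (fn + tp) if (fn + tp) else 0.0

    # Matthews correlation coefficient: ranges from -1 to 1,
    # with 0 corresponding to a model with no skill.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"mcc": mcc, "f1": f1, "fnr": fnr}


# Hypothetical counts for a coin-flip classifier on balanced data.
no_skill = binary_metrics(tp=25, fp=25, fn=25, tn=25)
print(no_skill)
```

With these made-up counts the Matthews correlation coefficient comes out at 0, which is the "no skill" baseline the abstract compares against. Note that the false negative rate is simply 1 minus recall.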