Machine Learning Ensemble in MapReduce for Predictive Analytics 16.08

Project Start Date: Jul 1, 2016
Research Areas: Analytics, Analytics - Machine Learning, Data Management, Data Management - Big Data Platforms
Funding: Member Funded

Project Summary

Predictive analytics extracts information from big data (both real-time and historical) in order to forecast potential future events with an acceptable level of assurance. Examples include what-if scenarios and risk assessment. Machine learning and regression are the two broad groups of techniques used for predictive analytics. Recent innovations in machine learning, such as action rules mining and contrast set learning, are promising for prediction but are challenging to implement in the MapReduce framework. Using an ensemble method, we can combine multiple prediction models for more thorough and robust predictions.
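As an illustration of how an ensemble can combine multiple prediction models, the sketch below treats each model's predicted label as conditionally independent evidence about the true class and applies Bayes' rule. This is a simplified illustration, not the project's implementation: the function name `combine_predictions`, the class prior, and the per-model confusion matrices are all hypothetical, and the confusion matrices are assumed to be known in advance rather than learned from data.

```python
from math import prod

def combine_predictions(prior, confusion, votes):
    """Posterior over true classes given each model's predicted label.

    prior[c]           : prior probability of true class c
    confusion[k][c][v] : P(model k predicts v | true class is c)
    votes[k]           : label predicted by model k
    """
    unnormalized = [
        prior[c] * prod(confusion[k][c][v] for k, v in enumerate(votes))
        for c in range(len(prior))
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Hypothetical numbers: two classes (0 = normal, 1 = anomaly), three models.
prior = [0.9, 0.1]
confusion = [
    [[0.95, 0.05], [0.30, 0.70]],  # model 0
    [[0.90, 0.10], [0.20, 0.80]],  # model 1
    [[0.80, 0.20], [0.40, 0.60]],  # model 2
]
# Models 0 and 1 predict "anomaly"; model 2 predicts "normal".
posterior = combine_predictions(prior, confusion, votes=[1, 1, 0])
```

With these assumed numbers, two reliable models agreeing on an anomaly outweigh both the strong prior toward "normal" and the dissenting third model, which is the kind of robustness an ensemble is meant to provide.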

Our project used a Bayesian combination technique to merge several machine-learning methods into a single, more robust online system for anomaly-based action rules detection. By combining these methods, we were able to discover anomalies and frequent patterns in historical data, which resulted in more thorough and robust predictions. The main goals of this project were to (a) design algorithms that can fully leverage the MapReduce framework and (b) create machine-learning techniques to make reliable predictions. The project objectives, which were met, included:

  1. Parallelizing action rules mining and contrast set mining algorithms for MapReduce
  2. Designing a machine-learning approach to predictive analytics based on Independent Bayesian Classifier Combination (IBCC)
  3. Implementing a prototype of the ensemble-based predictive analytics system designed above on MapReduce
  4. Conducting case studies with data provided by one of our IAB members
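
The first objective, parallelizing contrast set mining for MapReduce, can be sketched in miniature. The example below is an in-memory simulation in plain Python, not Hadoop code and not the project's algorithm: the map step emits a count for every candidate itemset in a record, keyed by the record's group, and the reduce step sums those counts so that itemsets whose support differs sharply between groups can be reported. The function names, the `min_diff` threshold, the size limit on itemsets, and the toy records are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

def map_record(group, items, max_size=2):
    """Map step: emit ((group, itemset), 1) for each candidate itemset in a record."""
    for size in range(1, max_size + 1):
        for itemset in combinations(sorted(items), size):
            yield (group, itemset), 1

def reduce_counts(pairs):
    """Shuffle and reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

def contrast_sets(records, min_diff=0.5, max_size=2):
    """Return itemsets whose support differs across groups by at least min_diff."""
    group_sizes = defaultdict(int)
    for group, _ in records:
        group_sizes[group] += 1
    counts = reduce_counts(
        pair
        for group, items in records
        for pair in map_record(group, items, max_size)
    )
    itemsets = {itemset for _, itemset in counts}
    result = {}
    for itemset in itemsets:
        # Support = fraction of a group's records that contain the itemset.
        supports = {g: counts.get((g, itemset), 0) / n for g, n in group_sizes.items()}
        if max(supports.values()) - min(supports.values()) >= min_diff:
            result[itemset] = supports
    return result

# Hypothetical toy data: (group label, set of attribute=value items).
records = [
    ("failed", {"temp=high", "load=high"}),
    ("failed", {"temp=high", "load=low"}),
    ("ok", {"temp=low", "load=high"}),
    ("ok", {"temp=low", "load=low"}),
]
found = contrast_sets(records, min_diff=0.6)
```

Because each record is mapped independently and the reduce step only sums counts per key, the same structure distributes naturally across mappers and reducers; in this toy data only the temperature items separate the two groups.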