Developing an Incremental and Active Learning Framework for Evolving High-Volume Data Streams 16.06

Project Start Date: Jul 1, 2016
Research Areas: Analytics, Analytics - Machine Learning, Analytics - Probabilistic Modeling, Data Management, Data Management - Big Data Platforms, Data Management - Dynamic Data
Funding: Member Funded

Project Summary

The objectives are to:

Implement a system that classifies evolving streams of data and efficiently relabels old datasets. Significance: Data streams are continuous flows of data. Examples include network traffic, sensor data, call center records, and electronic health records. The large volumes of data arriving in a stream make traditional classification algorithms too inefficient; for example, classifiers are usually retrained from scratch whenever an incremental update occurs. In addition, one often wants to compare newly labeled data with previously labeled data, which can either a) result in inconsistencies or b) require extra resources to recompute the old data. We aim to build a unified system that leverages labeled and unlabeled data to incrementally update both the classifier and previously labeled datasets in a single step. Reducing these computational redundancies will make it feasible to process large volumes of data.
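
To illustrate the difference between retraining from scratch and updating incrementally, the following is a minimal sketch, not the project's actual method. It assumes a binary logistic regression model updated one example at a time by stochastic gradient descent; the class name, the toy stream, and the archived examples are all hypothetical.

```python
import math
import random


class OnlineLogisticClassifier:
    """Hypothetical binary classifier updated one example at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def partial_fit(self, x, y):
        # One gradient step on the log-loss for a single labeled example.
        # Cost is O(n_features), independent of how much data came before.
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def toy_stream(n_examples, seed=0):
    """Assumed toy stream: label is 1 when the first feature exceeds the second."""
    rng = random.Random(seed)
    for _ in range(n_examples):
        x = [rng.random(), rng.random()]
        yield x, (1 if x[0] > x[1] else 0)


model = OnlineLogisticClassifier(n_features=2)
for features, label in toy_stream(2000):
    model.partial_fit(features, label)  # update in place, no full retraining

# Relabel previously seen (archived) examples with the updated model.
archive = [[0.9, 0.1], [0.1, 0.9]]
relabeled = [model.predict(x) for x in archive]
```

In this sketch, relabeling the archive is a separate pass after the update; the project's stated goal is to fold the classifier update and the relabeling into one step.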

Offer active learning and integration of multiple sources for curation and improved performance. Significance: Usually, the more data one has, the better the classifier model. However, automatic (or machine) classification is sometimes inaccurate due to an anomaly or nonlinearity in the data. We will offer an active learning component so that a curator can make adjustments to the learning model. We also plan to exploit data that are inherently related with multi learning.
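
One common way to decide which items to send to a curator is uncertainty sampling. The sketch below assumes that strategy, which the summary does not specify, and uses a hypothetical function name and made-up probabilities: it picks the items whose predicted probability is closest to 0.5, where the classifier is least sure.

```python
def select_for_curation(probabilities, budget):
    """Return indices of the `budget` items the model is least certain about.

    `probabilities` holds the predicted probability of the positive class
    for each unlabeled item; values near 0.5 indicate high uncertainty.
    """
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]


# Illustrative predicted probabilities for five unlabeled items.
probs = [0.97, 0.52, 0.08, 0.45, 0.61]
queries = select_for_curation(probs, budget=2)
print(queries)  # [1, 3]
```

The labels supplied by the curator for the selected items could then be fed back to the classifier as ordinary labeled examples.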


Principal Investigator(s)