Implement a system that classifies evolving streams of data and efficiently relabels old datasets. Significance: Data streams are continuous flows of data; examples include network traffic, sensor data, call center records, and electronic health records. Large volumes of data arriving in a stream render traditional classification algorithms too inefficient: for example, classifiers are typically retrained from scratch whenever an incremental update occurs. In addition, one often wants to compare newly labeled data to previously labeled data, which can (a) introduce inconsistencies or (b) waste resources recomputing results for old data. We aim to build a unified system that leverages both labeled and unlabeled data to incrementally update the classifier and previously labeled datasets in a single step. By reducing computational redundancy, the system will scale to large volumes of data.
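As a minimal sketch of the incremental-update idea, the toy classifier below (all names hypothetical, not the proposed system) maintains per-class running centroids that can absorb new stream items without retraining on old data, and can relabel a stored dataset against the updated model in one pass:

```python
# Hypothetical sketch: an incremental nearest-centroid classifier.
# New stream items update running per-class sums; no full retraining.

class IncrementalCentroidClassifier:
    def __init__(self):
        self.sums = {}    # label -> running feature sums
        self.counts = {}  # label -> number of samples seen

    def partial_fit(self, x, label):
        """Fold one labeled stream item into the model incrementally."""
        s = self.sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, x):
        """Assign x to the class with the nearest centroid."""
        def sq_dist(label):
            c = self.counts[label]
            return sum((x[i] - self.sums[label][i] / c) ** 2
                       for i in range(len(x)))
        return min(self.sums, key=sq_dist)

    def relabel(self, old_data):
        """Re-classify a previously labeled dataset in one pass."""
        return [self.predict(x) for x in old_data]

clf = IncrementalCentroidClassifier()
for x, y in [([0.0, 0.1], "a"), ([0.1, 0.0], "a"), ([1.0, 1.1], "b")]:
    clf.partial_fit(x, y)

# A new stream item updates the model without touching old data.
clf.partial_fit([0.9, 1.0], "b")
print(clf.relabel([[0.05, 0.05], [1.0, 1.0]]))
```

A production system would use a richer model, but the same pattern (incremental sufficient statistics plus a single relabeling sweep) is what avoids full retraining.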
Offer active learning and integration of multiple data sources for curation and improved performance. Significance: In general, the more data available, the better a classifier performs. However, automatic (machine) classification is sometimes inaccurate due to anomalies or nonlinearities in the data. We will offer an active learning component so that a curator can make adjustments to the learning model. We also plan to exploit inherently related data through multi-view learning.
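One common way to realize the active learning component is uncertainty sampling: the system surfaces the items it is least confident about for curator review. The sketch below (names and the toy scorer are illustrative assumptions, not the proposed implementation) ranks an unlabeled pool by the margin between the top two class scores:

```python
# Hypothetical sketch of uncertainty sampling for active learning:
# rank unlabeled items by confidence margin and surface the least
# confident ones for a human curator to label.

def margin(scores):
    """Confidence margin: gap between the top two class scores."""
    top = sorted(scores.values(), reverse=True)
    return top[0] - top[1]

def query_for_curation(unlabeled, score_fn, k=1):
    """Return the k items the model is least sure about."""
    return sorted(unlabeled, key=lambda x: margin(score_fn(x)))[:k]

# Toy scorer standing in for a real classifier's class scores.
def score_fn(x):
    return {"a": 1.0 - abs(x - 0.0), "b": 1.0 - abs(x - 1.0)}

pool = [0.1, 0.48, 0.9]
print(query_for_curation(pool, score_fn))  # -> [0.48], the ambiguous point
```

Curator labels obtained this way feed back into the model, concentrating human effort on exactly the anomalous or nonlinear regions where automatic classification fails.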