High Dimensional Data Reduction, Sampling and Visualization for Big Data Applications 16.05

Project Start Date: Jul 1, 2016
Research Areas: Analytics, Analytics - Deep Learning, Analytics - Visual Analytics, Data Management, Data Management - Information Sharing, Visualization, Visualization - Visual Analytics
Funding: Member Funded

Project Summary

Data visualization is a fundamental way to extract meaningful information about the structure of a dataset and the relationships among the groups or classes in a problem. For small, low-dimensional datasets this task is easy, even trivial: both linear and nonlinear subspace learning techniques run comfortably on conventional personal computers. As the amount and/or the dimensionality of the data grows, nonlinear techniques become intractable or outright infeasible, because most state-of-the-art nonlinear subspace learning techniques, such as Locally Linear Embedding (LLE), require storing and manipulating the affinity matrix of the entire dataset. The computational and storage cost of these operations is usually a quadratic or cubic function of the cardinality of the dataset. Addressing these problems calls for efficient and effective alternatives, e.g. smart sampling techniques or approximate solutions. In this project, we aim to:

1. Overcome the limitations and shortcomings of current visualization, data analysis, and data sampling techniques in order to make sense of complex big data through latent theme extraction, detection of emerging practices, recommendation, and collaborative filtering.

2. Develop programming interfaces and implementations for efficient nonlinear data sampling and dimensionality reduction, as well as novel visualization algorithms for dynamic information and data visualization.
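
As a rough illustration of the storage argument in the summary, the following Python sketch estimates the memory a dense affinity matrix would need and then builds one on a sampled subset instead. It is not taken from the project's code. The function names, the 8-byte entry size, the Gaussian kernel (LLE itself uses local reconstruction weights rather than this kernel), and uniform random sampling as a stand-in for "smart" sampling are all assumptions made for illustration.

```python
import math
import random


def dense_affinity_bytes(n, bytes_per_entry=8):
    """Bytes needed to store a dense n x n affinity matrix.

    Assumes one 8-byte float per entry; the cost grows quadratically in n.
    """
    return n * n * bytes_per_entry


def uniform_sample(points, m, seed=0):
    """Draw m points uniformly without replacement.

    Hypothetical stand-in for a smarter sampling scheme.
    """
    rng = random.Random(seed)
    return rng.sample(points, m)


def gaussian_affinity(points, sigma=1.0):
    """Dense Gaussian-kernel affinity matrix (illustrative kernel choice)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 1.0  # zero distance to itself gives exp(0) = 1
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            w = math.exp(-d2 / (2.0 * sigma ** 2))
            W[i][j] = W[j][i] = w  # the matrix is symmetric
    return W


# Storage for the full dataset versus a 1% sample.
print(dense_affinity_bytes(100_000) / 1e9)  # 80.0 (GB)
print(dense_affinity_bytes(1_000) / 1e6)    # 8.0 (MB)

# Small synthetic demo: 2,000 points in 5 dimensions, 100 sampled.
rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(2_000)]
subset = uniform_sample(data, 100)
W = gaussian_affinity(subset)
print(len(W), len(W[0]))  # 100 100
```

Sampling 1% of the points shrinks the matrix by a factor of 10,000, which is the quadratic effect the summary describes. How well an embedding computed on the sample represents the full dataset depends on the sampling strategy, which this sketch does not address.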


Principal Investigator(s)