Scalable Visualization, Gap Analytics for Multiple Big Data Industry Sectors

Project Start Date: Jul 1, 2013
Research Areas: Visualization, Visualization - Visual Analytics
Funding: Member Funded

Project Summary

1. Develop an integrated and extensible process and a series of prototypes for gap analytics, portfolio analysis, competitive intelligence surveillance, and large-scale longitudinal evaluative assessments across heterogeneous data sources and a variety of units of analysis over time.
2. Design and develop innovative and robust algorithms to support a flexible workflow of gap analytic tasks, including modeling dynamic high-dimensional data as visualizations of fitness landscapes, identifying and tracking critical paths over fitness landscapes, and analyzing the dynamics of trajectories and other indicators of a complex adaptive system.
3. Build an extensible and adaptive platform to demonstrate applications in two to three priority domains, based on recommendations of industry partners and available data sources.

Methods: The project has focused on developing a gap analysis of U.S. patents as one of the representative application domains. We have constructed a local database of U.S. patents, containing over 5 million patents and over 30 million patent co-citations.
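As an illustration of what a patent co-citation is, the following is a minimal sketch, not the project's database code. It assumes citation data is available as a mapping from each citing patent to the patents it cites; the function name `cocitation_counts` and the patent IDs are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(citations):
    """Count how often each pair of patents is cited together.

    `citations` maps a citing patent ID to the list of patent IDs it cites.
    Two patents are co-cited once for every later patent that cites both.
    """
    counts = Counter()
    for cited in citations.values():
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


# Hypothetical toy data; the IDs are placeholders, not real patents.
citations = {
    "P10": ["P1", "P2", "P3"],
    "P11": ["P1", "P2"],
    "P12": ["P2", "P3"],
}
counts = cocitation_counts(citations)
```

Here `counts[("P1", "P2")]` is 2 because both P10 and P11 cite the pair. At the scale of millions of patents the same counting would normally be done inside the database rather than in memory.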
The motivating use scenario is to enable an organization to identify its own position and trajectories in a fitness landscape, as well as the positions and trajectories of its potential competitors, for example Apple’s portfolio of 4,590 patents versus Samsung Electronics’ 44,736 patents.
We have been working on algorithms and experiments that will identify hot areas of patenting activities with burst detection of patent citations at three levels, namely, the individual patent level, the international patent classification (IPC) level, and the U.S. patent classification level.
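The summary does not say which burst detection algorithm is used. The sketch below substitutes a deliberately simple rule, flagging a year when its citation count sits well above the mean of the preceding years, to show the general idea. The function name, the `window` and `threshold` parameters, and the yearly counts are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def burst_years(counts_by_year, window=3, threshold=2.0):
    """Return years whose count is unusually high versus the prior window.

    A year is flagged when its count exceeds the mean of the previous
    `window` years by at least `threshold` standard deviations.
    This is a simple z-score rule, not a state-machine burst model.
    """
    years = sorted(counts_by_year)
    bursts = []
    for i in range(window, len(years)):
        history = [counts_by_year[y] for y in years[i - window:i]]
        baseline = mean(history)
        spread = pstdev(history) or 1.0  # fall back to 1.0 on a flat history
        if (counts_by_year[years[i]] - baseline) / spread >= threshold:
            bursts.append(years[i])
    return bursts


# Hypothetical yearly citation counts for one patent or one class.
yearly_citations = {2005: 4, 2006: 5, 2007: 4, 2008: 5, 2009: 20, 2010: 22}
bursts = burst_years(yearly_citations)
```

The same function could be applied at any of the three levels by changing what is counted: citations to a single patent, or citations aggregated over an IPC class or a U.S. patent class.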
In this project, we have developed a process for constructing the fitness landscape that applies multiple techniques, including co-citation analysis of patents (CCAP), bibliographic coupling of patents (BCP), latent semantic analysis and topic modeling of unstructured text (LDA), and multidimensional scaling (MDS).
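Of the techniques named above, bibliographic coupling is the easiest to show in a few lines: two patents are coupled when they cite the same earlier patents. The sketch below is one plausible formulation rather than the project's implementation. The cosine normalisation, the function name `coupling_strengths`, and the toy reference lists are assumptions.

```python
from itertools import combinations
from math import sqrt


def coupling_strengths(references):
    """Cosine-normalised bibliographic coupling between pairs of patents.

    `references` maps a patent ID to the collection of patent IDs it cites.
    Pairs with no shared references are omitted from the result.
    """
    ref_sets = {pid: set(refs) for pid, refs in references.items()}
    strengths = {}
    for a, b in combinations(sorted(ref_sets), 2):
        shared = len(ref_sets[a] & ref_sets[b])
        if shared:
            strengths[(a, b)] = shared / sqrt(len(ref_sets[a]) * len(ref_sets[b]))
    return strengths


# Hypothetical toy data; IDs are placeholders, not real patents.
references = {
    "A": ["R1", "R2", "R3", "R4"],
    "B": ["R1", "R2", "R3", "R4"],
    "C": ["R4", "R5"],
    "D": ["R6"],
}
strengths = coupling_strengths(references)
```

A pairwise similarity of this kind can be converted to a dissimilarity (for example, one minus the strength) and passed to a multidimensional scaling routine to obtain coordinates for the base of a landscape.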

Results: A prototyping architecture and its components have been constructed and a series of experiments have been conducted. The results demonstrate how the following types of questions can be answered:
 How did patenting activities grow in terms of specific areas of patent classification?

 How did the patent portfolios of two companies differ in terms of the distribution of their patents across all the patent classes?

 What is a fitness landscape like for a given information space of interest?
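The portfolio comparison question can be made concrete with a small sketch. It assumes each patent is reduced to a single class code and compares the share of each company's portfolio falling in each class. The function names and the two toy portfolios are hypothetical and do not represent any real company's holdings.

```python
from collections import Counter


def class_shares(patent_classes):
    """Fraction of a portfolio that falls in each patent class."""
    counts = Counter(patent_classes)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


def portfolio_gap(shares_a, shares_b):
    """Per-class share difference (A minus B) over the union of classes."""
    classes = sorted(set(shares_a) | set(shares_b))
    return {c: shares_a.get(c, 0.0) - shares_b.get(c, 0.0) for c in classes}


# Hypothetical portfolios: one class code per patent.
company_a = ["G06F", "G06F", "H04M", "H04M"]
company_b = ["G06F", "H01L", "H01L", "H01L"]

gap = portfolio_gap(class_shares(company_a), class_shares(company_b))
# Total variation distance: 0 for identical distributions, 1 for disjoint ones.
distance = sum(abs(d) for d in gap.values()) / 2
```

Positive entries in `gap` mark classes where company A is relatively more concentrated, and negative entries mark classes where company B is. Comparing shares rather than raw counts keeps the comparison meaningful when the two portfolios differ greatly in size.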

Conclusions: We have achieved the objectives of the project with a number of increasingly refined prototypes. The basic procedure is in place to generate three-dimensional fitness landscapes. We are in a good position to further refine the procedure for visual gap analytics. In addition to the patent data, we have also begun to explore procedural modeling at the level of individual users of an analytic system such that novice users can learn from expert users.