Large-Scale Social Media Analytical Tools with Application to Detecting Emerging Events


Project Start Date: Jul 1, 2013
Research Areas: Analytics, Analytics - Natural Language Processing, Analytics - Probabilistic Modeling, Analytics - Visual Analytics, Data Management, Data Management - Big Data Platforms, Data Management - Dynamic Data
Funding: Member Funded

Project Summary

Objectives: This project had four primary objectives. First, we developed improved methods for sentiment analysis that simultaneously generate aspects and the related user sentiments. Second, we developed improved topic evolution models, which better capture topics and their evolution over time. Third, we developed a high-speed, distributed clustering algorithm, since graph clustering is used by many social media analytic techniques. Fourth, we sought to enhance the ability to detect emerging events by detecting and tracking subevents and by incorporating data from multiple sources.

Methods: The Event Detection on Onset with Subevents (EDOS) method leverages a simple evolution method to detect subevents; it recasts the initial EDO graph pruning step as a graph fusion step for low-cost incorporation of new media data. By eliminating weak associations in the similarity matrix, the proposed Pruned AP method reduces the original AP complexity from O(N^3) to O(N), significantly reducing the time for data processing and network communication.
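
The idea of eliminating weak associations in a similarity matrix can be illustrated with a minimal sketch. The summary does not state the pruning rule used by Pruned AP, so the version below assumes a simple "keep the k strongest neighbours per point" rule; the function names `prune_similarity` and `edge_count` and the parameter `k` are hypothetical and chosen only for this illustration. It shows how pruning shrinks the number of similarity entries that a message-passing clustering step would have to visit, from about N^2 to at most N*k.

```python
from typing import Dict, List, Tuple


def prune_similarity(
    sim: List[List[float]], k: int
) -> Dict[int, List[Tuple[int, float]]]:
    """Keep only the k strongest off-diagonal similarities for each point.

    Assumption: top-k pruning stands in for whatever rule Pruned AP uses.
    """
    pruned: Dict[int, List[Tuple[int, float]]] = {}
    for i, row in enumerate(sim):
        # Collect every other point with its similarity to point i.
        neighbours = [(j, s) for j, s in enumerate(row) if j != i]
        # Strongest associations first; the weak ones are dropped below.
        neighbours.sort(key=lambda pair: pair[1], reverse=True)
        pruned[i] = neighbours[:k]
    return pruned


def edge_count(pruned: Dict[int, List[Tuple[int, float]]]) -> int:
    """Number of similarity entries that survive pruning."""
    return sum(len(edges) for edges in pruned.values())


# Toy symmetric similarity matrix with two obvious groups: {0, 1} and {2, 3}.
sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]

sparse = prune_similarity(sim, k=1)
dense_entries = len(sim) * (len(sim) - 1)  # off-diagonal entries before pruning
print(dense_entries, edge_count(sparse))
```

With a fixed k, the number of retained entries grows linearly with the number of points, which is the property a distributed implementation would rely on to limit both computation and the volume of messages exchanged between workers.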

Results: The EDOS method is still able to detect both the onset of events and new subevents within 3 minutes of the first event occurrence. The proposed Pruned AP distributed method is not only efficient and scalable but also produces competitive clustering quality. In addition, results show that JSDDP-W and JSDDP-P outperform all other models for sentiment analysis, and that the phrase model JSDDP-P performs better than the word model JSDDP-W.

Conclusions: EDOS works; however, use of a distributed computational environment and development of strong summarization methods would be highly beneficial. We developed a new probabilistic model for sentiment analysis, the Joint Similarity Dependency Dirichlet Process (JSDDP), which extends the Dirichlet Process. This model solves the problem of determining the number of aspects, which is faced by models that extend LDA.
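
Why a Dirichlet Process avoids fixing the number of aspects in advance can be seen in a small simulation. The sketch below is not JSDDP; it only samples from the Chinese restaurant process, a standard way of viewing the partitions induced by a Dirichlet Process. The function name `chinese_restaurant_process` and the parameters `alpha` and `seed` are assumptions made for this illustration.

```python
import random
from typing import List


def chinese_restaurant_process(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Seat n customers and return the size of each table (cluster).

    The number of tables is not an input: it emerges from the sampling.
    """
    rng = random.Random(seed)
    table_sizes: List[int] = []
    for i in range(n):
        # Customer i joins table t with probability size_t / (i + alpha)
        # and opens a new table with probability alpha / (i + alpha).
        r = rng.uniform(0, i + alpha)
        cumulative = 0.0
        for t, size in enumerate(table_sizes):
            cumulative += size
            if r < cumulative:
                table_sizes[t] += 1
                break
        else:
            # r fell in the final alpha-sized slice: start a new table.
            table_sizes.append(1)
    return table_sizes


sizes = chinese_restaurant_process(n=200, alpha=1.0)
print(len(sizes), sizes)
```

A larger `alpha` tends to produce more tables, so the concentration parameter, rather than a preset count as in LDA-style models, governs how many clusters appear.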