Multi-Industry Semantic Discovery Tool Sets for Data Integration, Data Warehousing, and E-Science


Project Start Date: Jul 1, 2012
Funding: Member Funded

Project Summary

Objectives

• Discovering and integrating semantic information from heterogeneous data sources for various business applications.

• Using the semantic information for query expansion and information visualization.

• Making data more meaningful for industrial data integration, data warehousing, and E-science scenarios.

Methods

• Develop analytical and probabilistic methods to discover the semantics of various data sources, including structured databases, plain text files, and the “Deep Web” (Drexel).

• Develop an environment that empowers users to aggregate, visualize, and search both structured and unstructured data (Drexel).

• Develop analytical tools for building domain knowledge, in the form of concept hierarchies or lightweight ontologies, from a collaborative semi-structured knowledge base such as Wikipedia (UL Lafayette).

Results

• A semantic discovery tool called SemIntegrator that can extract structured records for given relational databases from unstructured data and annotate text documents using given ontology concepts and relationships.

• An Ontology-based Annotation, Integration, and Visualization (OAIV) framework for visualizing and exploring semantic information in large document collections.

• A method for extracting lightweight ontologies from Wikipedia.
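
As a rough illustration of what ontology-based annotation and query expansion can look like, the sketch below tags text spans with concepts from a tiny hand-written ontology and expands a concept into the search terms of itself and its descendants. It is a minimal sketch under stated assumptions, not the SemIntegrator or OAIV implementation: the ontology contents and the names ONTOLOGY_TERMS, PARENT, annotate, and expand_query are all hypothetical, and simple term matching stands in for the project's analytical and probabilistic methods.

```python
import re

# Hypothetical toy ontology (not from the project): concept -> surface terms.
ONTOLOGY_TERMS = {
    "Database": ["database", "databases"],
    "DataWarehouse": ["data warehouse"],
    "Ontology": ["ontology", "ontologies"],
}

# Hypothetical concept hierarchy: child concept -> parent concept.
PARENT = {"DataWarehouse": "Database"}


def annotate(text):
    """Return sorted (start, end, concept) spans for ontology terms found in text."""
    spans = []
    for concept, terms in ONTOLOGY_TERMS.items():
        for term in terms:
            # Whole-word, case-insensitive match of each surface term.
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                spans.append((match.start(), match.end(), concept))
    return sorted(spans)


def expand_query(concept):
    """Return surface terms for a concept and all of its descendant concepts."""
    terms = list(ONTOLOGY_TERMS.get(concept, []))
    for child, parent in PARENT.items():
        if parent == concept:
            terms.extend(expand_query(child))
    return terms


if __name__ == "__main__":
    print(annotate("A data warehouse uses an ontology."))
    print(expand_query("Database"))
```

Here a query for the hypothetical concept "Database" would also retrieve documents that mention "data warehouse", because the hierarchy lists DataWarehouse as a child of Database.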

Conclusions

• Accomplished most of the objectives of the project.

• Developed working prototype tools and interfaces with innovative methods for annotation, integration, and visualization.

• Enhanced CVDI's capability to work with big data.