Summarizing Search Results with Dynamic Exploratory Search Captions – 9a.007.UH

Project Start Date: Aug 1, 2020
Funding: Member Funded

Project Summary

Search is often characterised by user uncertainty about both the search domain and the user's information-seeking goals. This uncertainty can negatively impact users' ability to assess the quality of search results, causing them to scroll through more documents than necessary and to struggle to give consistent relevance feedback. Because users' information needs are assumed to be highly dynamic and expected to evolve over time, successful searches can be indistinguishable from those that have drifted erroneously away from their original search intent. Indeed, given their lack of domain knowledge, searchers may be slow, or even unable, to recognise when search results have become skewed towards another topic.

With these issues in mind, we propose to develop Exploratory Search Captions (ESC), a method that combines semantic and lexical information to generate succinct keyword-based descriptions of ranked search results. ESC will summarise the content of search results in order to assure users that their search is proceeding as intended and to alert them when it is not. In ESC, semantic information will come from a sequence-to-sequence autoencoder, which we will use to learn a distributed representation for ranked documents. In this distributed representation, semantically similar sets of ranked documents will lie close to one another in vector space. By projecting the current search engine results page (SERP) into this vector space, ESC can find nearby examples of ranked document sets that are associated with known queries. This approach is analogous to finding semantically related terms in word embeddings. In situations where it is difficult to derive semantic information, such as when there is no coherent theme present in the search results, ESC will fall back to a simpler, lexical method. ESC will automatically detect when the autoencoder makes weaker, less coherent predictions using a logistic regression model, which will be used to determine the proportion of captions to be presented from each method.
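To illustrate the two mechanisms described above, the sketch below shows (a) a cosine nearest-neighbour lookup that matches the embedded SERP against embeddings of rankings with known queries, and (b) a logistic gate that maps a confidence signal to the proportion of semantic captions shown. All names, vectors, gate weights, and captions here are hypothetical placeholders, not the project's actual model; the real ESC system would use a trained sequence-to-sequence autoencoder and a fitted logistic regression model rather than random vectors and hand-set weights.

```python
import numpy as np

def nearest_caption(serp_vec, caption_vecs, captions):
    """Return the known-query caption whose embedded ranking is
    closest (by cosine similarity) to the current SERP embedding."""
    a = serp_vec / np.linalg.norm(serp_vec)
    b = caption_vecs / np.linalg.norm(caption_vecs, axis=1, keepdims=True)
    sims = b @ a                       # cosine similarity to each known ranking
    i = int(np.argmax(sims))
    return captions[i], float(sims[i])

def semantic_proportion(similarity, weight=4.0, bias=-2.0):
    """Logistic gate (hypothetical weights): low autoencoder confidence
    pushes the mix towards the lexical fallback, high confidence
    towards semantic captions."""
    z = weight * similarity + bias
    return 1.0 / (1.0 + np.exp(-z))

# --- toy illustration with random embeddings (all data hypothetical) ---
rng = np.random.default_rng(0)
caption_vecs = rng.normal(size=(4, 16))   # embeddings of rankings with known queries
captions = ["heart disease", "solar panels", "jazz history", "python tutorials"]
# Current SERP embedding: a slightly noisy copy of the "solar panels" ranking.
serp_vec = caption_vecs[1] + 0.05 * rng.normal(size=16)

best, sim = nearest_caption(serp_vec, caption_vecs, captions)
p_semantic = semantic_proportion(sim)
print(best, round(sim, 2), round(p_semantic, 2))
```

In this toy setup the lookup recovers "solar panels" because the SERP vector was built near that ranking; in practice the similarity score (or a richer feature vector) would feed a trained logistic regression model rather than the fixed weights used here.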


Principal Investigator(s)