When pursuing complex information needs (e.g., doing genealogical searching, exploring historical archives, planning a vacation, doing a patent search, etc.), people often run multiple queries to discover effective search terms, to break the problem down into sub-tasks, to reflect an evolving understanding of the information need, etc. Such queries often retrieve many of the same documents, but most systems offer no help in understanding this redundancy. In this talk, I will describe Querium, an interactive information seeking system I have been building that helps people make sense of their past interactions, that helps them understand how the current results relate to what has been found before, and thus helps them plan for the future.
These slides are from an invited talk I gave at an NWO-sponsored CATCH meeting by BRIDGE on June 22, 2012 in The Netherlands. For more information on the event, see http://www.nwo.nl/nwohome.nsf/pages/NWOP_8UYEKF
NWO: The Netherlands Organisation for Scientific Research
CATCH: Continuous Access To Cultural Heritage
BRIDGE: Building Rich Links To Enable Television History Research
4. Some examples of search tasks
Google isn’t very good at
Patentability search
Medical/pharmaceutical research
Business analysis
Genealogical research
eDiscovery
Archives research
Intelligence analysis
Travel planning
Historical research
Academic research
Etc.
Why is this?
5. Exploratory search
Interactive
Information seeking
Anomalous state of knowledge
Evolving information need
Often recall-oriented
6. What happens in exploratory search?
A person
Runs a query
Looks at some documents
Learns something
… and the process continues
…but there is a lot of repetition,
a lot of redundancy, and
a lot of reliance on memory
7. Overlap as a function of number of queries in a session
[Plot: overlap (vertical axis, 0 to 1) against number of queries in a session (horizontal axis, 0 to 16)]
8. Questions people might ask
of an exploratory search tool
What queries have I run?
What documents have I found?
Have I seen this document before?
What are the central themes?
Was this query effective at finding new
information?
9. How might we answer these?
Keep track of queries & documents for a task
Structure search in terms of this
process metadata
11. Google/Bing and history
Web search engines record clicked-on documents
System aggregates clicks, adjusts document rankings
Future searchers get higher precision
All searchers get personalization for common queries
One key problem:
Idiosyncratic information needs do not
benefit as much as common ones
12. A brief history of search history
1970s: DIALOG let people combine queries with Boolean operators
1990s: Web browsers keep track of visited documents
1990s: Search engines use click-through rates to affect future rankings
1997: VOIR (Golovchinsky) shows retrieval histories of documents in a session
1998: ARIADNE (Twidale and Nichols) lets people review search activity
2000: SearchPad (Bharat) lets people save and revisit queries and documents
2005: KnowledgeSea (Ahn et al.) shows prior activity on retrieved documents
2008?: Ancestry.com annotates results with info from family tree
2012: Querium (Golovchinsky et al.) reflects query/document history for
exploring search results
23. In closing…
Memory is uncertain
Information needs evolve
Queries are approximations
Understanding changes
Design challenge: Help people plan
future actions by understanding the
present in the context of the past
28. References
Ahn, J.-W., Brusilovsky, P., and Farzan, R. (2005). Investigating users' needs and behavior
for social search. In Proc. of the Workshop on New Technologies for Personalized
Information Access (held in conjunction with UM’05), Edinburgh, Scotland, UK; pp. 1-12.
Bharat, K. (2000) SearchPad: Explicit Capture of Search Context to Support Web Search. In
Proc. WWW2000, pp. 493-501.
Golovchinsky, G. (1997) Queries? Links? Is there a difference? In Proc. CHI 1997. ACM
Press.
Golovchinsky, G., Diriye, A., and Dunnigan, T. (2012) The future is in the past: Designing
for exploratory search. To appear in Proc. IIiX2012, Nijmegen, ACM Press.
Twidale, M. and Nichols, D. M. (1998) Designing interfaces to support collaboration in
information retrieval. Interacting with Computers 10(2), pp. 177-193.
Editor's Notes
I would like to talk to you about information seeking from a user-centered perspective. Specifically, I would like us to think about some of the kinds of complex information seeking activities that happen in archives and in many other social and organizational contexts. While most of my examples are related to text, I hope you will see that the approach I am advocating applies broadly to all media collections and to all sorts of ways of finding information.
Specifically, I would like to consider the role that human memory plays in the ways we look for information, and how we can build systems that help people with these cognitively-demanding tasks. Thus the history I am referring to is the history of interaction with the computer in pursuit of a specific information need, and the focus of my talk will be how we can compensate for human cognitive limitations with some clever design.
So here some of you might be thinking… hasn’t Google made this topic obsolete? In many cases, the answer is yes: Google has made finding certain kinds of information – say the weather forecast, or what Britney Spears is up to – very easy. They have been very successful at solving the common information needs against a particular collection. But there are many other instances where the assumptions that underlie Google’s approach to search do not apply.
Here’s a long list of domains, applications, information needs that the Google approach doesn’t solve. I am certain there are many more. Google’s unreasonable effectiveness comes from identifying common patterns of behavior, and predicting – based on what others have done – what the next person might want. But this assumes a certain commonality of information needs, and this is an assumption that is not always true. One of the unifying characteristics of this list is that these tasks are exploratory, rather than navigational or fact-finding.
Here are some characteristics of an activity we call “exploratory search”: it involves learning about a topic, it is interactive and iterative, people engaged in it may not know what they don’t know, and they often are trying to find information spread over many documents, rather than just trying to find a single document. There is a certain state of fog, if you will, through which one glimpses potentially interesting objects.
I am sure you’re all familiar with this kind of activity: you run a query, look at some results, learn something about the topic or the collection, and repeat. Exploratory search is something like a memory game: there is a lot of repetition, a lot of redundancy in the results, and a lot of reliance on memory.
Just how redundant are such queries? To give you a quick impression, this is a plot of the degree of overlap among queries vs. the number of queries in a session from some data we collected last year. You can see two interesting behaviors in this graph: first, the absolute value of overlap is quite high – something like 80% of all retrieved documents are retrieved by at least two queries. Second, there is a small downward trend, suggesting that people eventually figure out how to broaden their search. But that iteration can take a long time, at least some of which might be improved by better search tools.
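To make the overlap figure concrete, here is a minimal Python sketch of one plausible way to compute it: the fraction of distinct retrieved documents that were returned by at least two queries in a session. The function name `overlap_fraction` and the sample document ids are assumptions for illustration; the exact measure behind the plot is not specified in these notes.

```python
from collections import Counter


def overlap_fraction(result_sets):
    """Fraction of distinct documents retrieved by at least two queries.

    result_sets: one list of document ids per query in the session.
    """
    # Count each document once per query, even if a query lists it twice.
    counts = Counter(doc for results in result_sets for doc in set(results))
    if not counts:
        return 0.0
    repeated = sum(1 for n in counts.values() if n >= 2)
    return repeated / len(counts)


# Hypothetical session: three queries over five distinct documents.
session = [
    ["d1", "d2", "d3"],
    ["d2", "d3", "d4"],
    ["d3", "d5"],
]
print(overlap_fraction(session))  # 0.4 (d2 and d3 out of five documents)
```

Computing this value after each new query in a session would give one point per session length, which is the kind of curve the slide shows.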
This is a small list of questions that searchers might ask during a search task, and these questions may be difficult to answer from memory. This, therefore, is an opportunity to design a system that makes searching for information easier.
The notion of metadata is fundamental to archives and digital libraries, particularly for non-textual collections. Such metadata can be used to search for and filter results to approximate a user’s information need. I would like to propose that in addition to this document metadata, we can think of process metadata that characterizes a particular search activity in terms of the queries that have been run and the information objects that have been retrieved. We can then use this process metadata in our search systems to improve the effectiveness and efficiency of the information seeking process.
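As a rough illustration of what process metadata could look like in code, the sketch below records, for one task, the queries that were run and which queries retrieved each document. The class `SearchTask` and its method names are hypothetical and are not taken from any system described in this talk.

```python
from dataclasses import dataclass, field


@dataclass
class SearchTask:
    """Process metadata for one search task (hypothetical structure)."""
    queries: list = field(default_factory=list)
    # Maps a document id to the indices of the queries that retrieved it.
    retrieved_by: dict = field(default_factory=dict)

    def record_query(self, query_text, doc_ids):
        """Store a query and its results; return the query's index."""
        index = len(self.queries)
        self.queries.append(query_text)
        for doc_id in dict.fromkeys(doc_ids):  # drop duplicates, keep order
            self.retrieved_by.setdefault(doc_id, []).append(index)
        return index

    def seen_before(self, doc_id):
        """Has any query in this task retrieved the document?"""
        return doc_id in self.retrieved_by

    def new_documents(self, query_index):
        """Documents retrieved by this query and by no other query."""
        return [d for d, qs in self.retrieved_by.items() if qs == [query_index]]


task = SearchTask()
task.record_query("battery anode patent", ["d1", "d2"])
second = task.record_query("lithium anode coating", ["d2", "d3"])
print(task.queries)                # what queries have I run?
print(task.seen_before("d2"))      # have I seen this document before?
print(task.new_documents(second))  # did this query find anything new?
```

Even this small structure can answer several of the searcher's questions listed earlier without relying on memory.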
So if Google is already using this information, what’s new or interesting about my proposal?
Here’s what a modern search engine does: it uses click-through behavior to train a ranking algorithm to prioritize retrieval of clicked-on documents for a given set of queries. The problem is that this works best for those common information needs, and not so well for more researchy, exploratory search tasks. In some cases (e.g., eDiscovery) there may not be any history of prior interactions; in other cases, that history may be private. Now let’s look at a small sample of ways in which historical information has been used in search interfaces over the years.
This is just a small ad hoc sample, and some of these features have appeared in many other systems.
This screenshot is from DIALOG, an early commercial IR system. It used Boolean queries with a few additional operators. It also allowed searchers to refer to the results sets of prior queries and to combine those queries with additional constraints. Thus it reduced the need to remember or retype earlier queries by representing them symbolically.
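The idea of referring to earlier result sets symbolically can be sketched with ordinary set operations. The Python below is an illustration only: the class `SetHistory`, its methods, and the 1-based numbering are assumptions, and it does not reproduce DIALOG's actual command syntax.

```python
class SetHistory:
    """Numbered result sets that later queries can combine (illustrative)."""

    def __init__(self):
        self._sets = []

    def add(self, doc_ids):
        """Store a result set and return its 1-based number."""
        self._sets.append(frozenset(doc_ids))
        return len(self._sets)

    def get(self, number):
        return self._sets[number - 1]

    def combine_and(self, a, b):
        """New set: documents in both earlier sets."""
        return self.add(self.get(a) & self.get(b))

    def combine_or(self, a, b):
        """New set: documents in either earlier set."""
        return self.add(self.get(a) | self.get(b))

    def combine_not(self, a, b):
        """New set: documents in set a but not in set b."""
        return self.add(self.get(a) - self.get(b))


history = SetHistory()
s1 = history.add(["d1", "d2", "d3"])  # results of a first query
s2 = history.add(["d2", "d3", "d4"])  # results of a second query
s3 = history.combine_and(s1, s2)      # refine without retyping either query
print(sorted(history.get(s3)))        # ['d2', 'd3']
```

Because each combination is stored as a new numbered set, the searcher can keep building on earlier steps instead of remembering them.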
This is an example of a system I built during my PhD work. Among other things, it kept track of which documents had been retrieved previously, and showed this information in histograms. For each document, each bar represents the results of one query that retrieved it. The taller the bar, the more important that document was to that query. Bars are arranged left-to-right corresponding to the sequence of queries in a given task. These histograms give the searcher a quick feel for whether the document is new, or re-retrieved. See Golovchinsky (1997) for more details.
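The data behind such a histogram can be sketched as one bar height per query for a given document. In the Python below, "importance" is approximated by reciprocal rank and queries that did not retrieve the document get a height of zero; both choices, and the function names, are assumptions made for illustration and are not the scoring the original system used.

```python
def retrieval_history(ranked_results, doc_id):
    """Bar heights for one document, one per query, in query order.

    ranked_results: one ranked list of document ids per query.
    Height is 1/rank (an assumed stand-in for importance), or 0.0
    when the query did not retrieve the document.
    """
    heights = []
    for ranking in ranked_results:
        if doc_id in ranking:
            heights.append(1.0 / (ranking.index(doc_id) + 1))
        else:
            heights.append(0.0)
    return heights


def is_newly_retrieved(heights):
    """True when only the most recent query retrieved the document."""
    return bool(heights) and heights[-1] > 0 and not any(heights[:-1])


# Hypothetical session of three queries.
session = [["d1", "d2"], ["d2", "d3"], ["d3", "d2", "d4"]]
print(retrieval_history(session, "d2"))  # [0.5, 1.0, 0.5]
print(is_newly_retrieved(retrieval_history(session, "d4")))  # True
```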
This is a screenshot of ARIADNE, a bibliographic information system that visualized the activities in a search session to help people make sense of what they had done. Activity was divided into three tiers – the top for specific commands, the middle for queries, and the lowest for showing retrieved records. While people could get a sense for what they had done, it’s not clear that these intermediate states could be manipulated directly. See Twidale and Nichols (1998) for more details.
SearchPad was a browser plugin that made it easier to keep track of searching and bookmarking activity. The list of queries was available in a side bar, and for each query the documents marked or “saved” were shown. This design helped people remember what queries they ran and what documents they found. See Bharat (2000) for more details.
KnowledgeSea was an experimental system that collected click-through and bookmarking rates on various documents, similar to modern search engines, but then surfaced that information in the interface explicitly. Thus although it did not reflect a particular individual’s history of interaction with objects in a collection, it made a group history available in a manner that allowed people to reason about it explicitly. See Ahn et al. (2005) for more details.
Ancestry.com is a service that allows people to organize their genealogical data, and to find historical records related to specific people. Most searches are run in the context of a particular individual. When a pertinent record is found, it can be saved into the tree to document that person’s life. If a subsequent search for records for that person re-retrieves a previously-saved document, that record is marked with a green checkmark. This tells the searcher that the particular record had been found previously, reducing the need to remember which records have already been examined and saved. This interface might be improved if it also marked records that were considered and rejected, or records that were saved to related people. All of these prior examples have illustrated some particular use of process metadata in the user interface to help people make sense of their search activity in some way. In my current work, I have started looking at how to unify this historical information in a more principled manner so that it can be used both by the system and by the searcher. The next few slides illustrate some of this work.
This is a screenshot of Querium, a system that allows people to use process metadata explicitly in filtering their search results. See Golovchinsky et al. (2012) for more details.
What queries have I run? What documents have I found? Have I seen this document before? What are the central themes? Was this query effective at finding new information?
These are some closeups of the various controls from the earlier screenshot. The bars in the horizontal control represent all the results returned by the current query, and show the user’s interaction with them over the entire session (not just the current query). The red box represents a document that was explicitly excluded; the green box represents a document that was explicitly marked as being useful; the dark grey box represents a retrieved snippet that was clicked on to show the associated PDF file; light gray boxes represent documents that were shown on the screen at some point. The histograms reflect the retrieval history of each document: Each bar represents the output of one query; the taller the bars, the more important the document was. Queries are arranged left-to-right in chronological order. A single bar indicates a newly-retrieved document.
Same controls with two filters selected. The filters restrict the results to only those retrieved by just one (the current) query, and only those for which snippets have not previously been displayed (because the documents were ranked “below the fold”). The pink rectangles show which documents were included in the results by the filter.
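The two filters described here can be sketched as predicates over per-document process metadata. The Python below is a minimal illustration under assumed names (`DocState`, `filter_results`, and their fields); it is not Querium's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class DocState:
    """Per-document process metadata for a session (hypothetical fields)."""
    doc_id: str
    retrieved_by: set        # indices of the queries that retrieved it
    displayed: bool = False  # snippet was shown on screen at some point


def filter_results(results, current_query, only_new=False, only_undisplayed=False):
    """Keep the results that pass every selected filter."""
    kept = []
    for doc in results:
        # Filter 1: retrieved by the current query and by no other query.
        if only_new and doc.retrieved_by != {current_query}:
            continue
        # Filter 2: snippet never displayed (e.g. ranked below the fold).
        if only_undisplayed and doc.displayed:
            continue
        kept.append(doc)
    return kept


# Hypothetical results of the third query (index 2) in a session.
results = [
    DocState("d1", {0, 2}, displayed=True),  # re-retrieved and seen
    DocState("d2", {2}, displayed=True),     # new, already shown
    DocState("d3", {2}),                     # new, never shown
    DocState("d4", {1, 2}),                  # re-retrieved, never shown
]
kept = filter_results(results, current_query=2, only_new=True, only_undisplayed=True)
print([doc.doc_id for doc in kept])  # ['d3']
```

Treating each filter as an independent condition means the filters can be switched on and off separately, as the controls in the screenshot suggest.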
Exploratory search is a difficult cognitive task, often distributed over time. It’s characterized by uncertainty and evolution of the information need. Search typically involves a series of queries that are approximations of various aspects of the overall need. It relies heavily on memory and a variety of sense-making tactics. This complex process can be made easier by tools that help people plan future actions by understanding the present in the context of the past.
And now, I’d like to review a bit of history with you. Have we seen these guys before?
Well, yes and no. Aside from some cheap references to revisionist history, I think these photographs illustrate that memory can trick us, and without some help in keeping track of what we did or found earlier, it may be difficult for people to recall what they saw or didn’t see. I hope that we can build tools that mitigate at least some of these cognitive limitations.