Best Practices:
Making the Most of Search Navigators
Contents:
1. Introduction
2. The role of search navigators
3. Their place in the ecosystem
4. Getting the most from search navigators
5. Further reading
1. Introduction
Search Technologies has provided more than 20,000 consultant-days of search implementation
services during the last 4 years, working with a variety of leading search products. Our
engagements range from corporate intranets and knowledge management systems, to search
applications for content-rich websites, classifieds, and e-commerce.
Search navigators[1] are now commonplace within non-trivial search applications. This brief paper
explores the reasons for their success, positions search navigators in relation to other common
approaches to search, and discusses how to maximize their effectiveness.
For those unfamiliar with the concept of search navigators, two examples of their use follow. Both
of these applications serve navigators in the left-side column:
Classifieds:
http://shop.ebay.com/?_from=R40&_trksid=p3907.m38.l1313&_nkw=sony&_sacat=See-All-Categories
Government:
http://www.gpo.gov/fdsys/search/search.action?na=_accodenav&se=_CRECfalse&sm=&flr=&ercode=&dateBrowse=&st=freedom+of+information&=freedom+of+information&psh=&sbh=&tfh=&originalSearch=freedom+of+information&sb=re&ps=10&sb=re&ps=10
These are public-facing search applications, but the approach illustrated is just as relevant behind
the firewall.
[1] Search navigators are also called guided navigation or faceted search.
2. The role of search navigators
The search process can be viewed as consisting of two simple steps:
a. The formation of a search clue
b. The browsing of results
An iterative process of search clue improvement is often necessary; this has always been the case.
A large search system twenty years ago would initially reply to a search request with a number (of
documents matching the criteria) rather than a results list, and invite the user to provide
additional search terms to reduce that number to a manageable quantity, which could then be
displayed and browsed. This often resulted in long search clues containing a mixture of full text
and fielded terms. A typical “advanced search page” provides a helpful UI for achieving the same
thing, the building of a great search clue, but without the need to know specific syntax.
Enabled by modern search architectures and fast servers, search navigators play this important
role today. The role is this:
Search navigators help users to quickly reduce the search scope through single clicks.
Put another way, search navigators are the most efficient mechanism yet implemented to help the
user to build a great search clue.
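The single-click scope reduction described above can be illustrated with a minimal sketch. It is not taken from any particular search product: the sample documents, the field names (category, condition) and the helper functions are all hypothetical. It assumes each document carries fielded metadata, counts the values present in the current scope, and simulates one click by filtering on a chosen value.

```python
from collections import Counter

# Hypothetical result set: each document carries fielded metadata.
DOCS = [
    {"id": 1, "title": "Sony Bravia 40in TV", "category": "Televisions", "condition": "New"},
    {"id": 2, "title": "Sony Walkman", "category": "Audio", "condition": "Used"},
    {"id": 3, "title": "Sony Bravia 32in TV", "category": "Televisions", "condition": "Used"},
    {"id": 4, "title": "Sony headphones", "category": "Audio", "condition": "New"},
    {"id": 5, "title": "Sony Trinitron TV", "category": "Televisions", "condition": "Used"},
]


def navigator_counts(docs, fields):
    """Count how many documents in the current scope carry each field value."""
    return {field: Counter(d[field] for d in docs if field in d) for field in fields}


def refine(docs, field, value):
    """Simulate one click on a navigator entry: keep only matching documents."""
    return [d for d in docs if d.get(field) == value]


scope = DOCS
counts = navigator_counts(scope, ["category", "condition"])

# One click on "Televisions" narrows the scope, and the counts are recomputed
# so the remaining navigators describe only what is left.
scope = refine(scope, "category", "Televisions")
counts_after_click = navigator_counts(scope, ["category", "condition"])
```

The counts shown beside each navigator entry are what give the user feedback on the distribution of content before they click. A production search engine would normally compute these counts inside the index rather than in application code.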
Added value
Well-constructed search navigators go beyond being efficient mechanisms. They also provide
feedback and insight to the user to guide the process of search scope reduction. This is
particularly helpful to new users who, as a by-product of search activity, can quickly learn about
the structure and distribution of content. For regular users, well-structured navigators provide a
continuing education into the make-up of the dataset in a non-intrusive fashion. With time, this
leads to more sophisticated use of both the search facility and content resources as a whole.
Making better use of existing resources is a key goal for most intranet and knowledge
management initiatives.
It is the added value of providing actionable insight and a continuous education about the
available content that truly separates search navigators from earlier approaches.
3. Navigators’ place in the search ecosystem
The search software industry has for many years been technology led, with the various vendors
evangelizing their favoured algorithms and approaches. It may be useful to briefly position search
navigators relative to some of these.
Earlier, it was suggested that search can be seen as a simple two-step process. Of course, most
modern search applications will present both the search results and opportunities to further refine
the search within the same page. However, in positioning the various technological approaches, it
is useful to keep the two steps separate. Let’s expand this theme and look at both in more detail:
a. Formation of a search clue: The objective of this step is the reduction of the search scope
to a point where the desired information can be conveniently found during results
browsing.
b. Browsing of results: The interactive inspection of a hit list to identify the desired
information.
The two steps must obviously work together and in some applications, one might dominate.
a. Formation of a search clue
The role of search navigators is firmly within this part of the search process, supporting human
decision-making and the efficiency of search scope reduction. Other technologies with something to
contribute to this part of the process include:
Tagging: Category taggers, entity extraction and other parsing methods that create
additional metadata to populate search navigators
Query parsing: Enriching queries with synonyms and other related terms, and where the
search engine provides an appropriate query language, optionally customizing relevancy
calculations.
Clustering techniques: These compare the contents of documents as a whole and can sort
search results into similar groupings using statistical techniques. Often these groupings are
presented as a type of search navigator.
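As an illustration of the query parsing item above, the following sketch enriches a query with synonyms. It is an assumption-laden example only: the synonym table, the function names and the AND/OR output syntax are hypothetical, and real search engines each have their own query languages.

```python
# Hypothetical synonym table; a real application would maintain its own.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "tv": ["television"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return a list of term groups; each group is a term plus its synonyms."""
    groups = []
    for term in query.lower().split():
        groups.append([term] + synonyms.get(term, []))
    return groups


def to_boolean_query(groups):
    """Render groups in a generic AND-of-ORs form (illustrative syntax only)."""
    parts = []
    for group in groups:
        if len(group) == 1:
            parts.append(group[0])
        else:
            parts.append("(" + " OR ".join(group) + ")")
    return " AND ".join(parts)


expanded = to_boolean_query(expand_query("used car"))
```

Here the query "used car" becomes "used AND (car OR automobile OR vehicle)", widening recall for the enriched term while leaving the other term untouched.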
b. Browsing results
In this part of the search process, the user is presented with an ordered listing of what remains
within the search scope. The primary issue is the order of presentation. Technologies and
methods which can contribute include:
Basic sorting: Using fielded information from the search index, such as ordering by date,
price or distance
Generic relevance: Fifteen years ago, keyword density and the ability to favor rare
(assumed to be more important) keywords were mainstream approaches to ordering
search results by relevance. Many other factors have since been added to relevancy
calculations, including word proximity, contextual evidence (a semantically-based
technique in which the presence of related words supports the relevance of keywords) and
favoring specific areas of documents, such as titles or section headings. Such methods are
present, to some extent, in most contemporary search applications, forming a baseline for
relevance judgement.
Off-page criteria: Factors other than document content, such as adjusting relevance based
on the document’s original location, or on incoming links in a hyperlinked environment
Popularity: Based on the historical behavior or contributions of the community as a whole,
this class of relevancy measurement can be used in an absolute way to order results, or as
an influencer of relevance. Factors include:
What people previously bought, or viewed
Ratings and opinions actively provided by other users
Automatically derived measures based on the observation of visitor behaviour on a
website as a whole
Personalization: Ranking based on personally identifiable information has implications and
issues for some communities, and is generally blended into relevancy calculations with
some subtlety rather than being used for explicit results ordering. Google’s main web
search offering currently does this. The main methods are:
Influencing results ordering based on pre-defined criteria that have been
volunteered by the user
Influencing results ordering based on observed previous behaviour of the individual
user.
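The generic relevance item in the list above mentions keyword density and the favoring of rare keywords. A minimal sketch of that idea follows, using a simple density-times-rarity score in the spirit of TF-IDF. The mini-corpus, the function names and the exact formula are assumptions chosen for illustration; they do not represent the scoring of any specific search engine.

```python
import math

# Hypothetical mini-corpus, already tokenized to lowercase words.
CORPUS = {
    "doc_a": "freedom of information act request form".split(),
    "doc_b": "information about public information sessions".split(),
    "doc_c": "public holiday schedule".split(),
}


def idf(term, corpus):
    """Rare terms score higher: log of (documents / documents containing term)."""
    containing = sum(1 for words in corpus.values() if term in words)
    return math.log(len(corpus) / containing) if containing else 0.0


def score(query_terms, words, corpus):
    """Sum over query terms of keyword density times rarity weight."""
    total = 0.0
    for term in query_terms:
        density = words.count(term) / len(words)
        total += density * idf(term, corpus)
    return total


def rank(query, corpus):
    """Order document ids by descending score, breaking ties by id."""
    terms = query.lower().split()
    scored = [(score(terms, words, corpus), doc_id) for doc_id, words in corpus.items()]
    return [doc_id for s, doc_id in sorted(scored, key=lambda pair: (-pair[0], pair[1]))]


ranking = rank("freedom information", CORPUS)
```

In this sketch doc_a ranks first because it contains the rarer term "freedom", even though doc_b has a higher density of "information". Factors such as proximity, popularity or personalization would be blended in as further terms of the score.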
An important reason for the widespread adoption of search navigators in sophisticated search
systems is that they are complementary rather than antagonistic to all of these other popular
approaches.
4. Getting the most from search navigators
Great search navigators exhibit two primary properties:
Accuracy: The user needs to be able to trust search navigators to provide accurate
information
Contextual relevance: The most useful navigators are those that have been built
specifically for the application. Users searching for an automobile will value a completely
different set of navigators to users looking for stock market investment ideas.
The key to delivering accuracy and contextual relevance is data preparation prior to indexing.
Data preparation for search
There is a wide range of techniques available for use in data preparation for search. Each
application must deal with its own unique combination of data and users, and to get the best from
search navigators, every application should be approached on its merits. Specific technologies
can often be helpful, especially in established niche applications, but in general, technology should
be the assistant rather than the project focus. In our experience, the most important success
factors are staff experience, well-practiced methodologies and a pragmatic approach. Knowing
which of the many available extraction or matching techniques is suitable to an application is key
to a successful outcome.
The importance of data preparation goes beyond the accurate extraction of information to drive
search navigators. Data cleansing, merging, splitting and enriching also improve the efficiency of
the search experience as a whole. In struggling search applications, criticized by users in terms of
relevancy or accuracy, the search engine is often not the problem – rather it is the poor quality of
data being fed to the search engine that is causing issues. Search engine vendors only have
themselves to blame for this – the industry has a history of over-selling the capabilities of
technology to automatically overcome basic issues such as poor data quality.
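To make the data quality point concrete, here is a small sketch of one cleansing step applied before indexing. It assumes an automobile classifieds application with a free-text "make" field; the mapping table, field name and function names are hypothetical. Without such a step, a navigator would list "VW", "vw " and "Volkswagen" as separate entries with misleading counts.

```python
import re

# Hypothetical mapping from messy source values to canonical navigator labels.
CANONICAL_MAKES = {
    "vw": "Volkswagen",
    "volkswagen": "Volkswagen",
    "chevy": "Chevrolet",
    "chevrolet": "Chevrolet",
}


def clean_make(raw):
    """Normalize case, punctuation and whitespace, then map to a canonical label."""
    key = re.sub(r"[^a-z0-9]+", " ", raw.lower()).strip()
    return CANONICAL_MAKES.get(key, "Unknown")


def prepare(record):
    """Return a copy of the record with a cleaned 'make' field for indexing."""
    cleaned = dict(record)
    cleaned["make"] = clean_make(record.get("make", ""))
    return cleaned


raw_records = [
    {"id": 1, "make": " VW "},
    {"id": 2, "make": "Volkswagen"},
    {"id": 3, "make": "chevy."},
    {"id": 4, "make": ""},
]
prepared = [prepare(r) for r in raw_records]
```

Values that cannot be mapped are labelled "Unknown" rather than guessed, so that they can be reviewed and the mapping table extended over time.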
The good news for today’s buyer of enterprise search technology is this: Search is now a mature
market and the leading products have all of the necessary capabilities to support most search
applications. Comparison with the (even more) mature database market is insightful. Today,
there are very few use cases where it is necessary to worry whether Oracle, DB2, SQL Server or
MySQL is capable of providing the necessary functions or throughput. For the majority of
structured data processing needs, it is the application-layer rather than the choice of database
that makes the difference. Search engines are reaching this point too.
At Search Technologies, we work with a range of leading search software vendors and we value
our independence. Whatever your search engine of choice, proprietary or open source, if you
need to provide an important search application to your users then we can help you to arrange
clean, accurate and contextual data to feed search navigators and help your search engine provide
a great service to users.
Although diligent data preparation is not the only thing you’ll need to do, it is the foundation on
which many successful search applications are built.
5. Further reading
Best Practices: A Document Processing Methodology for Search
Case Study: United States Government Printing Office
A short glossary of data preparation tasks
--------------------------
Search Technologies Corporation
590 Herndon Parkway, Suite 375
Herndon, VA 20170
T: +1 703 953 2791
jback@searchtechnologies.com

Search Technologies Limited
Kingswick House
Sunninghill, Berkshire
T: +44 1344 292 292
gcharlesworth@searchtechnologies.com

www.searchtechnologies.com