Best Practices:
Making the Most of Search Navigators

Contents:
           1.   Introduction
           2.   The role of search navigators
           3.   Navigators’ place in the search ecosystem
           4.   Getting the most from search navigators
           5.   Further reading


1. Introduction
Search Technologies has provided more than 20,000 consultant-days of search implementation
services during the last 4 years, working with a variety of leading search products. Our
engagements range from corporate intranets and knowledge management systems, to search
applications for content-rich websites, classifieds, and e-commerce.

Search navigators 1 are now commonplace within non-trivial search applications. This brief paper
explores the reasons for their success, positions search navigators in relation to other common
approaches to search, and discusses how to maximize their effectiveness.

For those unfamiliar with the concept of search navigators, two examples of their use follow. Both
of these applications serve navigators in the left-side column:

Classifieds:
http://shop.ebay.com/?_from=R40&_trksid=p3907.m38.l1313&_nkw=sony&_sacat=See-All-Categories


Government:
http://www.gpo.gov/fdsys/search/search.action?na=_accodenav&se=_CRECfalse&sm=&flr=&ercode=&dateBrowse=&st=freedom+of+information&=freedom+of+information&psh=&sbh=&tfh=&originalSearch=freedom+of+information&sb=re&ps=10&sb=re&ps=10


These are public-facing search applications, but the approach illustrated is just as relevant behind
the firewall.

1   Search navigators are also called guided navigation or faceted search.

2. The role of search navigators
The search process can be viewed as consisting of two simple steps:

   a. The formation of a search clue
   b. The browsing of results

An iterative process of search clue improvement is often necessary; this has always been the case.
A large search system twenty years ago would initially reply to a search request with a number (of
documents matching the criteria) rather than a results list, and invite the user to provide
additional search terms to reduce that number to a manageable quantity, which could then be
displayed and browsed. This often resulted in long search clues containing a mixture of full text
and fielded terms. A typical “advanced search page” provides a helpful UI for achieving the same
thing (the building of a great search clue) but without the need to know specific syntax.

Enabled by modern search architectures and fast servers, search navigators play this important
role today. The role is this:

       Search navigators help users to quickly reduce the search scope through single clicks.

Put another way, search navigators are the most efficient mechanism yet implemented to help the
user to build a great search clue.
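
The mechanism can be illustrated with a minimal Python sketch. It is not taken from any particular search product: the sample records, the field names and the helper functions `navigator` and `refine` are all hypothetical. It shows the two things a navigator does: count the values present in the current search scope, and reduce that scope when the user clicks one value.

```python
from collections import Counter

# Hypothetical sample records; the field names are illustrative only.
DOCS = [
    {"id": 1, "brand": "Sony", "category": "Cameras"},
    {"id": 2, "brand": "Sony", "category": "Televisions"},
    {"id": 3, "brand": "Canon", "category": "Cameras"},
    {"id": 4, "brand": "Sony", "category": "Cameras"},
]


def navigator(docs, field):
    """Count how many documents in the current scope carry each value of a field."""
    return Counter(doc[field] for doc in docs if field in doc)


def refine(docs, field, value):
    """Simulate a single click on a navigator entry: keep only matching documents."""
    return [doc for doc in docs if doc.get(field) == value]


scope = DOCS
brand_counts = navigator(scope, "brand")        # counts shown before any click
scope = refine(scope, "brand", "Sony")          # one click reduces the scope
category_counts = navigator(scope, "category")  # counts recomputed for the new scope
```

A production search engine would compute these counts from its index rather than by scanning records, but the user-facing behaviour is the same: each click narrows the scope, and the remaining navigators update to describe what is left.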

Added value
Well-constructed search navigators go beyond being efficient mechanisms. They also provide
feedback and insight to the user to guide the process of search scope reduction. This is
particularly helpful to new users who, as a by-product of search activity, can quickly learn about
the structure and distribution of content. For regular users, well-structured navigators provide a
continuing education into the make-up of the dataset in a non-intrusive fashion. With time, this
leads to more sophisticated use of both the search facility and content resources as a whole.
Making better use of existing resources is a key goal for most intranet and knowledge
management initiatives.

It is the added value of providing actionable insight and a continuous education about the
available content that truly separates search navigators from earlier approaches.

3. Navigators’ place in the search ecosystem
The search software industry has for many years been technology led, with the various vendors
evangelizing their favoured algorithms and approaches. It may be useful to briefly position search
navigators relative to some of these.

Earlier, it was suggested that search can be seen as a simple two-step process. Of course, most
modern search applications will present both the search results and opportunities to further refine
the search within the same page. However, in positioning the various technological approaches, it
is useful to keep the two steps separate. Let’s expand this theme and look at both in more detail:

   a. Formation of a search clue: The objective of this step is the reduction of the search scope
      to a point where the desired information can be conveniently found during results
      browsing.
   b. Browsing of results: The interactive inspection of a hit list to identify the desired
      information.

The two steps must obviously work together and in some applications, one might dominate.

a. Formation of a search clue
The role of search navigators is firmly within this part of the search process, supporting human
decision-making and efficiency of search scope reduction. Other technologies with something to
contribute to this part of the process include:

       Tagging: Category taggers, entity extraction and other parsing methods that create
       additional metadata to populate search navigators
       Query parsing: Enriching queries with synonyms and other related terms, and where the
       search engine provides an appropriate query language, optionally customizing relevancy
       calculations.
       Clustering techniques: These compare the contents of documents as a whole and can sort
       search results into similar groupings using statistical techniques. Often these groupings are
       presented as a type of search navigator.
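
As an illustration of the query parsing item above, the following Python sketch enriches a query with synonyms. It is a simplified example under stated assumptions: the synonym table is hypothetical, and the boolean output is a generic notation rather than the query language of any specific search engine.

```python
# Hypothetical synonym table; a real application would curate its own.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "tv": ["television"],
}


def expand_query(query, synonyms=SYNONYMS):
    """Return a list of term groups; each group is a term plus its synonyms."""
    groups = []
    for term in query.lower().split():
        groups.append([term] + synonyms.get(term, []))
    return groups


def to_boolean(groups):
    """Render the groups as a generic boolean string (OR within a group, AND between)."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


expanded = expand_query("used car")
query_string = to_boolean(expanded)
```

Here `query_string` becomes `(used) AND (car OR automobile OR vehicle)`, so a document that mentions only "automobile" still falls within the search scope.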

b. Browsing results
In this part of the search process, the user is presented with an ordered listing of what remains
within the search scope. The primary issue is the order of presentation. Technologies and
methods which can contribute include:

       Basic sorting: Using fielded information from the search index, such as ordering by date,
       price or distance
       Generic relevance: Fifteen years ago, keyword density and the ability to favor rare
       (assumed to be more important) keywords were mainstream approaches to ordering
       search results by relevance. Many other factors have since been added to relevancy
       calculations, including word proximity, contextual evidence (a semantically-based
       technique in which the presence of related words supports the relevance of keywords) and
       favoring specific areas of documents, such as titles or section headings. Such methods are
       present, to some extent, in most contemporary search applications, forming a baseline for
       relevance judgement.
       Off-page criteria: Factors other than document content, such as adjusting relevance based
       on the document’s original location, or on incoming links in a hyperlinked environment
       Popularity: Based on the historical behavior or contributions of the community as a whole,
       this class of relevancy measurement can be used in an absolute way to order results, or as
       an influencer of relevance. Factors include:
               What people previously bought, or viewed
               Ratings and opinions actively provided by other users
               Automatically derived measures based on the observation of visitor behaviour on a
               website as a whole
       Personalization: Ranking based on personally identifiable information has implications and
       issues for some communities, and is generally blended into relevancy calculations with
       some subtlety rather than being used for explicit results ordering. Google’s main web
       search offering currently does this. The main methods are:
               Influencing results ordering based on pre-defined criteria that have been
               volunteered by the user
               Influencing results ordering based on observed previous behaviour of the individual
               user.
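
The generic relevance factors listed above can be sketched in a few lines of Python. This is a toy scoring function, not the formula used by any particular search engine: the sample documents, the `TITLE_BOOST` weight and the exact weighting of rare terms are assumptions chosen for illustration. It combines keyword density, a preference for rare keywords, and a boost for matches in the title.

```python
import math

# Hypothetical documents with a title and a body; contents are illustrative.
DOCS = [
    {"title": "Freedom of information request",
     "body": "how to file a request for records"},
    {"title": "Annual report",
     "body": "information about the budget and the annual report"},
    {"title": "Budget hearing",
     "body": "the hearing covered the budget"},
]

TITLE_BOOST = 2.0  # assumed weight for title matches; tune per application


def tokens(text):
    """Split text into lower-case words (no stemming or punctuation handling)."""
    return text.lower().split()


def rarity(term, docs):
    """Weight a term by how few documents contain it: rarer terms score higher."""
    containing = sum(
        1 for d in docs if term in tokens(d["title"] + " " + d["body"])
    )
    return math.log((1 + len(docs)) / (1 + containing)) + 1.0


def score(query, doc, docs):
    """Sum density times rarity over the query terms, boosting title matches."""
    title, body = tokens(doc["title"]), tokens(doc["body"])
    total = 0.0
    for term in tokens(query):
        density = body.count(term) / max(len(body), 1)
        density += TITLE_BOOST * title.count(term) / max(len(title), 1)
        total += density * rarity(term, docs)
    return total


ranked = sorted(
    DOCS, key=lambda d: score("freedom information", d, DOCS), reverse=True
)
```

With these sample documents, the record whose title contains both query terms is ranked first, the record mentioning "information" only in its body comes second, and the record matching neither term scores zero. Off-page, popularity and personalization signals would typically be blended into such a score as additional weighted terms.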

An important reason for the widespread adoption of search navigators in sophisticated search
systems is that they are complementary rather than antagonistic to all of these other popular
approaches.

4. Getting the most from search navigators
Great search navigators exhibit two primary properties:

       Accuracy: The user needs to be able to trust search navigators to provide accurate
       information
       Contextual relevance: The most useful navigators are those that have been built
       specifically for the application. Users searching for an automobile will value a completely
       different set of navigators to users looking for stock market investment ideas.

The key to delivering accuracy and contextual relevance is data preparation prior to indexing.

Data preparation for search
There is a wide range of techniques available for use in data preparation for search. Each
application must deal with its own unique combination of data and users, and to get the best from
search navigators, every application should be approached on its merits. Specific technologies
can often be helpful, especially in established niche applications, but in general, technology should
be the assistant rather than the project focus. In our experience, the most important success
factors are staff experience, well-practiced methodologies and a pragmatic approach. Knowing
which of the many available extraction or matching techniques is suitable to an application is key
to a successful outcome.

The importance of data preparation goes beyond the accurate extraction of information to drive
search navigators. Data cleansing, merging, splitting and enriching also improve the efficiency of
the search experience as a whole. In struggling search applications, criticized by users in terms of
relevancy or accuracy, the search engine is often not the problem – rather it is the poor quality of
data being fed to the search engine that is causing issues. Search engine vendors only have
themselves to blame for this – the industry has a history of over-selling the capabilities of
technology to automatically overcome basic issues such as poor data quality.
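
A small Python sketch shows why data quality matters so much to navigators in particular. The records, the `make` field and the alias table are hypothetical; the point is only that inconsistent values split one navigator entry into several, each with a misleadingly low count, and that a simple cleansing step before indexing repairs this.

```python
from collections import Counter

# Hypothetical raw records showing inconsistent values for the same field.
RAW = [
    {"id": 1, "make": " Ford "},
    {"id": 2, "make": "FORD"},
    {"id": 3, "make": "ford motor co."},
    {"id": 4, "make": "Toyota"},
]

# Assumed alias table mapping known variants to a canonical label.
ALIASES = {"ford motor co.": "Ford"}


def clean_make(value):
    """Trim whitespace, fold case, then map known variants to one label."""
    key = value.strip().lower()
    return ALIASES.get(key, key.title())


def prepare(records):
    """Return cleaned copies of the records, ready for indexing."""
    return [{**record, "make": clean_make(record["make"])} for record in records]


before = Counter(record["make"] for record in RAW)
after = Counter(record["make"] for record in prepare(RAW))
```

Before cleansing, the navigator would show four entries with one document each; afterwards it shows Ford (3) and Toyota (1), which is what the user expects to see.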

The good news for today’s buyer of enterprise search technology is this: Search is now a mature
market and the leading products have all of the necessary capabilities to support most search
applications. Comparison with the (even more) mature database market is insightful. Today,
there are very few use cases where it is necessary to worry whether Oracle, DB2, SQL Server or
MySQL is capable of providing the necessary functions or throughput. For the majority of
structured data processing needs, it is the application-layer rather than the choice of database
that makes the difference. Search engines are reaching this point too.

At Search Technologies, we work with a range of leading search software vendors and we value
our independence. Whatever your search engine of choice, proprietary or open source, if you
need to provide an important search application to your users then we can help you to arrange
clean, accurate and contextual data to feed search navigators and help your search engine provide
a great service to users.

Although diligent data preparation is not the only thing you’ll need to do, it is the foundation on
which many successful search applications are built.


5. Further reading
       Best Practices: A Document Processing Methodology for Search
       Case Study: United States Government Printing Office
       A short glossary of data preparation tasks

                                         --------------------------




Search Technologies Corporation                Search Technologies Limited
590 Herndon Parkway, Suite 375                 Kingswick House
Herndon, VA 20170                              Sunninghill, Berkshire
T: +1 703 953 2791                             T: +44 1344 292 292
jback@searchtechnologies.com                   gcharlesworth@searchtechnologies.com

www.searchtechnologies.com
