Search Engine Face-Off
          Keyword Search versus Metadata Search
Don Miller, VP of Business Development   Val Orekhov, VP of Business Development
1 (408) 828-3400                         1 (240) 450-2166 x 103
donm@conceptsearching.com                val@portalsolutions.net
Concept Searching
Don Miller
Don Miller is a senior executive at Concept Searching with over 20 years of experience in knowledge
management. He is a frequent speaker on Records Management and Information Architecture
problems and solutions. Don has been a guest speaker at Taxonomy Bootcamp, Managing
Electronic Records and numerous SharePoint events about information organization and records
management.
Don Miller, VP of Business Development * 1 (408) 828-3400 * donm@conceptsearching.com


Portal Solutions
Val Orekhov
Val Orekhov, Chief Architect for Portal Solutions, is deeply skilled in Enterprise Application Development,
Web development, portals, relational databases and data access, and modeling, and is versed in a number
of programming languages and technologies. He has been with Portal Solutions for almost five years
and drives the technical team to excel year over year. He holds a Master of Science in Computer
Science from Kyrgyz Technical University in Bishkek, Kyrgyzstan.
Val Orekhov, Chief Technical Architect * 1 (240) 450-2166 x 103 * val@portalsolutions.net
Agenda
      ConceptSearching:
                Keyword vs Metadata
                Keyword vs Metadata Costs
                Google vs. SharePoint vs. FAST
                What’s wrong with a manual metadata approach
                Automated approaches
                USAF Case Study
      Portal Solutions:
                Enterprise Search – Google vs FAST in SharePoint
                Indexing Options
                Approach to Security Trimming
                Ranking Algorithms & Sorting Options
                Metadata & Search Refinements
      Questions and Answers
      Demo of product if time permits


Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
Concept Searching, Inc.
• Company founded in 2002
    • Product launched in 2003
    • Focus on management of structured and unstructured information
• Technology
    • All technologies based on our ‘open conceptualTagging framework’
    • Automatic concept identification, content tagging, auto-classification, taxonomy management
    • Only statistical vendor that can extract conceptual metadata
• 2009 and 2010 ‘100 Companies that Matter in KM’ (KMWorld magazine)
• KMWorld ‘Trend Setting Product’ of 2009 and 2010
• Locations: US, UK, & South Africa
• Client base: Fortune 500/1000 organizations
• Microsoft Enterprise Search ISV, FAST Partner
• Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier,
  conceptClassifier for SharePoint, contentTypeUpdater for SharePoint


What Type of Search or Information Architecture Do You Need?

     Keyword Search = ~66%+ of results (Recall)
     • Simple
     • No administration
     • Good enough

     Metadata Search = 100% of results (Recall)
     • Guided Navigation
     • Records Management
     • Sensitive Information Removal
     • Collaboration
     • Improved Precision and Recall
     • Evolution of Keyword Search

   Recall (information retrieval): a statistical measure (contrasted with precision); the
   fraction of (all) relevant material that is returned by a search query.
   Precision (information retrieval): the percentage of documents returned that are relevant.
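A minimal Python sketch of the two definitions above, computing precision and recall for a single query. The document IDs and the set of relevant documents are hypothetical; in a real evaluation the relevant set comes from human relevance judgments, which is the hard part.

```python
def precision_recall(returned, relevant):
    """Return (precision, recall) for one query.

    returned: document IDs the engine returned
    relevant: document IDs judged relevant to the query
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 6 relevant documents; the engine returns 4, of which 2 are relevant.
relevant_docs = {"d1", "d2", "d3", "d4", "d5", "d6"}
returned_docs = ["d1", "d2", "d7", "d8"]
p, r = precision_recall(returned_docs, relevant_docs)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

Raising recall (finding d3 to d6) and raising precision (dropping d7 and d8) are separate problems, which is why the slide treats them as separate measures.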


What Is Keyword vs. Metadata Costing You?

  Pre Migration
    Problem
    • 60% of stored documents are obsolete
    • 50% of documents are duplicates
    • Requires resources to identify what should/should not be migrated
    Solution
    • Eliminate duplicate documents
    • Identify privacy data exposures
    • Identify and declare records that were not previously identified
    • Identify high value content
    • Migrate required content to a structure
    Benefit
    • Reduces migration costs
    • Ensures compliance and protection of content assets

  Search
    Problem
    • “It’s not about better search”
    • Less than 50% of content is correctly indexed, meta tagged or efficiently searchable
    • 85% of relevant documents are never retrieved in search
    Solution
    • Eliminate manual tagging & replace with automatic identification of multi-word concepts
    • Provide guided navigation via the taxonomy structure (i.e. concepts)
    • Go beyond dynamic clustering with conceptual clustering based on the taxonomies
    Benefit
    • Taxonomy navigation is 36% - 48% faster
    • Saves 2.5 hours per user per day

  Records Management
    Problem
    • 67% of data loss in Records Management is due to end user error
    • It costs an organization $180 per document to recreate it when it is not tagged
      correctly and cannot be found
    Solution
    • Eliminate inconsistent end user tagging
    • Automatically declare documents of record based on vocabulary and retention codes
    • Automatically change the Content Type and route to the Records Management repository
    Benefit
    • Savings of $4.00 - $7.04 per record by eliminating manual tagging
    • Ensures compliance and reduces potential litigation exposures

  Data Privacy Protection
    Problem
    • Average cost per exposed record is $197 and ranges from $90-$305 per record
    • 70% of breaches are due to a mistake or malicious intent by an organization’s own staff
    Solution
    • Identify any type of organizationally defined privacy data
    • Combines pattern matching with associated vocabulary
    • Automatic Content Type updating enabling workflows and rights management
    Benefit
    • Average cost runs from $225K to $35M
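The per-record figures on this slide scale linearly with collection size. A short sketch of that arithmetic, assuming a hypothetical collection of 10,000 records; it only multiplies the slide's quoted ranges and is not a cost model.

```python
def cost_range(num_records, low_per_record, high_per_record):
    """Scale a per-record cost range (USD) to a whole collection."""
    return num_records * low_per_record, num_records * high_per_record


records = 10_000  # hypothetical collection size
tagging = cost_range(records, 4.00, 7.04)  # manual tagging, per-record range from the slide
exposure = cost_range(records, 90, 305)    # exposed privacy record, per-record range from the slide
print(f"Manual tagging:   ${tagging[0]:,.0f} - ${tagging[1]:,.0f}")    # $40,000 - $70,400
print(f"Privacy exposure: ${exposure[0]:,.0f} - ${exposure[1]:,.0f}")  # $900,000 - $3,050,000
```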

Metadata Search vs. Keyword and Guided Navigation: “Proposal”

[Figure: nested result sets for the query “Proposal”, plotted as results returned (also known as
“Recall”) against cost (time, money and complexity). Keyword search (“Proposal”) returns 33% of
results; entity extraction returns 20-33%; keyword + synonym search (adding “Proposals”, “Contract”)
returns 66%; metadata search (adding “Software License”, “License Agreement”, “License”, “SLA”,
“Licensee”, “Addendum”, “Documents of Record”) returns 100% of results.]

        Entity extraction without complex rules is ineffective. It is just keyword matching,
        which is what keyword search is, and that is 33% effective.
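The nested rings in this figure can be illustrated with a toy Python sketch comparing exact keyword matching, keyword plus synonym expansion, and matching on a concept tag. The corpus, synonym list and tags are all invented; the tags are assumed to have been applied correctly by an automatic classifier, and the recall values printed describe only this toy corpus, not the 33%/66%/100% figures on the slide.

```python
import re

# Hypothetical toy corpus: (body text, concept tag assumed to be applied by a classifier).
DOCS = {
    "doc1": ("Draft proposal for the Q3 rollout", "Proposal"),
    "doc2": ("Signed contract with the vendor", "Proposal"),
    "doc3": ("Software license agreement, version 2", "Proposal"),
    "doc4": ("Addendum to the SLA for the licensee", "Proposal"),
    "doc5": ("Cafeteria menu for next week", "Facilities"),
}

# Hypothetical synonym list for the query term "proposal".
SYNONYMS = {"proposal": {"proposal", "proposals", "contract"}}


def tokens(text):
    """Lowercase word tokens (letters only)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(term):
    return {d for d, (text, _) in DOCS.items() if term in tokens(text)}


def synonym_search(term):
    expanded = SYNONYMS.get(term, {term})
    return {d for d, (text, _) in DOCS.items() if tokens(text) & expanded}


def metadata_search(concept):
    return {d for d, (_, tag) in DOCS.items() if tag == concept}


relevant = metadata_search("Proposal")  # in this toy, the (assumed correct) tag defines relevance
for name, hits in [("keyword", keyword_search("proposal")),
                   ("keyword+synonym", synonym_search("proposal")),
                   ("metadata", relevant)]:
    print(f"{name:16} recall={len(hits & relevant) / len(relevant):.2f}")
```

Documents 3 and 4 never mention the query term or a listed synonym, so only the concept tag reaches them; that is the gap the figure attributes to missing metadata.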
Similar Features Against Total Number of Documents Returned

                                             Google        SharePoint            FAST
     Index                                   500 M+        100 M                 500 M+
     Keyword match (33% of results)          Yes           Yes (as good as       Yes
                                                           Google or FAST)
     Synonyms (up to 50-66%+ of results)     Yes           Yes                   Yes
     Apply metadata automatically for        No            No                    Keyword only, which
     100% of results                                                             equals 33% of results
     Ranking algorithm + Best Bets*          Non-tunable   Tunable               Very tunable

     * Ranking and Best Bets do not improve the number of results, only how they are presented.




What Is Missing To Get to 100% of Relevant Results in Every Search?

                             Google                  SharePoint                    FAST
     Auto Classification     No: missing 33-50%      No: missing 33-50%            Entity extraction, which is the same
                             of results on any       of results on any             as keyword search (33% of results);
                             particular topic        particular topic              no RECALL improvement with this approach
     Taxonomy Management     No                      Yes, but can’t do anything    No
                                                     with it in this release;
                                                     security issues for
                                                     managing the Term Store




Miscellaneous Items to Review

                                           Google       SharePoint                  FAST
     SharePoint refiners and navigators    Hard         Yes: easy to use for        Medium: initial release, does not
     with counts (RECALL)                               standard search; no         leverage the Term Store yet;
                                                        counts on results           XML/PowerShell based
     Customization                         Difficult    Difficult                   Extendable




Summary

     •    Google – Best for no administration: install and walk away. Usually missing
          33%-50% of results on any given topic because of missing metadata. Not easy
          to integrate refiners or navigators into the SharePoint UI.

     •    SharePoint Search – Cost effective; comes free with SharePoint. Search algorithm
          is as good as FAST or Google. Also very easy to install and walk away. Limited
          extensibility. Easy integration for refiners and navigators (no counts). Also
          missing 50% of results on any topic.

     •    FAST – Extremely customizable, but requires training or professional services to
          customize. Most likely Microsoft's long-term platform for search. Very scalable
          and can provide refiner counts. Still missing 33-50% of results from any given
          search because of metadata inconsistency.

     •    However, they all lack a true metadata strategy, which is the only way to
          ensure 100% of results.


A Manual Metadata Approach Will Fail 95%+ Of The Time

     Issue                                       Organizational Impact
     Inconsistent                                Less than 50% of content is correctly indexed, meta-tagged or efficiently
                                                 searchable rendering it unusable to the organization (IDC)
     Subjective                                  Highly trained Information Specialists will agree on meta tags between
                                                 33% - 50% of the time. (C. Cleverdon)
     Cumbersome - Expensive                      Average cost of manually tagging one item runs from $4 - $7 per
                                                 document and does not factor in the accuracy of the meta tags nor the
                                                 repercussions from mis-tagged content (Hoovers)
     Malicious Compliance                        End users select the first value in the list (Perspectives on Metadata, Sarah Courier)
     No perceived value for end user             What’s in it for me? The end user creates the document but sees neither its value
                                                 to the organization nor the risks of litigation and non-conformance with
                                                 policies.
     What have you seen                          Metadata will continue to be a problem due to inconsistent human
                                                 behavior
     The answer to consistent metadata is an automated approach that can extract the meaning from
     content, eliminating manual metadata generation while still providing the ability to manage
     knowledge assets in alignment with the unique corporate knowledge infrastructure.
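One simple way to quantify how often taggers agree is the average pairwise Jaccard overlap of the tag sets they assign to the same document. The sketch below applies that measure to hypothetical tags; it illustrates the idea only and is not the methodology behind the Cleverdon figure quoted above.

```python
from itertools import combinations


def jaccard(a, b):
    """Overlap between two tag sets: |A & B| / |A | B| (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_agreement(tagsets):
    """Average Jaccard overlap over every pair of taggers (needs at least two taggers)."""
    pairs = list(combinations(tagsets, 2))
    return sum(jaccard(x, y) for x, y in pairs) / len(pairs)


# Hypothetical tags three people assigned to the same contract document.
taggers = [
    {"contract", "software license"},
    {"contract", "vendor"},
    {"license agreement", "vendor"},
]
print(f"{mean_pairwise_agreement(taggers):.2f}")  # 0.22
```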




conceptClassifier’s TaxonomyManager Automated Metadata Approach Drives Business Value
  • Create enterprise automated metadata framework/model
       • Average return on investment minimum of 38% and runs as high as 600% (IDC)
  • Apply consistent meaningful metadata to enterprise content
       • Incorrect meta tags cost an organization $2,500 per user per year, in addition to
         potential costs for non-compliance (IDC)
  • Guide users to relevant content with taxonomy navigation
       • Savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
       • 100% “Recall” of content, 35% faster access to content (“Precision”)
  • Use automatic conceptual metadata generation to improve Records Management
       • Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
       • Improve compliance processes, eliminate potential privacy exposures

  [Diagram: cycle 1. Model and Validate → 2. Automate Tagging → 3. Findability →
   4. Business Processes → 5. Records Management and PII → 6. Life Cycle Management]
USAF Human Performance Clearinghouse
GOAL: Leverage Existing USAF, AFDW, and AFMS License Agreements to Enable IM, RM, & Privacy & Security Compliance
Requirements
• DoDD 8320 (Data Sharing in a Net-Centric DoD)
• DoDD 5015 (Records Management)
• USAF Privacy Act Program & HIPAA
• Freedom of Information Act (FOIA)

[Diagram: Migration, Data Privacy, Records Management, Search, eDiscovery & FOIA]

Tel: 703.246.9360 | Fax: 240.465.1182
Distribution Statement A: Approved for public release; distribution is unlimited.
311 ABG/PA No. 09-488, 16 Oct 2009
Taxonomy Improves “Precision” with Guided Refiners for “Proposals”

     • After 100% of Results are
       returned, leverage metadata
       for guided navigation and
       refiners

     • Use taxonomy/metadata
       structures before query and
       after query to guide users to
       the right document

     • Accelerate document finding
       [PRECISION] by a minimum
       of 35%
     “I want all proposals in two specific regions. I could then have a guided refiner
     for vertical, amount, etc.”
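Guided refiners of this kind work on metadata already attached to every hit: filter by the chosen values, then count the remaining values of another field. A minimal Python sketch; the result set and the field names (type, region, vertical) are hypothetical.

```python
from collections import Counter

# Hypothetical result set: every hit already carries metadata applied at index time.
results = [
    {"id": 1, "type": "Proposal", "region": "East", "vertical": "Health"},
    {"id": 2, "type": "Proposal", "region": "West", "vertical": "Finance"},
    {"id": 3, "type": "Proposal", "region": "East", "vertical": "Finance"},
    {"id": 4, "type": "Proposal", "region": "South", "vertical": "Health"},
    {"id": 5, "type": "Contract", "region": "East", "vertical": "Health"},
]


def refine(hits, **filters):
    """Keep hits whose metadata matches every filter.

    A filter value may be a single value or a set of allowed values.
    """
    def matches(hit, field, allowed):
        if not isinstance(allowed, (set, frozenset)):
            allowed = {allowed}
        return hit.get(field) in allowed
    return [h for h in hits if all(matches(h, f, v) for f, v in filters.items())]


def refiner_counts(hits, field):
    """Count per value of a metadata field, as displayed next to each refiner entry."""
    return Counter(h[field] for h in hits if field in h)


# "All proposals in two specific regions", then a guided refiner on vertical.
subset = refine(results, type="Proposal", region={"East", "West"})
print([h["id"] for h in subset])           # [1, 2, 3]
print(refiner_counts(subset, "vertical"))  # Counter({'Finance': 2, 'Health': 1})
```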
Dynamic Clustering Is Not Guided Navigation for “Proposals”

     • Brings back clusters

     • They are best guesses

     • They might help, they
       might make it worse

     • Better than nothing,
       but not a long term
       strategy or evolution of
       key word search
     Dynamic navigation (CLUSTERING) is ineffective. How does an information worker
     know whether a cluster is a good topic or not? This is NOT PRECISION!

Enterprise Search Comparison for SharePoint: Google vs. FAST
     Why Enterprise Search needs Metadata and Taxonomy Management
             Recall – Ensures you bring back 100% of results
             Enhance Precision – Fastest way to filter to the right results so that you are
              looking at the documents that matter the most


     MUST HAVES:
         Heterogeneous content sources:
             HTML, documents and LOB records
             Located on portals, file systems and in databases

         Required security trimming:
             Integrate with identity providers (AD, LDAP, SQL)
             Implement authorization decision logic

         Able to take advantage of metadata stored with documents and LOBs


Google Search Appliance 6.8
                   vs.
  FAST Search Server for SharePoint 2010
For metadata-driven search scenarios in a SharePoint environment
Portal Solutions Corporate Overview - Vitals

 • Founded in 2002
 • SharePoint 2010 Microsoft Gold
   Certified partner
 • Over 100 SharePoint deployments
 • 30+ certified engineers/developers
 • Member of Microsoft SharePoint
   Early Adopter Program
 • A recognized best place to work by
   Washingtonian magazine
 • A growing IT consulting organization
   comprised of talented and certified
   staff
Corporate Overview - Solutions
 •   Employee Portals and Intranets
 •   Public facing web sites
 •   Knowledge Management solutions
 •   Document and Records Management
 •   Performance and Risk Management/BI
 •   Customer Extranets
 •   Enterprise Search solutions
 •   Business Process Automation
Introducing the Contenders
Google Search Appliance (GSA)
   •   Search Appliance, Google.com in a box
   •   Hardware & Software Solution
   •   Pre-packaged functionality ready to work
   •   “Black box” approach to search results

FAST Search Server for SharePoint 2010
  • Spin-off of the earlier FAST ESP
  • Software-only solution
  • Allows customization of many aspects of the engine's functionality,
    down to relevancy tuning algorithms
  • Platform rather than a product
Comparing FS4SP and GSA
•   Indexing Options
•   Approach to Security Trimming
•   Ranking Algorithms & Sorting Options
•   Metadata & Search Refinements
Content Crawl Options
                      GSA                       FAST                      SharePoint
 Content Pull         HTTP Crawler              SharePoint Crawler        SharePoint
                                                Enterprise Crawler        Crawler
 Content Push         XML Feed API              Feed API                  -

 Indexing LOBs (Pull) Onboard Database          Databases & Web Services Databases &
                      Connector                 via SharePoint BCS       Web Services via
                                                                         SharePoint BCS
 Connectors           SharePoint,               OTB: File System,         OTB: File
                      Documentum,               Exchange Public Folders   System,
                      LiveLink, FileNet, File                             Exchange Public
                      System, LDAP              Custom: Documentum,       Folders
                                                Lotus Notes
 External Metadata    Push through XML          Custom Stages in the      -
                      Feed API                  processing pipeline
 Cloud Connectivity   Google Apps & Sites;      Custom connectors         -
                      Twitter
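As one illustration of pushing external metadata through GSA's XML Feed API, the sketch below builds a "metadata-and-url" feed with Python's standard xml.etree.ElementTree. The element and attribute names reflect our understanding of the GSA Feeds Protocol and should be validated against the appliance's feed DTD; the datasource value "web", the URLs and the metadata names are assumptions. Posting the feed to the appliance is not shown.

```python
import xml.etree.ElementTree as ET


def build_metadata_feed(datasource, records):
    """Return a GSA-style 'metadata-and-url' feed as an XML string.

    records: list of (url, {metadata_name: value}) pairs.
    Element/attribute names follow our reading of the GSA Feeds Protocol (assumption).
    """
    root = ET.Element("gsafeed")
    header = ET.SubElement(root, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(root, "group")
    for url, metadata in records:
        record = ET.SubElement(group, "record", url=url, action="add", mimetype="text/html")
        meta_block = ET.SubElement(record, "metadata")
        for name, value in metadata.items():
            ET.SubElement(meta_block, "meta", name=name, content=value)
    prolog = ('<?xml version="1.0" encoding="UTF-8"?>\n'
              '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n')
    return prolog + ET.tostring(root, encoding="unicode")


# Hypothetical intranet URLs and metadata produced by an external classifier.
feed = build_metadata_feed("web", [
    ("http://intranet.example.com/docs/proposal-17.docx",
     {"DocumentType": "Proposal", "Region": "East"}),
    ("http://intranet.example.com/docs/sla-2010.pdf",
     {"DocumentType": "SLA"}),
])
print(feed)
```

Building the XML with a library rather than string concatenation keeps URLs and metadata values correctly escaped.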
Comparing FS4SP and GSA
•   Indexing Options
•   Approach to Security Trimming
•   Ranking Algorithms & Sorting Options
•   Metadata & Search Refinements
Security Trimming
• Answers the “Who Am I” and “What Results Can I See”
  questions
• Required with most Enterprise Search scenarios
• Approaches include Late & Early Authorization/Binding
 Authorization   Access Rights          Pros                     Cons
 Approach        (ACLs)

 Late            Checked at run      - Up-to-date presentation   - Slow on large
                 time against system                               result sets
                 of record

 Early           Information stored     - Fast                   - Duplicates info
                 in the index at item   - Facilitates metadata   - Potential for
                 level                    clustering               outdated results
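The difference between early and late binding can be sketched in a few lines of Python. Everything here is hypothetical: one in-memory dictionary stands in for the system of record, another for the ACLs copied into the index at crawl time, and the user's principals are assumed to be already expanded (user plus groups).

```python
# Hypothetical system of record: current ACLs keyed by document ID.
SOURCE_ACLS = {"d1": {"alice", "hr"}, "d2": {"bob"}, "d3": {"alice"}}

# Early binding: ACLs copied into the index at crawl time (a snapshot that can go stale).
INDEX = {doc: {"acl": set(acl)} for doc, acl in SOURCE_ACLS.items()}


def early_trim(hits, principals):
    """Filter with the ACL snapshot stored in the index: fast, no calls to the source."""
    return [d for d in hits if INDEX[d]["acl"] & principals]


def check_source(doc, principals):
    """Stand-in for a per-item authorization call to the system of record."""
    return bool(SOURCE_ACLS.get(doc, set()) & principals)


def late_trim(hits, principals, check):
    """Ask the system of record about each hit at query time: current, but one call per hit."""
    return [d for d in hits if check(d, principals)]


user = {"alice"}  # user ID plus expanded group memberships
hits = ["d1", "d2", "d3"]
print(early_trim(hits, user))               # ['d1', 'd3']
SOURCE_ACLS["d3"] = {"bob"}                 # access revoked after the last crawl
print(early_trim(hits, user))               # ['d1', 'd3']  (stale)
print(late_trim(hits, user, check_source))  # ['d1']
```

The stale second result is the "potential for outdated results" con of early binding; the late check sees the revocation but makes one call per hit, which is why it slows down on large result sets.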
Security Trimming Options Support
                GSA                              FAST                               SharePoint 2010
Late            - “Default” option in many       -                                  - Custom
Authorization     scenarios
                - Via Kerberos, SAML Bridge
                  or Connector
Early           - Rel. 6.0: high-level policy    - Item-level ACLs for Windows      Native support for
Authorization     ACLs configured by admins or     and SharePoint security          item-level ACLs with
                  through a remote API *           principals supported natively    Windows and SharePoint
                - Rel. 6.8: item-level ACLs **   - Allows setting up multiple       security principals
                                                   user property stores and
                                                   mapping user principals


* Best applied to enterprises with a manageable number of high-level policies, or able to invest in custom ACL sync tools
** SharePoint Connector Rel. 2.6.4 sends SharePoint Site Groups with the feed, but the Groups are not expanded properly by GSA
Comparing FS4SP and GSA
•   Indexing Options
•   Approach to Security Trimming
•   Ranking Algorithms & Sorting Options
•   Metadata & Search Refinements
Search Engine Internals
Result Set Ranking
• Fidelity of keyword matches (All Engines)
     • Proximity
     • Frequency
     • Completeness
• Hypertext Matching (GSA only)
     • Analyzes keyword location on a rendered page and related pages
• Hub and Spoke Algorithm (All engines)
     • Driven by linkages between web pages
     • Pages receiving or providing the most links have higher rankings
     • GSA – PageRank; FAST – Document authority
• Static rank biasing, document importance
     • Document-, site-, and metadata-based promotion / demotion (All engines)
     • User-tagged documents receive higher importance (FAST, SharePoint search)
• Adaptive ranking
     • User clicks in search results (FAST, SharePoint search)
• Custom Ranking
     • Build custom ranking models w/ FAST
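The link-based "hub and spoke" idea can be illustrated with the textbook PageRank power iteration. This is only the published formula, sketched in Python with the conventional damping factor of 0.85 and a hypothetical four-page intranet; GSA's production ranking and FAST's document authority calculation are proprietary and are not reproduced here.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Textbook PageRank by power iteration.

    links: {page: [pages it links to]}. A page with no out-links spreads
    its rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}  # random-jump share
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:  # dangling page: distribute its rank evenly
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


# Hypothetical intranet: most pages link to the policy page.
site = {
    "home": ["policy", "news"],
    "news": ["policy"],
    "team": ["policy"],
    "policy": ["home"],
}
ranks = pagerank(site)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:7} {score:.3f}")
```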
Result Set Sorting
• GSA
   • Date/Time only (Document Modification Date, or a date extracted
     from Title, Metadata or Body of a document)
• FAST
   • Any property marked as Sortable
   • Supported data types: String, Number, Date/Time
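A hypothetical sketch of the FAST behavior described above: sorting is allowed only on properties the schema marks as Sortable, each with a declared type (string, number or date/time). The schema, property names and values are invented; a GSA-style engine would in effect only allow the date key.

```python
from datetime import date

# Hypothetical schema: properties marked Sortable, with their declared types.
SORTABLE = {"Title": str, "Size": int, "Modified": date}

results = [
    {"Title": "Budget", "Size": 300, "Modified": date(2010, 5, 1), "Body": "..."},
    {"Title": "Agenda", "Size": 45, "Modified": date(2010, 9, 12), "Body": "..."},
    {"Title": "Charter", "Size": 120, "Modified": date(2009, 11, 30), "Body": "..."},
]


def sort_results(hits, prop, descending=False):
    """Sort hits by a property only if the schema marks it Sortable and values match its type."""
    if prop not in SORTABLE:
        raise ValueError(f"{prop!r} is not a sortable property")
    expected = SORTABLE[prop]
    for h in hits:
        if not isinstance(h[prop], expected):
            raise TypeError(f"{prop!r} value {h[prop]!r} is not {expected.__name__}")
    return sorted(hits, key=lambda h: h[prop], reverse=descending)


print([h["Title"] for h in sort_results(results, "Modified", descending=True)])
print([h["Title"] for h in sort_results(results, "Size")])
```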
Comparing FS4SP and GSA
•   Indexing Options
•   Approach to Security Trimming
•   Ranking Algorithms & Sorting Options
•   Metadata & Search Refinements
Index Schema Management
• GSA (All-inclusive)
    • All discovered metadata (Crawled Properties) are stored in the index by default
    • Metadata from MS Office documents stored in the index results. (GSA Feature
      Request ID# 1371024)
    • All string-type metadata is associated with FTI by default; matches on metadata
      are controlled at query time (allintext:, allintitle: keyword filters)
    • Metadata in results limited to 1,500 chars per field (Rel. 6.8; prev. releases – 320
      chars)
• FAST (Opt-in)
    • Crawled properties have to be associated with Managed Properties (MPs) to be
      stored in the index
    • MPs represent a level of abstraction from Content Sources
    • MPs can be configured to be used as:
        •   Stored in the index (Queryable)
        •   Associated with FTI (Searchable)
        •   Sortable
        •   Refiner-enabled
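The opt-in model can be sketched as a mapping step: only crawled properties mapped to a managed property survive into the index, and each managed property carries its own flags. The class, the managed property names and the crawled property names below are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class ManagedProperty:
    """Hypothetical model of a FAST-style managed property and its flags."""
    name: str
    crawled: list  # crawled properties mapped onto this managed property
    queryable: bool = False
    searchable: bool = False
    sortable: bool = False
    refinable: bool = False


def index_document(crawled_props, schema):
    """Opt-in: keep only values whose crawled property is mapped to a managed property."""
    doc = {}
    for mp in schema:
        for cp in mp.crawled:  # first mapped crawled property that has a value wins
            if cp in crawled_props:
                doc[mp.name] = crawled_props[cp]
                break
    return doc


schema = [
    ManagedProperty("Author", ["Office:Author", "Mail:From"], queryable=True, refinable=True),
    ManagedProperty("Region", ["ows_Region"], queryable=True, sortable=True, refinable=True),
]
crawled = {"Office:Author": "J. Smith", "ows_Region": "East", "Office:Template": "Normal.dotm"}
print(index_document(crawled, schema))  # {'Author': 'J. Smith', 'Region': 'East'}
```

An all-inclusive engine, as the slide describes GSA, would in effect keep `dict(crawled)` unchanged, including the unmapped template property.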
Search Refinement with Metadata
 Approach      Completeness           Pros                           Cons
 Run-time      Smaller sample of      - Smaller index size           - Degraded
 clustering    much larger set;                                        performance w/
               Top 50-100 query                                        larger samples
               results.                                              - No cluster counts
 Index-based   Entire result set      - Fast                         - Increases index
 clustering    stored in the index.   - Allows for precise cluster     size
                                        counts
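The completeness column is easy to see with a toy result set: run-time clustering inspects only the top-ranked sample, so a value that is common deeper in the results can be missing entirely, while index-based counts cover the whole set. A Python sketch with a hypothetical ranked list of department tags and a sample size of 100:

```python
from collections import Counter

# Hypothetical ranked result set of 1,000 hits, reduced to each hit's department tag.
ranked_hits = ["Legal"] * 60 + ["Finance"] * 40 + ["Finance"] * 500 + ["HR"] * 400


def runtime_refiners(hits, sample_size=100):
    """Run-time clustering: only the top-N results are inspected."""
    return Counter(hits[:sample_size])


def index_refiners(hits):
    """Index-based: counts cover the entire result set (real engines precompute these)."""
    return Counter(hits)


print(runtime_refiners(ranked_hits))  # Counter({'Legal': 60, 'Finance': 40})
print(index_refiners(ranked_hits))    # Counter({'Finance': 540, 'HR': 400, 'Legal': 60})
```

The sample's tallies are not counts for the full result set, which is why the table lists "no cluster counts" for run-time clustering.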
Search Refinement with Metadata
               GSA                          FAST                  SharePoint
                                                                  2010
 Run-time      - The only option prior to   - OTB                 - OTB
 clustering      Rel. 6.8 (Custom)

 Index-based   - “Preview” status in Rel. - OTB for MPs marked as - Not available
 clustering      6.8 (OTB)                  Refinable
                                          - Inverted Index and
                                            Metadata Property Store
                                            combined into a high
                                            performance OLAP cube
Conclusions*

FAST
   • SharePoint intranet as a hub + document libraries, LOBs
   • Search results served from the SharePoint portal
   • Active Directory-tied systems w/ content security policies applied broadly
   • Fine level of control over index schema and document processing
   • Custom search results ranking / relevancy models
   • High complexity metadata-based search scenarios
   • Full & Mini Search-driven applications

GSA
   • Heterogeneous content sources dominated by web pages
   • Search UI served by GSA
   • Predominantly keyword-driven search experience
   • Custom run-time search refiners for protected content; OTB “Dynamic Navigation”
     for LOB / public data
   • Result biasing via URL patterns, metadata values
   • Medium complexity metadata-based search scenarios

* Usage scenarios best aligned with OTB functionality, minimum possible customizations.
Special Offer
First ten attendees to sign up will receive a two-hour evaluation of
your current or planned enterprise search strategy.

For more information contact:
    Val Orekhov - val@portalsolutions.net
Questions

Concept Searching Portal Solutions Search Engine Face Off

  • 1. Search Engine Face-Off Keyword Search versus Metadata Search Don Miller, VP of Business Development Val Orekhov, VP of Business Development 1 (408) 828-3400 1 (240) 450-2166 x 103 donm@conceptsearching.com val@portalsolutions.net
  • 2. Concept Searching Don Miller Don Miller is a senior executive at ConceptSearching with over 20 years experience in knowledge management. He is a frequent speaker about Records Management and Information Architecture problems and solutions. Don has been a guest speaker at Taxonomy Bootcamp, Management Electronic Records and numerous SharePoint events about information organization and records management. Don Miller, VP of Business Development * 1 (408) 828-3400 * donm@conceptsearching.com Portal Solutions Val Orekhov Val Orekhov, Chief Architect for Portal Solutions is deeply skilled in Enterprise Application Development, Web development, portals, relational databases and data access, modeling, and is versed in a number of programming languages and technologies. He has been with Portal Solutions for almost five years and drives the technical team to excel year over year. He holds a Master of Science in Computer Science from Kyrgyz Technical University in Bishkek, Kyrgyzstan. Val Orekhov, Chief Technical Architect * (1) (240) 450-2166 x 103 * val@portalsolutions.net
  • 3. Agenda  ConceptSearching:  Keyword vs Metadata  Keyword vs Metadata Costs  Google vs. SharePoint vs. FAST  What’s wrong with a manual metadata approach  Automated approaches  USAF Case Study  Portal Solutions:  Enterprise Search – Google vs FAST in SharePoint  Indexing Options  Approach to Security Trimming  Ranking Algorithms & Sorting Options  Metadata & Search Refinements  Questions and Answers  Demo of product if time permits Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 4. Concept Searching, Inc. Company founded in 2002  Product launched in 2003  Focus on management of structured and unstructured information  Technology  All technologies based on our ‘open conceptualTagging framework’  Automatic concept identification, content tagging, auto- classification, taxonomy management  Only statistical vendor that can extract conceptual metadata  2009 and 2010 ‘100 Companies that Matter in KM’ (KM World Magazine)  KMWorld ‘Trend Setting Product’ of 2009 and 2010  Locations: US, UK, & South Africa Client base: Fortune 500/1000 organizations  Microsoft Enterprise Search ISV , FAST Partner  Product Suite: conceptSearch, conceptTaxonomyManager, conceptClassifier, conceptClassifier for SharePoint, contentTypeUpdater for SharePoint Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 5. What Type of Search or Information Architecture Do You Need? Keyword Search = ~66%+ Metadata Search = 100% of results (Recall) of results (Recall) • Simple • Guided Navigation • No administration • Records Management • Good enough • Sensitive Information Removal Recall (information retrieval), a • Collaboration statistical measure (contrasted with precision), the fraction of (all) relevant • Improved Precision and material that are returned by a search Recall query Precision (information retrieval), • Evolution of Keyword the percentage of documents returned Search that are relevant Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 6. What Is Keyword vs. Metadata Costing You? Problem Pre Migration Search Records Management Data Privacy Protection •60% of stored •“It’s not about better •67% of data loss in •Average cost per documents are search” Records Management is exposed record is $197 obsolete •Less than 50% of content due to end user error and ranges from $90- •50% of documents are is correctly indexed, meta •It costs and organization $305 per record duplicates tagged or efficiently $180 per document to •70% of breaches are due •Requires resources to searchable recreate it when it is not to a mistake or malicious identify what •85% of relevant tagged correctly and intent by an should/not be migrated documents are never cannot be found organization’s own staff retrieved in search •Eliminate duplicate •Eliminate manual tagging •Eliminate inconsistent •Identify any type of Solution end user tagging organizationally defined documents & replace with automatic •Identify privacy data identification of multi- •Automatically declare privacy data exposures word concepts documents of record •Combines pattern •Identify and declare •Provide guided based on vocabulary and matching with associated records that were not navigation via the retention codes vocabulary previously identified taxonomy structure (i.e. 
•Automatically change the •Automatic Content Type •Identify high value concepts) Content Type and route updating enabling content •Go beyond dynamic to the Records workflows and rights •Migrating required clustering with Management repository management content to a structure conceptual clustering based on the taxonomies Benefit •Reduces migration •Taxonomy navigation •Savings of $4.00 - $7.04 •Average cost runs from costs is 36% - 48% faster per record by eliminating $225K to $35M •Ensures •Savings 2.5 hours manual tagging compliance and per user per day •Ensures compliance and protection of reduces potential content assets litigation exposures Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 7. Metadata Search vs. Keyword and Guided Navigation “Proposal” “Software License” “SLA” “Licensee” “Addendum” “License Agreement” “License” 100% of Results Results “Documents of Record” Metadata Search also known as “Recall” “Proposals” “Contract” 66% Key + Synonym Search “Proposal” Entity Extraction 33% Keyword Search 20-33% of results Entity extraction without complex rules is ineffective. It is just keyword Cost (Time, Money and Complex) match, which is what keyword search is, which is 33% effective. Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 8. Similar Features Against Total Number of Documents Returned Google SharePoint FAST Index 500 M + 100 M 500 M + Key Word /– 33% of Yes Yes – Good as Yes results Google or FAST Synonyms Up to Yes Yes Yes 50-66%+ of results Apply metadata No No Key Word only automatically for which equals 33% 100% of results of results Ranking Algorithm Non tunable Tunable Very Tunable + Best Bets: Does not improve number of results only how presented Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 9. What Is Missing To Get to 100% of Relevant Results in Every Search? Metadata Google SharePoint FAST Auto Classification No – No – Entity extraction, Missing 33-50% of Missing 33-50% of which is the same results on any results on any as keyword search particular topic particular topic 33% results. No RECALL results improvement with this approach Taxonomy No Yes, but can’t do No Management any thing with it in this release. Security issues for managing Term Store. Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 10. Miscellaneous Items to Review Google SharePoint FAST SharePoint Refiners Hard Yes – Easy to use Medium – Initial and Navigators with for standard search. release, does not counts. No counts on leverage Term Store results. yet. XML – RECALL Powershell based Customization Difficult Difficult Extendable Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 11. Summary • Google – Best for no administration, install and walk away. Usually missing 33%- 50% of results on any given topic because of missing metadata. Not easy to integrate refiners or navigators into SharePoint UI. • SharePoint Search – Cost effective, comes free with SharePoint. Search Algorithm is as good as FAST or Google. Also very easy to install and walk away. Limited extensibility. Easy integration for refiners and navigators (no counts). Also missing 50% of results on any topic. • FAST – Extremely customizable, but requires training or professional services to customize. Most likely Microsoft long term platform for search. Very scalable and can provide refiner counts. Still missing 33-50% of results from any given search because of metadata inconsistency. • However, they are all missing a true metadata strategy which is the only way to ensure 100% of results. Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 12. A Manual Metadata Approach Will Fail 95%+ Of The Time Issue Organizational Impact Inconsistent Less than 50% of content is correctly indexed, meta-tagged or efficiently searchable rendering it unusable to the organization (IDC) Subjective Highly trained Information Specialists will agree on meta tags between 33% - 50% of the time. (C. Cleverdon) Cumbersome - Expensive Average cost of manually tagging one item runs from $4 - $7 per document and does not factor in the accuracy of the meta tags nor the repercussions from mis-tagged content (Hoovers) Malicious Compliance End users select first value in list (Perspectives on Metadata, Sarah Courier) No perceived value for end user What’s in it for me? End user creates document, does not see value for organization nor risks associated with litigation and non conformance to policies. What have you seen Metadata will continue to be a problem due to inconsistent human behavior The answer to consistent metadata is an automated approach that can extract the meaning from content eliminating manual metadata generation yet still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure. Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 13. conceptClassifier’s TaxonomyManager Automated Metadata Approach Drives Business Value  Create enterprise automated metadata framework/model  Average return on investment minimum of 38% and runs as high as 600% (IDC) 1. Model and Validate  Apply consistent meaningful metadata to enterprise content  Incorrect meta tags costs an organization 6. Life Cycle 2. Automate Management Tagging $2,500 per user per year – in addition potential costs for non-compliance (IDC)  Guide users to relevant content with taxonomy navigation  Savings of $8,965 per year per user based on an 5. Records $80K salary (Chen & Dumais) Management 3. Findability  100% “Recall” of content, 35% Faster access to and PII content “Precision” 4. Business  Use automatic conceptual metadata Processes generation to improve Records Management  Eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)  Improve compliance processes, eliminate potential privacy exposures Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 14. USAF Human Performance Clearinghouse GOAL : Leverage Existing USAF, AFDW, and AFMS License Agreements to Enable IM, RM, & Privacy & Security Compliance Requirements • DoDD 8320 (Data Sharing in a Net-Centric DoD) • DoDD 5015 (Records Management) Data Privacy • USAF Privacy Act Program & HIPAA • Freedom of Information Act (FOIA) Migration Migration Records Management Search eDiscovery & FOIA Tel: 703.246.9360 | Fax: 240.465.1182 Distribution Statement A: Approved for public release; distribution is unlimited. Distribution Statement A: Approved for public release; distribution is 311 ABG/PA No. 09-488, 16 Oct 2009 unlimited. 311 ABG/PA No. 09-488, 16 Oct 2009
  • 15. Taxonomy Improves “Precision” with Guided Refiners for “Proposals” • After 100% of Results are returned, leverage metadata for guided navigation and refiners • Use taxonomy/metadata structures before query and after query to guide users to the right document • Accelerate document finding [PRECISION] by a minimum of 35% I want all proposals in two specific regions. I could then have a guided refiner for vertical, amount, etc. Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 16. Dynamic Clustering Is Not Guided Navigation for “Proposals” • Brings back clusters • They are best guesses • They might help, they might make it worse • Better than nothing, but not a long term strategy or evolution of key word search Dynamic navigation (CLUSTERING) is ineffective. How does an information worker know when it is a good topic or not? This is NOT PRECISION! Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
  • 17. Enterprise Search Comparison for SharePoint Google vs FAST Why Enterprise Search needs Metadata and Taxonomy Management  Recall – Ensures you bring back 100% of Results  Enhance Precision – Fastest way to filter to the right results so that you are looking at the documents that matter the most MUST HAVES:  Heterogeneous content sources:  HTML, Documents and LOBs records  Located on Portals, File Systems and in Databases  Required Security Trimming:  Integrate with Identity Providers (AD, LDAP, SQL)  Implement authorization decision logic  Able to take advantage of metadata stored with documents and LOBs Concept Searching • Don Miller • (408) 828-3400 • donm@conceptsearching.com
• 18. Google Search Appliance 6.8 vs. FAST Search Server for SharePoint 2010
  For metadata-driven search scenarios in a SharePoint environment
• 19. Portal Solutions Corporate Overview - Vitals
  • Founded in 2002
  • SharePoint 2010 Microsoft Gold Certified Partner
  • Over 100 SharePoint deployments
  • 30+ certified engineers/developers
  • Member of the Microsoft SharePoint Early Adopter Program
  • Recognized as a best place to work by Washingtonian magazine
  • A growing IT consulting organization composed of talented, certified staff
• 20. Corporate Overview - Solutions
  • Employee portals and intranets
  • Public-facing web sites
  • Knowledge management solutions
  • Document and records management
  • Performance and risk management/BI
  • Customer extranets
  • Enterprise search solutions
  • Business process automation
• 21. Introducing the Contenders
  Google Search Appliance (GSA)
  • Search appliance: “Google.com in a box”
  • Hardware & software solution
  • Pre-packaged functionality, ready to work
  • “Black box” approach to search results
  FAST Search Server for SharePoint 2010
  • Spin-off of the earlier FAST ESP
  • Software-only solution
  • Allows customization of many aspects of engine functionality, down to the relevancy tuning algorithms
  • Platform rather than a product
  • 22. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
• 23. Content Crawl Options (GSA | FAST | SharePoint)
  • Content pull
    • GSA: HTTP Crawler
    • FAST: SharePoint Crawler; Enterprise Crawler
    • SharePoint: SharePoint Crawler
  • Content push
    • GSA: XML Feed API
    • FAST: Feed API
    • SharePoint: -
  • Indexing LOBs (pull)
    • GSA: Onboard Database Connector
    • FAST: Databases & Web Services via SharePoint BCS
    • SharePoint: Databases & Web Services via SharePoint BCS
  • Connectors
    • GSA: SharePoint, Documentum, LiveLink, FileNet, File System, LDAP
    • FAST: OTB: File System, Exchange Public Folders; Custom: Documentum, Lotus Notes
    • SharePoint: OTB: File System, Exchange Public Folders
  • External metadata
    • GSA: push through XML Feed API
    • FAST: custom stages in the processing pipeline
    • SharePoint: -
  • Cloud connectivity
    • GSA: Google Apps & Sites; Twitter
    • FAST: custom connectors
    • SharePoint: -
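The "push through XML Feed API" row can be pictured as the content owner building an XML document that lists URLs plus their metadata and sending it to the appliance. The sketch below only builds such a document with the standard library; the element names (`gsafeed`, `header`, `record`, `meta`) and the `metadata-and-url` feed type reflect our understanding of the GSA feed format and should be checked against the feeds protocol guide for your release. The data source name and URL are hypothetical, and the HTTP upload step is deliberately left out.

```python
import xml.etree.ElementTree as ET


def build_metadata_feed(datasource, records):
    """Build a GSA-style metadata-and-URL feed as an XML string.

    records: iterable of (url, {meta_name: meta_value}) pairs. Element and
    attribute names are assumptions to verify against the GSA feed docs.
    """
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = "metadata-and-url"
    group = ET.SubElement(feed, "group")
    for url, meta in records:
        # action="add" asks the appliance to (re)index the URL with this metadata.
        record = ET.SubElement(group, "record", url=url, action="add", mimetype="text/html")
        metadata = ET.SubElement(record, "metadata")
        for name, value in meta.items():
            ET.SubElement(metadata, "meta", name=name, content=value)
    # The DOCTYPE declaration expected by the real protocol is omitted in this sketch.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(feed, encoding="unicode")


xml_text = build_metadata_feed(
    "proposals_db",  # hypothetical data source name
    [("http://intranet.example.com/proposals/42", {"region": "East", "vertical": "Energy"})],
)
print(xml_text)
```

On the FAST side, the same metadata enrichment would instead happen inside a custom stage of the document processing pipeline, as the table notes.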
  • 24. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
• 25. Security Trimming
  • Answers the “Who am I?” and “What results can I see?” questions
  • Required in most enterprise search scenarios
  • Approaches include late and early authorization/binding
  Late authorization
    • Access rights (ACLs): checked at run time against the system of record
    • Pros: up-to-date presentation
    • Cons: slow on large result sets
  Early authorization
    • Access rights (ACLs): information stored in the index at item level
    • Pros: fast; facilitates metadata clustering
    • Cons: duplicates info; potential for outdated results
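The late vs. early trade-off can be made concrete with a small sketch. Late binding asks the system of record about every hit at query time. Early binding compares the user's principals with ACLs copied into the index at crawl time, so it is fast but can return items whose permissions changed after the last crawl. All class names, function names, principals, and ACLs below are hypothetical. The system of record is stubbed with a dict, and a real engine would apply the early-binding filter inside the index query rather than afterwards.

```python
from dataclasses import dataclass


@dataclass
class IndexedItem:
    doc_id: str
    acl: frozenset  # principals copied into the index at crawl time (early binding)


def system_of_record_allows(user_principals, doc_id, live_acls):
    """Late binding: ask the source system at query time (stubbed with a dict)."""
    return bool(user_principals & live_acls.get(doc_id, frozenset()))


def trim_late(results, user_principals, live_acls):
    # One check per hit against the source: current, but slow for large result sets.
    return [r.doc_id for r in results
            if system_of_record_allows(user_principals, r.doc_id, live_acls)]


def trim_early(results, user_principals):
    # Filter on ACLs stored in the index: fast, but only as fresh as the last crawl.
    return [r.doc_id for r in results if user_principals & r.acl]


index = [
    IndexedItem("doc1", frozenset({"CONTOSO\\alice"})),
    IndexedItem("doc2", frozenset({"CONTOSO\\Finance"})),
    IndexedItem("doc3", frozenset({"CONTOSO\\HR"})),
]
# Since the last crawl, CONTOSO\Finance was removed from doc2's ACL.
live_acls = {
    "doc1": frozenset({"CONTOSO\\alice"}),
    "doc2": frozenset({"CONTOSO\\Legal"}),
    "doc3": frozenset({"CONTOSO\\HR"}),
}
user = frozenset({"CONTOSO\\alice", "CONTOSO\\Finance"})
print(trim_early(index, user))            # ['doc1', 'doc2']  (doc2 is outdated)
print(trim_late(index, user, live_acls))  # ['doc1']
```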
• 26. Security Trimming Options Support (GSA | FAST | SharePoint 2010)
  Late authorization
    • GSA: “default” option in many scenarios; via Kerberos, SAML Bridge, or Connector
    • FAST: -
    • SharePoint 2010: custom
  Early authorization
    • GSA: Rel. 6.0: high-level policy ACLs configured by admins or through a remote API *; Rel. 6.8: item-level security principals **
    • FAST: item-level ACLs for Windows and SharePoint security principals supported natively; allows setting up multiple security user property stores and mapping user principals
    • SharePoint 2010: native support for item-level ACLs (Windows and SharePoint ACLs)
  * Best applied to enterprises with a manageable number of high-level policies, or able to invest in custom ACL sync tools
  ** SharePoint Connector Rel. 2.6.4 sends SharePoint Site Groups with the feed, but the groups are not expanded properly by GSA
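Footnote ** matters because item-level ACLs often grant access to groups, not to individual users. If the engine cannot expand a user's (possibly nested) group memberships, the user never matches a group-based ACL. The sketch below shows one way group expansion could work. The membership data and group names are hypothetical, and the code is not how GSA, FAST, or SharePoint resolve principals internally.

```python
def expand_groups(user, memberships):
    """Return the user plus every group reachable through nested memberships.

    memberships maps a principal to the groups it belongs to directly.
    """
    seen = {user}
    stack = [user]
    while stack:
        principal = stack.pop()
        for group in memberships.get(principal, ()):
            if group not in seen:  # guard against cycles and repeats
                seen.add(group)
                stack.append(group)
    return seen


def can_see(user, item_acl, memberships, expand=True):
    """Check an item ACL against the user's principals, optionally expanding groups."""
    principals = expand_groups(user, memberships) if expand else {user}
    return bool(principals & item_acl)


memberships = {
    "alice": ["Site Members"],
    "Site Members": ["Site Visitors"],
}
acl = {"Site Visitors"}
print(can_see("alice", acl, memberships))                # True
print(can_see("alice", acl, memberships, expand=False))  # False
```

The second call reproduces the symptom described in the footnote: the ACL is correct, but without expansion, a legitimate user is trimmed out.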
  • 27. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
• 29. Result Set Ranking
  • Fidelity of keyword matches (all engines)
    • Proximity
    • Frequency
    • Completeness
  • Hypertext matching (GSA only)
    • Analyzes keyword location on a rendered page and related pages
  • Hub-and-spoke algorithm (all engines)
    • Driven by linkages between web pages
    • Pages receiving or providing the most links rank higher
    • GSA: PageRank; FAST: document authority
  • Static rank biasing, document importance
    • Document-, site-, and metadata-based promotion/demotion (all engines)
    • User-tagged documents receive higher importance (FAST, SharePoint search)
  • Adaptive ranking
    • User clicks in search results (FAST, SharePoint search)
  • Custom ranking
    • Build custom ranking models with FAST
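The hub-and-spoke idea can be illustrated with the textbook PageRank power iteration, in which a page's score is fed by the scores of the pages that link to it. This is the published algorithm in its simplest form, not GSA's production PageRank or FAST's document authority model. The link graph and page names are made up, and the damping factor of 0.85 is the conventional textbook value.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to. Pages with no outlinks
    spread their rank evenly over all pages.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


links = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # hub
```

The page that every other page links to ends up with the highest score, which is the "pages receiving the most links rank higher" behavior described on the slide.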
• 30. Result Set Sorting
  • GSA
    • Date/time only (document modification date, or a date extracted from the title, metadata, or body of a document)
  • FAST
    • Any property marked as Sortable
    • Supported data types: String, Number, Date/Time
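The difference between "date only" and "any property marked as Sortable" can be sketched as a sort function that consults a schema of sortable properties. The schema, the property names (`LastModified`, `Title`, `Amount`), and the policy of sorting hits that lack the property last are all hypothetical choices made for this illustration. A GSA-style setup would correspond to a schema containing only the date property.

```python
from datetime import datetime

# Hypothetical schema: properties marked Sortable and their expected types.
SORTABLE = {"LastModified": datetime, "Title": str, "Amount": float}


def sort_results(results, prop, descending=False):
    """Sort hits by a property, refusing properties not marked Sortable."""
    if prop not in SORTABLE:
        raise ValueError(f"{prop!r} is not marked as Sortable")
    expected = SORTABLE[prop]
    present = [r for r in results if isinstance(r.get(prop), expected)]
    # Hits missing the property (or holding the wrong type) go last in either direction.
    missing = [r for r in results if not isinstance(r.get(prop), expected)]
    return sorted(present, key=lambda r: r[prop], reverse=descending) + missing


hits = [
    {"Title": "Proposal B", "Amount": 250000.0, "LastModified": datetime(2010, 3, 1)},
    {"Title": "Proposal A", "Amount": 90000.0, "LastModified": datetime(2010, 6, 15)},
    {"Title": "Proposal C", "LastModified": datetime(2009, 11, 30)},
]
print([h["Title"] for h in sort_results(hits, "Amount", descending=True)])
# ['Proposal B', 'Proposal A', 'Proposal C']
print([h["Title"] for h in sort_results(hits, "LastModified")])
# ['Proposal C', 'Proposal B', 'Proposal A']
```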
  • 31. Comparing FS4SP and GSA • Indexing Options • Approach to Security Trimming • Ranking Algorithms & Sorting Options • Metadata & Search Refinements
• 32. Index Schema Management
  • GSA (all-inclusive)
    • All discovered metadata (crawled properties) is stored in the index by default
    • Metadata from MS Office documents stored in the index results (GSA Feature Request ID# 1371024)
    • All string-type metadata is associated with the full-text index (FTI) by default; matches on metadata are controlled at query time (allintext:, allintitle: keyword filters)
    • Metadata in results limited to 1,500 characters per field (Rel. 6.8; previous releases: 320 characters)
  • FAST (opt-in)
    • Crawled properties must be mapped to managed properties (MPs) to be stored in the index
    • MPs provide a level of abstraction from content sources
    • MPs can be configured as:
      • Stored in the index (Queryable)
      • Associated with the FTI (Searchable)
      • Sortable
      • Refiner-enabled
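The all-inclusive vs. opt-in contrast can be sketched as two indexing functions. In the opt-in case, only crawled properties mapped to a managed property survive, and each MP's flags decide whether its value is stored and whether it feeds the full-text index. The `ManagedProperty` class, its flag names, and the crawled property names (`ows_Region`, `Office:Author`, `ows_Internal`) are hypothetical stand-ins, not the FAST administration API.

```python
from dataclasses import dataclass


@dataclass
class ManagedProperty:
    name: str
    crawled: list             # crawled properties mapped into this MP
    queryable: bool = True    # stored in the index
    searchable: bool = False  # value added to the full-text index (FTI)
    sortable: bool = False
    refinable: bool = False


def index_all_inclusive(crawled_props):
    """GSA-style: every discovered crawled property is kept."""
    return dict(crawled_props)


def index_opt_in(crawled_props, schema):
    """FAST-style: only crawled properties mapped to an MP are kept.

    Returns (stored, fti_text): MP values kept in the index and the text
    that would be added to the full-text index.
    """
    stored, fti_parts = {}, []
    for mp in schema:
        for source in mp.crawled:
            if source in crawled_props:
                value = crawled_props[source]
                if mp.queryable:
                    stored[mp.name] = value
                if mp.searchable:
                    fti_parts.append(str(value))
                break  # first mapped crawled property that is present wins
    return stored, " ".join(fti_parts)


crawled = {"ows_Region": "East", "Office:Author": "J. Smith", "ows_Internal": "x"}
schema = [
    ManagedProperty("Region", ["ows_Region"], refinable=True),
    ManagedProperty("Author", ["Office:Author"], searchable=True, sortable=True),
]
print(index_all_inclusive(crawled))    # all three crawled properties
print(index_opt_in(crawled, schema))  # ({'Region': 'East', 'Author': 'J. Smith'}, 'J. Smith')
```

The extra MP layer is what lets one MP such as `Author` draw on differently named crawled properties from different content sources.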
• 33. Search Refinement with Metadata
  Run-time clustering
    • Completeness: smaller sample of a much larger set (top 50-100 query results)
    • Pros: smaller index size
    • Cons: degraded performance with larger samples; no cluster counts
  Index-based clustering
    • Completeness: entire result set stored in the index
    • Pros: fast; allows for precise cluster counts
    • Cons: increases index size
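The completeness difference can be shown by computing refiner values two ways: from only the top-ranked sample (run-time) and from the whole result set (index-based). The field name, the sample size of 50 (taken from the low end of the slide's 50-100 range), and the skewed result set are invented for illustration.

```python
from collections import Counter


def runtime_refiners(ranked_hits, field_name, sample_size=50):
    """Run-time clustering: look only at the top-N ranked hits."""
    return Counter(h[field_name] for h in ranked_hits[:sample_size] if field_name in h)


def index_refiners(ranked_hits, field_name):
    """Index-based clustering: count over the entire result set."""
    return Counter(h[field_name] for h in ranked_hits if field_name in h)


# 1,000 hits; the top of the ranking happens to be dominated by one region.
hits = [{"region": "East"}] * 60 + [{"region": "West"}] * 940
print(runtime_refiners(hits, "region"))  # Counter({'East': 50})
print(index_refiners(hits, "region"))    # Counter({'West': 940, 'East': 60})
```

Here, the run-time sample never sees the 940 "West" hits, so that refiner is missing entirely and any counts from the sample would mislead. This illustrates why the slide lists "no cluster counts" as a run-time drawback and "precise cluster counts" as an index-based advantage.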
• 34. Search Refinement with Metadata (GSA | FAST | SharePoint 2010)
  Run-time clustering
    • GSA: the only option prior to Rel. 6.8 (custom)
    • FAST: OTB
    • SharePoint 2010: OTB
  Index-based clustering
    • GSA: “Preview” status in Rel. 6.8 (OTB)
    • FAST: OTB for MPs marked as Refinable; inverted index and metadata property store combined into a high-performance OLAP cube
    • SharePoint 2010: not available
• 35. Conclusions*
  FAST
    • SharePoint intranet as a hub + document libraries, LOBs
    • Search results served from the SharePoint portal
    • Active Directory-tied systems with content security policies applied
    • Fine level of control over index schema and document processing
    • Custom search results ranking/relevancy models
    • High-complexity metadata-based search scenarios
    • Full and mini search-driven applications
  GSA
    • Heterogeneous content sources dominated by web pages
    • Search UI served by GSA
    • Predominantly keyword-driven search experience
    • Custom run-time search refiners for broadly protected content; OTB “Dynamic Navigation” for LOB/public data
    • Result biasing via URL patterns, metadata values
    • Medium-complexity metadata-based search scenarios
  * Usage scenarios best aligned with OTB functionality, with the minimum possible customization.
• 36. Special Offer
  The first ten attendees to sign up will receive a two-hour evaluation of their current or planned enterprise search strategy. For more information, contact Val Orekhov: val@portalsolutions.net