+




    Taxonomy 101
    Controlled Vocabularies and Beyond
            Barbara McGlamery, Marthastewart.com
+
    About Me

       9+ years Time Inc.
           Entertainment Weekly
           This Old House
           Time
           People
           InStyle
           Recipe Finder


       1+ years Martha Stewart
           Martha Stewart Living
           Martha Stewart Weddings
           Whole Living
+
    Agenda


     Basics of taxonomy and controlled vocabularies
     Developing a taxonomy
     Taxonomy software and tagging tools
     Records management and taxonomy
+
    What is a controlled vocabulary?


        Predefined, authorized terms that can be
        consistently applied to content

        Types:
          Lists
          Synonym rings
          Authority Files
          Facets
+
    What is a taxonomy?


        Classification of a controlled
        vocabulary in a hierarchical list

        Types:
         Taxonomy
         Thesaurus
         Ontology
+
    Controlled Vocabulary


     Predefined, authorized terms
     that can be consistently applied
     to content


     Relationship is between the list
     value and class
+
    Controlled Vocabulary


       Units of Measure
          Cup
          Tablespoon
          Teaspoon
+
    Synonym Ring



      Extends a CV by adding synonyms as
      equivalent terms



      Relationship is between list value and its
      synonyms
+
    Synonym Ring



       Units of Measure
          Cup = C = c
          Tablespoon = Tbl = T
          Teaspoon = tsp = t
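
The synonym ring above can be sketched as a list of sets. This is a minimal illustration, not a prescribed implementation; the names SYNONYM_RINGS, equivalents and same_concept are assumptions made for the example. Matching is deliberately case-sensitive, because the slide uses "T" for Tablespoon and "t" for Teaspoon.

```python
# Each ring is a set of equivalent terms; no term is preferred over another.
SYNONYM_RINGS = [
    {"Cup", "C", "c"},
    {"Tablespoon", "Tbl", "T"},
    {"Teaspoon", "tsp", "t"},
]


def equivalents(term):
    """Return every term in the same ring as `term` (empty set if unknown)."""
    for ring in SYNONYM_RINGS:
        if term in ring:  # case-sensitive on purpose: "T" and "t" differ
            return ring
    return set()


def same_concept(a, b):
    """True when both terms belong to the same synonym ring."""
    return b in equivalents(a)
```

A search for "C" can then be expanded to every member of its ring, while "T" and "t" stay in separate rings.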
+
    Authority File

 Extends CVs and synonym rings further by
    assigning one term as the preferred term,
    to which all other synonyms point


 Relationship assigns a property (Preferred
    Term) to one term and treats all others as
    its synonyms
+
    Authority File


      Units of Measure
           (Preferred Term) Cup
             Syn: C, c

           (PT) Tablespoon
             Syn: Tbl, T

           (PT) Teaspoon
             Syn: tsp, t
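
One way to picture the authority file is a lookup that resolves any variant to its preferred term. The sketch below assumes a plain dictionary is enough; AUTHORITY_FILE, LOOKUP and preferred_term are hypothetical names, not part of any tool mentioned in this deck.

```python
# Preferred term -> its synonyms, taken from the Units of Measure example.
AUTHORITY_FILE = {
    "Cup": ["C", "c"],
    "Tablespoon": ["Tbl", "T"],
    "Teaspoon": ["tsp", "t"],
}

# Invert the file so every variant (and the preferred term itself)
# points to exactly one preferred term.
LOOKUP = {}
for preferred, synonyms in AUTHORITY_FILE.items():
    LOOKUP[preferred] = preferred
    for synonym in synonyms:
        LOOKUP[synonym] = preferred


def preferred_term(term):
    """Return the preferred term for `term`, or None if it is not authorized."""
    return LOOKUP.get(term)
```

Unlike the synonym ring, the result is always the single term used for browse and display.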
+
    Facets


     Terms are broken down individually by
     unique properties, allowing a mix and match
     approach to search and retrieval


     Relationship is between one facet node and
     multiple values
+
    Facets
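
Faceted "mix and match" retrieval can be sketched as filtering on any combination of properties. The recipe records and the facet names main_ingredient and course below are invented for illustration only; they do not come from the Martha Stewart taxonomy.

```python
# Hypothetical content items, each described by independent facets.
RECIPES = [
    {"title": "Tomato soup", "main_ingredient": "Tomatoes", "course": "Starter"},
    {"title": "Tomato salad", "main_ingredient": "Tomatoes", "course": "Side"},
    {"title": "Roast chicken", "main_ingredient": "Chicken", "course": "Main"},
]


def filter_by_facets(items, **selected):
    """Keep the items whose facet values match every selected facet."""
    return [
        item
        for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_values(items, facet):
    """List the distinct values that one facet node takes across the items."""
    return sorted({item[facet] for item in items if facet in item})
```

Calling filter_by_facets(RECIPES, main_ingredient="Tomatoes", course="Starter") narrows by two facets at once, and facet_values supplies the choices a browse interface would offer for one facet.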
+
    Taxonomy


     Classification of a controlled vocabulary in
     a hierarchical list



     Relationship is in assigning a hierarchy to
     list values
+
    Taxonomy


          Food
            Main Ingredient
              Vegetables (ahem…fruit)
                Tomatoes
                  Beefsteak tomatoes
                  Cherry tomatoes
                  Sundried tomatoes
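
The hierarchy above can be sketched as a child-to-parent map, which is enough to answer "where does this term sit?" and "what falls under this term?". The names PARENT, path_to_root and descendants are assumptions made for this example.

```python
# Child -> parent, following the Food example. "Food" is the root.
PARENT = {
    "Main Ingredient": "Food",
    "Vegetables": "Main Ingredient",
    "Tomatoes": "Vegetables",
    "Beefsteak tomatoes": "Tomatoes",
    "Cherry tomatoes": "Tomatoes",
    "Sundried tomatoes": "Tomatoes",
}


def path_to_root(term):
    """Return the chain of terms from the root down to `term`."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def descendants(term):
    """Return every term below `term`, depth first."""
    result = []
    for child, parent in PARENT.items():
        if parent == term:
            result.append(child)
            result.extend(descendants(child))
    return result
```

Content tagged "Cherry tomatoes" can then be aggregated under "Tomatoes" or "Vegetables" by walking the path, without tagging it three times.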
+
    Thesaurus

     CVs in a hierarchical structure with
     predefined relationships between terms
     (Broader Term, Narrower Term, Preferred
     Term, etc.)


     Relationship is in assigning standardized
     properties to list values
+
    Thesaurus


     Food
       (BT) Main Ingredient
         (BT) Vegetables (ahem…fruit)
           (BT) Tomatoes
             (NT) Beefsteak tomato
             (NT)(PT) Cherry tomato
               (RT) Roma tomato
             (NT) Sundried tomato
             (RT) Tomato sauce
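
A thesaurus adds typed relationships on top of the hierarchy. The sketch below stores Broader Term (BT), Narrower Term (NT) and Related Term (RT) links, treating NT as the inverse of BT and RT as symmetric. The Thesaurus class is hypothetical, and the way the RT links are attached (Roma tomato to Cherry tomato, Tomato sauce to Tomatoes) is one reading of the slide's indentation.

```python
from collections import defaultdict


class Thesaurus:
    """Minimal store for BT/NT/RT relationships between terms."""

    def __init__(self):
        self.broader = defaultdict(set)   # term -> its broader terms
        self.narrower = defaultdict(set)  # term -> its narrower terms
        self.related = defaultdict(set)   # term -> its related terms

    def add_broader(self, term, broader_term):
        # Recording BT also records the inverse NT link.
        self.broader[term].add(broader_term)
        self.narrower[broader_term].add(term)

    def add_related(self, a, b):
        # RT is symmetric: each term is related to the other.
        self.related[a].add(b)
        self.related[b].add(a)


thesaurus = Thesaurus()
thesaurus.add_broader("Tomatoes", "Vegetables")
for name in ("Beefsteak tomato", "Cherry tomato", "Sundried tomato"):
    thesaurus.add_broader(name, "Tomatoes")
thesaurus.add_related("Cherry tomato", "Roma tomato")
thesaurus.add_related("Tomatoes", "Tomato sauce")
```

Because the relationship types are fixed, any system that understands BT, NT and RT can consume the data without custom rules.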
+
    Ontology


     CVs in a hierarchical structure with complex
     relationships defined


     Relationship is in assigning predetermined
     standardized and freeform properties to list
     values
+
    Ontology

                Beefsteak tomatoes
                (isMainIngredient)
                Tomato sauce



                Will Smith
                (isLeadActor)
                Men in Black 3
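
The two ontology statements above have a subject, a property and an object, so they can be sketched as triples. This is a plain-Python illustration rather than RDF or OWL; TRIPLES and query are assumed names.

```python
# (subject, property, object) statements from the ontology examples.
TRIPLES = [
    ("Beefsteak tomatoes", "isMainIngredient", "Tomato sauce"),
    ("Will Smith", "isLeadActor", "Men in Black 3"),
]


def query(subject=None, predicate=None, obj=None):
    """Return the triples that match every argument that is not None."""
    return [
        (s, p, o)
        for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]
```

For example, query(obj="Tomato sauce") finds what is linked to tomato sauce, and query(predicate="isLeadActor") lists every lead-actor statement.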
+
    Semantic (semantic) Web
     Big S
         Initiative from W3C to create a web of machine-readable data by
         marking up content with consistently applied, standardized and
         freeform properties
           RDF/OWL
           Proprietary

     Little s
         Various standards that mark up content with agreed-upon and
         freeform properties
           Microformats
           Microdata
           Proprietary
+
    Pros and Cons of CVs and
    taxonomy
       Benefits
           Greater precision in search and retrieval
           Allows for faceted browsing
           Facilitates aggregation of content
           Clearly defines relationships between things

       Limitations
           Initial costs
           Upkeep
           Can spiral out of control
           May be too complex for some organizations
+
    What is taxonomy used for in web
    world?
     Search and retrieval
     Faceted browsing
     Aggregation of content
     Internal organization of assets
+
    Developing a taxonomy

       Strategy and planning

       Choose style and method

       Determine classes and relationships

       Gather terms and organize

       Add terms and relationships

       Review and approval
+
    Strategy and Planning

        Identify business case
            ROI
                Money saved
                Money earned
            Scope
            Use cases
                Front-end
                Back-end
        Approval
        Wireframes and functional specification
+
    Choose Style and Method

      Method         Styles

       Top down       CV

       Bottom up      Synonym    ring
                       Authority file
                       Facets
                       Taxonomy
                       Thesaurus
                       Ontology
+
    Determine Classes and
    Relationships
         Classes
           As few as necessary



         Relationships between terms
           As few as necessary

           With a taxonomy, determine the nature of the hierarchy
             Type of

           With a thesaurus, use the predefined relationships, though
            you may not want to use all of them
           With an ontology, determine the complex relationships
+
    Gather Terms and Organize

        Research
          Competitive analysis

          Identify existing outside CVs that might be utilized (e.g., SIC
           codes)
        Meet with stakeholders
          Get as much input as possible

          Stick to biz case (spiraling problem)

        You are the final decision maker
          Terms must conform to the structure decided upon; otherwise,
           mass chaos
          Always keep use cases in mind
+
    Add Terms and Relationships

       Things to keep in mind:
           Synonyms, misspellings, special characters
           Homonyms
             Different database identifiers or different names
               Shower (baby and bathroom)
           Duplicates
             Technical considerations if different children
               Breads as a main ingredient or as a dish
                 Bruschetta (dish, but not main ingredient)
           Descriptions
             Identifying duplicates or notes regarding the application to content
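
The homonym point can be sketched by keying each term on a unique identifier rather than on its name, so two concepts can share a label. The IDs, parent names and notes below are hypothetical; only the "Shower" and "Bruschetta" examples come from the slide.

```python
# Term records keyed by a unique ID; the description field disambiguates.
TERMS = {
    "t001": {"label": "Shower", "parent": "Parties", "note": "Baby shower"},
    "t002": {"label": "Shower", "parent": "Bathroom", "note": "Bathroom fixture"},
    "t003": {"label": "Bruschetta", "parent": "Dish",
             "note": "Bread dish, not a main ingredient"},
}


def ids_for_label(label):
    """Return every term ID that carries this label."""
    return sorted(tid for tid, record in TERMS.items() if record["label"] == label)


def homonyms():
    """Return the labels shared by more than one term, with their IDs."""
    by_label = {}
    for tid, record in TERMS.items():
        by_label.setdefault(record["label"], []).append(tid)
    return {label: ids for label, ids in by_label.items() if len(ids) > 1}
```

Running homonyms() before review gives stakeholders a short list of labels that need a description or a different display name.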
+
    Review and Approval


      Thorough review by all stakeholders
       This can take several sessions if
        taxonomy is big

      Final approval and sign-off
       Critical for buy-in
+
    Taxonomy and Tagging Tools

   Relational databases
       FileMaker Pro
       Microsoft Access
       MySQL

   Content management software
       Drupal
       SharePoint
       Proprietary applications

   Thesaurus and taxonomy tools
       Open source
           Protégé
       Commercial
           SchemaLogic (Thesaurus)
           TopBraid Composer, (Ontologies), Pro

   Auto categorization and text mining
       Data Harmony MAIstro
       Nstein
+
    Tagging the Content

       Manual
           Good for small, controlled sets of documents
             Highly accurate
             Time consuming


       Automated
           Good for large unwieldy sets of documents
             Fast and getting more accurate daily
             Expensive, 3rd party apps


       Hybrid
           Manual – content or document creators insert valuable metadata
           Automated – other data extracted and matched to taxonomy
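
The hybrid approach can be sketched as the union of tags a person supplies and tags matched automatically against the taxonomy. The matcher below is a deliberately naive whole-word search, far simpler than the commercial auto-categorization tools listed earlier; TERM_VARIANTS, auto_tag and hybrid_tag are assumed names and the variant lists are illustrative.

```python
import re

# Preferred term -> surface forms to look for in the text (illustrative).
TERM_VARIANTS = {
    "Tomatoes": ["tomato", "tomatoes"],
    "Tomato sauce": ["tomato sauce"],
    "Bruschetta": ["bruschetta"],
}


def auto_tag(text):
    """Return the preferred terms whose variants appear as whole words."""
    found = set()
    for preferred, variants in TERM_VARIANTS.items():
        for variant in variants:
            pattern = r"\b" + re.escape(variant) + r"\b"
            if re.search(pattern, text, re.IGNORECASE):
                found.add(preferred)
                break  # one matching variant is enough for this term
    return found


def hybrid_tag(text, manual_tags=()):
    """Combine tags entered by the content creator with automatic matches."""
    return set(manual_tags) | auto_tag(text)
```

For instance, hybrid_tag("Cherry tomatoes", ["Appetizer"]) keeps the manually entered "Appetizer" and adds "Tomatoes" from the text.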
+
    Real World Application of Taxonomy
    for Records Management


     Classifying
     Storing and retrieving
     Securing
     Archiving or destroying
+
    Real World Applications

       CV
           List of Departments (HR, IT, Marketing)

       Synonym rings
           Mergers and acquisitions = M and A = M&A

       Authority File
           (PT) Mergers and acquisitions
             Syn: M and A, M&A

       Facets
           Authors, Departments, Security Level

       Taxonomy/Thesaurus
           Organizational chart
             Investment Bank Director
               SVP Investments
                 EVP Investments
                   Investment Analyst

       Ontology
           Relationships between affiliations and departments/industries
             ARMA (isProfessionalAssn) for Records Managers
+
    What could it be used for in your
    world?




http://www.yutope.com/2008/07/is-your-email-inbox-overflowing/
+
    Industry standards
       Taxonomy specific
           Dublin Core (DC)
           Thesaurus construction
             ANSI/NISO Z39.19
             ISO 2788; ISO 5964
           Ontology development
             W3C
               Resource Description Framework (RDF)
               Web Ontology Language (OWL)


       Records Management specific
           Metadata management
             ISO 23081-1
             ISO 23081-2
+



    Questions?
+
    My contact info


     Barbara McGlamery
     Taxonomist
     Martha Stewart Living Omnimedia

     (212)827-8817
     bmcglamery@marthastewart.com


Editor's Notes

  1. Why are we here?
  2. (MS centerpiece, Time Obama, RIM Industry name TK, banking?) This is a list; everything else we will talk about today has some kind of relationship. Donna: CVs provide the data to populate metadata fields. Offer consistency in language used to describe content. Act as an intermediary between the input of the user and a database of terms by interpreting the meaning of the words. Provide agreement in (semantic) meaning of terms used. Facilitate retrieval. Enable search input to better represent the original intention of the user. Provide consistent and clear hierarchies for navigation.
  3. You’ll notice facets are both a CV and a taxonomy, as they can have a hierarchical structure as well as a (Pets: Animal > Mammal > Dog > Poodle; Food: MI > Veg > Peppers > Chili peppers; RIM example TK)
  4. Pros: Easy. Cons: Little context.
  5. Pros: Only need one relationship of term to synonym. Con: No preferred terms.
  6. Pros: Preferred terms can be used for browse and display. Cons: Little context other than synonym or official term.
  7. Pro: Easy to implement. Con: 1 parent/1 child.
  8. Can be hierarchical, ackack
  9. Pro: Multiple levels of hierarchy allowing for multiple parent/child relationships. Con: Can spin wildly out of control as you attempt to classify the universe.
  10. Pro: Rock solid industry standard. Con: Limited relationships.
  11. Part of semantic web (both big and little S). Pro: Allows for complex relationships between things to be expressed. Con: Can spin out of control; can be difficult for systems to retrieve and make use of relationships.
  12. Microformats reuse existing HTML/XML tags to convey metadata. Pro: Highly extendable. Con:
  13. And may just not be appropriate for your company
  14. How does this work out for us in web world? Clickthroughs and return site visits, pure and simple.
  15. All based on use cases
  16. Breads: Bruschetta is a bread dish but not MI
  17. Retention. Document storage. Discovery.
  18. ISO 23081-1; ISO 23081-2