1. +
Taxonomy 101
Controlled Vocabularies and Beyond
Barbara McGlamery, Marthastewart.com
2. +
About Me
9+ years Time Inc.
Entertainment Weekly
This Old House
Time
People
InStyle
Recipe Finder
1+ years Martha Stewart
Martha Stewart Living
Martha Stewart Weddings
Whole Living
3. +
Agenda
Basics of taxonomy and controlled vocabularies
Developing a taxonomy
Taxonomy software and tagging tools
Records management and taxonomy
4. +
What is a controlled vocabulary?
Predefined, authorized terms that can be consistently applied to content
Types:
Lists
Synonym rings
Authority Files
Facets
5. +
What is a taxonomy?
Classification of a controlled vocabulary in a hierarchical list
Types:
Taxonomy
Thesaurus
Ontology
6. +
Controlled Vocabulary
Predefined, authorized terms that can be consistently applied to content
Relationship is between the list value and class
7. +
Controlled Vocabulary
Units of Measure
Cup
Tablespoon
Teaspoon
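A minimal sketch of how this list could be enforced when tagging content, assuming the vocabulary is held in memory. The names `UNITS_OF_MEASURE`, `is_authorized`, and `tag_content` are hypothetical and not from any tool mentioned in this deck.

```python
# Hypothetical controlled vocabulary for the "Units of Measure" class.
UNITS_OF_MEASURE = frozenset({"Cup", "Tablespoon", "Teaspoon"})


def is_authorized(term, vocabulary=UNITS_OF_MEASURE):
    """Return True only if the term is one of the predefined, authorized terms."""
    return term in vocabulary


def tag_content(term):
    """Accept a tag only if it belongs to the controlled vocabulary."""
    if not is_authorized(term):
        raise ValueError(f"{term!r} is not an authorized term")
    return term
```

Rejecting anything outside the list is what keeps the terms consistently applied; a plain list has no notion of synonyms, so "tsp" would be refused here.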
8. +
Synonym Ring
Extends a CV by adding synonyms as equivalent terms
Relationship is between list value and its synonyms
9. +
Synonym Ring
Units of Measure
Cup = C = c
Tablespoon = Tbl = T
Teaspoon = tsp = t
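One way to sketch a synonym ring is as a set of equivalent terms with no term preferred over the others. The function names below are hypothetical. Matching is deliberately case-sensitive, because in this example "T" (tablespoon) and "t" (teaspoon) differ only by case.

```python
# Each ring is a set of equivalent terms; none of them is "preferred".
SYNONYM_RINGS = [
    frozenset({"Cup", "C", "c"}),
    frozenset({"Tablespoon", "Tbl", "T"}),
    frozenset({"Teaspoon", "tsp", "t"}),
]


def equivalents(term):
    """Return every term in the same ring, or just the term if it is in no ring."""
    for ring in SYNONYM_RINGS:
        if term in ring:
            return ring
    return frozenset({term})


def matches(query, tag):
    """A query matches a tag when both belong to the same ring."""
    return tag in equivalents(query)
```

A search for "tsp" would then also retrieve content tagged "Teaspoon" or "t".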
10. +
Authority File
Extends CVs and synonym rings further by assigning one term as the preferred term, which all other synonyms point to
Relationship assigns a property (Preferred Term) to one term and treats all others as synonyms
11. +
Authority File
Units of Measure
(Preferred Term) Cup
Syn: C, c
(PT) Tablespoon
Syn: Tbl, T
(PT) Teaspoon
Syn: tsp, t
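The authority file above can be sketched as a mapping from each preferred term to its synonyms, plus a reverse index so that any variant resolves to the preferred term. `AUTHORITY_FILE`, `LOOKUP`, and `preferred_term` are hypothetical names used only for illustration.

```python
# Preferred term -> synonyms, taken from the slide above.
AUTHORITY_FILE = {
    "Cup": ["C", "c"],
    "Tablespoon": ["Tbl", "T"],
    "Teaspoon": ["tsp", "t"],
}

# Reverse index: every synonym, and each preferred term itself,
# points to the preferred term. Keys are case-sensitive on purpose.
LOOKUP = {}
for preferred, synonyms in AUTHORITY_FILE.items():
    LOOKUP[preferred] = preferred
    for synonym in synonyms:
        LOOKUP[synonym] = preferred


def preferred_term(term):
    """Resolve a term to its preferred term, or None if it is unknown."""
    return LOOKUP.get(term)
```

Content can then be stored and displayed under the preferred term regardless of which variant an editor typed.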
12. +
Facets
Terms are broken down individually by unique properties, allowing a mix and match approach to search and retrieval
Relationship is between one facet node and multiple values
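A rough sketch of the mix and match idea, assuming each content item carries one value per facet. The facet names ("Main Ingredient", "Course") and the sample recipes are invented for illustration and are not from the deck.

```python
from collections import Counter

# Hypothetical content items, each tagged with one value per facet.
RECIPES = [
    {"title": "Marinara", "Main Ingredient": "Tomato", "Course": "Sauce"},
    {"title": "Tomato salad", "Main Ingredient": "Tomato", "Course": "Side"},
    {"title": "Potato salad", "Main Ingredient": "Potato", "Course": "Side"},
]


def filter_by_facets(items, selected):
    """Keep items whose value matches every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(items, facet):
    """Count how many items carry each value of a facet (for browse menus)."""
    return Counter(item[facet] for item in items if facet in item)
```

Selecting `{"Main Ingredient": "Tomato", "Course": "Side"}` narrows the set to a single recipe, while `facet_counts` supplies the numbers typically shown next to each facet value in a browse interface.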
16. +
Thesaurus
CVs in a hierarchical structure with predefined relationships between terms (Broader Term, Narrower Term, Preferred Term, etc.)
Relationship is in assigning standardized properties to list values
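A minimal sketch of thesaurus-style relationships, assuming only three of the standard ones: Broader Term (BT), Narrower Term (NT), and Used For (UF) for non-preferred synonyms. The terms and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    broader: list = field(default_factory=list)   # BT
    narrower: list = field(default_factory=list)  # NT
    used_for: list = field(default_factory=list)  # UF: non-preferred synonyms


THESAURUS = {}


def add_term(name, broader=None, used_for=()):
    """Add a term and keep the BT/NT relationship reciprocal."""
    term = THESAURUS.setdefault(name, Term(name))
    term.used_for.extend(used_for)
    if broader is not None:
        parent = THESAURUS.setdefault(broader, Term(broader))
        term.broader.append(broader)
        parent.narrower.append(name)
    return term


def ancestors(name):
    """All broader terms, nearest first (assumes the hierarchy has no cycles)."""
    result = []
    for parent in THESAURUS[name].broader:
        result.append(parent)
        result.extend(ancestors(parent))
    return result


add_term("Vegetables")
add_term("Peppers", broader="Vegetables")
add_term("Chili peppers", broader="Peppers", used_for=["Chilies"])
```

Because every relationship is one of a small predefined set, a search for "Vegetables" can be expanded to its narrower terms without any custom logic per term.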
18. +
Ontology
CVs in a hierarchical structure with complex relationships defined
Relationship is in assigning predetermined standardized and freeform properties to list values
19. +
Ontology
Beefsteak tomatoes (isMainIngredient) Tomato sauce
Will Smith (isLeadActor) Men in Black 3
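The two examples above can be read as subject, property, object statements. A minimal sketch of storing and querying them, assuming a plain in-memory list of triples rather than an RDF store; the `query` helper is hypothetical.

```python
# (subject, property, object) statements taken from the slide above.
TRIPLES = [
    ("Beefsteak tomatoes", "isMainIngredient", "Tomato sauce"),
    ("Will Smith", "isLeadActor", "Men in Black 3"),
]


def query(subject=None, prop=None, obj=None):
    """Return every triple matching the given parts; None acts as a wildcard."""
    return [
        (s, p, o) for (s, p, o) in TRIPLES
        if (subject is None or s == subject)
        and (prop is None or p == prop)
        and (obj is None or o == obj)
    ]
```

Unlike a thesaurus, the property names here are freeform, so new relationships can be added without changing the structure, which is also how the set of relationships can grow out of control.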
20. +
Semantic (semantic) Web
Big S
Initiative from W3C to create a web of
machine readable data by marking up
content with consistently applied,
standardized and freeform properties
RDF/OWL
Proprietary
Little s
Various standards that mark up content
with agreed-upon and freeform
properties
Microformats
Microdata
Proprietary
21. +
Pros and Cons of CVs and taxonomy
Benefits
Greater precision in search and retrieval
Allows for faceted browsing
Facilitates aggregation of content
Clearly defines relationships between things
Limitations
Initial costs
Upkeep
Can spiral out of control
May be too complex for some organizations
22. +
What is taxonomy used for in the web world?
Search and retrieval
Faceted browsing
Aggregation of content
Internal organization of assets
23. +
Developing a taxonomy
Strategy and planning
Choosing style and method
Determine classes and relationships
Gather terms and organize
Add terms and relationships
Review and approval
24. +
Strategy and Planning
Identify business case
ROI
Money saved
Money earned
Scope
Use cases
Front-end
Back-end
Approval
Wireframes and functional specification
25. +
Choose Style and Method
Method
Top down
Bottom up
Styles
CV
Synonym ring
Authority file
Facets
Taxonomy
Thesaurus
Ontology
26. +
Determine Classes and Relationships
Classes
As few as necessary
Relationships between terms
As few as necessary
With a taxonomy, determine the nature of the hierarchy
Type of
With a thesaurus, use the predefined relationships, but you may not want to use all of them
With an ontology, determine complex relationships
27. +
Gather Terms and Organize
Research
Competitive analysis
Identify existing outside CVs that might be utilized (SIC codes)
Meet with stakeholders
Get as much input as possible
Stick to biz case (spiraling problem)
You are the final decision maker
Must conform to the structure decided upon, otherwise mass chaos
Always keep use cases in mind
28. +
Add Terms and Relationships
Things to keep in mind:
Synonyms, misspellings, special characters
Homonyms
Different database identifiers or different names
Shower (baby and bathroom)
Duplicates
Technical considerations if different children
Breads as a main ingredient or as a dish
Bruschetta (dish, but not main ingredient)
Descriptions
Identifying duplicates or notes regarding the application to content
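A small sketch of the homonym point above: two terms share the label "Shower" but carry different identifiers, and a qualifier plus description tells them apart. The identifiers, field names, and descriptions are hypothetical.

```python
# Two homonyms with the same label but different (hypothetical) identifiers.
TERMS = {
    "t001": {"label": "Shower", "qualifier": "baby",
             "description": "Party held before a birth"},
    "t002": {"label": "Shower", "qualifier": "bathroom",
             "description": "Bathroom fixture"},
}


def find_by_label(label):
    """Return the identifiers of every term with this label, ignoring case."""
    wanted = label.lower()
    return sorted(tid for tid, term in TERMS.items()
                  if term["label"].lower() == wanted)


def display_name(term_id):
    """Build an unambiguous display name such as 'Shower (baby)'."""
    term = TERMS[term_id]
    return f'{term["label"]} ({term["qualifier"]})'
```

A label lookup that returns more than one identifier is a signal that the tagger has to choose, and the description is what helps them choose correctly.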
29. +
Review and Approval
Thorough review by all stakeholders
This can take several sessions if the taxonomy is big
Final approval and sign-off
Critical for buy-in
30. +
Taxonomy and Tagging Tools
Relational databases
FileMaker Pro
Microsoft Access
MySQL
Content management software
Drupal
SharePoint
Proprietary applications
Thesaurus and taxonomy tools
Open source
Protégé
Commercial
SchemaLogic (Thesaurus)
TopBraid Composer, (Ontologies), Pro
Auto categorization and text mining
Data Harmony MAIstro, Nstein
31. +
Tagging the Content
Manual
Good for small, controlled sets of documents
Highly accurate
Time consuming
Automated
Good for large unwieldy sets of documents
Fast and getting more accurate daily
Expensive, 3rd party apps
Hybrid
Manual – content or document creators insert valuable metadata
Automated – other data extracted and matched to taxonomy
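A very simplified sketch of the hybrid approach, assuming the automated step is nothing more than whole-word matching of taxonomy labels in the text. Commercial auto-categorization tools do far more than this. The taxonomy entries and function names are hypothetical.

```python
import re

# Hypothetical taxonomy: preferred term -> synonyms.
TAXONOMY = {
    "Mergers and acquisitions": ["M and A", "M&A"],
    "Human resources": ["HR"],
}


def auto_tag(text):
    """Automated step: tag with a preferred term if it or a synonym appears."""
    tags = set()
    for preferred, synonyms in TAXONOMY.items():
        for label in [preferred, *synonyms]:
            # Match the label only as a whole word or phrase, ignoring case.
            pattern = r"(?<!\w)" + re.escape(label) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.add(preferred)
                break
    return tags


def hybrid_tag(text, manual_tags=()):
    """Hybrid step: combine the creator's manual tags with the automated ones."""
    return set(manual_tags) | auto_tag(text)
```

The whole-word check matters: without it, "HR" would match inside words such as "Three".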
32. +
Real World Application of Taxonomy for Records Management
Classifying
Storing and retrieving
Securing
Archiving or destroying
33. +
Real World Applications
CV
List of Departments (HR, IT, Marketing)
Synonym rings
Mergers and acquisitions = M and A = M&A
Authority File
(PT) Mergers and acquisitions
Syn: M and A, M&A
Facets
Authors, Departments, Security Level
Taxonomy/Thesaurus
Organizational chart
Investment Bank Director
SVP Investments
EVP Investments
Investment Analyst
Ontology
Relationships between affiliations and departments/industries
ARMA (isProfessionalAssn) for Records Managers
34. +
What could it be used for in your world?
http://www.yutope.com/2008/07/is-your-email-inbox-overflowing/
35. +
Industry standards
Taxonomy specific
Dublin Core (DC)
Thesaurus construction
ANSI/NISO Z39.19
ISO 2788; 5964
Ontology development
W3C
Resource Description Framework (RDF)
Web Ontology Language (OWL)
Records Management specific
Metadata management
ISO/S 23081-1
ISO 23081-2
37. +
My contact info
Barbara McGlamery
Taxonomist
Martha Stewart Living Omnimedia
(212)827-8817
bmcglamery@marthastewart.com
Editor's Notes
Why are we here?
(MS centerpiece, Time Obama, RIM Industry name TK, banking?) This is a list; everything else we will talk about today has some kind of relationship
Donna: CVs provide the data to populate metadata fields
Offer consistency in language used to describe content
Act as an intermediary between the input of the user and a database of terms by interpreting the meaning of the words
Provide agreement in (semantic) meaning of terms used
Facilitate retrieval
Enable search input to better represent the original intention of the user
Provide consistent and clear hierarchies for navigation
You’ll notice facets are both a cv and a taxonomy as they can have a hierarchical structure as well as a (Pets:Animal>Mammal>Dog>Poodle, Food:MI>Veg>Peppers>Chili peppers, RIM example TK)
Pros: Easy. Cons: Little context.
Pros: Only need one relationship of term to synonym. Con: No preferred terms.
Pros: Preferred terms can be used for browse and display. Cons: Little context other than synonym or official term.
Pro: Easy to implement. Con: 1 parent/1 child.
Can be hierarchical, ackack
Pro: Multiple levels of hierarchy allowing for multiple parent/child relationships. Con: Can spin wildly out of control as you attempt to classify the universe.
Pro: Rock solid industry standard. Con: Limited relationships.
Part of semantic web (both big and little S). Pro: Allows for complex relationships between things to be expressed. Con: Can spin out of control; can be difficult for systems to retrieve and make use of relationships.