This paper reports our efforts to make topic maps from Subject Headings (SHs) and discusses their practical use for organizing information and knowledge. SHs are often maintained by libraries and used in bibliographic records. They are well-organized thesauri, and fortunately some of them are published on the Web. We transformed them into topic maps. Usually each subject in an SH has its own ID, which can play the role of a PSI. By keeping the relationships included in SHs, such as Broader-Narrower, Related, and USE-UF, in the topic maps, information or knowledge can be linked together and organized according to the structure of the SHs. In other words, by using SHs, information and knowledge can easily be turned into topic maps.
Subject Headings make information to be topic maps
1. Subject Headings make information to be topic maps
2010-9-30
Motomu Naito
Center for Integrated Area Studies (CIAS)
Kyoto University
motom@green.ocn.ne.jp
Ψ http://psi.ontopedia.net/Motomu_Naito
http://www.cias.kyoto-u.ac.jp/english/CIAS/
3. 1. Background: Area Study and Area Informatics
This work is part of the Area Informatics activities at the Center
for Integrated Area Studies (CIAS), Kyoto University
Area Study is an Interdisciplinary Science
Understanding/comparing areas comprehensively
Diverse languages/subjects/disciplines/methodologies:
• history, literature, religions, politics, economics, ethnology, folklore,
agriculture, environment, etc.
Area Informatics
Informatics paradigm in area studies
Focusing on quantitative analysis
• Objective, comparative and reproducible approaches
• Spatiotemporal attributes of events
Knowledge discovery supports
• Integration of disciplines
• Creation of hypotheses
Source: Shoichiro Hara, TMJP2010,
http://www.knowledge-synergy.com/events/documents/TMJP2010-hara.pdf
4. Model of Area Informatics
Source: Shoichiro Hara, TMJP2010
5. 2. Purpose
- Making and maintaining well-organized knowledge is very hard
and time-consuming work
- Many well-organized knowledge resources already exist
(e.g. NDLSH, BSH, LCSH, JST thesaurus, etc.)
- Fortunately, some Subject Headings (SHs) are published on the Web
and we can use them (e.g. NDLSH, LCSH)
Purpose of our activity:
To build a good system for linking and organizing information
related to Area Studies
Purpose of today’s presentation:
To report on and discuss our efforts to make topic
maps and PSIs from SHs
6. 3. Subject Headings
What are Subject Headings?
Wikipedia redirects “Subject Headings” to “Index term” and
defines the term as
“An index term, subject term, subject heading, or descriptor, in
information retrieval, is a term that captures the essence of the
topic of a document. Index terms make up a controlled vocabulary
for use in bibliographic records.”
(http://en.wikipedia.org/wiki/Index_term)
・We are working on the following SHs at the moment
- NDLSH, BSH and LCSH
・We can probably find many more SHs in various countries
- German SH, Norwegian SH, Finnish SH, Thai SH, etc.
7. 3.1 NDLSH
・ NDLSH: National Diet Library Subject Headings, in Japan
・ We are making a topic map from the NDLSH 2008 version
- Subject Headings: 17,953
- Subject Headings + Reference words: 47,816 (47,377)
- BT-NT relation: 13,220 RT relation: 9,738
- USE-UF relation with LCSH: 11,663
・ Conversion from the SH to a topic map
- Subject Headings -> Topics
- BT-NT, RT, USE-UF relation -> Associations
- USE-UF, SA relation, Scope note, reading, … -> Occurrences
・ Each SH has its own ID that can be used as a PSI (e.g. 00574308)
・ If NDLSH shares PSIs with LCSH, it can be merged with LCSH
・ NDLSH has been published on the Web
We can download it from http://id.ndl.go.jp/auth/ndlsh
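The conversion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the converter used for the NDLSH topic map: the record layout, the `PSI_BASE` prefix, the IDs and the Beer/Liquor relation in `SAMPLE` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical prefix; the slides only say that the SH ID can serve as a PSI.
PSI_BASE = "http://example.org/ndlsh/"


@dataclass
class Topic:
    psi: str                      # subject identifier built from the SH's ID
    name: str                     # the heading itself
    occurrences: dict = field(default_factory=dict)  # scope note, reading, ...


@dataclass(frozen=True)
class Association:
    type: str                     # "BT-NT" or "RT"
    first: str                    # PSI of the broader (or first related) topic
    second: str                   # PSI of the narrower (or second related) topic


def convert(records):
    """Turn SH records into topics and associations."""
    topics, associations = {}, set()
    for rec in records:
        psi = PSI_BASE + rec["id"]
        notes = {k: rec[k] for k in ("scope_note", "reading") if k in rec}
        topics[psi] = Topic(psi, rec["label"], notes)
        for broader_id in rec.get("broader", []):
            associations.add(Association("BT-NT", PSI_BASE + broader_id, psi))
        for related_id in rec.get("related", []):
            # RT is symmetric, so store each pair once in a fixed order.
            first, second = sorted((psi, PSI_BASE + related_id))
            associations.add(Association("RT", first, second))
    return topics, associations


# Made-up records for illustration only (IDs and relation are not from NDLSH).
SAMPLE = [
    {"id": "0001", "label": "Beer", "broader": ["0002"], "reading": "biiru"},
    {"id": "0002", "label": "Liquor"},
]

topics, associations = convert(SAMPLE)
```

Keeping the PSI as the dictionary key means two records with the same ID end up as one topic, which mirrors how topic map merging works on shared subject identifiers.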
8. A part of NDLSH
Subject Headings around “ビール: Beer”
12. NDLSH topic map application
Screen shots of the application
13. 3.2 LCSH
・ LCSH : Library of Congress Subject Headings in US
・ We are making a topic map from LCSH
- We downloaded it from “http://id.loc.gov/authorities/”
- Subject Headings : 380,123
- BT-NT : 254,651 RT : 11,137
・ RDF (SKOS) to Topic Maps using Omnigator
- SH (core:Concept) -> Topics
- BT-NT, RT relation -> Associations
- scopeNote, created, modified, comment etc. -> Occurrences
・ Each SH has its own identifier, a URI, that can be used as a PSI
(e.g. http://id.loc.gov/authorities/sh85000002#concept)
・ LCSH has already been published on the Web with
Linked Data in mind
14. A part of LCSH
Subject Headings around “Beer”
15. Original data
LCSH is provided as RDF data
<rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
: :
<skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>
<skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>
<skos:closeMatch
rdf:resource="http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11965887d"/>
<skos:inScheme rdf:resource="http://id.loc.gov/authorities#conceptScheme"/>
<skos:inScheme rdf:resource="http://id.loc.gov/authorities#topicalTerms"/>
<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
<skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>
<skos:related rdf:resource="http://id.loc.gov/authorities/sh85016775#concept"/>
<skos:related rdf:resource="http://id.loc.gov/authorities/sh85031951#concept"/>
<skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
<owl:sameAs rdf:resource="info:lc/authorities/sh85012832"/>
<dcterms:modified rdf:datatype="http://www.w3.org/2001/XMLSchema#dateTime">1989-03-22T15:09:28-04:00</dcterms:modified>
</rdf:Description>
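As a rough illustration of how such a record can be read before it is mapped to topics and associations, the sketch below uses Python's standard `xml.etree.ElementTree`. It is not the Omnigator conversion mentioned earlier. `SAMPLE` is a shortened copy of the record above, wrapped in an `rdf:RDF` root with namespace declarations added so that it parses on its own, and `read_concepts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the "Beer" record, made self-contained for parsing.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://id.loc.gov/authorities/sh85012832#concept">
    <skos:narrower rdf:resource="http://id.loc.gov/authorities/sh97006323#concept"/>
    <skos:broader rdf:resource="http://id.loc.gov/authorities/sh85080196#concept"/>
    <skos:related rdf:resource="http://id.loc.gov/authorities/sh85003341#concept"/>
    <skos:related rdf:resource="http://id.loc.gov/authorities/sh85016775#concept"/>
    <skos:prefLabel xml:lang="en">Beer</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Return {concept URI: {"label", "broader", "narrower", "related"}}."""
    root = ET.fromstring(xml_text)
    concepts = {}
    for desc in root.iter(f"{{{RDF}}}Description"):
        uri = desc.get(f"{{{RDF}}}about")
        entry = {"label": desc.findtext(f"{{{SKOS}}}prefLabel")}
        for relation in ("broader", "narrower", "related"):
            # Each relation element points at another concept via rdf:resource.
            entry[relation] = [
                element.get(f"{{{RDF}}}resource")
                for element in desc.findall(f"{{{SKOS}}}{relation}")
            ]
        concepts[uri] = entry
    return concepts


concepts = read_concepts(SAMPLE)
beer = concepts["http://id.loc.gov/authorities/sh85012832#concept"]
```

The concept URI doubles as the PSI, `broader`/`narrower`/`related` become association candidates, and the remaining properties would become occurrences.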
17. LCSH topic map application
Screen shots of the application
18. 4. Practical use of Subject Headings
Many practical uses are possible
For example:
・ Organizing internal and external information according to SHs
・ Multi-language mapping using LCSH as a core system
・ Mutual complementing of our concept classification and SHs
・ A web service providing SHs using TMRAP
・ Using SHs as PSIs
・ Using SHs as common test data for TM engines, TM Query
engines, etc.
19. (1) Organizing information according to SHs
Example: Organizing Wikipedia according to SHs
・Available links to Wikipedia (NDLSH: 12051, BSH: 6086)
Subject Headings
around “Beer”
20. Organizing Wikipedia
The world around “Beer” in NDLSH
[Figure: Wikipedia articles arranged by the NDLSH headings around “Beer”: Amenities of life, Wines and Spirits, Liquor, Fruit liquor, Wine, Distilled liquor, Brandy, Whiskey, Beer, Hop, Malt, Barley]
21. Organizing Wikipedia
We can easily generate the address of a Wikipedia article from a heading
“http://ja.wikipedia.org/wiki/” + “ビール” (SH)
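A minimal sketch of this address generation in Python follows. The function name `wikipedia_url` is hypothetical, and percent-encoding the heading is an addition made here so that the result is a valid URL; whether an article actually exists for a given heading still has to be checked separately.

```python
from urllib.parse import quote

WIKIPEDIA_BASE = "http://ja.wikipedia.org/wiki/"


def wikipedia_url(heading):
    """Build a Japanese Wikipedia address from a subject heading."""
    # Wikipedia titles use underscores in place of spaces.
    title = heading.replace(" ", "_")
    # Percent-encode the UTF-8 bytes of the title.
    return WIKIPEDIA_BASE + quote(title, safe="")


beer_url = wikipedia_url("ビール")
```

Browsers display the encoded form as the readable address “http://ja.wikipedia.org/wiki/ビール”.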
22. (2) Mapping between multiple languages
If the SHs of each language are mapped to LCSH, multi-language mapping
will be achieved
[Figure: “ビール” in NDLSH or BSH (Japanese) merges with “Beer” in LCSH (English), which in turn merges with “Øl” in the Norwegian SH]
e.g. Japanese-Norwegian mapping via LCSH (English)
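The merging idea can be illustrated with a small Python sketch. It assumes that the Japanese and Norwegian headings have already been mapped to the LCSH identifier for “Beer” (for NDLSH, for example, through its USE-UF relation with LCSH); the record layout and the name `merge_by_psi` are hypothetical.

```python
BEER_PSI = "http://id.loc.gov/authorities/sh85012832#concept"

# One record per vocabulary; each is assumed to carry the LCSH identifier.
LCSH = [{"psi": BEER_PSI, "names": {"en": "Beer"}}]
NDLSH = [{"psi": BEER_PSI, "names": {"ja": "ビール"}}]
NORWEGIAN_SH = [{"psi": BEER_PSI, "names": {"no": "Øl"}}]


def merge_by_psi(*vocabularies):
    """Merge topics that share a subject identifier, collecting their names."""
    merged = {}
    for vocabulary in vocabularies:
        for topic in vocabulary:
            merged.setdefault(topic["psi"], {}).update(topic["names"])
    return merged


merged = merge_by_psi(LCSH, NDLSH, NORWEGIAN_SH)
names = merged[BEER_PSI]
```

After merging, the Japanese and Norwegian names sit on the same topic, so a Japanese-Norwegian mapping can be read off without any direct mapping between the two vocabularies.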
24. (3) Mutual complementing
- Sometimes SHs don’t have enough subjects or vocabulary, yet
it is very hard to gather enough subjects from scratch by ourselves
- By merging our own subjects with SHs we can get an enriched set of subjects
25. (4) Web service for providing Subject Headings
A web service that provides Subject Headings using TMRAP
[Figure: A client web application (JSP pages; Ontopia Navigator Framework and query engine; its own topic map) sends a request containing a “Term or Subject” to the SH providing web service (JSP pages; Ontopia Navigator Framework and query engine; the SH topic map). The service returns SH-related topic map fragments for the “Subject” topic. The client side is labelled “Information from client’s Web Topic Maps application” and “SH related information”.]
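To make the request step more concrete, here is a hedged Python sketch that only builds the request URL a client might send. The service address, the topic map ID `lcsh.xtm`, and the parameter names (`identifier`, `topicmap`, `syntax`) are assumptions for illustration and should be checked against the TMRAP documentation of the Ontopia version in use. No request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical address of the SH providing web service.
SERVICE_BASE = "http://example.org/sh-service/tmrap"


def build_get_topic_url(subject_identifier, topicmap_id):
    """Build a URL asking the service for the fragment about one subject."""
    query = urlencode({
        "identifier": subject_identifier,   # assumed parameter name
        "topicmap": topicmap_id,            # assumed parameter name
        "syntax": "application/x-xtm",      # assumed value: XTM fragment
    })
    return f"{SERVICE_BASE}/get-topic?{query}"


request_url = build_get_topic_url(
    "http://id.loc.gov/authorities/sh85012832#concept", "lcsh.xtm"
)
```

Using the subject identifier as the lookup key means the client does not need to know how the service stores the heading internally.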
27. 6. Challenges
(1) Attaching subjects to, or extracting them from, information
In order to organize information, we need to
・attach subjects to information by hand
- tagging systems are required
・extract subjects from information
- subject extraction systems are required
(2) Large data
・We can’t convert large RDF data to a topic map at the moment
because we run out of memory
We had to omit “skos:altLabel”, “owl:sameAs”, etc.
We need a scalable and stable environment for big files
(3) Type or instance?
・We are treating each Subject Heading as an instance topic
But Subject Headings are probably type topics
We want to make a topic map that treats them as type topics
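For challenge (2), one possible direction on the reading side is to stream the RDF/XML instead of loading it whole. The Python sketch below shows that idea with the standard library's `iterparse`. It is not the approach used in this work and does not address the memory needed by the topic map engine itself. The sample data and its `example.org` URIs are made up, and `stream_concepts` is a hypothetical name.

```python
import io
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Made-up two-concept file standing in for a very large download.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <rdf:Description rdf:about="http://example.org/concept/1">
    <skos:prefLabel xml:lang="en">First</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://example.org/concept/2">
    <skos:prefLabel xml:lang="en">Second</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>"""


def stream_concepts(source):
    """Yield (URI, preferred label) one concept at a time."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, element in context:
        if event == "end" and element.tag == f"{{{RDF}}}Description":
            uri = element.get(f"{{{RDF}}}about")
            label = element.findtext(f"{{{SKOS}}}prefLabel")
            yield uri, label
            # Drop already-processed records so memory use stays flat.
            root.clear()


pairs = list(stream_concepts(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Because each record is discarded after it is yielded, properties such as “skos:altLabel” would not need to be dropped just to fit the file in memory while reading it.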
28. 7. Conclusion & Future work No.1
・ CIAS has already stored a huge amount of information that we want
to turn into topic maps
・ Many well-organized knowledge resources such as NDLSH, BSH, LCSH, etc.
already exist
・ We are making topic maps and web applications from them
・ Topic maps can naturally inherit Subject Headings and their
relationships such as BT-NT, RT and USE-UF
・ According to these relationships, information can be linked and
organized, in other words, turned into topic maps
・ By providing Subject Headings as topic maps and PSIs for use in
the context of Linked Topic Maps, they will become powerful
elements and will be used in many ways
29. 7. Conclusion & Future work No.2
・ Make our own ontologies
・ Continue trying to turn our information into topic maps
according to our ontologies and the SHs
・ Continue trying to achieve multi-language mapping
using the SHs
・ Try to merge our domain subjects with the SHs
・ Try to find and implement good ways to link the SHs
with information resources
・ Try to implement the web service for providing the SHs
・ Others (many, many, many, …)