Inquiry Optimization Technique for a Topic Map Database
1. Inquiry Optimization Technique
for a Topic Map Database
Yuki Kuribara
(Graduate School of Engineering,
Shibaura Institute of Technology)
Masaomi Kimura
(Information Engineering,
Shibaura Institute of Technology)
2. Contents
Background
Research contents
Experiment
Conclusion
2 Data Engineering Lab 2010/10/6
3. Topic maps
Recently, many kinds of topic maps have been created
For web portal sites
For application development, and so on
When we target large topic maps, we need to construct databases for them,
since databases can deal with data larger than the size of physical memory
(Figure: a topic map that fits in memory vs. one that runs out of memory)
4. The role of database
Database systems should take responsibility for managing
information of topic maps
Query optimization
Transaction management
Physical data structure hiding
(Figure: a query reaches the information of the topic map through the database system, which provides query optimization, transaction management, and physical data structure hiding)
5. The physical data model for databases
We propose to use an object-oriented model for the database
There are several options for the data model of the database
A relational (table) model and an object-oriented model are the ones mainly used in topic map databases
When we traverse a topic map to retrieve information, an object-oriented model does not need to join tables multiple times, unlike a relational model
(Figure: a relational model vs. an object-oriented model in which Object A refers to Object B)
6. The logical data model for databases
We assume the topic map data structure defined by the Topic Maps Data Model (TMDM),
since topic maps should follow TMDM
The data model consists of seven types of information items and 19 types of named properties
We implemented these items as classes whose instances hold references to the corresponding information item objects
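(Class diagram)
TopicMap 1 (+parent) to 0..* (+associations) Association
TopicMap 1 (+parent) to 0..* (+topics) Topic
Association 1 (+parent) to 0..* (+roles) AssociationRole
Topic 1 (+player) to 0..* (+roles) AssociationRole
Topic 1 (+parent) to 0..* (+topicNames) TopicName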
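The reference relationships between these classes can be sketched in Python as follows. This is a minimal illustration and not TOME's actual implementation: the class and attribute names follow the class diagram, while the `value` attribute of `TopicName` and the `add_topic` helper are assumptions added for the example.

```python
from dataclasses import dataclass, field


@dataclass(eq=False, repr=False)
class TopicMap:
    topics: "list[Topic]" = field(default_factory=list)              # +topics 0..*
    associations: "list[Association]" = field(default_factory=list)  # +associations 0..*


@dataclass(eq=False, repr=False)
class Topic:
    parent: TopicMap                                                 # +parent 1
    topic_names: "list[TopicName]" = field(default_factory=list)     # +topicNames 0..*
    roles: "list[AssociationRole]" = field(default_factory=list)     # +roles 0..*


@dataclass(eq=False, repr=False)
class TopicName:
    parent: Topic   # +parent 1
    value: str      # assumed attribute holding the name string


@dataclass(eq=False, repr=False)
class Association:
    parent: TopicMap                                                 # +parent 1
    roles: "list[AssociationRole]" = field(default_factory=list)     # +roles 0..*


@dataclass(eq=False, repr=False)
class AssociationRole:
    parent: Association   # +parent 1
    player: Topic         # +player 1


def add_topic(topic_map: TopicMap, name: str) -> Topic:
    """Hypothetical helper: create a named topic and set both reference directions."""
    topic = Topic(parent=topic_map)
    topic.topic_names.append(TopicName(parent=topic, value=name))
    topic_map.topics.append(topic)
    return topic


tm = TopicMap()
doyle = add_topic(tm, "Conan Doyle")
# The name is reached by following object references, not by joining tables.
print(doyle.topic_names[0].value)
```

Because every object holds direct references to its neighbours, traversing from a topic to its names or roles is a matter of following attributes.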
7. The possibility of plural retrieval routes
The database system needs to select the most suitable retrieval route (query optimization)
When we retrieve information from a topic map, there may be more than one way to retrieve the same objects
We can retrieve objects efficiently by choosing the search method
8. Query optimization
The database should take responsibility for query optimization
Database systems need to estimate a suitable execution plan
Without query optimization, the database system may take a very long time to retrieve data
Although some topic map database systems exist, they do not seem to take optimization into consideration
9. Objective
We propose an optimization technique based on the estimation of execution cost
In this presentation, we focus on retrieving the topic objects that are connected to a particular topic by a specific association
e.g., what did Conan Doyle write?
(Figure: the intended topic 'A Study in Scarlet' is connected by the specific association 'write' to the particular topic 'Conan Doyle'; the association and the particular topic are specified in the query)
10. Retrieval plan - the association route
e.g., What did Conan Doyle write?
1. We search the association objects 'write'
2. On the two sides of each association, we search the topic object 'Conan Doyle' and find the intended topic objects (e.g., 'A Study in Scarlet')
11. Retrieval plan - the topic route
e.g., What did Conan Doyle write?
1. We search the topic object 'Conan Doyle'
2. We search the association objects 'write' referred to by its association role objects
3. We find the intended topics (e.g., 'A Study in Scarlet')
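The two retrieval plans can be sketched on a tiny in-memory object graph. This is an illustrative sketch only: the classes, the `type` attribute (the topic that names an association), and the function names are assumptions made for the example and are not taken from TOME.

```python
class Topic:
    def __init__(self, name):
        self.name = name
        self.roles = []               # association roles played by this topic


class AssociationRole:
    def __init__(self, parent, player):
        self.parent = parent          # the association this role belongs to
        self.player = player          # the topic playing the role
        player.roles.append(self)


class Association:
    def __init__(self, type_topic, *players):
        self.type = type_topic        # assumed: topic naming the association
        self.roles = [AssociationRole(self, p) for p in players]


def association_route(associations, assoc_name, topic_name):
    found = []
    for assoc in associations:                        # 1. scan all associations
        if assoc.type.name != assoc_name:
            continue
        players = [r.player for r in assoc.roles]     # 2. topics on both sides
        if any(p.name == topic_name for p in players):
            found += [p for p in players if p.name != topic_name]
    return found


def topic_route(topics, assoc_name, topic_name):
    start = next(t for t in topics if t.name == topic_name)   # 1. find the topic
    found = []
    for role in start.roles:                                   # 2. follow its roles
        assoc = role.parent
        if assoc.type.name != assoc_name:
            continue
        found += [r.player for r in assoc.roles if r is not role]   # 3. other side
    return found


write = Topic("write")
doyle = Topic("Conan Doyle")
study = Topic("A Study in Scarlet")
sign = Topic("The Sign of Four")
topics = [write, doyle, study, sign]
associations = [Association(write, study, doyle), Association(write, sign, doyle)]

via_assoc = [t.name for t in association_route(associations, "write", "Conan Doyle")]
via_topic = [t.name for t in topic_route(topics, "write", "Conan Doyle")]
print(via_assoc, via_topic)
```

Both functions return the same topics; they differ only in which objects have to be retrieved on the way, which is what the cost estimation compares.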
12. Estimation of execution cost
We define estimation formulae for the retrieval cost of each plan
Systems have to choose the most suitable plan
It is necessary to define a cost that effectively estimates the retrieval time (cost estimation)
(Figure: a query can reach the information of the topic map via Route A (cost: 10) or Route B (cost: 100))
13. Cost of objects - definition of cost
We measured the total execution time and the retrieval time of objects
Object retrieval accounts for more than 99% of the processing time
It is therefore enough to measure the time to retrieve objects in order to evaluate the cost of query processing
Route               Execution time of retrieval (A)   Retrieval time of objects (B)   Ratio of object retrieval time (B/A)
                    (nano sec)                        (nano sec)
Association route   6.025×10^8                        5.991×10^8                      99.44%
Topic route         1.035×10^8                        1.033×10^8                      99.81%
Retrieval time of objects: more than 99%; other time: less than 1%
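As a quick arithmetic check, the ratios B/A can be recomputed from the measured times in the table. The variable and function names below are chosen for this example only.

```python
# (execution time A, object retrieval time B), both in nanoseconds, from the table
measurements = {
    "association route": (6.025e8, 5.991e8),
    "topic route": (1.035e8, 1.033e8),
}


def retrieval_share(execution_ns, retrieval_ns):
    """Percentage of the execution time spent retrieving objects (B/A)."""
    return 100.0 * retrieval_ns / execution_ns


shares = {route: round(retrieval_share(a, b), 2)
          for route, (a, b) in measurements.items()}
print(shares)
```

Both values exceed 99%, which is the basis for using object retrieval alone as the cost measure.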
14. Cost estimation formula
for the association route
\[ C_{assoc\_route} = C_a N + 2\,(C_{ar} + C_t)\,\frac{N}{Q} \]
First term (step 1): we need to retrieve all N associations, since multiple associations may have the same name
Second term (step 2): the cost is doubled since we retrieve the two topics on both sides of the association
N/Q: we approximate the number of associations with the specified name by the average number of associations per unique name
(Figure: 'A Study in Scarlet' and 'Conan Doyle' connected by the association 'write', with step 1 on the association and step 2 on both topics)
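The association-route estimate can be written as a small function. This is a sketch under stated assumptions: the function and parameter names are invented here, the unit costs are the object size scale factors reported later in the deck (topic 4.75, association 2.94, association role 1), N = 432 is the Pokemon topic map, and Q = 4 is a purely hypothetical value because the slides do not state Q.

```python
def assoc_route_cost(c_a, c_ar, c_t, n, q):
    """Estimated cost of the association route.

    c_a, c_ar, c_t: unit retrieval costs of association, association role
    and topic objects; n: number of associations; q: number of unique
    association names.
    """
    scan_all = c_a * n                    # step 1: retrieve every association
    matching = n / q                      # average associations per unique name
    both_sides = 2 * (c_ar + c_t) * matching   # step 2: role and topic on each side
    return scan_all + both_sides


# Hypothetical Q = 4; other values as described in the lead-in.
example_cost = assoc_route_cost(c_a=2.94, c_ar=1.0, c_t=4.75, n=432, q=4)
print(round(example_cost, 2))
```

The first term grows with N regardless of the query, so this route becomes expensive for topic maps with many associations.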
15. Cost estimation formula
for the topic route
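\[ C_{topic\_route} = C_t\,\frac{M}{2} + (C_{ar} + C_a)\,\frac{2N}{M} + C_{ar}\,\frac{2N}{MQ} \]
First term (step 1), M/2: the average number of topic retrievals (note that each topic must have a unique name)
Second term (step 2), 2N/M: the average number of associations per topic
Third term (step 3), 2N/(MQ): the average number of associations that have the name specified by the query
(Figure: 'A Study in Scarlet' and 'Conan Doyle' connected by the association 'write', with step 1 on 'Conan Doyle', step 2 on the association, and step 3 on 'A Study in Scarlet')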
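A corresponding sketch for the topic route, together with a plan chooser that simply takes the cheapest estimate. The names are invented for this example; M = 174 and N = 432 are the Pokemon topic map figures, the unit costs are the object size scale factors from the deck, and Q = 4 and the association-route cost passed to the chooser are hypothetical values.

```python
def topic_route_cost(c_a, c_ar, c_t, m, n, q):
    """Estimated cost of the topic route.

    m: number of topics; n: number of associations;
    q: number of unique association names.
    """
    find_topic = c_t * m / 2                  # step 1: on average half the topics
    follow_roles = (c_ar + c_a) * 2 * n / m   # step 2: associations per topic
    matching = c_ar * 2 * n / (m * q)         # step 3: those with the queried name
    return find_topic + follow_roles + matching


def choose_plan(estimated_costs):
    """Return the name of the plan with the smallest estimated cost."""
    return min(estimated_costs, key=estimated_costs.get)


topic_cost = topic_route_cost(c_a=2.94, c_ar=1.0, c_t=4.75, m=174, n=432, q=4)
plan = choose_plan({"association route": 2512.08,   # hypothetical estimate
                    "topic route": topic_cost})
print(round(topic_cost, 2), plan)
```

Here the first term dominates, so the topic route scales mainly with the number of topics M rather than the number of associations N.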
16. Experiment
In order to demonstrate our method, we applied our technique to TOME
TOME is a prototype topic map database developed by the authors
As target topic maps, we selected the following two, which have different sizes
Rampo Edogawa* topic map
# of topics:29 (his name, his works and his hometown)
# of associations:15 (his works and his hometown)
Pokemon topic map
# of topics:174 (Pokemon names and their attributes)
# of associations:432 (evolutional and attribute relationships)
*Rampo Edogawa is a famous mystery story writer in Japan.
17. Evaluation of cost estimation formulae
In order to evaluate our cost estimation formulae, we measured the execution time of a query and compared it with the tendency of the estimated cost
We can see the tendency:
the lower the estimated cost, the shorter the execution time

                       Average time of query execution (nano sec)   Evaluated cost for each query execution plan
Topic map              Association route     Topic route            Association route     Topic route
Rampo Edogawa          31          <         157                    133.2       <         164.0
Pokemon                297         >         31                     2533        >         697.7
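The comparison in the table can be checked mechanically: for each topic map, the plan with the lower estimated cost should also be the plan with the shorter measured time. The dictionary layout and function name are chosen for this sketch; the numbers are those of the table.

```python
results = {
    "Rampo Edogawa": {
        "time": {"association": 31, "topic": 157},
        "cost": {"association": 133.2, "topic": 164.0},
    },
    "Pokemon": {
        "time": {"association": 297, "topic": 31},
        "cost": {"association": 2533, "topic": 697.7},
    },
}


def cheapest(values):
    """Name of the route with the smallest value."""
    return min(values, key=values.get)


# True when the estimate ranks the two routes the same way as the measurement.
agreement = {name: cheapest(r["time"]) == cheapest(r["cost"])
             for name, r in results.items()}
print(agreement)
```

Only the ordering is compared here; the sketch says nothing about how closely the cost values track the absolute times.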
18. Conclusion
We proposed an optimization technique based on the estimation of execution cost
We showed that there may be more than one way to retrieve the same objects
We defined cost estimation formulae for the retrieval cost of each plan
We evaluated our optimization technique
The results of our experiment show that the retrieval time tends to be proportional to the object size
We can also see that the estimated cost tends to be small when the execution time is short
19. Thank you for your kind attention
20. The effect of buffers
If objects that already exist in memory are requested again, a buffer shortens the retrieval time
The cost estimated by the formulae needs to be modified (reduced) because of the effect of buffers
In our target query, there are two cases in which the buffer is used:
The topic already in memory is loaded from the buffer (e.g., 'Conan Doyle', which is shared by 'The Sign of Four' and 'A Study in Scarlet')
The topic for the association name (e.g., 'Write') already in memory is also loaded from the buffer
21. The coefficients of buffer
In our target query, we need two coefficients (written r_t and r_a here to tell them apart):
For retrieval of topics:
\[ r_t = \frac{M}{2N} \quad \left( \frac{M}{2N} \le 1 \right) \]
The probability that the topic does not exist in the buffer
For retrieval of topics for the association names:
\[ r_a = \frac{Q}{N} \quad \left( \frac{Q}{N} \le 1 \right) \]
The probability that the topic for the association name does not exist in the buffer
r: the effective retrieval ratio of cost for buffer
N: the number of association objects
M: the number of topic objects
Q: the number of unique association names
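One possible reading of the two coefficients, sketched in Python. It assumes that each coefficient is the ratio of distinct objects to the number of retrievals (M to 2N for topics, Q to N for association-name topics); capping the ratio at 1 is an assumption of this sketch, as are the function name and the hypothetical value Q = 4. M = 174 and N = 432 are the Pokemon topic map figures.

```python
def buffer_miss_ratio(distinct_objects, retrievals):
    """Assumed coefficient r: share of retrievals that cannot be served from the buffer.

    Each distinct object has to be loaded once; later requests for the same
    object are assumed to hit the buffer. The cap at 1.0 is an assumption.
    """
    return min(1.0, distinct_objects / retrievals)


M, N, Q = 174, 432, 4                   # Q is hypothetical
r_topic = buffer_miss_ratio(M, 2 * N)   # topics reached through 2N role players
r_name = buffer_miss_ratio(Q, N)        # association-name topics over N associations
print(r_topic, r_name)
```

With many associations per topic, most topic requests are repeats, so the effective cost of loading topics drops well below the unbuffered estimate.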
22. The modified cost estimation formulae
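Taking the buffering effect into consideration, we modify the cost estimation formulae as follows
The contribution of loading topic name objects is also taken into consideration
\[ C_{assoc\_route} = \bigl(C_a + r_a (C_t + C_{tn})\bigr) N + 2\,\bigl(C_{ar} + r_t (C_t + C_{tn})\bigr)\,\frac{N}{Q} \]
\[ C_{topic\_route} = (C_t + C_{tn})\,\frac{M}{2} + \bigl(C_{ar} + C_a + r_a (C_t + C_{tn})\bigr)\,\frac{2N}{M} + \bigl(C_{ar} + r_t (C_t + C_{tn})\bigr)\,\frac{2N}{MQ} \]
Here r_a and r_t denote the buffer coefficient r for topics used as association names and for topics, respectively, and C_tn is the retrieval cost of topic name objects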
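The modified estimates can be sketched as two functions that take the buffer coefficients as parameters. The placement of the coefficients follows the reading that the buffer only affects the loading of topic and topic name objects; the function and parameter names are assumptions of this sketch, and the sample inputs (including Q = 4) are hypothetical.

```python
def assoc_route_cost_buffered(c_a, c_ar, c_t, c_tn, n, q, r_name, r_topic):
    """Association route, including topic names and buffer coefficients."""
    per_association = c_a + r_name * (c_t + c_tn)   # association plus its name topic
    per_side = c_ar + r_topic * (c_t + c_tn)        # role plus the topic it refers to
    return per_association * n + 2 * per_side * n / q


def topic_route_cost_buffered(c_a, c_ar, c_t, c_tn, m, n, q, r_name, r_topic):
    """Topic route, including topic names and buffer coefficients."""
    find_topic = (c_t + c_tn) * m / 2
    follow_roles = (c_ar + c_a + r_name * (c_t + c_tn)) * 2 * n / m
    matching = (c_ar + r_topic * (c_t + c_tn)) * 2 * n / (m * q)
    return find_topic + follow_roles + matching


# Sanity check: ignoring topic names (c_tn = 0) and setting the coefficients so
# that only the terms of the unbuffered formulae remain reproduces those formulae.
plain_assoc = assoc_route_cost_buffered(2.94, 1.0, 4.75, 0.0, 432, 4,
                                        r_name=0.0, r_topic=1.0)
plain_topic = topic_route_cost_buffered(2.94, 1.0, 4.75, 0.0, 174, 432, 4,
                                        r_name=0.0, r_topic=0.0)
print(round(plain_assoc, 2), round(plain_topic, 2))
```

Passing the coefficients in, rather than computing them inside, keeps the cost functions usable with whatever definition of r is adopted.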
23. Cost estimation formula
for the association route
We define the cost estimation formula as follows
\[ C_1 = \bigl(C_a + r_a (C_t + C_{tn})\bigr) N + 2\,\bigl(C_{ar} + r_t (C_t + C_{tn})\bigr)\,\frac{N}{Q} \]
\[ r_a = \frac{Q}{N} \quad \left( \frac{Q}{N} \le 1 \right), \qquad r_t = \frac{M}{2N} \quad \left( \frac{M}{2N} \le 1 \right) \]
First term:
C_a: retrieval of Association objects
C_t: retrieval of Topic objects that are defined as the Association name
C_tn: retrieval of TopicName objects that are defined as the Association name
Second term:
C_ar: retrieval of AssociationRole objects
C_t: retrieval of Topic objects
C_tn: retrieval of TopicName objects that are defined as the Topic name
N/Q: TMDM permits the redundant existence of multiple associations that have the same name
We assume that the association roles are uniformly assigned to associations
Retrieval of TopicMap objects
N: the number of association objects
M: the number of topic objects
Q: the number of unique association names
24. The accurate cost estimation formula
for the association route
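Basic formula:
\[ C_{assoc\_route} = C_a N + 2\,(C_{ar} + C_t)\,\frac{N}{Q} \]
Accurate formula:
\[ C_{assoc\_route} = \bigl(C_a + r_a (C_t + C_{tn})\bigr) N + 2\,\bigl(C_{ar} + r_t (C_t + C_{tn})\bigr)\,\frac{N}{Q} \]
First term: we have to consider the retrieval cost of topic and topic name objects and the effect of the buffer
Second term: we have to consider the retrieval cost of topic name objects and the effect of the buffer
\[ r_a = \frac{Q}{N} \quad \left( \frac{Q}{N} \le 1 \right), \qquad r_t = \frac{M}{2N} \quad \left( \frac{M}{2N} \le 1 \right) \]
Ca: the retrieval cost of association objects
Car: the retrieval cost of association role objects
Ct: the retrieval cost of topic objects
Ctn: the retrieval cost of topic name objects
N: the number of association objects
M: the number of topic objects
Q: the number of unique association names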
25. Cost estimation formula
for the topic route
We define the cost estimation formula as follows
\[ C_2 = (C_t + C_{tn})\,\frac{M}{2} + \bigl(C_{ar} + C_a + r_a (C_t + C_{tn})\bigr)\,\frac{2N}{M} + \bigl(C_{ar} + r_t (C_t + C_{tn})\bigr)\,\frac{2N}{MQ} \]
Here r_a and r_t denote the buffer coefficient r for topics used as association names and for topics, respectively
First term:
C_t: retrieval of Topic objects
C_tn: retrieval of TopicName objects that are defined as the Topic name
M/2: TMDM permits the existence of only one topic that has the same name
Second term:
C_ar: retrieval of AssociationRole objects
C_a: retrieval of Association objects
C_t: retrieval of Topic objects that are defined as the Association name
C_tn: retrieval of TopicName objects that are defined as the Association name
2N/M: regarding the topic map as a graph, this is equal to the average degree
Third term:
C_ar: retrieval of AssociationRole objects
C_t: retrieval of Topic objects
C_tn: retrieval of TopicName objects that are defined as the Topic name
2N/(MQ): we assume that the association roles are uniformly assigned to associations
Retrieval of TopicMap objects
26. The accurate cost estimation formula
for the topic route
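Basic formula:
\[ C_{topic\_route} = C_t\,\frac{M}{2} + (C_{ar} + C_a)\,\frac{2N}{M} + C_{ar}\,\frac{2N}{MQ} \]
Accurate formula:
\[ C_{topic\_route} = (C_t + C_{tn})\,\frac{M}{2} + \bigl(C_{ar} + C_a + r_a (C_t + C_{tn})\bigr)\,\frac{2N}{M} + \bigl(C_{ar} + r_t (C_t + C_{tn})\bigr)\,\frac{2N}{MQ} \]
First term: we have to consider the retrieval cost of topic name objects
Second term: we have to consider the retrieval cost of topic objects and topic name objects and the effect of the buffer
Third term: we have to consider the retrieval cost of topic name objects and the effect of the buffer
\[ r_a = \frac{Q}{N} \quad \left( \frac{Q}{N} \le 1 \right), \qquad r_t = \frac{M}{2N} \quad \left( \frac{M}{2N} \le 1 \right) \]
Ca: the retrieval cost of association objects
Car: the retrieval cost of association role objects
Ct: the retrieval cost of topic objects
Ctn: the retrieval cost of topic name objects
N: the number of association objects
M: the number of topic objects
Q: the number of unique association names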
27. Result-Cost estimation of an object of each
class
We can see a similar tendency between the retrieval time and the object size
The normalized values are obtained by setting the value for association role objects to 1

Topic map       Object             Retrieval time   Normalized       Object size   Normalized
                                   (nano sec)       retrieval time   (byte)        object size
Rampo Edogawa   topic              969200           3.34             608           4.75
Rampo Edogawa   topicname          496700           1.71             376           2.94
Rampo Edogawa   associationrole    289900           1                128           1
Rampo Edogawa   association        562600           1.94             376           2.94
Pokemon         topic              1053000          5.5              608           4.75
Pokemon         topicname          501600           2.62             376           2.94
Pokemon         associationrole    191400           1                128           1
Pokemon         association        577700           3.02             376           2.94
28. Retrieval cost of each object
We measured the retrieval time and the object size of each object
The result tells us that the retrieval time is almost proportional to the object size
Based on this, we define the cost as an object size scale factor
(the ratio of the object size to the size of association role objects)
We can see a similar tendency between the retrieval time and the object size

Topic map   Object                    Normalized retrieval time        Object size scale factor
                                      (association role object = 1)
Pokemon     Topic object              5.5                              4.75
Pokemon     Topic name object         2.62                             2.94
Pokemon     Association role object   1                                1
Pokemon     Association object        3.02                             2.94
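The scale factors follow directly from the object sizes reported in the results table (608, 376, 128 and 376 bytes). The short sketch below recomputes them; the variable names are chosen for this example.

```python
# Object sizes in bytes, as reported in the results table of this deck
object_size_bytes = {
    "topic": 608,
    "topic name": 376,
    "association role": 128,
    "association": 376,
}

# Scale factor: size relative to the smallest object, the association role
base = object_size_bytes["association role"]
scale_factor = {name: size / base for name, size in object_size_bytes.items()}

for name, factor in scale_factor.items():
    print(f"{name}: {factor:.2f}")
```

Since the sizes depend on the topic map and the implementation, these factors would need to be recomputed for other data sets.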
29. Future perspective
We will apply our method to other topic maps that are much larger
Our target topic maps contain fewer than 1000 topics
We need to confirm the generality of the cost estimation formulae by evaluating various topic maps
We will develop a mechanism to measure the size of objects in a topic map
Since the size of objects depends on each topic map, we have to measure it in order to set cost values that are adequate for evaluating execution plans
30. Reference
M. Naito: An Introduction to Topic Maps. Tokyo Denki University Press, 2006.
Yuki Kuribara, Takeshi Hosoya, Masaomi Kimura: TOME: The Topic Map Database Extended, 2009.
Ontopia: tolog Language tutorial. http://www.ontopia.net/
ISO/IEC JTC1/SC34: Topic Maps - Data Model. http://www.isotopicmaps.org/sam/sam-model/
Pokemon Topic Map. http://www.ontopia.net/omnigator/models/topicmap_complete.jsp?tm=pokemon.ltm
Pajek. http://vlado.fmf.uni-lj.si/pub/networks/pajek/