Presented by:
ChemsEddine Berbague STIC May. 2015
Supervisor: Pr.Seridi Hassina
Co-supervisor: Dr. Beldjoudi Samia
Jury members:
Dr. Hariati Mehdi
Dr. Mendjel Mehdi
Master Project Presentation
«Association Rules Mining: Ontological Approach»
University of Badji Mokhtar-Annaba
Computer Science Department
• The work presented in the following slides is partially adapted from, and extends, the work of:
▫ Claudia Marinica. 2010 (Association Rule Interactive Post-processing using Rule Schemas and Ontologies - ARIPSO).
Note
Timeline
Introduction
Existing Approaches
Proposed Approach
Application
Conclusion
Introduction
• Context
• Problem statement
Context and project focus
This project involves two main tasks:
• Knowledge extraction.
• Ontology enrichment.
• Focus: mining ontologies with association rules to extract useful knowledge.
Knowledge extraction: general scheme and definition
"...extracting from data original information, previously
unknown, and potentially useful."
[fayyad et al.,1996]
7
Data mining: association rules
STEP 1: generate frequent item-sets
STEP 2: generate association rules
• Reduce the number of item-sets using a support threshold
• Reduce the number of comparisons
• FP-Growth algorithm
• Calculate support
• Hash sets
• Using advanced data structures
• APRIORI is one of the best-known algorithms for association rule extraction. It identifies frequent item-sets in transactional datasets.
Data mining: association rules algorithm
Step 1: generating frequent sets
Step 2: generating association rules
The steps of the APRIORI algorithm
Step 1
Scan the transactional dataset to get the support of each 1-item-set.
Step 2
Join 𝐿𝑘−1 with 𝐿𝑘−1 to generate the set of candidate k-item-sets.
Step 3
Scan the transactional dataset to get the support of each candidate k-item-set, then filter the candidates against min_sup to obtain 𝐿𝑘, the set of frequent k-item-sets.
Step 4
Is the set of candidates empty? If no, return to Step 2; if yes, continue to Step 5.
Step 5
For every frequent item-set E, generate all of its non-empty proper subsets.
Step 6
For each non-empty proper subset s of E, generate the rule "s => (E-s)" if its confidence C = support(E) / support(s) > min_conf.
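The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the project: the function names `apriori` and `generate_rules`, the use of absolute support counts for `min_sup`, and the toy transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset: support_count} for every frequent item-set."""
    transactions = [frozenset(t) for t in transactions]
    # Step 1: scan the dataset to count every 1-item-set.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        # Step 2: join L(k-1) with itself to build candidate k-item-sets,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Step 3: scan the dataset and keep candidates that reach min_sup.
        level = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_sup:
                level[cand] = count
        frequent.update(level)
        k += 1  # Step 4: the loop stops once no candidate survives.
    return frequent


def generate_rules(frequent, min_conf):
    """Steps 5-6: return (antecedent, consequent, confidence) triples."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                s = frozenset(antecedent)
                conf = sup / frequent[s]  # support(E) / support(s)
                if conf > min_conf:
                    rules.append((s, itemset - s, conf))
    return rules


# Toy dataset (hypothetical): four shopping baskets.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
]
frequent_sets = apriori(baskets, min_sup=2)
rules = generate_rules(frequent_sets, min_conf=0.6)
```

The lookup `frequent[s]` is safe because every subset of a frequent item-set is itself frequent, so its support was recorded at an earlier level.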
• APRIORI is limited at several steps of the extraction process:
▫ Large number of rules.
▫ Confidence and support measures carry no semantic meaning.
▫ User help is required to extract the targeted rules.
▫ The complexity of the algorithm is O(b).
APRIORI: limitations
• Advantages: unsupervised technique, readable results, complete sets.
• Limits: large volume and low quality of the extracted rules:
• Statistically invalid:
▫ Onions => bread
• Redundant:
▫ R1: X, Y => Z [c]; X => Z [c1]; Y => Z [c2]
▫ c1 > c or c2 > c => R1 is redundant
• Already known by the expert:
▫ X => Y (the rule can be acquired from the context)
• Useless for the expert:
▫ X => Y (the rule is semantically meaningless, such as apple implies skirt)
• Difficulty of manual analysis
• The complexity of the algorithm
▫ Complexity O(b)
• Need:
▫ Eliminate the uninteresting rules.
▫ Target high-quality rules.
Data mining: association rules problem statement
"An ontology is an explicit and formal specification of a shared conceptualization" [Gruber,1993]
Knowledge engineering: ontologies
"Introducing an ontology in an information system reduces conceptual and terminological confusion and offers a shared understanding that enhances communication, sharing, interpretation, and possible reuse" [gandon,2006]
Formal definition:
O = {C, G, I, P}
C = Concepts: elements of the domain.
G = Graph of concepts: is-a relation.
I = Instances: individuals of the concepts.
P = Properties: relations between concepts.
Food product
  Fruit
    grape
      red grape
      green grape
    apple
    pear
  Dairy product
    milk
    cheese
    butter
  Meat
    chicken
    beef
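The formal definition O = {C, G, I, P} maps naturally onto a small data structure. The sketch below is only an illustration: the class name `Ontology`, its field names, and the exact shape of the food hierarchy (read from the diagram on this slide) are assumptions, not part of the original work.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set = field(default_factory=set)      # C: elements of the domain
    is_a: dict = field(default_factory=dict)        # G: child concept -> parent concept
    instances: dict = field(default_factory=dict)   # I: individual -> concept
    properties: set = field(default_factory=set)    # P: (name, domain, range) triples

    def add_concept(self, name, parent=None):
        """Register a concept and, optionally, its is-a parent."""
        self.concepts.add(name)
        if parent is not None:
            self.concepts.add(parent)
            self.is_a[name] = parent

    def ancestors(self, concept):
        """Return the chain of super-concepts, nearest first."""
        chain = []
        while concept in self.is_a:
            concept = self.is_a[concept]
            chain.append(concept)
        return chain

    def is_subconcept(self, concept, ancestor):
        """True if concept equals ancestor or lies below it in the graph."""
        return concept == ancestor or ancestor in self.ancestors(concept)


# Hierarchy assumed from the slide's diagram.
food = Ontology()
for child, parent in [
    ("Fruit", "Food product"), ("Dairy product", "Food product"),
    ("Meat", "Food product"), ("grape", "Fruit"), ("apple", "Fruit"),
    ("pear", "Fruit"), ("red grape", "grape"), ("green grape", "grape"),
    ("milk", "Dairy product"), ("cheese", "Dairy product"),
    ("butter", "Dairy product"), ("chicken", "Meat"), ("beef", "Meat"),
]:
    food.add_concept(child, parent)
```

Storing the graph G as a child-to-parent map keeps ancestor lookups simple, at the cost of allowing only one parent per concept; a real OWL ontology may need multiple inheritance.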
15
"The semantic web is a part of the current web in which information is represented semantically, allowing machines and users to work better together."
[berners-lee et al.,2001]
• Knowledge representation languages:
▫ RDF, OWL, ...
▫ OWL-DL is based on description logic and has a precise, decidable formalism.
• Reasoning engines:
▫ Actions: classification of concepts, coherence testing, and instantiation testing.
▫ FaCT, Racer, Pellet, ...
▫ Query language: SPARQL.
Knowledge engineering: semantic web
16
• Increase the use of ontologies in the association rule extraction process:
• Convert the ontologies into a transactional dataset.
▫ Benefit from the semantic richness to improve the quality of association rules.
▫ Reduce the complexity of the classical association rule algorithms.
Objectives
17
I. A new method to extract transactional information from ontologies.
II. An application to extract, validate, and visualize association rules.
III. Use of the HADOOP framework to extract frequent item-sets.
IV. Experiments on the NiceTag ontology.
Contribution
Existing approaches
• Complexity problem
• Quality problem
• Conclusion
• FP-Growth identifies all frequent item-sets without generating candidate item-sets.
• Two-step approach:
▫ Step 1: Build a compact data structure called the FP-tree. This step requires scanning the dataset.
▫ Step 2: Extract frequent item-sets directly from the FP-tree.
Complexity problem: FP-Growth algorithm
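Step 1 of FP-Growth can be illustrated with the following Python sketch, which builds an FP-tree in two passes over the data. It covers the tree construction only (the recursive mining of step 2 is omitted), and the names `FPNode`, `build_fp_tree`, and `support_from_tree`, as well as the sample transactions, are assumptions made for the example.

```python
from collections import Counter


class FPNode:
    """One node of the FP-tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_sup):
    # First pass: count item frequencies and drop infrequent items.
    freq = Counter(item for t in transactions for item in set(t))
    freq = {i: c for i, c in freq.items() if c >= min_sup}
    root = FPNode(None)
    header = {}  # item -> list of tree nodes holding that item
    # Second pass: insert each transaction, most frequent items first,
    # so that transactions sharing a prefix share a path in the tree.
    for t in transactions:
        items = sorted((i for i in set(t) if i in freq),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def support_from_tree(header, item):
    """The support of a single item is the sum of its node counts."""
    return sum(n.count for n in header.get(item, []))


# Toy dataset (hypothetical).
data = [["milk", "bread"], ["milk", "bread", "butter"], ["milk"], ["bread", "butter"]]
tree, header_table = build_fp_tree(data, min_sup=2)
```

Because shared prefixes are merged, the tree is usually much smaller than the dataset, which is what lets step 2 avoid candidate generation.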
20
• MAFIA algorithm [Burdick, 2005]:
▫ Extracts maximal frequent item-sets.
• CHARM algorithm [J. Zaki et al, 2002]:
▫ Extracts closed item-sets.
Complexity problem: more algorithms
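To make the two notions concrete: a frequent item-set is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all. The sketch below simply filters an already computed table of frequent item-sets; it is not MAFIA or CHARM, which avoid enumerating all frequent item-sets in the first place. The function name and the sample supports are hypothetical.

```python
def closed_and_maximal(frequent):
    """frequent: dict mapping frozenset -> support count.

    Returns (closed, maximal) as two sets of frozensets.
    """
    closed, maximal = set(), set()
    for itemset, sup in frequent.items():
        supersets = [other for other in frequent if itemset < other]
        if not supersets:
            maximal.add(itemset)  # no frequent proper superset
        if all(frequent[other] < sup for other in supersets):
            closed.add(itemset)   # no proper superset with equal support
    return closed, maximal


# Hypothetical frequent item-sets with their support counts.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
AB = frozenset("AB")
table = {A: 4, B: 3, AB: 3, C: 2}
closed_sets, maximal_sets = closed_and_maximal(table)
```

Every maximal item-set is also closed, so the maximal representation is the most compact of the two, but it loses the exact supports of the subsets.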
21
• Pruning: minimum improvement (MICF) [bayardo et al.1999]
▫ R1: milk, pork => pear [S=20%, C=71%]
▫ R2: milk => pear [S=25%, C=70%]
▫ R3: pork => pear [S=30%, C=72%]
▫ R1 is redundant: R3 is more general and has a higher confidence.
• Deduce summaries [liu et al.,1999;Srikant et agrwal,1996]
▫ Apple => pork and Pear => pork are summarized as Fruit => pork
Quality problem: post-processing technique
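The pruning idea on this slide can be sketched as follows: a rule is dropped when a more general rule (same consequent, antecedent a proper subset) has a higher confidence. This is a simplified reading of the slide, not the exact algorithm of the cited paper, and the function name `prune_redundant` is an assumption.

```python
def prune_redundant(rules):
    """rules: dict mapping (antecedent frozenset, consequent) -> confidence.

    Returns the rules that are not made redundant by a more general rule.
    """
    kept = {}
    for (ante, cons), conf in rules.items():
        redundant = any(
            other_cons == cons and other_ante < ante and other_conf > conf
            for (other_ante, other_cons), other_conf in rules.items()
        )
        if not redundant:
            kept[(ante, cons)] = conf
    return kept


# The three rules from the slide, with their confidences.
r1 = (frozenset({"milk", "pork"}), "pear")
r2 = (frozenset({"milk"}), "pear")
r3 = (frozenset({"pork"}), "pear")
pruned = prune_redundant({r1: 0.71, r2: 0.70, r3: 0.72})
```

Here R1 is removed because R3 reaches 72% confidence with a shorter antecedent, so adding milk to the antecedent brings no improvement.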
22
• Features of the selected rules [Silberschats et Tuzhilin,1995]:
▫ Novelty: rules unexpected by the expert.
▫ Actionability: useful rules that allow an expert to make decisions.
• Quality metrics [Freitas,1999]:
▫ Objective measures.
▫ Subjective measures.
• Objective metrics (data-based)
[Piatetsky-shapiro,1991;Guillet and Hamilton,2007]
• Statistical indicators, computed from the data, of the significance of association rules.
• Advantage: unsupervised quality metrics are easy to apply.
• Disadvantage: not suited to personalized criteria.
Quality problem: metrics
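As an illustration of objective (data-based) metrics, the sketch below computes four common ones for a rule A => B. Support and confidence appear elsewhere in this deck; lift and leverage are added here only as familiar examples of this family and are not claimed to be the measures used in the project. The function name and the sample data are hypothetical.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return support, confidence, lift and leverage of antecedent => consequent."""
    n = len(transactions)
    a, c = set(antecedent), set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))         # contains A
    n_c = sum(1 for t in transactions if c <= set(t))         # contains B
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contains A and B
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    p_c = n_c / n
    lift = confidence / p_c if p_c else 0.0
    leverage = support - (n_a / n) * p_c  # P(A,B) - P(A)P(B)
    return {"support": support, "confidence": confidence,
            "lift": lift, "leverage": leverage}


# Toy dataset (hypothetical).
sample = [{"milk", "pear"}, {"milk", "pear"}, {"milk"}, {"pear"}]
measures = rule_measures(sample, {"milk"}, {"pear"})
```

A lift below 1 or a negative leverage, as in this toy example, indicates that the antecedent and consequent co-occur less often than they would if they were independent.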
23
• Models [klementtinen et al., 1994]
• Principle: the expert defines expectations against which the association rules are selected.
• Representing the expert's expectations:
• inclusive pattern (PI) and restrictive pattern (PE)
• Selection technique: syntactic.
• Example:
▫ (PI) Fruit, Dairy products => Meat
▫ (PE) Pear, Dairy products => Meat
▫ R1: Pear, Milk => Pork
▫ R2: Apple, Milk => Chicken
▫ R3: Beef, Milk => Grape
• R2 is selected: it matches PI and does not match PE.
Quality problem: models
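The selection in this example can be reproduced with a short Python sketch. A rule is kept when it matches at least one inclusive pattern and no restrictive pattern, where matching means that each concept in the pattern generalizes an item of the rule on the same side. The `PARENT` map, the matching semantics, and all function names are assumptions for illustration; in particular, placing Pork under Meat and reading the consequent of R3 as a grape (a Fruit) are assumptions, since the slide's diagram does not list them.

```python
# Assumed is-a hierarchy (child -> parent).
PARENT = {
    "Fruit": "Food product", "Dairy products": "Food product",
    "Meat": "Food product",
    "Pear": "Fruit", "Apple": "Fruit", "Grape": "Fruit",
    "Milk": "Dairy products", "Cheese": "Dairy products",
    "Chicken": "Meat", "Beef": "Meat", "Pork": "Meat",
}


def generalizes(concept, item):
    """True if item equals concept or is one of its descendants."""
    while item is not None:
        if item == concept:
            return True
        item = PARENT.get(item)
    return False


def side_matches(pattern_side, rule_side):
    # Every pattern concept must cover some rule item, and every rule
    # item must be covered by some pattern concept.
    return (
        all(any(generalizes(c, i) for i in rule_side) for c in pattern_side)
        and all(any(generalizes(c, i) for c in pattern_side) for i in rule_side)
    )


def matches(pattern, rule):
    return (side_matches(pattern[0], rule[0])
            and side_matches(pattern[1], rule[1]))


def select(rules, inclusive, restrictive):
    """Names of rules matching an inclusive and no restrictive pattern."""
    return [
        name for name, rule in rules.items()
        if any(matches(p, rule) for p in inclusive)
        and not any(matches(p, rule) for p in restrictive)
    ]


PI = ({"Fruit", "Dairy products"}, {"Meat"})
PE = ({"Pear", "Dairy products"}, {"Meat"})
RULES = {
    "R1": ({"Pear", "Milk"}, {"Pork"}),
    "R2": ({"Apple", "Milk"}, {"Chicken"}),
    "R3": ({"Beef", "Milk"}, {"Grape"}),
}
selected = select(RULES, [PI], [PE])
```

R1 is rejected because it also matches the restrictive pattern, and R3 because its antecedent contains no fruit, which leaves R2.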
24
Quality problem: post-processing technique
I. Association rule extraction using the classical method.
II. Knowledge model: enrichment of a model by an expert.
III. Post-processing phase, ARIPSO [Claudia Marinica. 2010]: apply pruning/selection models.
25
• Previous approaches make limited use of ontologies.
• Using filtering models is a difficult process that depends on the availability of an expert.
Conclusion
Proposed approach: ontological approach
• Description logic
• Semantic web and ontologies
• Conclusion
Ontological layers: T-Box & A-Box
• Attribute assertions
• Concept assertions
• Association assertions
• Consistency verification
• Satisfiability verification
T-Box A-Box
• Get / search
• Instance verification
• Training
• Coherence testing
Identity evaluation, homonymy, text search
• Define axioms
• Infer and classify concepts
• Infer associations
• Test equivalence
• Test implication
• Test satisfiability
Reasoning
"Extracting a knowledge base means uncovering hidden information"
28
Semantic web
SELECT ?player
WHERE {
?player rdf:type mnply:MonopolyPlayer .
}
29
• Many syntaxes exist for representing an ontology, including:
▫ Manchester OWL Syntax
▫ OWL/XML
▫ OWL Functional Syntax
▫ RDF/XML
▫ Turtle
▫ LaTeX
• The OWL API allows the ontology to be queried in different ways.
Ontologies: representation syntaxes
<owl:Class rdf:ID="Wine">
  <rdfs:subClassOf rdf:resource="&food;PotableLiquid"/>
  <rdfs:label xml:lang="en">wine</rdfs:label>
  <rdfs:label xml:lang="fr">vin</rdfs:label>
</owl:Class>
30
• Exploit the semantic richness to:
▫ Extract transactions:
  ▪ Step 1: extract a T-Box model.
  ▪ Step 2: extract an A-Box model.
▫ Apply an extraction algorithm to generate the association rules.
• How can this task be achieved?
Association rules extraction: ontological approach
31
Ontological approach: general scheme
[Figure: pipeline with the following components. The user supplies an ontology to the ontology manager. (1) T-Box extraction with concept-based filtering produces the T-Box table; (2) A-Box extraction with instance-based filtering produces the A-Box table. Transactions extraction yields the transactions. After the algorithm choice (APRIORI, FP-tree, HADOOP), association rules extraction produces the association rules and frequent item-sets, followed by user filtering, validation, and visualisation.]
32
T-Box layer
ID             Item-sets
patient        <p1, disease>, <p2, drug>, <p3, cardiologist>, <p4, gynecologist>, <p5, person>
disease        <p6-, symptom>, <p1-, patient>, <p7, drug>
doctor         <p8-, cardiologist>, <p9-, gynecologist>, <p10, person>
symptom        <p6, disease>
drug           <p2-, patient>, <p7-, disease>
cardiologist   <p8, doctor>, <p3-, patient>
gynecologist   <p9, doctor>, <p4-, patient>
person         <p5-, patient>, <p10-, doctor>
[Figure 1: graph of the concepts patient, drug, doctor, disease, symptom, person, gynecologist, and cardiologist, linked by the properties p1 to p10.]
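Reading the T-Box table, each property p with domain D and range R appears to contribute the item <p, R> to the row of D and the inverse item <p-, D> to the row of R. The sketch below reproduces the table under that reading; the `(name, domain, range)` triples are inferred from the table, and the function name `tbox_transactions` is hypothetical rather than taken from the project.

```python
from collections import defaultdict

# (property, domain, range) triples inferred from the T-Box table.
PROPERTIES = [
    ("p1", "patient", "disease"),
    ("p2", "patient", "drug"),
    ("p3", "patient", "cardiologist"),
    ("p4", "patient", "gynecologist"),
    ("p5", "patient", "person"),
    ("p6", "symptom", "disease"),
    ("p7", "disease", "drug"),
    ("p8", "cardiologist", "doctor"),
    ("p9", "gynecologist", "doctor"),
    ("p10", "doctor", "person"),
]


def tbox_transactions(properties):
    """Build one transaction (list of items) per concept."""
    rows = defaultdict(list)
    for name, domain, rng in properties:
        rows[domain].append((name, rng))        # direct property
        rows[rng].append((name + "-", domain))  # inverse property
    return dict(rows)


tbox = tbox_transactions(PROPERTIES)
```

Each row of `tbox` can then be fed as a transaction to a frequent item-set algorithm, with the concept playing the role of the transaction ID.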
33
A-Box layer
ID             Item-sets
Pat 10         <p1, disease 12>, <p2, drug 23>, <p3, cardiolo x>, <p5, person>
Pat 12         <p1, disease 12>, <p2, drug 24>, <p3, cardiolo x>, <p5, person>
doct 23        <p8-, cardiolo>, <p10, person>
symptom 45     <p6, disease 12>
34
• Input: files of the ontology.
• MAP: identify all possible k-item-sets.
• REDUCE: calculate the support of all k-item-sets.
• Output: result files.
Using HADOOP to extract frequent item-sets
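The MAP and REDUCE roles described on this slide can be mimicked in plain Python to show the data flow. This sketch does not use the Hadoop API and runs on a single machine; the function names, the in-memory transactions, and the `min_sup` filter at the end are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """MAP: emit (k-item-set, 1) for every k-item-set in one transaction."""
    for itemset in combinations(sorted(set(transaction)), k):
        yield itemset, 1


def reduce_phase(pairs):
    """REDUCE: sum the counts emitted for each k-item-set."""
    totals = defaultdict(int)
    for itemset, count in pairs:
        totals[itemset] += count
    return dict(totals)


def frequent_k_itemsets(transactions, k, min_sup):
    """Chain both phases and keep item-sets whose support reaches min_sup."""
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    supports = reduce_phase(pairs)
    return {s: c for s, c in supports.items() if c >= min_sup}


# Toy dataset (hypothetical).
rows = [["a", "b", "c"], ["a", "b"], ["b", "c"]]
frequent_pairs = frequent_k_itemsets(rows, k=2, min_sup=2)
```

Sorting the items before emitting ensures that the same item-set always produces the same key, which is what allows the reducer to group the counts correctly.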
35
Ontological approach: steps of association rules extraction
[Figure: given a support threshold, frequent item-sets are generated with Apriori, FP-tree, or Hadoop, giving the set of frequent item-sets; association rules are then generated using a multi-threaded process, giving the set of association rules, from which a sub-set of rules is retained.]
36
Ontological approach: running flow
1. Ontology loading from a set of files
2. T-Box extraction to a text file
3. T-Box filtering using the GUI filter
4. A-Box extraction to a text file
5. Association rules extraction
We used three algorithms to extract association rules:
• APRIORI [R. Agrawal et al, 1994]
• FP-Growth [J. Han et al, 2000]
• HADOOP framework
6. Validation and visualization of rules
7. Association rules storing
43
Experiments
44
• Association rule mining suffers from two main issues:
▫ The complexity of data processing.
▫ The quality of the association rules.
• The semantic web can be exploited successfully to improve the quality of association rules.
• In this project:
▫ We extracted a transactional dataset from an ontology.
▫ We applied different frequent item-set extraction techniques.
▫ We implemented a visual application to mine ontologies for association rules.
Conclusion
Thank you! Questions?
• [Claudia Marinica. 2010]: Claudia Marinica. Association Rule Interactive Post-processing using Rule Schemas and Ontologies - ARIPSO. 2010.
• [fayyad et al.,1996]: Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17:37–54, 1996.
• [gandon,2006]: Fabien Gandon. Ontologies informatiques, May 2006.
• [gruber,1993]: Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Nicola Guarino and Roberto Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic Publishers, 1993.
• [berners-lee et al.,2001]: Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web - a new form of web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, 2001.
• [bayardo et al.1999]: Roberto J. Bayardo Jr., Rakesh Agrawal, and Dimitrios Gunopulos. Constraint-based rule mining in large, dense databases. ICDE '99: Proceedings of the 15th International Conference on Data Engineering, pages 188–197, 1999.
• [liu et al.,1999]: Bing Liu, Wynne Hsu, and Yiming Ma. Pruning and summarizing the discovered associations. In KDD '99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 125–134. ACM, 1999.
REFERENCES
• [Srikant et agrwal,1996]: Ramakrishnan Srikant and Rakesh Agrawal. Mining quantitative association rules in large relational tables. In Proceedings of the 1996 ACM SIGMOD international conference on Management of data, pages 1–12, 1996.
• [Silberschats et Tuzhilin,1995]: Abraham Silberschatz and Alexander Tuzhilin. On subjective measures of interestingness in knowledge discovery. Knowledge Discovery and Data Mining (KDD), pages 275–281, 1995.
• [Piatetsky-shapiro,1991]: G. Piatetsky-Shapiro. Knowledge Discovery in Databases, chapter Discovery, Analysis, and Presentation of Strong Rules, pages 229–248. AAAI/MIT Press, 1991.
• [Guillet and Hamilton,2007]: F. Guillet and H. Hamilton. Quality Measures in Data Mining. Studies in Computational Intelligence, 2007.
• [klementtinen et al., 1994]: Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, and A. Inkeri Verkamo. Finding interesting rules from large sets of discovered association rules. International Conference on Information and Knowledge Management (CIKM), pages 401–407, 1994.
• [Burdick, 2005]: Doug Burdick, Manuel Calimlim, Jason Flannick, Johannes Gehrke, and Tomi Yiu. Mafia: A maximal frequent itemset algorithm. IEEE Transactions on Knowledge and Data Engineering, 17(11):1490–1504, 2005.
• [J. Zaki et al, 2002]: Mohammed J. Zaki and Ching J. Hsiao. Charm: An efficient algorithm for closed itemset mining. In Proceedings of SIAM'02, 2002.
• [R. Agrawal et al, 1994]: Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487–499, 1994.
• [J. Han et al, 2000]: Jiawei Han and Jian Pei. Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explorations Newsletter, Special issue on Scalable data mining algorithms, 2(2):14–20, 2000.
• [Hadoop]: Apache Software Foundation. (2010). Hadoop. Retrieved from https://hadoop.apache.org

VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...Suhani Kapoor
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 

Recently uploaded (20)

Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
VIP High Class Call Girls Bikaner Anushka 8250192130 Independent Escort Servi...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 

Ontologies mining using association rules

  • 1. «Association Rules Mining: Ontological Approach». Master Project Presentation, University of Badji Mokhtar-Annaba, Computer Science Department. Presented by: ChemsEddine Berbague, STIC, May 2015. Supervisor: Pr. Seridi Hassina. Co-supervisor: Dr. Beldjoudi Samia. Jury members: Dr. Hariati Mehdi, Dr. Mendjel Mehdi.
  • 2. Note: the work presented in the following slides is partially taken from, and improves on, the work of Claudia Marinica, 2010 (Association Rule Interactive Post-processing using Rule Schemas and Ontologies - ARIPSO).
  • 5. Context and project focus. This project covers two main tasks: • Knowledge extraction. • Ontology enrichment. • Focus: mining ontologies using association rules to extract useful knowledge.
  • 6. Knowledge extraction: general scheme and definition. "...extracting from data original information, previously unknown, and potentially useful." [fayyad et al.,1996]
  • 8. Data mining: association rules algorithm. • APRIORI is one of the well-known algorithms used for association rules extraction. It identifies frequent item-sets in transactional datasets. • STEP 1: generate frequent item-sets. STEP 2: generate association rules. • [Diagram labels] Reduce the number of item-sets using a support threshold; reduce the number of comparisons; calculate support; use advanced data structures: hash sets, FP-Growth algorithm.
  • 10. Step 2: generating association rules
  • 11. The steps of the APRIORI algorithm
      Step 1: Scan the transactional dataset to get the support of each 1-item-set.
      Step 2: Join L_(k-1) with L_(k-1) to generate the set of candidate k-item-sets.
      Step 3: Scan the transactional dataset to get the support of each candidate k-item-set, then filter the candidates against min_sup to obtain L_k, the set of frequent k-item-sets.
      Step 4: Is the set of candidates empty? If no, repeat from Step 2; if yes, continue with Step 5.
      Step 5: For every frequent item-set E, generate all of its non-empty subsets.
      Step 6: For each non-empty proper subset s of E, generate the rule "s => (E - s)" if its confidence, support(E) / support(s), is greater than min_conf.
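The six steps can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function names `apriori` and `generate_rules`, the use of absolute support counts, and the `>=` comparison against the thresholds are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {frozenset(item-set): support count} for every frequent item-set."""
    transactions = [frozenset(t) for t in transactions]
    # Step 1: one scan to count the support of each 1-item-set.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup}
    frequent = dict(current)
    k = 2
    while current:
        # Step 2: join L(k-1) with itself to build candidate k-item-sets,
        # keeping only candidates whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Step 4: stop as soon as no candidate is left.
        if not candidates:
            break
        # Step 3: scan the dataset to count each candidate, then filter by min_sup.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_sup}
        frequent.update(current)
        k += 1
    return frequent


def generate_rules(frequent, min_conf):
    """Steps 5 and 6: derive rules s => (E - s) from each frequent item-set E."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                s = frozenset(antecedent)
                confidence = support / frequent[s]  # support(E) / support(s)
                if confidence >= min_conf:
                    rules.append((s, itemset - s, confidence))
    return rules
```

Looking up `frequent[s]` is safe because every subset of a frequent item-set is itself frequent, so it was recorded in an earlier pass.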
  • 12. APRIORI: limitations. APRIORI is limited at several steps of the extraction process: a very large number of rules; confidence and support measures that carry no semantic meaning; user help is required to extract the targeted rules; the complexity of the algorithm is O(b).
  • 13. Data mining: association rules problematic
      • Advantages: unsupervised technique, readable results, full sets.
      • Limits: large volume and low quality of the extracted rules:
        ▫ Statistically invalid: Onions => pain
        ▫ Redundant: R1: X, Y => Z [c]; X => Y [c1]; X => Z [c2]; if c1 > c or c2 > c, then R1 is redundant.
        ▫ Already known by the expert: X => Y (the rule can be acquired from the context).
        ▫ Useless for the expert: X => Y (the rule is semantically meaningless, such as apple implies skirt).
      • Difficulty of manual analysis.
      • Complexity of the algorithm: O(b).
      • Need: eliminate the uninteresting rules; target the quality rules.
  • 14. Knowledge engineering: the ontologies. "An ontology is an explicit and formal specification of a shared conceptualization" [Gruber,1993]. "Introducing an ontology in an information system allows reducing the conceptual and terminological confusion and offers a shared understanding that enhances communication, sharing, interpretation, and possible reuse" [gandon,2006]. Formal definition: O = {C, G, I, P}, where C = concepts (elements of the domain), G = graph of concepts (is-a relation), I = instances (individuals of the concepts), P = properties (relations between concepts). Example taxonomy: Food product > Fruit (grape (red grape, green grape), apple, pear), Dairy product (milk, cheese, butter), Meat (chicken, beef).
  • 15. Knowledge engineering: semantic web. "The semantic web is a part of the current web in which information is represented semantically, allowing machines and users to work better together." [berners-lee et al.,2001] • Knowledge representation languages: RDF, OWL, ... OWL-DL is based on description logic and can be defined by an accurate and decidable formalism. • Reasoning engines: classification of concepts, coherence testing, and instantiation testing; examples: Fact, Racer, Pellet, ... • Query language: SPARQL.
  • 16. Objectives. • Increase the use of ontologies in the association rules extraction process. • Convert ontologies into a transactional dataset. ▫ Benefit from their semantic richness to improve the quality of association rules. ▫ Reduce the complexity of the classical association rules algorithms.
  • 17. Contribution. I. A new method to extract transactional information from ontologies. II. An application to extract, validate, and visualize association rules. III. Use of the Hadoop framework to extract frequent item-sets. IV. Experiments on the NiceTag ontology.
  • 19. Complexity problem: FP-Growth algorithm. • FP-Growth identifies all frequent item-sets without generating candidate item-sets. • Two-step approach: ▫ Step 1: build a compact data structure called the FP-tree; this step requires scanning the dataset. ▫ Step 2: extract the frequent item-sets directly from the FP-tree.
  • 20. Complexity problem: more algorithms. • MAFIA algorithm [Burdick, 2005]: extracts maximal frequent item-sets. • CHARM algorithm [J. Zaki et al, 2002]: extracts closed item-sets.
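Step 1 of FP-Growth can be illustrated with a small Python sketch that builds the FP-tree only; the mining of Step 2 is left out. The `Node` class, the function name `build_fp_tree`, and the tie-breaking by item name are assumptions made for this example.

```python
from collections import Counter


class Node:
    """One FP-tree node: an item, its count on this path, and its children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(transactions, min_sup):
    # First scan: count the support of every item and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_sup}
    root = Node(None, None)
    # Second scan: insert each transaction with its frequent items ordered by
    # descending support, so that common prefixes share the same branch.
    for t in transactions:
        items = sorted(
            (i for i in set(t) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in items:
            if item not in node.children:
                node.children[item] = Node(item, node)
            node = node.children[item]
            node.count += 1
    return root, frequent
```

Because transactions that share frequent items share a path, the tree is usually much smaller than the dataset, which is what lets Step 2 avoid candidate generation.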
  • 21. Quality problem: post-processing technique. • Pruning by minimum improvement (MICF) [bayardo et al.1999]: ▫ R1: milk, pork => pear [S=20%, C=71%] ▫ R2: milk => pear [S=25%, C=70%] ▫ R3: pork => pear [S=30%, C=72%] ▫ R1 is redundant: the more general rule R3 already reaches a higher confidence. • Deducing summaries [liu et al.,1999;Srikant et agrwal,1996]: ▫ Apple => pork and Pear => pork are summarized as Fruit => pork.
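The pruning example can be reproduced with a short Python sketch. The slide does not spell out the exact criterion, so this assumes a rule is redundant when a rule with the same consequent and a strictly smaller antecedent has a confidence at least as high; the function name `prune_redundant` and the tuple layout are hypothetical.

```python
def prune_redundant(rules):
    """Drop rules that do not improve on one of their more general sub-rules.

    Each rule is a tuple (antecedent frozenset, consequent, confidence).
    """
    kept = []
    for antecedent, consequent, confidence in rules:
        redundant = any(
            other_cons == consequent
            and other_ante < antecedent  # strictly more general antecedent
            and other_conf >= confidence
            for other_ante, other_cons, other_conf in rules
        )
        if not redundant:
            kept.append((antecedent, consequent, confidence))
    return kept


r1 = (frozenset({"milk", "pork"}), "pear", 0.71)
r2 = (frozenset({"milk"}), "pear", 0.70)
r3 = (frozenset({"pork"}), "pear", 0.72)
remaining = prune_redundant([r1, r2, r3])  # R1 is dropped because of R3
```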
  • 22. Quality problem: metrics. • Features of the selected rules [Silberschats et Tuzhilin,1995]: ▫ Novelty: rules that are unexpected for the expert. ▫ Actionability: useful rules that allow an expert to take decisions. • Quality metrics [Freitas,1999]: ▫ Objective measures. ▫ Subjective measures. • Objective (data-based) metrics [Piatetsky-shapiro,1991;Guillet and Hamilton,2007]: ▫ Data-based statistical indicators of the significance of association rules. ▫ Advantage: unsupervised quality metrics are easy to apply. ▫ Disadvantage: not suited to personalized criteria.
  • 23. Quality problem: models. • Models [klementtinen et al., 1994]. • Principle: the expert defines expectations against which the association rules are selected. • Representing the expert's expectations: inclusive pattern (PI) and restrictive pattern (PE). • Selection technique: syntactic. • Example: ▫ (PI) Fruit, Dairy products => Meat ▫ (PE) Pear, Dairy products => Meat ▫ R1: Pear, Milk => Pork ▫ R2: Apple, Milk => Chicken ▫ R3: Beef, Milk => Grape • R2 is selected: it matches PI and not PE, whereas R1 also matches PE and R3 does not match PI. Taxonomy used: Food product > Fruit (grape (red grape, green grape), apple, pear), Dairy product (milk, cheese, butter), Meat (chicken, beef).
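The selection in this example can be sketched in Python by generalizing each rule item through the taxonomy. This is only an illustration under stated assumptions: the `PARENT` table, the helper names, and the matching rule (every pattern concept covers some item and every item is covered by some concept) are hypothetical, and "pork" is placed under Meat even though the slide's taxonomy does not list it.

```python
# Child -> parent links of the slide's taxonomy.
PARENT = {
    "Fruit": "Food product", "Dairy product": "Food product", "Meat": "Food product",
    "grape": "Fruit", "apple": "Fruit", "pear": "Fruit",
    "red grape": "grape", "green grape": "grape",
    "milk": "Dairy product", "cheese": "Dairy product", "butter": "Dairy product",
    "chicken": "Meat", "beef": "Meat",
    "pork": "Meat",  # assumption: not in the slide's taxonomy, needed for R1
}


def is_a(item, concept):
    """True if item equals concept or is one of its descendants."""
    while item is not None:
        if item == concept:
            return True
        item = PARENT.get(item)
    return False


def side_matches(items, concepts):
    covered = all(any(is_a(i, c) for c in concepts) for i in items)
    used = all(any(is_a(i, c) for i in items) for c in concepts)
    return covered and used


def matches(rule, pattern):
    return side_matches(rule[0], pattern[0]) and side_matches(rule[1], pattern[1])


def select(rules, inclusive, restrictive):
    """Keep the rules that match the inclusive pattern but not the restrictive one."""
    return [r for r in rules if matches(r, inclusive) and not matches(r, restrictive)]


PI = (["Fruit", "Dairy product"], ["Meat"])
PE = (["pear", "Dairy product"], ["Meat"])
R1 = (["pear", "milk"], ["pork"])
R2 = (["apple", "milk"], ["chicken"])
R3 = (["beef", "milk"], ["grape"])
selected = select([R1, R2, R3], PI, PE)  # only R2 remains
```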
  • 24. Quality problem: post-processing technique. I. Association rules extraction using the classical method. II. Knowledge model: enrichment of a model by an expert. III. Post-processing phase, ARIPSO [Claudia Marinica. 2010]: apply pruning/selection models.
  • 25. Conclusion. • Previous approaches make limited use of ontologies. • Using filtering models is a laborious process that depends on the availability of an expert.
  • 26. Work plan: Introduction; Existing approaches; Proposed approach; Application; Conclusion. Current part, Ontological approach: description logic; semantic web and ontologies; conclusion.
  • 27. Ontological layers: T-Box & A-Box. "Extracting a knowledge base means uncovering hidden information." [Diagram labels for the T-Box, A-Box and Reasoning blocks] • Attribute assertion • Concept assertion • Association assertion • Consistency verification • Satisfiability verification. • Get/search • Instance verification • Training • Coherence testing; identity evaluation; homonymy; text search. • Define axioms • Infer and classify concepts • Infer associations • Test equivalence • Test implication • Test satisfiability.
  • 28. Semantic web. SELECT ?player WHERE { ?player rdf:type mnply:MonopolyPlayer . }
  • 29. Ontologies: representation syntaxes. • Many syntaxes exist to represent an ontology, among them: Manchester OWL Syntax, OWL/XML, OWL Functional Syntax, RDF/XML, Turtle, LaTeX. • The OWL API allows the ontology to be queried in different ways. Example: <owl:Class rdf:ID="Lait"> <rdfs:subClassOf rdf:resource="&food;PotableLiquid"/> <rdfs:label xml:lang="en">wine</rdfs:label> <rdfs:label xml:lang="fr">vin</rdfs:label> </owl:Class>
  • 30. Association rules extraction: ontological approach. • Exploit the semantic richness to: ▫ Extract transactions: Step 1: extract a T-Box model; Step 2: extract an A-Box model. ▫ Apply an extraction algorithm to generate the association rules. • How can this task be achieved?
  • 31. Ontological approach: general scheme. [Diagram labels] Ontology; ontology manager: (1) T-Box extraction, concepts-based filtering, T-Box table; (2) A-Box extraction, instances-based filtering, A-Box table; transactions extraction; transactions; algorithm choice (APRIORI, FP-tree, Hadoop); association rules extraction; association rules and frequent item-sets; user filtering; validation and visualisation.
  • 32. T-Box layer
      ID             Item-sets
      patient        <p1, disease>, <p2, drug>, <p3, cardiologist>, <p4, gynecologist>, <p5, person>
      disease        <p6-, symptom>, <p1-, patient>, <p7, drug>
      doctor         <p8-, cardiologist>, <p9-, gynecologist>, <p10, person>
      symptom        <p6, disease>
      drug           <p2-, patient>, <p7-, disease>
      cardiologist   <p8, doctor>, <p3-, patient>
      gynecologist   <p9, doctor>, <p4-, patient>
      person         <p5-, patient>, <p10-, doctor>
      [Figure: graph of the concepts patient, drug, doctor, disease, symptom, person, gynecologist and cardiologist, linked by the properties p1 to p10]
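One way to read this table: each property links a source concept to a target concept, the source receives the item <p, target> and the target receives the inverse item <p-, source>. The Python sketch below rebuilds the table under that reading. The `PROPERTIES` list is inferred from the table rows rather than taken from the ontology itself, and the function name `tbox_transactions` is hypothetical.

```python
from collections import defaultdict

# (property, source concept, target concept), inferred from the T-Box table.
PROPERTIES = [
    ("p1", "patient", "disease"),
    ("p2", "patient", "drug"),
    ("p3", "patient", "cardiologist"),
    ("p4", "patient", "gynecologist"),
    ("p5", "patient", "person"),
    ("p6", "symptom", "disease"),
    ("p7", "disease", "drug"),
    ("p8", "cardiologist", "doctor"),
    ("p9", "gynecologist", "doctor"),
    ("p10", "doctor", "person"),
]


def tbox_transactions(properties):
    """Build one transaction (list of <property, concept> items) per concept."""
    table = defaultdict(list)
    for name, source, target in properties:
        table[source].append((name, target))        # direct item <p, target>
        table[target].append((name + "-", source))  # inverse item <p-, source>
    return dict(table)


transactions = tbox_transactions(PROPERTIES)
```

Each value of `transactions` can then be fed to a frequent item-set algorithm as one transaction.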
  • 33. A-Box layer
      ID             Item-sets
      Pat 10         <p1, disease 12>, <p2, drug 23>, <p3, cardiolo x>, <p5, person>
      Pat 12         <p1, disease 12>, <p2, drug 24>, <p3, cardiolo x>, <p5, person>
      doct 23        <p8-, cardiolo>, <p10, person>
      symptom 45     <p6, disease 12>
      [Figure: graph of the concepts patient, drug, doctor, disease, symptom, person, gynecologist and cardiologist, linked by the properties p1 to p10]
  • 34. Frequent item-sets extraction using Hadoop. [Diagram labels] Files of the ontology; Hadoop context: MAP identifies all possible k-item-sets, REDUCE calculates the support of all k-item-sets; resulting files.
  • 35. Ontological approach: steps of association rules extraction. [Diagram labels] Generate frequent item-sets (Apriori, FP-tree, Hadoop) with a support threshold; set of frequent item-sets; generate association rules using a multi-threaded process; set of association rules; sub-set of rules.
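The MAP and REDUCE roles can be mimicked in plain Python, without a Hadoop cluster, to show the data flow. This is a single-process sketch under assumptions: the function names are hypothetical, and a real Hadoop job would distribute the map calls and group the pairs by key before reducing.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(transaction, k):
    """MAP: emit (k-item-set, 1) for every k-item-set contained in a transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1


def reduce_phase(pairs):
    """REDUCE: sum the emitted counts to obtain the support of each k-item-set."""
    support = defaultdict(int)
    for itemset, count in pairs:
        support[itemset] += count
    return dict(support)


def frequent_itemsets(transactions, k, min_sup):
    pairs = (pair for t in transactions for pair in map_phase(t, k))
    return {i: s for i, s in reduce_phase(pairs).items() if s >= min_sup}
```

Sorting each transaction before emitting guarantees that the same item-set always produces the same key, whatever the order of items in the input.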
  • 36. Ontological approach: running flow, step 1 of 7: ontology loading from a set of files
  • 37. Ontological approach: running flow, step 2 of 7: T-Box extraction to a text file
  • 38. Ontological approach: running flow, step 3 of 7: T-Box filtering using the GUI filter
  • 39. Ontological approach: running flow, step 4 of 7: A-Box extraction to a text file
  • 40. Ontological approach: running flow, step 5 of 7: association rules extraction. We used three algorithms to extract association rules: • APRIORI [R. Agrawal et al, 1994] • FP-Growth [J. Han et al, 2000] • Hadoop framework
  • 41. Ontological approach: running flow, step 6 of 7: validation and visualization of rules
  • 42. Ontological approach: running flow, step 7 of 7: association rules storing
  • 44. Conclusion. • Association rules mining suffers from two main issues: ▫ The complexity of data processing. ▫ The quality of the association rules. • The semantic web can be exploited successfully to improve the quality of association rules. • In this project: ▫ We extracted a transactional dataset. ▫ We applied different frequent item-sets extraction techniques. ▫ We implemented a visual application to mine ontologies for association rules.
  • 46. References
      • [Claudia Marinica. 2010]: Association Rule Interactive Post-processing using Rule Schemas and Ontologies - ARIPSO.
      • [fayyad et al.,1996]: Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to knowledge discovery in databases. AI Magazine, 17:37–54, 1996.
      • [gandon,2006]: Fabien Gandon. Ontologies informatiques, May 2006.
      • [gruber,1993]: Thomas R. Gruber. Toward principles for the design of ontologies used for knowledge sharing. In Nicola Guarino and Roberto Poli, editors, Formal Ontology in Conceptual Analysis and Knowledge Representation. Kluwer Academic Publishers, 1993.
      • [berners-lee et al.,2001]: Tim Berners-Lee, James Hendler, and Ora Lassila. The semantic web - a new form of web content that is meaningful to computers will unleash a revolution of new possibilities. Scientific American, 2001.
      • [bayardo et al.1999]: Roberto J. Bayardo Jr., Rakesh Agrawal, and Dimitrios Gunopulos. Constraint-based rule mining in large, dense databases. ICDE ’99: Proceedings of the 15th International Conference on Data Engineering, pages 188–197, 1999.
      • [liu et al.,1999]: Bing Liu, Wynne Hsu, and Yiming Ma. Pruning and summarizing the discovered associations. In KDD ’99: Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 125–134. ACM, 1999.
  • 47. References (continued)
      • [Srikant et agrwal,1996]: Ramakrishnan Srikant and Rakesh Agrawal. Mining quantitative association rules in large relational tables. In Proceedings of the 1996 ACM SIGMOD international conference on Management of data, pages 1–12, 1996.
      • [Silberschats et Tuzhilin,1995]: Abraham Silberschatz and Alexander Tuzhilin. On subjective measures of interestingness in knowledge discovery. Knowledge Discovery and Data Mining (KDD), pages 275–281, 1995.
      • [Piatetsky-shapiro,1991]: G. Piatetsky-Shapiro. Knowledge Discovery in Databases, chapter Discovery, Analysis, and Presentation of Strong Rules, pages 229–248. AAAI/MIT Press, 1991.
      • [Guillet and Hamilton,2007]: F. Guillet and H. Hamilton. Quality Measures in Data Mining. Studies in Computational Intelligence, 2007.
      • [klementtinen et al., 1994]: Mika Klemettinen, Heikki Mannila, Pirjo Ronkainen, Hannu Toivonen, and A. Inkeri Verkamo. Finding interesting rules from large sets of discovered association rules. International Conference on Information and Knowledge Management (CIKM), pages 401–407, 1994.
      • [Burdick, 2005]: Doug Burdick, Manuel Calimlim, Jason Flannick, Johannes Gehrke, and Tomi Yiu. Mafia: A maximal frequent itemset algorithm. IEEE Transactions on Knowledge and Data Engineering, 17(11):1490–1504, 2005.
  • 48. References (continued)
      • [J. Zaki et al, 2002]: Mohammed J. Zaki and Ching J. Hsiao. Charm: An efficient algorithm for closed itemset mining. In Proceedings of SIAM’02, 2002.
      • [R. Agrawal et al, 1994]: Rakesh Agrawal and Ramakrishnan Srikant. Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases, VLDB, pages 487–499, 1994.
      • [J. Han et al, 2000]: Jiawei Han and Jian Pei. Mining frequent patterns by pattern-growth: methodology and implications. ACM SIGKDD Explorations Newsletter, Special issue on Scalable data mining algorithms, 2(2):14–20, 2000.
      • [Hadoop]: Apache Software Foundation. (2010). Hadoop. Retrieved from https://hadoop.apache.org