The document discusses the Apriori algorithm for frequent pattern mining. It begins with an introduction to frequent pattern analysis and its importance, then explains the basic concepts of support, confidence, and association rule mining. The Apriori algorithm works in two steps: first it finds frequent itemsets by repeatedly scanning the database and pruning candidates that fall below a minimum support threshold; then it generates strong association rules from those frequent itemsets using a minimum confidence threshold. A worked example shows how Apriori processes a transactional database to find frequent itemsets and association rules. The limitations of Apriori include its multiple database scans, which hurt efficiency.
5. Concept of Frequent Pattern Analysis
Pattern
A series of data that repeats in a recognizable way, for example in a study of sales volume.
Occurrence
Enables us to predict the occurrence of a specific item based on various transactions.
Relationship
Plays a crucial role in mining associations, correlations, and many other interesting relationships among data.
6. Market Basket Analysis is the best-known example of frequent pattern analysis. Here we
try to find sets of products that are frequently bought together by different
customers, so as to increase product sales. By applying the algorithm to sales data
we can find the patterns in which items are bought; for example, bread and milk occur
together three times here.
8. In Brief
● It aims at finding regularities in the shopping behavior of customers of supermarkets,
mail-order companies, and online shops.
● This method of analysis can be useful in evaluating data for various business
functions and industries.
● It helps a business work with other businesses that complement its own, rather than
competitors. For example, vehicle dealerships and manufacturers run cross-marketing
campaigns with oil and gas companies for obvious reasons.
● In healthcare, each patient can be represented as a transaction containing an ordered
set of diseases, and the diseases likely to occur simultaneously or sequentially can be predicted.
10. Terms associated with Pattern Mining
Support
This says how popular an itemset is, measured by the proportion of transactions in which the itemset appears.
Confidence
This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears.
Lift
This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is.
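The three measures above can be computed directly. A minimal sketch (the four transactions and the rule {bread} -> {milk} are made-up illustration data, not taken from the slides):

```python
# Minimal sketch: support, confidence, and lift for {bread} -> {milk}.
# The four transactions below are made-up illustration data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
n = len(transactions)

def support(itemset):
    """Proportion of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

supp_xy = support({"bread", "milk"})        # 2/4 = 0.5
confidence = supp_xy / support({"bread"})   # 0.5 / 0.75 ≈ 0.667
lift = confidence / support({"milk"})       # ≈ 0.889; below 1, so a slight negative association
print(supp_xy, confidence, lift)
```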
11. Association Mining
A two-step process. The aim is to discover associations of items occurring together more often than we would expect from randomly sampling all the possibilities.
1. Find frequent itemsets
● Apriori Algorithm
● FP-Growth
2. Generate rules
These rules must satisfy minimum support and minimum confidence.
12. Apriori Algorithm
Proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules.
13. AprioriAlgorithmandProperties
All non-empty subset of frequent
itemset must be frequent. The key
concept of Apriori algorithm is its anti-
monotonicity of support measure.
We apply an iterative approach or
level-wise search where k-frequent
itemsets are used to find k+1 itemsets
Name of the algorithm is Apriori
because it uses prior knowledge of
frequent itemset properties.
Apriori assumes that all subsets of a
frequent itemset must be frequent.
If an itemset is infrequent, all its
supersets will be infrequent.
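The anti-monotone property gives Apriori its pruning test: a candidate k-itemset can be discarded outright if any of its (k-1)-subsets is missing from L(k-1). A minimal sketch, using the L2 from the worked example later in the deck:

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Anti-monotone prune: True if any (k-1)-subset is not frequent."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# L2 from the worked example later in the deck; {I3, I5} is absent,
# so the candidate {I1, I3, I5} can be pruned without a database scan.
L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
print(has_infrequent_subset(("I1", "I3", "I5"), L2))  # True  -> prune
print(has_infrequent_subset(("I1", "I2", "I3"), L2))  # False -> keep
```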
16. Let’s work on a simple example
C1 (candidate 1-itemsets, counted with one scan of the database):
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
Compare each candidate itemset’s support count with the minimum support count (here min_support = 2; if a candidate’s support_count is less than min_support, remove that itemset). This gives us itemset L1, which here is identical to C1 because every item meets the threshold:
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
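The C1 -> L1 step can be sketched in a few lines of Python; the 9-transaction database is the one shown on the next slide:

```python
from collections import Counter

# The 9-transaction database shown on the next slide.
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
min_support = 2

# C1: one scan counts every single item; L1 keeps those meeting min_support.
c1 = Counter(item for t in db for item in t)
l1 = {frozenset([i]): c for i, c in c1.items() if c >= min_support}
print(sorted((next(iter(k)), v) for k, v in l1.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```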
17. Let’s work on a simple example
Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets have (k-2) elements in common. C2, with support counts from the database:
Itemset Support Count
I1,I2 4
I1,I3 4
I1,I4 1
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
I3,I4 0
I3,I5 1
I4,I5 0
The transaction database used throughout this example:
Tid Items
T1 I1,I2,I5
T2 I2,I4
T3 I2,I3
T4 I1,I2,I4
T5 I1,I3
T6 I2,I3
T7 I1,I3
T8 I1,I2,I3,I5
T9 I1,I2,I3
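The join step for C2 pairs up the frequent 1-items and counts each pair's support with one pass over the database above. A minimal sketch:

```python
from itertools import combinations

# Join step: C2 is every pair of frequent 1-items from L1; support is
# then counted with one pass over the 9-transaction database above.
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
l1_items = ["I1", "I2", "I3", "I4", "I5"]

c2 = {frozenset(pair): sum(set(pair) <= t for t in db)
      for pair in combinations(l1_items, 2)}
print(c2[frozenset({"I1", "I2"})])  # 4, matching the C2 table
print(c2[frozenset({"I3", "I4"})])  # 0, dropped in the next pruning step
```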
18. Let’s work on a simple example
Compare each candidate’s (C2) support count with the minimum support count (here min_support = 2; if a candidate’s support_count is less than min_support, remove it). This gives us itemset L2:
Itemset Support Count
I1,I2 4
I1,I3 4
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
19. Let’s work on a simple example
● Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets have (k-2) elements in common, so here, for L2, the first element should match.
● Find the support count of the remaining itemsets by searching the dataset.
● Compare each candidate’s (C3) support count with the minimum support count (here min_support = 2; if a candidate’s support_count is less than min_support, remove it). This gives us itemset L3:
Itemset Support Count
I1,I2,I3 2
I1,I2,I5 2
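The whole C3 step (join on a shared first element, prune by the anti-monotone property, then count support) can be sketched as:

```python
from itertools import combinations

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
min_support = 2
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
l2_set = {frozenset(p) for p in L2}

# Join pairs of L2 itemsets sharing their first element, then prune any
# candidate that has an infrequent 2-subset (anti-monotone property).
c3 = set()
for a, b in combinations(L2, 2):
    if a[0] == b[0]:
        cand = tuple(sorted({*a, *b}))
        if all(frozenset(s) in l2_set for s in combinations(cand, 2)):
            c3.add(cand)

# Count support for the survivors and keep those meeting min_support.
l3 = {c: n for c in c3
      if (n := sum(set(c) <= t for t in db)) >= min_support}
print(sorted(l3.items()))  # both surviving triples have support 2
```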
20. Let’s work on a simple example
● Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (k = 4) is that the two itemsets have (k-2) elements in common, so here, for L3, the first two items should match.
● Check whether all subsets of each candidate are frequent. Here the itemset formed by joining L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent, so C4 is empty.
● We stop here because no further frequent itemsets can be found.
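Putting the steps together, a compact level-wise Apriori sketch (an illustrative sketch, not a production implementation) reproduces the example's result:

```python
from itertools import combinations

def apriori(db, min_support):
    """Level-wise sketch: grow frequent k-itemsets until none survive."""
    items = sorted({i for t in db for i in t})
    level = {(i,): n for i in items
             if (n := sum(i in t for t in db)) >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: merge itemsets agreeing on their first k-2 items.
        cands = {tuple(sorted({*a, *b}))
                 for a, b in combinations(prev, 2) if a[:-1] == b[:-1]}
        level = {}
        for c in cands:
            # Prune by anti-monotonicity, then count support in one scan.
            if len(c) == k and all(s in prev for s in combinations(c, k - 1)):
                n = sum(set(c) <= t for t in db)
                if n >= min_support:
                    level[c] = n
        frequent.update(level)
        k += 1
    return frequent

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
result = apriori(db, min_support=2)
print(sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])))
```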
21. Strong Association and Confidence
● Strong association rules: rules whose confidence is greater than or equal to a confidence threshold. Here the threshold is 60%.
● Confidence(A->B) = Support_count(A∪B) / Support_count(A)
● Itemset B is {Coke} and itemset A is {Diaper, Milk}, so we want the probability that Coke appears in a transaction given that {Diaper, Milk} does.
● So the confidence of {Diaper, Milk} -> Coke = 2/3 = 0.667.
● {Diaper, Milk} -> Coke is a strong association rule because its confidence, 0.667, exceeds the 60% threshold.
Tid Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
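The slide's arithmetic can be checked against the table above:

```python
# Checking the slide's numbers against the 5-transaction table.
db = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in db)

# Confidence({Diaper, Milk} -> Coke) = count(A ∪ B) / count(A)
conf = count({"Diaper", "Milk", "Coke"}) / count({"Diaper", "Milk"})
print(round(conf, 3))  # 0.667 >= 0.6, so the rule is strong
```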
22. Now the generation of strong association rules comes into the picture.
For that we need to calculate the confidence of each rule.
From the frequent itemset {I1, I2, I3}, the possible rules are:
● [I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
● [I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
● [I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
● [I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 = 33%
● [I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 ≈ 28.6%
● [I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 = 33%
● So if the minimum confidence is 50%, the first three rules qualify as strong association rules.
L3:
Itemset Support Count
I1,I2,I3 2
I1,I2,I5 2
L2:
Itemset Support Count
I1,I2 4
I1,I3 4
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
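Enumerating all rules from the frequent itemset {I1, I2, I3} and checking them against the 50% threshold can be sketched as (support counts copied from the tables above):

```python
from itertools import combinations

# Support counts copied from the tables above.
support = {frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I3"]): 6,
           frozenset(["I1", "I2"]): 4, frozenset(["I1", "I3"]): 4,
           frozenset(["I2", "I3"]): 4, frozenset(["I1", "I2", "I3"]): 2}
itemset = frozenset(["I1", "I2", "I3"])
min_conf = 0.5

# Every non-empty proper subset of the itemset can be a rule antecedent.
rules = {}
for r in range(1, len(itemset)):
    for lhs in combinations(sorted(itemset), r):
        conf = support[itemset] / support[frozenset(lhs)]
        rules[lhs] = conf
        verdict = "strong" if conf >= min_conf else "weak"
        print(f"{set(lhs)} -> {set(itemset) - set(lhs)}: {conf:.0%} ({verdict})")
```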
25. Limitations of the Apriori Algorithm
● Efficiency: requires many database scans.
● FP-Growth: Apriori is slower than the FP-Growth algorithm.
● Costly and time-wasting: to detect a frequent pattern of size 100, i.e. {v1, v2, ..., v100}, it has to generate on the order of 2^100 candidate itemsets.
● Slow: time is required to hold a vast number of candidate sets when there are many frequent itemsets, a low minimum support, or large itemsets.
27. ● Association rules are very useful in analyzing datasets.
● The data is collected using barcode scanners in supermarkets.
Such databases consist of a large number of transaction
records, each listing all items bought by a customer in a single
purchase.
● Apriori, while historically significant, suffers from a number of
inefficiencies and trade-offs, which have spawned other
algorithms.
● Later algorithms such as Max-Miner try to identify the maximal
frequent itemsets without enumerating their subsets, and
perform "jumps" in the search space rather than a purely
bottom-up approach.
29. CREDITS: This presentation template was
created by Slidesgo, including icons by
Flaticon, and infographics & images by
Freepik
Do you have any questions?
THANKS
Please keep this slide for attribution