The document discusses the Apriori algorithm for frequent pattern mining. It begins with an introduction to frequent pattern analysis and its importance, then explains the basic concepts of support, confidence, and association rule mining. The Apriori algorithm works in two steps: first it finds frequent itemsets by repeatedly scanning the database and pruning candidates that fall below a minimum support threshold; then it generates strong association rules from those frequent itemsets using a minimum confidence threshold. A worked example shows how Apriori processes a transactional database to find frequent itemsets and association rules. The limitations of Apriori include its multiple database scans, which hurt efficiency.
5. Concept of Frequent Pattern Analysis
Pattern
A series of data that repeats in a recognizable way, for example in a study of sales volume.
Occurrence
Enables us to predict the occurrence of a specific item based on various transactions.
Relationship
Plays a crucial role in mining associations, correlations, and many other interesting relationships among data.
6. Market Basket Analysis is the best-known example of frequent pattern analysis. Here we
try to find sets of products that are frequently bought together by different
customers, so as to increase product sales. By applying the algorithm to sales data
we can find the patterns in which items are bought; for example, bread and milk occur
together three times here.
8. In Brief
● It aims at finding regularities in the shopping behavior of customers of supermarkets,
mail-order companies, and online shops.
● This method of analysis can be useful in evaluating data for various business
functions and industries.
● It helps a business work with other businesses that complement its own, rather than
competitors. For example, vehicle dealerships and manufacturers run cross-marketing
campaigns with oil and gas companies for obvious reasons.
● In healthcare, each patient can be represented as a transaction containing an ordered
set of diseases, and the diseases likely to occur simultaneously or sequentially can be predicted.
10. Terms associated with Pattern Mining
Support
This says how popular an itemset is, measured by the proportion of transactions in which the itemset appears.
Confidence
This says how likely item Y is to be purchased when item X is purchased, expressed as {X -> Y}. It is measured by the proportion of transactions containing item X in which item Y also appears.
Lift
This says how likely item Y is to be purchased when item X is purchased, while controlling for how popular item Y is.
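The three measures above can be computed directly. A minimal sketch (the four transactions and the rule {bread} -> {milk} are made-up illustration data, not taken from the slides):

```python
# Minimal sketch: support, confidence, and lift for {bread} -> {milk}.
# The four transactions below are made-up illustration data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
n = len(transactions)

def support(itemset):
    """Proportion of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / n

supp_xy = support({"bread", "milk"})        # 2/4 = 0.5
confidence = supp_xy / support({"bread"})   # 0.5 / 0.75 ≈ 0.667
lift = confidence / support({"milk"})       # ≈ 0.889; below 1, so a slight negative association
print(supp_xy, confidence, lift)
```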
11. Association Mining
A two-step process. The aim is to discover associations of items occurring together more often than we would expect from randomly sampling all the possibilities.
1. Find frequent itemsets
● Apriori Algorithm
● FP-Growth
2. Generate rules
These rules must satisfy minimum support and minimum confidence.
12. Apriori Algorithm
Proposed by R. Agrawal and R. Srikant in 1994 for finding frequent itemsets in a dataset for Boolean association rules.
13. AprioriAlgorithmandProperties
All non-empty subset of frequent
itemset must be frequent. The key
concept of Apriori algorithm is its anti-
monotonicity of support measure.
We apply an iterative approach or
level-wise search where k-frequent
itemsets are used to find k+1 itemsets
Name of the algorithm is Apriori
because it uses prior knowledge of
frequent itemset properties.
Apriori assumes that all subsets of a
frequent itemset must be frequent.
If an itemset is infrequent, all its
supersets will be infrequent.
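The anti-monotone property gives Apriori its pruning test: a candidate k-itemset can be discarded outright if any of its (k-1)-subsets is missing from L(k-1). A minimal sketch, using the L2 from the worked example later in the deck:

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """Anti-monotone prune: True if any (k-1)-subset is not frequent."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

# L2 from the worked example later in the deck; {I3, I5} is absent,
# so the candidate {I1, I3, I5} can be pruned without a database scan.
L2 = {frozenset(p) for p in [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
                             ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]}
print(has_infrequent_subset(("I1", "I3", "I5"), L2))  # True  -> prune
print(has_infrequent_subset(("I1", "I2", "I3"), L2))  # False -> keep
```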
16. Let’s work on a simple example
C1 (candidate 1-itemsets, counted with one scan of the database):
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
Compare each candidate itemset’s support count with the minimum support count (here min_support = 2; if a candidate’s support_count is less than min_support, remove that itemset). This gives us itemset L1, which here is identical to C1 because every item meets the threshold:
Itemset Support Count
I1 6
I2 7
I3 6
I4 2
I5 2
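The C1 -> L1 step can be sketched in a few lines of Python; the 9-transaction database is the one shown on the next slide:

```python
from collections import Counter

# The 9-transaction database shown on the next slide.
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
min_support = 2

# C1: one scan counts every single item; L1 keeps those meeting min_support.
c1 = Counter(item for t in db for item in t)
l1 = {frozenset([i]): c for i, c in c1.items() if c >= min_support}
print(sorted((next(iter(k)), v) for k, v in l1.items()))
# [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```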
17. Let’s work on a simple example
Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets have (k-2) elements in common. C2, with support counts from the database:
Itemset Support Count
I1,I2 4
I1,I3 4
I1,I4 1
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
I3,I4 0
I3,I5 1
I4,I5 0
The transaction database used throughout this example:
Tid Items
T1 I1,I2,I5
T2 I2,I4
T3 I2,I3
T4 I1,I2,I4
T5 I1,I3
T6 I2,I3
T7 I1,I3
T8 I1,I2,I3,I5
T9 I1,I2,I3
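The join step for C2 pairs up the frequent 1-items and counts each pair's support with one pass over the database above. A minimal sketch:

```python
from itertools import combinations

# Join step: C2 is every pair of frequent 1-items from L1; support is
# then counted with one pass over the 9-transaction database above.
db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
l1_items = ["I1", "I2", "I3", "I4", "I5"]

c2 = {frozenset(pair): sum(set(pair) <= t for t in db)
      for pair in combinations(l1_items, 2)}
print(c2[frozenset({"I1", "I2"})])  # 4, matching the C2 table
print(c2[frozenset({"I3", "I4"})])  # 0, dropped in the next pruning step
```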
18. Let’s work on a simple example
Compare each candidate’s (C2) support count with the minimum support count (here min_support = 2; if a candidate’s support_count is less than min_support, remove it). This gives us itemset L2:
Itemset Support Count
I1,I2 4
I1,I3 4
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
19. Let’s work on a simple example
● Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the two itemsets have (k-2) elements in common, so here, for L2, the first element should match.
● Find the support count of the remaining itemsets by searching the dataset.
● Compare each candidate’s (C3) support count with the minimum support count (here min_support = 2; if a candidate’s support_count is less than min_support, remove it). This gives us itemset L3:
Itemset Support Count
I1,I2,I3 2
I1,I2,I5 2
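The whole C3 step (join on a shared first element, prune by the anti-monotone property, then count support) can be sketched as:

```python
from itertools import combinations

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
min_support = 2
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
l2_set = {frozenset(p) for p in L2}

# Join pairs of L2 itemsets sharing their first element, then prune any
# candidate that has an infrequent 2-subset (anti-monotone property).
c3 = set()
for a, b in combinations(L2, 2):
    if a[0] == b[0]:
        cand = tuple(sorted({*a, *b}))
        if all(frozenset(s) in l2_set for s in combinations(cand, 2)):
            c3.add(cand)

# Count support for the survivors and keep those meeting min_support.
l3 = {c: n for c in c3
      if (n := sum(set(c) <= t for t in db)) >= min_support}
print(sorted(l3.items()))  # both surviving triples have support 2
```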
20. Let’s work on a simple example
● Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (k = 4) is that the two itemsets have (k-2) elements in common, so here, for L3, the first two items should match.
● Check whether all subsets of each candidate are frequent. Here the itemset formed by joining L3 is {I1, I2, I3, I5}, and its subset {I1, I3, I5} is not frequent, so C4 is empty.
● We stop here because no further frequent itemsets can be found.
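Putting the steps together, a compact level-wise Apriori sketch (an illustrative sketch, not a production implementation) reproduces the example's result:

```python
from itertools import combinations

def apriori(db, min_support):
    """Level-wise sketch: grow frequent k-itemsets until none survive."""
    items = sorted({i for t in db for i in t})
    level = {(i,): n for i in items
             if (n := sum(i in t for t in db)) >= min_support}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: merge itemsets agreeing on their first k-2 items.
        cands = {tuple(sorted({*a, *b}))
                 for a, b in combinations(prev, 2) if a[:-1] == b[:-1]}
        level = {}
        for c in cands:
            # Prune by anti-monotonicity, then count support in one scan.
            if len(c) == k and all(s in prev for s in combinations(c, k - 1)):
                n = sum(set(c) <= t for t in db)
                if n >= min_support:
                    level[c] = n
        frequent.update(level)
        k += 1
    return frequent

db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
      {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
      {"I1", "I2", "I3"}]
result = apriori(db, min_support=2)
print(sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])))
```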
21. Strong Association and Confidence
● Strong association rules: rules whose confidence is greater than or equal to a confidence threshold. Here the threshold is 60%.
● Confidence(A->B) = Support_count(A∪B) / Support_count(A)
● Itemset B is {Coke} and itemset A is {Diaper, Milk}, so we want the probability that Coke appears in a transaction given that {Diaper, Milk} does.
● So the confidence of {Diaper, Milk} -> Coke = 2/3 = 0.667.
● {Diaper, Milk} -> Coke is a strong association rule because its confidence, 0.667, exceeds the 60% threshold.
Tid Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
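The slide's arithmetic can be checked against the table above:

```python
# Checking the slide's numbers against the 5-transaction table.
db = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in db)

# Confidence({Diaper, Milk} -> Coke) = count(A ∪ B) / count(A)
conf = count({"Diaper", "Milk", "Coke"}) / count({"Diaper", "Milk"})
print(round(conf, 3))  # 0.667 >= 0.6, so the rule is strong
```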
22. Now the generation of strong association rules comes into the picture.
For that we need to calculate the confidence of each rule.
From the frequent itemset {I1, I2, I3}, the possible rules are:
● [I1^I2]=>[I3] // confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
● [I1^I3]=>[I2] // confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
● [I2^I3]=>[I1] // confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
● [I1]=>[I2^I3] // confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 = 33%
● [I2]=>[I1^I3] // confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 ≈ 28.6%
● [I3]=>[I1^I2] // confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 = 33%
● So if the minimum confidence is 50%, the first three rules qualify as strong association rules.
L3:
Itemset Support Count
I1,I2,I3 2
I1,I2,I5 2
L2:
Itemset Support Count
I1,I2 4
I1,I3 4
I1,I5 2
I2,I3 4
I2,I4 2
I2,I5 2
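Enumerating all rules from the frequent itemset {I1, I2, I3} and checking them against the 50% threshold can be sketched as (support counts copied from the tables above):

```python
from itertools import combinations

# Support counts copied from the tables above.
support = {frozenset(["I1"]): 6, frozenset(["I2"]): 7, frozenset(["I3"]): 6,
           frozenset(["I1", "I2"]): 4, frozenset(["I1", "I3"]): 4,
           frozenset(["I2", "I3"]): 4, frozenset(["I1", "I2", "I3"]): 2}
itemset = frozenset(["I1", "I2", "I3"])
min_conf = 0.5

# Every non-empty proper subset of the itemset can be a rule antecedent.
rules = {}
for r in range(1, len(itemset)):
    for lhs in combinations(sorted(itemset), r):
        conf = support[itemset] / support[frozenset(lhs)]
        rules[lhs] = conf
        verdict = "strong" if conf >= min_conf else "weak"
        print(f"{set(lhs)} -> {set(itemset) - set(lhs)}: {conf:.0%} ({verdict})")
```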
25. Limitations of the Apriori Algorithm
● Efficiency: requires many database scans.
● FP-Growth: Apriori is slower than the FP-Growth algorithm.
● Costly and time-wasting: to detect a frequent pattern of size 100, i.e. {v1, v2, ..., v100}, it has to generate on the order of 2^100 candidate itemsets.
● Slow: time is required to hold a vast number of candidate sets when there are many frequent itemsets, a low minimum support, or large itemsets.
27. ● Association rules are very useful in analyzing datasets.
● The data is collected using barcode scanners in supermarkets.
Such databases consist of a large number of transaction
records, each listing all items bought by a customer in a single
purchase.
● Apriori, while historically significant, suffers from a number of
inefficiencies and trade-offs, which have spawned other
algorithms.
● Later algorithms such as Max-Miner try to identify the maximal
frequent itemsets without enumerating their subsets, and
perform "jumps" in the search space rather than a purely
bottom-up approach.
29. CREDITS: This presentation template was
created by Slidesgo, including icons by
Flaticon, and infographics & images by
Freepik
Do you have any questions?
THANKS
Please keep this slide for attribution