A merger of (at least) four disciplines. A merger of (at least) four disciplines


The Apriori algorithm was developed in 1993



Yüklə 500 b.
səhifə6/14
tarix25.07.2018
ölçüsü500 b.
#58059
1   2   3   4   5   6   7   8   9   ...   14

The Apriori algorithm was developed in 1993,

  • The Apriori algorithm was developed in 1993,

      • Agrawal, R., Imielinski, T. and Swami, A. (1993). Mining Association Rules Between Sets of Items in Large Databases. ACM SIGMOD International Conference on Management of Data, Washington DC, USA, 22. ACM Press. 207-216.
  • Although limited in its capabilities, it has since become the basis of many mining routines.

  • The problem can be summarised as follows:

    • Given a large set of transactions (t1, .. tn) which consist of one or more of a large number of items (i1, .. im), report those items that occur together frequently.
  • Apriori works by creating frequent itemsets of increasing length. It is assisted in this by the observation that if an itemset I of length k is not frequent, then nor will any longer itemsets that contain I.



Determining if a rule is “interesting” is a tough question. The two metrics used for association rule mining are:

  • Determining if a rule is “interesting” is a tough question. The two metrics used for association rule mining are:

    • Support (σ) - P(A∪B) - Applied to Itemsets
      • ie. How often do A and B appear in the dataset together.
    • Confidence (γ) - P(B|A) - Applied to Rules
      • ie. Of the times A appears in the dataset, how often does it appear with B. Note this is different from P(B|A)


Consider a transaction set as follows:

  • Consider a transaction set as follows:

    • Trans 1 Butter, Biscuits, Cream, Newspaper, Bread, Chocolate
    • Trans 2 Cream, Newspaper, Tea, Oil, Chocolate
    • Trans 3 Chocolate, Cereal, Bread
    • Trans 4 Chocolate, Flour, Biscuits, Newspaper
    • Trans 5 Chocolate, Biscuits, Newspaper
  • The first step is to make a list of all separate items that occur in the dataset together with the number of times they occur.



The next stage is to create 2-item itemsets by creating every possible combination of the 1-item itemsets.

  • The next stage is to create 2-item itemsets by creating every possible combination of the 1-item itemsets.

  • This is done by first:

    • Creating all possible 2-item itemsets from the 1-item itemsets. These are termed the C2 sets.
    • Then the C2 sets are checked against the dataset itself to see which actually occur. These are then known as the L2 itemsets.


The C2 itemsets includes every 2-item combination of Biscuits, Bread, Chocolate, Cream and Newspaper.

  • The C2 itemsets includes every 2-item combination of Biscuits, Bread, Chocolate, Cream and Newspaper.



After this, 3-item itemsets are created (first by creating the C3 itemset and then the L3 itemsets) and so on.

  • After this, 3-item itemsets are created (first by creating the C3 itemset and then the L3 itemsets) and so on.

    • Note that for a C3 itemset ABC to be created we need ALL of AB, BC and AC as L2 itemsets. Similarly, to form a C4 itemset ABCD to be created we need ALL of ABC, ABD, ACD and BCD as L3 itemsets.
  • This goes on until no more itemsets can be created.

  • The full list of k-itemsets (k≥2) is thus:

  • We then form rules from these itemsets.



Rules are constructed by taking each combination of each frequent itemset (on each side of the rule) and calculating confidence values (ie. P(B | A) = tc(A->B)/tc(A) where tc(x) is the number of transactions containing all items in x.

  • Rules are constructed by taking each combination of each frequent itemset (on each side of the rule) and calculating confidence values (ie. P(B | A) = tc(A->B)/tc(A) where tc(x) is the number of transactions containing all items in x.

  • A specified confidence level is used to cull those rules not reaching some specified level of accuracy.



Biscuits -> Chocolate 100%

  • Biscuits -> Chocolate 100%

  • Chocolate -> Biscuits 60% Not reported

  • Biscuits ∧ Chocolate -> Newspaper 100%

  • Chocolate ∧ Newspaper -> Biscuits 75% Not reported

  • Newspaper ∧ Biscuits -> Chocolate 100%

  • Biscuits -> Chocolate ∧ Newspaper 100%

  • Chocolate -> Newspaper ∧ Biscuits 60% Not reported

  • Newspaper -> Biscuits ∧ Chocolate 75% Not reported

  • Biscuits -> Newspaper 100%

  • Newspaper -> Biscuits 75% Not reported

  • Bread -> Chocolate 100%

  • Chocolate -> Bread 40% Not reported

  • Chocolate -> Cream 40% Not reported

  • Cream -> Chocolate 100%

  • Cream ∧ Chocolate -> Newspaper 100%

  • Chocolate ∧ Newspaper -> Cream 50% Not reported

  • Newspaper ∧ Cream -> Chocolate 100%

  • Cream -> Chocolate ∧ Newspaper 100%

  • Chocolate -> Newspaper ∧ Cream 40% Not reported

  • Newspaper -> Cream ∧ Chocolate 50% Not reported

  • Chocolate -> Newspaper 80%

  • Newspaper -> Chocolate 100%

  • Cream -> Newspaper 100%

  • Newspaper -> Cream 50% Not reported




Yüklə 500 b.

Dostları ilə paylaş:
1   2   3   4   5   6   7   8   9   ...   14




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©muhaz.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin