A merger of (at least) four disciplines




Thus the rule

  • A -> B would have σ(15%), γ(30%), which is less than the chance of purchasing B by itself (40%).

  • A and B are therefore said to be negatively associated, i.e. the purchase of one suppresses the purchase of the other (a sketch of these metrics follows).
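
    To make these metrics concrete, here is a minimal Python sketch (the transaction data and function name are illustrative assumptions, not from the notes) computing support (σ), confidence (γ), and lift for A -> B; a lift below 1 corresponds to the negative association described above.

      # Minimal sketch: support, confidence, and lift for the rule A -> B.
      # The transaction data below is illustrative only.
      transactions = [
          {"A", "B"}, {"A"}, {"B"}, {"B"}, {"A", "B", "C"},
          {"C"}, {"B"}, {"A"}, {"B", "C"}, {"A", "B"},
      ]

      def support(itemset, transactions):
          """Fraction of transactions containing every item in itemset."""
          return sum(itemset <= t for t in transactions) / len(transactions)

      sup_ab = support({"A", "B"}, transactions)       # sigma of the rule A -> B
      conf_ab = sup_ab / support({"A"}, transactions)  # gamma = P(B | A)
      lift_ab = conf_ab / support({"B"}, transactions) # < 1 => negative association

      print(f"support={sup_ab:.2f} confidence={conf_ab:.2f} lift={lift_ab:.2f}")

    Here the lift works out to roughly 0.86, so buying one of the items lowers the chance of buying the other.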



  • Templates (or Meta-Rules)

    • We can specify the shape of the sort of rule we are interested in. For example:
      • Temperature.Low -> *
      • * -> License.N, *
    • Note that we cannot use these templates to prune itemsets, as an itemset may later be used to form a larger itemset that includes the attributes we do require; templates are therefore applied to the rules once generated (a filtering sketch follows this list).

    • Maximum (as well as minimum) support and confidence can also be used to get rid of “the obvious”.

    • A knowledge-base can be used to store known rules.

    • We can prune a rule when an ancestor or child rule shows the association more accurately, for example:

      • A, B, C -> * σ(15%), γ(30%)
      • A, C -> * σ(18%), γ(40%)
    • The first can be pruned: the more general rule A, C -> * has both higher support and higher confidence.
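
    As a sketch of how a template such as Temperature.Low -> * might be applied, the following Python filters generated rules by shape. The rule representation, the rule list, and the helper name are assumptions for illustration, not part of the notes.

      # Rules are assumed to be (antecedent, consequent) pairs of frozensets.
      rules = [
          (frozenset({"Temperature.Low"}), frozenset({"Heating.On"})),
          (frozenset({"Age.Young"}), frozenset({"License.N", "Insurance.High"})),
          (frozenset({"Temperature.Low", "Wind.High"}), frozenset({"Heating.On"})),
      ]

      def matches_template(rule, antecedent_needs=None, consequent_needs=None):
          """True if the rule contains the required items on each side.
          A '*' in a template corresponds to leaving that side as None."""
          ante, cons = rule
          if antecedent_needs and not antecedent_needs <= ante:
              return False
          if consequent_needs and not consequent_needs <= cons:
              return False
          return True

      # Template: Temperature.Low -> *  (keeps the first and third rules)
      for rule in rules:
          if matches_template(rule, antecedent_needs={"Temperature.Low"}):
              print(rule)

    Note that the filter runs over rules, not itemsets, in line with the caution above.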



    How would you spot an anomaly or a pattern using association rule mining?

    • One method is to spot anomalies in the associations themselves, i.e. variations in the confidence of a rule (a sketch follows this list). This can detect changes in behaviour/events that raw counts would miss, because:

      • The incidence of A and B may not change significantly, only their interaction, OR
      • The incidence of A and B might change while their association strength remains static.
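
    A minimal sketch of this idea, assuming we already have the rule's confidence per time window (the values and the threshold below are illustrative):

      # Flag windows where the confidence of A -> B swings sharply,
      # even if the individual supports of A and B stay stable.
      confidence_by_month = [0.42, 0.44, 0.41, 0.43, 0.27, 0.42]

      def flag_confidence_anomalies(series, threshold=0.10):
          """Indices where confidence moves by more than threshold
          relative to the previous window."""
          return [i for i in range(1, len(series))
                  if abs(series[i] - series[i - 1]) > threshold]

      print(flag_confidence_anomalies(confidence_by_month))  # -> [4, 5]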


    We can extend the previous work by looking at the pattern formed by the support metrics.

    • For example, given that the green columns represent January, we clearly have some sort of annual cycle.

    • Combining these two ideas, anomalies within annual cycles can also be detected (a sketch follows this list).
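
    One simple test for such a cycle, assuming a series of monthly support values (the data below is illustrative), is the lag-12 autocorrelation: a value near 1 indicates a strong annual pattern.

      import statistics

      def lag_autocorrelation(series, lag):
          """Pearson correlation between the series and itself shifted by lag."""
          x, y = series[:-lag], series[lag:]
          mx, my = statistics.mean(x), statistics.mean(y)
          num = sum((a - mx) * (b - my) for a, b in zip(x, y))
          den = (sum((a - mx) ** 2 for a in x)
                 * sum((b - my) ** 2 for b in y)) ** 0.5
          return num / den

      # Two years of monthly support values with a January peak (illustrative).
      support_by_month = [0.9, 0.5, 0.4, 0.4, 0.3, 0.3,
                          0.3, 0.3, 0.4, 0.4, 0.5, 0.6] * 2
      print(lag_autocorrelation(support_by_month, 12))  # 1.0 => annual cycle

    A sudden drop in this correlation, or a January window far from its usual level, would mark an anomaly in the cycle.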





    To classify large numbers of objects into groups such that “like” (or “close”) objects are in the same cluster and “unlike” objects are in different clusters.

    • “Like” and “Unlike” (or “close” and “distant”) can be variously determined:

      • User Allocated Clustering
        • The user allocates objects to the clusters; very little computation is required.
      • User Defined Clustering
        • The user determines a criterion by which data items are classified. This does not need a high level of computational power.
      • Automated Clustering
        • The user provides the basic parameters and the data mining system then decides on the relevant criteria and allocates objects accordingly.


    These three terms can be easily confused.

      • Clustering creates the groups based on various criteria.
      • Classification tries to determine the criteria based on supplied groups.
      • Prediction takes the results of classification and applies them to new objects.
    • Two scenarios:

      • An insurance company wants to determine the optimum location for up to fifty offices in Australia. As a first pass at this problem it employs a clustering algorithm which it provides with information such as current policy holder addresses, future growth predictions, demographic data, regulatory boundaries, etc.
      • The insurance company also wants to determine the characteristics that make certain policy holders default on their payments. In this case the groupings already exist and thus a classification algorithm is employed.


    There are a number of ways in which objects can be clustered. In these notes we will concentrate on five classes:

      • Partitioning Methods
      • Hierarchical Methods
      • Density-based Methods
      • Grid-based Methods
      • Evolutionary Methods.
    • Each tries to:

      • Minimise computation time,
      • Maximise effective clustering,
      • Minimise the need for users to specify parameters,
      • Handle outliers effectively.


    Three approaches to handling outliers …

      • They are errors. They are usually either “corrected” or ignored.
        • E.g. spurious input errors.
      • They are correct but represent a small (and thus insignificant) subset of the data. In this case they are usually ignored or dealt with later.
        • E.g. people who live in remote parts of the country.
      • They are correct and represent the very area in which we are interested. In this case the focus of the algorithms is on the outliers rather than the main clusters.
        • E.g. people who recover from a potentially fatal disease.


    The k-means and k-medoids methods are two of the earliest clustering methods.

      • They are relatively simple to code (a k-means sketch follows this list).
      • They require the specification of k.
      • They have been included in a number of commercial systems such as SPSS, Clementine, etc.
      • They always produce approximately spherical clusters.
      • Both can fall into local minima.
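
    Below is a minimal k-means sketch in Python (the data is illustrative, and this is not the implementation used by any of the systems named above). It shows why k must be given up front, why the mean-based update yields roughly spherical clusters, and why a poor random initialisation can leave the result in a local minimum, hence restarts in practice.

      import random

      def kmeans(points, k, iterations=100, seed=0):
          """Naive k-means: random initial centroids, then alternating
          assignment and mean-update steps for a fixed iteration count."""
          rng = random.Random(seed)
          centroids = rng.sample(points, k)  # k must be chosen by the user
          for _ in range(iterations):
              # Assignment: each point joins its nearest centroid's cluster.
              clusters = [[] for _ in range(k)]
              for p in points:
                  nearest = min(range(k),
                                key=lambda c: sum((a - b) ** 2
                                                  for a, b in zip(p, centroids[c])))
                  clusters[nearest].append(p)
              # Update: move each centroid to the mean of its cluster
              # (an empty cluster keeps its previous centroid).
              centroids = [tuple(sum(col) / len(cl) for col in zip(*cl))
                           if cl else centroids[i]
                           for i, cl in enumerate(clusters)]
          return centroids, clusters

      points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
                (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
      centroids, clusters = kmeans(points, k=2)
      print(centroids)

    k-medoids differs only in the update step: the cluster centre must be an actual data point (the medoid), which makes it more robust to outliers.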
