A merger of (at least) four disciplines. A merger of (at least) four disciplines



Yüklə 500 b.
səhifə9/14
tarix25.07.2018
ölçüsü500 b.
#58059
1   ...   6   7   8   9   10   11   12   13   14

K-means works as follows:

  • Choose k points at random as first centroid,
  • Repeat until no movement or some specified number of iterations.
    • Assign all other objects to the closest centroid.
    • Recalculate the means from the clusters formed


k-medoids differs from k-means in that the centroid of the cluster must be a “representative” object or medoid.

  • k-medoids differs from k-means in that the centroid of the cluster must be a “representative” object or medoid.

    • This has the effect of reducing the effect of outliers to some extent.
  • K-medoids work as follows:

    • Choose k objects at random as first medoids,
    • Repeat until no movement or some specified number of iterations.
      • Assign all other objects to the closest medoid.
      • Recalculate the most central object from the clusters formed




PAM, CLARA and CLARANS are three early attempts to improve on the polynomial nature of k-medoids/k-means.

  • PAM, CLARA and CLARANS are three early attempts to improve on the polynomial nature of k-medoids/k-means.

  • PAM works as follows

    • Step 1: Initialization - choose k medoids from T objects randomly.
    • Step 2: Evaluation - calculate the cost Dt’-Dt for each swap of one medoid with one object, where Dt is the total distance before the swap and Dt’ is the total distance after the swap.
    • Step 3: Selection - if the best cost is negative, accept the swap and go to step 2; otherwise record the medoids and terminate the program.
  • PAM still quite expensive if the number of objects and/or the number of clusters is high.



CLARA works by Sampling

  • CLARA works by Sampling

    • Repeat the following steps q times
      • Step 1: Call the PAM algorithm with a random sample, s objects from the original set of T objects.
      • Step 2: Partition the T objects based on the k medoids obtained from previous step. If the medoids just found are better than those found to date based on the average distance of the partition, record them.
  • CLARANS works by Randomising the Sample and searching a graph where neighbours are solutions.

    • Repeat the following steps numlocal times.
      • Step 1: Select a node randomly and calculate the average distance of this current node, where node is the collection of k medoids.
      • Step 2: Repeat the following maxneighbour times.
        • Select a neighbour node randomly and calculate the average distance for this node. If the average distance is lower, set current node to be the neighbour node.


Many algorithms suffer from problems handing certain kinds of datasets and situations, such as:

  • Many algorithms suffer from problems handing certain kinds of datasets and situations, such as:

    • Non-spherical clusters
    • Shaped clusters
    • Clusters with outliers
    • Bridges
    • Constraints
    • Non-parametric clustering
    • Clusters of varying density


There are two main types of Hierarchical Clustering methods

  • There are two main types of Hierarchical Clustering methods

    • Agglomerative- works by assuming that the N objects are each one cluster. It then calculates the advantage of merging two clusters based on two sets of competing criteria
    • Divisive - works by assuming that at the start of the run there is one cluster and works out the benefits of splitting it.
  • Agglomerative criteria

    • The cost of having a cluster
    • The mean squared error
  • Two advantages

    • Can produce non-spherical clusters
    • Does not need k specified at the start of the run - just the criteria.
  • Major Exemplars are

    • AGNES, DIANA, BIRCH, CURE and CHAMELEON




Density methods are based on forming clusters around dense areas.

  • Density methods are based on forming clusters around dense areas.

  • Such methods:

  • Some of the major methods include DBSCAN, OPTICS and DENCLUE.





A core object is then any object that has a defined number (MinPts) of other points within the ε-neighbourhood.

  • A core object is then any object that has a defined number (MinPts) of other points within the ε-neighbourhood.



DENCLUE uses a mathematical distribution function to automatically determine clusters and thus does not need the user to specify MinPts or .

  • DENCLUE uses a mathematical distribution function to automatically determine clusters and thus does not need the user to specify MinPts or .

  • The process effectively takes a set of points and works out the contribution to the density of object and compares this with the surrounding density itself.



This class of techniques divide the space into a grid structure and reason about the objects in each cell and the neighbours to each cell.

1   ...   6   7   8   9   10   11   12   13   14




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©muhaz.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin