A merger of (at least) four disciplines.



This class of techniques divides the space into a grid structure and reasons about the objects in each cell and its neighbouring cells.

  • Can be done two ways, iteratively:

    • Split cells that have too low a density.
    • Merge cells with a similar density.


    Grid-based methods have the following advantages:

      • After going through the database once, their time complexity depends not on the number of data points but on the number of cells, so the response time can be determined in advance; i.e. the complexity is O(n) to go through the data and O(g) after that, where g is the number of cells.
      • They can be parallelised fairly easily.
      • They lend themselves to query processing.
      • Various metrics for a cell can be precomputed (such as count, max, min, stdev, etc.), which saves time.
      • They are inherently multi-resolution, which assists in finding aggregate clusters as well as more refined clusters.
    • Examples of grid-based methods: STING and CLIQUE.

    • Clustering methods can be combined - in CLIQUE, grid-based and density-based clustering are used together (see the sketch below).
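
    To make the grid-based idea concrete, the following is a minimal sketch in Python (it is not STING or CLIQUE themselves; the 2-D data, cell size and density threshold are illustrative assumptions). Points are binned into cells in a single O(n) pass, and all further reasoning is over the cells: dense cells are kept and neighbouring dense cells are joined into clusters.

        from collections import defaultdict

        def grid_cluster(points, cell_size=1.0, min_count=3):
            """Cluster 2-D points by joining neighbouring dense grid cells."""
            # One pass over the data: O(n) to assign each point to its cell.
            cells = defaultdict(list)
            for x, y in points:
                cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

            # From here on we reason only about cells: O(g), where g = number of cells.
            dense = {c for c, pts in cells.items() if len(pts) >= min_count}

            # Merge neighbouring dense cells (4-connectivity) into clusters.
            clusters, seen = [], set()
            for start in dense:
                if start in seen:
                    continue
                seen.add(start)
                stack, cluster = [start], []
                while stack:
                    cx, cy = stack.pop()
                    cluster.extend(cells[(cx, cy)])
                    for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
                clusters.append(cluster)
            return clusters

    Because each cell's count (or max, min, stdev, etc.) can be computed independently, the first pass parallelises easily, and re-running with a larger cell_size gives the coarser, multi-resolution view mentioned above.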



    Meta-heuristic (or evolutionary) algorithms are a class of algorithms that specify overall approaches to algorithm design. Many are based on analogies with the natural world. Examples include:

      • Genetic Algorithms - utilise the ideas of genetics and survival of the fittest in selecting the best examples to go forward to the next generation.
      • Ant Colony Systems - employ the idea of ants searching for food sources.
      • Neural Networks - simulate (very approximately) the manner in which the human brain works.
      • Particle Swarms - simulate the manner in which swarms of bees, etc. communicate and act.
    • Each of these can be adapted to generate clustering algorithms.

    • Some can be very mathematical in their algorithms.



    The basic genetic algorithm loop:

    • Create a set of solutions (as members of the population).

    • Repeat until the fitness criteria are satisfied or a certain number of generations is reached:

      • Randomly -
        • Cross-over - transfer segments of one solution to another,
        • Mutate - randomly alter elements within a solution.
      • Recalculate the fitness metric (i.e. the inverse of error).
      • Prune the worst solutions and replicate the best solutions.
    • Report the best solution(s) (a code sketch of this loop is given after the advantages below).

    • Advantages

      • May find many solutions (which can be useful).
      • Unlikely to get stuck in local minima.
      • Can be parallelised.
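
    The loop above can be sketched in Python as follows, assuming (as an illustration, not as part of the lecture) that a solution is a list assigning each of the n points to one of k clusters, and that fitness is the inverse of the within-cluster squared error, so lower error means fitter. Population size, k, and the mutation rate are illustrative choices.

        import random

        def sse(points, labels, k):
            """Within-cluster sum of squared distances to each cluster mean (the error)."""
            total = 0.0
            for c in range(k):
                members = [p for p, l in zip(points, labels) if l == c]
                if not members:
                    continue
                dim = len(members[0])
                mean = [sum(p[d] for p in members) / len(members) for d in range(dim)]
                total += sum(sum((p[d] - mean[d]) ** 2 for d in range(dim)) for p in members)
            return total

        def ga_cluster(points, k=3, pop_size=20, generations=50, mutation_rate=0.05):
            n = len(points)
            # Create a set of random solutions as the initial population.
            pop = [[random.randrange(k) for _ in range(n)] for _ in range(pop_size)]
            for _ in range(generations):
                children = []
                for _ in range(pop_size):
                    # Cross-over: transfer a segment of one solution to another.
                    a, b = random.sample(pop, 2)
                    cut = random.randrange(1, n)
                    child = a[:cut] + b[cut:]
                    # Mutate: randomly alter elements within the solution.
                    for i in range(n):
                        if random.random() < mutation_rate:
                            child[i] = random.randrange(k)
                    children.append(child)
                # Fitness = inverse of error: prune the worst, keep the best.
                pop = sorted(pop + children, key=lambda s: sse(points, s, k))[:pop_size]
            return pop[0]  # best cluster assignment found

    Because each child's fitness can be evaluated independently, the inner loop parallelises naturally, and keeping the whole final population (rather than just the single best member) gives the "many solutions" mentioned above.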


    • Calculating the distance between numeric attributes is relatively easy.

    • Discrete attributes are more complex, as the distance between them is not calculable from the value alone and may need either:

      • A reference to an external dataset (e.g. the ‘distance’ between cayenne, maraschino, strawberry, and salmon), or
      • Treating different values as simply different (e.g. between Male and Female, or between Sales and Management).


    • The distance between two points in Euclidean space can be calculated as the square root of the sum of the squared differences in each dimension, i.e.

          • D = √[(x1 - x2)² + … + (z1 - z2)²]
    • Objects defined over differing attributes are handled differently. For example, consider a set of objects to be clustered over AGE, GENDER and DEPARTMENT. Each term of the calculation above has to be translated to a number. E.g. for … (gender1 - gender2)² … we will have to assign difference as, say, 1 and similarity as, say, 0 (see the sketch below).

    • Note that each object will have a distance from every other object, but each object will not have an explicit point in space.
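
    A small Python sketch of this calculation, assuming the simple treatment above: numeric attributes contribute their squared difference, while discrete attributes contribute 1 if the values differ and 0 if they match (the field names and sample records are illustrative):

        from math import sqrt

        def mixed_distance(obj1, obj2, numeric_keys, discrete_keys):
            """Euclidean-style distance over a mix of numeric and discrete attributes."""
            total = 0.0
            for key in numeric_keys:                       # e.g. AGE
                total += (obj1[key] - obj2[key]) ** 2
            for key in discrete_keys:                      # e.g. GENDER, DEPARTMENT
                total += 0 if obj1[key] == obj2[key] else 1
            return sqrt(total)

        a = {"AGE": 34, "GENDER": "F", "DEPARTMENT": "Sales"}
        b = {"AGE": 38, "GENDER": "M", "DEPARTMENT": "Sales"}
        # sqrt((34 - 38)^2 + 1 + 0) = sqrt(17)
        print(mixed_distance(a, b, ["AGE"], ["GENDER", "DEPARTMENT"]))

    Calling mixed_distance for every pair of objects gives the distance matrix referred to above: every object has a distance to every other object without ever being placed at an explicit point in space.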



    • It is also then possible to assign different weightings to each attribute, to bias the importance of differences in particular attributes.

    • Thus the overall distance (error) metric might be a weighted form of the distance above (a code sketch follows), e.g.

          • D = √[wx(x1 - x2)² + … + wz(z1 - z2)²]
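
    A weighted version of the same sketch, assuming each attribute's squared difference is simply multiplied by its weight before summing (the weights shown are illustrative):

        from math import sqrt

        def weighted_distance(obj1, obj2, weights, discrete_keys=()):
            """Weighted mixed-attribute distance; larger weights make an attribute matter more."""
            total = 0.0
            for key, w in weights.items():
                if key in discrete_keys:
                    diff = 0 if obj1[key] == obj2[key] else 1
                else:
                    diff = obj1[key] - obj2[key]
                total += w * diff ** 2
            return sqrt(total)

        weights = {"AGE": 0.5, "GENDER": 2.0, "DEPARTMENT": 1.0}   # bias GENDER differences
        a = {"AGE": 34, "GENDER": "F", "DEPARTMENT": "Sales"}
        b = {"AGE": 38, "GENDER": "M", "DEPARTMENT": "Sales"}
        print(weighted_distance(a, b, weights, discrete_keys=("GENDER", "DEPARTMENT")))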



    • It is possible to deliberately create a diverse cluster by inverting the calculation of difference. For example, to create a possible set of electorates, the AEC might cluster (i.e. amend the distance calculation) to keep geographic distance (gd) low but gender (g), socio-economic status (ses), religion (r), etc. high (sketched in code below):

        • D = √[(gd1 - gd2)² + (g1 - inv(g2))² + (ses1 - inv(ses2))² + (r1 - inv(r2))²]
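
    A sketch of this inverted metric in Python, assuming (as an illustration) that each attribute has been scaled to [0, 1] so that inv(x) = 1 - x. Geographic distance is compared normally, while gender, socio-economic status and religion are compared against the inverted value, so areas that differ on those attributes look "close" and end up clustered together:

        from math import sqrt

        def diverse_distance(p1, p2):
            """Distance that rewards geographic closeness but demographic diversity."""
            inv = lambda v: 1.0 - v                     # attributes assumed scaled to [0, 1]
            return sqrt(
                (p1["gd"] - p2["gd"]) ** 2              # keep geographic distance low
                + (p1["g"] - inv(p2["g"])) ** 2         # but favour a differing gender mix
                + (p1["ses"] - inv(p2["ses"])) ** 2     # differing socio-economic status
                + (p1["r"] - inv(p2["r"])) ** 2         # differing religion profile
            )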




