A merger of (at least) four disciplines

Date: 25.07.2018
Apriori needs to traverse the database once for each level, so if there are n transactions and k levels of itemsets then the routine is O(nk).
It cannot handle:

Hierarchies
Categorical Values
Continuous Values
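
The one-traversal-per-level behaviour noted above can be illustrated with a minimal level-wise sketch in Python. It is an illustration only, not a tuned implementation: the function name apriori, the min_count parameter and the toy transactions are assumptions introduced here, and the initial scan that collects the distinct items is not counted as a pass.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Level-wise frequent itemset search.

    Returns (frequent itemsets with their counts, number of counting passes).
    """
    # Level-1 candidates: every distinct item as a 1-itemset.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    passes = 0
    k = 1
    while candidates:
        # One full traversal of the database for this level.
        passes += 1
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, passes

# Hypothetical toy database of five transactions.
transactions = [
    frozenset("abc"), frozenset("ab"), frozenset("ac"),
    frozenset("bc"), frozenset("a"),
]
frequent, passes = apriori(transactions, min_count=2)
```

Each iteration of the while loop reads all n transactions once, which is where the O(nk) figure comes from.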

It has been improved on considerably by a number of newer algorithms:

FP Growth, for example, finds frequent itemsets without generating candidates first.
There are multi-level association rule generation routines that accommodate concept hierarchies.
There are routines that handle time and space.
There are routines that work on parallel machines.
There are routines that spot competitor items, and so on.

While Apriori uses transaction databases, conventional relational databases can also be used. For example, the following relation:

Id      Sex  Department  Age  License?  Location
21788A  M    Sales       31   Y         Adelaide
21771H  F    Mgmt        42   N         Sydney
12299I  M    Payroll     …
:

can be translated to a transaction dataset:

Id.21788A Sex.M, Department.Sales, Age.31, License.Y, Location.Adelaide
Id.21771H Sex.F, Department.Mgmt, Age.42, License.N, Location.Sydney
Id.12299I Sex.M, Department.Payroll …
:
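
The translation from relation to transactions can be sketched in a few lines of Python. The function name row_to_transaction and the dict-per-row representation are assumptions made for illustration; only the two complete rows from the relation above are used.

```python
def row_to_transaction(row, id_field="Id"):
    """Turn one relational row into (transaction id, set of Attribute.Value items)."""
    tid = f"{id_field}.{row[id_field]}"
    items = {
        # Drop the trailing '?' of a column name such as "License?".
        f"{attr.rstrip('?')}.{value}"
        for attr, value in row.items()
        if attr != id_field
    }
    return tid, items

rows = [
    {"Id": "21788A", "Sex": "M", "Department": "Sales",
     "Age": 31, "License?": "Y", "Location": "Adelaide"},
    {"Id": "21771H", "Sex": "F", "Department": "Mgmt",
     "Age": 42, "License?": "N", "Location": "Sydney"},
]
transactions = dict(row_to_transaction(r) for r in rows)
```

Prefixing each value with its attribute name keeps values from different columns distinct, so that, for instance, a Y in one column cannot be confused with a Y in another.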

This can then be mined to create rules such as:

Sex.M, Department.Sales -> License.Y σ(15%), γ(72%)
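
Assuming σ denotes support (the fraction of transactions containing every item in the rule) and γ denotes confidence (the support of the whole rule divided by the support of its left-hand side), the two figures attached to a rule could be computed as in the sketch below. The function names and the four toy transactions are hypothetical; they are not the data behind the 15% and 72% quoted above.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Support of antecedent and consequent together, relative to the antecedent.

    Assumes the antecedent occurs at least once (otherwise this divides by zero).
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)

# Hypothetical toy data.
toy = [
    {"Sex.M", "Department.Sales", "License.Y"},
    {"Sex.M", "Department.Sales", "License.N"},
    {"Sex.F", "Department.Mgmt", "License.N"},
    {"Sex.M", "Department.Sales", "License.Y"},
]
lhs = {"Sex.M", "Department.Sales"}
rhs = {"License.Y"}
rule_support = support(toy, lhs | rhs)
rule_confidence = confidence(toy, lhs, rhs)
```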

Note that three attributes need special attention:

Id will never appear in a rule, as it will never have the required support.

This is generally desirable as it effectively confidentialises the rules.

Age, if left as it is in the database, is unlikely to be part of a rule as the values are too diverse. Thus we need to segment (or discretize) the range, e.g.

Age.31-40, Age.41-50, etc.
The problem is often deciding on the ranges to use:

Some are predetermined by the interests of the user.
Others may be derived automatically; there are routines to do this, such as binning and analyzing association rules after generation.
Finally, Age can be put in a hierarchy, as for Location below.
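
As a minimal sketch of the segmentation step, the function below maps an age onto fixed-width ranges of the form used above (31-40, 41-50). The name age_item and the fixed width of ten years are assumptions; fixed-width ranges are only one simple choice among those discussed.

```python
def age_item(age, width=10):
    """Map an integer age to an item such as 'Age.31-40' (ranges start at 1, 11, 21, ...)."""
    low = ((age - 1) // width) * width + 1
    return f"Age.{low}-{low + width - 1}"

items = [age_item(a) for a in (31, 40, 42)]
```

With ranges in place of raw values, many records share the same Age item, which gives it a chance of reaching the required support.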

Location may be interpreted better if it is accommodated as part of a hierarchy, for example with Adelaide placed under SA.
Thus while a rule
Sex.M, Location.Adelaide -> License.Y
may not have the required support,
Sex.M, Location.SA -> License.Y
might have.
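
One simple way to let rules form at either level is to add each value's ancestors to the transaction before mining. The sketch below assumes this approach; the PARENT mapping is a hypothetical fragment of a location hierarchy and add_ancestors is a name introduced here.

```python
# Hypothetical fragment of a location hierarchy: child -> parent.
PARENT = {"Adelaide": "SA", "Sydney": "NSW"}

def add_ancestors(transaction, attribute="Location", parent=PARENT):
    """Return a copy of the transaction with every ancestor of the attribute's value added.

    Assumes the hierarchy is acyclic.
    """
    items = set(transaction)
    prefix = attribute + "."
    for item in transaction:
        if item.startswith(prefix):
            value = item[len(prefix):]
            # Walk up the hierarchy until the top is reached.
            while value in parent:
                value = parent[value]
                items.add(prefix + value)
    return items

extended = add_ancestors({"Sex.M", "Location.Adelaide", "License.Y"})
```

Because Location.SA is shared by every town in the state, it accumulates more support than any single town can.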

There are a few problems with hierarchies:

As the hierarchy is ascended, the support level needed for something to make sense increases. For example:
Peas -> Coffee
might be considered interesting, but:
Grocery -> Beverage
would have to reach a very high level of support before it would be considered useful.
There may be more than one hierarchy - which one do we use?

If they are (supposed to be) independent then we can use them all, as an association between them might be interesting.
If there are known dependencies (e.g. between public holidays and the day of the week), then we must stop such associations dominating our rules.

Linking in external files as additional attributes can be extremely useful. For example, on the admission date field on a hospital record we might link:

Meteorological data
Public holiday data
Day of the week
Pollen Count
Lunar Cycles

Similarly, on home postcode we might link:

SLA data
Census data
Geographic data, and so on.
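
As a rough sketch of linking on a date field, the function below adds two derived items to a transaction: the day of the week, computed from the date itself, and a public-holiday flag looked up in an external set. The function name, the item names and the holiday set are hypothetical.

```python
from datetime import date

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def link_date_attributes(transaction, admitted, public_holidays):
    """Return a copy of the transaction with date-derived items added."""
    items = set(transaction)
    # date.weekday() returns 0 for Monday through 6 for Sunday.
    items.add(f"DayOfWeek.{DAY_NAMES[admitted.weekday()]}")
    flag = "Y" if admitted in public_holidays else "N"
    items.add(f"PublicHoliday.{flag}")
    return items

# Hypothetical external holiday file, reduced to a set of dates.
holidays = {date(2018, 12, 25)}
linked = link_date_attributes({"Sex.F"}, date(2018, 12, 25), holidays)
```

Meteorological data or pollen counts could be attached in the same way, with a lookup keyed on the admission date.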

As for hierarchies, we have to ensure that known associations between externally linked data do not dominate the mining routine. When mining hospital data, for example, we do not want to discover that:

Temperature.Low -> PollenCount.Low
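
One possible safeguard, sketched here as an assumption rather than a standard routine, is to discard any rule built entirely from externally linked attributes. Rules are represented as (antecedent, consequent) pairs of frozensets; the function names and the Diagnosis.Asthma item are hypothetical.

```python
def attribute(item):
    """Return the attribute part of an 'Attribute.Value' item."""
    return item.split(".", 1)[0]

def is_known_association(rule, linked_attributes):
    """True if every item in the rule comes from an externally linked attribute."""
    antecedent, consequent = rule
    return all(attribute(i) in linked_attributes for i in antecedent | consequent)

def drop_known(rules, linked_attributes):
    """Keep only rules that involve at least one attribute of the original records."""
    return [r for r in rules if not is_known_association(r, linked_attributes)]

linked_attributes = {"Temperature", "PollenCount"}
rules = [
    (frozenset({"Temperature.Low"}), frozenset({"PollenCount.Low"})),
    (frozenset({"Temperature.Low"}), frozenset({"Diagnosis.Asthma"})),
]
kept = drop_known(rules, linked_attributes)
```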

Consider the following three rules

A -> C σ(15%), γ(72%)
B -> C σ(16%), γ(67%)
AB -> C σ(3%), γ(12%)

The last rule is clearly interesting: it implies that while A and B are each associated with C on their own, A and B rarely occur together, and when they do they are only loosely correlated with C.
This is the common form of association rules for competitors, i.e. the existence of one suppresses the other and vice versa.
The common way to identify competitors is to work out the expected support and confidence and to compare them with what was observed.

Consider the following

Transactions - 100,000
Purchases of A - 50,000
Purchases of B - 40,000
Purchases of A and B - 15,000
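
A minimal sketch of the expected-versus-observed comparison for these figures follows. It assumes that "expected" means the joint count that would arise if purchases of A and B were statistically independent; the function and variable names are introduced here for illustration.

```python
def expected_joint_count(n, count_a, count_b):
    """Joint purchases expected if A and B were bought independently."""
    return count_a * count_b / n

n, count_a, count_b, observed_ab = 100_000, 50_000, 40_000, 15_000
expected_ab = expected_joint_count(n, count_a, count_b)
# A ratio below 1 means A and B co-occur less often than independence predicts.
ratio = observed_ab / expected_ab
```

Under that assumption 20,000 joint purchases would be expected against the 15,000 observed, a ratio of 0.75, which points in the direction of A and B suppressing one another.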
