Prediction uses that model together with new data to predict new values.
Predictive Accuracy - how well does the model predict new values?
Capability - how well does the process handle different forms of data?
Speed - how fast is the model building and/or the prediction process?
Robustness - how well does the model deal with spurious data?
Scalability - how well does the process handle increases in the volume of data?
Interpretability - how understandable are the results?
Incrementation - how well can the process update its model when given new data?
Class Attribute - the attribute that holds the different values for the groups we want to classify. E.g. if we are trying to classify n different types of disease, then diseasetype (which would have n distinct values) would be the class attribute.
Sample - part of a dataset. In classification terms, almost always a partition based on the class attribute.
Test Attribute - another attribute we use to split the dataset into samples such that the diversity of values of the class attribute is lower.
Entropy - amount of disorder with respect to the class attribute.
Information (Gain) - a theoretical measure of (the increase in) knowledge held by a given sample.
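A minimal sketch of the entropy calculation in Python, assuming a sample is represented simply as a list of class-attribute values (the representation and function name are illustrative, not from the source):

    import math
    from collections import Counter

    def entropy(class_values):
        """Disorder of a sample with respect to the class attribute, in bits."""
        total = len(class_values)
        return -sum((n / total) * math.log2(n / total)
                    for n in Counter(class_values).values())

    print(entropy(["yes", "yes", "no", "no"]))    # 1.0: maximum disorder
    print(entropy(["yes", "yes", "yes", "yes"]))  # -0.0, i.e. zero: a pure sample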
Select the class attribute.
Repeat until all attributes used up or tree complete
For each branch (each sample at that point in the tree)
If not yet sufficiently accurate
Select the attribute (and the values of that attribute) that best distinguishes the different values in the target attribute
Based (loosely) on the physics concept of Entropy (the idea of disorder)
We choose the attribute that most decreases entropy (i.e. has the greatest information gain) - the attribute that best separates the values of the target attribute. This may be a different attribute for each branch.
Steps are as follows (full equations given in Han and Kamber, p286):
Determine the information needed to classify a given dataset on a specified target attribute by working out the sum of the information needed to classify each sample (each distinct value) of the target attribute.
Taking each attribute in turn, work out the entropy of splitting on that attribute by summing the (weighted) information for each subset formed by a value of that attribute.
The Gain for that attribute is thus the original information less the entropy (the disorder) remaining.
Choose the attribute with the highest Gain (i.e. the attribute that results in the least disorder with respect to the target attribute).
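In the usual notation, Info(D) = -sum_i p_i log2(p_i), Info_A(D) = sum_j (|Dj|/|D|) * Info(Dj), and Gain(A) = Info(D) - Info_A(D). A rough sketch of those steps in code, reusing the entropy helper above; here rows are assumed to be dicts keyed by attribute name (an illustrative representation, not the book's):

    def partition(rows, attr):
        """Split rows into subsets, one per distinct value of attr."""
        subsets = {}
        for r in rows:
            subsets.setdefault(r[attr], []).append(r)
        return subsets

    def info_gain(rows, attr, target):
        """Gain(attr) = Info(D) - Info_attr(D), following the steps above."""
        # Step 1: information needed to classify D on the target attribute.
        info_d = entropy([r[target] for r in rows])
        # Step 2: weighted information remaining after splitting on attr.
        info_attr = sum(
            len(subset) / len(rows) * entropy([r[target] for r in subset])
            for subset in partition(rows, attr).values())
        # Steps 3-4: gain = original information less the remaining disorder.
        return info_d - info_attr

    def best_attribute(rows, attrs, target):
        """The attribute with the highest Gain, i.e. the least remaining disorder."""
        return max(attrs, key=lambda a: info_gain(rows, a, target))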
Run down the tree in a depth-first manner, creating the rules from the attribute-value pairs on each limb (see the sketch below).
Amalgamate (and maybe simplify) the rules that end up predicting the same value.
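As an illustration of the walk, assuming a toy tree representation in which a node is either a class value (a leaf) or an (attribute, branches) pair - the representation and the weather example are illustrative only:

    def tree_to_rules(node, path=()):
        """Depth-first walk: each root-to-leaf limb becomes one rule."""
        if not isinstance(node, tuple):   # leaf: a predicted class value
            yield (path, node)            # (attribute-value pairs, prediction)
        else:
            attr, branches = node
            for value, subtree in branches.items():
                yield from tree_to_rules(subtree, path + ((attr, value),))

    tree = ("outlook", {
        "sunny": ("humidity", {"high": "no", "normal": "yes"}),
        "overcast": "yes",
        "rain": "yes",
    })
    for conds, pred in tree_to_rules(tree):
        print("IF", " AND ".join(f"{a}={v}" for a, v in conds), "THEN play =", pred)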
Pruning Techniques:
Prepruning - stop growing the tree once a branch is sufficiently accurate.
Postpruning - create the tree to completion and then remove either:
Branches that are unnecessarily complex, or
Rule terms that do not affect (or have little effect on) the accuracy of the result.
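As a sketch of that second postpruning case, a greedy variant (my own illustration, not the book's procedure) that drops any rule term whose removal does not lower the rule's accuracy on the data:

    def rule_accuracy(conds, rows, target, pred):
        """Of the rows a rule covers, the fraction it classifies correctly."""
        covered = [r for r in rows if all(r[a] == v for a, v in conds)]
        return sum(r[target] == pred for r in covered) / len(covered) if covered else 0.0

    def postprune_rule(conds, rows, target, pred):
        """Greedily drop terms that have no (or a positive) effect on accuracy."""
        conds = list(conds)
        pruned = True
        while pruned and conds:
            pruned = False
            base = rule_accuracy(conds, rows, target, pred)
            for term in conds:
                trial = [t for t in conds if t != term]
                if rule_accuracy(trial, rows, target, pred) >= base:
                    conds, pruned = trial, True
                    break
        return conds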