Classification-rule learning involves finding rules or decision trees that partition given data into predefined classes. For any realistic problem domain of the classification-rule learning, the set of possible decision trees is too large to be searched exhaustively. In fact, the computational complexity of finding an optimal classification decision tree is NP hard.
Most of the existing induction-based algorithms use Hunt′s method as the basic algorithm.[2] Here is a recursive description of Hunt′s method for constructing a decision tree from a set T of training cases with classes denoted {C1, C2, … ,Ck }. 数据挖掘研究院
Case 1 T contains one or more cases, all belonging to a single class Cj : The decision tree for T is a leaf identifying class Cj .
Case 2 T contains no cases: The decision tree for T is a leaf, but the class to be associated with the leaf must be determined from information other than T.
Case 3 T contains cases that belong to a mixture of classes: A test is chosen, based on a single attribute, that has one or more mutually exclusive outcomes {O1, O2 , .. ,On }. T is partitioned into subsets T1, T2, … ,Tn , where Ti contains all the cases in T that have outcome Oi of the chosen test. The decision tree for T consists of a decision node identifying the test, and one branch for each possible outcome. The same tree building machinery is applied recursively to each subset of training cases.

| Attribute Value | ||
| Play | Don′t Play | |
| sunny | 2 | 3 |
| overcast | 4 | 0 |
| rain | 3 | 2 |
| Attribute Value | Binary Test | ||
| Play | Don′t Play | ||
| 65 | 1 | 0 | |
| > | 8 | 5 | |
| 70 | 3 | 1 | |
| > | 6 | 4 | |
| 75 | 4 | 1 | |
| > | 5 | 4 | |
| 78 | 5 | 1 | |
| > | 4 | 4 | |
| 80 | 7 | 2 | |
| > | 2 | 3 | |
| 85 | 7 | 3 | |
| > | 2 | 2 | |
| 90 | 8 | 4 | |
| > | 1 | 1 | |
| 95 | 8 | 5 | |
| > | 1 | 0 | |
| 96 | 9 | 5 | |
| > | 0 | 0 | |
Table 2.1 shows a training data set with four data attributes and two classes. Figure 2.1 shows how the Hunt′s method works with the training data set. In the case 3 of the Hunt′s method, a test based on a single attribute is chosen for expanding the current node. The choice of an attribute is normally based on the entropy gains of the attributes. The entropy of an attribute is calculated from class distribution information. For a discrete attribute, class distribution information of each value of the attribute is required. Table 2.2 shows the class distribution information of data attribute Outlook. For a continuous attribute, binary test involving all the distinct values of the attribute is considered. Table 2.3 shows the class distribution information of data attribute Humidity. Once the class distribution information of all the attributes are gathered, the entropy is calculated based on either information theory or Gini Index. One attribute with the most entropy gain is selected as a test for the node expansion. 数据挖掘实验室
This algorithm was proposed by Quinlan (1993). The C4.5 algorithm generates a classification-decision tree for the given data-set by recursive partitioning of data. The decision is grown using Depth-first strategy. The algorithm considers all the possible tests that can split the data set and selects a test that gives the best information gain. For each discrete attribute, one test with outcomes as many as the number of distinct values of the attribute is considered. For each continuous attribute, binary tests involving every distinct values of the attribute are considered. In order to gather the entropy gain of all these binary tests efficiently, the training data set belonging to the node in consideration is sorted for the values of the continuous attribute and the entropy gains of the binary cut based on each distinct values are calculated in one scan of the sorted data. This process is repeated for each continuous attributes. SLIQ (Supervised Learning In Quest) developed by IBM′s Quest project team, is a decision tree classifier designed to classify large training data [1]. It uses a pre-sorting technique in the tree-growth phase. This helps avoid costly sorting at each node. SLIQ keeps a separate sorted list for each continuous attribute and a separate list called class list. An entry in the class list corresponds to a data item, and has a class label and name of the node it belongs in the decision tree. An entry in the sorted attribute list has an attribute value and the index of data item in the class list. SLIQ grows the decision tree in breadth-first manner. For each attribute, it scans the corresponding sorted list and calculate entropy values of each distinct values of all the nodes in the frontier of the decision tree simultaneously. After the entropy values have been calculated for each attribute, one attribute is chosen for a split for each nodes in the current frontier, and they are expanded to have a new frontier. Then one more scan of the sorted attribute list is performed to update the class list for the new nodes.
While SLIQ handles disk-resident data that are too large to fit in memory, it still requires some information to stay memory-resident which grows in direct proportion to the number of input records, putting a hard-limit on the size of training data. The Quest team has recently designed a new decision-tree-based classification algorithm, called SPRINT (Scalable PaRallelizable INduction of decision Trees) that for the removes all of the memory restrictions.
Nearest-neighbor The classical nearest-neighbor with options for weight setting, normalizations, and editing (Dasarathy 1990, Aha 1992, Wettschereck 1994).
Naive-Bayes A simple induction algorithm that assumes a conditional independence model of attributes given the label (Domingos & Pazzani 1996, Langley, Iba & Thompson 1992, Duda & Hart 1973, Good 1965).
OODG Oblivous read-Once Decision Graph induction algorithm described in Kohavi (1995c).
Lazy decision trees An algorithm for building the ``best′′ decision tree for every test instance described in Friedman, Kohavi & Yun (1996). 数据挖掘研究院
Decision Table A simple lookup table. A simple algorithm that is useful with feature subset selection 数据挖掘研究院

