public class SimpleCart extends RandomizableClassifier implements AdditionalMeasureProducer, TechnicalInformationHandler
@book{Breiman1984,
address = {Belmont, California},
author = {Leo Breiman and Jerome H. Friedman and Richard A. Olshen and Charles J. Stone},
publisher = {Wadsworth International Group},
title = {Classification and Regression Trees},
year = {1984}
}
Valid options are:
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-M <min no> The minimal number of instances at the terminal nodes. (default 2)
-N <num folds> The number of folds used in the minimal cost-complexity pruning. (default 5)
-U Don't use the minimal cost-complexity pruning. (default yes).
-H Don't use the heuristic method for binary split. (default true).
-A Use 1 SE rule to make pruning decision. (default no).
-C Percentage of training data size (0-1]. (default 1).
| Modifier and Type | Field and Description |
|---|---|
protected double |
m_Alpha
Alpha-value (for pruning) at the node.
|
protected Attribute |
m_Attribute
Attribute used to split data.
|
protected Attribute |
m_ClassAttribute
Class attriubte of data.
|
protected double[] |
m_ClassProbs
Class probabilities.
|
protected double |
m_ClassValue
Class value if the node is leaf.
|
protected double[] |
m_Distribution
Distributions of leaf node (or temporary leaf node in minimal cost-complexity pruning)
|
protected boolean |
m_Heuristic
If use huristic search for nominal attributes in multi-class problems (default true).
|
protected boolean |
m_isLeaf
Indicate if the node is a leaf node.
|
protected double |
m_minNumObj
Minimum number of instances in at the terminal nodes.
|
protected int |
m_numFoldsPruning
Number of folds for minimal cost-complexity pruning.
|
protected double |
m_numIncorrectModel
Number of training examples misclassified by the model (subtree rooted).
|
protected double |
m_numIncorrectTree
Number of training examples misclassified by the model (subtree not rooted).
|
protected double[] |
m_Props
Proportion for each branch.
|
protected boolean |
m_Prune
If use minimal cost-compexity pruning.
|
protected double |
m_SizePer
Training data size.
|
protected String |
m_SplitString
Split subset used to split data for nominal attributes.
|
protected double |
m_SplitValue
Split point for a numeric attribute.
|
protected SimpleCart[] |
m_Successors
Successor nodes.
|
protected int |
m_totalTrainInstances
Total number of instances used to build the classifier.
|
protected Instances |
m_train
Training data.
|
protected boolean |
m_UseOneSE
If use the 1SE rule to make final decision tree.
|
m_Seedm_Debug| Constructor and Description |
|---|
SimpleCart() |
| Modifier and Type | Method and Description |
|---|---|
void |
buildClassifier(Instances data)
Build the classifier.
|
void |
calculateAlphas()
Updates the alpha field for all nodes.
|
protected double |
computeGini(double[] dist,
double total)
Compute and return gini index for a given distribution of a node.
|
protected double |
computeGiniGain(double[] parentDist,
double[][] childDist)
Compute and return gini gain for given distributions of a node and its
successor nodes.
|
protected double |
computeSortedInfo(Instances data,
int[][] sortedIndices,
double[][] weights,
double[] classProbs)
Compute sorted indices, weights and class probabilities for a given
dataset.
|
double[] |
distributionForInstance(Instance instance)
Computes class probabilities for instance using the decision tree.
|
Enumeration |
enumerateMeasures()
Return an enumeration of the measure names.
|
protected void |
fillInnerNodes(Vector nodeList)
Fills a list with all inner nodes in the tree.
|
Capabilities |
getCapabilities()
Returns default capabilities of the classifier.
|
boolean |
getHeuristic()
Get if use heuristic search for nominal attributes in multi-class problems.
|
protected Vector |
getInnerNodes()
Return a list of all inner nodes in the tree.
|
double |
getMeasure(String additionalMeasureName)
Returns the value of the named measure.
|
double |
getMinNumObj()
Get minimal number of instances at the terminal nodes.
|
int |
getNumFoldsPruning()
Set number of folds in internal cross-validation.
|
String[] |
getOptions()
Gets the current settings of the classifier.
|
String |
getRevision()
Returns the revision string.
|
double |
getSizePer()
Get training set size.
|
TechnicalInformation |
getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing
detailed information about the technical background of this class,
e.g., paper reference or book this class is based on.
|
boolean |
getUseOneSE()
Get if use the 1SE rule to choose final model.
|
boolean |
getUsePrune()
Get if use minimal cost-complexity pruning.
|
String |
globalInfo()
Return a description suitable for displaying in the explorer/experimenter.
|
String |
heuristicTipText()
Returns the tip text for this property
|
Enumeration |
listOptions()
Returns an enumeration describing the available options.
|
static void |
main(String[] args)
Main method.
|
protected void |
makeLeaf(Instances data)
Make the node leaf node.
|
protected void |
makeTree(Instances data,
int totalInstances,
int[][] sortedIndices,
double[][] weights,
double[] classProbs,
double totalWeight,
double minNumObj,
boolean useHeuristic)
Make binary decision tree recursively.
|
double |
measureTreeSize()
Return number of tree size.
|
String |
minNumObjTipText()
Returns the tip text for this property
|
void |
modelErrors()
Updates the numIncorrectModel field for all nodes when subtree (to be
pruned) is rooted.
|
protected SimpleCart |
nodeToPrune(Vector nodeList)
Find the node with minimal alpha value.
|
protected String |
nominalDistribution(double[][] props,
double[][][] dists,
Attribute att,
int[] sortedIndices,
double[] weights,
double[][] subsetWeights,
double[] giniGains,
Instances data,
boolean useHeuristic)
Compute distributions, proportions and total weights of two successor
nodes for a given nominal attribute.
|
protected double |
numericDistribution(double[][] props,
double[][][] dists,
Attribute att,
int[] sortedIndices,
double[] weights,
double[][] subsetWeights,
double[] giniGains,
Instances data)
Compute distributions, proportions and total weights of two successor
nodes for a given numeric attribute.
|
String |
numFoldsPruningTipText()
Returns the tip text for this property
|
int |
numInnerNodes()
Method to count the number of inner nodes in the tree.
|
int |
numLeaves()
Compute number of leaf nodes.
|
int |
numNodes()
Compute size of the tree.
|
void |
prune(double alpha)
Prunes the original tree using the CART pruning scheme, given a
cost-complexity parameter alpha.
|
int |
prune(double[] alphas,
double[] errors,
Instances test)
Method for performing one fold in the cross-validation of minimal
cost-complexity pruning.
|
void |
setHeuristic(boolean value)
Set if use heuristic search for nominal attributes in multi-class problems.
|
void |
setMinNumObj(double value)
Set minimal number of instances at the terminal nodes.
|
void |
setNumFoldsPruning(int value)
Set number of folds in internal cross-validation.
|
void |
setOptions(String[] options)
Parses a given list of options.
|
void |
setSizePer(double value)
Set training set size.
|
void |
setUseOneSE(boolean value)
Set if use the 1SE rule to choose final model.
|
void |
setUsePrune(boolean value)
Set if use minimal cost-complexity pruning.
|
String |
sizePerTipText()
Returns the tip text for this property
|
protected void |
splitData(int[][][] subsetIndices,
double[][][] subsetWeights,
Attribute att,
double splitPoint,
String splitStr,
int[][] sortedIndices,
double[][] weights,
Instances data)
Split data into two subsets and store sorted indices and weights for two
successor nodes.
|
String |
toString()
Prints the decision tree using the protected toString method from below.
|
protected String |
toString(int level)
Outputs a tree at a certain level.
|
void |
treeErrors()
Updates the numIncorrectTree field for all nodes.
|
protected void |
unprune()
Method to "unprune" the CART tree.
|
String |
useOneSETipText()
Returns the tip text for this property
|
String |
usePruneTipText()
Return the tip text for this property
|
getSeed, seedTipText, setSeedclassifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, runClassifier, setDebugprotected Instances m_train
protected SimpleCart[] m_Successors
protected Attribute m_Attribute
protected double m_SplitValue
protected String m_SplitString
protected double m_ClassValue
protected Attribute m_ClassAttribute
protected double m_minNumObj
protected int m_numFoldsPruning
protected double m_Alpha
protected double m_numIncorrectModel
protected double m_numIncorrectTree
protected boolean m_isLeaf
protected boolean m_Prune
protected int m_totalTrainInstances
protected double[] m_Props
protected double[] m_ClassProbs
protected double[] m_Distribution
protected boolean m_Heuristic
protected boolean m_UseOneSE
protected double m_SizePer
public String globalInfo()
public TechnicalInformation getTechnicalInformation()
getTechnicalInformation in interface TechnicalInformationHandlerpublic Capabilities getCapabilities()
getCapabilities in interface CapabilitiesHandlergetCapabilities in class ClassifierCapabilitiespublic void buildClassifier(Instances data) throws Exception
buildClassifier in class Classifierdata - the training instancesException - if something goes wrongprotected void makeTree(Instances data, int totalInstances, int[][] sortedIndices, double[][] weights, double[] classProbs, double totalWeight, double minNumObj, boolean useHeuristic) throws Exception
data - the training instancestotalInstances - total number of instancessortedIndices - sorted indices of the instancesweights - weights of the instancesclassProbs - class probabilitiestotalWeight - total weight of instancesminNumObj - minimal number of instances at leaf nodesuseHeuristic - if use heuristic search for nominal attributes in multi-class problemException - if something goes wrongpublic void prune(double alpha)
throws Exception
alpha - the cost-complexity parameterException - if something goes wrongpublic int prune(double[] alphas,
double[] errors,
Instances test)
throws Exception
alphas - array to hold the generated alpha-valueserrors - array to hold the corresponding error estimatestest - test set of that fold (to obtain error estimates)Exception - if something goes wrongprotected void unprune()
protected double numericDistribution(double[][] props,
double[][][] dists,
Attribute att,
int[] sortedIndices,
double[] weights,
double[][] subsetWeights,
double[] giniGains,
Instances data)
throws Exception
props - proportions of each two branches for each attributedists - class distributions of two branches for each attributeatt - numeric att split onsortedIndices - sorted indices of instances for the attirubteweights - weights of instances for the attirbutesubsetWeights - total weight of two branches split based on the attributeginiGains - Gini gains for each attributedata - training instancesException - if something goes wrongprotected String nominalDistribution(double[][] props, double[][][] dists, Attribute att, int[] sortedIndices, double[] weights, double[][] subsetWeights, double[] giniGains, Instances data, boolean useHeuristic) throws Exception
props - proportions of each two branches for each attributedists - class distributions of two branches for each attributeatt - numeric att split onsortedIndices - sorted indices of instances for the attirubteweights - weights of instances for the attirbutesubsetWeights - total weight of two branches split based on the attributeginiGains - Gini gains for each attributedata - training instancesuseHeuristic - if use heuristic searchException - if something goes wrongprotected void splitData(int[][][] subsetIndices,
double[][][] subsetWeights,
Attribute att,
double splitPoint,
String splitStr,
int[][] sortedIndices,
double[][] weights,
Instances data)
throws Exception
subsetIndices - sorted indecis of instances for each attribute
for two successor nodesubsetWeights - weights of instances for each attribute for
two successor nodeatt - attribute the split based onsplitPoint - split point the split based on if att is numericsplitStr - split subset the split based on if att is nominalsortedIndices - sorted indices of the instances to be splitweights - weights of the instances to bes splitdata - training dataException - if something goes wrongpublic void modelErrors()
throws Exception
Exception - if something goes wrongpublic void treeErrors()
throws Exception
Exception - if something goes wrongpublic void calculateAlphas()
throws Exception
Exception - if something goes wrongprotected SimpleCart nodeToPrune(Vector nodeList)
nodeList - list of inner nodesprotected double computeSortedInfo(Instances data, int[][] sortedIndices, double[][] weights, double[] classProbs) throws Exception
data - training datasortedIndices - sorted indices of instances at the nodeweights - weights of instances at the nodeclassProbs - class probabilities at the nodeException - if something goes wrongprotected double computeGiniGain(double[] parentDist,
double[][] childDist)
parentDist - class distributions of parent nodechildDist - class distributions of successor nodesprotected double computeGini(double[] dist,
double total)
dist - class distributionstotal - class distributionspublic double[] distributionForInstance(Instance instance) throws Exception
distributionForInstance in class Classifierinstance - the instance for which class probabilities is to be computedException - if something goes wrongprotected void makeLeaf(Instances data)
data - trainging datapublic String toString()
protected String toString(int level)
level - the level at which the tree is to be printedpublic int numNodes()
public int numInnerNodes()
protected Vector getInnerNodes()
protected void fillInnerNodes(Vector nodeList)
nodeList - the list to be filledpublic int numLeaves()
public Enumeration listOptions()
listOptions in interface OptionHandlerlistOptions in class RandomizableClassifierpublic void setOptions(String[] options) throws Exception
-S <num> Random number seed. (default 1)
-D If set, classifier is run in debug mode and may output additional info to the console
-M <min no> The minimal number of instances at the terminal nodes. (default 2)
-N <num folds> The number of folds used in the minimal cost-complexity pruning. (default 5)
-U Don't use the minimal cost-complexity pruning. (default yes).
-H Don't use the heuristic method for binary split. (default true).
-A Use 1 SE rule to make pruning decision. (default no).
-C Percentage of training data size (0-1]. (default 1).
setOptions in interface OptionHandlersetOptions in class RandomizableClassifieroptions - the list of options as an array of stringsException - if an options is not supportedpublic String[] getOptions()
getOptions in interface OptionHandlergetOptions in class RandomizableClassifierpublic Enumeration enumerateMeasures()
enumerateMeasures in interface AdditionalMeasureProducerpublic double measureTreeSize()
public double getMeasure(String additionalMeasureName)
getMeasure in interface AdditionalMeasureProduceradditionalMeasureName - the name of the measure to query for its valueIllegalArgumentException - if the named measure is not supportedpublic String minNumObjTipText()
public void setMinNumObj(double value)
value - minimal number of instances at the terminal nodespublic double getMinNumObj()
public String numFoldsPruningTipText()
public void setNumFoldsPruning(int value)
value - number of folds in internal cross-validation.public int getNumFoldsPruning()
public String usePruneTipText()
public void setUsePrune(boolean value)
value - if use minimal cost-complexity pruningpublic boolean getUsePrune()
public String heuristicTipText()
public void setHeuristic(boolean value)
value - if use heuristic search for nominal attributes in
multi-class problemspublic boolean getHeuristic()
public String useOneSETipText()
public void setUseOneSE(boolean value)
value - if use the 1SE rule to choose final modelpublic boolean getUseOneSE()
public String sizePerTipText()
public void setSizePer(double value)
value - training set sizepublic double getSizePer()
public String getRevision()
getRevision in interface RevisionHandlergetRevision in class Classifierpublic static void main(String[] args)
args - the options for the classifierCopyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.