public class Discretize extends PotentialClassIgnorer implements UnsupervisedFilter, WeightedInstancesHandler
-unset-class-temporarily Unsets the class index temporarily before the filter is applied to the data. (default: no)
-B <num> Specifies the (maximum) number of bins to divide numeric attributes into. (default = 10)
-M <num> Specifies the desired weight of instances per bin for equal-frequency binning. If this is set to a positive number then the -B option will be ignored. (default = -1)
-F Use equal-frequency instead of equal-width discretization.
-O Optimize the number of bins using a leave-one-out estimate of the entropy (equal-width discretization only). If this is set, the -B option will be ignored.
-R <col1,col2-col4,...> Specifies the list of columns to discretize. "first" and "last" are valid indexes. (default: first-last)
-V Invert matching sense of column indexes.
-D Output binary attributes for discretized attributes.
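Equal-width binning (the default, with -B giving the number of bins) splits the observed range of each selected attribute into intervals of the same length, so with k bins there are k - 1 interior cut points at min + i(max - min)/k. The standalone sketch below computes them from a plain array. It is not Weka's code: the class name `EqualWidthSketch` is hypothetical, and it assumes missing values are encoded as NaN and skipped.

```java
import java.util.Arrays;

/** Hypothetical standalone sketch of equal-width cut points (not Weka code). */
final class EqualWidthSketch {

    /**
     * Returns numBins - 1 interior cut points splitting [min, max] into numBins
     * intervals of equal width. In this sketch an empty array is returned when
     * the column is constant, entirely missing, or numBins < 2.
     */
    static double[] cutPoints(double[] values, int numBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            if (Double.isNaN(v)) {
                continue; // assumption: NaN marks a missing value
            }
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (numBins < 2 || !(max > min)) {
            return new double[0];
        }
        double width = (max - min) / numBins;
        double[] cuts = new double[numBins - 1];
        for (int i = 1; i < numBins; i++) {
            cuts[i - 1] = min + i * width;
        }
        return cuts;
    }

    public static void main(String[] args) {
        double[] values = {0.0, 2.0, 3.5, Double.NaN, 10.0};
        System.out.println(Arrays.toString(cutPoints(values, 5))); // [2.0, 4.0, 6.0, 8.0]
    }
}
```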
| Modifier and Type | Field and Description |
|---|---|
| protected double[][] | m_CutPoints: Stores the current cut points. |
| protected String | m_DefaultCols: The default columns to discretize. |
| protected double | m_DesiredWeightOfInstancesPerInterval: The desired weight of instances per bin. |
| protected Range | m_DiscretizeCols: Stores which columns to discretize. |
| protected boolean | m_FindNumBins: Whether to find the number of bins using a leave-one-out (cross-validated) entropy estimate. |
| protected boolean | m_MakeBinary: Whether to output binary attributes for discretized attributes. |
| protected int | m_NumBins: The number of bins to divide the attribute into. |
| protected boolean | m_UseEqualFrequency: Whether to use equal-frequency instead of equal-width binning. |
Fields inherited from class PotentialClassIgnorer: m_ClassIndex, m_IgnoreClass
Fields inherited from class Filter: m_FirstBatchDone, m_InputRelAtts, m_InputStringAtts, m_NewBatch, m_OutputRelAtts, m_OutputStringAtts

| Constructor and Description |
|---|
| Discretize(): Constructor; initialises the filter. |
| Discretize(String cols): Another constructor; sets the attribute indices immediately. |
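The `cols` argument (like the -R option) takes a 1-based range list such as `first-last` or `1,3-5`, where `first` and `last` are accepted as aliases. The hypothetical parser below converts such a string into 0-based attribute indices for a known number of attributes. It is a simplified sketch of the syntax described on this page, not `weka.core.Range`: it ignores inversion and rejects reversed ranges, which the real class may treat differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical, simplified parser for 1-based range lists such as "first-last" or "1,3-5". */
final class RangeListSketch {

    /** Converts one token ("first", "last" or a 1-based number) to a 0-based index. */
    private static int toIndex(String token, int numAttributes) {
        String t = token.trim();
        int oneBased;
        if (t.equalsIgnoreCase("first")) {
            oneBased = 1;
        } else if (t.equalsIgnoreCase("last")) {
            oneBased = numAttributes;
        } else {
            oneBased = Integer.parseInt(t); // NumberFormatException for malformed tokens
        }
        if (oneBased < 1 || oneBased > numAttributes) {
            throw new IllegalArgumentException("Index out of range: " + t);
        }
        return oneBased - 1;
    }

    /** Returns the sorted, de-duplicated 0-based indices selected by rangeList. */
    static List<Integer> parse(String rangeList, int numAttributes) {
        TreeSet<Integer> selected = new TreeSet<>();
        for (String part : rangeList.split(",")) {
            int dash = part.indexOf('-');
            if (dash < 0) {
                selected.add(toIndex(part, numAttributes));
                continue;
            }
            int from = toIndex(part.substring(0, dash), numAttributes);
            int to = toIndex(part.substring(dash + 1), numAttributes);
            if (from > to) {
                throw new IllegalArgumentException("Reversed range: " + part);
            }
            for (int i = from; i <= to; i++) {
                selected.add(i);
            }
        }
        return new ArrayList<>(selected);
    }

    public static void main(String[] args) {
        System.out.println(parse("first-last", 4)); // [0, 1, 2, 3]
        System.out.println(parse("1,3-5", 6));      // [0, 2, 3, 4]
    }
}
```

The 1-based user syntax versus 0-based program indices mirrors the split between `setAttributeIndices(String)` and `setAttributeIndicesArray(int[])`.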
| Modifier and Type | Method and Description |
|---|---|
| String | attributeIndicesTipText(): Returns the tip text for this property. |
| boolean | batchFinished(): Signifies that this batch of input to the filter is finished. |
| String | binsTipText(): Returns the tip text for this property. |
| protected void | calculateCutPoints(): Generates the cut points for each attribute. |
| protected void | calculateCutPointsByEqualFrequencyBinning(int index): Sets the cut points for a single attribute using equal-frequency binning. |
| protected void | calculateCutPointsByEqualWidthBinning(int index): Sets the cut points for a single attribute using equal-width binning. |
| protected void | convertInstance(Instance instance): Converts a single instance. |
| String | desiredWeightOfInstancesPerIntervalTipText(): Returns the tip text for this property. |
| protected void | findNumBins(int index): Optimizes the number of bins using leave-one-out cross-validation. |
| String | findNumBinsTipText(): Returns the tip text for this property. |
| String | getAttributeIndices(): Gets the current range selection. |
| int | getBins(): Gets the number of bins numeric attributes will be divided into. |
| Capabilities | getCapabilities(): Returns the Capabilities of this filter. |
| double[] | getCutPoints(int attributeIndex): Gets the cut points for an attribute. |
| double | getDesiredWeightOfInstancesPerInterval(): Gets the DesiredWeightOfInstancesPerInterval value. |
| boolean | getFindNumBins(): Gets the value of FindNumBins. |
| boolean | getInvertSelection(): Gets whether the matching sense of the column selection is inverted, i.e. whether the selected columns are left unchanged and the remaining ones are discretized. |
| boolean | getMakeBinary(): Gets whether binary attributes should be made for discretized ones. |
| String[] | getOptions(): Gets the current settings of the filter. |
| String | getRevision(): Returns the revision string. |
| boolean | getUseEqualFrequency(): Gets the value of UseEqualFrequency. |
| String | globalInfo(): Returns a string describing this filter. |
| boolean | input(Instance instance): Inputs an instance for filtering. |
| String | invertSelectionTipText(): Returns the tip text for this property. |
| Enumeration | listOptions(): Gets an enumeration describing the available options. |
| static void | main(String[] argv): Main method for testing this class. |
| String | makeBinaryTipText(): Returns the tip text for this property. |
| void | setAttributeIndices(String rangeList): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized). |
| void | setAttributeIndicesArray(int[] attributes): Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized). |
| void | setBins(int numBins): Sets the number of bins to divide each selected numeric attribute into. |
| void | setDesiredWeightOfInstancesPerInterval(double newDesiredNumber): Sets the DesiredWeightOfInstancesPerInterval value. |
| void | setFindNumBins(boolean newFindNumBins): Sets the value of FindNumBins. |
| boolean | setInputFormat(Instances instanceInfo): Sets the format of the input instances. |
| void | setInvertSelection(boolean invert): Sets whether the matching sense of the column selection is inverted. |
| void | setMakeBinary(boolean makeBinary): Sets whether binary attributes should be made for discretized ones. |
| void | setOptions(String[] options): Parses a given list of options. |
| protected void | setOutputFormat(): Sets the output format. |
| void | setUseEqualFrequency(boolean newUseEqualFrequency): Sets the value of UseEqualFrequency. |
| String | useEqualFrequencyTipText(): Returns the tip text for this property. |
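The methods above follow Weka's batch-filter pattern: `setInputFormat` fixes the structure, instances are pushed with `input`, and `batchFinished` marks the end of a batch. As we read that contract for this unsupervised filter, cut points are learned from the first batch and reused for later batches, so that, for example, a test set is binned with cut points learned on the training set. The toy class below mimics that flow for a single numeric column with equal-width bins. The class and its methods are hypothetical and only loosely mirror the real API, which works on `Instances` and `Instance` objects.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Weka code) of a batch filter that learns
 * equal-width cut points on the first batch and reuses them afterwards.
 */
final class BatchLifecycleSketch {
    private final int numBins;
    private final List<Double> buffer = new ArrayList<>();
    private final List<Integer> output = new ArrayList<>();
    private double[] cutPoints; // null until the first batch is finished

    BatchLifecycleSketch(int numBins) {
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be >= 1");
        }
        this.numBins = numBins;
    }

    /** Buffers the value during the first batch; converts it immediately afterwards. */
    boolean input(double value) {
        if (cutPoints != null) {
            output.add(binOf(value));
            return true;
        }
        buffer.add(value);
        return false;
    }

    /** Ends a batch: learns cut points once, then converts any buffered values. */
    boolean batchFinished() {
        if (cutPoints == null) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double v : buffer) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            cutPoints = new double[max > min ? numBins - 1 : 0];
            double width = (max - min) / numBins;
            for (int i = 0; i < cutPoints.length; i++) {
                cutPoints[i] = min + (i + 1) * width;
            }
            for (double v : buffer) {
                output.add(binOf(v));
            }
            buffer.clear();
        }
        return !output.isEmpty();
    }

    /** Index of the first cut point the value does not exceed, else the last bin. */
    private int binOf(double value) {
        int bin = 0;
        while (bin < cutPoints.length && value > cutPoints[bin]) {
            bin++;
        }
        return bin;
    }

    /** Removes and returns all converted values produced so far. */
    List<Integer> drainOutput() {
        List<Integer> result = new ArrayList<>(output);
        output.clear();
        return result;
    }

    public static void main(String[] args) {
        BatchLifecycleSketch filter = new BatchLifecycleSketch(2);
        // First batch (e.g. training data): values are buffered; the cut point becomes 5.0.
        for (double v : new double[] {0.0, 4.0, 10.0}) {
            filter.input(v);
        }
        filter.batchFinished();
        System.out.println(filter.drainOutput()); // [0, 0, 1]
        // Second batch (e.g. test data): converted immediately with the same cut point.
        filter.input(7.0);
        filter.input(-3.0);
        filter.batchFinished();
        System.out.println(filter.drainOutput()); // [1, 0]
    }
}
```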
Methods inherited from class PotentialClassIgnorer: getIgnoreClass, getOutputFormat, ignoreClassTipText, setIgnoreClass
Methods inherited from class Filter: batchFilterFile, bufferInput, copyValues, copyValues, filterFile, flushInput, getCapabilities, getInputFormat, initInputLocators, initOutputLocators, inputFormatPeek, isFirstBatchDone, isNewBatch, isOutputFormatDefined, makeCopies, makeCopy, numPendingOutput, output, outputFormatPeek, outputPeek, push, resetQueue, runFilter, setOutputFormat, testInputFormat, toString, useFilter, wekaStaticWrapper
protected Range m_DiscretizeCols
protected int m_NumBins
protected double m_DesiredWeightOfInstancesPerInterval
protected double[][] m_CutPoints
protected boolean m_MakeBinary
protected boolean m_FindNumBins
protected boolean m_UseEqualFrequency
protected String m_DefaultCols
public Discretize()
public Discretize(String cols)
Parameters: cols - the attribute indices
public Enumeration listOptions()
Specified by: listOptions in interface OptionHandler
Overrides: listOptions in class PotentialClassIgnorer
public void setOptions(String[] options) throws Exception
Parses a given list of options.
Specified by: setOptions in interface OptionHandler
Overrides: setOptions in class PotentialClassIgnorer
Parameters: options - the list of options as an array of strings
Throws: Exception - if an option is not supported
public String[] getOptions()
Specified by: getOptions in interface OptionHandler
Overrides: getOptions in class PotentialClassIgnorer
public Capabilities getCapabilities()
Specified by: getCapabilities in interface CapabilitiesHandler
Overrides: getCapabilities in class Filter
See also: Capabilities
public boolean setInputFormat(Instances instanceInfo) throws Exception
Overrides: setInputFormat in class PotentialClassIgnorer
Parameters: instanceInfo - an Instances object containing the input instance structure (any instances contained in the object are ignored; only the structure is required).
Throws: Exception - if the input format can't be set successfully
public boolean input(Instance instance)
Overrides: input in class Filter
Parameters: instance - the input instance
Throws: IllegalStateException - if no input format has been defined.
public boolean batchFinished()
Overrides: batchFinished in class Filter
Throws: IllegalStateException - if no input structure has been defined
public String globalInfo()
public String findNumBinsTipText()
public boolean getFindNumBins()
public void setFindNumBins(boolean newFindNumBins)
Parameters: newFindNumBins - Value to assign to FindNumBins.
public String makeBinaryTipText()
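When FindNumBins (-O) is enabled, the number of equal-width bins is chosen with a leave-one-out estimate of entropy. One way to read that criterion: with k bins of width w = (max - min)/k, a point in a bin holding n_j of the N points gets the leave-one-out density (n_j - 1)/((N - 1)w), so the estimated entropy is, up to terms that do not depend on k, -(1/N) sum_j n_j log((n_j - 1)/w). The sketch below picks the k in 1..maxBins minimising that quantity for unweighted values. It is our illustration of the technique, not a copy of Weka's `findNumBins`; the class name is hypothetical, and Weka's handling of weights, empty bins and single-point bins may differ.

```java
/** Hypothetical sketch: choose an equal-width bin count by leave-one-out entropy. */
final class LooBinCountSketch {

    /** Returns the bin count in [1, maxBins] minimising -sum_j n_j log((n_j - 1) / w). */
    static int bestNumBins(double[] values, int maxBins) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if (!(max > min)) {
            return 1; // constant (or empty) column: nothing to split
        }
        int best = 1;
        double bestEntropy = Double.POSITIVE_INFINITY;
        for (int k = 1; k <= maxBins; k++) {
            double width = (max - min) / k;
            int[] counts = new int[k];
            for (double v : values) {
                int bin = 0;
                // First bin whose upper edge is not exceeded; the last bin takes the rest.
                while (bin < k - 1 && v > min + (bin + 1) * width) {
                    bin++;
                }
                counts[bin]++;
            }
            double entropy = 0.0;
            for (int n : counts) {
                if (n == 1) {
                    // Leaving out the only point gives density 0, i.e. infinite entropy.
                    entropy = Double.POSITIVE_INFINITY;
                    break;
                }
                if (n > 1) {
                    entropy -= n * Math.log((n - 1) / width);
                }
                // Empty bins (n == 0) contribute nothing in this sketch.
            }
            if (entropy < bestEntropy) {
                bestEntropy = entropy;
                best = k;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] clustered = {0.0, 0.1, 0.2, 0.3, 9.7, 9.8, 9.9, 10.0};
        System.out.println(bestNumBins(clustered, 10)); // 10
        double[] uniform = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(bestNumBins(uniform, 5)); // 1
    }
}
```

Tight clusters reward narrow bins around them, while evenly spread data gains nothing from splitting, which is the trade-off the criterion is meant to capture.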
public boolean getMakeBinary()
public void setMakeBinary(boolean makeBinary)
Parameters: makeBinary - if binary attributes are to be made
public String desiredWeightOfInstancesPerIntervalTipText()
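With MakeBinary (-D), each discretized attribute is replaced by binary attributes instead of a single nominal one. This page does not spell out the encoding, so the sketch below assumes one 0/1 indicator per cut point, set when the value exceeds that cut point (a cumulative or "thermometer" code that preserves the order of the bins). Treat both the encoding and the class name `BinaryEncodingSketch` as assumptions.

```java
import java.util.Arrays;

/** Hypothetical sketch of a cumulative ("thermometer") binary encoding of a binned value. */
final class BinaryEncodingSketch {

    /** Returns one indicator per cut point: 1 if value > cutPoints[i], else 0. */
    static int[] encode(double value, double[] cutPoints) {
        int[] bits = new int[cutPoints.length];
        for (int i = 0; i < cutPoints.length; i++) {
            bits[i] = value > cutPoints[i] ? 1 : 0;
        }
        return bits;
    }

    public static void main(String[] args) {
        double[] cuts = {2.0, 4.0, 6.0};
        System.out.println(Arrays.toString(encode(1.0, cuts))); // [0, 0, 0]
        System.out.println(Arrays.toString(encode(4.0, cuts))); // [1, 0, 0]
        System.out.println(Arrays.toString(encode(7.5, cuts))); // [1, 1, 1]
    }
}
```

Under this assumed encoding, k bins need only k - 1 columns, and each column answers "is the value above cut point i?", which a linear model can use directly.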
public double getDesiredWeightOfInstancesPerInterval()
public void setDesiredWeightOfInstancesPerInterval(double newDesiredNumber)
Parameters: newDesiredNumber - The new DesiredNumber value.
public String useEqualFrequencyTipText()
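DesiredWeightOfInstancesPerInterval (-M) applies to equal-frequency binning: a positive value fixes the target total weight per bin and the -B option is then ignored, while the default of -1 leaves the total weight to be split across -B bins. The sketch below computes that target and the bin count it roughly implies. The class name is hypothetical, and rounding the implied count up is our assumption rather than documented Weka behaviour.

```java
/** Hypothetical helper: target weight per equal-frequency bin from -B / -M style settings. */
final class BinWeightSketch {

    private static double totalWeight(double[] weights) {
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        return total;
    }

    /** A positive desiredWeight wins (numBins ignored); otherwise split the total weight evenly. */
    static double targetWeightPerBin(double[] weights, int numBins, double desiredWeight) {
        if (desiredWeight > 0) {
            return desiredWeight;
        }
        return totalWeight(weights) / numBins;
    }

    /** Approximate number of bins implied by a target weight (ceiling is an assumption). */
    static int impliedBins(double[] weights, double targetWeight) {
        return (int) Math.ceil(totalWeight(weights) / targetWeight);
    }

    public static void main(String[] args) {
        double[] weights = {1.0, 1.0, 2.0, 0.5, 1.5, 2.0}; // total weight 8.0
        System.out.println(targetWeightPerBin(weights, 4, -1));  // 2.0 (-B 4, default -M -1)
        System.out.println(targetWeightPerBin(weights, 4, 3.0)); // 3.0 (-M 3 overrides -B)
        System.out.println(impliedBins(weights, 3.0));           // 3
    }
}
```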
public boolean getUseEqualFrequency()
public void setUseEqualFrequency(boolean newUseEqualFrequency)
Parameters: newUseEqualFrequency - Value to assign to UseEqualFrequency.
public String binsTipText()
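With UseEqualFrequency (-F), cut points are chosen so that each bin holds roughly the same total instance weight. The sketch below sorts the values and, walking through them, places a cut point midway between two distinct neighbouring values whenever the accumulated weight reaches the next multiple of the per-bin target; tied values are never split across bins. It is a simplified greedy version written for illustration with a hypothetical class name; Weka's `calculateCutPointsByEqualFrequencyBinning` may place cuts differently, especially around ties.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical greedy sketch of weighted equal-frequency cut points (not Weka code). */
final class EqualFrequencySketch {

    static double[] cutPoints(double[] values, double[] weights, int numBins) {
        int n = values.length;
        Integer[] order = new Integer[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            order[i] = i;
            total += weights[i];
        }
        // Visit the instances in ascending order of value.
        Arrays.sort(order, Comparator.comparingDouble(idx -> values[idx]));
        double target = total / numBins;
        List<Double> cuts = new ArrayList<>();
        double seen = 0.0;
        for (int r = 0; r < n - 1; r++) {
            seen += weights[order[r]];
            double here = values[order[r]];
            double next = values[order[r + 1]];
            // Cut only between distinct values, once the current bin has reached its share.
            if (next > here && seen >= target * (cuts.size() + 1) && cuts.size() < numBins - 1) {
                cuts.add((here + next) / 2.0);
            }
        }
        double[] result = new double[cuts.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = cuts.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        double[] values = {5.0, 1.0, 3.0, 2.0, 6.0, 4.0};
        double[] weights = {1, 1, 1, 1, 1, 1};
        System.out.println(Arrays.toString(cutPoints(values, weights, 3))); // [2.5, 4.5]
    }
}
```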
public int getBins()
public void setBins(int numBins)
Parameters: numBins - the number of bins
public String invertSelectionTipText()
public boolean getInvertSelection()
public void setInvertSelection(boolean invert)
Parameters: invert - the new invert setting
public String attributeIndicesTipText()
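InvertSelection (-V) flips the column selection: with it enabled, the attributes matched by the range list are left alone and the remaining ones are considered instead. In either case only numeric attributes among the resulting selection are discretized. The sketch below computes which 0-based attribute indices would be discretized from a selection, an inversion flag and a per-attribute "is numeric" mask. The names are hypothetical, and the class-attribute handling inherited from PotentialClassIgnorer is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: which attribute indices end up discretized. */
final class SelectionSketch {

    static List<Integer> toDiscretize(Set<Integer> selected, boolean invert, boolean[] isNumeric) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < isNumeric.length; i++) {
            boolean inRange = selected.contains(i);
            // With inversion, an attribute is chosen exactly when it is NOT in the range.
            if ((inRange != invert) && isNumeric[i]) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] numeric = {true, false, true, true};
        Set<Integer> range = new HashSet<>(Arrays.asList(0, 1)); // "-R 1-2" in 1-based terms
        System.out.println(toDiscretize(range, false, numeric)); // [0]
        System.out.println(toDiscretize(range, true, numeric));  // [2, 3]
    }
}
```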
public String getAttributeIndices()
public void setAttributeIndices(String rangeList)
Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
Parameters: rangeList - a string representing the list of attributes. Since the string will typically come from a user, attributes are indexed from 1.
Throws: IllegalArgumentException - if an invalid range list is supplied
public void setAttributeIndicesArray(int[] attributes)
Sets which attributes are to be discretized (only numeric attributes among the selection will be discretized).
Parameters: attributes - an array containing indexes of attributes to discretize. Since the array will typically come from a program, attributes are indexed from 0.
Throws: IllegalArgumentException - if an invalid set of ranges is supplied
public double[] getCutPoints(int attributeIndex)
Parameters: attributeIndex - the index (from 0) of the attribute to get the cut points of
protected void calculateCutPoints()
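The cut points returned by `getCutPoints` are interior boundaries, so an attribute with c cut points is mapped onto c + 1 bins. The sketch below maps a raw value to a bin index with a binary search over sorted cut points. It assumes bins are closed on the right (a value equal to a cut point falls into the lower bin), that missing values (NaN) stay missing, and that a null array means the attribute was not discretized; all three conventions, and the class name, are assumptions to check against real output.

```java
/** Hypothetical sketch of mapping a value to a bin index given sorted cut points. */
final class BinLookupSketch {

    /** Returns the bin index, or -1 for a missing (NaN) value. */
    static int binIndex(double value, double[] cutPoints) {
        if (cutPoints == null) {
            throw new IllegalArgumentException("attribute was not discretized");
        }
        if (Double.isNaN(value)) {
            return -1; // assumption: missing values stay missing
        }
        // Binary search for the first cut point >= value (right-closed bins).
        int lo = 0;
        int hi = cutPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (value <= cutPoints[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        double[] cuts = {2.0, 4.0, 6.0}; // four bins
        System.out.println(binIndex(1.0, cuts)); // 0
        System.out.println(binIndex(4.0, cuts)); // 1 (a boundary value goes to the lower bin)
        System.out.println(binIndex(9.0, cuts)); // 3
    }
}
```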
protected void calculateCutPointsByEqualWidthBinning(int index)
Parameters: index - the index of the attribute to set cutpoints for
protected void calculateCutPointsByEqualFrequencyBinning(int index)
Parameters: index - the index of the attribute to set cutpoints for
protected void findNumBins(int index)
Parameters: index - the attribute index
protected void setOutputFormat()
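`setOutputFormat` replaces each discretized numeric attribute with a nominal attribute whose values name the intervals. The sketch below builds one label per bin from the cut points, using right-closed intervals and -inf/inf at the ends. The label format, for example `(-inf-2.5]`, is modelled on labels Weka commonly produces, but here it is an assumption: the real filter's formatting (number of decimals, quoting) may differ, and `IntervalLabelSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of nominal labels for right-closed intervals between cut points. */
final class IntervalLabelSketch {

    static List<String> labels(double[] cutPoints) {
        List<String> result = new ArrayList<>();
        String lower = "-inf";
        for (double cut : cutPoints) {
            String upper = Double.toString(cut);
            result.add("(" + lower + "-" + upper + "]"); // closed on the right
            lower = upper;
        }
        result.add("(" + lower + "-inf)"); // last bin is open-ended
        return result;
    }

    public static void main(String[] args) {
        System.out.println(labels(new double[] {2.5, 4.5}));
        // [(-inf-2.5], (2.5-4.5], (4.5-inf)]
    }
}
```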
protected void convertInstance(Instance instance)
Parameters: instance - the instance to convert
public String getRevision()
Specified by: getRevision in interface RevisionHandler
Overrides: getRevision in class Filter
public static void main(String[] argv)
Parameters: argv - should contain arguments to the filter: use -h for help
Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.