public class SubspaceCluster extends ClusterGenerator
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Generates randomly distributed instances in the cluster.
-U <range> Generates uniformly distributed instances in the cluster.
-G <range> Generates gaussian distributed instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range of number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
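
The options above can be assembled into an argument array before they are handed to the generator. The following is a minimal sketch, not part of Weka: the class `SubspaceClusterArgs`, its `build` method, and the relation and file names are hypothetical, and it assumes that `-S` takes a numeric value, as its stated default suggests. It also shows that in a Java `String[]` the whole cluster definition is a single element, so the quoting mentioned for `-C` is only needed on a shell command line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Weka): assembles a SubspaceCluster option array. */
final class SubspaceClusterArgs {

    /** Builds the options for one Gaussian cluster defined on attribute 1. */
    static String[] build(String outputFile, double noisePercent) {
        List<String> args = new ArrayList<>();
        args.add("-r"); args.add("subspace-demo");                // relation name (made up)
        args.add("-S"); args.add("42");                           // seed (assumed to take a value)
        args.add("-a"); args.add("1");                            // number of attributes
        args.add("-c");                                           // list the cluster in an extra attribute
        args.add("-P"); args.add(Double.toString(noisePercent));  // noise rate in percent
        args.add("-o"); args.add(outputFile);                     // output file
        args.add("-C");                                           // the cluster definition follows
        args.add("-G 1 -D 5,1.5 -N 10..20");                      // one element: gaussian, mean 5, stddev 1.5
        return args.toArray(new String[0]);
    }

    public static void main(String[] unused) {
        String[] args = build("demo.arff", 5.0);
        System.out.println(String.join(" | ", args));
        // With Weka on the classpath, the array could then be passed on, for example:
        // weka.datagenerators.clusterers.SubspaceCluster.main(args);
    }
}
```

Keeping the definition as one array element avoids any dependence on shell quoting rules when the generator is started from Java code.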
| Modifier and Type | Field and Description |
|---|---|
| `static int` | `CONTINUOUS` cluster subtype: continuous |
| `static int` | `GAUSSIAN` cluster type: gaussian |
| `static int` | `INTEGER` cluster subtype: integer |
| `protected ClusterDefinition[]` | `m_Clusters` cluster list |
| `protected double[]` | `m_globalMaxValue` store global max values |
| `protected double[]` | `m_globalMinValue` store global min values |
| `protected double` | `m_NoiseRate` noise rate in percent (option P, between 0 and 30) |
| `protected int[]` | `m_numValues` if nominal, store number of values |
| `static Tag[]` | `TAGS_CLUSTERSUBTYPE` the tags for the cluster subtypes |
| `static Tag[]` | `TAGS_CLUSTERTYPE` the tags for the cluster types |
| `static int` | `TOTAL_UNIFORM` cluster type: total uniform |
| `static int` | `UNIFORM_RANDOM` cluster type: uniform/random |
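
The cluster types and subtypes listed above correspond to the `-A`, `-G` and `-I` options of a cluster definition. As a rough illustration of what those choices mean for a single attribute value, here is a sketch using only `java.util.Random`. It is not Weka's implementation: the class and method names are hypothetical, rounding to the nearest whole number for the integer subtype is an assumption, and the total uniform type is left out because this page does not say how it differs from uniform/random.

```java
import java.util.Random;

/** Illustrative sketch only; not Weka's implementation. */
final class ClusterSamplingSketch {

    /** Uniform/random type: a value drawn uniformly at random between min and max. */
    static double uniformRandom(Random rnd, double min, double max) {
        return min + rnd.nextDouble() * (max - min);
    }

    /** Gaussian type: a value drawn from a normal distribution with the given mean and stddev. */
    static double gaussian(Random rnd, double mean, double stddev) {
        return mean + rnd.nextGaussian() * stddev;
    }

    /** Integer subtype: rounds to the nearest whole number (assumption); continuous keeps the value. */
    static double toSubtype(double value, boolean integerSubtype) {
        return integerSubtype ? Math.rint(value) : value;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1); // fixed seed, so repeated runs print the same values
        System.out.println(toSubtype(uniformRandom(rnd, 0.0, 10.0), false));
        System.out.println(toSubtype(gaussian(rnd, 5.0, 1.5), true));
    }
}
```

Passing the `Random` instance in explicitly mirrors the role of the seed option: the same seed yields the same sequence of values.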

Fields inherited from class ClusterGenerator: m_booleanCols, m_ClassFlag, m_nominalCols, m_NumAttributes

Fields inherited from class DataGenerator: m_CreatingRelationName, m_DatasetFormat, m_Debug, m_DefaultOutput, m_NumExamplesAct, m_OptionBlacklist, m_Output, m_Random, m_RelationName, m_Seed

| Constructor and Description |
|---|
| `SubspaceCluster()` Initializes the generator and sets the number of clusters to 0, since the user has to specify them explicitly. |
| Modifier and Type | Method and Description |
|---|---|
| `protected boolean` | `checkCoverage()` Checks whether all attributes are covered by cluster definitions and returns TRUE in that case. |
| `String` | `clusterDefinitionsTipText()` Returns the tip text for this property. |
| `protected double` | `defaultNoiseRate()` Returns the default noise rate. |
| `protected int` | `defaultNumAttributes()` Returns the default number of attributes. |
| `Instances` | `defineDataFormat()` Initializes the format for the dataset produced. |
| `Instance` | `generateExample()` Generates an example of the dataset. |
| `Instances` | `generateExamples()` Generates all examples of the dataset. |
| `String` | `generateFinished()` Compiles documentation about the data generation after the generation process. |
| `String` | `generateStart()` Compiles documentation about the data generation before the generation process. |
| `ClusterDefinition[]` | `getClusterDefinitions()` Returns the currently set clusters. |
| `protected ClusterDefinition[]` | `getClusters()` Returns the current cluster definitions and initializes them if necessary. |
| `double` | `getNoiseRate()` Gets the percentage of noise set. |
| `int[]` | `getNumValues()` Returns the array that stores the number of values for each nominal attribute. |
| `String[]` | `getOptions()` Gets the current settings of the datagenerator. |
| `String` | `getRevision()` Returns the revision string. |
| `boolean` | `getSingleModeFlag()` Gets the single mode flag. |
| `String` | `globalInfo()` Returns a string describing this data generator. |
| `boolean` | `isBoolean(int index)` Returns true if the attribute is boolean. |
| `boolean` | `isNominal(int index)` Returns true if the attribute is nominal. |
| `Enumeration` | `listOptions()` Returns an enumeration describing the available options. |
| `static void` | `main(String[] args)` Main method for testing this class. |
| `String` | `noiseRateTipText()` Returns the tip text for this property. |
| `String` | `numAttributesTipText()` Returns the tip text for this property. |
| `void` | `setClusterDefinitions(ClusterDefinition[] value)` Sets the clusters to use. |
| `void` | `setNoiseRate(double newNoiseRate)` Sets the percentage of noise. |
| `void` | `setNumAttributes(int numAttributes)` Sets the number of attributes the dataset should have. |
| `void` | `setOptions(String[] options)` Parses a list of options for this object. |

Methods inherited from class ClusterGenerator: booleanColsTipText, checkIndices, classFlagTipText, getBooleanCols, getClassFlag, getNominalCols, getNumAttributes, nominalColsTipText, setBooleanCols, setBooleanIndices, setClassFlag, setNominalCols, setNominalIndices

Methods inherited from class DataGenerator: addToBlacklist, clearBlacklist, debugTipText, defaultNumExamplesAct, defaultOutput, defaultRelationName, defaultSeed, enumToVector, formatTipText, getDatasetFormat, getDebug, getNumExamplesAct, getOutput, getRandom, getRelationName, getRelationNameToUse, getSeed, isOnBlacklist, makeData, makeOptionString, numExamplesActTipText, outputTipText, randomTipText, relationNameTipText, removeBlacklist, runDataGenerator, seedTipText, setDatasetFormat, setDebug, setNumExamplesAct, setOutput, setRandom, setRelationName, setSeed, toStringFormat

protected double m_NoiseRate
protected ClusterDefinition[] m_Clusters
protected int[] m_numValues
protected double[] m_globalMinValue
protected double[] m_globalMaxValue
public static final int UNIFORM_RANDOM
public static final int TOTAL_UNIFORM
public static final int GAUSSIAN
public static final Tag[] TAGS_CLUSTERTYPE
public static final int CONTINUOUS
public static final int INTEGER
public static final Tag[] TAGS_CLUSTERSUBTYPE
public SubspaceCluster()
public String globalInfo()
public Enumeration listOptions()
listOptions in interface OptionHandler
listOptions in class ClusterGenerator
public void setOptions(String[] options) throws Exception
-h Prints this help.
-o <file> The name of the output file, otherwise the generated data is printed to stdout.
-r <name> The name of the relation.
-d Whether to print debug information.
-S The seed for the random function (default 1).
-a <num> The number of attributes (default 1).
-c Class flag; if set, the cluster is listed in an extra attribute.
-b <range> The indices for boolean attributes.
-m <range> The indices for nominal attributes.
-P <num> The noise rate in percent (default 0.0). Can be between 0% and 30%. (Remark: The original algorithm only allows noise up to 10%.)
-C <cluster-definition> A cluster definition of class 'SubspaceClusterDefinition' (definition needs to be quoted to be recognized as a single argument).
Options specific to weka.datagenerators.clusterers.SubspaceClusterDefinition:
-A <range> Generates randomly distributed instances in the cluster.
-U <range> Generates uniformly distributed instances in the cluster.
-G <range> Generates gaussian distributed instances in the cluster.
-D <num>,<num> The attribute min/max (-A and -U) or mean/stddev (-G) for the cluster.
-N <num>..<num> The range of number of instances per cluster (default 1..50).
-I Uses integer instead of continuous values (default continuous).
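
The `-N <num>..<num>` and `-D <num>,<num>` values each pack two numbers into one option string. The sketch below shows one way such values could be split and checked. It is an illustration under stated assumptions, not the parsing code used by SubspaceClusterDefinition: the class `DefinitionValueSketch` and its methods are hypothetical, and rejecting a lower bound that exceeds the upper bound is an assumed rule.

```java
/** Hypothetical parsing helpers; not the code used by SubspaceClusterDefinition. */
final class DefinitionValueSketch {

    /** Parses an instance-count range such as "1..50" into {lower, upper}. */
    static int[] parseInstanceRange(String text) {
        String[] parts = text.trim().split("\\.\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>..<num>, got: " + text);
        }
        int lower = Integer.parseInt(parts[0].trim());
        int upper = Integer.parseInt(parts[1].trim());
        if (lower > upper) { // assumed rule: the range must not be inverted
            throw new IllegalArgumentException("lower bound exceeds upper bound: " + text);
        }
        return new int[] {lower, upper};
    }

    /** Parses a pair such as "5,1.5" (min/max or mean/stddev) into two doubles. */
    static double[] parsePair(String text) {
        String[] parts = text.trim().split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected <num>,<num>, got: " + text);
        }
        return new double[] {
            Double.parseDouble(parts[0].trim()),
            Double.parseDouble(parts[1].trim())
        };
    }
}
```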
setOptions in interface OptionHandler
setOptions in class ClusterGenerator
Parameters: options - the list of options as an array of strings
Throws: Exception - if an option is not supported
public String[] getOptions()
getOptions in interface OptionHandler
getOptions in class ClusterGenerator
See also: DataGenerator.removeBlacklist(String[])
protected ClusterDefinition[] getClusters()
protected int defaultNumAttributes()
defaultNumAttributes in class ClusterGenerator
public void setNumAttributes(int numAttributes)
setNumAttributes in class ClusterGenerator
Parameters: numAttributes - the new number of attributes
public String numAttributesTipText()
numAttributesTipText in class ClusterGenerator
protected double defaultNoiseRate()
public double getNoiseRate()
public void setNoiseRate(double newNoiseRate)
Parameters: newNoiseRate - new percentage of noise
public String noiseRateTipText()
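
The noise rate is documented as a percentage between 0 and 30. The sketch below shows one way to enforce that range and turn the percentage into a number of noise instances. It is only an illustration: the class `NoiseRateSketch` is hypothetical, throwing an exception for out-of-range values is an assumption about error handling, and applying the percentage to the number of cluster instances is an assumed interpretation that this page does not spell out.

```java
/** Hypothetical helper illustrating the documented 0 to 30 percent noise range. */
final class NoiseRateSketch {

    static final double MIN_PERCENT = 0.0;
    static final double MAX_PERCENT = 30.0;

    /** Returns the rate unchanged if it lies within the documented range. */
    static double checkNoiseRate(double percent) {
        if (Double.isNaN(percent) || percent < MIN_PERCENT || percent > MAX_PERCENT) {
            throw new IllegalArgumentException(
                "noise rate must be between 0 and 30 percent, got: " + percent);
        }
        return percent;
    }

    /** Number of noise instances for a given number of cluster instances (assumed interpretation). */
    static int noiseCount(int clusterInstances, double percent) {
        return (int) Math.round(clusterInstances * checkNoiseRate(percent) / 100.0);
    }
}
```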
public ClusterDefinition[] getClusterDefinitions()
public void setClusterDefinitions(ClusterDefinition[] value) throws Exception
Parameters: value - the clusters to use
Throws: Exception - if clusters are not the correct class
public String clusterDefinitionsTipText()
protected boolean checkCoverage()
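
The coverage check can be pictured as marking every attribute that appears in at least one cluster definition and then confirming that none is left unmarked. The sketch below illustrates that idea with plain arrays. It is not the method's actual implementation: the class `CoverageSketch` is hypothetical, and representing each cluster definition as an array of zero-based attribute indices is an assumption.

```java
import java.util.List;

/** Illustrative coverage check; not the implementation of checkCoverage(). */
final class CoverageSketch {

    /**
     * Returns true when every attribute index from 0 to numAttributes - 1
     * appears in at least one cluster's list of attribute indices.
     */
    static boolean allAttributesCovered(int numAttributes, List<int[]> clusterAttributes) {
        boolean[] covered = new boolean[numAttributes];
        for (int[] attributes : clusterAttributes) {
            for (int index : attributes) {
                if (index >= 0 && index < numAttributes) {
                    covered[index] = true; // indices outside the dataset are ignored
                }
            }
        }
        for (boolean isCovered : covered) {
            if (!isCovered) {
                return false;
            }
        }
        return true;
    }
}
```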
public boolean getSingleModeFlag()
getSingleModeFlag in class DataGenerator
public Instances defineDataFormat() throws Exception
defineDataFormat in class DataGenerator
Throws: Exception - if the data format could not be defined
See also: DataGenerator.defaultRelationName()
public boolean isBoolean(int index)
Parameters: index - index of the attribute
public boolean isNominal(int index)
Parameters: index - index of the attribute
public int[] getNumValues()
public Instance generateExample() throws Exception
generateExample in class DataGenerator
Throws: Exception - if format not defined or generating
public Instances generateExamples() throws Exception
generateExamples in class DataGenerator
Throws: Exception - if format not defined
public String generateFinished() throws Exception
generateFinished in class DataGenerator
Throws: Exception - no input structure has been defined
public String generateStart()
generateStart in class DataGenerator
public String getRevision()
public static void main(String[] args)
Parameters: args - should contain arguments for the data producer:

Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.