public static class ArffLoader.ArffReader extends Object implements RevisionHandler
Reads data from an ARFF file, either in incremental or batch mode.
Typical code for batch usage:
BufferedReader reader = new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader);
Instances data = arff.getData();
data.setClassIndex(data.numAttributes() - 1);
Typical code for incremental usage:
BufferedReader reader = new BufferedReader(new FileReader("/some/where/file.arff"));
ArffReader arff = new ArffReader(reader, 1000);
Instances data = arff.getStructure();
data.setClassIndex(data.numAttributes() - 1);
Instance inst;
while ((inst = arff.readInstance(data)) != null) {
data.add(inst);
}
| Modifier and Type | Field and Description |
|---|---|
| protected Instances | m_Data: the actual data |
| protected int[] | m_IndicesBuffer: buffer of indices for sparse instance |
| protected int | m_Lines: the number of lines read so far |
| protected StreamTokenizer | m_Tokenizer: the tokenizer for reading the stream |
| protected double[] | m_ValueBuffer: buffer of values for sparse instance |
| Constructor and Description |
|---|
| ArffReader(Reader reader): Reads the data completely from the reader. |
| ArffReader(Reader reader, Instances template, int lines): Reads the data without header according to the specified template. |
| ArffReader(Reader reader, Instances template, int lines, int capacity): Initializes the reader without reading the header according to the specified template. |
| ArffReader(Reader reader, int capacity): Reads only the header and reserves the specified space for instances. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | compactify(): Compactifies the data. |
| protected void | errorMessage(String msg): Throws error message with line number and last token read. |
| Instances | getData(): Returns the data that was read. |
| protected void | getFirstToken(): Gets next token, skipping empty lines. |
| protected void | getIndex(): Gets index, checking for a premature end of line. |
| protected Instance | getInstance(Instances structure, boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected Instance | getInstanceFull(boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected Instance | getInstanceSparse(boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected double | getInstanceWeight(): Gets the value of an instance's weight (if one exists). |
| protected void | getLastToken(boolean endOfFileOk): Gets token and checks if it is the end of line. |
| int | getLineNo(): Returns the current line number. |
| protected void | getNextToken(): Gets next token, checking for a premature end of line. |
| String | getRevision(): Returns the revision string. |
| Instances | getStructure(): Returns the header format. |
| protected void | initBuffers(): Initializes the buffers for sparse instances to be read. |
| protected void | initTokenizer(): Initializes the StreamTokenizer used for reading the ARFF file. |
| protected FastVector | parseAttribute(FastVector attributes): Parses the attribute declaration. |
| protected void | readHeader(int capacity): Reads and stores header of an ARFF file. |
| Instance | readInstance(Instances structure): Reads a single instance using the tokenizer and returns it. |
| Instance | readInstance(Instances structure, boolean flag): Reads a single instance using the tokenizer and returns it. |
| protected void | readTillEOL(): Reads and skips all tokens before next end of line token. |
protected StreamTokenizer m_Tokenizer
protected double[] m_ValueBuffer
protected int[] m_IndicesBuffer
protected Instances m_Data
protected int m_Lines
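The index and value buffers listed above hold the entries of one sparse row while it is being parsed. The following is a minimal sketch of that idea using only the JDK. It assumes numeric attributes only and a row in ARFF sparse notation such as {1 2.5, 3 7}, with zero-based, ascending indices. The class SparseRowSketch and all of its members are hypothetical names for illustration and are not part of Weka.

```java
import java.util.Arrays;

/** Hypothetical helper: parses one numeric sparse ARFF row into reusable buffers. */
public class SparseRowSketch {

    /** Trimmed copies of the buffers for a single parsed row. */
    public static final class SparseRow {
        public final int[] indices;
        public final double[] values;

        SparseRow(int[] indices, double[] values) {
            this.indices = indices;
            this.values = values;
        }
    }

    // Buffers sized to the number of attributes and reused for every row.
    private final int[] indicesBuffer;
    private final double[] valueBuffer;

    public SparseRowSketch(int numAttributes) {
        indicesBuffer = new int[numAttributes];
        valueBuffer = new double[numAttributes];
    }

    /** Parses a row such as "{1 2.5, 3 7}"; attributes not listed are implicitly 0. */
    public SparseRow parse(String line) {
        String body = line.trim();
        if (!body.startsWith("{") || !body.endsWith("}")) {
            throw new IllegalArgumentException("not a sparse row: " + line);
        }
        body = body.substring(1, body.length() - 1).trim();

        int numValues = 0;
        int lastIndex = -1;
        if (!body.isEmpty()) {
            for (String pair : body.split(",")) {
                String[] parts = pair.trim().split("\\s+");
                if (parts.length != 2) {
                    throw new IllegalArgumentException("expected 'index value': " + pair);
                }
                int index = Integer.parseInt(parts[0]);
                // Indices must be strictly increasing and within the attribute range.
                if (index <= lastIndex || index >= indicesBuffer.length) {
                    throw new IllegalArgumentException("bad index: " + index);
                }
                indicesBuffer[numValues] = index;
                valueBuffer[numValues] = Double.parseDouble(parts[1]);
                lastIndex = index;
                numValues++;
            }
        }
        // Copy only the filled prefix so the buffers can be reused for the next row.
        return new SparseRow(Arrays.copyOf(indicesBuffer, numValues),
                             Arrays.copyOf(valueBuffer, numValues));
    }
}
```

Allocating the buffers once and copying only the filled prefix avoids creating attribute-sized arrays for every row, which is presumably the reason the reader keeps them as fields.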
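public ArffReader(Reader reader) throws IOException
Reads the data completely from the reader. The data can be accessed via the getData() method.
Parameters:
reader - the reader to use
Throws:
IOException - if something goes wrong
See Also:
getData()

public ArffReader(Reader reader, int capacity) throws IOException
Reads only the header and reserves the specified space for instances. Further instances can be read via readInstance().
Parameters:
reader - the reader to use
capacity - the capacity of the new dataset
Throws:
IOException - if something goes wrong
IllegalArgumentException - if capacity is negative
See Also:
getStructure(), readInstance(Instances)

public ArffReader(Reader reader, Instances template, int lines) throws IOException
Reads the data without header according to the specified template. The data can be accessed via the getData() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
Throws:
IOException - if something goes wrong
See Also:
getData()

public ArffReader(Reader reader, Instances template, int lines, int capacity) throws IOException
Initializes the reader without reading the header according to the specified template. The data must be read via the readInstance() method.
Parameters:
reader - the reader to use
template - the template header
lines - the lines read so far
capacity - the capacity of the new dataset
Throws:
IOException - if something goes wrong
See Also:
getData()

protected void initBuffers()
Initializes the buffers for sparse instances to be read.
See Also:
m_ValueBuffer, m_IndicesBuffer

protected void compactify()
Compactifies the data.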
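
protected void errorMessage(String msg) throws IOException
Throws error message with line number and last token read.
Parameters:
msg - the error message to be thrown
Throws:
IOException - containing the error message

public int getLineNo()
Returns the current line number.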
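
errorMessage(String) and getLineNo() both depend on knowing the current line, and the template constructors accept a lines argument for the lines read so far. The sketch below shows one plausible way to combine such an offset with a java.io.StreamTokenizer. It is an assumption about the general approach, not a copy of the ArffReader implementation, and LineTrackingSketch and its members are hypothetical names.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper: tracks line numbers and builds error messages from a tokenizer. */
public class LineTrackingSketch {

    private final StreamTokenizer tokenizer;
    // Lines consumed before this tokenizer started (assumed meaning of "lines read so far").
    private final int linesBefore;

    public LineTrackingSketch(String text, int linesBefore) {
        this.tokenizer = new StreamTokenizer(new StringReader(text));
        this.tokenizer.eolIsSignificant(true);
        this.linesBefore = linesBefore;
    }

    /** Advances by one token and returns its type. */
    public int next() {
        try {
            return tokenizer.nextToken();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Offset plus the tokenizer's own line counter, which starts at 1. */
    public int lineNo() {
        return linesBefore + tokenizer.lineno();
    }

    /** Builds an exception whose message includes the last token and line, via toString(). */
    public IOException error(String msg) {
        return new IOException(msg + ", read " + tokenizer.toString());
    }
}
```

StreamTokenizer.toString() reports the current token and line number, although its exact format is unspecified by the JDK, so callers should not parse the resulting message.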
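
protected void getFirstToken() throws IOException
Gets next token, skipping empty lines.
Throws:
IOException - if reading the next token fails

protected void getIndex() throws IOException
Gets index, checking for a premature end of line.
Throws:
IOException - if it finds a premature end of line

protected void getLastToken(boolean endOfFileOk) throws IOException
Gets token and checks if it is the end of line.
Parameters:
endOfFileOk - whether EOF is OK
Throws:
IOException - if it doesn't find an end of line

protected double getInstanceWeight() throws IOException
Gets the value of an instance's weight (if one exists).
Throws:
IOException

protected void getNextToken() throws IOException
Gets next token, checking for a premature end of line.
Throws:
IOException - if it finds a premature end of line

protected void initTokenizer()
Initializes the StreamTokenizer used for reading the ARFF file.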
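
initTokenizer() prepares a java.io.StreamTokenizer for ARFF input. The sketch below shows one plausible configuration using only the JDK: commas act as separators, % starts a comment, quotes delimit strings, braces are single tokens, and line ends are reported. The exact settings used by ArffReader are an assumption here, and ArffTokenizerSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: configures a StreamTokenizer for ARFF-like text. */
public class ArffTokenizerSketch {

    static StreamTokenizer configure(Reader reader) {
        StreamTokenizer t = new StreamTokenizer(reader);
        t.resetSyntax();                   // start from a blank syntax table
        t.whitespaceChars(0, ' ');         // control characters and space separate tokens
        t.wordChars(' ' + 1, '\u00FF');    // everything else is part of a word, digits included
        t.whitespaceChars(',', ',');       // commas separate values
        t.commentChar('%');                // % starts a comment that runs to end of line
        t.quoteChar('"');                  // quoted strings may contain spaces and commas
        t.quoteChar('\'');
        t.ordinaryChar('{');               // braces are returned as single-character tokens
        t.ordinaryChar('}');
        t.eolIsSignificant(true);          // report line ends so rows can be delimited
        return t;
    }

    /** Returns the tokens of the given text, with line ends rendered as "<EOL>". */
    static List<String> tokens(String text) {
        StreamTokenizer t = configure(new StringReader(text));
        List<String> out = new ArrayList<>();
        try {
            while (t.nextToken() != StreamTokenizer.TT_EOF) {
                if (t.ttype == StreamTokenizer.TT_EOL) {
                    out.add("<EOL>");
                } else if (t.ttype == StreamTokenizer.TT_WORD || t.ttype == '"' || t.ttype == '\'') {
                    out.add(t.sval);       // words and quoted strings carry their text in sval
                } else {
                    out.add(String.valueOf((char) t.ttype));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }
}
```

With this configuration, the input "5.1,3.5,'Iris setosa' % note" followed by a newline yields the tokens 5.1, 3.5, Iris setosa and an end-of-line marker. Numbers arrive as words because number parsing is left off, so the caller decides how to convert each value.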

public Instance readInstance(Instances structure) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
Throws:
IOException - if the information is not read successfully

public Instance readInstance(Instances structure, boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully

protected Instance getInstance(Instances structure, boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
structure - the dataset header information, will get updated in case of string or relational attributes
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully

protected Instance getInstanceSparse(boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully

protected Instance getInstanceFull(boolean flag) throws IOException
Reads a single instance using the tokenizer and returns it.
Parameters:
flag - if method should test for carriage return after each instance
Throws:
IOException - if the information is not read successfully

protected void readHeader(int capacity) throws IOException
Reads and stores header of an ARFF file.
Parameters:
capacity - the number of instances to reserve in the data structure
Throws:
IOException - if the information is not read successfully

protected FastVector parseAttribute(FastVector attributes) throws IOException
Parses the attribute declaration.
Parameters:
attributes - the current attributes vector
Throws:
IOException - if the information is not read successfully

protected void readTillEOL() throws IOException
Reads and skips all tokens before next end of line token.
Throws:
IOException - in case something goes wrong

public Instances getStructure()
Returns the header format.

public Instances getData()
Returns the data that was read.

public String getRevision()
Returns the revision string.
Specified by:
getRevision in interface RevisionHandler

Copyright © 2015 University of Waikato, Hamilton, NZ. All rights reserved.