Using an ARFF file for storing data
Question
I am using this example to create the .arff file for my Weka project:
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SparseInstance;

public class ArffExample {
    public static void main(String[] args) throws IOException {
        double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0},
                           {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}};
        int numInstances = data[0].length;
        FastVector atts = new FastVector();
        ArrayList<Instance> instances = new ArrayList<Instance>();
        for (int dim = 0; dim < 2; dim++) {
            // Create new attribute / dimension
            Attribute current = new Attribute("Attribute" + dim, dim);
            // Create an instance for each data object
            if (dim == 0) {
                for (int obj = 0; obj < numInstances; obj++) {
                    instances.add(new SparseInstance(0));
                }
            }
            // Fill the value of dimension "dim" into each object
            for (int obj = 0; obj < numInstances; obj++) {
                instances.get(obj).setValue(current, data[dim][obj]);
                System.out.println(instances.get(obj));
            }
            // Add attribute to total attributes
            atts.addElement(current);
        }
        // Create new dataset
        Instances newDataset = new Instances("Dataset", atts, instances.size());
        // Fill in data objects
        for (Instance inst : instances) {
            newDataset.add(inst);
        }
        BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
        writer.write(newDataset.toString());
        writer.flush();
        writer.close();
    }
}
I've noticed that the resulting file puts the elements of each row of the array into the columns of the .arff file. I want each whole row of the array to become one row of the .arff file, so the first array row is the first data row. How can I do that? In my case the last column of the 2D array represents the label of the row.
The expected result for my arff file:
4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0, 1 // for example the first row
19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0, 0 // the second row
Answer
The code in the example treats each column in the table as an instance (so there are 29 instances, each with two attributes). It sounds like you want to treat each row as an instance (giving two instances, each with 29 attributes):
double[][] data = {
    {4058.0, 4059.0, ... }, /* first instance */
    {19.0, 20.0, ... }      /* second instance */
};

// One attribute per column of the table
int numAtts = data[0].length;
FastVector atts = new FastVector(numAtts);
for (int att = 0; att < numAtts; att++)
{
    atts.addElement(new Attribute("Attribute" + att, att));
}

// One instance per row of the table
int numInstances = data.length;
Instances dataset = new Instances("Dataset", atts, numInstances);
for (int inst = 0; inst < numInstances; inst++)
{
    // Instance(weight, attValues): weight 1.0, values taken from the whole row
    dataset.add(new Instance(1.0, data[inst]));
}

BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(dataset.toString());
writer.flush();
writer.close();
I replaced SparseInstance with Instance, since almost all of the attribute values are non-zero. Note that in Weka 3.7, Instance has become an interface and DenseInstance should be used instead. Also, FastVector has been deprecated in favour of Java's ArrayList.
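The code above writes every column of a row as a numeric attribute, so the label the question mentions ends up as just another number. A minimal plain-Java sketch of the target layout is shown below. It does not use Weka's API at all; it writes the ARFF text directly, with one instance per row, and assumes the last column of every row is the label, emitted as a nominal attribute. The class name `RowsToArff`, the method `toArff`, the attribute name `class`, and the shortened sample rows (taken from the question, with labels 1 and 0 appended as in the expected result) are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Weka: builds ARFF text by hand.
public class RowsToArff {

    // Each row of `rows` becomes one instance. Assumption: the last column of
    // every row is the label; the other columns become numeric attributes.
    public static String toArff(String relation, double[][] rows) {
        if (rows.length == 0) {
            throw new IllegalArgumentException("need at least one row");
        }
        int numFeatures = rows[0].length - 1;

        // Distinct labels, in order of first appearance, for the nominal class attribute.
        Set<String> labels = new LinkedHashSet<>();
        for (double[] row : rows) {
            labels.add(formatLabel(row[numFeatures]));
        }

        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (int i = 0; i < numFeatures; i++) {
            sb.append("@attribute Attribute").append(i).append(" numeric\n");
        }
        sb.append("@attribute class {").append(String.join(",", labels)).append("}\n\n");
        sb.append("@data\n");

        // One comma-separated data line per row, label last.
        for (double[] row : rows) {
            for (int i = 0; i < numFeatures; i++) {
                sb.append(row[i]).append(',');
            }
            sb.append(formatLabel(row[numFeatures])).append('\n');
        }
        return sb.toString();
    }

    // Writes whole-number labels such as 1.0 as "1" so the nominal values read naturally.
    private static String formatLabel(double value) {
        if (value == Math.rint(value) && !Double.isInfinite(value)) {
            return Long.toString((long) value);
        }
        return Double.toString(value);
    }

    public static void main(String[] args) throws IOException {
        // Shortened sample rows; the last value of each row is the label.
        double[][] data = {
            {4058.0, 4059.0, 4060.0, 1.0},
            {19.0, 20.0, 21.0, 0.0}
        };
        String arff = toArff("Dataset", data);
        Files.write(Paths.get("test.arff"), arff.getBytes(StandardCharsets.UTF_8));
        System.out.print(arff);
    }
}
```

Declaring the label as nominal lets Weka treat it as a class for classifiers. After loading such a file into Weka, the class attribute still has to be selected, for example with `setClassIndex(numAttributes() - 1)` on the loaded `Instances`.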