How to merge two sets of Weka Instances together


Question

Currently, I'm copying one instance at a time from one dataset to the other. Is there a way to do this so that string mappings remain intact? mergeInstances works horizontally (it joins attributes side by side); is there an equivalent vertical merge that appends rows?

This is one step of a loop I use to read datasets of the same structure from multiple ARFF files into one large dataset. There has to be a simpler way.

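// inst is the combined dataset being built up across files
Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
for (int i = 0; i < iNew.numInstances(); i++) {
    // copy each instance from the newly read file into the combined dataset
    Instance nInst = iNew.instance(i);
    inst.add(nInst);
}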


Recommended answer

Why not create a new ARFF file that contains the data from both originals? A simple shell approach:

cat 1.arff > tmp.arff
tail -n+20 2.arff >> tmp.arff

where 20 is replaced by the line number of the first data row in 2.arff, that is, the number of header lines (everything up to and including the @data line) plus one, because tail -n +K starts printing at line K. This produces a new ARFF file with all of the desired instances, and you can read this new file with your existing code:

Instances iNew = new ConverterUtils.DataSource(name).getDataSet();
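Counting header lines by hand is easy to get wrong, so the same concatenation can be sketched in plain Java by locating the @data marker instead. This is only an illustration under stated assumptions: the class name ArffConcat and its methods are hypothetical, it uses no Weka API, and it assumes every input file declares identical attributes, since it does no header checking of its own.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: concatenates ARFF files as plain text.
class ArffConcat {

    /** Returns the index of the "@data" line, matched case-insensitively. */
    static int dataLineIndex(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).trim().equalsIgnoreCase("@data")) {
                return i;
            }
        }
        throw new IllegalArgumentException("no @data line found");
    }

    /** Keeps the first file whole, then appends only the data rows of the others. */
    static List<String> concat(List<List<String>> files) {
        dataLineIndex(files.get(0)); // fail early if the first file has no @data line
        List<String> out = new ArrayList<>(files.get(0));
        for (List<String> lines : files.subList(1, files.size())) {
            int start = dataLineIndex(lines) + 1;
            for (String line : lines.subList(start, lines.size())) {
                if (!line.trim().isEmpty()) { // skip blank lines between rows
                    out.add(line);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ArffConcat out.arff in1.arff in2.arff ...
        if (args.length < 2) {
            System.err.println("usage: ArffConcat <out> <in1> [<in2> ...]");
            return;
        }
        List<List<String>> files = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            files.add(Files.readAllLines(Paths.get(args[i])));
        }
        Files.write(Paths.get(args[0]), concat(files));
    }
}
```

Because the rows are copied as text and parsed only once, when the merged file is loaded, string and nominal values are indexed a single time for the whole dataset. The sketch assumes that no data row is itself just the text @data.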

You could also invoke Weka on the command line, as described here: http://old.nabble.com/how-to-merge-two-data-file-a.arff-and-b.arff-into-one-data-list--td22890856.html

java weka.core.Instances append filename1 filename2 > output-file 

However, the documentation at http://weka.sourceforge.net/doc.dev/weka/core/Instances.html#main%28java.lang.String lists no function that lets you append multiple ARFF files natively from within your Java code. As of Weka 3.7.6, the code that appends two ARFF files is this:

// read two files, append them and print result to stdout
else if ((args.length == 3) && (args[0].toLowerCase().equals("append"))) {
  DataSource source1 = new DataSource(args[1]);
  DataSource source2 = new DataSource(args[2]);
  String msg = source1.getStructure().equalHeadersMsg(source2.getStructure());
  if (msg != null)
    throw new Exception("The two datasets have different headers:\n" + msg);
  Instances structure = source1.getStructure();
  System.out.println(source1.getStructure());
  while (source1.hasMoreElements(structure))
    System.out.println(source1.nextElement(structure));
  structure = source2.getStructure();
  while (source2.hasMoreElements(structure))
    System.out.println(source2.nextElement(structure));
}

So it looks like Weka itself simply checks that the two headers match, then iterates through every instance in each dataset and prints it, which is essentially the same process your code uses.
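The excerpt refuses to append when the two headers differ. If you concatenate files as text instead, a rough equivalent of that check can be sketched without Weka. This is an approximation, not Weka's implementation: the class ArffHeaderCheck and its methods are hypothetical names, and it compares @attribute declarations as whitespace-normalized text only, so it is stricter than a real parser (for example, it treats a difference in keyword case as a mismatch).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Weka: compares ARFF attribute declarations as text.
class ArffHeaderCheck {

    /** Collects the @attribute declarations that appear before the @data line. */
    static List<String> attributeLines(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            String lower = line.toLowerCase();
            if (lower.equals("@data")) {
                break; // the header ends here
            }
            if (lower.startsWith("@attribute")) {
                // collapse runs of whitespace so spacing differences are ignored
                out.add(line.replaceAll("\\s+", " "));
            }
        }
        return out;
    }

    /** Returns null when the declarations match, otherwise a short description. */
    static String headerMismatch(List<String> first, List<String> second) {
        List<String> a = attributeLines(first);
        List<String> b = attributeLines(second);
        if (a.size() != b.size()) {
            return "attribute count differs: " + a.size() + " vs " + b.size();
        }
        for (int i = 0; i < a.size(); i++) {
            if (!a.get(i).equals(b.get(i))) {
                return "attribute " + (i + 1) + " differs: " + a.get(i) + " vs " + b.get(i);
            }
        }
        return null;
    }
}
```

Returning null on success mirrors how the excerpt uses the result of equalHeadersMsg; the relation name and comment lines are deliberately ignored here.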
