How to calculate the nearest neighbors using weka from the command line?

Problem description

I have a CSV file where each row is a vector of numbers representing a data point. I want to use weka from the command line to calculate the nearest neighbors of each data point in the CSV file. I know how to do k-nearest-neighbor classification from the command line, but that's not what I want. I want the actual neighbors. How do I do this?

I want to do this using weka and not some other tool.

Recommended answer

Weka doesn't have a one-liner that does what I think you are asking for (ingest a file, convert it to instances, and then find the N nearest neighbors of each instance).

But you can build a command-line-style one-liner by combining Weka with a few lines of Java:

Compile the following code. I used Eclipse, but you can just as easily use javac at the command line; just make sure that weka.jar is on your classpath (for example, javac -cp weka.jar WekaCLFindNN.java). After the code, I show how to call it as a one-liner from the command line.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.core.neighboursearch.LinearNNSearch;

public class WekaCLFindNN {
    public static void main(String[] args) throws Exception {
        // check the arguments before using them
        if (args.length < 2) {
            System.err.println("Usage: java WekaCLFindNN <numNN> <datafile>");
            System.exit(1);
        }
        int k = Integer.parseInt(args[0]);

        // report that the code is running
        System.out.println("Weka Command Line Find Nearest " + k + " Neighbors for each Instance in " + args[1]);

        // set up the data source, load the instances, and build a brute-force
        // nearest-neighbour search over them
        DataSource source = new DataSource(args[1]);
        Instances instances = source.getDataSet();
        LinearNNSearch knn = new LinearNNSearch(instances);

        // cycle through the dataset and find the nearest neighbours of each instance
        for (int j = 0; j < instances.numInstances(); j++) {
            Instances nearestInstances = knn.kNearestNeighbours(instances.instance(j), k);

            // print the instance, then its neighbours indented beneath it; loop over
            // what was actually returned, which can differ from k (ties, small datasets)
            System.out.println("\n\n" + instances.instance(j));
            for (int i = 0; i < nearestInstances.numInstances(); i++) {
                System.out.println("\n\t" + nearestInstances.instance(i));
            }
        }

        // report that the search is finished
        System.out.println("\n" + "Nearest Neighbors found");
    }
}

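Now run it from the command line with the following command, where numNN is the number of neighbors to find for each instance and csvfile is the path to your CSV file: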

java -cp weka.jar;. WekaCLFindNN numNN csvfile

I ran this on my machine with the weka.jar file and the compiled WekaCLFindNN.class file in the current directory. Also note that I was running under Windows, where the classpath separator is a semicolon (;). Under Linux you have to use a colon (:) instead, e.g. java -cp weka.jar:. WekaCLFindNN numNN csvfile
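If you script this across platforms, the JVM exposes the separator through the standard field java.io.File.pathSeparator. The following is a minimal sketch (the class name ClasspathSeparator and the helper classpath are illustrative, not part of Weka) that builds the -cp value for weka.jar plus the current directory:

```java
import java.io.File;

public class ClasspathSeparator {
    // Join classpath entries with the platform separator
    // (";" on Windows, ":" on Linux and macOS).
    static String classpath(String... entries) {
        return String.join(File.pathSeparator, entries);
    }

    public static void main(String[] args) {
        // prints "weka.jar;." on Windows or "weka.jar:." on Linux/macOS
        System.out.println(classpath("weka.jar", "."));
    }
}
```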

You can ignore any message about a database driver; that is just Weka writing something to stderr. Each vector is printed left-aligned, with its nearest neighbors listed indented beneath it, just like you asked for.

If you want the results in a log file, redirect standard output like this:

java -cp weka.jar;. WekaCLFindNN numNN csvfile > outputlog

The log file will contain only the neighbor listing: the database-driver message goes to stderr, so it does not appear in the log.

It would be nice to get both the nearest neighbors and their indices in the original dataset, but I checked the kNearestNeighbours method and found that the index data is thrown away right before the result is returned. If you need the indices, you will have to subclass LinearNNSearch and write a new method that returns both the instances and their indices.
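The heart of such a method is a linear scan that keeps each (index, distance) pair instead of discarding the index. The sketch below shows that logic in plain Java; it is not Weka code, and the names IndexedKnn, Neighbor and nearest are hypothetical. It assumes raw (unnormalized) Euclidean distance and breaks ties by lower index, returning at most k neighbors:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class IndexedKnn {
    // One neighbour: its row index in the original dataset and its distance.
    static final class Neighbor {
        final int index;
        final double distance;

        Neighbor(int index, double distance) {
            this.index = index;
            this.distance = distance;
        }

        @Override
        public String toString() {
            return index + " (" + distance + ")";
        }
    }

    // Raw Euclidean distance between two equal-length vectors.
    static double euclidean(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Linear scan: the k nearest rows to data[query], excluding the query row,
    // sorted by distance with ties broken by lower index.
    static List<Neighbor> nearest(double[][] data, int query, int k) {
        List<Neighbor> all = new ArrayList<>();
        for (int i = 0; i < data.length; i++) {
            if (i != query) {
                all.add(new Neighbor(i, euclidean(data[query], data[i])));
            }
        }
        all.sort(Comparator.comparingDouble((Neighbor n) -> n.distance)
                .thenComparingInt(n -> n.index));
        return new ArrayList<>(all.subList(0, Math.min(k, all.size())));
    }

    public static void main(String[] args) {
        double[][] data = { {0, 0}, {1, 0}, {5, 5}, {0, 2} };
        for (int j = 0; j < data.length; j++) {
            System.out.println(j + " -> " + nearest(data, j, 2));
        }
    }
}
```

In a real LinearNNSearch subclass you would compute distances with the search's own distance function (EuclideanDistance by default, which normalizes attribute ranges), so the distances can differ from this raw version, and Weka's own search may return more than k neighbors when there are ties.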

So I hope this helps. It's unfortunate that Weka doesn't offer this out of the box, but you can do it in just a few lines of code.
