Where can I find a practical example of KNN in Java using Weka?


Question

I have been searching for a practical example of a KNN implementation using Weka, but everything I find is too general for me to understand the data it needs in order to work (or maybe how to build the objects it needs) and the results it shows. Maybe someone who has worked with it before has a better example with realistic things (products, movies, books, etc.) rather than the typical letters you see in algebra.

That way I can figure out how to implement it for my case (recommending dishes to the active user with KNN). Any help would be highly appreciated, thanks.

I was trying to understand it with this link, https://www.ibm.com/developerworks/library/os-weka3/index.html, but I don't understand how they got these results or how they got the formula:

Step 1: Determine Distance Formula

Distance = SQRT( ((58 - Age)/(69-35))^2 + ((51000 - Income)/(150000-38000))^2 )

Why is it always /(69-35) and /(150000-38000)?

Here's the code I have tried without success. If someone can clear it up for me I would appreciate it. I wrote this code by combining these two answers:

This answer shows how to get the KNN:

How to use Java

And this one shows how to create instances (I don't really know what they are in Weka):

Adding a new Instance in weka

So I came up with this:

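import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.neighboursearch.LinearNNSearch;

public class Wekatest {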

    public static void main(String[] args) {

        ArrayList<Attribute> atts = new ArrayList<>();
        ArrayList<String> classVal = new ArrayList<>();
        // I don't really understand what's happening here
        classVal.add("A");
        classVal.add("B");
        classVal.add("C");
        classVal.add("D");
        classVal.add("E");
        classVal.add("F");

        atts.add(new Attribute("content", (ArrayList<String>) null));
        atts.add(new Attribute("@@class@@", classVal));

        // In my case the data to evaluate are dishes ("plato" means dish in Spanish)
        Instances dataRaw = new Instances("TestInstancesPlatos", atts, 0);

        // I imagine that every instance is like an object that will be compared with the other instances to get its nearest neighbours (so an instance is like a dish for me).

        double[] instanceValue1 = new double[dataRaw.numAttributes()];

        instanceValue1[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue1[1] = 0;

        dataRaw.add(new DenseInstance(1.0, instanceValue1));

        double[] instanceValue2 = new double[dataRaw.numAttributes()];

        instanceValue2[0] = dataRaw.attribute(0).addStringValue("Tunas");
        instanceValue2[1] = 1;

        dataRaw.add(new DenseInstance(1.0, instanceValue2));

        double[] instanceValue3 = new double[dataRaw.numAttributes()];

        instanceValue3[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue3[1] = 2;

        dataRaw.add(new DenseInstance(1.0, instanceValue3));

        double[] instanceValue4 = new double[dataRaw.numAttributes()];

        instanceValue4[0] = dataRaw.attribute(0).addStringValue("Hamburguers");
        instanceValue4[1] = 3;

        dataRaw.add(new DenseInstance(1.0, instanceValue4));

        double[] instanceValue5 = new double[dataRaw.numAttributes()];

        instanceValue5[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue5[1] = 4;

        dataRaw.add(new DenseInstance(1.0, instanceValue5));

        System.out.println("---------------------");

        weka.core.neighboursearch.LinearNNSearch knn = new LinearNNSearch(dataRaw);
        try {

            // This method receives the target instance whose neighbours you want to know and N (I don't really know what N is, but I imagine it is the number of neighbours I want)
            Instances nearestInstances = knn.kNearestNeighbours(dataRaw.get(0), 1);
            // I expected the output to be the closest neighbour to dataRaw.get(0), which would be Pizzas, but instead I got some data that I don't really understand.


            System.out.println(nearestInstances);

        } catch (Exception e) {

            e.printStackTrace();
        }

    }

}

OUTPUT:

---------------------
@relation TestInstancesPlatos

@attribute content string
@attribute @@class@@ {A,B,C,D,E,F}

@data
Pizzas,A
Tunas,B
Pizzas,C
Hamburguers,D

Weka dependency used:

<dependency>
    <groupId>nz.ac.waikato.cms.weka</groupId>
    <artifactId>weka-stable</artifactId>
    <version>3.8.0</version>
</dependency>

Recommended answer

KNN is a machine learning technique usually classified as an "instance-based predictor". It takes all instances of the classified samples and plots them in an n-dimensional space.

Using a distance measure such as Euclidean distance, KNN looks for the closest points in this n-dimensional space and estimates which class a new point belongs to based on these neighbors. If it is closer to blue dots, it is blue; if it's closer to red dots...

But now, how could we apply it to your problem?

Imagine that you only have two attributes, price and calories (a 2-dimensional space). You want to classify customers into three classes: fit, junk food, gourmet. With this, you can offer a deal in a restaurant that matches the customer's preferences.

You have the following data:

+-------+----------+-----------+
| Price | Calories | Food Type |
+-------+----------+-----------+
| $2    |    350   | Junk Food |
+-------+----------+-----------+
| $5    |    700   | Junk Food |
+-------+----------+-----------+
| $10   |    200   | Fit       |
+-------+----------+-----------+
| $3    |    400   | Junk Food |
+-------+----------+-----------+
| $8    |    150   | Fit       |
+-------+----------+-----------+
| $7    |    650   | Junk Food |
+-------+----------+-----------+
| $5    |    120   | Fit       |
+-------+----------+-----------+
| $25   |    230   | Gourmet   |
+-------+----------+-----------+
| $12   |    210   | Fit       |
+-------+----------+-----------+
| $40   |    475   | Gourmet   |
+-------+----------+-----------+
| $37   |    600   | Gourmet   |
+-------+----------+-----------+

Now, let's see it plotted in a 2D space:

What happens next?

For every new entry, the algorithm calculates the distance to all dots (instances) and finds the k nearest ones. From the classes of these k nearest ones, it determines the class of the new entry.

Take k = 3 and the values $15 and 165 cal. Let's find the 3 nearest neighbors:

This is where the distance formula comes in. The computation is actually performed for every dot. These distances are then "ranked", and the k closest ones determine the final class.
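The ranking step can be sketched by hand in plain Java, without Weka. This is only an illustration under stated assumptions: the class `DishKnn` and its method names are made up for this example, the data is the table above, and the distance is plain Euclidean distance with no normalization.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; it does not use Weka.
class DishKnn {

    // Rows of the table above: {price in dollars, calories}.
    static final double[][] FEATURES = {
        {2, 350}, {5, 700}, {10, 200}, {3, 400}, {8, 150}, {7, 650},
        {5, 120}, {25, 230}, {12, 210}, {40, 475}, {37, 600}
    };

    // Food type of each row, in the same order.
    static final String[] LABELS = {
        "Junk Food", "Junk Food", "Fit", "Junk Food", "Fit", "Junk Food",
        "Fit", "Gourmet", "Fit", "Gourmet", "Gourmet"
    };

    // Plain Euclidean distance, with no normalization.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    // Indices of the k rows closest to the query, nearest first.
    static Integer[] nearest(double[] query, int k) {
        Integer[] order = new Integer[FEATURES.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> distance(FEATURES[i], query)));
        return Arrays.copyOf(order, Math.min(k, order.length));
    }

    // Majority vote among the k nearest rows; on a tie, the label that reached the top count first wins.
    static String classify(double[] query, int k) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int row : nearest(query, k)) {
            int count = votes.merge(LABELS[row], 1, Integer::sum);
            if (best == null || count > votes.get(best)) {
                best = LABELS[row];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] query = {15, 165}; // $15 and 165 calories
        for (int row : nearest(query, 3)) {
            System.out.println(Arrays.toString(FEATURES[row]) + " " + LABELS[row]);
        }
        System.out.println("Predicted class: " + classify(query, 3)); // prints "Predicted class: Fit"
    }
}
```

For this query the three closest rows are ($8, 150), ($10, 200) and ($12, 210), all labelled Fit, so the vote is unanimous.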

Now, why the values /(69-35) and /(150000-38000)? As mentioned in other answers, this is due to normalization. Our example uses price and calories. As seen, calories are on a larger scale than price (more units per value). To avoid imbalances, such as calories counting for more than price when deciding the class (which would kill the Gourmet class, for example), all attributes need to be made similarly important, hence the use of normalization.
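As a rough sketch of that normalization, the formula from the question can be written as a small Java method in which each difference is divided by the attribute's range. The class and method names are invented for this example, the second customer is a made-up data point, and reading 35/69 and 38000/150000 as the minimum and maximum of each attribute is an assumption based on the formula quoted in the question.

```java
// Hypothetical helper class for illustration; it does not use Weka.
class NormalizedDistance {

    // Euclidean distance where each attribute difference is first divided
    // by that attribute's range (max - min), so every attribute ends up on a 0..1 scale.
    static double distance(double[] a, double[] b, double[] min, double[] max) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double range = max[i] - min[i];
            // A constant attribute carries no information, so it contributes nothing.
            double diff = (range == 0) ? 0.0 : (a[i] - b[i]) / range;
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Assumed to be the smallest and largest age and income in the article's data.
        double[] min = {35, 38000};
        double[] max = {69, 150000};

        double[] target = {58, 51000};  // the customer from the quoted formula
        double[] other = {41, 107000};  // made-up customer, for illustration only

        // (58 - 41) / 34 = 0.5 and (51000 - 107000) / 112000 = -0.5,
        // so the distance is sqrt(0.25 + 0.25), roughly 0.7071.
        System.out.println(distance(target, other, min, max));
    }
}
```

Without the division, the income difference (56000) would swamp the age difference (17) and age would effectively be ignored.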

Weka abstracts that for you, but you can visualize it as well. See an example of a visualization from a project I made for a Weka ML course:

Notice that, since there are many more than 2 dimensions, there are a lot of plots, but the idea is similar.

Explaining the code:

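import java.util.ArrayList;

import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instances;
import weka.core.neighboursearch.LinearNNSearch;

public class Wekatest {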

    public static void main(String[] args) {
// These two ArrayLists describe the structure of your dataset.
// atts holds the attribute (column) definitions; the feature columns are what is usually called X.
// classVal holds the possible values of the target class that is to be predicted, usually called y.
        ArrayList<Attribute> atts = new ArrayList<>();
        ArrayList<String> classVal = new ArrayList<>();
// Here you build a "dictionary" of all distinct class labels (for example, types of restaurant) that can be predicted.
        classVal.add("A");
        classVal.add("B");
        classVal.add("C");
        classVal.add("D");
        classVal.add("E");
        classVal.add("F");
// The next two lines define the attributes: a string attribute named "content" and a nominal attribute holding the class of the already labelled values.
        atts.add(new Attribute("content", (ArrayList<String>) null));
        atts.add(new Attribute("@@class@@", classVal));

// This creates an empty Weka dataset named "TestInstancesPlatos" (a relation name, not a file) with the attributes defined above and an initial capacity of 0.
// dataRaw will hold the previously labelled instances that are used to "train the model" (kNN doesn't actually train anything; it uses all the data for predictions).
        Instances dataRaw = new Instances("TestInstancesPlatos", atts, 0);


// Here you start creating instances (rows of data). This is where you would substitute your own inputs in production.
        double[] instanceValue1 = new double[dataRaw.numAttributes()];

// It looks like you only have 2 attributes: a food product and maybe a rating (the number is the index of the class label, so 0 means "A").
        instanceValue1[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue1[1] = 0;

// You're appending this instance to the dataset.
        dataRaw.add(new DenseInstance(1.0, instanceValue1));

        double[] instanceValue2 = new double[dataRaw.numAttributes()];

        instanceValue2[0] = dataRaw.attribute(0).addStringValue("Tunas");
        instanceValue2[1] = 1;

        dataRaw.add(new DenseInstance(1.0, instanceValue2));

        double[] instanceValue3 = new double[dataRaw.numAttributes()];

        instanceValue3[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue3[1] = 2;

        dataRaw.add(new DenseInstance(1.0, instanceValue3));

        double[] instanceValue4 = new double[dataRaw.numAttributes()];

        instanceValue4[0] = dataRaw.attribute(0).addStringValue("Hamburguers");
        instanceValue4[1] = 3;

        dataRaw.add(new DenseInstance(1.0, instanceValue4));

        double[] instanceValue5 = new double[dataRaw.numAttributes()];

        instanceValue5[0] = dataRaw.attribute(0).addStringValue("Pizzas");
        instanceValue5[1] = 4;

        dataRaw.add(new DenseInstance(1.0, instanceValue5));

// After adding 5 instances, time to test:
        System.out.println("---------------------");

//Load the algorithm with data.
        weka.core.neighboursearch.LinearNNSearch knn = new LinearNNSearch(dataRaw);
// You're asking for the nearest neighbours of instance 0 of dataRaw. The second argument (1) is the number of neighbours you want.
        try {
            Instances nearestInstances = knn.kNearestNeighbours(dataRaw.get(0), 1);
// This prints the neighbouring instances that were found (each with its class, A to F), not a single predicted class.
            System.out.println(nearestInstances);

        } catch (Exception e) {

            e.printStackTrace();
        }

    }

}

What should you do?

-> Gather data.
-> Define a set of attributes that help you predict which cuisine you have (e.g. prices, dishes or ingredients, with one attribute for each dish or ingredient).
-> Organize this data.
-> Define a set of labels.
-> Manually label a set of data.
-> Load the labelled data into KNN.
-> Label new instances by passing their attributes to KNN. It will return the label of the k nearest neighbors (good values for k are 3 or 5, but you have to test).
-> Have fun!
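One possible way to "test" a value of k, as the list above suggests, is leave-one-out evaluation: hide one labelled row at a time, predict it from the rest, and count the hits. The following plain-Java sketch does this on the price/calories table, with and without normalization. The class `ChooseK` and its methods are invented for this example and do not use Weka. As a simplification, the min and max used for scaling are computed over all rows, including the hidden one.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration; it does not use Weka.
class ChooseK {

    // {price in dollars, calories} for each row of the table in this answer.
    static final double[][] X = {
        {2, 350}, {5, 700}, {10, 200}, {3, 400}, {8, 150}, {7, 650},
        {5, 120}, {25, 230}, {12, 210}, {40, 475}, {37, 600}
    };
    static final String[] Y = {
        "Junk Food", "Junk Food", "Fit", "Junk Food", "Fit", "Junk Food",
        "Fit", "Gourmet", "Fit", "Gourmet", "Gourmet"
    };

    // Min-max scales every column to the range 0..1 (ranges taken over all rows).
    static double[][] normalize(double[][] rows) {
        int cols = rows[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        double[][] scaled = new double[rows.length][cols];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < cols; j++) {
                scaled[i][j] = (max[j] == min[j]) ? 0.0 : (rows[i][j] - min[j]) / (max[j] - min[j]);
            }
        }
        return scaled;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    // Predicts the label of row `held` from the k nearest of the remaining rows.
    // On a tied vote, the label that reached the top count first wins.
    static String predictHeldOut(double[][] rows, int held, int k) {
        Integer[] others = new Integer[rows.length - 1];
        for (int i = 0, n = 0; i < rows.length; i++) {
            if (i != held) {
                others[n++] = i;
            }
        }
        Arrays.sort(others, Comparator.comparingDouble((Integer i) -> distance(rows[i], rows[held])));
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int n = 0; n < Math.min(k, others.length); n++) {
            String label = Y[others[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (best == null || count > votes.get(best)) {
                best = label;
            }
        }
        return best;
    }

    // Number of rows whose held-out prediction matches their real label.
    static int correct(double[][] rows, int k) {
        int hits = 0;
        for (int i = 0; i < rows.length; i++) {
            if (predictHeldOut(rows, i, k).equals(Y[i])) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        double[][] scaled = normalize(X);
        for (int k : new int[] {1, 3, 5}) {
            System.out.println("k=" + k + " raw: " + correct(X, k) + "/" + X.length
                + ", normalized: " + correct(scaled, k) + "/" + X.length);
        }
    }
}
```

For k = 1 on this table, the raw version gets 8 of 11 rows right and misses all three Gourmet rows, while the normalized version gets 10 of 11. With only 11 rows these counts are noisy, so treat them as a way to compare settings on your own data rather than as a benchmark.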
