WEKA classification likelihood of the classes

Question

I would like to know whether there is a way in WEKA to output a number of "best guesses" for a classification.

My scenario: I classify the data with cross-validation, for instance, and WEKA's output then shows something like "these are the 3 best guesses for the classification of this instance." What I want is that, even if an instance isn't classified correctly, I still get the 3 or 5 best guesses for that instance.

Example:

Classes: A, B, C, D, E. Instances: 1...10.

The output would be: instance 1 is 90% likely to be class A, 75% likely to be class B, 60% likely to be class C, and so on.

Thanks.

Recommended answer

Weka's API has a method, Classifier.distributionForInstance(), that can be used to get the classification prediction distribution. You can then sort the distribution by decreasing probability to get your top-N predictions.
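
The sorting step can be sketched in plain Java as follows. This is only a minimal illustration, not part of Weka: the class name TopNPredictions and the method topN are hypothetical, and the sample distribution in main is made up. It assumes the input is a double[] like the one returned by distributionForInstance(), where position i holds the probability of class index i.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Weka API.
public class TopNPredictions {

    // Returns the class indices of the n largest probabilities, highest first.
    // If n exceeds the number of classes, all class indices are returned.
    public static int[] topN(double[] distribution, int n) {
        Integer[] order = new Integer[distribution.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }

        // Sort class indices by decreasing probability.
        // The sort is stable, so ties keep the lower class index first.
        Arrays.sort(order, (a, b) -> Double.compare(distribution[b], distribution[a]));

        int k = Math.max(0, Math.min(n, order.length));
        int[] top = new int[k];
        for (int i = 0; i < k; i++) {
            top[i] = order[i];
        }
        return top;
    }

    public static void main(String[] args) {
        // Made-up distribution over the classes A..E from the question.
        String[] labels = {"A", "B", "C", "D", "E"};
        double[] distribution = {0.05, 0.60, 0.25, 0.10, 0.00};

        // Prints the three most probable classes: B, then C, then D.
        for (int classIndex : topN(distribution, 3)) {
            System.out.println(labels[classIndex] + " : " + distribution[classIndex]);
        }
    }
}
```

With Weka, the array would come from classifier.distributionForInstance(instance), and each returned index could be turned into a label with testInstances.classAttribute().value(classIndex), as in the function further down.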

Below is a function that prints out: (1) the test instance's ground-truth label; (2) the predicted label from classifyInstance(); and (3) the prediction distribution from distributionForInstance(). I have used this with J48, but it should work with other classifiers.

The input parameters are the serialized model file (which you can create during the model training phase by applying the -d option) and the test file in ARFF format.

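// Imports needed by the method below (place them at the top of the enclosing class file).
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public void test(String modelFileSerialized, String testFileARFF) 
    throws Exception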
{
    // Deserialize the classifier.
    Classifier classifier = 
        (Classifier) weka.core.SerializationHelper.read(
            modelFileSerialized);

    // Load the test instances.
    Instances testInstances = DataSource.read(testFileARFF);

    // Mark the last attribute in each instance as the true class.
    testInstances.setClassIndex(testInstances.numAttributes()-1);

    int numTestInstances = testInstances.numInstances();
    System.out.printf("There are %d test instances\n", numTestInstances);

    // Loop over each test instance.
    for (int i = 0; i < numTestInstances; i++)
    {
        // Get the true class label from the instance's own classIndex.
        String trueClassLabel = 
            testInstances.instance(i).toString(testInstances.classIndex());

        // Make the prediction here.
        double predictionIndex = 
            classifier.classifyInstance(testInstances.instance(i)); 

        // Get the predicted class label from the predictionIndex.
        String predictedClassLabel =
            testInstances.classAttribute().value((int) predictionIndex);

        // Get the prediction probability distribution.
        double[] predictionDistribution = 
            classifier.distributionForInstance(testInstances.instance(i)); 

        // Print out the true label, predicted label, and the distribution.
        System.out.printf("%5d: true=%-10s, predicted=%-10s, distribution=", 
                          i, trueClassLabel, predictedClassLabel); 

        // Loop over all the prediction labels in the distribution.
        for (int predictionDistributionIndex = 0; 
             predictionDistributionIndex < predictionDistribution.length; 
             predictionDistributionIndex++)
        {
            // Get this distribution index's class label.
            String predictionDistributionIndexAsClassLabel = 
                testInstances.classAttribute().value(
                    predictionDistributionIndex);

            // Get the probability.
            double predictionProbability = 
                predictionDistribution[predictionDistributionIndex];

            System.out.printf("[%10s : %6.3f]", 
                              predictionDistributionIndexAsClassLabel, 
                              predictionProbability );
        }

        System.out.printf("\n");
    }
}
