How do we run the Stanford Classifier on an array of Strings?


Question

I have an array of strings:

String strarr[] = {
        "What a wonderful day",  
        "beautiful beds",
        "food was awesome"
    };

I also have a training dataset:

Room    What a beautiful room
Room    Wonderful sea-view
Room    beds are comfortable
Room    bed-spreads are good
Food    The dinner was marvellous
Food    Tasty foods
Service people are rude
Service waitors were not on time
Service service was horrible

Programmatically, I am unable to get the scores and labels for the strings I want to classify. If I instead use a dataset file with two columns, like the training data above, it works. My problem is that, in practice, I do not know in advance which label belongs to each string in my array.

How can I get the classifier to run on the array directly, instead of creating a two-column dataset file?

I get an error when I try the following:

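import edu.stanford.nlp.classify.Classifier;
import edu.stanford.nlp.classify.ColumnDataClassifier;
import edu.stanford.nlp.ling.Datum;

// Train a classifier from the properties file and the training data.
ColumnDataClassifier cdc = new ColumnDataClassifier("examples/drogo.prop");
Classifier<String, String> cl
    = cdc.makeClassifier(cdc.readTrainingExamples("examples/drogo.train"));

// Classify each string; makeDatumFromLine throws the exception shown below.
for (String li : strarr) {
    Datum<String, String> d = cdc.makeDatumFromLine(li);
    System.out.println(li + "  ==>  " + cl.classOf(d) + " (score: " + cl.scoresOf(d) + ")");
}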

Error:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
    at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatum(ColumnDataClassifier.java:738)
    at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatumFromStrings(ColumnDataClassifier.java:275)
    at edu.stanford.nlp.classify.ColumnDataClassifier.makeDatumFromLine(ColumnDataClassifier.java:245)
    at alchemypoc.DrogoClassifier.main(DrogoClassifier.java:55)
Java Result: 1

Recommended answer

Okay, so I did the following and it now seems to work. Since it is a ColumnDataClassifier and expects columnar (tab-separated) data, I added a tab before each sentence. Presumably this leaves the first column (the label) empty and puts the text in the second column, where the classifier looks for it.

String strarr[] = {
            "\tWhat a wonderful day",
            "\tbeautiful beds",
            "\tfood was awesome"
        };

It now gives me the values:

What a wonderful day  ==>  Room (score: {Service=-0.6692784244930884, Room=1.4113604761865859, Food=-0.7420810715491954})
    beautiful beds  ==>  Room (score: {Service=-2.1042147142001038, Room=3.888249805012589, Food=-1.7840358277259})
    food was awesome  ==>  Food (score: {Service=-0.44203328206155995, Room=-0.9779506257026013, Food=1.4199861760769543})
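
A quick way to see why the leading tab matters is to look at how a line splits into columns. The sketch below uses only the standard library and assumes the classifier splits each input line on tab characters, which is consistent with the ArrayIndexOutOfBoundsException: 1 in the stack trace. The class name ColumnSplitDemo and the method columnCount are made up for illustration and are not part of the Stanford library.

```java
public class ColumnSplitDemo {
    // Hypothetical helper: number of tab-separated columns in a line.
    static int columnCount(String line) {
        // A limit of -1 keeps trailing empty columns as well.
        return line.split("\t", -1).length;
    }

    public static void main(String[] args) {
        // No tab: a single column, so column index 1 does not exist.
        System.out.println(columnCount("What a wonderful day"));
        // Leading tab: an empty column 0 followed by the text in column 1.
        System.out.println(columnCount("\tWhat a wonderful day"));
    }
}
```

If that assumption holds, the untabbed strings fail because the text ends up in column 0 and nothing is left for column 1.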

If anyone has a different answer or a more correct way to do this, please post it.
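
One small variation, offered only as a sketch: rather than editing the string literals, the tab can be prepended when the lines are built, so the original array stays unchanged. The class TabPrefixDemo and the helper toColumnLines are hypothetical names introduced for illustration; the only Stanford call mentioned is cdc.makeDatumFromLine from the question, and it appears in a comment rather than being executed.

```java
import java.util.Arrays;

public class TabPrefixDemo {
    // Hypothetical helper: put an empty label column in front of each sentence.
    static String[] toColumnLines(String[] sentences) {
        return Arrays.stream(sentences)
                     .map(s -> "\t" + s)
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] strarr = {"What a wonderful day", "beautiful beds", "food was awesome"};
        for (String line : toColumnLines(strarr)) {
            // Each line could then be passed to cdc.makeDatumFromLine(line),
            // as in the loop from the question.
            System.out.println(line.replace("\t", "<TAB>"));
        }
    }
}
```

Keeping the original strings separate also makes it easy to print them without the leading tab next to the predicted label.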
