OpenNLP document categorizer always returns the first category

Views: 86 | Published: 2020/5/4 9:42:45 | Tags: machine-learning, nlp, opennlp

Problem description

I'm testing the OpenNLP library to automate content categorization, but I've run into trouble. With the code below, the categorizer always returns the first category in my training data, even though I'm passing it the full text of an article from a news site.

    public void trainModel() {
        try {
            InputStreamFactory inputStreamFactory = new MarkableFileInputStreamFactory( new File("C:\\Users\\emehm\\Desktop\\data\\training_data.txt") );
            ObjectStream<String> lineStream = new PlainTextByLineStream(inputStreamFactory, "UTF-8");
            ObjectStream<DocumentSample> sampleStream = new DocumentSampleStream(lineStream);

            DoccatModel model = DocumentCategorizerME.train("en", sampleStream, TrainingParameters.defaultParams(), new DoccatFactory());
            DocumentCategorizerME myCategorizer = new DocumentCategorizerME(model);
            // The whole article is passed as a single array element.
            String[] input = new String[]{ this.getFileContent() };
            double[] outcomes = myCategorizer.categorize(input);
            String category = myCategorizer.getBestCategory(outcomes);
            Map<String, Double> map = myCategorizer.scoreMap(input);
            System.out.println(category);
        } catch (IOException e) {
            // Failed to read or parse training data, training failed
            e.printStackTrace();
        }
    }

    public String getFileContent() throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader buf = new BufferedReader(new InputStreamReader(
                new FileInputStream("C:\\Users\\emehm\\Desktop\\data\\statija.txt")))) {
            String line;
            while ((line = buf.readLine()) != null) {
                sb.append(line).append("\n");
            }
        }
        return sb.toString();
    }

Training data: http://pastebin.com/ZhxswkvJ

The article I'm using: http://pastebin.com/xtABGcbh

It always returns the first category in the list, and I'd like to know what I'm missing. When I debug it, every category gets a score of 0.25 and, for some reason, the first one is picked. When I test a single word it seems to work, but it does not work with a full article.
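
A likely cause, offered as a sketch rather than a confirmed diagnosis: `categorize(String[])` treats each array element as one token, so an entire article wrapped in a single-element array becomes one giant "word" that never occurred in the training data. With no known features, a maxent model falls back to a uniform distribution, which would match the 0.25 score per category (assuming four categories) and would also explain why a single word appears to work. The dependency-free Java sketch below illustrates the difference. `TokenizeForDoccat`, `tokenize`, `countKnown`, and the tiny vocabulary are hypothetical names and data invented for illustration; they are not part of OpenNLP.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class TokenizeForDoccat {

    // Splits raw text on runs of whitespace (spaces, tabs, newlines).
    static String[] tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        return trimmed.split("\\s+");
    }

    // Counts how many array elements are present in a vocabulary.
    // Stands in (hypothetically) for "features the trained model has seen".
    static int countKnown(String[] tokens, Set<String> vocabulary) {
        int known = 0;
        for (String token : tokens) {
            if (vocabulary.contains(token)) {
                known++;
            }
        }
        return known;
    }

    public static void main(String[] args) {
        // Made-up vocabulary and article text, for illustration only.
        Set<String> vocabulary = new HashSet<>(Arrays.asList("bank", "rates", "match", "goal"));
        String article = "The central bank raised rates\nafter the meeting";

        // What the question's code does: the whole article as one element.
        String[] asOneToken = new String[] { article };
        // What a bag-of-words categorizer would need: one word per element.
        String[] tokens = tokenize(article);

        System.out.println("single element, known tokens: " + countKnown(asOneToken, vocabulary));
        System.out.println("tokenized, known tokens: " + countKnown(tokens, vocabulary));
    }
}
```

In the question's code, this would mean passing a token array instead of `new String[]{ this.getFileContent() }`, for example the result of OpenNLP's `WhitespaceTokenizer.INSTANCE.tokenize(...)`. If the scores stay uniform after that, the training set may be too small for the default feature cutoff in `TrainingParameters.defaultParams()`, which is worth checking separately.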
