How to parse languages other than English with Stanford Parser in Java, not from the command line?

Views: 24 | Posted: 2022/1/2 18:02:25 | Tags: java, nlp, stanford-nlp


Problem description

I have been trying to use the Stanford Parser in my Java program to parse Chinese sentences. Since I am new to both Java and the Stanford Parser, I practiced with ParserDemo.java. The code works fine with English sentences and outputs the right result. However, when I changed the model to chinesePCFG.ser.gz and tried to parse some segmented Chinese sentences, things went wrong.

Here is my Java code:

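import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.*;

class ParserDemo {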

  public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
    if (args.length > 0) {
      demoDP(lp, args[0]);
    } else {
      demoAPI(lp);
    }
  }

  public static void demoDP(LexicalizedParser lp, String filename) {
    // This option shows loading a file, splitting it into sentences,
    // and tokenizing it using DocumentPreprocessor
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenizer here (as below) and pass it
    // to DocumentPreprocessor
    for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
      Tree parse = lp.apply(sentence);
      parse.pennPrint();
      System.out.println();

      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed(true);
      System.out.println(tdl);
      System.out.println();
    }
  }

  public static void demoAPI(LexicalizedParser lp) {
    // This option shows parsing a list of correctly tokenized words
    String sent[] = { "我", "是", "一名", "学生" };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }

  private ParserDemo() {} // static methods only
}

It is basically the same as ParserDemo.java, but when I run it I get the following result:

Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [2.2 sec].
(ROOT (IP (NP (PN 我)) (VP (VC 是) (NP (QP (CD 一名)) (NP (NN 学生))))))

Exception in thread "main" java.lang.RuntimeException: Failed to invoke public edu.stanford.nlp.trees.EnglishGrammaticalStructure(edu.stanford.nlp.trees.Tree)
    at edu.stanford.nlp.trees.GrammaticalStructureFactory.newGrammaticalStructure(GrammaticalStructureFactory.java:104)
    at parserdemo.ParserDemo.demoAPI(ParserDemo.java:65)
    at parserdemo.ParserDemo.main(ParserDemo.java:23)

The code on line 65 is:

 GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);

My guess is that chinesePCFG.ser.gz is missing something related to edu.stanford.nlp.trees.EnglishGrammaticalStructure. Since the parser parses Chinese correctly from the command line, there must be something wrong with my own code. I have searched and found only a few similar cases, some of which mentioned using the right model, but I don't know how to modify the code to use the "right model". I hope someone can help. I am new to Java and the Stanford Parser, so please be specific. Thank you!
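
A note on the likely cause, offered as a sketch rather than a confirmed fix: the stack trace names EnglishGrammaticalStructure, which is what the factory from PennTreebankLanguagePack builds, so the Chinese tree is probably being handed to English-specific dependency code. The model itself parsed the sentence fine. The example below assumes that swapping in ChineseTreebankLanguagePack (from edu.stanford.nlp.trees.international.pennchinese) is enough. The class name ChineseParserSketch and the helper dependencies are hypothetical names chosen for illustration. It also assumes the same parser release as the code above, where Sentence.toCoreLabelList exists (later releases renamed the class to SentenceUtils), and that the parser jar and the Chinese model are on the classpath.

```java
import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.trees.international.pennchinese.ChineseTreebankLanguagePack;

// Hypothetical class name; a sketch, not the official demo.
class ChineseParserSketch {

  // Parses already-segmented Chinese tokens and returns their typed dependencies.
  static Collection<TypedDependency> dependencies(LexicalizedParser lp, String[] tokens) {
    List<CoreLabel> words = Sentence.toCoreLabelList(tokens);
    Tree parse = lp.apply(words);
    parse.pennPrint();

    // Assumption: the Chinese language pack replaces PennTreebankLanguagePack,
    // so the factory builds a Chinese structure instead of an English one.
    TreebankLanguagePack tlp = new ChineseTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    return gs.typedDependencies();
  }

  public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
    String[] sent = { "我", "是", "一名", "学生" };
    System.out.println(dependencies(lp, sent));
  }
}
```

The same substitution would apply in demoDP, which also creates a PennTreebankLanguagePack. The language pack has to match the treebank the model was trained on, so other non-English models would need their own language pack class in the same way.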
