How to parse languages other than English with Stanford Parser? In Java, not command lines

Posted: 2020/5/18 0:41:58 · Tags: java, nlp, stanford-nlp


Question


I have been trying to use Stanford Parser in my Java program to parse some Chinese sentences. Since I am quite new to both Java and Stanford Parser, I used 'ParserDemo.java' to practice. The code works fine with English sentences and outputs the right result. However, when I changed the model to 'chinesePCFG.ser.gz' and tried to parse some segmented Chinese sentences, things went wrong.

Here is my Java code:

```java
package parserdemo;

import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.HasWord;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.process.DocumentPreprocessor;
import edu.stanford.nlp.trees.*;

class ParserDemo {

  public static void main(String[] args) {
    // The only change from ParserDemo.java: load the Chinese PCFG model
    LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
    if (args.length > 0) {
      demoDP(lp, args[0]);
    } else {
      demoAPI(lp);
    }
  }

  public static void demoDP(LexicalizedParser lp, String filename) {
    // This option shows loading, sentence-segmenting and tokenizing
    // a file using DocumentPreprocessor
    // (PennTreebankLanguagePack is the English language pack)
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    // You could also create a tokenizer here and pass it
    // to DocumentPreprocessor
    for (List<HasWord> sentence : new DocumentPreprocessor(filename)) {
      Tree parse = lp.apply(sentence);
      parse.pennPrint();
      System.out.println();

      GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
      Collection<TypedDependency> tdl = gs.typedDependenciesCCprocessed(true);
      System.out.println(tdl);
      System.out.println();
    }
  }

  public static void demoAPI(LexicalizedParser lp) {
    // This option shows parsing a list of correctly tokenized (segmented) words
    String[] sent = { "我", "是", "一名", "学生" };
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println();

    // PennTreebankLanguagePack is the English language pack
    TreebankLanguagePack tlp = new PennTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
    System.out.println(tdl);
    System.out.println();

    TreePrint tp = new TreePrint("penn,typedDependenciesCollapsed");
    tp.printTree(parse);
  }

  private ParserDemo() {} // static methods only
}
```


It's basically the same as ParserDemo.java, but when I run it I get the following result:


```text
Loading parser from serialized file edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz ... done [2.2 sec].
(ROOT (IP (NP (PN 我)) (VP (VC 是) (NP (QP (CD 一名)) (NP (NN 学生))))))
```


```text
Exception in thread "main" java.lang.RuntimeException: Failed to invoke public edu.stanford.nlp.trees.EnglishGrammaticalStructure(edu.stanford.nlp.trees.Tree)
    at edu.stanford.nlp.trees.GrammaticalStructureFactory.newGrammaticalStructure(GrammaticalStructureFactory.java:104)
    at parserdemo.ParserDemo.demoAPI(ParserDemo.java:65)
    at parserdemo.ParserDemo.main(ParserDemo.java:23)
```

The code at line 65 (in demoAPI) is:

```java
GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
```
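The failing call goes through gsf, which in this code comes from new PennTreebankLanguagePack(). That is the English language pack, so its factory tries to construct an EnglishGrammaticalStructure (exactly what the stack trace reports), even though the tree was produced by the Chinese grammar (its tags, such as IP, PN and VC, are Penn Chinese Treebank tags). A minimal sketch of a fix, assuming a Stanford Parser/CoreNLP 3.x jar plus the Chinese models on the classpath: build the factory from ChineseTreebankLanguagePack instead. Sentence.toCoreLabelList matches the question's code; newer CoreNLP releases moved it to SentenceUtils. The class ChineseDependencyDemo and the helper chineseDependencies are hypothetical names for illustration.

```java
import java.util.Collection;
import java.util.List;

import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.GrammaticalStructureFactory;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;
import edu.stanford.nlp.trees.international.pennchinese.ChineseTreebankLanguagePack;

public class ChineseDependencyDemo {

  // Hypothetical helper: converts a Chinese constituency tree to typed dependencies.
  static Collection<TypedDependency> chineseDependencies(Tree parse) {
    // Chinese language pack instead of PennTreebankLanguagePack (English),
    // so the factory builds a Chinese rather than an English grammatical structure.
    TreebankLanguagePack tlp = new ChineseTreebankLanguagePack();
    GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
    GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
    return gs.typedDependenciesCCprocessed();
  }

  public static void main(String[] args) {
    LexicalizedParser lp = LexicalizedParser.loadModel(
        "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz");
    String[] sent = { "我", "是", "一名", "学生" }; // already segmented
    List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
    Tree parse = lp.apply(rawWords);
    parse.pennPrint();
    System.out.println(chineseDependencies(parse));
  }
}
```

Because the helper takes a Tree, it can also be tried on the tree printed in the output above (for example via Tree.valueOf) without reloading the model.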


My guess is that chinesePCFG.ser.gz is missing something related to 'edu.stanford.nlp.trees.EnglishGrammaticalStructure'. Since the parser parses Chinese correctly from the command line, there must be something wrong with my own code. I have been searching, but only found a few similar cases, some of which mentioned using the right model; I don't really know how to change my code to use the "right model". I hope someone can help me with this. I am a newbie to Java and Stanford Parser, so please be specific. Thank you!
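The "right model" advice most plausibly means keeping the language pack consistent with the grammar that was loaded, rather than changing only the .ser.gz path. A sketch of a language-independent variant, assuming a Stanford Parser/CoreNLP 3.x jar and the relevant models jar on the classpath: ask the parser for its own pack with lp.treebankLanguagePack() (older releases may need lp.getOp().langpack() instead), and check supportsGrammaticalStructures() before requesting dependencies, since not every language pack necessarily ships a dependency converter. ModelAwareDependencies, dependencies and the command-line layout are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;

import edu.stanford.nlp.ling.Sentence;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;
import edu.stanford.nlp.trees.GrammaticalStructure;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.trees.TreebankLanguagePack;
import edu.stanford.nlp.trees.TypedDependency;

public class ModelAwareDependencies {

  // Hypothetical helper: converts a tree using the given language pack,
  // failing early if that language has no grammatical-structure converter.
  static Collection<TypedDependency> dependencies(TreebankLanguagePack tlp, Tree parse) {
    if (!tlp.supportsGrammaticalStructures()) {
      throw new UnsupportedOperationException(
          "No typed dependencies for " + tlp.getClass().getSimpleName());
    }
    GrammaticalStructure gs = tlp.grammaticalStructureFactory().newGrammaticalStructure(parse);
    return gs.typedDependenciesCCprocessed();
  }

  public static void main(String[] args) {
    // Hypothetical usage: java ModelAwareDependencies <model.ser.gz> word1 word2 ...
    String model = args.length > 0 ? args[0]
        : "edu/stanford/nlp/models/lexparser/chinesePCFG.ser.gz";
    String[] words = args.length > 1
        ? Arrays.copyOfRange(args, 1, args.length)
        : new String[] { "我", "是", "一名", "学生" }; // already segmented
    LexicalizedParser lp = LexicalizedParser.loadModel(model);
    Tree parse = lp.apply(Sentence.toCoreLabelList(words));
    parse.pennPrint();
    // Use the pack that belongs to the loaded grammar, not a hard-coded English one.
    System.out.println(dependencies(lp.treebankLanguagePack(), parse));
  }
}
```

With this shape, switching languages means passing a different model path; the dependency step follows the model automatically, provided that language's pack supports grammatical structures.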
