Parsing a sentence with the Stanford Parser by passing a String, not an array of strings


Question

Is it possible to parse a sentence with the Stanford Parser by passing a string rather than an array of strings? This is the example given in their short tutorial (see docs):

    import java.util.*;
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.trees.*;
    import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

    class ParserDemo {
      public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
        lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

        String[] sent = { "This", "is", "an", "easy", "sentence", "." }; // This is the sentence to be parsed
        List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
        Tree parse = lp.apply(rawWords);
        parse.pennPrint();
        System.out.println();

        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
        System.out.println(tdl);
        System.out.println();

      }
    }

I am trying to do this because I need to fetch sentences from a MySQL database and parse them directly as strings. I could tokenize the sentences and add the words, commas, and periods to a String array. However, to tokenize these sentences I would have to use the Stanford tokenizer, PTBTokenizer. The constructor of this tokenizer, as listed in the documentation (see docs), requires a "java.io.FileReader" object, but I am not reading a file from a directory. So I am wondering whether there is a way either to parse the sentence directly by passing a string, or to tokenize the sentence without requiring a "java.io.FileReader" object.

Recommended answer

For simple usage, with the default tokenizer and default tokenizer options for a grammar, there is a convenience method you can use:

lp.parse(String)

However, the PTBTokenizer methods you point to don't take a FileReader; they take a Reader. So you can just as easily point a PTBTokenizer at a String by wrapping the String in a StringReader. This is the right approach if you need more control over how tokenization happens.
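The Reader/FileReader distinction can be shown with the standard library alone. The sketch below is not Stanford code: `readWords` is a hypothetical stand-in for any method declared against `java.io.Reader`, and its whitespace splitting is only a placeholder for real tokenization. The point is that a `StringReader` wrapping an in-memory String (for example, a value read from a database column) is accepted wherever a `Reader` is expected.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class ReaderFromString {

  // Hypothetical stand-in for an API that accepts any Reader
  // (not part of the Stanford library). Splits on whitespace only.
  static List<String> readWords(Reader reader) {
    List<String> words = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    try {
      int c;
      while ((c = reader.read()) != -1) {
        if (Character.isWhitespace(c)) {
          // A whitespace character ends the current word, if any.
          if (current.length() > 0) {
            words.add(current.toString());
            current.setLength(0);
          }
        } else {
          current.append((char) c);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // Flush the last word when the input does not end in whitespace.
    if (current.length() > 0) {
      words.add(current.toString());
    }
    return words;
  }

  public static void main(String[] args) {
    // The sentence lives in memory; no file is involved.
    String sentence = "This is an easy sentence .";
    List<String> words = readWords(new StringReader(sentence));
    System.out.println(words); // prints [This, is, an, easy, sentence, .]
  }
}
```

With the Stanford library on the classpath, the same wrapping applies to the tokenizer: the library's own ParserDemo builds a tokenizer with PTBTokenizer.factory(new CoreLabelTokenFactory(), "").getTokenizer(new StringReader(text)) and passes the result of tokenize() to lp.apply(...).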
