Parse a sentence with the Stanford Parser by passing a String instead of an array of strings
Question
Is it possible to parse a sentence with the Stanford Parser by passing a string rather than an array of strings? This is the example given in their short tutorial (see docs):
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.trees.*;
import edu.stanford.nlp.parser.lexparser.LexicalizedParser;

class ParserDemo {
    public static void main(String[] args) {
        LexicalizedParser lp = LexicalizedParser.loadModel("edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz");
        lp.setOptionFlags(new String[]{"-maxLength", "80", "-retainTmpSubcategories"});

        String[] sent = { "This", "is", "an", "easy", "sentence", "." }; // This is the sentence to be parsed
        List<CoreLabel> rawWords = Sentence.toCoreLabelList(sent);
        Tree parse = lp.apply(rawWords);
        parse.pennPrint();
        System.out.println();

        TreebankLanguagePack tlp = new PennTreebankLanguagePack();
        GrammaticalStructureFactory gsf = tlp.grammaticalStructureFactory();
        GrammaticalStructure gs = gsf.newGrammaticalStructure(parse);
        List<TypedDependency> tdl = gs.typedDependenciesCCprocessed();
        System.out.println(tdl);
        System.out.println();
    }
}
I am asking because I need to fetch sentences from a MySQL database and parse them directly as strings. I could tokenize the sentences and put the words, commas, and periods into a String array. However, to tokenize these sentences I would have to use the Stanford tokenizer, PTBTokenizer. The constructor of this tokenizer, as listed here (see docs), requires a java.io.FileReader object, but I am not reading a file from a directory. So I am wondering whether there is a way either to parse the sentence directly by passing a string, or to tokenize the sentence without needing a java.io.FileReader object.
Recommended answer
For simple usage, with the default tokenizer and default tokenizer options for a grammar, there is a convenience method you can use:
lp.parse(String)
However, the PTBTokenizer methods you point to do not take a FileReader; they take a Reader. You can therefore point a PTBTokenizer at a String just as easily by wrapping the String in a StringReader. This is the right approach if you need more control over how tokenization happens.
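The Reader versus FileReader point can be sketched with the JDK alone. The class below is a hypothetical stand-in, not PTBTokenizer: the name ReaderTokenizeSketch and its crude word/punctuation splitting are assumptions made purely for illustration. What it shows is that a method declared to take a java.io.Reader accepts a StringReader wrapped around an in-memory String, so no file is needed.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenizer API declared against java.io.Reader.
// This is NOT PTBTokenizer; it only separates letter/digit runs from punctuation.
class ReaderTokenizeSketch {

    // Accepts any Reader: FileReader, StringReader, InputStreamReader, and so on.
    static List<String> tokenize(Reader in) throws IOException {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetterOrDigit(ch)) {
                word.append(ch);
            } else {
                // A non-word character ends the current word, if there is one.
                if (word.length() > 0) {
                    tokens.add(word.toString());
                    word.setLength(0);
                }
                // Punctuation becomes its own token; whitespace is dropped.
                if (!Character.isWhitespace(ch)) {
                    tokens.add(String.valueOf(ch));
                }
            }
        }
        if (word.length() > 0) {
            tokens.add(word.toString());
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // Stands in for a sentence fetched from a database column.
        String sentence = "This is an easy sentence.";
        List<String> tokens = tokenize(new StringReader(sentence));
        System.out.println(tokens); // prints [This, is, an, easy, sentence, .]
    }
}
```

With the Stanford library, the same wrapping applies: pass new StringReader(sentence) wherever the PTBTokenizer API asks for a Reader. The exact constructor or factory signature depends on the release you use, so check it against that version's documentation.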