Using Stanford Tregex in Python

Question

I'm a newbie in NLP and Python. I'm trying to extract a subset of noun phrases from parsed trees from StanfordCoreNLP by using the Tregex tool and the Python subprocess library. In particular, I'm trying to find and extract noun phrases that match the following pattern: '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)' in the Tregex grammar.

For example, below is the original text, saved in a string named "text":

text = ('Pusheen and Smitha walked along the beach. "I want to surf", said Smitha, the CEO of Tesla. However, she fell off the surfboard')

After running the StanfordCoreNLP parser using the Python wrapper, I got the following 3 trees for the 3 sentences:

output1['sentences'][0]['parse']

Out[58]: '(ROOT\n  (S\n    (NP (NNP Pusheen)\n      (CC and)\n      (NNP Smitha))\n    (VP (VBD walked)\n      (PP (IN along)\n        (NP (DT the) (NN beach))))\n    (. .)))'

output1['sentences'][1]['parse']

Out[59]: "(ROOT\n  (SINV (`` ``)\n    (S\n      (NP (PRP I))\n      (VP (VBP want)\n        (PP (TO to)\n          (NP (NN surf) ('' '')))))\n    (, ,)\n    (VP (VBD said))\n    (NP\n      (NP (NNP Smitha))\n      (, ,)\n      (NP\n        (NP (DT the) (NNP CEO))\n        (PP (IN of)\n          (NP (NNP Tesla)))))\n    (. .)))"

output1['sentences'][2]['parse']

Out[60]: '(ROOT\n  (S\n    (ADVP (RB However))\n    (, ,)\n    (NP (PRP she))\n    (VP (VBD fell)\n      (PRT (RP off))\n      (NP (DT the) (NN surfboard)))))'

I would like to extract the following 3 noun phrases (one for each sentence) and save them as variables (or lists of tokens) in Python:

  • (NP (NNP Pusheen)\n (CC and)\n (NNP Smitha))
  • (NP (PRP I))
  • (NP (PRP she))
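To make the intent of the pattern concrete: in Tregex, `NP $ VP > S` selects an NP that is a sister of a VP and an immediate child of an S. The sketch below is not Tregex; it is a small pure-Python stand-in that hard-codes that one relation, so the expected matches can be checked without Java. The helper names (`parse_tree`, `np_sister_of_vp_under_s`, `leaves`) are hypothetical, and the two trees are copied from the parser output above.

```python
def parse_tree(text):
    """Parse a bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()


def np_sister_of_vp_under_s(node):
    """Collect NP nodes whose parent is labelled S and which have a VP sister."""
    label, children = node
    subtrees = [c for c in children if isinstance(c, tuple)]
    found = []
    if label == "S" and any(c[0] == "VP" for c in subtrees):
        found.extend(c for c in subtrees if c[0] == "NP")
    for child in subtrees:
        found.extend(np_sister_of_vp_under_s(child))
    return found


def leaves(node):
    """Return the words under a node, left to right."""
    words = []
    for child in node[1]:
        words.extend(leaves(child) if isinstance(child, tuple) else [child])
    return words


tree1 = ('(ROOT\n  (S\n    (NP (NNP Pusheen)\n      (CC and)\n      (NNP Smitha))\n'
         '    (VP (VBD walked)\n      (PP (IN along)\n        (NP (DT the) (NN beach))))\n    (. .)))')
tree3 = ('(ROOT\n  (S\n    (ADVP (RB However))\n    (, ,)\n    (NP (PRP she))\n'
         '    (VP (VBD fell)\n      (PRT (RP off))\n      (NP (DT the) (NN surfboard)))))')

phrases = [leaves(np) for t in (tree1, tree3)
           for np in np_sister_of_vp_under_s(parse_tree(t))]
print(phrases)
```

Object NPs such as "the beach" are skipped because their parent is a PP or VP rather than an S.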

For your information, I have run Tregex from the command line with the following commands:

cd stanford-tregex-2016-10-31
java -cp 'stanford-tregex.jar:' edu.stanford.nlp.trees.tregex.TregexPattern -f -s '(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)' /Users/AS/stanford-tregex-2016-10-31/exampletree.txt

The output was:

Pattern string:
(NP[$VP]>S)|(NP[$VP]>S\n)|(NP\n[$VP]>S)|(NP\n[$VP]>S\n)
Parsed representation:
or
   Root NP
      and
         $ VP
         > S
   Root NP
      and
         $ VP
         > S\n
   Root NP\n
      and
         $ VP
         > S
   Root NP\n
      and
         $ VP
         > S\n
Reading trees from file(s) file path
# /Users/AS/stanford-tregex-2016-10-31/exampletree.txt
(NP (NNP Pusheen) \n (CC and) \n (NNP Smitha))
# /Users/AS/stanford-tregex-2016-10-31/exampletree.txt
(NP\n (NP (NNP Smitha)) \n (, ,) \n (NP\n (NP (DT the) (NN spokesperson)) \n   (PP (IN of) \n (NP (DT the) (NNP CIA)))) \n (, ,))
# /Users/AS/stanford-tregex-2016-10-31/exampletree.txt
(NP (PRP They))
There were 3 matches in total.

How can I replicate this result in Python?

For your reference, I found the following post via Google, which is relevant to my question but outdated (https://mailman.stanford.edu/pipermail/parser-user/2010-July/000606.html):

[parser-user] Variable input to Tregex

Christopher Manning manning at stanford.edu
Wed Jul 7 17:41:32 PDT 2010

Hi Haiyang,

Sorry, slow reply, things are too busy at the end of the academic year.

On Jun 1, 2010, at 8:56 PM, Haiyang AI wrote:

Dear all,

I hope this is the right place to seek help.

It is, though we can only give very limited help on anything Python specific.....

But this seems to be straightforward (I think).

If what you're wanting is for the pattern to be run on trees being fed in over stdin, you need to add the flag "-filter" in the argument list prior to "NP".

If no file is specified after the pattern, and the flag "-filter" is not given, then it runs the pattern on a fixed default sentence....

Chris.

I'm working on a project related to Tregex. I'm trying to call Tregex from python, but I don't know how to feed data into Tregex, not from conventional file, but from a variable. For example, I'm trying to count the number of "NP" from a given variable (e.g. text, already parsed tree, using Stanford Parser), with the following code,

from subprocess import Popen, PIPE, STDOUT

def tregex(text):
    tregex_dir = "/root/nlp/stanford-tregex-2009-08-30/"
    # No "-filter" flag here, so Tregex ignores stdin and uses its default tree.
    op = Popen(["java", "-mx900m", "-cp", "stanford-tregex.jar:",
                "edu.stanford.nlp.trees.tregex.TregexPattern", "NP"],
               cwd=tregex_dir, stdout=PIPE, stdin=PIPE, stderr=STDOUT)
    res = op.communicate(input=text)[0]
    return res

The results are like the following. It didn't search the content from the variable, but somehow falling back to "using default tree". Can anyone give me a hand? I have been stuck here for quite a long time. Really appreciate your time and help.

Pattern string:
NP
Parsed representation:
Root NP
using default tree
(NP (NP (DT this) (NN wine)) (CC and) (NP (DT these) (NNS snails)))

(NP (DT this) (NN wine))

(NP (DT these) (NNS snails))

There were 3 matches in total.

-- Haiyang AI, Ph.D. student Department of Applied Linguistics The Pennsylvania State University

parser-user mailing list parser-user at lists.stanford.edu https://mailman.stanford.edu/mailman/listinfo/parser-user
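Following the advice in the quoted message, a minimal sketch of calling Tregex from Python 3 with trees fed over stdin might look like the code below. The only change to the command itself is the "-filter" flag placed before the pattern. The function names and the `extra_flags` parameter are hypothetical, and the sketch assumes the matched subtrees arrive on stdout while the "Pattern string" diagnostics go to stderr; if that does not hold for your Tregex version, merge the two streams as the original code does.

```python
import subprocess

TREGEX_CLASS = "edu.stanford.nlp.trees.tregex.TregexPattern"


def build_tregex_command(pattern, extra_flags=()):
    """Build the java command line; "-filter" must come before the pattern."""
    return ["java", "-mx900m", "-cp", "stanford-tregex.jar:", TREGEX_CLASS,
            "-filter", *extra_flags, pattern]


def run_tregex(trees, pattern, tregex_dir, extra_flags=()):
    """Pipe a list of bracketed parse trees to Tregex and return its stdout."""
    result = subprocess.run(
        build_tregex_command(pattern, extra_flags),
        input="\n".join(trees),   # one tree after another on stdin
        cwd=tregex_dir,           # directory containing stanford-tregex.jar
        capture_output=True,      # assumption: matches are written to stdout
        text=True,
        check=True,               # raise if java exits with an error
    )
    return result.stdout
```

A call such as `run_tregex([output1['sentences'][0]['parse']], '(NP[$VP]>S)', 'stanford-tregex-2016-10-31')` would then return the raw match text, which still needs to be split into individual trees.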

Recommended answer

Why not use the Stanford CoreNLP server?

1.) Start up the server!

java -Xmx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000

2.) Make a python request!

import requests

url = "http://localhost:9000/tregex"
request_params = {"pattern": "(NP[$VP]>S)|(NP[$VP]>S\\n)|(NP\\n[$VP]>S)|(NP\\n[$VP]>S\\n)"}
text = "Pusheen and Smitha walked along the beach."
r = requests.post(url, data=text, params=request_params)
print(r.json())

3.) Here is the result!

{u'sentences': [{u'0': {u'namedNodes': [], u'match': u'(NP (NNP Pusheen)\n  (CC and)\n  (NNP Smitha))\n'}}]}
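To get from this response to the variables and token lists asked for in the question, something like the following sketch could be used. It assumes the JSON has exactly the shape shown above (a "sentences" list whose entries map numeric string keys to objects with a "match" field); the helper names are hypothetical, and the sample dictionary is copied from the result rather than fetched from a server.

```python
import re

# Matches a preterminal such as "(NNP Pusheen)" and captures tag and word.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def extract_matches(response):
    """Return the matched subtrees as strings, in sentence and match order."""
    matches = []
    for sentence in response.get("sentences", []):
        for key in sorted(sentence, key=int):   # keys are "0", "1", ...
            matches.append(sentence[key]["match"].strip())
    return matches


def leaf_tokens(tree):
    """Return the words of a bracketed tree, left to right."""
    return [word for _tag, word in LEAF.findall(tree)]


sample = {"sentences": [{"0": {"namedNodes": [],
                               "match": "(NP (NNP Pusheen)\n  (CC and)\n  (NNP Smitha))\n"}}]}

noun_phrases = extract_matches(sample)
tokens = [leaf_tokens(np) for np in noun_phrases]
print(tokens)
```

In real use, `sample` would be replaced by `r.json()` from the request above.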
