In spaCy NLP, how to extract the agent, action, and patient, as well as cause/effect relations?

Question

I would like to use spaCy to extract word relation information in the form of "agent, action, and patient." For example, "Autonomous cars shift insurance liability toward manufacturers" -> ("autonomous cars", "shift", "liability") or ("autonomous cars", "shift", "liability toward manufacturers"). In other words, "who did what to whom" and "what applied the action to something else." I don't know much about my input data, so I can't make many assumptions.

I also want to extract logical relationships. For example, "Whenever/if the sun is in the sky, the bird flies" or cause/effect cases like "Heat makes ice cream melt."

For dependencies, spaCy recommends iterating through a sentence word by word and finding its root that way, but I'm not sure what traversal pattern to use to get the information out reliably in a form I can organize. My use case involves structuring these sentences into a form I can use for queries and logical conclusions. This might be comparable to my own mini Prolog data store.

For cause/effect, I could hard-code some rules, but then I still need a way of reliably traversing the dependency tree and extracting the information. (I will probably combine this with coreference resolution using neuralcoref, plus word vectors and ConceptNet to resolve ambiguities, but that is a little tangential.)

In short, the question is really about how to extract that information, and how best to traverse the parse.

On a tangential note, I am wondering whether I also need a constituency tree for phrase-level parsing to achieve this. I think Stanford's tools provide that, but spaCy might not.

Answer

For the first part of your question, it's pretty easy to use token.dep_ to identify the nsubj, ROOT, and dobj labels.

import spacy

# load a pretrained English pipeline with a dependency parser
nlp = spacy.load("en_core_web_sm")
doc = nlp("She eats carrots")

for t in doc:
    if t.dep_ == "nsubj":
        print(f"The agent is {t.text}")
    elif t.dep_ == "dobj":
        print(f"The patient is {t.text}")

In passive constructions, the patient's dep_ is nsubjpass, and there may or may not be an explicit agent; that's the point of passive voice.

To get the words at the same level of the dependency parse, token.lefts, token.children, and token.rights are your friends. However, this won't catch things like "He is nuts!", since "nuts" isn't a direct object but an attribute. If you want to catch that too, look for the attr label as well.

For the cause-and-effect part, before you decide on rules vs. a model, and on a library, just gather some data. Get 500 sentences and annotate them with the cause and effect. Then look at your data and see whether you can pull the relations out with rules. There is also a middle ground: identify candidate sentences with rules (high recall, low precision), then use a model to actually extract the relationships. But you can't do it from first principles; doing data science requires being familiar with your data.
