Is there a way to retrieve the whole noun chunk using a root token in spaCy?

Problem description

I'm very new to spaCy. I have been reading the documentation for hours and I'm still not sure whether what I'm asking is possible. Anyway...

As the title says, is there a way to get a noun chunk starting from a token that it contains? For example, given the sentence:

"Autonomous cars shift insurance liability toward manufacturers"

Would it be possible to get the "autonomous cars" noun chunk when all I have is the "cars" token? Here is an example snippet of the scenario I'm going for.

import spacy

# Assumption: the question does not say which model was loaded.
nlp = spacy.load("en_core_web_md")

startingSentence = "Autonomous cars and magic wands shift insurance liability toward manufacturers"
doc = nlp(startingSentence)
noun_chunks = doc.noun_chunks

for token in doc:
    if token.dep_ == "dobj":
        print(token) # this will print "liability"

        # Is it possible to do anything from here to actually get the "insurance liability" noun chunk?

Any help will be greatly appreciated. Thanks!

Solution

You can easily find the noun chunk that contains the token you've identified by checking if the token is in one of the noun chunk spans:

import spacy

nlp = spacy.load("en_core_web_md") # the model that gives the correct result below
doc = nlp("Autonomous cars and magic wands shift insurance liability toward manufacturers")
interesting_token = doc[7] # or however you identify the token you want
for noun_chunk in doc.noun_chunks:
    if interesting_token in noun_chunk:
        print(noun_chunk)

With en_core_web_sm and spaCy 2.0.18 the output is not correct, because "shift" isn't identified as a verb, so you get:

magic wands shift insurance liability

With en_core_web_md, it's correct:

insurance liability

(It makes sense to include examples with real ambiguities in the documentation because that's a realistic scenario (https://spacy.io/usage/linguistic-features#noun-chunks), but it's confusing for new users if they're ambiguous enough that the analysis is unstable across versions/models.)
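
If the same lookup is needed for many tokens, scanning doc.noun_chunks every time can be replaced by a one-off index from token position to chunk. The following is only a minimal sketch and not part of the original answer: it works on plain (start, end) pairs so that it runs without a model, and the helper name index_chunks and the hard-coded bounds are illustrative assumptions. With a real doc, the pairs would come from [(chunk.start, chunk.end) for chunk in doc.noun_chunks] and the key to look up would be token.i.

```python
from typing import Dict, Iterable, Tuple

# (start, end) token offsets, end exclusive, mirroring Span.start / Span.end
Bounds = Tuple[int, int]


def index_chunks(chunk_bounds: Iterable[Bounds]) -> Dict[int, Bounds]:
    """Map every token position covered by a chunk to that chunk's bounds."""
    lookup: Dict[int, Bounds] = {}
    for start, end in chunk_bounds:
        for i in range(start, end):
            lookup[i] = (start, end)
    return lookup


words = ["Autonomous", "cars", "and", "magic", "wands", "shift",
         "insurance", "liability", "toward", "manufacturers"]

# Hand-written bounds standing in for doc.noun_chunks (an assumption for
# illustration, not actual parser output).
chunk_bounds = [(0, 2), (3, 5), (6, 8), (9, 10)]

lookup = index_chunks(chunk_bounds)

start, end = lookup[7]  # 7 is the position of "liability"
chunk_text = " ".join(words[start:end])
print(chunk_text)
```

Positions that fall outside every chunk (such as 5, "shift", in the bounds above) are simply absent from the dictionary, so lookup.get(i) returns None for them instead of raising.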
