How to make a spaCy Matcher pattern using verb tense/mood?

Question

I've been trying to make a specific pattern for a spaCy Matcher using verb tenses and moods.
I found out how to access the morphological features of words parsed with spaCy using model.vocab.morphology.tag_map[token.tag_], which prints out something like this when the verb is in the subjunctive mood (the mood I am interested in):

{'Mood_sub': True, 'Number_sing': True, 'Person_three': True, 'Tense_pres': True, 'VerbForm_fin': True, 74: 100}

However, I would like to use a pattern like this one to retokenize specific verb phrases: pattern = [{'TAG':'Mood_sub'}, {'TAG':'VerbForm_ger'}]

In the case of a Spanish phrase like 'Que siga aprendiendo', 'siga' has 'Mood_sub' = True in its tag, and 'aprendiendo' has 'VerbForm_ger' = True in its tag. However, the matcher is not detecting this match.

Can anyone tell me why this is and how I could fix it? This is the code I am using:

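import spacy
from spacy.matcher import Matcher

model = spacy.load('es_core_news_md')
matcher = Matcher(model.vocab)  # missing from the snippet as posted
text = 'Que siga aprendiendo de sus alumnos'
doc = model(text)
# This is the pattern that is not being matched
pattern = [{'TAG':'Mood_sub'}, {'TAG':'VerbForm_ger'}]
# spaCy v2 signature: add(key, on_match, *patterns)
matcher.add(1, None, pattern)
matches = matcher(doc)
for i, start, end in matches:
    span = doc[start:end]
    if len(span) > 0:
        with doc.retokenize() as retokenizer:
            retokenizer.merge(span)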

Recommended answer

Morph support isn't fully implemented in spaCy v2, so this is not possible using the direct morph values like Mood_sub.
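To illustrate why the original pattern cannot match, here is a plain-Python sketch. It assumes the model stores the features in one combined tag string, as the assertion in the answer's code suggests. A plain TAG pattern compares that whole string for equality, so 'Mood_sub' never equals it. The split_tag helper is hypothetical and not part of spaCy.

```python
# Tag string for "siga", copied from the assertion in the answer's code.
tag = "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"

# A plain TAG pattern compares the whole tag string, so 'Mood_sub'
# can never be equal to it.
exact_match = (tag == "Mood_sub")


def split_tag(tag):
    """Hypothetical helper: split a combined tag into the coarse part
    of speech and a dict of morphological features."""
    pos, _, feats = tag.partition("__")
    features = dict(f.split("=", 1) for f in feats.split("|") if "=" in f)
    return pos, features


pos, features = split_tag(tag)
print(exact_match, pos, features.get("Mood"))
```

The feature names inside the string ('Mood=Sub') also differ from the keys printed by tag_map ('Mood_sub'), so any pattern has to target the string form.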

Instead, I think the best option with the Matcher is to use REGEX over the combined/extended TAG values. It's not going to be particularly elegant, but it should work:

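import spacy
from spacy.matcher import Matcher

nlp = spacy.load('es_core_news_sm')
doc = nlp("Que siga aprendiendo de sus alumnos")
# The extended tag combines the coarse POS with the morphological features
assert doc[1].tag_ == "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
matcher = Matcher(nlp.vocab)
# Match any token whose tag string contains Mood=Sub
matcher.add("MOOD_SUB", [[{"TAG": {"REGEX": ".*Mood=Sub.*"}}]])
# Each match is (match_id, start, end); match_id is the hash of "MOOD_SUB"
assert matcher(doc) == [(513366231240698711, 1, 2)]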
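The same REGEX approach could be extended to the two-token pattern from the question and combined with the retokenizer, roughly as sketched below. To keep the sketch runnable without downloading a model, the Doc is built by hand and the tags are set manually. The tag for 'siga' is copied from the answer. The tag for 'aprendiendo' ("VERB__VerbForm=Ger") and the key "SUBJ_GERUND" are assumptions made for illustration, so check what your model actually produces.

```python
from spacy.matcher import Matcher
from spacy.tokens import Doc
from spacy.vocab import Vocab

# Hand-built Doc so the sketch does not depend on a trained pipeline.
vocab = Vocab()
words = ["Que", "siga", "aprendiendo", "de", "sus", "alumnos"]
doc = Doc(vocab, words=words)

# Tag for "siga" is taken from the answer; the tag for "aprendiendo"
# is an ASSUMED value in the same combined format.
doc[1].tag_ = "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
doc[2].tag_ = "VERB__VerbForm=Ger"

matcher = Matcher(vocab)
# Subjunctive verb immediately followed by a gerund.
pattern = [
    {"TAG": {"REGEX": ".*Mood=Sub.*"}},
    {"TAG": {"REGEX": ".*VerbForm=Ger.*"}},
]
matcher.add("SUBJ_GERUND", [pattern])

matches = matcher(doc)
# Collect the spans first, then merge them inside a single retokenize
# block so the token indices are not shifted between merges.
spans = [doc[start:end] for _, start, end in matches]
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

tokens = [token.text for token in doc]
print(tokens)
```

With a real pipeline, replace the hand-built Doc with nlp(text) and use nlp.vocab for the Matcher. If several matches overlap, the retokenizer will reject the merge, so the spans would need filtering first.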
