Finding the POS of the root of a noun_chunk with spaCy

Question

When using spaCy, you can easily loop over the noun phrases of a text as follows:

import spacy

S = 'This is an example sentence that should include several parts and also make clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)

[chunk.text for chunk in doc.noun_chunks]
# = ['an example sentence', 'several parts', 'Natural language Processing']

You can also get the "root" of each noun chunk:

[chunk.root.text for chunk in doc.noun_chunks]
# = ['sentence', 'parts', 'Processing']

How can I get the POS of each of those words (even though it looks like the root of a noun phrase is always a noun), and how can I get the lemma, the shape, and the singular form of that particular word?

Is that possible?

thx.

Recommended answer

Each chunk.root is a Token, which exposes attributes including lemma_ and pos_ (or tag_ if you prefer the Penn Treebank POS tags).

import spacy
S='This is an example sentence that should include several parts and also make ' \
  'clear that studying Natural language Processing is not difficult'
nlp = spacy.load('en_core_web_sm')
doc = nlp(S)
for chunk in doc.noun_chunks:
    print('%-12s %-6s  %s' % (chunk.root.text, chunk.root.pos_, chunk.root.lemma_))

sentence     NOUN    sentence
parts        NOUN    part
Processing   NOUN    processing

By the way, in this sentence "Processing" is a noun, so its lemma is "processing", not "process", which is the lemma of the verb form "processing".
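
The question also asks about the shape and the singular form, which the snippet above does not print. Here is a minimal sketch of how those could be collected from each chunk root. The helper names root_features and chunk_root_features are hypothetical, not part of spaCy. The sketch assumes an English pipeline that emits Penn Treebank tags, so that NNS/NNPS mark plural nouns, and it treats the noun lemma as the singular form.

```python
# Hypothetical helpers (not part of spaCy): gather per-root attributes.
# They rely only on standard Token attributes: text, pos_, tag_, lemma_, shape_.

# Assumption: the pipeline uses Penn Treebank tags, where NNS and NNPS
# mark plural common and proper nouns.
PLURAL_TAGS = {"NNS", "NNPS"}


def root_features(token):
    """Return the attributes asked about in the question for one token."""
    return {
        "text": token.text,
        "pos": token.pos_,        # coarse-grained POS, e.g. NOUN
        "tag": token.tag_,        # fine-grained tag, e.g. NNS
        "lemma": token.lemma_,    # for nouns this is normally the singular form
        "shape": token.shape_,    # orthographic shape, e.g. Xxxxx
        "is_plural": token.tag_ in PLURAL_TAGS,
    }


def chunk_root_features(doc):
    """Return one feature dict per noun chunk root in a parsed Doc."""
    return [root_features(chunk.root) for chunk in doc.noun_chunks]
```

With the doc built in the answer's code, chunk_root_features(doc) would return one dict per noun chunk. The exact tags and lemmas depend on the model version, so no output is shown here.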
