How to generate bi/tri-grams using spacy/nltk
Question

The input text is always a list of dish names, where each name has 1 to 3 adjectives and a noun.
Input:
thai iced tea
spicy fried chicken
sweet chili pork
thai chicken curry
Output:
thai tea, iced tea
spicy chicken, fried chicken
sweet pork, chili pork
thai chicken, chicken curry, thai curry
Basically, I am looking to parse the sentence tree and generate bi-grams by pairing each adjective with the noun.

And I would like to achieve this with spacy or nltk.
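The pairing rule itself can be sketched in plain Python before involving any real tagger; the `POS` dictionary below is a hypothetical hand-made stand-in for the tags that spaCy or NLTK would produce, and the sketch only covers the simple one-noun case:

```python
# Hypothetical hand-made POS lookup standing in for a real tagger;
# an actual solution would get these tags from spaCy or NLTK.
POS = {"thai": "ADJ", "iced": "ADJ", "spicy": "ADJ", "fried": "ADJ",
       "tea": "NOUN", "chicken": "NOUN"}

def adjective_noun_bigrams(phrase):
    """Pair every non-noun token with the noun (one-noun phrases only)."""
    tokens = phrase.split()
    noun = next(t for t in tokens if POS.get(t) == "NOUN")
    return [t + " " + noun for t in tokens if POS.get(t) != "NOUN"]

print(adjective_noun_bigrams("thai iced tea"))  # ['thai tea', 'iced tea']
```

What a real tagger adds is exactly the `POS` lookup: deciding, per token, whether it is the noun or one of its modifiers.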
Answer
I used spaCy 2.0 with the English model to find the nouns and "not-nouns" while parsing the input, then paired each not-noun with the noun to create the desired output.
Your input:
s = ["thai iced tea",
     "spicy fried chicken",
     "sweet chili pork",
     "thai chicken curry"]
spaCy solution:
import spacy
nlp = spacy.load('en')  # load model ('en' is the spaCy 2.x shortcut; newer versions use 'en_core_web_sm')

def noun_notnoun(phrase):
    doc = nlp(phrase)  # create spaCy Doc object
    token_not_noun = []
    notnoun_noun_list = []
    for item in doc:
        if item.pos_ != "NOUN":  # separate nouns and not-nouns
            token_not_noun.append(item.text)
        if item.pos_ == "NOUN":
            noun = item.text
    for notnoun in token_not_noun:
        notnoun_noun_list.append(notnoun + " " + noun)
    return notnoun_noun_list
Calling the function:
for phrase in s:
    print(noun_notnoun(phrase))
Results:
['thai tea', 'iced tea']
['spicy chicken', 'fried chicken']
['sweet pork', 'chili pork']
['thai chicken', 'curry chicken']
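The answer above only produces bi-grams; for the tri-grams mentioned in the title, one possible extension (a sketch, not part of the original answer) is to combine pairs of not-nouns with the noun using `itertools.combinations`:

```python
from itertools import combinations

def notnoun_noun_ngrams(notnouns, noun):
    """Build bi-grams (one not-noun + noun) and tri-grams (two not-nouns + noun)."""
    bigrams = [nn + " " + noun for nn in notnouns]
    trigrams = [a + " " + b + " " + noun for a, b in combinations(notnouns, 2)]
    return bigrams + trigrams

print(notnoun_noun_ngrams(["thai", "iced"], "tea"))
# ['thai tea', 'iced tea', 'thai iced tea']
```

Here the not-noun/noun split is assumed to come from the `noun_notnoun` logic above; `combinations` keeps the not-nouns in their original order, which preserves the natural adjective ordering of the dish name.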