How to generate bi/tri-grams using spacy/nltk

Problem description

The input text is always a list of dish names, each consisting of 1 to 3 adjectives and a noun.

Input:

thai iced tea
spicy fried chicken
sweet chili pork
thai chicken curry

Output:

thai tea, iced tea
spicy chicken, fried chicken
sweet pork, chili pork
thai chicken, chicken curry, thai curry

Basically, I am looking to parse the sentence tree and generate bi-grams by pairing each adjective with the noun.

I would like to achieve this with spaCy or NLTK.
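For reference, plain contiguous n-grams need no tagging at all. A minimal sketch in pure Python follows; the helper name `contiguous_ngrams` is made up for illustration, and `nltk.util.ngrams` offers the same sliding-window behaviour over a token sequence.

```python
def contiguous_ngrams(phrase, n):
    """Return the contiguous n-grams of a whitespace-tokenized phrase."""
    tokens = phrase.split()
    # Slide a window of length n over the token list.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(contiguous_ngrams("thai iced tea", 2))  # ['thai iced', 'iced tea']
print(contiguous_ngrams("thai iced tea", 3))  # ['thai iced tea']
```

Contiguous bigrams yield "thai iced" and miss "thai tea", which is why the pairing asked for here has to be driven by part-of-speech information rather than by adjacency.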

Recommended answer

I used spaCy 2.0 with the English model. The idea is to split each phrase into nouns and "not-nouns", then pair every not-noun with the noun to create the desired output.

Your input:

s = ["thai iced tea",
"spicy fried chicken",
"sweet chili pork",
"thai chicken curry",]

spaCy solution:

import spacy

# The 'en' shortcut works in spaCy 2.x; in spaCy 3+ load a named package
# instead, e.g. spacy.load('en_core_web_sm').
nlp = spacy.load('en')

def noun_notnoun(phrase):
    doc = nlp(phrase)  # run the pipeline to get POS-tagged tokens
    token_not_noun = []
    noun = None

    for item in doc:
        if item.pos_ == "NOUN":
            noun = item.text  # keeps the last noun found in the phrase
        else:
            token_not_noun.append(item.text)

    if noun is None:  # no token tagged as a noun: nothing to pair with
        return []

    # pair every non-noun token with the noun
    return [notnoun + " " + noun for notnoun in token_not_noun]

Call the function:

for phrase in s:
    print(noun_notnoun(phrase))

Result:

['thai tea', 'iced tea']
['spicy chicken', 'fried chicken']
['sweet pork', 'chili pork']
['thai chicken', 'curry chicken']
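The last line differs from the output requested in the question ("thai chicken, chicken curry, thai curry"). The function keeps a single noun per phrase, and "curry" ended up in the not-noun list. One possible variation, sketched below under stated assumptions, pairs every word with each noun that follows it. The function name `modifier_noun_pairs` is hypothetical, and the POS tags are hand-assigned for illustration rather than real tagger output, so the sketch runs without spaCy.

```python
def modifier_noun_pairs(tagged):
    """Pair every word with each noun that appears later in the phrase.

    `tagged` is a list of (word, pos) tuples; the tags used below are
    hand-assigned assumptions, not the output of an actual tagger.
    """
    pairs = []
    for i, (word, _) in enumerate(tagged):
        # Look only at words to the right of the current one.
        for later_word, later_pos in tagged[i + 1:]:
            if later_pos == "NOUN":
                pairs.append(word + " " + later_word)
    return pairs

print(modifier_noun_pairs([("thai", "ADJ"), ("iced", "ADJ"), ("tea", "NOUN")]))
# ['thai tea', 'iced tea']
print(modifier_noun_pairs([("thai", "ADJ"), ("chicken", "NOUN"), ("curry", "NOUN")]))
# ['thai chicken', 'thai curry', 'chicken curry']
```

With a real tagger, the tuples could be built as `[(t.text, t.pos_) for t in nlp(phrase)]`. The result still depends on how each word is tagged: if "chili" in "sweet chili pork" is tagged as a noun, this variation would also emit "sweet chili".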
