With NLTK, how can I generate different forms of a word when a certain word is given?

Question

For example, suppose the word "happy" is given. I want to generate other forms of it, such as "happiness", "happily", etc.

I have read some previous questions on Stack Overflow and the NLTK reference. However, they only cover POS tagging and morphological analysis, that is, identifying the grammatical form of words within sentences, not generating a list of related words. Has anyone run into a similar problem? Thank you.

Recommended Answer

This type of information is included in the Lemma class of NLTK's WordNet implementation. Specifically, it's found in Lemma.derivationally_related_forms().

Here's an example script that finds all derivational forms of "happy" recorded in WordNet:

from nltk.corpus import wordnet as wn  # requires the WordNet data: nltk.download("wordnet")

forms = set()  # store the derivational forms in a set to eliminate duplicates
for happy_lemma in wn.lemmas("happy"):  # each WordNet lemma named "happy"
    forms.add(happy_lemma.name())  # add the lemma itself
    for related_lemma in happy_lemma.derivationally_related_forms():  # each derivationally related lemma
        forms.add(related_lemma.name())  # add the related lemma's name

print(sorted(forms))

Unfortunately, the information in WordNet is incomplete. The script above finds "happy" and "happiness" but not "happily", even though WordNet contains several "happily" lemmas.