Getting a Large List of Nouns (or Adjectives) in Python with NLTK; or Python Mad Libs

Question

Like this question, I am interested in getting a large list of words by part of speech (a long list of nouns; a list of adjectives) to be used programmatically elsewhere. This answer has a solution using the WordNet database (in SQL) format.

Is there a way to get such a list using the corpora/tools built into Python's NLTK? I could take a large body of text, parse it, and then store the nouns and adjectives. But given the built-in dictionaries and other tools, is there a smarter way to simply extract the words that are already present in the NLTK datasets, encoded as nouns/adjectives (or whatever part of speech)?

Thanks.

Answer

It's worth noting that WordNet is one of the corpora available through the NLTK downloader (install it once with nltk.download('wordnet')). So you could just use the solution you already found without having to reinvent any wheels.

For instance, you could just do something like this to get all noun synsets:

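from nltk.corpus import wordnet as wn

# The WordNet data must be installed first: nltk.download('wordnet')

# all_synsets() yields every synset with the given part of speech
for synset in wn.all_synsets('n'):
    print(synset)

# Or, equivalently, using the POS constant
for synset in wn.all_synsets(wn.NOUN):
    print(synset)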

That example gives you every noun WordNet knows about, and it even groups them into synsets (sets of synonyms that share one sense), so you can check that each word is being used in the intended context.
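A single word can appear in several noun synsets, one per sense, which is what makes the grouping useful. The sketch below is illustrative and not part of the original answer: `describe_synset` and `print_noun_senses` are hypothetical helpers, and they rely on NLTK's `Synset` methods `name()`, `lemma_names()` and `definition()` plus `wn.synsets(word, pos=...)`, assuming the WordNet data is installed.

```python
def describe_synset(synset):
    """Return a one-line summary: synset name, its lemma names, and its gloss.

    Works with NLTK Synset objects, which provide name(), lemma_names()
    and definition().
    """
    lemmas = ", ".join(synset.lemma_names())
    return f"{synset.name()}: [{lemmas}] {synset.definition()}"


def print_noun_senses(word):
    """Print every noun sense of `word` (needs NLTK and the WordNet data)."""
    # Imported here so describe_synset can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets(word, pos=wn.NOUN):
        print(describe_synset(synset))
```

For example, `print_noun_senses("bank")` prints each noun sense of "bank" on its own line, so a riverside bank and a financial institution show up as separate synsets with different glosses.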

If you want to get them all into a list you can do something like the following (though this will vary quite a bit based on how you want to use the words and synsets):

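# Collect the lemma names of every noun synset. A word with several noun
# senses is added once per synset, so the list contains duplicates.
all_nouns = []
for synset in wn.all_synsets('n'):
    all_nouns.extend(synset.lemma_names())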

Or as a one-liner:

all_nouns = [word for synset in wn.all_synsets('n') for word in synset.lemma_names()]
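Since the title mentions Mad Libs, here is one way to turn those lists into something directly usable. This is a sketch, not part of the original answer: `words_for_pos`, `build_word_lists` and `fill_mad_lib` are hypothetical helpers, and the `{noun}`/`{adjective}` template syntax is an assumption. It de-duplicates the words (the list above repeats a word once per synset it belongs to), replaces the underscores WordNet uses in multi-word lemmas, and covers adjectives, which WordNet splits into head ('a') and satellite ('s') synsets.

```python
import random
import re


def words_for_pos(wordnet, *pos_tags):
    """Return the sorted, de-duplicated lemma names for the given POS tags.

    `wordnet` is any object with an all_synsets(pos) method, such as
    nltk.corpus.wordnet. Underscores in multi-word lemmas ('ice_cream')
    are turned into spaces.
    """
    words = set()
    for pos in pos_tags:
        for synset in wordnet.all_synsets(pos):
            words.update(name.replace("_", " ") for name in synset.lemma_names())
    return sorted(words)


def build_word_lists():
    """Build the noun and adjective lists from WordNet (needs the WordNet data)."""
    # Imported here so the other helpers can be used without NLTK installed.
    from nltk.corpus import wordnet as wn

    return {
        "noun": words_for_pos(wn, "n"),
        # 'a' = head adjectives, 's' = satellite adjectives; querying both
        # and de-duplicating covers every adjective either way.
        "adjective": words_for_pos(wn, "a", "s"),
    }


def fill_mad_lib(template, word_lists, rng=random):
    """Replace each {slot} in the template with a random word from word_lists[slot].

    Every slot is drawn independently, so "{noun} and {noun}" can yield two
    different nouns.
    """
    return re.sub(
        r"\{(\w+)\}",
        lambda match: rng.choice(word_lists[match.group(1)]),
        template,
    )
```

Once the data is installed, something like `fill_mad_lib("The {adjective} {noun} ate my {noun}.", build_word_lists())` produces a filled-in sentence. Building the lists walks all of WordNet, so build them once and reuse the dictionary; pass `random.Random(seed)` as `rng` if you want reproducible output.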
