How do I find the frequency count of a word in English using WordNet?


Question

Is there a way to find how frequently a word is used in the English language, using WordNet or NLTK from Python?

NOTE: I do not want the frequency count of a word in a given input file. I want the frequency of a word in general, based on present-day usage.

Recommended answer

In WordNet, every Lemma has a frequency count, which is returned by the method lemma.count() and stored in the file nltk_data/corpora/wordnet/cntlist.rev.

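Code example (Python 2 syntax, NLTK 3):

from nltk.corpus import wordnet

# Look up every synset (sense) that contains the word 'stack'
syns = wordnet.synsets('stack')
for s in syns:
    # Each synset groups several lemmas (synonymous word forms)
    for l in s.lemmas():
        # In NLTK 3, name() is a method, not an attribute
        print l.name() + " " + str(l.count())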

Output:

stack 2
batch 0
deal 1
flock 1
good_deal 13
great_deal 10
hatful 0
heap 2
lot 13
mass 14
mess 0
...

However, many counts are zero, and neither the source file nor the documentation says which corpus was used to create this data. According to the book Speech and Language Processing by Daniel Jurafsky and James H. Martin, the sense frequencies come from the SemCor corpus, which is a subset of the already small and outdated Brown Corpus.

So it's probably best to choose the corpus that best fits your application and create the data yourself, as Christopher suggested.
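One way to create the data yourself is to count tokens in a corpus you pick. The sketch below uses only the Python standard library; the function name relative_frequencies, the naive regex tokenizer, and the sample_corpus string are assumptions made for illustration and are not part of WordNet or NLTK.

```python
import re
from collections import Counter


def relative_frequencies(text):
    """Return a dict mapping each lowercased word to its share of all tokens."""
    # Naive tokenizer (an assumption for this sketch): runs of ASCII
    # letters and apostrophes, after lowercasing the text.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: n / total for word, n in counts.items()}


# Hypothetical stand-in for a real corpus that matches your application.
sample_corpus = "The stack grew. A stack of books fell on the other stack."
freqs = relative_frequencies(sample_corpus)

# Relative frequency of 'stack' in the sample (0.0 if the word never occurs).
print(freqs.get("stack", 0.0))
```

In practice, sample_corpus would be replaced by a large body of text representative of the usage you care about, and the tokenizer by one suited to that text.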

To make the WordNet example above compatible with Python 3.x, call print() as a function:

Code example:

from nltk.corpus import wordnet

syns = wordnet.synsets('stack')
for s in syns:
    for l in s.lemmas():
        # print is a function in Python 3
        print(l.name() + " " + str(l.count()))
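The loop above prints a count for every lemma in every synset, including synonyms such as "heap" or "lot". If a single number for one word is wanted, one possible approach is to sum lemma.count() over only the lemmas whose name matches that word. The helper name total_lemma_count below is hypothetical, and the result inherits the limitations of the WordNet counts described above.

```python
def total_lemma_count(word, synsets):
    """Sum lemma.count() over all lemmas in `synsets` whose name matches `word`."""
    # WordNet joins multiword lemmas with underscores, e.g. 'good_deal'.
    target = word.lower().replace(" ", "_")
    return sum(
        lemma.count()
        for synset in synsets
        for lemma in synset.lemmas()
        if lemma.name().lower() == target
    )


# Usage with NLTK, guarded so the sketch still runs when NLTK or the
# WordNet data is not installed.
try:
    from nltk.corpus import wordnet
    print(total_lemma_count("stack", wordnet.synsets("stack")))
except (ImportError, LookupError):
    print("NLTK or its WordNet data is not installed")
```

Because the function only relies on lemmas(), name() and count(), it works with any objects that expose those three methods.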
