Sorting FreqDist in NLTK with get vs get()


Question

I am playing around with NLTK and its FreqDist class:

import nltk
from nltk import FreqDist
from nltk.corpus import gutenberg

print(gutenberg.fileids())

# Count every token in the text.
fd = FreqDist()
for word in gutenberg.words('austen-persuasion.txt'):
    fd[word] += 1

# Ten most frequent tokens, highest count first.
newfd = sorted(fd, key=fd.get, reverse=True)[:10]

I have a question about the sorting step. When I run the code as shown, it sorts the FreqDist object correctly. However, when I pass get() instead of get, I get this error:

Traceback (most recent call last):
  File "C:\Python34\NLP\NLP.py", line 21, in <module>
    newfd = sorted(fd, key=fd.get(), reverse=True)[:10]
TypeError: get expected at least 1 arguments, got 0

Why is get right and get() wrong? I was under the impression that get() should be correct, but apparently it is not.

Recommended answer

Essentially, the FreqDist object in NLTK is a subclass of Python's built-in collections.Counter, so let's look at how Counter works:

A Counter is a dictionary that stores the elements of an iterable as its keys and their counts as its values:

>>> from collections import Counter
>>> Counter(['a','a','b','c','c','c','d'])
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c = Counter(['a','a','b','c','c','c','d'])
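
To connect this with the loop in the question, here is a minimal sketch on the same toy data. It assumes only the standard library; the names `words`, `looped` and `direct` are made up for illustration. It also shows one difference between indexing and .get() for a key that was never counted.

```python
from collections import Counter

words = ['a', 'a', 'b', 'c', 'c', 'c', 'd']

# Counting in an explicit loop, as in the question ...
looped = Counter()
for w in words:
    looped[w] += 1  # unseen keys start at 0, so this never raises KeyError

# ... gives the same result as passing the iterable directly.
direct = Counter(words)

print(looped == direct)   # True
print(direct['c'])        # 3
print(direct['z'])        # 0: indexing an unseen key returns 0
print(direct.get('z'))    # None: .get() uses the plain dict default instead
```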

To get the elements sorted by frequency, use the .most_common() method. It returns a list of (element, count) tuples ordered from the highest count to the lowest:

>>> c.most_common()
[('c', 3), ('a', 2), ('b', 1), ('d', 1)]

And in reverse order:

>>> list(reversed(c.most_common()))
[('d', 1), ('b', 1), ('a', 2), ('c', 3)]
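
Since the question only wants the ten most frequent entries, it may help that most_common() accepts an optional count. The sketch below uses the toy Counter with a top 2 instead of a top 10. Because FreqDist is a Counter subclass, the same call should be available on it, though that is not shown here.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# most_common(n) returns only the n highest-count (element, count) pairs.
top_two = c.most_common(2)
print(top_two)                          # [('c', 3), ('a', 2)]

# Keep just the elements if the counts are not needed.
print([word for word, _ in top_two])    # ['c', 'a']
```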

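Like a dictionary, a Counter can be iterated over, and iteration yields its keys. (The sessions in this answer were recorded on Python 2, where keys() and items() return lists in arbitrary order; on Python 3 they return view objects.)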

>>> [key for key in c]
['a', 'c', 'b', 'd']
>>> c.keys()
['a', 'c', 'b', 'd']

You can also use the .items() method to get the keys and their values as (key, count) tuples:

>>> c.items()
[('a', 2), ('c', 3), ('b', 1), ('d', 1)]
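
On Python 3 the same calls look slightly different. The following sketch assumes Python 3.7 or newer, where dictionaries (and therefore Counter) keep insertion order:

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# On Python 3, keys() and items() return view objects, not lists.
print(type(c.keys()).__name__)    # dict_keys
print(type(c.items()).__name__)   # dict_items

# Wrap them in list() to get concrete lists.
# On Python 3.7+ the order is the order in which keys were first inserted.
print(list(c.keys()))     # ['a', 'b', 'c', 'd']
print(list(c.items()))    # [('a', 2), ('b', 1), ('c', 3), ('d', 1)]
```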

Alternatively, if you only need the keys sorted by their counts, unzip the pairs (see the question "Transpose/Unzip Function (inverse of zip)?"):

>>> k, v = zip(*c.most_common())
>>> k
('c', 'a', 'b', 'd')
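
A slightly expanded sketch of the unzip idiom, with the intermediate list named `pairs` for illustration. It also shows a comprehension-based alternative that is arguably easier to read and does not fail on an empty Counter:

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

pairs = c.most_common()        # [('c', 3), ('a', 2), ('b', 1), ('d', 1)]

# zip(*pairs) regroups the pairs column-wise: all elements, then all counts.
keys, counts = zip(*pairs)
print(keys)                    # ('c', 'a', 'b', 'd')
print(counts)                  # (3, 2, 1, 1)

# A comprehension gives the same keys and also works when the Counter is
# empty, where the two-name unpacking above would raise ValueError.
print([key for key, _ in pairs])   # ['c', 'a', 'b', 'd']
```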

Going back to the question of .get vs .get(): the former is the method object itself, while the latter is a call to that method, and the call requires a dictionary key as its argument:

>>> c = Counter(['a','a','b','c','c','c','d'])
>>> c
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c.get
<built-in method get of Counter object at 0x7f5f95534868>
>>> c.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get expected at least 1 arguments, got 0
>>> c.get('a')
2
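
The difference matters whenever a function is handed to other code to be called later. In the sketch below, `apply_to_all` is a hypothetical helper written only for this illustration; it stands in for what sorted() does with its key= argument.

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

lookup = c.get            # no parentheses: a reference to the bound method
print(callable(lookup))   # True
print(lookup('c'))        # 3, same as c.get('c')

def apply_to_all(func, items):
    """Hypothetical helper: call func on each item, much as sorted() does with key=."""
    return [func(item) for item in items]

print(apply_to_all(c.get, ['a', 'c']))   # [2, 3]

# c.get() runs the call immediately, before the helper is even entered,
# and fails because get needs a key argument.
try:
    apply_to_all(c.get(), ['a', 'c'])
except TypeError as exc:
    print('TypeError:', exc)
```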

In a call to sorted(), the key=... parameter is not a key of the list or dictionary being sorted. It is a function that sorted() calls on each element to obtain the value to sort by.
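
To see the key= parameter in isolation, here is a small sketch that has nothing to do with dictionaries. The word list is made up for illustration, and the manual version is only a rough model of what sorted() does, not its actual implementation.

```python
words = ['pear', 'fig', 'banana', 'kiwi']

# key= takes a one-argument function; sorted() calls it once per element
# and orders the elements by the returned values.
print(sorted(words, key=len))            # ['fig', 'pear', 'kiwi', 'banana']

# Roughly the same thing by hand: decorate, sort, undecorate.
# The index i keeps equal-length words in their original order.
decorated = [(len(w), i, w) for i, w in enumerate(words)]
decorated.sort()
print([w for _, _, w in decorated])      # ['fig', 'pear', 'kiwi', 'banana']
```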

So the following two expressions are equivalent; both return only the values stored under the keys:

>>> [c.get(key) for key in c]
[2, 3, 1, 1]
>>> [c[key] for key in c]
[2, 3, 1, 1]

When sorting, those values are the sort criterion, so the following approaches order the keys by count (they can differ only in how ties are ordered):

>>> sorted(c, key=c.get)
['b', 'd', 'a', 'c']
>>> v, k = zip(*sorted((c.get(key), key) for key in c))
>>> list(k)
['b', 'd', 'a', 'c']
>>> sorted(c, key=c.get, reverse=True) # Highest to lowest
['c', 'a', 'b', 'd']
>>> v, k = zip(*reversed(sorted((c.get(key), key) for key in c)))
>>> k
('c', 'a', 'd', 'b')
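
The last two results order the tied keys 'b' and 'd' differently. If a predictable tie order matters, one option is a composite sort key. This is a sketch of that idea rather than something taken from the answer above:

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# Sort by descending count, then alphabetically among equal counts.
# Negating the count keeps ties alphabetical; using reverse=True on a
# (count, word) key would reverse the tie order as well.
ranked = sorted(c, key=lambda word: (-c[word], word))
print(ranked)        # ['c', 'a', 'b', 'd']

# The same idea applied to a top-N slice like the one in the question (N=2 here).
print(ranked[:2])    # ['c', 'a']
```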
