Sorting FreqDist in NLTK with get vs get()
Question
I am playing around with NLTK and its FreqDist class:
import nltk
from nltk.corpus import gutenberg
print(gutenberg.fileids())
from nltk import FreqDist
fd = FreqDist()
for word in gutenberg.words('austen-persuasion.txt'):
    fd[word] += 1
newfd = sorted(fd, key=fd.get, reverse=True)[:10]
I have a question about the sorting step. When I run the code like this, it sorts the FreqDist object properly. However, when I run it with get() instead of get, I encounter this error:
Traceback (most recent call last):
  File "C:\Python34\NLP\NLP.py", line 21, in <module>
    newfd = sorted(fd, key=fd.get(), reverse=True)[:10]
TypeError: get expected at least 1 arguments, got 0
Why is get right and get() wrong? I was under the impression that get() should be correct, but I guess it is not.
Recommended answer
Essentially, the FreqDist object in NLTK is a subclass of Python's native collections.Counter, so let's see how Counter works:
A Counter is a dictionary subclass that stores the elements of an iterable (here, a list) as its keys and the counts of those elements as the values:
>>> from collections import Counter
>>> Counter(['a','a','b','c','c','c','d'])
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c = Counter(['a','a','b','c','c','c','d'])
To get a list of elements sorted by their frequency, you can use the .most_common() method. It returns a list of (element, count) tuples sorted from the highest count to the lowest:
>>> c.most_common()
[('c', 3), ('a', 2), ('b', 1), ('d', 1)]
And in reverse order, from least to most common:
>>> list(reversed(c.most_common()))
[('d', 1), ('b', 1), ('a', 2), ('c', 3)]
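As a small aside, .most_common() also accepts an optional count n, which can stand in for sorting and slicing by hand. A minimal sketch using the same toy Counter (the names top_two and top_two_keys are only illustrative):

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

# Ask only for the two most frequent elements.
top_two = c.most_common(2)
print(top_two)       # [('c', 3), ('a', 2)]

# Keep just the elements, dropping the counts.
top_two_keys = [element for element, count in top_two]
print(top_two_keys)  # ['c', 'a']
```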
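Like a dictionary, you can iterate through a Counter object and it will return the keys. (The outputs shown here come from Python 2, where key order was arbitrary and .keys() and .items() returned lists. Python 3.7+ yields keys in insertion order and returns view objects instead.)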
>>> [key for key in c]
['a', 'c', 'b', 'd']
>>> c.keys()
['a', 'c', 'b', 'd']
You can also use the .items() method to get (key, value) tuples:
>>> c.items()
[('a', 2), ('c', 3), ('b', 1), ('d', 1)]
Alternatively, if you only need the keys sorted by their counts, you can unzip the result of .most_common() (see the question "Transpose/Unzip Function (inverse of zip)?"):
>>> k, v = zip(*c.most_common())
>>> k
('c', 'a', 'b', 'd')
Going back to the question of .get vs .get(): the former is the method object itself, while the latter is a call to that method, and the call requires a dictionary key as its argument:
>>> c = Counter(['a','a','b','c','c','c','d'])
>>> c
Counter({'c': 3, 'a': 2, 'b': 1, 'd': 1})
>>> c.get
<built-in method get of Counter object at 0x7f5f95534868>
>>> c.get()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: get expected at least 1 arguments, got 0
>>> c.get('a')
2
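To make the difference concrete, here is a short sketch (the variable name getter is only for illustration). A method can be stored and passed around without parentheses, and it only runs when it is finally called with an argument:

```python
from collections import Counter

c = Counter(['a', 'a', 'b', 'c', 'c', 'c', 'd'])

getter = c.get           # no parentheses: a reference to the bound method
print(callable(getter))  # True

print(getter('a'))       # 2, same as c.get('a')
print(getter('z'))       # None, since 'z' was never counted

try:
    c.get()              # parentheses with no argument: the call itself fails
except TypeError as err:
    print('TypeError:', err)
```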
When invoking sorted(), the key=... parameter is not a key of the list/dictionary you're sorting. It is a function that sorted() calls on each element to obtain the value it should sort by. That is why sorted() needs fd.get (the callable) and not fd.get(), which tries to call get with no arguments and raises the TypeError before sorted() even runs.
So these two are equivalent, but they only return the values of the keys:
>>> [c.get(key) for key in c]
[2, 3, 1, 1]
>>> [c[key] for key in c]
[2, 3, 1, 1]
When sorting, those values are used as the sort criterion, so the following pairs achieve the same result (apart from the order of elements with equal counts):
>>> sorted(c, key=c.get)
['b', 'd', 'a', 'c']
>>> v, k = zip(*sorted((c.get(key), key) for key in c))
>>> list(k)
['b', 'd', 'a', 'c']
>>> sorted(c, key=c.get, reverse=True) # Highest to lowest
['c', 'a', 'b', 'd']
>>> v, k = zip(*reversed(sorted((c.get(key), key) for key in c)))
>>> k
('c', 'a', 'd', 'b')
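Putting it together, the sketch below shows roughly what sorted() does with key=c.get, and the Counter shortcut for a top-N list. The word list is made up for illustration. Since FreqDist is a Counter subclass, the same calls are expected to work on the fd object from the question, though that is not run here.

```python
from collections import Counter

words = ['the', 'sea', 'the', 'captain', 'sea', 'the', 'anne']  # made-up sample
c = Counter(words)

# sorted() calls the key function once per element, roughly like this:
sort_values = [(word, c.get(word)) for word in c]
print(sort_values)

# Passing the function itself works because sorted() does the calling.
by_get = sorted(c, key=c.get, reverse=True)[:2]

# An equivalent key written as a lambda.
by_lambda = sorted(c, key=lambda word: c[word], reverse=True)[:2]

# most_common(n) gives the same top elements together with their counts.
top = c.most_common(2)

print(by_get)     # ['the', 'sea']
print(by_lambda)  # ['the', 'sea']
print(top)        # [('the', 3), ('sea', 2)]
```

For the original question, fd.most_common(10) would be the shorter way to get the ten most frequent words along with their counts.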