Letter frequencies: plot a histogram ordering the values PYTHON
Question
I am trying to analyse the frequency of the letters in a text. As an example I will use a short sentence here, but the approach is meant for huge texts, so it should be efficient.
test = "quatre jutges dun jutjat mengen fetge dun penjat"
Then I created a function that counts the frequencies:
def create_dictionary2(txt):
    dictionary = {}
    i=0
    for x in set(txt):
        dictionary[x] = txt.count(x)/len(txt)
    return dictionary
Then:
import numpy as np
import matplotlib.pyplot as plt
test_dict = create_dictionary2(test)
plt.bar(test_dict.keys(), test_dict.values(), width=0.5, color='g')
I obtain a bar chart.
ISSUES: I want to see all the letters, but some of them are not shown (the call returns a container object of 15 artists). How can I expand the histogram? I would also like to sort the histogram by value, to obtain something like this.
Recommended answer
For counting we can use a Counter object. Counter also supports getting the key-value pairs ordered from the most common value to the least common:
from collections import Counter
import numpy as np
import matplotlib.pyplot as plt
c = Counter("quatre jutges dun jutjat mengen fetge dun penjat")
plt.bar(*zip(*c.most_common()), width=.5, color='g')
plt.show()
The most_common method returns a list of key-value tuples, ordered from the highest count to the lowest. The *zip(*..) idiom is used to unpack that list into one sequence of keys and one sequence of counts (see this answer).
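To make the unpacking step concrete, here is a minimal sketch (plotting left out) of what most_common returns for the sentence from the question and how zip(*...) splits it. The variable names pairs, letters and counts are illustrative only and do not come from the answer above.

```python
from collections import Counter

text = "quatre jutges dun jutjat mengen fetge dun penjat"

# most_common() with no argument returns every (character, count) pair,
# sorted from the highest count to the lowest.
pairs = Counter(text).most_common()

# zip(*pairs) transposes the list of pairs into two tuples:
# one holding the characters, one holding the matching counts.
letters, counts = zip(*pairs)

print(len(letters))   # number of distinct characters, spaces included
print(counts[:3])     # the three largest counts
```

The two tuples are exactly the positional arguments that plt.bar receives in the answer's call.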
Note: I haven't updated the width or color to match your result plots.
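The question plots relative frequencies rather than raw counts. As a sketch under the assumption of Python 3 (where / is true division), the counts from Counter can be normalised in a single pass over the text. The helper name letter_frequencies is hypothetical and not part of the original answer.

```python
from collections import Counter

def letter_frequencies(text):
    """Return (character, relative frequency) pairs, most frequent first.

    Hypothetical helper for illustration; it counts every character,
    spaces included, just like the function in the question.
    """
    total = len(text)
    if total == 0:
        return []
    # Counter scans the text once, instead of calling text.count()
    # separately for every distinct character.
    counts = Counter(text)
    return [(ch, n / total) for ch, n in counts.most_common()]

pairs = letter_frequencies("quatre jutges dun jutjat mengen fetge dun penjat")
letters, freqs = zip(*pairs)
```

The resulting letters and freqs can then be passed to plt.bar in place of the raw counts; passing strings as bar positions relies on a Matplotlib version that supports categorical axes.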