Letter frequencies: plot a histogram ordering the values (Python)


Problem description


I am trying to analyse the frequency of the letters in a text. As an example I will use a short sentence here, but the code is meant to analyse huge texts, so it should be efficient.

test = "quatre jutges dun jutjat mengen fetge dun penjat"


Then I wrote a function that computes the relative frequencies:

def create_dictionary2(txt):
    # Relative frequency of every distinct character (the space included).
    dictionary = {}
    for x in set(txt):
        dictionary[x] = txt.count(x)/len(txt)
    return dictionary
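Since the text is meant to be huge, note that create_dictionary2 calls txt.count once per distinct character, so it scans the whole text 15 times for this sentence. Below is a minimal sketch that reads the text once with collections.Counter. The name letter_frequencies is hypothetical, and lowercasing plus counting only alphabetic characters is an assumption (the original function also counts the space). Whether it actually beats repeated str.count depends on the text and alphabet size, so it is worth timing both.

```python
from collections import Counter


def letter_frequencies(txt):
    """Return {letter: relative frequency}, reading txt only once.

    Hypothetical helper. Assumption: case is folded and only alphabetic
    characters are counted, so spaces and punctuation are skipped.
    """
    counts = Counter(ch for ch in txt.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}  # no letters at all: avoid dividing by zero
    return {letter: n / total for letter, n in counts.items()}


test = "quatre jutges dun jutjat mengen fetge dun penjat"
freqs = letter_frequencies(test)
```

For the example sentence this yields 14 letters (the 41 letters exclude the 7 spaces), with "e" the most frequent at 7/41.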

Then:

import numpy as np
import matplotlib.pyplot as plt
test_dict = create_dictionary2(test)
plt.bar(test_dict.keys(), test_dict.values(), width=0.5, color='g')
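One way to get the ordering asked for is to sort the dictionary by value before plotting. This is a sketch, not the accepted answer's method: the helper names sorted_by_frequency and plot_sorted are hypothetical, and it assumes a recent matplotlib that treats a list of strings as categorical x positions, drawing one labelled tick per string in the order given.

```python
def sorted_by_frequency(freqs):
    """Split a {key: value} dict into two lists ordered by decreasing value.

    sorted() is stable even with reverse=True, so ties keep the dict's
    insertion order.
    """
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    keys = [k for k, _ in items]
    values = [v for _, v in items]
    return keys, values


def plot_sorted(freqs):
    """Draw a bar chart of freqs, tallest bar first (requires matplotlib)."""
    # Imported here so sorted_by_frequency can be used without matplotlib.
    import matplotlib.pyplot as plt

    keys, values = sorted_by_frequency(freqs)
    # Plain lists of strings are plotted as categories, one tick per key.
    plt.bar(keys, values, width=0.5, color='g')
    plt.show()
```

With the question's data this would be called as plot_sorted(test_dict).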

I obtain:

[Figure: the resulting bar chart]


ISSUES: I want to see all the letters, but some of them are not shown (all I get back is "Container object of 15 artists"). How can I expand the histogram? Then I would like to sort the histogram, to obtain something like this:

[Figure: the desired, sorted histogram]

Recommended answer


For counting we can use a Counter object from the collections module. Counter can also return its key-value pairs ordered from the most common to the least common:

from collections import Counter

import matplotlib.pyplot as plt

# Count every character of the sentence, including the space.
c = Counter("quatre jutges dun jutjat mengen fetge dun penjat")
plt.bar(*zip(*c.most_common()), width=.5, color='g')
plt.show()


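The most_common method returns a list of (key, count) tuples sorted by decreasing count. zip(*...) transposes that list into one tuple of keys and one tuple of counts, and the leading * passes them to plt.bar as its x and height arguments (see this answer).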
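To see what the *zip(*...) idiom does without any plotting, here is the same transposition on a small made-up string. It relies on most_common() listing equally frequent elements in the order they were first encountered, as documented for Counter on Python 3.7 and later.

```python
from collections import Counter

c = Counter("abracadabra")
pairs = c.most_common()     # [('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
keys, counts = zip(*pairs)  # keys -> ('a', 'b', 'r', 'c', 'd'), counts -> (5, 2, 2, 1, 1)

# So plt.bar(*zip(*pairs)) is the same call as plt.bar(keys, counts).
```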


Note: I haven't updated the width or color to match your result plots.
