在主词上使用matplotlib的Python直方图 [英] Python Histogram using matplotlib on top words

查看:55
本文介绍了在主词上使用matplotlib的Python直方图的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在读取文件并计算前100个字的频率.我能够找到并创建以下列表:

I am reading a file and calculating the frequency of the top 100 words. I am able to find that and create the following list:

[('test', 510), ('Hey', 362), ("please", 753), ('take', 446), ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191), ('simple', 175), ('harry', 172), ('woman', 170), ('basil', 153), ('things', 129), ('think', 126), ('bye', 124), ('thing', 120), ('love', 107), ('quite', 107), ('face', 107), ('eyes', 107), ('time', 106), ('himself', 105), ('want', 105), ('good', 105), ('really', 103), ('away',100), ('did', 100), ('people', 99), ('came', 97), ('say', 97), ('cried', 95), ('looked', 94), ('tell', 92), ('look', 91), ('world', 89), ('work', 89), ('project', 88), ('room', 88), ('going', 87), ('answered', 87), ('mr', 87), ('little', 87), ('yes', 84), ('silly', 82), ('thought', 82), ('shall', 81), ('circle', 80), ('hallward', 80), ('told', 77), ('feel', 76), ('great', 74), ('art', 74), ('dear',73), ('picture', 73), ('men', 72), ('long', 71), ('young', 70), ('lady', 69), ('let', 66), ('minute', 66), ('women', 66), ('soul', 65), ('door', 64), ('hand',63), ('went', 63), ('make', 63), ('night', 62), ('asked', 61), ('old', 61), ('passed', 60), ('afraid', 60), ('night', 59), ('looking', 58), ('wonderful', 58), ('gutenberg-tm', 56), ('beauty', 55), ('sir', 55), ('table', 55), ('turned', 54), ('lips', 54), ("one's", 54), ('better', 54), ('got', 54), ('vane', 54), ('right',53), ('left', 53), ('course', 52), ('hands', 52), ('portrait', 52), ('head', 51), ("can't", 49), ('true', 49), ('house', 49), ('believe', 49), ('black', 49), ('horrible', 48), ('oh', 48), ('knew', 47), ('curious', 47), ('myself', 47)]

获取此列表后,我想使用matplotlib绘制直方图.我正在尝试以下操作,但是无法绘制适当的直方图.

After getting this list, I want to draw histogram using matplotlib. I am trying something as below, but I am not able to draw a proper histogram.

我的问题:如何将总频率传递给图表?我所有的酒吧现在都处于相同的高度.甚至垃圾箱中心也不正确.我应该如何在下面的代码中将数据传递给ax.hist方法?我正在尝试从 http://matplotlib.org/1.2.1更新示例/examples/api/histogram_demo.html .

My question: How do I pass the total frequency to the graph? All of my bars are at the same height right now. And even the bin center is not correct. How should I pass data to the ax.hist method on below code? I am trying to update the example from http://matplotlib.org/1.2.1/examples/api/histogram_demo.html.

totalWords = counts.most_common(100)
print(totalWords)
for z in range(len(totalWords)):
    words.append(totalWords[z][0])

x = np.arange(len(words))
#print x
i, s = 100, 15

fig = plt.figure()
ax = fig.add_subplot(111)

n, bins, patches = ax.hist(x, 50, normed=1, facecolor='green', alpha=0.75)


bincenters = 0.5*(bins[1:]+bins[:-1])

y = mlab.normpdf(bincenters*1.00, i, s)
l = ax.plot(bincenters, y, 'r--', linewidth=1)

ax.set_xlabel('Words')
ax.set_ylabel('Frequency')
ax.set_xlim(50, 160)
ax.set_ylim(0, 0.04)
ax.grid(True)

plt.show()

推荐答案

也许这就是您想要的.此代码生成一个条形图,每个条形图代表单个单词,垂直轴提供文本中单词的出现频率.

Perhaps this is what you want. This code produces a bar chart with each bar representing individual words and vertical axis provides the frequency of word in your text.

您提供的counts数组的排序没有达到我的预期.

The counts array you provided is not sorted as I had expected though.

import numpy as np
import matplotlib.pyplot as plt

counts = [('test', 510), ('Hey', 362), ("please", 753), ('take', 446), ('herbert', 325), ('live', 222), ('hate', 210), ('white', 191), ('simple', 175), ('harry', 172), ('woman', 170), ('basil', 153), ('things', 129), ('think', 126), ('bye', 124), ('thing', 120), ('love', 107), ('quite', 107), ('face', 107), ('eyes', 107), ('time', 106), ('himself', 105), ('want', 105), ('good', 105), ('really', 103), ('away',100), ('did', 100), ('people', 99), ('came', 97), ('say', 97), ('cried', 95), ('looked', 94), ('tell', 92), ('look', 91), ('world', 89), ('work', 89), ('project', 88), ('room', 88), ('going', 87), ('answered', 87), ('mr', 87), ('little', 87), ('yes', 84), ('silly', 82), ('thought', 82), ('shall', 81), ('circle', 80), ('hallward', 80), ('told', 77), ('feel', 76), ('great', 74), ('art', 74), ('dear',73), ('picture', 73), ('men', 72), ('long', 71), ('young', 70), ('lady', 69), ('let', 66), ('minute', 66), ('women', 66), ('soul', 65), ('door', 64), ('hand',63), ('went', 63), ('make', 63), ('night', 62), ('asked', 61), ('old', 61), ('passed', 60), ('afraid', 60), ('night', 59), ('looking', 58), ('wonderful', 58), ('gutenberg-tm', 56), ('beauty', 55), ('sir', 55), ('table', 55), ('turned', 54), ('lips', 54), ("one's", 54), ('better', 54), ('got', 54), ('vane', 54), ('right',53), ('left', 53), ('course', 52), ('hands', 52), ('portrait', 52), ('head', 51), ("can't", 49), ('true', 49), ('house', 49), ('believe', 49), ('black', 49), ('horrible', 48), ('oh', 48), ('knew', 47), ('curious', 47), ('myself', 47)]
words = [x[0] for x in counts]
values = [int(x[1]) for x in counts]
print words
mybar = plt.bar(range(len(words)), values, color='green', alpha=0.4)

plt.xlabel('Word Index')
plt.ylabel('Frequency')
plt.title('Word Frequency Chart')
plt.legend()

plt.show()

您可以看到遵循齐氏曲线(幂律曲线)的图形.修改代码以适合您的需求.

You can see the graph following a ziphian curve (power law curve). Modify the code to suit your need.

这篇关于在主词上使用matplotlib的Python直方图的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆