Python: Frequency of occurrences


Question

I have a list of integers and want to get the frequency of each integer. This was discussed here

The problem is that the approach I'm using gives me frequencies for floating-point values even though my data set consists of integers only. Why does that happen, and how can I get the frequency of each integer in my data?

I'm using pyplot.hist to plot a histogram of the frequency of occurrences:

import numpy as np
import matplotlib.pyplot as plt

# Load the 5th column of the file into an integer array named data
# (pass delimiter=',' as well if the file is comma-separated).
data = np.loadtxt('data.txt', dtype=int, usecols=(4,))
plt.hist(data)  # plot the column as a histogram

I'm getting the histogram, but I've noticed that if I print the result of np.histogram(data):

hist = np.histogram(data)
print(hist)  # hist is a (counts, bin_edges) tuple, so print it rather than call it

I get:

(array([ 2323, 16338,  1587,   212,    26,    14,     3,     2,     2,     2]), 
array([  1. ,   2.8,   4.6,   6.4,   8.2,  10. ,  11.8,  13.6,  15.4,
    17.2,  19. ]))

where the second array represents the values and the first array represents the number of occurrences.

All values in my data set are integers, so why does the second array contain floating-point numbers, and how should I get the frequency of each integer?

Update:

This solves the problem, thank you Lev for the reply.

plt.hist(data, bins=np.arange(data.min(), data.max()+1))

To avoid creating a new question: how can I plot the columns "in the middle" for each integer? Say, I want the column for integer 3 to span 2.5 to 3.5, not 3 to 4.

Recommended answer

If you don't specify which bins to use, np.histogram and pyplot.hist fall back to their default setting, which is 10 equal-width bins. The left border of the first bin is the smallest value in the data and the right border of the last bin is the largest.
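The default behaviour can be checked on a small array. This is a minimal sketch: the sample values are made up and only chosen to span 1 to 19 like the data in the question. The edges come out as 11 evenly spaced points (1, 2.8, 4.6, ..., 19), the same values as in the question's output.

```python
import numpy as np

# Hypothetical integer data spanning 1..19, like the range in the question.
data = np.array([1, 2, 2, 3, 4, 7, 19])

# With no bins argument, np.histogram uses 10 equal-width bins.
counts, edges = np.histogram(data)

# The same edges computed by hand: 11 evenly spaced points from min to max.
manual_edges = np.linspace(data.min(), data.max(), 11)

print(edges)
print(np.allclose(edges, manual_edges))
```

Each bin is (19 - 1) / 10 = 1.8 wide, which is where the non-integer borders come from.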

This is why the bin borders are floating-point numbers. You can use the bins keyword argument to enforce another choice of bins, e.g.:

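# max()+2 rather than max()+1: the last bin is closed on both sides, so with
# max()+1 the two largest values would be counted in the same bin.
plt.hist(data, bins=np.arange(data.min(), data.max()+2))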
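If only the counts are needed, they can be obtained without plotting. The sketch below uses made-up sample data and shows two options: np.unique with return_counts=True, and np.histogram with unit-width bins. One detail to watch is that NumPy's last bin includes its right edge, so the edges here run to data.max() + 1 (an arange stop of data.max() + 2) to give the largest value a bin of its own.

```python
import numpy as np

# Hypothetical sample data standing in for the column loaded from data.txt.
data = np.array([1, 2, 2, 3, 3, 3, 5])

# Option 1: each distinct value together with its number of occurrences.
values, counts = np.unique(data, return_counts=True)
frequency = dict(zip(values.tolist(), counts.tolist()))
print(frequency)

# Option 2: unit-width bins; edges are 1, 2, ..., max+1, one bin per integer.
edges = np.arange(data.min(), data.max() + 2)
per_integer, _ = np.histogram(data, bins=edges)
print(per_integer)

# For comparison: stopping the edges at max merges max-1 and max into one bin.
short_edges = np.arange(data.min(), data.max() + 1)
merged, _ = np.histogram(data, bins=short_edges)
print(merged)
```

The np.unique variant only lists values that actually occur, while the histogram variant also reports a zero for integers that are missing from the data (4 in this sample).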

The easiest way to shift all bins to the left, so that each one is centered on its integer, is probably just to subtract 0.5 from all bin borders:

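# Edges run from min-0.5 to max+0.5; with max()+1 the largest value would
# fall outside the last bin and be dropped.
plt.hist(data, bins=np.arange(data.min(), data.max()+2)-0.5)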

Another way to achieve the same effect (not equivalent if non-integers are present):

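# align='left' centers each bar on the left edge of its bin; the bins
# themselves (and therefore the counts) are unchanged.
plt.hist(data, bins=np.arange(data.min(), data.max()+2), align='left')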
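The caveat about non-integers can be seen by comparing the counts behind the two plots. This is a rough sketch with made-up data containing one non-integer value. Shifting the edges by 0.5 changes which bin a value falls into, whereas align='left' keeps the integer edges and only draws the bars at a different position, so its counts are those of the unshifted bins.

```python
import numpy as np

# Hypothetical data with one non-integer value (1.7).
data = np.array([1, 1.7, 2, 3])

# Integer edges 1, 2, 3, 4: these are the bins that align='left' would draw,
# with each bar centered on the left edge of its bin.
integer_edges = np.arange(1, 5)
counts_align_left, _ = np.histogram(data, bins=integer_edges)

# Edges shifted by 0.5: 0.5, 1.5, 2.5, 3.5, so each bin is centered on an integer.
shifted_edges = integer_edges - 0.5
counts_shifted, _ = np.histogram(data, bins=shifted_edges)

print(counts_align_left)  # 1.7 is counted with 1, in the bin [1, 2)
print(counts_shifted)     # 1.7 is counted with 2, in the bin [1.5, 2.5)
```

For purely integer data the two sets of counts are identical, which is why both approaches give the same picture in that case.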
