How to plot an empirical CDF in matplotlib in Python?

Question

How can I plot the empirical CDF of an array of numbers in matplotlib in Python? I'm looking for the CDF analog of pylab's "hist" function.

One thing I can think of is:

from numpy import array
import matplotlib.pyplot as plt
from scipy.stats import cumfreq
a = array([...]) # my array of numbers
num_bins = 20
b = cumfreq(a, num_bins)
plt.plot(b)

Is that correct, though? Is there an easier/better way?

Thanks.

Recommended answer

That looks to be (almost) exactly what you want. Two things:

First, the result is a tuple of four items. The first is the number of points in or below each bin. The second is the starting point of the smallest bin. The third is the size of the bins. (The last is the number of points outside the limits, but since you haven't set any limits, all points will be binned.)

Second, you'll want to rescale the results so the final value is 1, to follow the usual conventions of a CDF, but otherwise it's right.

Here's what it does under the hood:

def cumfreq(a, numbins=10, defaultreallimits=None):
    # docstring omitted
    h,l,b,e = histogram(a,numbins,defaultreallimits)
    cumhist = np.cumsum(h*1, axis=0)
    return cumhist,l,b,e

It does the histogramming, then produces a cumulative sum of the counts in each bin. So the ith value of the result is the number of array values less than or equal to the maximum of the ith bin. The final value is therefore just the size of the initial array.
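To see the histogram-then-cumulative-sum idea in isolation, here is a minimal sketch that uses numpy.histogram as a stand-in for the histogram helper inside cumfreq. That substitution is an assumption: the two may place their bin edges differently, so the intermediate counts need not match what cumfreq returns. The helper name cumulative_counts and the sample data are made up for illustration.

```python
import numpy as np

def cumulative_counts(a, num_bins=10):
    """Histogram `a`, then take a running sum of the per-bin counts."""
    counts, bin_edges = np.histogram(a, bins=num_bins)
    return np.cumsum(counts), bin_edges

# Made-up sample data, for illustration only.
data = np.array([0.5, 1.2, 1.9, 2.4, 3.3, 3.8, 4.1])
cum, edges = cumulative_counts(data, num_bins=4)

# cum[i] is the number of values that fell into bins 0..i,
# so the last entry equals the number of values in `data`.
print(cum, cum[-1] == data.size)
```

Because every value lands in some bin, the running sum never decreases and ends at the array size, which is why dividing by the last entry rescales it to end at 1.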

Finally, to plot it, you'll need to use the initial value of the bin and the bin size to determine what x-axis values you'll need.
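Putting the points above together, a sketch of the cumfreq route might look like the following. The function names cdf_points_from_cumfreq and plot_cumfreq_cdf are hypothetical, and the sketch assumes no limits were passed to cumfreq, so every point is binned and the last cumulative count equals the array size.

```python
import numpy as np

def cdf_points_from_cumfreq(cumcount, lowerlimit, binsize):
    """Turn the first three items of a cumfreq result into (x, cdf) arrays."""
    cumcount = np.asarray(cumcount, dtype=float)
    # Upper edge of bin i is lowerlimit + (i + 1) * binsize.
    x = lowerlimit + binsize * np.arange(1, cumcount.size + 1)
    # Rescale so the final value is 1 (assumes all points were binned).
    cdf = cumcount / cumcount[-1]
    return x, cdf

def plot_cumfreq_cdf(a, num_bins=20):
    """Compute and plot the binned CDF of `a` (needs scipy and matplotlib)."""
    from scipy.stats import cumfreq
    import matplotlib.pyplot as plt

    # The fourth item (points outside the limits) is unused here.
    cumcount, lowerlimit, binsize, _ = cumfreq(a, num_bins)
    x, cdf = cdf_points_from_cumfreq(cumcount, lowerlimit, binsize)
    plt.plot(x, cdf)
    plt.show()
```

Calling plot_cumfreq_cdf(a) with your own array draws the curve. The x values are the upper edges of the bins, matching the reading of each cumulative count as "points in or below this bin".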

Another option is to use numpy.histogram, which can do the normalization and returns the bin edges. You'll need to do the cumulative sum of the resulting counts yourself.

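import numpy
import pylab

a = numpy.array([...]) # your array of numbers
num_bins = 20
# density=True replaces normed=True, which newer NumPy releases no longer accept
counts, bin_edges = numpy.histogram(a, bins=num_bins, density=True)
# the values are densities, so weight by bin width to make the sum end at 1
cdf = numpy.cumsum(counts * numpy.diff(bin_edges))
pylab.plot(bin_edges[1:], cdf)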

(bin_edges[1:] is the upper edge of each bin.)
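Both approaches above depend on a choice of bins. As a possible alternative that is not part of the answer above, the empirical CDF can also be computed without binning by sorting the data: the k-th smallest value gets the height k/n. The names ecdf and plot_ecdf below are hypothetical helpers, offered as a sketch rather than a definitive implementation.

```python
import numpy as np

def ecdf(a):
    """Return sorted values x and the fraction of points <= each x."""
    x = np.sort(np.asarray(a, dtype=float))
    # The k-th smallest value (1-based) has k of the n points at or below it.
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def plot_ecdf(a):
    """Draw the empirical CDF of `a` as a step function (needs matplotlib)."""
    import matplotlib.pyplot as plt

    x, y = ecdf(a)
    # where="post" holds each height until the next data point.
    plt.step(x, y, where="post")
    plt.show()
```

With tied values, the same x appears more than once with increasing heights; the step plot then jumps straight to the largest of them, which is the expected behavior for an empirical CDF.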
