How to plot empirical cdf (ecdf)

Question

How can I plot the empirical CDF of an array of numbers in matplotlib in Python? I'm looking for the cdf analog of pylab's "hist" function.

One thing I can think of is:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import cumfreq

a = np.array([...])  # my array of numbers
num_bins = 20
b = cumfreq(a, num_bins)
plt.plot(b)

Recommended answer

That looks to be (almost) exactly what you want. Two things:

First, the result is a tuple of four items. The first is the number of points in or below each bin. The second is the starting point of the smallest bin. The third is the size of the bins. (The last is the number of points outside the limits, but since you haven't set any limits, all points will be binned.)

Second, you'll want to rescale the results so the final value is 1, to follow the usual conventions of a CDF, but otherwise it's right.

Here's what it does under the hood:

def cumfreq(a, numbins=10, defaultreallimits=None):
    # docstring omitted
    h,l,b,e = histogram(a,numbins,defaultreallimits)
    cumhist = np.cumsum(h*1, axis=0)
    return cumhist,l,b,e

It does the histogramming, then produces a cumulative sum of the counts in each bin. So the ith value of the result is the number of array values less than or equal to the maximum of the ith bin. So, the final value is just the size of the initial array.
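As a small illustration of that cumulative-sum step, here is a sketch that uses numpy.histogram as a stand-in for the histogramming call. The helper name `cumulative_counts` and the sample data are made up for this example, and scipy's cumfreq picks slightly different default bin limits, so the individual bin counts may differ from what cumfreq reports.

```python
import numpy as np

def cumulative_counts(a, num_bins=10):
    """Histogram `a`, then return the running total of the bin counts."""
    counts, bin_edges = np.histogram(a, bins=num_bins)
    return np.cumsum(counts), bin_edges

# Hypothetical sample data, for illustration only
data = np.array([1.0, 2.0, 2.0, 3.0, 5.0, 8.0])
cum, edges = cumulative_counts(data, num_bins=4)

# The last cumulative count covers every bin, so it equals the array size
print(cum[-1] == data.size)
```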

Finally, to plot it, you'll need to use the initial value of the bin and the bin size to determine what x-axis values you'll need.
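Putting the rescaling and the x-axis reconstruction together, one way to do it might look like the sketch below. It assumes a SciPy version where cumfreq returns a result with the fields `cumcount`, `lowerlimit` and `binsize`; the helper names `cumfreq_cdf` and `plot_cdf` and the random sample are invented for this example.

```python
import numpy as np
from scipy.stats import cumfreq

def cumfreq_cdf(a, num_bins=20):
    """Return (upper bin edges, cumulative counts rescaled to end at 1)."""
    res = cumfreq(a, numbins=num_bins)
    # Upper edge of bin i is lowerlimit + (i + 1) * binsize
    upper_edges = res.lowerlimit + res.binsize * np.arange(1, num_bins + 1)
    # Divide by the final count so the curve ends at 1
    cdf = res.cumcount / res.cumcount[-1]
    return upper_edges, cdf

def plot_cdf(x, cdf):
    """Plot the binned CDF; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt
    plt.plot(x, cdf)
    plt.show()

# Hypothetical sample data, for illustration only
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
x, cdf = cumfreq_cdf(sample)
```

Calling `plot_cdf(x, cdf)` then draws the curve. Using the upper edge of each bin matches the meaning of the cumulative count, which is the number of points at or below that edge.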

Another option is to use numpy.histogram, which can do the normalization and returns the bin edges. You'll need to do the cumulative sum of the resulting counts yourself.

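import numpy
import pylab

a = numpy.array([...])  # your array of numbers
num_bins = 20
# density=True replaces the older normed= argument
counts, bin_edges = numpy.histogram(a, bins=num_bins, density=True)
# density values integrate to 1, so weight each by its bin width before summing
cdf = numpy.cumsum(counts * numpy.diff(bin_edges))
pylab.plot(bin_edges[1:], cdf)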

(bin_edges[1:] is the upper edge of each bin.)
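Both approaches above bin the data first. If an unbinned empirical CDF is wanted, a different and fairly common technique is to sort the values and step up by 1/n at each one. This is not what the answer describes, just a sketch of that alternative; the names `ecdf` and `plot_ecdf` are made up here.

```python
import numpy as np

def ecdf(a):
    """Return sorted values and the fraction of points <= each value."""
    x = np.sort(np.asarray(a, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

def plot_ecdf(a):
    """Draw the ECDF as a step function that jumps at each data point."""
    import matplotlib.pyplot as plt
    x, y = ecdf(a)
    plt.step(x, y, where="post")
    plt.show()

# Hypothetical sample data, for illustration only
xs, ys = ecdf([3.0, 1.0, 2.0, 2.0])
```

With repeated values the step plot simply stacks the jumps at the same x position, so no special handling of ties is needed for plotting.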
