Calculate the Cumulative Distribution Function (CDF) in Python

Question

How can I calculate in python the Cumulative Distribution Function (CDF)?

I want to calculate it from an array of points I have (a discrete distribution), not from one of the continuous distributions that, for example, scipy provides.

Answer

(It is possible that I have misread the question. If the question is how to get from a discrete PDF to a discrete CDF, then np.cumsum divided by a suitable normalising constant will do if the samples are equispaced. If the samples are not equispaced, take np.cumsum of the PDF values multiplied by the spacing between the points.)
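A minimal sketch of the PDF-to-CDF conversion described in the parenthetical above. The helper name `pdf_to_cdf` is hypothetical. Two choices are my own assumptions rather than the answer's: the "suitable constant" is taken to be the total, so the CDF ends at exactly 1, and for a non-equispaced grid each PDF value is weighted by its local spacing as computed by `np.gradient` (central differences in the interior, one-sided at the ends), which reduces to the plain cumulative sum when the grid happens to be equispaced.

```python
import numpy as np

def pdf_to_cdf(pdf, x=None):
    """Discrete CDF from sampled PDF values (hypothetical helper, a sketch)."""
    pdf = np.asarray(pdf, dtype=float)
    if x is None:
        # Equispaced samples: the common spacing only scales the result,
        # so a plain cumulative sum is enough before normalising.
        cdf = np.cumsum(pdf)
    else:
        # Non-equispaced samples (assumption): weight each PDF value by the
        # local spacing around its point, then accumulate.
        widths = np.gradient(np.asarray(x, dtype=float))
        cdf = np.cumsum(pdf * widths)
    # Normalise by the total so the CDF ends at exactly 1.
    return cdf / cdf[-1]

# Equispaced: values 0.25, 0.5, 0.75, 1.0
print(pdf_to_cdf([1, 1, 1, 1]))
# Non-equispaced grid: values 0.125, 0.3125, 0.625, 1.0
print(pdf_to_cdf([1, 1, 1, 1], x=[0, 1, 3, 6]))
```

Normalising by the total means you never have to know the grid spacing explicitly in the equispaced case; if your PDF values are already a properly scaled density, multiplying the cumulative sum by the spacing would give the same result.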

If you have a discrete array of samples and want the CDF of that sample, you can simply sort the array. In the sorted result, the smallest value sits at 0 % and the largest value at 100 % of the distribution. If you want the value at 50 % of the distribution, just look at the element in the middle of the sorted array.
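The sort-and-look-up idea can be sketched as follows. The helper `value_at` is hypothetical; it maps a fraction `q` to the nearest index `round(q * (n - 1))`, which matches the proportion `p = k / (n - 1)` used in the plotting code of this answer. The random data is only illustrative, and an odd sample size is chosen so there is a single middle element.

```python
import numpy as np

rng = np.random.default_rng(0)       # seeded only so the run is repeatable
data = rng.standard_normal(10001)    # odd count, so there is one middle element
data_sorted = np.sort(data)
n = len(data_sorted)

def value_at(q):
    """Sample value at fraction q (0..1) of the sorted array, nearest index."""
    k = int(round(q * (n - 1)))
    return data_sorted[k]

print(value_at(0.0) == data.min())     # True: the smallest value is the 0 % point
print(value_at(1.0) == data.max())     # True: the largest value is the 100 % point
print(value_at(0.5), np.median(data))  # the middle element is the sample median
```

For fractions that do not land exactly on an index, `np.percentile` interpolates between neighbouring sorted values instead of picking the nearest one.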

Let us have a closer look at this with a simple example:

```python
import matplotlib.pyplot as plt
import numpy as np

# create some normally distributed random data:
data = np.random.randn(10000)

# sort the data:
data_sorted = np.sort(data)

# calculate the proportional values of samples (0 for the smallest, 1 for the largest)
p = 1. * np.arange(len(data)) / (len(data) - 1)

# plot the sorted data:
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(p, data_sorted)
ax1.set_xlabel('$p$')
ax1.set_ylabel('$x$')

ax2 = fig.add_subplot(122)
ax2.plot(data_sorted, p)
ax2.set_xlabel('$x$')
ax2.set_ylabel('$p$')

plt.show()
```

This produces two plots, where the right-hand one is the traditional cumulative distribution function. It should reflect the CDF of the process behind the points, but naturally it is only an estimate of that CDF, not the CDF itself, as long as the number of points is finite.
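To read the right-hand curve at an arbitrary `x` rather than only plot it, one option is to count how many sorted samples are less than or equal to `x`. The sketch below does this with `np.searchsorted(..., side="right")`; the helper name `ecdf` and the tiny sample are hypothetical. Note that it uses the common "count / n" convention, so the smallest sample maps to `1/n` rather than 0; this differs slightly from the `k / (n - 1)` proportions in the plotting code, which is negligible for large samples.

```python
import numpy as np

def ecdf(data, x):
    """Empirical CDF: fraction of samples that are <= x (hypothetical helper)."""
    data_sorted = np.sort(np.asarray(data, dtype=float))
    # side="right" counts every sample equal to x as well, so ties are included
    return np.searchsorted(data_sorted, x, side="right") / len(data_sorted)

sample = [3.0, 1.0, 2.0, 2.0]
print(ecdf(sample, [0.5, 1.0, 2.0, 2.5, 3.0]))  # values 0.0, 0.25, 0.75, 0.75, 1.0
```

The result is a step function: it jumps by `1/n` at each sample (by more at tied values) and is flat in between, which is the exact CDF of the sample itself.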

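This function is easy to invert: swapping the axes gives the left-hand plot, which reads off the value $x$ at a given proportion $p$ (the quantile function). Which form you need depends on your application.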
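Both directions can be evaluated numerically with `np.interp`, interpolating linearly between the sorted samples just as the straight line segments in the plots do. This is a sketch under that assumption; `cdf` and `quantile` are hypothetical helper names, and `p` uses the same `k / (n - 1)` definition as the plotting code. With this choice, `quantile(q)` agrees with NumPy's default `np.percentile(data, 100 * q)`.

```python
import numpy as np

rng = np.random.default_rng(1)                    # seeded for repeatability
data_sorted = np.sort(rng.standard_normal(1000))
p = np.arange(len(data_sorted)) / (len(data_sorted) - 1)  # k / (n - 1), as in the plot

def cdf(x):
    """Forward direction (right-hand plot): proportion p at value x."""
    return np.interp(x, data_sorted, p)

def quantile(q):
    """Inverse direction (left-hand plot): value x at proportion q."""
    return np.interp(q, p, data_sorted)

x30 = quantile(0.3)
print(x30, cdf(x30))  # cdf(quantile(0.3)) recovers 0.3 up to rounding
```

`np.interp` expects its second argument to be increasing, which holds here because the samples are sorted and continuous data practically never repeats; with many tied values, the forward direction is better served by a counting approach such as `np.searchsorted`.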
