Calculate the Cumulative Distribution Function (CDF) in Python

Question

How can I calculate the cumulative distribution function (CDF) in Python?

I want to calculate it from an array of points I have (a discrete distribution), not from the continuous distributions that, for example, scipy provides.

Recommended answer

(It is possible that I have misread the question. If the question is how to get from a discrete PDF to a discrete CDF, then np.cumsum divided by a suitable constant will do, provided the samples are equispaced. If the array is not equispaced, take np.cumsum of the array multiplied by the distances between the points.)
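The parenthetical remark above can be sketched as follows. This is only an illustration: the grids and the Gaussian-shaped PDF values are made up for the example, and the names `pdf`, `widths`, `cdf_equal` and `cdf_uneven` are hypothetical, not taken from the question.

```python
import numpy as np

# Assumed example: a PDF sampled on an equispaced grid (illustrative values).
x = np.linspace(-4.0, 4.0, 801)
pdf = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

# Equispaced case: cumulative sum divided by a suitable constant.
# Dividing by the total sum makes the last CDF value exactly 1.
cdf_equal = np.cumsum(pdf) / np.sum(pdf)

# Non-equispaced case: weight each PDF value by the distance to the
# previous point before accumulating (the first point gets width 0).
x_uneven = -4.0 + 8.0 * np.linspace(0.0, 1.0, 801) ** 2
pdf_uneven = np.exp(-0.5 * x_uneven**2) / np.sqrt(2.0 * np.pi)
widths = np.diff(x_uneven, prepend=x_uneven[0])
cdf_uneven = np.cumsum(pdf_uneven * widths)
```

In the non-equispaced case the result is a simple Riemann-sum approximation, so the last value is close to 1 rather than exactly 1.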

If you have a discrete array of samples and you would like to know the CDF of the sample, you can simply sort the array. In the sorted result, the smallest value corresponds to 0% and the largest value corresponds to 100%. If you want to know the value at 50% of the distribution, look at the element in the middle of the sorted array.
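As a small sketch of that idea, with made-up sample values and hypothetical variable names:

```python
import numpy as np

# Illustrative samples (assumed values, not from the question).
samples = np.array([3.1, 0.4, 2.2, 5.0, 1.7])
samples_sorted = np.sort(samples)

smallest = samples_sorted[0]    # the 0% end of the distribution
largest = samples_sorted[-1]    # the 100% end of the distribution
middle = samples_sorted[len(samples_sorted) // 2]  # value near 50%
```

For an odd number of samples the middle element coincides with `np.median(samples)`. For an even number, `np.median` averages the two central elements, so the two can differ slightly.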

Let us have a closer look at this with a simple example:

import matplotlib.pyplot as plt
import numpy as np

# create some normally distributed random data:
data = np.random.randn(10000)

# sort the data:
data_sorted = np.sort(data)

# calculate the proportional values of samples
p = 1. * np.arange(len(data)) / (len(data) - 1)

# plot the sorted data:
fig = plt.figure()
ax1 = fig.add_subplot(121)
ax1.plot(p, data_sorted)
ax1.set_xlabel('$p$')
ax1.set_ylabel('$x$')

ax2 = fig.add_subplot(122)
ax2.plot(data_sorted, p)
ax2.set_xlabel('$x$')
ax2.set_ylabel('$p$')

# display the figure (needed when running as a script)
plt.show()

This produces a figure with two panels, where the right-hand panel is the traditional cumulative distribution function. It approximates the CDF of the process that generated the points, but as long as the number of points is finite it is only an estimate of that CDF.
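If you need the empirical CDF at arbitrary values rather than a plot, one possible sketch uses `np.searchsorted` on the sorted samples. The helper name `ecdf` is hypothetical, and it uses the "fraction of samples less than or equal to x" convention, which differs slightly from the `arange / (n - 1)` scaling in the example above.

```python
import numpy as np

def ecdf(samples, x):
    """Fraction of samples less than or equal to x (hypothetical helper)."""
    samples_sorted = np.sort(samples)
    # side="right" counts how many sorted samples are <= each x
    counts = np.searchsorted(samples_sorted, x, side="right")
    return counts / samples_sorted.size

# Illustrative data, including a repeated value.
samples = np.array([1.0, 2.0, 2.0, 3.0, 5.0])
values = ecdf(samples, [0.0, 2.0, 4.0, 5.0])
```

With this convention the result steps up by 1/n at each sample, so repeated values produce a larger jump.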

This function is easy to invert; which form you need depends on your application.
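One way to use both forms, sketched under the assumption that linear interpolation between the sorted samples is acceptable, is `np.interp` with the arguments swapped. The helper names `cdf_at` and `quantile_at` are hypothetical, and the seeded generator is only there to make the example repeatable.

```python
import numpy as np

# Same construction as in the example above, with a seeded generator.
rng = np.random.default_rng(0)
data_sorted = np.sort(rng.standard_normal(10000))
p = np.arange(data_sorted.size) / (data_sorted.size - 1)

def cdf_at(x):
    """Interpolated CDF: maps a value x to a proportion in [0, 1]."""
    return np.interp(x, data_sorted, p)

def quantile_at(q):
    """Interpolated inverse CDF: maps a proportion q to a value."""
    return np.interp(q, p, data_sorted)

median_estimate = quantile_at(0.5)
round_trip = cdf_at(median_estimate)
```

Here `round_trip` should come back very close to 0.5, and `median_estimate` should be near 0 for standard normal data.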
