How to find quantiles of an empirical cumulative distribution function (ECDF)


Question

I am using the ecdf() function to compute the empirical cumulative distribution function (ECDF) from some random samples:

set.seed(0)
X = rnorm(100)
P = ecdf(X)

Now P gives the ECDF, and we can plot it:

plot(P)
abline(h = 0.6, lty = 3)

My question is: how can I find the sample value x such that P(x) = 0.6, i.e., the 0.6-quantile of the ECDF, or the x-coordinate of the point where the ECDF meets the line h = 0.6?

Recommended answer

In the following, I will not use ecdf(), as it is easy to obtain the empirical cumulative distribution function (ECDF) ourselves.

First, we sort the samples X in ascending order:

X <- sort(X)

The ECDF at those samples takes the function values:

e_cdf <- 1:length(X) / length(X)
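
As a sanity check, the hand-computed values can be compared with what ecdf() returns at the sorted samples. This is only a sketch: it regenerates X with the seed from the question and assumes there are no ties among the samples, which is the usual case for draws from a continuous distribution.

```r
set.seed(0)
X <- sort(rnorm(100))

# ECDF values computed by hand: i / n at the i-th sorted sample
e_cdf <- seq_along(X) / length(X)

# the same values as given by the step function that ecdf() returns
P <- ecdf(X)
all.equal(P(X), e_cdf)
```

If the two disagree, the most likely cause is tied values in X, where ecdf() jumps by more than 1 / n at a single point.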

We can then sketch the ECDF by:

plot(X, e_cdf, type = "s")
abline(h = 0.6, lty = 3)

Now we are looking for the first value of X such that P(X) >= 0.6. This is just:

X[which(e_cdf >= 0.6)[1]]
# [1] 0.2290196
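
The same lookup can be wrapped in a small function so that it works for any probability, or for several at once. The sketch below is one possible way to do that; ecdf_quantile is a name made up for illustration and not part of base R, and the >= comparison can be sensitive to floating-point rounding when prob is itself the result of a computation.

```r
# ecdf_quantile is a hypothetical helper name, not a base R function
ecdf_quantile <- function(x, prob) {
  x <- sort(x)
  e_cdf <- seq_along(x) / length(x)
  # index of the first sorted sample whose ECDF value reaches each prob
  idx <- vapply(prob, function(p) which(e_cdf >= p)[1], integer(1))
  x[idx]
}

set.seed(0)
X <- rnorm(100)
ecdf_quantile(X, c(0.25, 0.5, 0.6))
```

Because the function sorts its input, it can be called on the raw samples. The value it returns is always one of the observed samples.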

Since our data are sampled from the standard normal distribution, the theoretical quantile is

qnorm(0.6)
# [1] 0.2533471

So our result is quite close.

Since the inverse of a CDF is the quantile function (for example, the inverse of pnorm() is qnorm()), one may guess that the inverse of the ECDF is the sample quantile, i.e., that the inverse of ecdf() is quantile(). This is not true!

The ECDF is a staircase (step) function, and it does not have an inverse. If we reflect the ECDF about the line y = x, the resulting curve is not a mathematical function. So the sample quantile, as quantile() computes it by default, is not the inverse of the ECDF.
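
A short check makes the "no inverse" point concrete. The sketch below regenerates X with the seed from the question (an assumption made so that the block runs on its own) and looks at the flat segment between the 60th and 61st sorted samples.

```r
set.seed(0)
X <- sort(rnorm(100))
P <- ecdf(X)

# any point between two neighbouring samples gets the same ECDF value,
# so many different x map to one level
mid <- (X[60] + X[61]) / 2
c(P(X[60]), P(mid))

# and a level between two steps, such as 0.605, is never attained
any(P(X) == 0.605)
```

Many x values share one ECDF level, and some levels have no x at all, so an inverse has to be defined by a convention such as "the first sample at which the ECDF reaches the level".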

For n sorted samples, the sample quantile function is in fact a linear interpolation function of (x, y), where:

  • the x-values are seq(0, 1, length.out = n);

  • the y-values are the sorted samples.

We can define our own version of the sample quantile function by:

my_quantile <- function(x, prob) {
  # sort only when needed; approx() interpolates linearly by default
  if (is.unsorted(x)) x <- sort(x)
  n <- length(x)
  approx(seq(0, 1, length.out = n), x, prob)$y
}

Let's test it:

my_quantile(X, 0.6)
# [1] 0.2343171

quantile(X, probs = 0.6, names = FALSE)
# [1] 0.2343171
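
To see where this number comes from, the interpolation can be written out by hand. The sketch below regenerates X with the seed from the question and uses the position h = (n - 1) * prob + 1 among the sorted samples, which is equivalent to interpolating over seq(0, 1, length.out = n). The name interp_by_hand is made up for illustration.

```r
set.seed(0)
X <- rnorm(100)

interp_by_hand <- function(x, prob) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * prob + 1      # fractional position among the sorted samples
  lo <- floor(h)
  hi <- ceiling(h)
  # move a fraction (h - lo) of the way from sample lo to sample hi
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

all.equal(interp_by_hand(X, 0.6), quantile(X, probs = 0.6, names = FALSE))
```

With n = 100 and prob = 0.6 the position is 60.4, so the interpolated value lies 40% of the way from the 60th to the 61st sorted sample and is generally not one of the observed samples.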

Note that this result is different from what we get from X[which(e_cdf >= 0.6)[1]].
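
The difference comes from the interpolation rule that quantile() uses by default. As far as the quantile() help page describes it, type = 1 is the inverse of the empirical distribution function, while the default, type = 7, interpolates linearly. The sketch below compares the two with the step-function lookup on the same data. It regenerates X with the seed from the question and does not check edge cases such as ties.

```r
set.seed(0)
X <- rnorm(100)

# step-function lookup, as done by hand above
step_lookup <- sort(X)[which(seq_along(X) / length(X) >= 0.6)[1]]

# type = 1: documented as the inverse of the empirical distribution function
q_type1 <- quantile(X, probs = 0.6, type = 1, names = FALSE)

# default (type = 7): linear interpolation between sorted samples
q_default <- quantile(X, probs = 0.6, names = FALSE)

c(step_lookup = step_lookup, type1 = q_type1, default = q_default)
```

On this data the first two values should agree and the third should differ, which is consistent with the discrepancy noted above.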

For this reason, I avoid using quantile() in this answer.
