Explain the quantile() function in R


Problem description

I've been mystified by the R quantile function all day.

I have an intuitive notion of how quantiles work, and an M.S. in stats, but boy oh boy, the documentation for it is confusing to me.

From the docs:

Q[i](p) = (1 - gamma) x[j] + gamma x[j+1],

I'm with it so far. For a type i quantile, it's an interpolation between x[j] and x[j+1], based on some mysterious constant gamma.

where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here gamma depends on the fractional part of g = np+m-j.

So, how do I calculate j? And m?

For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the kth order statistic and p(k):

p(k) = (k - alpha) / (n - alpha - beta + 1), where α and β are constants determined by the type. Further, m = alpha + p(1 - alpha - beta), and gamma = g.

Now I'm really lost. p, which was a constant before, is now apparently a function.

So for Type 7 quantiles, the default...

Type 7

p(k) = (k - 1) / (n - 1). In this case, p(k) = mode[F(x[k])]. This is used by S.

Anyone want to help me out? In particular, I'm confused by the notation of p being both a function and a constant, what the heck m is, and how to calculate j for some particular p.

I hope that based on the answers here, we can submit some revised documentation that better explains what is going on here.

See the quantile.R source code, or type quantile.default at the R prompt.

Solution

You're understandably confused. That documentation is terrible. I had to go back to the paper it's based on (Hyndman, R.J.; Fan, Y. (November 1996). "Sample Quantiles in Statistical Packages". American Statistician 50 (4): 361–365. doi:10.2307/2684934) to get an understanding. Let's start with the first problem.

where 1 <= i <= 9, (j-m)/n <= p < (j-m+1)/n, x[j] is the jth order statistic, n is the sample size, and m is a constant determined by the sample quantile type. Here gamma depends on the fractional part of g = np+m-j.

The first part comes straight from the paper, but what the documentation writers omitted was that j = int(pn+m), the integer part (floor) of pn+m. This means Q[i](p) only depends on the two order statistics closest to being a fraction p of the way through the (sorted) observations. (For those, like me, who are unfamiliar with the term, the "order statistics" of a series of observations are the sorted series: x[j] is the jth smallest value.)
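To make the j and g bookkeeping concrete, here is a minimal sketch for the default type 7, where m = 1 - p (this follows from m = alpha + p(1 - alpha - beta) with alpha = beta = 1). The data vector and the choice p = 0.3 are made up for illustration, and this is not a description of how quantile.default is organised internally.

```r
# Hypothetical sample; any numeric vector would do.
x  <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 20)
xs <- sort(x)          # xs[j] is the j-th order statistic
n  <- length(xs)
p  <- 0.3

m <- 1 - p             # type 7: alpha = beta = 1, so m = 1 - p
j <- floor(n * p + m)  # integer part of np + m (np + m = 3.7 here, so j = 3)
g <- n * p + m - j     # fractional part of np + m (0.7 here)

# For types 4 to 9, gamma = g.
q_manual <- (1 - g) * xs[j] + g * xs[j + 1]

# Compare the hand computation with the built-in result.
print(c(j = j, g = g, manual = q_manual,
        builtin = quantile(x, p, type = 7, names = FALSE)))
```

The sketch does not guard the edge case p = 1, where j + 1 exceeds n.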

Also, that last sentence is just wrong. It should read:

Here gamma depends on the fractional part of np+m, g = np+m-j.

As for m, that's straightforward. m depends on which of the 9 algorithms was chosen. So just as Q[i] is the quantile function for type i, m should be read as m[i]. For algorithms 1 and 2, m is 0; for 3, m is -1/2; for the others, see the next part.
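For the three discontinuous types, the R help page spells out how gamma follows from g: type 1 uses gamma = 0 if g = 0 and 1 otherwise; type 2 uses 0.5 if g = 0 and 1 otherwise; type 3 uses 0 if g = 0 and j is even, and 1 otherwise. A rough sketch under those rules follows. The function name hf_discontinuous and the sample data are invented for illustration, and the exact g == 0 test skips the small floating-point tolerance that quantile.default applies, so treat it as reliable only when np + m is computed exactly.

```r
# Sketch of types 1-3; hf_discontinuous is a hypothetical helper name.
hf_discontinuous <- function(x, p, type) {
  xs <- sort(x)
  n  <- length(xs)
  m  <- if (type == 3) -0.5 else 0          # m[1] = m[2] = 0, m[3] = -1/2
  j  <- floor(n * p + m)                    # integer part of np + m
  g  <- n * p + m - j                       # fractional part of np + m
  gamma <- switch(type,
    if (g == 0) 0 else 1,                   # type 1
    if (g == 0) 0.5 else 1,                 # type 2
    if (g == 0 && j %% 2 == 0) 0 else 1)    # type 3
  clamp <- function(k) min(max(k, 1), n)    # keep indices inside 1..n
  (1 - gamma) * xs[clamp(j)] + gamma * xs[clamp(j + 1)]
}

x <- c(1, 3, 4, 7, 8, 10, 12, 15)   # made-up data, n = 8
p <- 0.5                            # np = 4 is an integer, so types 1 and 2 differ
for (type in 1:3) {
  cat("type", type,
      "sketch:", hf_discontinuous(x, p, type),
      "quantile():", quantile(x, p, type = type, names = FALSE), "\n")
}
```

The helper takes a single probability at a time; wrap it in sapply() to evaluate several.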

For the continuous sample quantile types (4 through 9), the sample quantiles can be obtained by linear interpolation between the kth order statistic and p(k):

p(k) = (k - alpha) / (n - alpha - beta + 1), where α and β are constants determined by the type. Further, m = alpha + p(1 - alpha - beta), and gamma = g.

This is really confusing. What the documentation calls p(k) is not the same as the p from before: p(k) is the plotting position of the kth order statistic. In the paper, the authors write it as p_k, which helps, especially since the p in the expression m = alpha + p * (1 - alpha - beta) is the original p. Conceptually, for algorithms 4-9, the points (p_k, x[k]) are interpolated to get the solution (p, Q[i](p)). The algorithms differ only in how they define p_k.
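The interpolation view can be sketched directly: compute the plotting positions for a given (alpha, beta) pair and interpolate linearly through the points they form with the order statistics. The helper name hf_continuous and the data are made up for this illustration; the (alpha, beta) values are the ones implied by the p(k) formulas on the R help page (for example alpha = beta = 1 for type 7). approx() with rule = 2 is used here as a convenient stand-in for the j and g arithmetic, not because quantile.default works this way.

```r
# Sketch of types 4-9 as interpolation through (p_k, x[k]).
# hf_continuous is a hypothetical helper, not part of R.
hf_continuous <- function(x, p, alpha, beta) {
  xs <- sort(x)
  n  <- length(xs)
  k  <- seq_len(n)
  pk <- (k - alpha) / (n - alpha - beta + 1)   # plotting positions
  # rule = 2: for p outside [pk[1], pk[n]], return the min or max of x
  approx(pk, xs, xout = p, rule = 2)$y
}

# (alpha, beta) per type, read off the p(k) formulas in ?quantile
ab <- list("4" = c(0, 1), "5" = c(1/2, 1/2), "6" = c(0, 0),
           "7" = c(1, 1), "8" = c(1/3, 1/3), "9" = c(3/8, 3/8))

x <- c(2, 4, 4, 5, 7, 9, 10, 12, 15, 20)   # made-up data
p <- c(0.1, 0.25, 0.5, 0.9)
for (type in names(ab)) {
  sketch  <- hf_continuous(x, p, ab[[type]][1], ab[[type]][2])
  builtin <- quantile(x, p, type = as.integer(type), names = FALSE)
  cat("type", type, "agree:", isTRUE(all.equal(sketch, builtin)), "\n")
}
```

Setting the plotting position equal to p and solving for k gives k = alpha + p(n + 1 - alpha - beta) = np + m, which is why j and g are the integer and fractional parts of np + m.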

As for the last bit, R is just stating what S uses.

The original paper gives a list of 6 "desirable properties for a sample quantile" function, and states a preference for #8, which satisfies all but one. #5 satisfies all of them, but the authors don't like it on other grounds (it's more phenomenological than derived from principles). #2 is what non-stat geeks like myself would consider the quantiles, and it is what's described on Wikipedia.

BTW, in response to dreeves's answer, Mathematica does things significantly differently. I think I understand the mapping. While Mathematica's approach is easier to understand, (a) it's easier to shoot yourself in the foot with nonsensical parameters, and (b) it can't do R's algorithm #2. (See MathWorld's Quantile page, which states that Mathematica can't do #2, but gives a simpler generalization of all the other algorithms in terms of four parameters.)
