How does dnorm work?

Question

I am very new to Statistics and R. Maybe this is a very trivial question, but I don't really understand how this works.

Suppose I use dnorm(5, 0, 2.5). What does that mean?

I have seen some resources saying that this function computes the height of the density curve at that point.

But I have also read that, for a continuous distribution, the probability of any exact value is 0. So my question is: if I can find the height (or probability) of a certain value, how can that probability be 0?

I know I have mixed up some concepts, but I can't work out where I'm going wrong. It would be great if you could spare the time to help me understand this. Thanks in advance.

Answer

dnorm returns a density: a number that is not, by itself, a probability. Instead, it gives the height of a curve whose total area underneath, taken over the full range of possible values, adds up to 1.
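To see what that number is for the call in the question, here is a small sketch that writes out the normal density formula by hand and compares it with dnorm(). The helper normal_pdf is hypothetical, written only for illustration; it is not part of R. The last line shows that a density can be larger than 1, which a probability never can.

```r
# Normal probability density function written out by hand (hypothetical helper):
# f(x) = 1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
normal_pdf <- function(x, mean = 0, sd = 1) {
  1 / (sd * sqrt(2 * pi)) * exp(-(x - mean)^2 / (2 * sd^2))
}

# Height of the Normal(mean = 0, sd = 2.5) curve at x = 5, as in the question
normal_pdf(5, 0, 2.5)   # 0.02159639
dnorm(5, 0, 2.5)        # same value

# A density is not a probability: with a small sd it can exceed 1
dnorm(0, 0, 0.1)        # 3.989423
```

So dnorm(5, 0, 2.5) just evaluates this formula at x = 5; turning it into a probability requires an area, as the rest of this answer shows.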

Consider this. Suppose I make a vector x of evenly spaced numbers from -7.5 to 7.5, 0.1 apart, and compute the density of a normal variable with mean 0 and standard deviation 2.5 at each value of x:

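# Grid of x values from -7.5 to 7.5, 0.1 apart (with to = 7.55 the last value is still 7.5)
x <- seq(from = -7.5, to = 7.55, by = 0.1)
# Height of the Normal(mean = 0, sd = 2.5) density curve at each x
y <- dnorm(x, 0, 2.5)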

Summing those densities (stored in y), each multiplied by the spacing between points (0.1), approximates the area under the curve, and the result is nearly 1:

> sum(y * 0.1)
[1] 0.9974739

If you did this properly with calculus, integrating over the whole real line rather than approximating with a finite sum over -7.5 to 7.5, the area would be exactly 1.
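A quick way to check this in R, sketched here rather than taken from the original answer, is base R's integrate(), which numerically integrates dnorm() over the whole real line. The helper riemann_area is hypothetical: it repeats the sum above with a chosen spacing, so you can watch the approximation settle on the exact area over -7.5 to 7.5. That area is slightly below 1 because the tails beyond three standard deviations are cut off.

```r
# Numerical integration of the density over the whole real line
total <- integrate(dnorm, lower = -Inf, upper = Inf, mean = 0, sd = 2.5)
total$value   # 1, up to numerical error

# Exact area between -7.5 and 7.5 (three standard deviations either side of 0)
pnorm(7.5, 0, 2.5) - pnorm(-7.5, 0, 2.5)   # 0.9973002

# Hypothetical helper: the same Riemann sum as above, for any spacing
riemann_area <- function(step) {
  x <- seq(from = -7.5, to = 7.5, by = step)
  sum(dnorm(x, 0, 2.5) * step)
}

riemann_area(0.1)     # the sum from the answer
riemann_area(0.001)   # finer spacing: closer to the exact area over -7.5 to 7.5
```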

Why is this useful? The area under part of the curve gives the probability that the variable falls anywhere within a particular range, even though, as one of your sources points out, the probability of any single exact value is zero for a continuous variable. A single value is a range of zero width, so the area above it is zero no matter how tall the curve is at that point.
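The link between the height of the curve and a probability can be seen by shrinking an interval around 5. This sketch (variable names chosen only for illustration) uses pnorm(), the normal cumulative distribution function, to get the exact probability of landing in an interval of width h centred on 5. For small h that probability is close to dnorm(5, 0, 2.5) * h, height times width, and it heads to 0 as h does.

```r
# Interval widths, from wide to very narrow
h <- c(1, 0.1, 0.01, 0.001)

# Exact P(5 - h/2 < X < 5 + h/2) for X ~ Normal(mean = 0, sd = 2.5)
prob <- pnorm(5 + h / 2, 0, 2.5) - pnorm(5 - h / 2, 0, 2.5)

# Compare with "height times width" using the density at 5
cbind(h, prob, approx = dnorm(5, 0, 2.5) * h)
```

The density is therefore a probability per unit of x: it only becomes a probability once it is multiplied by a width (or, exactly, integrated over an interval).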

Consider the graphic produced by the code below. The shaded area shows the probability that a variable from your normal distribution (mean 0, standard deviation 2.5) falls between -7.5 and 4. This leads to many useful applications.

Produced with:

library(ggplot2)

# Put the grid and its densities in a data frame for plotting
d <- data.frame(x, y)

ggplot(d, aes(x = x, y = y)) +
  geom_line() +   # the density curve
  geom_point() +  # one point per dnorm() evaluation
  # Shade the area under the curve from the left edge of the grid up to x = 4
  geom_ribbon(fill = "steelblue", aes(ymax = y), ymin = 0, alpha = 0.5, data = subset(d, x <= 4)) +
  annotate("text", x= -4, y = 0.13, label = "Each point is an individual density\nestimate of dnorm(x, 0, 2.5)") +
  annotate("text", x = -.3, y = 0.02, label = "Filled area under the curve shows the cumulative probability\nof getting a number as high as a given x, in this case 4") +
  ggtitle("Density of a random normal variable with mean zero and standard deviation 2.5")
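The shaded area does not have to be read off the plot. As a small sketch (not part of the original answer), pnorm() gives the area to the left of a point, so the shaded region is the difference of two pnorm() calls, and integrate() over dnorm() should agree with it.

```r
# Shaded area: P(-7.5 < X < 4) for X ~ Normal(mean = 0, sd = 2.5)
shaded <- pnorm(4, 0, 2.5) - pnorm(-7.5, 0, 2.5)
shaded   # 0.9438508

# The same area by numerically integrating the density
integrate(dnorm, lower = -7.5, upper = 4, mean = 0, sd = 2.5)

# All of the area to the left of 4, i.e. P(X <= 4)
pnorm(4, 0, 2.5)   # 0.9452007
```

The ribbon in the plot starts at -7.5 only because that is where the grid of x values begins; the full cumulative probability described by the plot's annotation, P(X <= 4), is pnorm(4, 0, 2.5).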
