Confusion about 2-dimensional kernel density estimation in R
Question
A kernel density estimator is used to estimate a particular probability density function (see mvstat.net and the scikit-learn docs for references).
My confusion is about what exactly kde2d() does. Does it estimate the joint probability density function f(a,b) of two random variables in the example below? And what do the colours mean?
Here is the code example I am referring to.
library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)
colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))
filled.contour(density, color.palette = colour_flow)
What is a kernel density estimator?
Essentially, it places a small normal density curve over every data point (centred on that point) and then adds up all of these small normal densities. The sum, scaled so that it integrates to 1, is the kernel density estimate.
For the sake of illustration I will add an image of a 1 dimensional kernel density estimator from
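As a rough illustration of the one-dimensional case, the sketch below builds the estimate by hand: one normal curve per data point, averaged over all points. The sample x, the grid and the choice of bw.nrd0() as bandwidth are assumptions made for this example, not part of the original question.

```r
set.seed(1)
x <- rnorm(200)                       # hypothetical sample, for illustration only
h <- bw.nrd0(x)                       # rule-of-thumb bandwidth (the default of density())

# Evaluation grid, extended by 3 bandwidths so the tails are not cut off.
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)

# At every grid value: average of the normal densities centred on each data point.
kde_manual <- sapply(grid, function(g) mean(dnorm(g, mean = x, sd = h)))

# Reference estimate from base R on the same grid.
d <- density(x, bw = h, from = min(grid), to = max(grid), n = 512)

plot(grid, kde_manual, type = "l", xlab = "x", ylab = "estimated density")
lines(d$x, d$y, lty = 2, col = "red")
```

The two curves should lie almost on top of each other; density() uses a binned approximation, so small numerical differences are expected.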
What about 2 dimensional kernel densities?
library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10((rweibull(1000, 8, 2)))
# a and b contain 1000 values each.
density <- kde2d(a,b,n=100)
The function creates a grid from min(a) to max(a) and from min(b) to max(b). Instead of fitting a tiny 1D normal density over every value in a or b, kde2d now fits a tiny 2D normal density over every data point (a[i], b[i]). Just like in the 1 dimensional case, it then adds up all of these densities, and it evaluates that sum at every point of the n x n grid. The result is an estimate of the joint density f(a,b).
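To make the two-dimensional case concrete, here is a minimal sketch that rebuilds the kde2d() result by hand. It assumes the default settings of kde2d() as I understand them from the MASS source: bandwidths from bandwidth.nrd(), divided by 4 to obtain the standard deviation of the normal kernel, and a product of two 1D normal kernels per data point. The names gx, gy, hx, hy, kx, ky and z_manual are made up for this example.

```r
library(MASS)

set.seed(42)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
n_grid <- 100

# Grid spanning the range of the data in each direction.
gx <- seq(min(a), max(a), length.out = n_grid)
gy <- seq(min(b), max(b), length.out = n_grid)

# Assumed kernel standard deviations (kde2d default bandwidth divided by 4).
hx <- bandwidth.nrd(a) / 4
hy <- bandwidth.nrd(b) / 4

# kx[i, k]: 1D normal density centred on a[k], evaluated at gx[i] (same for ky).
kx <- outer(gx, a, function(g, p) dnorm(g, mean = p, sd = hx))
ky <- outer(gy, b, function(g, p) dnorm(g, mean = p, sd = hy))

# Sum over data points of the product kernel, averaged over the sample.
z_manual <- kx %*% t(ky) / length(a)

density <- kde2d(a, b, n = n_grid)
all.equal(z_manual, density$z)
```

Each entry z_manual[i, j] is the estimated joint density at the grid point (gx[i], gy[j]), which is what the contour plot later colours.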
What do the colours mean?
As @cel pointed out in the comments: the estimated probability density depends on two variables, so we have three axes now (a, b and the estimated density). One way to visualize 3 axes is by using iso-probability contours. This sounds fancy, but it is basically the same as the high/low pressure images we know from the weather forecast.
You are using
filled.contour(density,
               color.palette = colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')))
So from low to high, the plot will be coloured white, blue, yellow, red and eventually darkred for the highest values of the estimated density. This results in the following plot:
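To see which density values each colour stands for, the sketch below spells out the contour levels and colours explicitly instead of leaving them to filled.contour(). It assumes the documented defaults of filled.contour(), namely levels = pretty(range(z), 20) and one colour per interval between levels; the names lev and cols are made up for this example.

```r
library(MASS)

set.seed(42)
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)

colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))

# Break points on the density axis and one colour per interval between them.
lev  <- pretty(range(density$z), 20)
cols <- colour_flow(length(lev) - 1)

# Same picture as before, but with the level-to-colour mapping made explicit.
filled.contour(density, levels = lev, col = cols)

# Table of intervals and their colours, lowest density first.
data.frame(from = head(lev, -1), to = tail(lev, -1), colour = cols)
```

The first row of the table is the white band (lowest estimated density) and the last row is the darkred band (highest estimated density).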