Confusion about two-dimensional kernel density estimation in R


Problem description


A kernel density estimator is used to estimate a particular probability density function (see the mvstat.net and scikit-learn docs for references).

My confusion is about what exactly kde2d() does. Does it estimate the joint probability density function f(a, b) of the two random variables in the example below? And what do the colours mean?

Here is the code example I am referring to.

library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
density <- kde2d(a, b, n = 100)

colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))
filled.contour(density, color.palette=colour_flow)

Solution

What is a kernel density estimator? Essentially, it places a little normal density curve over every data point (each curve centred on its point) and then adds up all of these little normal densities, scaled so that the total integrates to 1, to form the kernel density estimate.
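To make the "little curves added up" idea concrete, here is a minimal sketch in base R that builds a 1D estimate by hand and compares it with the built-in density() function. The sample x, the seed and the evaluation grid are assumptions chosen only for illustration; they are not part of the question.

```r
set.seed(1)        # arbitrary seed, only so the sketch is reproducible
x <- rnorm(50)     # hypothetical sample, not taken from the question
h <- bw.nrd0(x)    # the default bandwidth rule used by density()

# Points at which the estimate is evaluated (a little wider than the data).
grid <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 200)

# One normal curve per observation, centred on that observation with sd = h.
# The result is a 200 x 50 matrix: one column per data point.
bumps <- sapply(x, function(xi) dnorm(grid, mean = xi, sd = h))

# The estimate is the average of the bumps (their sum divided by length(x)).
kde_manual <- rowMeans(bumps)

# Built-in estimator evaluated on the same grid, for comparison.
kde_builtin <- density(x, bw = h, from = min(grid), to = max(grid), n = 200)
max_diff <- max(abs(kde_manual - kde_builtin$y))
```

The two curves should agree closely but not exactly, because density() uses a binned, FFT-based approximation rather than summing the curves directly.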

For the sake of illustration I will add an image of a 1 dimensional kernel density estimator from

What about 2 dimensional kernel densities?

library(MASS)  # provides kde2d()
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
# a and b contain 1000 values each.

# Evaluate the 2D kernel density estimate on a 100 x 100 grid.
density <- kde2d(a, b, n = 100)

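The function creates a grid from min(a) to max(a) and from min(b) to max(b), with n = 100 points along each axis. Instead of centring a tiny 1D normal density on every value of a single variable, kde2d centres a tiny 2D normal density on every observed pair (a[i], b[i]). Just as in the 1-dimensional case, it then adds up all of these little densities (dividing by the number of observations) and evaluates the result at every point of the grid. So yes, the output is an estimate of the joint probability density f(a, b).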
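The following sketch recomputes the same surface by hand, to show what ends up in the z component returned by kde2d. It assumes the behaviour of the MASS implementation as I understand it: a product of two 1D normal kernels per observation, with the kernel standard deviation equal to bandwidth.nrd() divided by 4. The names dens, gx, gy, ha, hb and kde_at are introduced here for illustration, and the seed is arbitrary.

```r
library(MASS)  # kde2d() and bandwidth.nrd()

set.seed(42)   # arbitrary seed, only so the sketch is reproducible
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))

dens <- kde2d(a, b, n = 100)

# The evaluation grid: 100 equally spaced points across the range of each variable.
gx <- seq(min(a), max(a), length.out = 100)
gy <- seq(min(b), max(b), length.out = 100)

# Assumption: kde2d uses the normal reference bandwidth divided by 4 as kernel sd.
ha <- bandwidth.nrd(a) / 4
hb <- bandwidth.nrd(b) / 4

# Density at one grid point (u, v): the average, over all observations, of a
# 2D normal kernel centred on (a[i], b[i]), written as a product of two 1D normals.
kde_at <- function(u, v) {
  mean(dnorm(u, mean = a, sd = ha) * dnorm(v, mean = b, sd = hb))
}

# Evaluate at every combination of grid coordinates (rows follow gx, columns gy).
z_manual <- outer(gx, gy, Vectorize(kde_at))

max_diff <- max(abs(z_manual - dens$z))
```

If the assumptions above hold, max_diff should be at the level of floating-point rounding, which confirms that the kernels sit on the data points and the grid is only where the summed surface is read off.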

What do the colours mean? As @cel pointed out in the comments: the estimated density depends on two variables, so we have three axes now (a, b and the estimated density). One way to visualize 3 axes is by using iso-density contours. This sounds fancy, but it is basically the same as the high/low pressure images we know from the weather forecast.

You are using

filled.contour(density,
    color.palette = colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred')))

So from low to high, the plot will be coloured white, blue, yellow, red and eventually darkred for the highest values of estimated probability. This results in the following plot:
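As a rough check of how density values map to colours, the sketch below rebuilds the break points and the colour vector that filled.contour derives from its documented defaults (levels = pretty(zlim, nlevels) with nlevels = 20, and one colour per band between consecutive levels). The names dens, levels, cols and band are introduced here for illustration, and the seed is arbitrary.

```r
library(MASS)  # kde2d()

set.seed(42)   # arbitrary seed, only so the sketch is reproducible
b <- log10(rgamma(1000, 6, 3))
a <- log10(rweibull(1000, 8, 2))
dens <- kde2d(a, b, n = 100)

colour_flow <- colorRampPalette(c('white', 'blue', 'yellow', 'red', 'darkred'))

# Roughly 20 "pretty" break points covering the range of estimated densities.
levels <- pretty(range(dens$z), 20)

# One colour per band between consecutive break points, from white to darkred.
cols <- colour_flow(length(levels) - 1)

# Index of the colour band that each grid cell falls into (1 = lowest density).
band <- cut(dens$z, breaks = levels, include.lowest = TRUE, labels = FALSE)

# Number of grid cells drawn in each colour band.
cells_per_band <- table(band)
```

Calling filled.contour(dens, levels = levels, col = cols) should then give the same picture as passing color.palette = colour_flow, with the lowest band drawn in white and the highest in darkred.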
