ggplot2 - Modify geom_density2d to accept weights as a parameter?


Problem description

This is my first post to the R community, so pardon me if it is silly. I would like to use the functions geom_density2d and stat_density2d in ggplot2 to plot kernel density estimates, but they can't handle weighted data. From what I understand, both functions call kde2d from the MASS package to compute the kernel density estimate, and kde2d doesn't take data weights as a parameter.

I have found an altered version of kde2d at http://www.inside-r.org/node/226757, which takes weights as a parameter and is based on the source code of kde2d. Here is the code of this function:

kde2d.weighted <- function(x, y, w, h, n = 25, lims = c(range(x), range(y))) {
  nx <- length(x)
  if (length(y) != nx)
    stop("data vectors must be the same length")
  # Default to equal weights; this must happen before w is first used
  if (missing(w))
    w <- rep(1, nx)
  if (length(w) != nx && length(w) != 1)
    stop("weight vectors must be 1 or length of data")
  w <- rep(w, length.out = nx) # recycle a single weight to every point
  gx <- seq(lims[1], lims[2], length.out = n) # gridpoints x
  gy <- seq(lims[3], lims[4], length.out = n) # gridpoints y
  if (missing(h))
    h <- c(MASS::bandwidth.nrd(x), MASS::bandwidth.nrd(y))
  h <- h/4
  ax <- outer(gx, x, "-")/h[1] # scaled distance of each point to each grid point in x-direction
  ay <- outer(gy, y, "-")/h[2] # scaled distance of each point to each grid point in y-direction
  # Weight each data point's x-kernel, then combine with the y-kernel
  z <- (matrix(rep(w, n), nrow = n, ncol = nx, byrow = TRUE) * matrix(dnorm(ax), n, nx)) %*%
    t(matrix(dnorm(ay), n, nx)) / (sum(w) * h[1] * h[2]) # z is the density
  return(list(x = gx, y = gy, z = z))
}
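
One way to build confidence in a weighted estimator like this is to check that integer weights behave like repeated observations. The sketch below is an illustration rather than part of the linked code: `wkde` is a hypothetical, compact restatement of the function above, and the bandwidth `h` is fixed by hand (an assumption) so that the comparison does not depend on `bandwidth.nrd` or on MASS being installed.

```r
# Compact restatement of the weighted estimator; `wkde` is a hypothetical
# name used only for this check. h must be supplied by the caller.
wkde <- function(x, y, w, h, n = 25, lims = c(range(x), range(y))) {
  gx <- seq(lims[1], lims[2], length.out = n)
  gy <- seq(lims[3], lims[4], length.out = n)
  h <- h / 4
  kx <- dnorm(outer(gx, x, "-") / h[1])  # n rows, one column per data point
  ky <- dnorm(outer(gy, y, "-") / h[2])  # n rows, one column per data point
  # Scale column j of kx by w[j], then sum over the data points
  z <- sweep(kx, 2, w, "*") %*% t(ky) / (sum(w) * h[1] * h[2])
  list(x = gx, y = gy, z = z)
}

set.seed(1)
x <- rnorm(50)
y <- rnorm(50)
w <- sample(1:5, 50, replace = TRUE)
h <- c(1, 1)  # fixed bandwidth, chosen arbitrarily for the comparison

weighted   <- wkde(x, y, w, h)
# Same data with each point repeated w times and unit weights
replicated <- wkde(rep(x, w), rep(y, w), rep(1, sum(w)), h)

# TRUE when the two density grids match up to floating-point tolerance
agree <- isTRUE(all.equal(weighted$z, replicated$z))
print(agree)
```

Fixing `h` matters here: if the bandwidth were re-estimated from the replicated data, the two results would generally differ.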

I would like to make the functions geom_density2d and stat_density2d call kde2d.weighted instead of kde2d, so that they accept weighted data.

I have never changed any functions in existing R packages, so my question is: what is the easiest way to do this?

Solution

You can actually pass your own density data to geom_contour, which is probably the easiest approach. Let's start with a sample dataset by adding weights to the geyser data.

library("MASS")
data(geyser, package = "MASS")
# Attach a random integer weight (1 to 5) to every observation
geyserw <- transform(geyser,
   weight = sample(1:5, nrow(geyser), replace = TRUE)
)

Now we use your weighted function to calculate the density and turn it into a data.frame with one row per grid point:

dens <- kde2d.weighted(geyserw$duration, geyserw$waiting, geyserw$weight)
dfdens <- data.frame(expand.grid(x=dens$x, y=dens$y), z=as.vector(dens$z))
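
The conversion above relies on `expand.grid` and `as.vector` walking the grid in the same order: `expand.grid` varies its first argument fastest, and `as.vector` reads a matrix column by column, so rows of `z` must correspond to `x` values. The toy example below is a sketch of that alignment using made-up grid values (`gx`, `gy`, and the `x + y` surface are assumptions chosen so the result is easy to verify by eye).

```r
gx <- c(1, 2, 3)
gy <- c(10, 20)

# z[i, j] belongs to the grid point (gx[i], gy[j]), the same layout
# as the list returned by a kde2d-style function
z <- outer(gx, gy, "+")

df <- data.frame(expand.grid(x = gx, y = gy), z = as.vector(z))

# Every row should satisfy z == x + y if the ordering lines up
aligned <- all(df$z == df$x + df$y)
print(df)
print(aligned)
```

If the matrix were laid out with `y` along the rows instead, the same check would fail, which is a quick way to catch a transposed density grid before plotting.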

Now we plot the data

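library("ggplot2")
ggplot(geyserw, aes(x = duration, y = waiting)) +
    geom_point() + xlim(0.5, 6) + ylim(40, 110) +
    # Contours come from the precomputed weighted density, not from the points
    geom_contour(aes(x = x, y = y, z = z), data = dfdens)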

And that should do it.
