ggplot2 density of circular data

Question

I have a data set where x represents day of year (say birthdays) and I want to create a density graph of this. Further, since I have some grouping information (say boys or girls), I want to use the capabilities of ggplot2 to make a density plot.

Easy enough at first:

library(ggplot2); library(dplyr)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE), bday = sample(1:365, 100, replace = TRUE))
bdays %>% ggplot(aes(x = bday)) + geom_density(aes(color = factor(gender)))

However, this gives a poor estimate because of edge effects. I want to apply the fact that I can use circular coordinates so that 365 + 1 = 1 -- one day after December 31st is January 1st. I know that the circular package provides this functionality, but I haven't had any success implementing it using a stat_function() call. It's particularly useful for me to use ggplot2 because I want to be able to use facets, aes calls, etc.

Also, for clarification, I would like something that looks like geom_density. I am not looking for a polar plot like the one shown in the question "Circular density plot using ggplot2".

Solution

To remove the edge effects you could stack three copies of the data, create the density estimate, and then show the density only for the middle copy of data. That will guarantee "wrap around" continuity of the density function from one edge to the other.

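Below is an example comparing your original plot with the new version. I've used the adjust parameter to set the same bandwidth between the two plots. Note also that in the circularized version each curve integrates to roughly 1/3 over the displayed year, because the estimate is spread across three copies of the data, so you'll need to renormalize the densities (multiply by 3) if you want them to integrate to 1: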

library(ggplot2)
library(dplyr)

set.seed(105)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE), bday = sample(1:365, 100, replace = TRUE))

# Stack three copies of the data, with adjusted values of bday
bdays = bind_rows(bdays, bdays, bdays)
bdays$bday = bdays$bday + rep(c(0,365,365*2),each=100)

# Convert a target bandwidth b into the `adjust` multiplier that
# geom_density expects, relative to the default bw.nrd0 bandwidth of x
# Source: http://stackoverflow.com/a/24986121/496488
bw = function(b,x) b/bw.nrd0(x)

# New "circularized" version of plot
bdays %>% ggplot(aes(x = bday)) + 
  geom_density(aes(color = factor(gender)), adjust=bw(10, bdays$bday[1:100])) +
  coord_cartesian(xlim=c(365, 365+365+1), expand=FALSE) +
  scale_x_continuous(breaks=seq(366+89, 366+365, 90), labels=seq(366+89, 366+365, 90)-365) +
  scale_y_continuous(limits=c(0,0.0016)) +
  ggtitle("Circularized")

# Original plot
ggplot(bdays[1:100,], aes(x = bday)) + 
  geom_density(aes(color = factor(gender)), adjust=bw(30, bdays$bday[1:100])) +
  scale_x_continuous(breaks=seq(90,360,90), expand=c(0,0)) +
  ggtitle("Not Circularized")
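
If you want the renormalized curves as data rather than relying on geom_density, the same three-copy idea can be sketched in base R with stats::density. This is only a rough illustration: the helper name circular_density, its arguments, and the fixed bandwidth of 10 days are my own assumptions, not part of ggplot2 or the circular package.

```r
# Sketch: wrap-around density for day-of-year data, renormalized so that
# each curve integrates to roughly 1 over a single period.
# `circular_density` and its arguments are illustrative names (assumptions).
circular_density <- function(x, period = 365, bw = 10, n = 512) {
  # Stack three copies: one period below, the original, one period above
  stacked <- c(x - period, x, x + period)
  # Evaluate on the original period only; kernels from the outer copies
  # supply the mass that would otherwise be lost at the edges
  d <- density(stacked, bw = bw, from = 0, to = period, n = n)
  # The three copies share a total mass of 1, so scale back up by 3
  data.frame(x = d$x, density = 3 * d$y)
}

set.seed(105)
bdays <- data.frame(
  gender = sample(c("M", "F"), 100, replace = TRUE),
  bday   = sample(1:365, 100, replace = TRUE)
)

# One curve per group, bound into a single long data frame
curves <- do.call(rbind, lapply(split(bdays, bdays$gender), function(g) {
  cbind(gender = g$gender[1], circular_density(g$bday))
}))

# Riemann-sum check: each group's curve should integrate to about 1
step <- 365 / 511
areas <- tapply(curves$density, curves$gender, function(y) sum(y) * step)
print(areas)
```

The resulting curves data frame has columns gender, x, and density, so it should be possible to draw it with geom_line(aes(x = x, y = density, color = gender)) and still use facets and the usual aes mappings.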
