ggplot2 density of circular data
Problem description
I have a data set where x represents day of year (say, birthdays), and I want to create a density graph of it. Further, since I have some grouping information (say, boys or girls), I want to use the capabilities of ggplot2 to make a density plot.
Easy enough at first:
library(ggplot2); library(dplyr)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE),
                    bday = sample(1:365, 100, replace = TRUE))
bdays %>% ggplot(aes(x = bday)) + geom_density(aes(color = factor(gender)))
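However, this gives a poor estimate because of edge effects: the kernel density estimate treats day 1 and day 365 as the two ends of a line, so the kernel mass that falls outside the year is lost and the estimate is pulled down near both ends.
I want to use the fact that the data are circular, so that 365 + 1 = 1: one day after December 31st is January 1st.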
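A quick way to see the edge effect, sketched here in base R with stats::density() (the sample size of 1000 and the seed are arbitrary choices): uniformly distributed days have a true density of 1/365 ≈ 0.00274 per day everywhere, but the ordinary estimate is markedly lower near day 1 and day 365 than in the middle of the year.

```r
# Uniform "birthdays": the true density is 1/365 on every day of the year
set.seed(1)
days <- sample(1:365, 1000, replace = TRUE)

# Ordinary (non-circular) kernel density estimate, evaluated on days 1..365
kde <- density(days, from = 1, to = 365, n = 365)

edge <- mean(kde$y[c(1:5, 361:365)])  # average estimate at both ends of the year
mid  <- mean(kde$y[181:185])          # average estimate around mid-year

cat(sprintf("true: %.5f  edges: %.5f  middle: %.5f\n", 1 / 365, edge, mid))
```

The edge values come out well below 1/365 because the kernels centred on early and late days spill past the ends of the year.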
I know that the circular package provides this functionality, but I haven't had any success implementing it using a stat_function() call.
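For reference, a circular kernel density estimate can also be computed directly and then drawn with geom_line(), which keeps aes() mappings and facets available. The sketch below is plain base R using a von Mises kernel; it is not the circular package's API, and vm_day_density() and its sd_days argument are hypothetical names chosen for this example. The default 20-day spread is an arbitrary assumption.

```r
# Hypothetical helper: von Mises kernel density estimate for day-of-year data.
# Returns a data frame with one density value (per day) for each grid day.
vm_day_density <- function(days, sd_days = 20, period = 365, grid = 1:period) {
  # Map days onto angles, so day `period` sits right next to day 1
  theta   <- 2 * pi * (days - 1) / period
  theta_g <- 2 * pi * (grid - 1) / period
  # A von Mises kernel is roughly normal with sd 1/sqrt(kappa) radians,
  # so pick kappa to give a spread of about sd_days
  kappa <- 1 / (2 * pi * sd_days / period)^2
  # exp(kappa * (cos(d) - 1)) / besselI(kappa, 0, expon.scaled = TRUE)
  # equals exp(kappa * cos(d)) / I0(kappa) but avoids overflow for large kappa
  k <- outer(theta_g, theta, function(a, b) exp(kappa * (cos(a - b) - 1)))
  dens_rad <- rowMeans(k) / (2 * pi * besselI(kappa, 0, expon.scaled = TRUE))
  # Convert from density per radian to density per day
  data.frame(day = grid, density = dens_rad * 2 * pi / period)
}

set.seed(105)
bday <- sample(1:365, 100, replace = TRUE)
est <- vm_day_density(bday)
sum(est$density)        # close to 1 over the whole year
est$density[c(1, 365)]  # neighboring days, no drop at the year boundary
```

To compare groups, one could evaluate it separately for each gender (for example with split()), bind the results with a gender column, and plot them with geom_line(aes(color = gender)).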
It's particularly useful for me to use ggplot2 because I want to be able to use facets, aes calls, etc.
Also, for clarification, I would like something that looks like geom_density; I am not looking for a polar plot like the one shown in Circular density plot using ggplot2.
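Recommended answer
To remove the edge effects, you could stack three copies of the data (shifting the second and third copies by 365 and 730 days), create the density estimate, and then show the density only for the middle copy. Because the middle copy is flanked by a full year of data on each side, its density "wraps around" continuously from one edge to the other, as long as the bandwidth is small compared with a year.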
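The stacking step can be checked numerically with a short base-R sketch (bw = 20 days and the seed are arbitrary choices): the estimate at x = 366 (day 1 of the middle copy) and at x = 731 (day 1 of the third copy) come out essentially equal, and the middle one-year window holds about one third of the total mass.

```r
set.seed(105)
bday <- sample(1:365, 100, replace = TRUE)

# Three copies: the second and third are shifted by one and two years
stacked <- c(bday, bday + 365, bday + 730)

# Ordinary KDE on the line, evaluated at every position 1..1095
kde <- density(stacked, bw = 20, from = 1, to = 1095, n = 1095)
f <- kde$y

# Day 1 of the middle copy vs. day 1 of the third copy: nearly identical,
# so the middle window joins up smoothly with itself
c(f[366], f[731])

# The middle window covers one of three copies, hence about 1/3 of the mass
sum(f[366:730])  # grid step is 1 day
```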
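Below is an example comparing your original plot with the new version. I've used the adjust parameter to give the two plots approximately the same bandwidth. Note also that in the circularized version each group's density integrates to 1 over all three copies, so the middle copy only integrates to about 1/3; you'll need to renormalize (multiply by 3) if you want the displayed densities to integrate to 1: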
library(ggplot2); library(dplyr)

set.seed(105)
bdays <- data.frame(gender = sample(c('M', 'F'), 100, replace = TRUE),
                    bday = sample(1:365, 100, replace = TRUE))
n <- nrow(bdays)

# Stack three copies of the data, shifting bday by 0, 365 and 730 days
bdays = bind_rows(bdays, bdays, bdays)
bdays$bday = bdays$bday + rep(c(0, 365, 365*2), each = n)

# Function to adjust bandwidth of density plot: returns the `adjust` multiplier
# that turns bw.nrd0(x) into a bandwidth of b (geom_density multiplies its
# default bandwidth, computed per group, by `adjust`)
# Source: http://stackoverflow.com/a/24986121/496488
bw = function(b, x) b / bw.nrd0(x)

# New "circularized" version of plot: only the middle copy (366 to 730) is shown
bdays %>% ggplot(aes(x = bday)) +
  geom_density(aes(color = factor(gender)), adjust = bw(10, bdays$bday[1:n])) +
  coord_cartesian(xlim = c(365, 365 + 365 + 1), expand = FALSE) +
  scale_x_continuous(breaks = seq(366 + 89, 366 + 365, 90),
                     labels = seq(366 + 89, 366 + 365, 90) - 365) +
  scale_y_continuous(limits = c(0, 0.0016)) +
  ggtitle("Circularized")

# Original plot (the first n rows are the unshifted data)
ggplot(bdays[1:n, ], aes(x = bday)) +
  geom_density(aes(color = factor(gender)), adjust = bw(30, bdays$bday[1:n])) +
  scale_x_continuous(breaks = seq(90, 360, 90), expand = c(0, 0)) +
  ggtitle("Not Circularized")
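To reuse the trick on other data and fold in the renormalization, here is a sketch that assumes ggplot2 3.3.0 or later (for after_stat()). stack_circular() is a hypothetical helper written for this example, not part of ggplot2 or the original answer; it copies every column, so facets such as facet_wrap(~ gender) keep working. The factor of 3 assumes three copies.

```r
library(ggplot2)

# Hypothetical helper: stack `copies` copies of `df`, shifting column `col`
# by 0, period, 2 * period, ... (works for any number of rows)
stack_circular <- function(df, col, period = 365, copies = 3) {
  shifted <- lapply(seq_len(copies) - 1, function(k) {
    out <- df
    out[[col]] <- out[[col]] + k * period
    out
  })
  do.call(rbind, shifted)
}

set.seed(105)
bdays <- data.frame(gender = sample(c("M", "F"), 100, replace = TRUE),
                    bday = sample(1:365, 100, replace = TRUE))
stacked <- stack_circular(bdays, "bday")

# Each panel's density integrates to 1 over all three copies, so the middle
# copy carries about 1/3 of it; after_stat(density * 3) rescales that to ~1
p <- ggplot(stacked, aes(x = bday)) +
  geom_density(aes(y = after_stat(density * 3))) +
  facet_wrap(~ gender) +
  coord_cartesian(xlim = c(366, 730)) +
  scale_x_continuous(breaks = 365 + seq(90, 360, 90),
                     labels = seq(90, 360, 90))
p
```

Because the curves are rescaled, the area under each panel's curve over the displayed year is close to 1 rather than 1/3.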