Plot running average in ggplot2


Question

I'm hoping to create a plot that shows a running average over a scatterplot of the observed data. The data consists of observations of hares' coat color (Color) over time (Julian).

Color  Julian
50  85
50  87
50  89
50  90
100 91
50  91
50  92
50  92
100 92
50  93
100 93
50  93
50  95
100 95
50  95
50  96
50  96
50  99
50  100
0   101
0   101
0   103
50  103
50  104
50  104
50  104
50  104
100 104
100 104
50  109
50  109
100 109
0   110
0   110
50  110
50  110
50  110
50  110
0   112
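
To make the examples reproducible, the table above can be entered as a data frame. This is only a sketch: it assumes the data frame is called `hares` (the name the code below uses) and that the columns are plain numeric vectors.

```r
# Observations copied from the table above, in the same row order
hares <- data.frame(
  Color = c(50, 50, 50, 50, 100, 50, 50, 50, 100, 50, 100, 50, 50,
            100, 50, 50, 50, 50, 50, 0, 0, 0, 50, 50, 50, 50, 50,
            100, 100, 50, 50, 100, 0, 0, 50, 50, 50, 50, 0),
  Julian = c(85, 87, 89, 90, 91, 91, 92, 92, 92, 93, 93, 93, 95,
             95, 95, 96, 96, 99, 100, 101, 101, 103, 103, 104, 104,
             104, 104, 104, 104, 109, 109, 109, 110, 110, 110, 110,
             110, 110, 112)
)

# Quick check of the structure: 39 rows, two numeric columns
str(hares)
```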

A friend wrote a function for me that calculates a running average of the color observations, but I can't figure out how to add the line (haresAveNoNa) into the plot.

The function:

haresAverage <- matrix(NA, max(hares$Julian), 3)
for (i in 4:max(hares$Julian)) {
  haresAverage[i, 1] <- i
  haresAverage[i, 2] <- mean(hares$Color[hares$Julian >= (i - 3) &
                                         hares$Julian <= (i + 3)],
                             na.rm = TRUE)
  haresAverage[i, 3] <- sd(hares$Color[hares$Julian >= (i - 3) &
                                       hares$Julian <= (i + 3)],
                           na.rm = TRUE)
}
haresAveNoNa <- na.omit(haresAverage)

The plot:

p <- ggplot(hares, aes(Julian, Color))
p  +
  geom_jitter(width = 1, height = 5, color="blue", alpha = .65) 

Can you please help me add the running average 'haresAveNoNa' into the plot? Thanks very much!

Solution

You can calculate the rolling mean using rollmean from the zoo package instead of writing your own function. You can invoke rollmean on the fly, within ggplot, to add the rolling mean line, or you can add the rolling mean values to your data frame and then plot them. I provide examples below for both methods. The code below calculates a centered rolling mean over a window of seven observations (rows); that equals a seven-day window only when there is exactly one observation per day, which is not the case for every day in your data. You can customize the function for different window sizes and for a left- or right-aligned rolling mean, rather than centered.
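
To illustrate the window-size and alignment options, here is a small sketch on a made-up vector `x` (not the hares data). It uses `fill = NA` to pad the ends with NA; older zoo code writes this as `na.pad = TRUE`, which the zoo documentation describes as deprecated.

```r
library(zoo)

# Illustrative values only, not the hares data
x <- c(50, 50, 100, 50, 0, 0, 50, 100, 50)

# Centered window of 3 observations (the default alignment);
# the first and last positions have no full window and become NA
centered <- rollmean(x, k = 3, fill = NA)

# Right-aligned: each value averages the current observation and the two before it
right <- rollmean(x, k = 3, fill = NA, align = "right")

# Left-aligned: each value averages the current observation and the two after it
left <- rollmean(x, k = 3, fill = NA, align = "left")

# Compare the three alignments side by side
data.frame(x, centered, right, left)
```

Changing `k` changes the number of observations averaged, so `k = 7` matches the examples in this answer.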

Calculate rolling mean on the fly within ggplot

library(ggplot2)
library(zoo)

# rollmean() is evaluated inside aes(), so no extra column is needed
ggplot(hares, aes(Julian, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=rollmean(Color, 7, fill=NA))) +
  theme_bw()

Add rolling mean to your data frame as a new column and then plot it

To answer your specific question, let's say you actually do need to add the rolling mean line from separate data, rather than calculate it on the fly. If the rolling mean is another column in your data frame, you just need to give the new column name to geom_line:

# Store the rolling mean as a new column, then map it to y in geom_line()
hares$roll7 = rollmean(hares$Color, 7, fill=NA)

ggplot(hares, aes(Julian, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=roll7)) +
  theme_bw()

Add rolling mean to a plot using a separate data frame

If the rolling mean is in a separate data frame, you need to feed that data frame to geom_line:

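# Separate data frame holding the rolling mean, one row per observation
haresAverage = data.frame(Julian=hares$Julian, 
                          Color=rollmean(hares$Color, 7, fill=NA))

# Pass the second data frame to geom_line() through its data argument
ggplot(hares, aes(Julian, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(data=haresAverage, aes(Julian, Color)) +
  theme_bw()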
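
If you would rather keep the window from the question's function (all observations within three days either side of each day), the matrix it produces can be plotted the same way once it is converted to a data frame. The following is a rough sketch, not a definitive implementation: `window_stat` and `haresAveDf` are names made up for illustration, and the `sapply` call is a condensed stand-in for the question's loop.

```r
library(ggplot2)

# Observations from the question
hares <- data.frame(
  Color = c(50, 50, 50, 50, 100, 50, 50, 50, 100, 50, 100, 50, 50,
            100, 50, 50, 50, 50, 50, 0, 0, 0, 50, 50, 50, 50, 50,
            100, 100, 50, 50, 100, 0, 0, 50, 50, 50, 50, 0),
  Julian = c(85, 87, 89, 90, 91, 91, 92, 92, 92, 93, 93, 93, 95,
             95, 95, 96, 96, 99, 100, 101, 101, 103, 103, 104, 104,
             104, 104, 104, 104, 109, 109, 109, 110, 110, 110, 110,
             110, 110, 112)
)

# Same calculation as the question's loop: apply f to the Color values
# observed within +/- 3 days of each day d
days <- 4:max(hares$Julian)
window_stat <- function(f) {
  sapply(days, function(d) f(hares$Color[abs(hares$Julian - d) <= 3]))
}
haresAverage <- cbind(days, window_stat(mean), window_stat(sd))

# Drop days whose window has no mean or no standard deviation
haresAveNoNa <- na.omit(haresAverage)

# ggplot2 layers take data frames, so convert the matrix and name its columns
haresAveDf <- data.frame(
  Julian = haresAveNoNa[, 1],
  Mean   = haresAveNoNa[, 2],
  SD     = haresAveNoNa[, 3]
)

# Scatterplot from the question plus the running-average line
p <- ggplot(hares, aes(Julian, Color)) +
  geom_jitter(width = 1, height = 5, color = "blue", alpha = 0.65) +
  geom_line(data = haresAveDf, aes(Julian, Mean))
p
```

Because this average is computed once per calendar day, the line has a single value per day, unlike a rolling mean taken over rows.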

UPDATE: To show date instead of the numeric Julian value

First, convert Julian to Date format. I don't know the actual mapping from Julian to date in your data, so for this example let's assume that Julian is the day of the year, counting the first day of the year as 1, and let's assume the year is 2015.

hares$Date = as.Date(hares$Julian + as.numeric(as.Date("2015-01-01")) - 1)
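
A possibly simpler way to write the same conversion is to pass the first day of the year as the `origin` argument of `as.Date`, which avoids the round trip through `as.numeric`. This sketch makes the same assumptions as above (Julian is the day of the year, day 1 is 1 January, and the year is 2015); the vector `julian` holds just a few values from the data for illustration.

```r
# A few day-of-year values taken from the data
julian <- c(85, 87, 112)

# origin is day 1, so subtract 1 before adding the offset in days
dates <- as.Date(julian - 1, origin = "2015-01-01")

as.character(dates)
# "2015-03-26" "2015-03-28" "2015-04-22"
```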

Now we plot using our new Date column for the x-axis. To customize both the number of breaks and the date labels, use scale_x_date.

# Same plot as before, with Date on the x-axis and weekly breaks labelled like "Mar 26"
ggplot(hares, aes(Date, Color)) + 
  geom_point(position=position_jitter(1,3), pch=21, fill="#FF0000AA") +
  geom_line(aes(y=rollmean(Color, 7, fill=NA))) +
  theme_bw() +
  scale_x_date(date_breaks="weeks", date_labels="%b %e")
