Plotting average of multiple variables in time-series using ggplot


Problem description

I have a file which contains time-series data for multiple variables from a to k.

I would like to create a graph that plots the average of the variables a to k over time, with a smoothed band above and below the average line representing the maximum and minimum variation on each day.

So something like confidence intervals but in a smoothed version.

Here's the dataset: https://dl.dropbox.com/u/22681355/co.csv

and here's the code I have so far:

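library(ggplot2)
library(reshape2)
# Assumes the dataset linked above was saved locally as co.csv
df <- read.csv("co.csv")
# Reshape to long format: one row per Year and variable
meltdf <- melt(df, id = "Year")
ggplot(meltdf, aes(x = Year, y = value, colour = variable, group = variable)) + geom_line()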

Solution

This depicts bootstrapped 95% confidence intervals (mean_cl_boot needs the Hmisc package to be installed):

ggplot(meltdf,aes(x=Year,y=value,colour=variable,group=variable)) +
  stat_summary(fun.data = "mean_cl_boot", geom = "smooth")

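This depicts the mean of all values of all variables ± 1 SD:

ggplot(meltdf, aes(x = Year, y = value)) +
  # ggplot2 2.0.0 and later take extra arguments for fun.data via fun.args
  stat_summary(fun.data = "mean_sdl", fun.args = list(mult = 1), geom = "smooth")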

You might want to calculate the year means before calculating the means and SD over the variables, but I leave that to you.
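If you want a starting point for that, here is a minimal sketch. It assumes the data hold several rows per Year, which is my reading of "year means" and may not match co.csv. The data frame below is made up for illustration, and summary_df, year_means, m and s are names I introduced.

```r
library(ggplot2)
library(reshape2)

# Synthetic stand-in for the question's data (made-up values, not co.csv):
# four rows per Year, one column per variable.
set.seed(1)
df <- data.frame(Year = rep(2001:2005, each = 4),
                 a = rexp(20), b = rexp(20), c = rexp(20))
meltdf <- melt(df, id.vars = "Year")

# Step 1: one mean per Year and variable.
year_means <- aggregate(value ~ Year + variable, data = meltdf, FUN = mean)

# Step 2: mean and SD across the variables within each Year.
m <- aggregate(value ~ Year, data = year_means, FUN = mean)
s <- aggregate(value ~ Year, data = year_means, FUN = sd)
summary_df <- data.frame(Year = m$Year, mean = m$value, sd = s$value)

# Mean line with a +/- 1 SD band drawn from the precomputed summary.
p <- ggplot(summary_df, aes(x = Year, y = mean)) +
  geom_ribbon(aes(ymin = mean - sd, ymax = mean + sd), alpha = 0.3) +
  geom_line()
```

Precomputing the summary and drawing it with geom_ribbon keeps the two averaging steps explicit, at the cost of a few more lines than stat_summary.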

However, I believe a bootstrap confidence interval would be more sensible, since the distribution is clearly not symmetric. It would also be narrower. ;)
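To make the comparison concrete, here is a rough base R sketch of a percentile bootstrap interval for a mean next to a mean ± 1 SD band. The sample x is made up (a right-skewed draw standing in for the values at one time point), and this hand-rolled version is only similar in spirit to what mean_cl_boot does, not a copy of it.

```r
# Made-up right-skewed sample standing in for one time point's values.
set.seed(42)
x <- rexp(30)

# Percentile bootstrap: resample with replacement and take the mean each time.
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
ci <- quantile(boot_means, c(0.025, 0.975))

# Mean +/- 1 SD, which is symmetric around the mean by construction.
sd_band <- mean(x) + c(-1, 1) * sd(x)

# Widths of the two intervals, for comparison.
widths <- c(bootstrap = unname(diff(ci)), sd = diff(sd_band))
```

The bootstrap interval describes uncertainty about the mean, so it shrinks as the sample grows, whereas the SD band describes the spread of the values themselves.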

And of course you could log-transform your values.
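Two common ways to do that are sketched below, on a small made-up long-format data frame rather than the real co.csv. Note that a scale transformation such as scale_y_log10 is applied before any stat is computed, so summaries would be calculated on the log values.

```r
library(ggplot2)

# Made-up long-format data with strictly positive values (not co.csv).
set.seed(1)
meltdf <- data.frame(Year = rep(2001:2005, times = 3),
                     variable = rep(c("a", "b", "c"), each = 5),
                     value = rexp(15) + 0.1)

# Option 1: leave the data alone and put the y axis on a log10 scale.
p <- ggplot(meltdf, aes(x = Year, y = value,
                        colour = variable, group = variable)) +
  geom_line() +
  scale_y_log10()

# Option 2: transform the column itself and plot log_value instead.
meltdf$log_value <- log10(meltdf$value)
```

Either option needs strictly positive values; zeros or negatives would have to be handled first.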
