Plot raw and predicted values for a 2x2x2 time series


Problem description

This is a sample of my data:

library(tidyr)
library(dplyr)
library(ggplot2)

resource <- c("good","good","bad","bad","good","good","bad","bad","good","good","bad","bad","good","good","bad","bad")

fertilizer <- c("none", "nitrogen","none","nitrogen","none", "nitrogen","none","nitrogen","none", "nitrogen","none","nitrogen","none", "nitrogen","none","nitrogen")

t0 <-  sample(1:20, 16)
t1 <-  sample(1:20, 16) 
t2 <-  sample(1:20, 16)
t3 <-  sample(1:20, 16)
t4 <-  sample(1:20, 16)
t5 <-  sample(1:20, 16)
t6 <-  sample(10:100, 16)
t7 <-  sample(10:100, 16)
t8 <-  sample(10:100, 16)
t9 <-  sample(10:100, 16)
t10 <-  sample(10:100, 16)

replicates <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16)

data <- data.frame(resource, fertilizer,replicates, t0,t1,t2,t3,t4,t5,t6,t7,t8,t9,t10)

data$resource <- as.factor(data$resource)
data$fertilizer <- as.factor(data$fertilizer)

data.melt <- data %>% ungroup %>% gather(time, value, -replicates, -resource, -fertilizer)

data.melt$predict <- sample(1:200, 176)

There are two factors, resource and fertilizer, each with two levels, so there are effectively 4 treatments with 4 replicates each (4 x 4 = 16 rows). Time is a factor with 11 levels (t0 to t10). I ran a model, and its predicted values are in the predict column.

Now I want to plot a time series for each combination of resource and fertilizer (4 treatments, that is, 4 plots), with time on the x-axis and both the mean of the fitted values (predict) and the raw values (value) on the y-axis. I also want to add a confidence interval for the algal growth at each time point. Here is my attempt at the code.

ggplot(df, aes(x=time, y=predicted)) + geom_point(size=3)+ stat_summary(geom = "point", fun.y = "mean") + facet_grid(resource + fertilizer ~.) 

With this simple code, I still get only 2 graphs and not 4. Also, the means of the predicted values are not plotted. I don't know how to plot value and predict together with the corresponding confidence intervals.

It would also be helpful if anyone could show how to put all four treatments on a single plot, and whether that plot can still be faceted (like above).

Solution

My proposed solution is to create a second data.frame containing all summary statistics such as mean predicted value. I show one way to do this with group_by and summarize from the dplyr package. The summary data needs to have columns resource, fertilizer and time that match the main data. The summary data also has columns with additional y values.

Then, the main data and the summary data need to be provided separately to the appropriate ggplot functions, but not in the main ggplot() call. facet_grid can be used to split the data into four plots.

# Convert time to factor, specifying correct order of time points.
data.melt$time = factor(data.melt$time, levels=paste("t", seq(0, 10), sep=""))

# Create an auxiliary data.frame containing summary data.
# I've used standard deviation as place-holder for confidence intervals;
# I'll let you calculate those on your own.
summary_dat = data.melt %>%
              group_by(resource, fertilizer, time) %>%
              summarise(mean_predicted=mean(predict),
                        upper_ci=mean(predict) + sd(predict),
                        lower_ci=mean(predict) - sd(predict))

p = ggplot() + 
    theme_bw() +
    geom_errorbar(data=summary_dat, aes(x=time, ymax=upper_ci, ymin=lower_ci),
                  width=0.3, size=0.7, colour="tomato") + 
    geom_point(data=data.melt, aes(x=time, y=value),
               size=1.6, colour="grey20", alpha=0.5) +
    geom_point(data=summary_dat, aes(x=time, y=mean_predicted),
               size=3, shape=21, fill="tomato", colour="grey20") +
    facet_grid(resource ~ fertilizer)

ggsave("plot.png", plot=p, height=4, width=6.5, units="in", dpi=150)
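
The answer above uses the standard deviation as a placeholder for the interval and does not cover the single-plot request. The following is a minimal sketch of one way to fill both gaps, not the original author's method. It assumes a t-based 95% interval computed from the four predicted values in each treatment x time cell, which treats them as independent observations. If your model can report standard errors for its predictions, those are likely a better basis. The names toy_long, ci_dat, treatment, half_width and p_single are made up for illustration, and the toy data only mimics the shape of the question's data.

```r
library(dplyr)
library(ggplot2)

set.seed(1)  # assumption: fixed seed, only so the toy data is reproducible

# Toy long-format data with the question's shape:
# 4 replicates x 2 resources x 2 fertilizers x 11 time points = 176 rows.
toy_long <- expand.grid(
  replicate  = 1:4,
  resource   = c("good", "bad"),
  fertilizer = c("none", "nitrogen"),
  time       = paste0("t", 0:10)
)
# Make the time order explicit: t0, t1, ..., t10.
toy_long$time    <- factor(toy_long$time, levels = paste0("t", 0:10))
toy_long$value   <- sample(1:100, nrow(toy_long), replace = TRUE)
toy_long$predict <- sample(1:200, nrow(toy_long))
# One label per treatment, e.g. "good / none".
toy_long$treatment <- interaction(toy_long$resource, toy_long$fertilizer,
                                  sep = " / ")

# t-based 95% interval for the mean of `predict` in each treatment x time cell.
ci_dat <- toy_long %>%
  group_by(resource, fertilizer, treatment, time) %>%
  summarise(
    n_obs          = n(),
    mean_predicted = mean(predict),
    half_width     = qt(0.975, df = n_obs - 1) * sd(predict) / sqrt(n_obs),
    lower_ci       = mean_predicted - half_width,
    upper_ci       = mean_predicted + half_width,
    .groups = "drop"
  )

# All four treatments in one panel, separated by colour and a small dodge.
dodge <- position_dodge(width = 0.5)
p_single <- ggplot(ci_dat, aes(x = time, y = mean_predicted,
                               colour = treatment, group = treatment)) +
  theme_bw() +
  # Raw values, drawn faintly behind the summaries.
  geom_point(data = toy_long, aes(y = value), alpha = 0.3, position = dodge) +
  geom_errorbar(aes(ymin = lower_ci, ymax = upper_ci),
                width = 0.3, position = dodge) +
  geom_line(position = dodge) +
  geom_point(size = 2.5, position = dodge) +
  labs(y = "raw value (faint) / mean predict with 95% CI")
```

Because both data frames keep the resource and fertilizer columns, adding `facet_grid(resource ~ fertilizer)` to p_single should split the same plot back into four panels.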
