Adding geom_* in a for loop


Problem description

I want to compare real world data with simulated data within one graph. The code should accept any number of lines to plot. I came up with this:

simulationRuns <- 5 #Variable to be changed depending on how many simulations were made

plotLoop <- ggplot() + 
  geom_line(data = relWorldData, 
            mapping = aes(x = DateTime, y = VALUE, color = "realWorldData"))

for (i in 1:simulationRuns){
    plotLoop <- plotLoop +
      geom_line(data = listOfSimResults[[i]], 
                mapping = aes(x = DateTime, y = VALUE, color = paste0("simRun-", i)))
  }

figureLoop <- ggplotly(plotLoop)

The problem is that all lines are displayed as simRun-5, so they are not drawn as independent series.

I am new to R so please have mercy ;) Thanks in advance, Patrick

Follow-up question (posted here because code is terrible to read in a comment):

I read up on lapply and rewrote the code like this:

plotLoop <- ggplot() + geom_line(data = relWorldData, mapping = aes(x = DateTime, y = VALUE, color = "RealWorldData"))

  addGeomLine <- function (i, obj){
    obj <- obj +
      geom_line(data = listOfSimResults[[i]], mapping = aes(x = DateTime, y = VALUE, color = paste0("simRun-", i)))
  }
  lapply(1:runs, addGeomLine, plotLoop)

  figureLoop <- ggplotly(plotLoop)

This time, only the RealWorldData is displayed, but none of the Simulations. Could you tell me what I am missing?

Recommended answer

Welcome!

You've run into a subtle problem that confuses a lot of people with far more experience than yourself. The problem is that ggplot2 evaluates lazily. Put simply, that means that it "makes a note" of what it needs to do when you tell it what you want, but doesn't actually do anything until the last possible moment.

Here, you tell ggplot that you want to add a geom in your for loop. ggplot makes a note of the geom's definition, but doesn't evaluate it. "At the last moment" is when you call ggplotly. Now ggplot realises it's got some work to do. For each geom, it notices that it needs to know the value of i. So it looks it up and finds the value 5. Hence your problem.

There are several ways to solve this. With your code, my preferred option is to replace the for loop with an lapply. Unlike a for loop, lapply forces evaluation of variables at the time of execution.
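A minimal sketch of the lapply option, using made-up data in place of the question's objects (the values, the date range and the name simLayers are assumptions for illustration). Each call of the anonymous function gets its own environment, so every layer keeps its own i until the plot is built. Note also that + returns a new plot rather than modifying plotLoop in place, which is why the follow-up attempt in the question, which discards the result of lapply, shows only the real-world line.

```r
library(ggplot2)

# Synthetic stand-ins for the question's objects (values are made up)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
relWorldData <- data.frame(DateTime = dates, VALUE = rnorm(12, mean = 150, sd = 20))
simulationRuns <- 5
listOfSimResults <- lapply(seq_len(simulationRuns), function(i) {
  data.frame(DateTime = dates, VALUE = rnorm(12, mean = 150, sd = 20))
})

# Build one layer per simulation run; i is local to each function call
simLayers <- lapply(seq_len(simulationRuns), function(i) {
  geom_line(
    data = listOfSimResults[[i]],
    mapping = aes(x = DateTime, y = VALUE, color = paste0("simRun-", i))
  )
})

# Adding a list of layers with + adds every element; keep the returned plot
plotLoop <- ggplot() +
  geom_line(data = relWorldData,
            mapping = aes(x = DateTime, y = VALUE, color = "realWorldData")) +
  simLayers
```

The resulting plotLoop can then be passed to ggplotly as in the question.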

I believe you could also keep the for loop and wrap each reference to i in force(), though I've not personally tried that.
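As far as I can tell, a force(i) written inside aes() is captured unevaluated like everything else in aes(), so it may not help on its own. A sketch of an alternative that keeps the for loop: compute the label inside the loop and inject its current value with the !! operator, which aes() supports in ggplot2 3.0.0 and later. The data and the names sims and runLabel are made up for illustration.

```r
library(ggplot2)

# Synthetic simulation results (values are made up)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 12)
sims <- lapply(1:3, function(i) {
  data.frame(DateTime = dates, VALUE = rnorm(12, mean = 150, sd = 20))
})

p <- ggplot()
for (i in seq_along(sims)) {
  runLabel <- paste0("simRun-", i)  # evaluated now, inside the loop
  p <- p +
    geom_line(
      data = sims[[i]],
      # !! inserts the current string into the mapping instead of a reference to i
      mapping = aes(x = DateTime, y = VALUE, color = !!runLabel)
    )
}
```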

The best approach in the long run, in my opinion, would be to make your workflow tidy and avoid the need for either the for loop or lapply altogether. This will also give you the benefits of more compact, robust and readable code that will almost certainly run faster. [I did some work the other day that converted a loop similar to yours to a tidy solution and the run time was reduced from nearly 40 seconds to under 2.]

Also, please read this post for advice on how to create a minimum working example. Providing MWEs will maximise your chances of getting a useful answer.

Update

To expand on my comment about the advantages of using a tidy data approach...

First synthesize some data as you haven't provided any. I'll try to match the structure of your data, but not your values. The only difference to your datasets is that I've added an ID variable to identify the simulation run/real world dataset that each observation comes from.

library(lubridate)
library(tidyverse)

inVivoBG <- tibble(
              ID="Real-world data",
              DateTime2=seq(as_date("2006-03-01"), as_date("2015-03-01"), "3 months"),
              VALUE=100 + rnorm(37, mean=150, sd=20)
            ) 

listOfSimResults <- lapply(
                      1:5, 
                      function(x) {
                        tibble(
                          ID=paste0("simRun-", x),
                          DateTime2=seq(as_date("2006-03-01"), as_date("2015-03-01"), "3 months"),
                          VALUE=100 + rnorm(37, mean=150, sd=20)
                        )
                      }
                    )

Now combine the various data frames into a single one.

data <- bind_rows(inVivoBG, listOfSimResults)
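If your existing simulation results have no ID column, one way to get one while combining is to name the list elements and pass .id to bind_rows. This is only a sketch on made-up data; simsWithoutID and combined are hypothetical names.

```r
library(dplyr)

# Synthetic list of results without an ID column (values are made up)
set.seed(1)
dates <- seq(as.Date("2020-01-01"), by = "month", length.out = 4)
simsWithoutID <- lapply(1:3, function(i) {
  data.frame(DateTime = dates, VALUE = rnorm(4, mean = 150, sd = 20))
})

# Name each element; bind_rows() copies the names into the column given by .id
names(simsWithoutID) <- paste0("simRun-", seq_along(simsWithoutID))
combined <- bind_rows(simsWithoutID, .id = "ID")
```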

At this point, the construction of your plot is a single line call.

data %>% 
  ggplot() + 
    geom_line(mapping = aes(x = DateTime2, y = VALUE, color = ID)) 

This gives a plot with one line per ID.

This approach avoids the need for a custom function or the need for lapply. It is also robust with respect to the number of lines required and their labels. Personally, I also think it's far easier to understand.
