"for" loop only adds the final ggplot layer


Problem description

Summary: When I use a "for" loop to add layers to a violin plot (in ggplot), the only layer that is added is the one created by the final loop iteration. Yet in explicit code that mimics the code that the loop would produce, all the layers are added.

Details: I am trying to create violin graphs with overlapping layers, to show the extent to which estimate distributions do or do not overlap for several survey question responses, stratified by place. I want to be able to include any number of places, so I have one column in my data frame for each place, and am trying to use a "for" loop to generate one ggplot layer per place. But the loop only adds the layer from the loop's final iteration.

This code illustrates the problem, and some suggested approaches that failed:

library(ggplot2) 

# Create a dataframe with 500 random normal values for responses to 3 survey questions from two cities
topic <- c("Poverty %","Mean Age","% Smokers")
place <- c("Chicago","Miami")
n <- 500
mean <- c(35,  40,58,  50, 25,20)
var  <- c( 7, 1.5, 3, .25, .5, 1)
df <- data.frame( topic=rep(topic,rep(n,length(topic)))
                 ,c(rnorm(n,mean[1],var[1]),rnorm(n,mean[3],var[3]),rnorm(n,mean[5],var[5]))
                 ,c(rnorm(n,mean[2],var[2]),rnorm(n,mean[4],var[4]),rnorm(n,mean[6],var[6]))
                )
names(df)[2:dim(df)[2]] <- place  # Name those last two columns with the corresponding place name.
head(df) 

# This "for" loop seems to only execute the final loop (i.e., where p=3)
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
  g <- g + geom_violin(aes(y = df[,p], colour = place[p-1]), alpha = 0.3)
}
g

# But mimicking what the for loop does in explicit code works fine, resulting in both "place"s being displayed in the graph.
g <- ggplot(df, aes(factor(topic), df[,2]))
g <-   g + geom_violin(aes(y = df[,2], colour = place[2-1]), alpha = 0.3)
g <-   g + geom_violin(aes(y = df[,3], colour = place[3-1]), alpha = 0.3)
g

## per http://stackoverflow.com/questions/18444620/set-layers-in-ggplot2-via-loop , I tried 
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in 2:dim(df)[2]) {
  df1 <- df[,c(1,p)]
  g <- g + geom_violin(aes(y = df1[,2], colour = place[p-1]), alpha = 0.3)
}
g
# but got the same undesired result

# per http://stackoverflow.com/questions/15987367/how-to-add-layers-in-ggplot-using-a-for-loop , I tried
g <- ggplot(df, aes(factor(topic), df[,2]))
for (p in names(df)[-1]) {
  cat(p,"\n")
  g <- g + geom_violin(aes_string(y = p, colour = p), alpha = 0.3)  # produced this error: Error in unit(tic_pos.c, "mm") : 'x' and 'units' must have length > 0
  # g <- g + geom_violin(aes_string(y = p            ), alpha = 0.3)  # produced this error: Error: stat_ydensity requires the following missing aesthetics: y
}
g
# but that failed to produce any graphic, per the errors noted in the "for" loop above

Solution

This happens because of ggplot's "lazy evaluation". It is a common problem when ggplot is used this way (building the layers separately in a loop, rather than having ggplot do it for you, as in @hrbrmstr's solution).

ggplot stores the arguments to aes(...) as expressions, and only evaluates them when the plot is rendered. So, in your loops, something like

aes(y = df[,p], colour = place[p-1])

gets stored as is, and evaluated when you render the plot, after the loop completes. At this point, p=3 so all the plots are rendered with p=3.
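
To make the mechanism concrete, here is a minimal base-R sketch of the same effect without ggplot. The names vals, stored and fixed are made up for illustration, and quote() only stands in for what aes(...) does with its arguments; it is not ggplot's actual internals.

```r
# Hypothetical data; only the indexing behaviour matters here
vals <- c(10, 20, 30)

# Store unevaluated expressions in a loop, as aes(...) does with its arguments
stored <- list()
for (p in 2:3) {
  # quote() keeps the expression vals[p] as written; p is not looked up yet
  stored[[length(stored) + 1]] <- quote(vals[p])
}

# The loop has finished, so p is 3 and every stored expression sees p = 3
late <- vapply(stored, function(e) eval(e), numeric(1))
print(late)

# Substituting the current value of p when the expression is created avoids this
fixed <- list()
for (p in 2:3) {
  fixed[[length(fixed) + 1]] <- bquote(vals[.(p)])
}
early <- vapply(fixed, function(e) eval(e), numeric(1))
print(early)
```

Both entries of late come out as vals[3], while early keeps one value per iteration. This mirrors why the explicit code in the question works: there the indices 2 and 3 are written out as literals instead of being read from p at render time.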

So the "right" way to do this is to use melt(...) in the reshape2 package to convert your data from wide to long format, and let ggplot manage the layers for you. I put "right" in quotes because in this particular case there is a subtlety. When calculating the distributions for the violins using the melted data frame, ggplot uses the grand total (for both Chicago and Miami) as the scale. If you want violins based on frequency scaled individually, you need to use loops (sadly).
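
For reference, a rough sketch of the long-format approach might look like the following. The data mirrors the question's setup with a smaller n and a fixed seed, and the column names city and value, along with position = "identity", are my own choices for illustration rather than anything prescribed above.

```r
library(ggplot2)
library(reshape2)

# Small stand-in for the question's data (assumed sizes and seed)
set.seed(1)
n <- 100
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = c(rnorm(n, 35, 7),   rnorm(n, 58, 3),    rnorm(n, 25, 0.5)),
  Miami   = c(rnorm(n, 40, 1.5), rnorm(n, 50, 0.25), rnorm(n, 20, 1))
)

# Wide -> long: one row per (topic, city, value) combination
long <- melt(df, id.vars = "topic", variable.name = "city", value.name = "value")

# A single geom_violin call; ggplot splits the violins by city itself
g <- ggplot(long, aes(x = topic, y = value, colour = city)) +
  geom_violin(alpha = 0.3, position = "identity")
g
```

No loop index appears anywhere, so there is nothing left for lazy evaluation to trip over; the scaling caveat described above still applies.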

The way around the lazy evaluation problem is to put any reference to the loop index in the data=... definition. This is not stored as an expression; the actual data is stored in the plot definition. So you could do this:

g <- ggplot(df,aes(x=topic))
for (p in 2:length(df)) {
  gg.data <- data.frame(topic=df$topic,value=df[,p],city=names(df)[p])
  g <- g + geom_violin(data=gg.data,aes(y=value, color=city))
}
g

which gives the same result as yours. Note that the index p does not show up in aes(...).
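
If you would rather not reassign g inside a loop, one possible variant (my own sketch, not part of the code above) is to build the layers with lapply and add the resulting list in one step, since ggplot accepts a list of layers on the right-hand side of +. The data frame below is a small stand-in for the question's df, and the name layers is arbitrary.

```r
library(ggplot2)

# Small stand-in for the question's data (assumed sizes and seed)
set.seed(1)
n <- 100
df <- data.frame(
  topic   = rep(c("Poverty %", "Mean Age", "% Smokers"), each = n),
  Chicago = c(rnorm(n, 35, 7),   rnorm(n, 58, 3),    rnorm(n, 25, 0.5)),
  Miami   = c(rnorm(n, 40, 1.5), rnorm(n, 50, 0.25), rnorm(n, 20, 1))
)

# One layer per place column; each layer carries its own data frame,
# so aes(...) only refers to column names, never to a loop index
layers <- lapply(names(df)[-1], function(city) {
  d <- data.frame(topic = df$topic, value = df[[city]], city = city)
  geom_violin(data = d, aes(y = value, colour = city), alpha = 0.3)
})

g <- ggplot(df, aes(x = topic)) + layers
g
```

The idea is the same as in the loop version: the per-place data is fixed when each layer is created, and only column names are left for ggplot to evaluate later.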


Update: A note about scale="width" (mentioned in a comment). This causes all the violins to have the same width (see below), which is not the same scaling as in OP's original code. IMO this is not a great way to visualize the data, as it suggests there is much more data in the Chicago group.

# gg is assumed here to be the melted (long-format) data frame, e.g. melt(df) from reshape2
ggplot(gg) + geom_violin(aes(x=topic,y=value,color=variable),
                         alpha=0.3,position="identity",scale="width")
