R, ggplot: Change linetype within a series


Problem description

I am using ggplot geom_smooth to plot turnover data of a customer group from the previous year against the current year (based on calendar weeks). As the last week is not complete, I would like to use a dashed linetype for the last week. However, I can't figure out how to do that. I can change the linetype for the entire plot or for an entire series, but not within a series (depending on the value of x):

To keep it simple, let's just use the following example:

set.seed(42)
frame <- data.frame(series = rep(c('a','b'),50),x = 1:100, y = runif(100))

ggplot(frame,aes(x = x,y = y, group = series, color=series)) + 
geom_smooth(size=1.5, se=FALSE)

How would I have to change this to get dashed lines for x >= 75?

The goal would be something like this:

Thx very much for any help!

Edit, 2016-03-05

Of course I fail when trying to use this method on the original plot. The problem lies with the ribbon, which is calculated using stat_summary and a predefined function. I tried to use stat_summary on the original data (mdf), and geom_line on the smooth_data. Even when I comment out everything else, I still get "Error: Continuous value supplied to discrete scale". I believe the problem comes from the fact that the original x value (Kalenderwoche) is discrete, whereas the new, smoothed x is continuous. Do I have to somehow transform one into the other? What else could I do?

Here is what I tried (condensed to the essential lines):

quartiles <- function(x) {  
  x <- na.omit(x) # remove NA values
  median <- median(x)
  q1 <- quantile(x,0.25)
  q3 <- quantile(x,0.75)
  data.frame(y = median, ymin = median, ymax = q3)
}

g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
geom_smooth(size=1.5, method="auto", se=FALSE)

# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]

ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
  geom_line(data=smooth_data, aes(x=x, y=y, group=group, colour=group, fill=group))  

mdf looks like this:

str(mdf)
'data.frame':   280086 obs. of  5 variables:
 $ konto_id     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ Kalenderwoche: Factor w/ 14 levels "2015-48","2015-49",..: 4 12 1 3 7 13 10 6 5 9 ...
 $ variable     : Factor w/ 2 levels "Umsatz","Umsatz Vorjahr": 1 1 1 1 1 1 1 1 1 1 ...
 $ value        : num  0 428.3 97.8 76 793.1 ...

There are many accounts (konto_id), and for each account and calendar week (Kalenderwoche), there is a current turnover value (Umsatz) and a turnover value from last year (Umsatz Vorjahr). I can provide a smaller version of the data.frame and the entire code, if required.

Thx very much for any help!

P.S. I am a total novice in R, so my code probably looks rather stupid to pros, sorry for that :(

Edit, 2016-03-06

I have uploaded a subset of the data (mdf): mdf

The full code of the original graph is the following (looking somewhat weird with so little data, but that's not the point ;)

library(dtw)
library(reshape2)
library(ggplot2)
library(RODBC)
library(Cairo)

# custom breaks for X axis
breaks.custom <- unique(mdf$Kalenderwoche)[c(TRUE,rep(FALSE,0))] 

# function called by stat_summary
quartiles <- function(x) {  
  x <- na.omit(x)
  median <- median(x)
  q1 <- quantile(x,0.25)
  q3 <- quantile(x,0.75)
  data.frame(y = median, ymin = median, ymax = q3)
}

# Positions for guidelines and labels
horizontal.center <- (length(unique(mdf$Kalenderwoche))+1)/2
kw.horizontal.center <- as.vector(sort(unique(mdf$Kalenderwoche))[c(horizontal.center-0.5,horizontal.center+0.5)])
vpos.P75.label <- max(quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]],0.75)
                      ,quantile(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]],0.75))+10
# use the higher P75 value of the two weeks around the center
vpos.mean.label <- min(mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
                       ,mean(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10
vpos.median.label <- min(median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[1]])
                         ,median(mdf$value[mdf$Kalenderwoche==kw.horizontal.center[2]]))-10

hpos.vline <- which(as.vector(sort(unique(mdf$Kalenderwoche))=="2016-03"))

# custom colour palette (2 colors)
cbPaletteLine <- c("#DA2626", "#2626DA")
cbPaletteFill <- c("#F0A8A8", "#7C7CE9")


# ggplot
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
  geom_smooth(size=1.5, method="auto", se=FALSE)+ 
  # se=FALSE suppresses the SE band of the fit; the spread of the data is drawn instead:
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)+
  scale_x_discrete(breaks=breaks.custom)+
  scale_colour_manual(values=cbPaletteLine)+
  scale_fill_manual(values=cbPaletteFill)+
  #coord_cartesian(ylim = c(0, 250)) +
  theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
  #scale_color_brewer(palette="Dark2")+
  labs(title = "Tranche 1", x =  "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
  geom_vline(xintercept = hpos.vline, linetype=2)+
  annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
  annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+  
  annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
  theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))

Edit, 2016-03-06

The final plot now looks like this (thx, Jason!!)

Solution

I am not sure how to smooth all the data and still use different line types for subsets with geom_smooth alone. My idea is to pull out the data that ggplot used to construct the plot and redraw it with geom_line. This is how I did it:

set.seed(42)
frame <- data.frame(series=rep(c('a','b'), 50),
                    x = 1:100, y = runif(100))
library(ggplot2)
g <- ggplot(frame, aes(x=x, y=y, color=series)) + geom_smooth(se=FALSE) 

# Take out the data for smooth line
smooth_data <- ggplot_build(g)$data[[1]]
ggplot(smooth_data[smooth_data$x <= 76, ], aes(x=x, y=y, color=as.factor(group), group=group)) +
  geom_line(size=1.5) +
  geom_line(data=smooth_data[smooth_data$x >= 74, ], linetype="dashed", size=1.5) +
  scale_color_discrete("Series", breaks=c("1", "2"), labels=c("a", "b"))
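The same idea can be sketched without going through ggplot_build, by fitting the smoother directly. This is only an illustration under assumptions: it uses base R's loess with default settings (which is what geom_smooth selects automatically for small data sets like this one), evaluates it on a grid with a step of 0.5 so that both series have a point exactly at the cutoff, and the names smooth_one, cutoff, solid and dashed are made up for this example. Because the cutoff point belongs to both pieces, the solid and dashed segments meet without a gap, which is the same reason the code above overlaps the two ranges (<= 76 and >= 74).

```r
set.seed(42)
frame <- data.frame(series = rep(c("a", "b"), 50),
                    x = 1:100, y = runif(100))

cutoff <- 75  # assumed switch point from solid to dashed

# Fit one loess curve per series and evaluate it on a regular grid.
# The 0.5 step guarantees that the cutoff is a grid point for both series.
smooth_one <- function(d) {
  fit  <- loess(y ~ x, data = d)
  grid <- seq(min(d$x), max(d$x), by = 0.5)
  data.frame(series = d$series[1], x = grid,
             y = predict(fit, newdata = data.frame(x = grid)))
}
smoothed <- do.call(rbind, lapply(split(frame, frame$series), smooth_one))

# The point at the cutoff is kept in both pieces so the line has no gap.
solid  <- smoothed[smoothed$x <= cutoff, ]
dashed <- smoothed[smoothed$x >= cutoff, ]

# Draw the two pieces as separate layers (only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(solid, aes(x = x, y = y, colour = series)) +
    geom_line() +
    geom_line(data = dashed, linetype = "dashed")
  print(p)
}
```

The line width is left at its default here; it can be set on both geom_line layers as in the code above.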

Regarding your edit: you're right. The problem is that you add a layer with a continuous x to a plot whose original layer has a discrete x. One way to deal with it is to create a lookup table. In this case that is easy, because the smoothed x runs from 1 to 14, matching the 14 levels of Kalenderwoche, so we can turn it back into a discrete x by indexing into the levels. In your code, it should work if you add:

level <- levels(mdf$Kalenderwoche)
ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable))+
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25) +
  geom_line(data=smooth_data, aes(x=level[x], y=y, group=group, colour=as.factor(group), fill=NA)) 
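One detail of the lookup is worth checking on your own data. The following base R sketch uses hypothetical week labels and positions (not taken from mdf) to show what indexing does when the positions are not whole numbers. Whether smooth_data$x actually contains fractional positions depends on how ggplot2 evaluated the smoother, so treat this as something to verify rather than a statement about your plot.

```r
# Hypothetical week labels standing in for levels(mdf$Kalenderwoche).
level <- c("2015-48", "2015-49", "2015-50", "2015-51")

# Positions as a smoother might return them: not necessarily whole numbers.
pos <- c(1, 1.4, 2.9, 4)

# Plain indexing truncates fractional positions (1.4 -> 1, 2.9 -> 2).
truncated <- level[pos]

# Rounding first picks the nearest week instead (2.9 -> 3).
nearest <- level[round(pos)]

print(truncated)
print(nearest)
```

If the positions are fractional, several smoothed points end up on the same week label, so the redrawn line is coarser than the original smooth.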

Here is my attempt at the full plot from the question:

g <- ggplot(mdf, aes(x=Kalenderwoche, y=value, group=variable, colour=variable,fill=variable)) +
  geom_smooth(size=1.5, method="auto", se=FALSE) + 
  # se=FALSE suppresses the SE band of the fit; the spread of the data is drawn instead:
  stat_summary(fun.data = quartiles,geom="ribbon", colour="NA", alpha=0.25)    

smooth_data <- ggplot_build(g)$data[[1]]
ribbon_data <- ggplot_build(g)$data[[2]]    

# Use them as lookup table
level <- levels(mdf$Kalenderwoche)
clevel <- levels(mdf$variable)    

ggplot(smooth_data[smooth_data$x <= 13, ], aes(x=level[x], y=y, group=group, color=as.factor(clevel[group]))) +
  geom_line(size=1.5) + 
  geom_line(data=smooth_data[smooth_data$x >= 13, ], linetype="dashed", size=1.5) +
  geom_ribbon(data=ribbon_data,
              aes(x=x, ymin=ymin, ymax=ymax, fill=as.factor(clevel[group]), color=NA), alpha=0.25) +
  scale_x_discrete(breaks=breaks.custom) +
  scale_colour_manual(values=cbPaletteLine) +
  scale_fill_manual(values=cbPaletteFill) +
  #coord_cartesian(ylim = c(0, 250)) +
  theme(legend.title = element_blank(), title = element_text(face="bold", size=12))+
  #scale_color_brewer(palette="Dark2")+
  labs(title = "Tranche 1", x =  "Kalenderwoche", y = "Konto-Umsatz [CHF]")+
  geom_vline(xintercept = hpos.vline, linetype=2)+
  annotate("text", x=horizontal.center, y=vpos.median.label, label = "Median", size=4)+
  annotate("text", x=horizontal.center, y=vpos.mean.label, label= "Mean", size=4)+  
  annotate("text", x=horizontal.center, y=vpos.P75.label, label = "P75%", size=4)+
  theme(axis.text.x=element_text(angle = 90, hjust = 0.5, vjust = 0.5))

Note that the legend keys are drawn with a border.
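An alternative to the lookup table, sketched here under assumptions, is to go the other way: convert the week factor to its numeric position and keep the whole plot on a continuous x axis, using the factor levels only as axis labels. The data frame toy and the column week_pos are made up for illustration; with the real data the same conversion would be applied to mdf$Kalenderwoche.

```r
# Hypothetical miniature of mdf: a week factor and a value column.
weeks <- c("2015-52", "2015-53", "2016-01", "2016-02")
toy <- data.frame(
  Kalenderwoche = factor(rep(weeks, each = 3), levels = weeks),
  value = c(10, 12, 11, 14, 13, 15, 9, 8, 10, 16, 18, 17)
)

# Numeric position of each week (1, 2, 3, ...) used as the x coordinate.
toy$week_pos <- as.integer(toy$Kalenderwoche)

# Continuous axis, labelled with the original week names
# (only if ggplot2 is installed).
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = week_pos, y = value)) +
    geom_point() +
    scale_x_continuous(breaks = seq_along(weeks), labels = weeks)
  print(p)
}
```

With every layer on the same continuous scale, smoothed x values can be added as they are, and a cutoff such as x >= 13 can be applied without converting back to week labels.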
