Pick the latest data set when doing geom_hline in ggplot2


Problem description

dput(x)

structure(list(Host = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = "A", class = "factor"), TimeStamp = structure(c(1L, 
2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L), .Label = c("1/11/2013", 
"1/12/2013", "1/13/2013", "1/14/2013", "1/15/2013"), class = "factor"), 
    Instance = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
    2L), .Label = c("/application", "/db"), class = "factor"), 
    Free_Space = c(5048L, 5049L, 6000L, 4800L, 5100L, 317659L, 
    340000L, 350000L, 356666L, 370000L), Used_Space = c(3017L, 
    56000L, 60000L, 55000L, 54000L, 271657L, 150000L, 175000L, 
    165000L, 189999L), Total_Space = c(8064L, 61049L, 66000L, 
    59800L, 59100L, 589316L, 490000L, 525000L, 521666L, 559999L
    )), .Names = c("Host", "TimeStamp", "Instance", "Free_Space", 
"Used_Space", "Total_Space"), class = "data.frame", row.names = c(NA, 
-10L))

I have this data frame. I derive the column Total_Space by adding Free_Space and Used_Space with the data.table package, grouped by Host, TimeStamp and Instance.

library(data.table)
x <- data.table(x)
x <- x[, Total_Space := Free_Space + Used_Space, by = c("Host", "Instance", "TimeStamp")]

I would like to use facet_wrap from ggplot2 to graph used space in GB and draw a horizontal line (geom_hline) at Total_Space, so that users can see how much headroom there is.

For example, I am doing this:

ggplot(x, aes(TimeStamp, Used_Space/1024, group=Instance)) +
  geom_area(fill="blue") +
  geom_smooth(method="lm", colour="orange", se=T, size=1) +
  geom_hline(data=x, aes(yintercept = Total_Space/1024), col="red") +
  facet_wrap(~Host+Instance, ncol=3, scales="free")

The problem I am seeing is that I get multiple geom_hline lines for the same instance and host, because Total_Space changes over time.
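To make the symptom concrete, here is a minimal base R sketch. It assumes only the Host, Instance and Total_Space values from the dput() output above, and counts how many distinct Total_Space values each Host/Instance panel contains. The name lines_per_panel is made up for this illustration.

```r
# Rebuild the grouping columns and Total_Space from the dput() above.
x <- data.frame(
  Host = "A",
  Instance = rep(c("/application", "/db"), each = 5),
  Total_Space = c(8064L, 61049L, 66000L, 59800L, 59100L,
                  589316L, 490000L, 525000L, 521666L, 559999L)
)

# Count the distinct Total_Space values in each Host/Instance panel.
# geom_hline() draws one line per row of its data, so a count above 1
# means several red lines show up in that facet.
lines_per_panel <- aggregate(Total_Space ~ Host + Instance, data = x,
                             FUN = function(v) length(unique(v)))
print(lines_per_panel)
```

With this sample each panel has five distinct values, which matches the five lines per facet in the plot.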

My question is, how can I pick the latest time stamp when doing geom_hline for each instance and Host? I need to show the latest Total_Space in geom_hline.

I tried this approach:

x<-x[,LatestTS:=tail(p[order(p$TimeStamp),],1)$Total_Space, by=c("Host", "Instance", "TimeStamp")]

This did not work: it picks the same number for all instances.

Solution

My solution would be to first convert the TimeStamp column to dates:

x$TimeStamp<-as.Date(x$TimeStamp,format="%m/%d/%Y")
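The conversion matters because "latest" is only well defined once the values are real dates. The base R sketch below uses a hypothetical vector ts (not from the question's data) to show that max() on the raw strings compares text rather than time; on an unordered factor, as in the original data frame, max() raises an error instead.

```r
# Hypothetical timestamps in the same month/day/year format as the question.
ts <- c("1/9/2013", "1/11/2013", "1/15/2013")

# Compared as text, "9" sorts after "1", so this is not the latest date.
latest_as_text <- max(ts)

# Converted to Date, max() compares actual calendar dates.
dates <- as.Date(ts, format = "%m/%d/%Y")
latest_as_date <- max(dates)

print(latest_as_text)
print(latest_as_date)
```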

Then, since your data object is a data.table, you can subset it by Host and Instance, keeping only the rows where TimeStamp equals the maximum value in each group.

x[,.SD[TimeStamp==max(TimeStamp)],by="Host,Instance"]
   Host     Instance  TimeStamp Free_Space Used_Space Total_Space
1:    A /application 2013-01-15       5100      54000       59100
2:    A          /db 2013-01-15     370000     189999      559999
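If you would rather keep the latest value as a column, as the attempt in the question tried to do, one possible variant is sketched below. It assumes TimeStamp is already a Date, and the column name LatestTotal is made up for this example. The key difference from the attempt is that the grouping uses Host and Instance only: with TimeStamp in `by`, every group is a single row, so there is nothing to pick the latest from.

```r
library(data.table)

# Sample rebuilt from the dput() above, with TimeStamp already as Date.
x <- data.table(
  Host = "A",
  Instance = rep(c("/application", "/db"), each = 5),
  TimeStamp = rep(as.Date("2013-01-11") + 0:4, times = 2),
  Total_Space = c(8064L, 61049L, 66000L, 59800L, 59100L,
                  589316L, 490000L, 525000L, 521666L, 559999L)
)

# Group by Host and Instance only, and copy the Total_Space of the row
# with the latest TimeStamp into every row of that group.
x[, LatestTotal := Total_Space[which.max(TimeStamp)], by = .(Host, Instance)]

# One row per Host/Instance, for inspection.
latest <- unique(x[, .(Host, Instance, LatestTotal)])
print(latest)
```

Mapping yintercept to LatestTotal/1024 would then draw the same value for every row of a panel, so the lines overlap into one. The subset shown above is still the tidier input for geom_hline(), since it holds exactly one row per facet.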

Now you can use this expression as the data argument of geom_hline(). With TimeStamp stored as dates, scale_x_date() also gives you more control over the x axis.

library(scales)
ggplot(x, aes(TimeStamp, Used_Space/1024, group=Instance)) + 
  geom_area(fill="blue") + geom_smooth(method="lm", colour="orange",se=T, size=1) + 
  geom_hline(data=x[,.SD[TimeStamp==max(TimeStamp)],by="Host,Instance"], aes(yintercept = Total_Space/1024), col="red")+ 
  facet_wrap(~Host+Instance, ncol=3, scales="free") +
  scale_x_date(labels = date_format("%m/%d/%Y"))
