R + ggplot: Time series with events

Question

I'm an R/ggplot newbie. I would like to create a geom_line plot of a continuous variable time series and then add a layer composed of events. The continuous variable and its timestamps are stored in one data.frame; the events and their timestamps are stored in another data.frame.

What I would really like to do is something like the charts on finance.google.com. In those, the time series is stock-price and there are "flags" to indicate news-events. I'm not actually plotting finance stuff, but the type of graph is similar. I am trying to plot visualizations of log file data. Here's an example of what I mean...

If advisable (?), I would like to use separate data.frames for each layer (one for continuous variable observations, another for events).

After some trial and error this is about as close as I can get. Here, I am using example data from data sets that come with ggplot. "economics" contains some time-series data that I'd like to plot and "presidential" contains a few events (presidential elections).

library(ggplot2)
data(presidential)
data(economics)

presidential <- presidential[-(1:3),]
yrng <- range(economics$unemploy)
ymin <- yrng[1]
ymax <- yrng[1] + 0.1*(yrng[2]-yrng[1])

p2 <- ggplot()
p2 <- p2 + geom_line(mapping=aes(x=date, y=unemploy), data=economics , size=3, alpha=0.5) 
p2 <- p2 + scale_x_date("time") +  scale_y_continuous(name="unemployed [1000's]")
p2 <- p2 + geom_segment(mapping=aes(x=start,y=ymin, xend=start, yend=ymax, colour=name), data=presidential, size=2, alpha=0.5)
p2 <- p2 + geom_point(mapping=aes(x=start,y=ymax, colour=name ), data=presidential, size=3) 
p2 <- p2 + geom_text(mapping=aes(x=start, y=ymax, label=name), angle=20, hjust=-0.1, vjust=0.1, size=6, data=presidential)
p2

Questions:

  • This is OK for very sparse events, but if there's a cluster of them (as often happens in a log file), it gets messy. Is there some technique I can use to neatly display a bunch of events occurring in a short time interval? I was thinking of position_jitter, but it was really hard for me to get this far. Google Charts stacks these event "flags" on top of each other if there are a lot of them.

  • I actually don't like sticking the event data in the same scale as the continuous measurement display. I would prefer to put it in a facet_grid. The problem is that the facets all must be sourced from the same data.frame (not sure if that's true). If so, that also seems not ideal (or maybe I'm just trying to avoid using reshape?)

Solution

As much as I like @JD Long's answer, I'll post one that uses only R/ggplot2.

The approach is to create a second data set of events and to use that to determine positions. Starting with what @Angelo had:

library(ggplot2)
data(presidential)
data(economics)

Pull out the event (presidential) data and transform it. Compute the baseline from the economic data it will be plotted with, and the offset step (delta) as a fraction of that data's range. Set the bottom of each bar (ymin) to the baseline. This is where the tricky part comes: labels that are too close together need to be staggered. So determine the spacing between adjacent labels (this assumes the events are sorted by date). If it is less than some amount (I picked about 4 years for this scale of data), note that that label needs to be higher. But it has to be higher than the one after it, so use rle to get the lengths of the runs of TRUEs (that is, labels that must be higher) and compute an offset vector from them: each run of l TRUEs counts down from l+1 to 2, and the FALSEs sit at an offset of 1. Use this to determine the top of the bars (ymax).

events <- presidential[-(1:3),]
baseline <- min(economics$unemploy)
delta <- 0.05 * diff(range(economics$unemploy))
events$ymin <- baseline
# days from each event to the next one; the last event gets Inf
events$timelapse <- c(diff(events$start), Inf)
events$bump <- events$timelapse < 4*370 # ~4 years
offsets <- rle(events$bump)
# SIMPLIFY=FALSE keeps mapply from returning a matrix when all runs have the same length
events$offset <- unlist(mapply(function(l,v) {if(v){(l:1)+1}else{rep(1,l)}}, l=offsets$lengths, v=offsets$values, SIMPLIFY=FALSE, USE.NAMES=FALSE))
events$ymax <- events$ymin + events$offset * delta
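To see what the rle step produces, here is the same staggering logic wrapped in a small helper. The function name `stagger_offsets` and the toy dates are hypothetical, made up for illustration; this is a sketch of the technique described above, not code from the original answer.

```r
# Hypothetical helper: rle-based staggering for any sorted vector of Dates.
stagger_offsets <- function(times, min_gap) {
  # Gap (in days for Dates) from each event to the next; the last has none.
  gaps <- c(as.numeric(diff(times)), Inf)
  # TRUE means "too close to the next event, so draw this flag higher".
  bump <- gaps < min_gap
  runs <- rle(bump)
  # A run of l TRUEs counts down l+1, l, ..., 2; FALSEs stay at offset 1.
  # SIMPLIFY = FALSE avoids a matrix result when all runs share a length.
  unlist(mapply(function(l, v) if (v) (l:1) + 1 else rep(1, l),
                runs$lengths, runs$values,
                SIMPLIFY = FALSE, USE.NAMES = FALSE))
}

# Toy input: three events within one year, then a long gap.
toy <- as.Date(c("2000-01-01", "2000-06-01", "2000-09-01", "2010-01-01"))
stagger_offsets(toy, min_gap = 4 * 370)
#> [1] 3 2 1 1
```

The first two flags are bumped to heights 3 and 2 so they clear the third one, which sits at the base offset of 1 like the isolated last event.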

Putting this together into a plot:

ggplot() +
    geom_line(mapping=aes(x=date, y=unemploy), data=economics , size=3, alpha=0.5) +
    geom_segment(data = events, mapping=aes(x=start, y=ymin, xend=start, yend=ymax)) +
    geom_point(data = events, mapping=aes(x=start,y=ymax), size=3) +
    geom_text(data = events, mapping=aes(x=start, y=ymax, label=name), hjust=-0.1, vjust=0.1, size=6) +
    scale_x_date("time") +  
    scale_y_continuous(name="unemployed [1000's]")
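On the second question: ggplot2 does not require faceted layers to share one data.frame. Each layer can keep its own data as long as every layer carries the faceting column. A minimal sketch of that route, assuming a recent ggplot2; the `panel` column name and the constant flag height of 1 are illustrative choices, not part of the original answer.

```r
library(ggplot2)

# Hypothetical "panel" column added to both data frames so facet_grid()
# can split the layers into two rows without reshaping into one data.frame.
panels <- c("unemployment", "events")
econ <- economics
econ$panel <- factor("unemployment", levels = panels)
events <- presidential[-(1:3), ]
events$panel <- factor("events", levels = panels)

p <- ggplot() +
  geom_line(data = econ, mapping = aes(x = date, y = unemploy)) +
  # Flags drawn at a constant height of 1 inside their own panel.
  geom_segment(data = events,
               mapping = aes(x = start, xend = start, y = 0, yend = 1)) +
  geom_point(data = events, mapping = aes(x = start, y = 1), size = 3) +
  geom_text(data = events, mapping = aes(x = start, y = 1, label = name),
            hjust = -0.1, vjust = 0.1) +
  # free_y gives each panel its own y range.
  facet_grid(panel ~ ., scales = "free_y") +
  labs(x = "time", y = NULL)
p
```

Both panels get the same height and the events panel's y axis carries no meaning, which is the trickiness with different scales that makes composing two separate plots attractive.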

You could facet, but it is tricky with different scales. Another approach is composing two graphs. There is some extra fiddling that has to be done to make sure the plots have the same x-range, to make the labels all fit in the lower plot, and to eliminate the x axis in the upper plot.

xrange = range(c(economics$date, events$start))

p1 <- ggplot(data=economics, mapping=aes(x=date, y=unemploy)) +
    geom_line(size=3, alpha=0.5) +
    scale_x_date("", limits=xrange) +  
    scale_y_continuous(name="unemployed [1000's]") +
    theme(axis.text.x = element_blank(), axis.title.x = element_blank())

ylims <- c(0, (max(events$offset)+1)*delta) + baseline
p2 <- ggplot(data = events, mapping=aes(x=start)) +
    geom_segment(mapping=aes(y=ymin, xend=start, yend=ymax)) +
    geom_point(mapping=aes(y=ymax), size=3) +
    geom_text(mapping=aes(y=ymax, label=name), hjust=-0.1, vjust=0.1, size=6) +
    scale_x_date("time", limits=xrange) +
    scale_y_continuous("", breaks=NULL, limits=ylims)

#install.packages("ggExtra", repos="http://R-Forge.R-project.org")
library(ggExtra)

align.plots(p1, p2, heights=c(3,1))
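The R-Forge ggExtra package and its align.plots() may not be available in current R installations. As a hedged alternative, the patchwork package (swapped in here; it is not what the original answer used) stacks the two plots and aligns their panel areas. A minimal sketch, assuming recent ggplot2 and patchwork, with flags at a constant height instead of the staggered ymax computed earlier:

```r
library(ggplot2)
library(patchwork)

events <- presidential[-(1:3), ]
# Shared x range so both time axes cover the same dates.
xrange <- range(c(economics$date, events$start))

p1 <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line() +
  scale_x_date(limits = xrange) +
  labs(x = NULL, y = "unemployed [1000's]") +
  # Hide the upper plot's x axis; the lower plot shows the time axis.
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())

p2 <- ggplot(events, aes(x = start)) +
  geom_segment(aes(y = 0, xend = start, yend = 1)) +
  geom_point(aes(y = 1), size = 3) +
  geom_text(aes(y = 1, label = name), hjust = -0.1, vjust = 0.1) +
  scale_x_date("time", limits = xrange) +
  scale_y_continuous(NULL, breaks = NULL, limits = c(0, 1.5))

# "/" stacks vertically; heights gives the 3:1 split used above.
combined <- p1 / p2 + plot_layout(heights = c(3, 1))
combined
```

The ymin/ymax columns from the events data frame built earlier can replace the constant heights to keep the staggered flags.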
