Split time series data into time intervals (say an hour) and then plot the count

Problem description

I just have a data file with one column of time series:

'2012-02-01 17:42:44'
'2012-02-01 17:42:44'
'2012-02-01 17:42:44'

...

I want to split the data up so that I have a count at the top of each hour. Say:

'2012-02-01 17:00:00'  20   
'2012-02-01 18:00:00'  30  

The '20' and '30' represent the number of time series entries for that hour period. I want to be able to graph the time against that count. How can I do this with R?

Here is my current line graph plot.

library(ggplot2)

req <- read.table("times1.dat")
summary(req)

da <- req$V2
db <- req$V1

time <- as.POSIXct(db)

png('time_data_errs.png', width=800, height=600)
gg <- qplot(time, da) + geom_line()

print(gg)
dev.off()

Solution

It sounds like you want to use cut to figure out how many values occur within an hour.

It's generally helpful if you can provide some sample data. Here's some:

set.seed(1) # So you can get the same numbers as I do
MyDates <- ISOdatetime(2012, 1, 1, 0, 0, 0, tz = "GMT") + sample(1:27000, 500)
head(MyDates)
# [1] "2012-01-01 01:59:29 GMT" "2012-01-01 02:47:27 GMT" "2012-01-01 04:17:46 GMT"
# [4] "2012-01-01 06:48:39 GMT" "2012-01-01 01:30:45 GMT" "2012-01-01 06:44:13 GMT"

You can use table and cut with the argument breaks="hour" to find the frequencies per hour (see ?cut.Date for more info).

MyDatesTable <- table(cut(MyDates, breaks="hour"))
MyDatesTable
# 
# 2012-01-01 00:00:00 2012-01-01 01:00:00 2012-01-01 02:00:00 2012-01-01 03:00:00 
#                  59                  73                  74                  83 
# 2012-01-01 04:00:00 2012-01-01 05:00:00 2012-01-01 06:00:00 2012-01-01 07:00:00 
#                  52                  62                  64                  33 
# Or a data.frame if you prefer
data.frame(MyDatesTable)
#                  Var1 Freq
# 1 2012-01-01 00:00:00   59
# 2 2012-01-01 01:00:00   73
# 3 2012-01-01 02:00:00   74
# 4 2012-01-01 03:00:00   83
# 5 2012-01-01 04:00:00   52
# 6 2012-01-01 05:00:00   62
# 7 2012-01-01 06:00:00   64
# 8 2012-01-01 07:00:00   33
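
Applied to the file format from the question, the same steps might look like the sketch below. The four sample lines are made up and only stand in for the contents of times1.dat, and the GMT time zone is an assumption; adjust both to your data.

```r
# Stand-in for the contents of times1.dat (hypothetical values,
# one single-quoted timestamp per line, as in the question)
raw <- c("'2012-02-01 17:42:44'",
         "'2012-02-01 17:42:44'",
         "'2012-02-01 17:59:01'",
         "'2012-02-01 18:05:10'")

# read.table() treats single quotes as quoting characters by default,
# so each line becomes one value in column V1
req <- read.table(text = raw, stringsAsFactors = FALSE)

# Parse the strings as date-times; the time zone is an assumption
times <- as.POSIXct(req$V1, tz = "GMT")

# Count the entries that fall in each hour
counts <- table(cut(times, breaks = "hour"))
counts
```

With a real file, replacing `text = raw` with the file name (for example `read.table("times1.dat", stringsAsFactors = FALSE)`) should be the only change needed.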

Finally, here's a line plot of the MyDatesTable object:

plot(MyDatesTable, type="l", xlab="Time", ylab="Freq")


cut can handle a range of time intervals. For example, if you wanted to tabulate for every 30 minutes, you can easily adapt the breaks argument to handle that:

data.frame(table(cut(MyDates, breaks = "30 mins")))
#                   Var1 Freq
# 1  2012-01-01 00:00:00   22
# 2  2012-01-01 00:30:00   37
# 3  2012-01-01 01:00:00   38
# 4  2012-01-01 01:30:00   35
# 5  2012-01-01 02:00:00   32
# 6  2012-01-01 02:30:00   42
# 7  2012-01-01 03:00:00   39
# 8  2012-01-01 03:30:00   44
# 9  2012-01-01 04:00:00   25
# 10 2012-01-01 04:30:00   27
# 11 2012-01-01 05:00:00   33
# 12 2012-01-01 05:30:00   29
# 13 2012-01-01 06:00:00   29
# 14 2012-01-01 06:30:00   35
# 15 2012-01-01 07:00:00   33
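
One thing to watch for: as far as I can tell from the source of cut for date-times, a string such as "30 mins" starts the first bin at the minute of the earliest observation, not at a clock half-hour. The sample data above happens to start within the first minute after midnight, so the bins line up. If you want bins anchored on the clock regardless of the data, you can pass explicit break points instead. A small sketch with made-up times (the names first_edge and edges are only for illustration):

```r
# Hypothetical times: 17:05:10, 17:10:00, 17:16:40, 17:33:20, 18:06:40
times <- as.POSIXct("2012-02-01 17:00:00", tz = "GMT") +
  c(310, 600, 1000, 2000, 4000)

# String spec: the first bin starts at 17:05:00 (minute of the earliest time)
by_string <- table(cut(times, breaks = "30 mins"))

# Explicit edges on the clock half-hours, starting at the top of the hour
# that contains the earliest time and running just past the latest one
first_edge <- as.POSIXct(trunc(min(times), units = "hours"))
edges <- seq(first_edge, max(times) + 30 * 60, by = "30 mins")
by_edges <- table(cut(times, breaks = edges))

as.vector(by_string)  # counts for bins starting 17:05, 17:35, 18:05
as.vector(by_edges)   # counts for bins starting 17:00, 17:30, 18:00
```

Both calls count every observation once; only the bin boundaries differ.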


Update

Since you were trying to plot with ggplot2, here's one approach (not sure if it is the best, since I usually use base R's graphics when I need to plot).

Create a data.frame of the table (as demonstrated above) and add a dummy "group" variable and plot that as follows:

MyDatesDF <- data.frame(MyDatesTable, grp = 1)
ggplot(MyDatesDF, aes(Var1, Freq)) + geom_line(aes(group = grp))
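
The dummy group is needed because Var1 is a factor. A possible alternative, sketched here under my own assumptions and not the only way to do it, is to build the hourly bin edges yourself so that the time column stays a real date-time. ggplot2 can then draw a continuous time axis and no group variable is required. The sample times and the names edges, bin and HourlyDF are made up for illustration.

```r
# Hypothetical sample timestamps; note that nothing falls in the 18:00 hour
times <- as.POSIXct(c("2012-02-01 17:42:44", "2012-02-01 17:42:44",
                      "2012-02-01 17:59:01", "2012-02-01 19:05:10"),
                    tz = "GMT")

# Hourly bin edges covering the whole range of the data
first_hour <- as.POSIXct(trunc(min(times), units = "hours"))
last_hour  <- as.POSIXct(trunc(max(times), units = "hours"))
edges <- seq(first_hour, last_hour + 3600, by = "hour")

# findInterval() assigns each time to the bin [edge_i, edge_{i+1})
bin <- findInterval(as.numeric(times), as.numeric(edges))

# One row per hour, including hours with a count of zero
HourlyDF <- data.frame(Time = edges[-length(edges)],
                       Freq = tabulate(bin, nbins = length(edges) - 1))

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  gg <- ggplot(HourlyDF, aes(Time, Freq)) + geom_line()
}
```

Calling print(gg) inside the png()/dev.off() pair from the question would then write the chart to a file as before.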
