R: using dplyr to cut fixed time intervals that contain 2 or more variables
Problem description
I have a dataframe
    df <- data.frame(
      time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
               "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
               "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
               "2015-09-07 04:04:22"),
      inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"))

    > df
                      time inOut
    1  2015-09-07 00:32:19    IN
    2  2015-09-07 01:02:30   OUT
    3  2015-09-07 01:31:36    IN
    4  2015-09-07 01:47:45    IN
    5  2015-09-07 02:00:17    IN
    6  2015-09-07 02:07:30    IN
    7  2015-09-07 03:39:41    IN
    8  2015-09-07 04:04:21   OUT
    9  2015-09-07 04:04:21    IN
    10 2015-09-07 04:04:22   OUT
I want to count the IN and OUT events in each 15-minute interval. I can do this by splitting the data into in_df and out_df, cutting each of them into 15-minute intervals, and then merging the two tables. The resulting outdf is my expected result.
    in_df <- df[which(df$inOut == "IN"), ]
    out_df <- df[which(df$inOut == "OUT"), ]
    a <- data.frame(table(cut(as.POSIXct(in_df$time), breaks = "15 mins")))
    b <- data.frame(table(cut(as.POSIXct(out_df$time), breaks = "15 mins")))
    colnames(b) <- c("Time", "Out")
    colnames(a) <- c("Time", "In")
    outdf <- merge(a, b, all = TRUE)
    outdf[is.na(outdf)] <- 0

    > outdf
                      Time In Out
    1  2015-09-07 00:32:00  1   0
    2  2015-09-07 00:47:00  0   0
    3  2015-09-07 01:02:00  0   1
    4  2015-09-07 01:17:00  1   0
    5  2015-09-07 01:32:00  0   0
    6  2015-09-07 01:47:00  2   0
    7  2015-09-07 02:02:00  1   0
    8  2015-09-07 02:17:00  0   0
    9  2015-09-07 02:32:00  0   0
    10 2015-09-07 02:47:00  0   0
    11 2015-09-07 03:02:00  0   0
    12 2015-09-07 03:17:00  0   0
    13 2015-09-07 03:32:00  1   0
    14 2015-09-07 03:47:00  0   0
    15 2015-09-07 04:02:00  1   2
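As an aside, the split-and-merge steps above could probably be condensed into a single two-way table() call in base R. This is only a sketch, not part of the original question: the names times, timeCut, tab and outdf2 are illustrative, and the tz = "UTC" argument is an assumption added to keep the interval labels independent of the local time zone.

```r
# Sample data from the question
df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"),
  stringsAsFactors = FALSE
)

# Cut all timestamps once; the factor keeps every 15-minute level,
# including intervals that contain no events (tz = "UTC" is an assumption)
times   <- as.POSIXct(df$time, tz = "UTC")
timeCut <- cut(times, breaks = "15 min")

# Cross-tabulate interval against IN/OUT; empty cells are counted as 0
tab <- table(timeCut, df$inOut)

# Rebuild a plain data frame with the same columns as outdf
outdf2 <- data.frame(
  Time = rownames(tab),
  In   = as.vector(tab[, "IN"]),
  Out  = as.vector(tab[, "OUT"])
)
print(outdf2)
```

Because cut() returns a factor with a level for every interval, table() reports zero counts without a separate merge or NA replacement step.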
I asked a similar question in "R using data.table to cut fix time interval that contain 2 or more variables", and Frank provided a good data.table solution. I wonder whether someone has a solution for dplyr, ideally with a command as concise as Frank's data.table solution:

    df[J(levels(timeCut)), as.list(table(inOut)), by=.EACHI]

With dplyr I have tried the code below, but the result is missing the intervals with zero counts (e.g., 2015-09-07 00:47:00 0 0). I would also like separate IN and OUT count columns, as in my expected result (outdf). Please comment, thanks.
    as.data.frame(df %>%
      group_by(inOut, timeCut = cut(as.POSIXct(time), breaks = "15 min")) %>%
      summarise(n()))

      inOut             timeCut n()
    1    IN 2015-09-07 00:32:00   1
    2    IN 2015-09-07 01:17:00   1
    3    IN 2015-09-07 01:47:00   2
    4    IN 2015-09-07 02:02:00   1
    5    IN 2015-09-07 03:32:00   1
    6    IN 2015-09-07 04:02:00   1
    7   OUT 2015-09-07 01:02:00   1
    8   OUT 2015-09-07 04:02:00   2
Solution
Another solution using dplyr and reshape2:

    library(dplyr)
    library(reshape2)

    # All 15-minute interval labels, including intervals with no events
    my_levels <-
      data_frame(timeCut = levels(cut(as.POSIXct(df$time), breaks = "15 min")))

    my_df <-
      df %>%
      mutate(timeCut = cut(as.POSIXct(time), breaks = "15 min")) %>%
      mutate_each(funs(as.character)) %>%
      right_join(., my_levels) %>%   # adds a row for every empty interval
      select(-time) %>%
      dcast(timeCut ~ inOut, length) # one count column per inOut value
Result
                   timeCut IN OUT NA
    1  2015-09-07 00:32:00  1   0  0
    2  2015-09-07 00:47:00  0   0  1
    3  2015-09-07 01:02:00  0   1  0
    4  2015-09-07 01:17:00  1   0  0
    5  2015-09-07 01:32:00  0   0  1
    6  2015-09-07 01:47:00  2   0  0
    7  2015-09-07 02:02:00  1   0  0
    8  2015-09-07 02:17:00  0   0  1
    9  2015-09-07 02:32:00  0   0  1
    10 2015-09-07 02:47:00  0   0  1
    11 2015-09-07 03:02:00  0   0  1
    12 2015-09-07 03:17:00  0   0  1
    13 2015-09-07 03:32:00  1   0  0
    14 2015-09-07 03:47:00  0   0  1
    15 2015-09-07 04:02:00  1   2  0
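The NA column in this result comes from the rows that right_join() adds for empty intervals, where inOut is missing. A possible way to avoid it is to count first and fill in the empty intervals afterwards. The sketch below is not from the original answer: it assumes the tidyr package (version 1.0 or later, for pivot_wider()) is available, the names counts and wide are illustrative, and tz = "UTC" is an assumption added so that the labels do not depend on the local time zone.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr >= 1.0 is installed

# Sample data from the question
df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"),
  stringsAsFactors = FALSE
)

counts <- df %>%
  # timeCut is a factor whose levels cover every 15-minute interval
  mutate(timeCut = cut(as.POSIXct(time, tz = "UTC"), breaks = "15 min")) %>%
  # one row per observed (interval, inOut) pair, with its count in n
  count(timeCut, inOut) %>%
  # add the unobserved pairs; for a factor, complete() uses all levels
  complete(timeCut, inOut, fill = list(n = 0L))

# Spread the IN and OUT counts into separate columns
wide <- counts %>%
  pivot_wider(names_from = inOut, values_from = n)

print(as.data.frame(wide))
```

If this behaves as intended, wide should have one row per 15-minute interval with IN and OUT columns matching outdf, and no NA column, because the empty intervals are filled in after counting rather than joined in as rows with a missing inOut.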