R using dplyr to cut fixed time intervals that contain 2 or more variables


Problem description


I have a data frame:

df <- data.frame(time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36", "2015-09-07 01:47:45",
"2015-09-07 02:00:17", "2015-09-07 02:07:30", "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21", "2015-09-07 04:04:22"), 
inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")) 

> df
                  time inOut
1  2015-09-07 00:32:19    IN
2  2015-09-07 01:02:30   OUT
3  2015-09-07 01:31:36    IN
4  2015-09-07 01:47:45    IN
5  2015-09-07 02:00:17    IN
6  2015-09-07 02:07:30    IN
7  2015-09-07 03:39:41    IN
8  2015-09-07 04:04:21   OUT
9  2015-09-07 04:04:21    IN
10 2015-09-07 04:04:22   OUT
> 

I want to count the IN and OUT events in each 15-minute interval. I can do this by splitting the data into in_df and out_df, cutting each one into 15-minute intervals, and then merging the two tables. The resulting outdf is my expected result.

in_df <- df[which(df$inOut== "IN"),]
out_df <- df[which(df$inOut== "OUT"),]

a <- data.frame(table(cut(as.POSIXct(in_df$time), breaks="15 mins")))
b <- data.frame(table(cut(as.POSIXct(out_df$time), breaks="15 mins")))
colnames(b) <- c("Time", "Out")
colnames(a) <- c("Time", "In")

outdf <- merge(a,b, all=TRUE)
outdf[is.na(outdf)] <- 0

> outdf
                  Time In Out
1  2015-09-07 00:32:00  1   0
2  2015-09-07 00:47:00  0   0
3  2015-09-07 01:02:00  0   1
4  2015-09-07 01:17:00  1   0
5  2015-09-07 01:32:00  0   0
6  2015-09-07 01:47:00  2   0
7  2015-09-07 02:02:00  1   0
8  2015-09-07 02:17:00  0   0
9  2015-09-07 02:32:00  0   0
10 2015-09-07 02:47:00  0   0
11 2015-09-07 03:02:00  0   0
12 2015-09-07 03:17:00  0   0
13 2015-09-07 03:32:00  1   0
14 2015-09-07 03:47:00  0   0
15 2015-09-07 04:02:00  1   2
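
The split-and-merge steps above can also be sketched as a single two-way table in base R. This is only an illustration, not part of the original question: the name outdf2 and the tz = "UTC" argument are assumptions added here so the example does not depend on the local time zone.

```r
df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")
)

# Assumption: parse in UTC so the result does not depend on the local time zone.
timeCut <- cut(as.POSIXct(df$time, tz = "UTC"), breaks = "15 min")

# Two-way table: one row per interval (every factor level, so empty
# intervals show up as zeros) and one column per inOut value.
tab <- table(timeCut, df$inOut)

# Hypothetical name outdf2: the same layout as outdf, built without merge().
outdf2 <- data.frame(
  Time = rownames(tab),
  In   = as.vector(tab[, "IN"]),
  Out  = as.vector(tab[, "OUT"])
)
```

Because table() counts over all levels of the cut() factor, no separate step is needed to fill the empty intervals with 0.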

I asked a similar question earlier (R using data.table to cut fix time interval that contain 2 or more variables), and Frank provided a good data.table solution. Does anyone have an equivalent solution for dplyr, ideally something as compact as Frank's data.table call: df[J(levels(timeCut)), as.list(table(inOut)), by=.EACHI]?

With dplyr I tried the code below, but the intervals with a count of zero are missing from its output (for example, 2015-09-07 00:47:00 0 0). I would also like separate IN and OUT count columns, as in my expected result (outdf). Any comments are welcome, thanks.

as.data.frame(df  %>% group_by(inOut, timeCut= cut(as.POSIXct(time), breaks="15 min"))   %>% summarise(n()))
  inOut             timeCut n()
1    IN 2015-09-07 00:32:00   1
2    IN 2015-09-07 01:17:00   1
3    IN 2015-09-07 01:47:00   2
4    IN 2015-09-07 02:02:00   1
5    IN 2015-09-07 03:32:00   1
6    IN 2015-09-07 04:02:00   1
7   OUT 2015-09-07 01:02:00   1
8   OUT 2015-09-07 04:02:00   2
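
The zero rows are missing because grouping only keeps the combinations that actually occur in the data. One possible way to keep them, sketched here under assumptions, is to make both grouping columns factors and pass .drop = FALSE to count(). This requires dplyr 0.8.0 or later, and the name counts and the tz = "UTC" argument are illustrative choices, not part of the original attempt.

```r
library(dplyr)

df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT")
)

counts <- df %>%
  mutate(
    # Both grouping columns are factors so that unused levels can be kept.
    inOut   = factor(inOut),
    # Assumption: UTC, so the result does not depend on the local time zone.
    timeCut = cut(as.POSIXct(time, tz = "UTC"), breaks = "15 min")
  ) %>%
  # .drop = FALSE keeps factor-level combinations that have no rows (n = 0).
  count(inOut, timeCut, .drop = FALSE)
```

This still returns the counts in long format, one row per inOut and interval pair, so it addresses only the missing zeros, not the wide IN/OUT layout.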

Solution

Another solution using dplyr and reshape2:

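library(dplyr)
library(reshape2)

# All 15-minute interval labels, including intervals with no events.
# (data_frame() is deprecated in later tibble releases in favour of tibble().)
my_levels <-
  data_frame(timeCut = levels(cut(as.POSIXct(df$time), breaks="15 min")))

my_df <- 
  df %>%
  mutate(timeCut = cut(as.POSIXct(time), breaks = "15 min")) %>% 
  # Convert every column to character so timeCut joins cleanly
  # (mutate_each() and funs() are deprecated in later dplyr releases).
  mutate_each(funs(as.character)) %>% 
  # Keep every interval; intervals with no events get NA for time and inOut.
  right_join(., my_levels) %>% 
  select(-time) %>% 
  # One column per inOut value, counting the rows in each interval.
  dcast(timeCut ~ inOut, length)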

Result

               timeCut IN OUT NA
1  2015-09-07 00:32:00  1   0  0
2  2015-09-07 00:47:00  0   0  1
3  2015-09-07 01:02:00  0   1  0
4  2015-09-07 01:17:00  1   0  0
5  2015-09-07 01:32:00  0   0  1
6  2015-09-07 01:47:00  2   0  0
7  2015-09-07 02:02:00  1   0  0
8  2015-09-07 02:17:00  0   0  1
9  2015-09-07 02:32:00  0   0  1
10 2015-09-07 02:47:00  0   0  1
11 2015-09-07 03:02:00  0   0  1
12 2015-09-07 03:17:00  0   0  1
13 2015-09-07 03:32:00  1   0  0
14 2015-09-07 03:47:00  0   0  1
15 2015-09-07 04:02:00  1   2  0
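
The NA column in this result comes from the right join: an interval with no events matches no row of df, so its inOut is NA and dcast counts that row under NA. A minimal sketch that avoids the extra column, assuming the tidyr package is available (it is not used in the original answer), counts first and then fills the missing combinations with complete(). The name result and the tz = "UTC" argument are illustrative assumptions.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  time = c("2015-09-07 00:32:19", "2015-09-07 01:02:30", "2015-09-07 01:31:36",
           "2015-09-07 01:47:45", "2015-09-07 02:00:17", "2015-09-07 02:07:30",
           "2015-09-07 03:39:41", "2015-09-07 04:04:21", "2015-09-07 04:04:21",
           "2015-09-07 04:04:22"),
  inOut = c("IN", "OUT", "IN", "IN", "IN", "IN", "IN", "OUT", "IN", "OUT"),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Assumption: UTC, so the result does not depend on the local time zone.
  mutate(timeCut = cut(as.POSIXct(time, tz = "UTC"), breaks = "15 min")) %>%
  # Count the events for each observed interval and inOut value.
  count(timeCut, inOut) %>%
  # complete() expands a factor to all of its levels, so the empty
  # intervals are added here with n filled in as 0.
  complete(timeCut, inOut, fill = list(n = 0L)) %>%
  # Spread the counts into one column per inOut value (IN, OUT).
  pivot_wider(names_from = inOut, values_from = n)
```

Here result should have one row per 15-minute interval with only the timeCut, IN and OUT columns, which matches the layout of outdf in the question.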
