r + dplyr filtering out time series


Problem description

I have some data that looks at a group of people and the fruits they eat over time. I want to use dplyr to look at each individual person up until they eat a banana and summarise all the fruits they ate up until they eat their first banana.

data:

data <-  structure(list(user = c(1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 
    1234L, 1234L, 1234L, 1234L, 1234L, 1234L, 9584L, 9584L, 9584L, 
    9584L, 9584L, 9584L, 9584L, 9584L, 9584L, 4758L, 4758L, 4758L, 
    4758L, 4758L, 4758L), site = structure(c(1L, 6L, 1L, 1L, 6L, 
    5L, 5L, 3L, 4L, 1L, 2L, 6L, 1L, 6L, 5L, 5L, 3L, 2L, 6L, 6L, 6L, 
    4L, 2L, 5L, 5L, 4L, 2L), .Label = c("apple", "banana", "lemon", 
    "lime", "orange", "pear"), class = "factor"), time = c(1L, 2L, 
    3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 
    6L, 7L, 8L, 9L, 5L, 6L, 7L, 8L, 9L, 10L), int = structure(c(2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 
    1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L), .Label = c("banana", 
    "other"), class = "factor")), .Names = c("user", "site", "time", 
    "int"), row.names = c(NA, -27L), class = "data.frame")

My initial thought would be to group the data to find the first instance of each user eating a banana:

data <- data %>% transform(var = ifelse(site=="banana", 'banana','other'))

data_ban <- data %>% 
    filter(var=='banana') %>% 
    group_by(user, var, time) %>%
    group_by(user) %>%
    summarise(first_banana = min(time))

But now I'm stuck on how to actually apply this back to the original "data" dataframe, and set a filter that says: for each user, only include data up until the time given in "data_ban". Any ideas?

Solution

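You could try slice(), which keeps rows by position within each group. This keeps every row up to and including each user's first banana, assuming the rows are already ordered by time for each user: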

data %>%
     group_by(user) %>% 
     slice(1:(which(int=='banana')[1L]))
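
One caveat with the slice() call: which(...)[1L] is NA for a user who never eats a banana, so the 1:NA range would fail for that group. The sketch below shows two alternatives that drop such users instead. The first joins the per-user first-banana time back onto the data, which is what the question asked about; the second filters on a running banana count. The toy data frame and the names toy, banana_times, up_to_banana_join and up_to_banana_cumsum are made up for illustration and are not from the original post.

```r
library(dplyr)

# Hypothetical toy data (not the question's data); user 3 never eats a banana.
toy <- data.frame(
  user = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  site = c("apple", "pear", "banana", "lime",
           "banana", "apple", "banana",
           "lemon", "orange"),
  time = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 5L, 6L),
  stringsAsFactors = FALSE
)

# Approach 1: find each user's first banana time, join it back, then filter.
banana_times <- toy %>%
  filter(site == "banana") %>%
  group_by(user) %>%
  summarise(first_banana = min(time))

up_to_banana_join <- toy %>%
  inner_join(banana_times, by = "user") %>%  # users without a banana drop out here
  filter(time <= first_banana) %>%
  select(-first_banana)

# Approach 2: within each user, keep rows that have no banana on any earlier row.
up_to_banana_cumsum <- toy %>%
  arrange(user, time) %>%
  group_by(user) %>%
  filter(any(site == "banana"),
         cumsum(site == "banana") - (site == "banana") == 0) %>%
  ungroup()
```

Both versions keep the first banana row itself. The join version compares on time, so it does not depend on row order; the cumsum version sorts explicitly with arrange() before filtering. From either result, a per-user summary such as count(user, site) gives the fruits eaten up to the first banana.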
