Is there a way to loop through data based on a factor in a column and add up the number of rows?

Question

I have some data with multiple observations of the same event. I want to condense observations that fall within a time threshold of each other, and I also want to know how many I am condensing (i.e. how many observations become one). I'm not sure how to loop through my data frame to do that.

I've tried writing a for loop, if statements, and while statements, and I have searched extensively on Google and Stack Overflow, but nothing I found seems to match what I need to do.

Here is a subset of my data:

structure(list(date.time = structure(c(1465877617, 1465877774, 
1465877816, 1465877844, 1465912214, 1465912806, 1465912862, 1465914033
), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"), 
    time = structure(1:8, .Label = c("00:13:37", "00:16:14", 
    "00:16:56", "00:17:24", "09:50:14", "10:00:06", "10:01:02", 
    "10:20:33"), class = "factor"), X = c(1, 1, 1, 1, 1, 1, 1, 
    1), diff_time1 = structure(c(157, 42, 28, 34370, 592, 56, 
    1171, 2820), class = "difftime", units = "secs"), diff_time2 = c(FALSE, 
    FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, TRUE), new = c("start", 
    "include", "include", "end", "start", "include", "end", "start-end"
    )), row.names = c(NA, 8L), class = "data.frame")

The goal is to get it to look like the output below, but with an additional column giving the sample size for each "smushed" observation:

structure(list(n = 1:8, end = structure(c(1465877844, 1465912862, 
1465914033, 1465916853, 1465921999, 1465928992, 1465933159, 1465937668
), class = c("POSIXct", "POSIXt")), start = structure(c(1465877617, 
1465912214, 1465914033, 1465916853, 1465921999, 1465928647, 1465932867, 
1465937418), class = c("POSIXct", "POSIXt")), date = structure(c(16966, 
16966, 16966, 16966, 16966, 16966, 16966, 16966), class = "Date")), row.names = c(NA, 
-8L), class = c("tbl_df", "tbl", "data.frame"))

Recommended answer

library(dplyr)
library(lubridate)

# df is the data frame from the question, assumed to be sorted by date.time
df %>%
  # Minutes since the previous row (0 for the first row)
  mutate(time_since_last = (date.time - lag(date.time, default = first(date.time))) / dminutes(1)) %>%
  # How many times was there a 15min+ gap? Each new one increments "group"
  mutate(group = 1 + cumsum(time_since_last > 15)) %>%
  group_by(group) %>%
  summarize(first = min(date.time), # or first(date.time) if sorted
            last  = max(date.time), # or last(date.time) if sorted
            count = n())

## A tibble: 3 x 4
#  group first               last                count
#  <dbl> <dttm>              <dttm>              <int>
#1     1 2016-06-14 00:13:37 2016-06-14 00:17:24     4
#2     2 2016-06-14 09:50:14 2016-06-14 10:01:02     3
#3     3 2016-06-14 10:20:33 2016-06-14 10:20:33     1
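
The answer above relies on one idea: flag every row whose gap from the previous row exceeds the threshold, then take a running sum of those flags to get a group id. For anyone who wants to try this without dplyr or lubridate, here is a minimal base R sketch of the same idea. It rebuilds only the date.time column from the question's dput() output; the 15-minute threshold is taken from the answer, and the names threshold_mins, gap_mins, and condensed are illustrative, not part of the original post.

```r
# Timestamps copied from the question's dput() output (seconds since the epoch).
tz <- "America/New_York"
df <- data.frame(
  date.time = as.POSIXct(
    c(1465877617, 1465877774, 1465877816, 1465877844,
      1465912214, 1465912806, 1465912862, 1465914033),
    origin = "1970-01-01", tz = tz
  )
)

# Assumption taken from the answer: a gap longer than 15 minutes starts a new event.
threshold_mins <- 15

# Minutes since the previous row; the first row has no predecessor, so use 0.
# This assumes the rows are already sorted by date.time.
secs <- as.numeric(df$date.time)
gap_mins <- c(0, diff(secs) / 60)

# Every gap above the threshold increments the running group id.
df$group <- 1L + cumsum(gap_mins > threshold_mins)

# Collapse each group to its first time, last time, and number of rows.
condensed <- data.frame(
  group = sort(unique(df$group)),
  first = as.POSIXct(as.numeric(tapply(secs, df$group, min)),
                     origin = "1970-01-01", tz = tz),
  last  = as.POSIXct(as.numeric(tapply(secs, df$group, max)),
                     origin = "1970-01-01", tz = tz),
  count = as.integer(table(df$group))
)

print(condensed)
```

On the eight rows from the question this should give three groups with counts 4, 3, and 1, in line with the tibble above. The count column is the "sample size" the question asks for; changing threshold_mins changes how aggressively observations are merged.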
