Consecutive count by groups

Question

How do you generate conditional cumulative counts by group? Specifically, my data has these columns: individual name, date, month, and temperature. I want to generate a table that shows, for each individual within each month, the longest run of consecutive days on which the temperature exceeded 38°C.

One answer explains how to do cumulative counts by group (How to Perform Consecutive Counts of Column by Group Conditionally Upon Another Column), but I'm not sure how to add the condition that only days with a temperature above 38°C are counted.

The original table looks like this:

Individual name | Month | Date   | Temperature
Greg            | 1     | 2/1/16 | 26
Greg            | 1     | 3/1/16 | 25
Greg            | 1     | 4/1/16 | 39
Greg            | 1     | 5/1/16 | 39
Fred            | 1     | 2/1/16 | 40
Fred            | 1     | 3/1/16 | 41
Fred            | 1     | 4/1/16 | 41
Fred            | 1     | 5/1/16 | 41

This is what I want to generate:

Individual name | Month | Largest consecutive string of days >38°C
Greg            | 1     | 2
Fred            | 1     | 4

Recommended answer

Here is an option using data.table. Convert the 'data.frame' to a 'data.table' with setDT(df1). Grouped by 'Individual_name', 'Month' and the run-length ID of the logical vector (Temperature > 38), take the sum of the logical vector, which gives the length of each run above the threshold and 0 for every other run. Then, grouped by 'Individual_name' and 'Month', take the max of the summarised column ('V1') from the earlier step.

library(data.table)
# Step 1: one row per run; V1 is the run length for runs above 38, otherwise 0
# Step 2: longest run per individual and month
setDT(df1)[, sum(Temperature > 38), .(Individual_name, Month, grp=rleid(Temperature > 38))
      ][, .(LargestConsec = max(V1)), .(Individual_name, Month)]
#   Individual_name Month LargestConsec
#1:            Greg     1             2
#2:            Fred     1             4
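
To see why grouping on runs of the logical vector works, it can help to look at what rle returns for one person's readings. The following is a minimal base R sketch using Greg's four temperatures from the sample data and the threshold from the question (38); the variable names (greg_temps, hot, runs, longest_hot_run) are only illustrative and are not part of the answer.

```r
# Greg's January readings from the sample data
greg_temps <- c(26L, 25L, 39L, 39L)

# Logical vector: TRUE on days above the threshold
hot <- greg_temps > 38

# rle() collapses consecutive equal values into runs
runs <- rle(hot)
runs$lengths   # 2 2
runs$values    # FALSE TRUE

# Longest run of TRUE values; the extra 0L covers the case of no hot days
longest_hot_run <- max(runs$lengths[runs$values], 0L)
longest_hot_run  # 2
```

The data.table version does the same thing per group: rleid labels each run, and summing the logical vector within a run gives that run's length when it is a TRUE run.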


Or, using dplyr, create a function around rle that extracts the lengths corresponding to the TRUE elements in 'values' (the rle is run on a logical vector) and returns the largest one. Grouped by 'Individual_name' and 'Month', apply the function to 'Temperature' to get the length of the longest consecutive run for each group.

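# Longest run of consecutive values in 'vec' that exceed 'thresh'
f1 <- function(vec, thresh) {
    # The trailing 0L returns 0 (instead of -Inf with a warning) when no value exceeds 'thresh'
    with(rle(vec > thresh), max(lengths[values], 0L))
}

library(dplyr)
df1 %>% 
    group_by(Individual_name, Month) %>% 
    summarise(LargestConsec = f1(Temperature, 38))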
#   Individual_name Month LargestConsec
#            <chr> <int>         <int>
#1            Fred     1             4
#2            Greg     1             2
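
If neither package is available, roughly the same result can be obtained in base R with aggregate. This is a sketch of an alternative, not part of the original answer; the helper name longest_run_above is hypothetical, and the data frame is rebuilt inline (without the Date column) so the snippet runs on its own.

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df1 <- data.frame(
  Individual_name = rep(c("Greg", "Fred"), each = 4),
  Month = 1L,
  Temperature = c(26L, 25L, 39L, 39L, 40L, 41L, 41L, 41L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: longest run of consecutive values above 'thresh'
longest_run_above <- function(vec, thresh) {
  runs <- rle(vec > thresh)
  max(runs$lengths[runs$values], 0L)  # 0 when nothing exceeds 'thresh'
}

# Apply the helper to Temperature within each individual/month group
res <- aggregate(
  Temperature ~ Individual_name + Month,
  data = df1,
  FUN = longest_run_above,
  thresh = 38
)
names(res)[3] <- "LargestConsec"
res
```

Like the two approaches above, this assumes the rows are already in date order within each individual and month, and that adjacent rows are adjacent days.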

Data

df1 <- structure(list(Individual_name = c("Greg", "Greg", "Greg", "Greg", 
"Fred", "Fred", "Fred", "Fred"), Month = c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), Date = c("2/1/16", "3/1/16", "4/1/16", "5/1/16", 
"2/1/16", "3/1/16", "4/1/16", "5/1/16"), Temperature = c(26L, 
25L, 39L, 39L, 40L, 41L, 41L, 41L)), .Names = c("Individual_name", 
"Month", "Date", "Temperature"), class = "data.frame", row.names = c(NA, 
-8L))
