Group by multiple variables and summarise with dplyr

Problem description

I'm trying to average CO2 concentration data every 30 seconds, for each of my sensors:

head(df)
# A tibble: 6 x 7
# Groups: BinnedTime [1]
  Sensor Date       Time   calCO2 DeviceTime          cuts   BinnedTime         
  <fctr> <date>     <time>  <dbl> <dttm>              <fctr> <chr>              
1 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
2 N1     2019-02-12 13:24     400 2019-02-12 13:24:02 (0,10] 2019-02-12 13:24:02
3 N1     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:03
4 N2     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
5 N3     2019-02-12 13:24     400 2019-02-12 13:24:03 (0,10] 2019-02-12 13:24:02
6 N3     2019-02-12 13:24     400 2019-02-12 13:24:05 (0,10] 2019-02-12 13:24:04

I'm using:

df %>%
  group_by(Sensor) %>%
  group_by(BinnedTime = cut(DeviceTime, breaks="30 sec")) %>%
  summarize(Concentration = mean(calCO2))

But it doesn't group by Sensor first; it ignores the sensors and calculates the average over BinnedTime alone. Any thoughts would be welcome.
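
The behaviour described here comes from group_by() itself: by default, a second group_by() call replaces the existing grouping instead of extending it. The minimal sketch below uses a hypothetical toy tibble (`toy` is invented for illustration) to show this, and to show the `.add = TRUE` argument (dplyr 1.0.0 or later; older versions spelled it `add = TRUE`), which keeps both grouping variables.

```r
library(dplyr)

# Hypothetical toy data: two sensors whose readings fall in the same bin
toy <- tibble(
  Sensor     = c("N1", "N1", "N2", "N2"),
  BinnedTime = "bin1",
  calCO2     = c(400, 410, 500, 520)
)

# The second group_by() discards the Sensor grouping
replaced <- toy %>%
  group_by(Sensor) %>%
  group_by(BinnedTime)
group_vars(replaced)   # "BinnedTime"

# .add = TRUE keeps Sensor and adds BinnedTime as a second grouping level
added <- toy %>%
  group_by(Sensor) %>%
  group_by(BinnedTime, .add = TRUE)
group_vars(added)      # "Sensor" "BinnedTime"

# One mean over everything (457.5) vs. one mean per sensor (405, 510)
summarise(replaced, Concentration = mean(calCO2))
summarise(added, Concentration = mean(calCO2), .groups = "drop")
```

Listing both variables in a single group_by() call has the same effect as `.add = TRUE`.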

I've read about .dots=c("Sensor","BinnedTime") but this doesn't work.
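
`.dots` belonged to dplyr's older standard-evaluation interface (group_by_() and related functions), which has since been deprecated. If the grouping columns genuinely need to be supplied as strings, one option is tidy evaluation: `rlang::syms()` turns the strings into symbols and `!!!` splices them into group_by(), as if the names had been typed unquoted. This is a minimal sketch; the `group_cols` vector and the toy data are hypothetical, and it only helps when BinnedTime already exists as a column (it does not replace computing the bins with cut()).

```r
library(dplyr)

# Hypothetical: grouping column names held as strings
group_cols <- c("Sensor", "BinnedTime")

toy <- tibble(
  Sensor     = c("N1", "N1", "N2"),
  BinnedTime = c("bin1", "bin1", "bin1"),
  calCO2     = c(400, 420, 500)
)

# syms() converts strings to symbols; !!! splices them into group_by()
res <- toy %>%
  group_by(!!!rlang::syms(group_cols)) %>%
  summarise(Concentration = mean(calCO2), .groups = "drop")

res   # N1/bin1 -> 410, N2/bin1 -> 500
```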

Note: I haven't created dummy data, so that you can see exactly what mine looks like; there seem to be some subtleties with time and date that I can't quite get my head around.
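
One time-related subtlety worth checking: for a seconds-based break such as "30 sec", base R's cut() method for date-times (cut.POSIXt) starts the first bin at the earliest timestamp in the vector rather than at a :00 or :30 clock boundary, and the bins are closed on the left because `right = FALSE` is the default. A minimal sketch with made-up timestamps:

```r
# Made-up timestamps; the earliest one is 13:24:02, not a round 30 s mark
times <- as.POSIXct(c("2019-02-12 13:24:02", "2019-02-12 13:24:03",
                      "2019-02-12 13:24:31", "2019-02-12 13:24:33"),
                    tz = "UTC")

bins <- cut(times, breaks = "30 sec")

# Bins are [13:24:02, 13:24:32) and [13:24:32, 13:25:02),
# so 13:24:31 lands in the first bin and 13:24:33 in the second
as.integer(bins)   # 1 1 1 2
nlevels(bins)      # 2
```

If clock-aligned bins are wanted instead, `lubridate::floor_date(times, "30 seconds")` is one option, assuming the lubridate package is available.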

Recommended answer

So to summarize the comments by @kath with some improvements to address your follow-on question:

df %>%
  # one group per Sensor x 30-second bin
  group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
  # adds the group mean to every row; no rows are dropped
  mutate(Concentration = mean(calCO2)) %>%
  ungroup()
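
To see what the mutate() pipeline returns, here is a minimal runnable sketch on a small hypothetical stand-in for `df` that keeps only the columns the pipeline touches (Sensor, DeviceTime, calCO2); the values are invented.

```r
library(dplyr)

# Hypothetical stand-in for df: two sensors, readings spread over ~40 s
df <- tibble(
  Sensor     = factor(c("N1", "N1", "N2", "N1", "N2")),
  DeviceTime = as.POSIXct("2019-02-12 13:24:02", tz = "UTC") +
               c(0, 1, 1, 35, 36),
  calCO2     = c(400, 420, 500, 440, 520)
)

out <- df %>%
  group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
  mutate(Concentration = mean(calCO2)) %>%
  ungroup()

# Same number of rows as df; rows sharing a Sensor and bin share one mean
out$Concentration   # 410 410 500 440 520
```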

The above keeps all columns, but repeats the Concentration value on every row of df: each row in the same Sensor/30-second bin gets the same mean. An alternative that lets you both roll the data up and retain other columns of interest is to add them to the summarize() call, as illustrated below.

df %>%
  # one group per Sensor x 30-second bin
  group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
  # one row per group: the mean plus the other columns rolled up
  summarize(Concentration = mean(calCO2),
            Date = min(Date),
            Time = min(Time),
            StartDeviceTime = min(DeviceTime),
            EndDeviceTime = max(DeviceTime))
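
Run on hypothetical toy data, the summarize() version collapses each Sensor/30-second bin to a single row. One behaviour worth knowing: with two grouping variables, summarise() by default drops only the last one, so the result is still grouped by Sensor (dplyr 1.0.0 and later print a message saying so). Passing `.groups = "drop"` returns an ungrouped tibble. The data below are invented, and the Date and Time columns are omitted for brevity.

```r
library(dplyr)

# Hypothetical stand-in for df (Date and Time columns omitted)
df <- tibble(
  Sensor     = factor(c("N1", "N1", "N2", "N1", "N2")),
  DeviceTime = as.POSIXct("2019-02-12 13:24:02", tz = "UTC") +
               c(0, 1, 1, 35, 36),
  calCO2     = c(400, 420, 500, 440, 520)
)

rolled <- df %>%
  group_by(Sensor, BinnedTime = cut(DeviceTime, breaks = "30 sec")) %>%
  summarize(Concentration   = mean(calCO2),
            StartDeviceTime = min(DeviceTime),
            EndDeviceTime   = max(DeviceTime),
            .groups = "drop")  # return an ungrouped tibble

# One row per Sensor x bin, ordered N1/bin1, N1/bin2, N2/bin1, N2/bin2
rolled$Concentration   # 410 440 500 520
```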
