Find average by group over a time period and retrieve last date for same period


Question

Below is a reproducible data table with four columns:


  1. Date
  2. category
  3. value1
  4. value2

As the title suggests, I'd like to calculate average values from value1 and value2 per category, and keep the last observed date for those categories in the resulting dataframe.

Here is the input:

# Libraries
library(dplyr)
library(data.table)

# Reproducible data table
set.seed(1234)
date <- seq(as.Date("2017-01-01"), by = "month", length.out = 10)
category <- (c('A','A','B','B','C','C','C','C','C', 'C'))
value1 <- sample(seq(from = 91, to = 100, by = 1))
value2 <- sample(seq(from = 51, to = 60, by = 1))
dt <- data.table(date, category, value1, value2)
print(dt)

          date category value1 value2
 1: 2017-01-01        A     92     57
 2: 2017-02-01        A     96     55
 3: 2017-03-01        B     95     53
 4: 2017-04-01        B     98     60
 5: 2017-05-01        C     99     52
 6: 2017-06-01        C     94     59
 7: 2017-07-01        C     91     56
 8: 2017-08-01        C     97     51
 9: 2017-09-01        C    100     58
10: 2017-10-01        C     93     54

This is what I have so far:

dt2 <- dt %>% group_by(category) %>% summarise_each(funs(mean))
print(dt2)

# A tibble: 3 x 4
  category       date   value1 value2
     <chr>     <date>    <dbl>  <dbl>
1        A 2017-01-16 94.00000   56.0
2        B 2017-03-16 96.50000   56.5
3        C 2017-07-16 95.66667   55.0

This is the desired output:

  category       date   value1 value2
1        A 2017-02-01 94.00000   56.0
2        B 2017-04-01 96.50000   56.5
3        C 2017-10-01 95.66667   55.0

Thank you for any suggestions!

Recommended answer

Here is the data.table approach. We can perform the calculations inside .() in the j argument and set the grouping with the by argument.

dt[, .(date = last(date), value1 = mean(value1), value2 = mean(value2)), by = category]

Here is a more efficient way, suggested by Frank in a comment on this post. With this approach the mean function only needs to be written once: .SD and .SDcols specify which columns are summarised with mean.

dt[, c(.(date = last(date)), lapply(.SD, mean)), by = category, .SDcols = value1:value2]
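As a quick check that the two data.table forms above agree, here is a minimal self-contained sketch. It makes a few assumptions: the data values are copied from the table printed in the question instead of being regenerated with sample() (the output of sample() for a given seed can differ between R versions), .SDcols is given as a character vector, list() is used in place of its alias .(), and the names res1 and res2 are purely illustrative.

```r
library(data.table)

# Data copied from the table printed in the question (not regenerated with sample())
dt <- data.table(
  date     = seq(as.Date("2017-01-01"), by = "month", length.out = 10),
  category = c("A", "A", "B", "B", "C", "C", "C", "C", "C", "C"),
  value1   = c(92, 96, 95, 98, 99, 94, 91, 97, 100, 93),
  value2   = c(57, 55, 53, 60, 52, 59, 56, 51, 58, 54)
)

# Explicit form: every summary column is written out by hand
res1 <- dt[, list(date   = data.table::last(date),
                  value1 = mean(value1),
                  value2 = mean(value2)),
           by = category]

# .SD form: mean is written once and applied to every column named in .SDcols
res2 <- dt[, c(list(date = data.table::last(date)), lapply(.SD, mean)),
           by = category, .SDcols = c("value1", "value2")]

print(res1)
print(all.equal(res1, res2))
```

The .SD form scales better when there are many value columns, because only the .SDcols vector has to grow.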

If you want to use dplyr, you can use Z.Lin's approach. However, if there are many value columns, such as value1 to value10, you can do the following.

dt %>%
  group_by(category) %>%
  summarise_all(funs(if_else(is.numeric(.), mean(.), last(.))))

This code calculates the mean only for numeric columns; for any other column it reports the last value in the group.

One final reminder: summarise_each has been deprecated. Please use summarise_all, summarise_if, or summarise_at.
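In dplyr 1.0.0 and later, the scoped verbs summarise_all, summarise_if and summarise_at are themselves superseded by across(), and funs() is deprecated. The following is a sketch of how the same summary might be written in that style, not part of the original answer. It assumes dplyr >= 1.0.0, and it hard-codes the values from the table printed in the question; the names df and res are illustrative.

```r
library(dplyr)

# Data copied from the table printed in the question (not regenerated with sample())
df <- data.frame(
  date     = seq(as.Date("2017-01-01"), by = "month", length.out = 10),
  category = c("A", "A", "B", "B", "C", "C", "C", "C", "C", "C"),
  value1   = c(92, 96, 95, 98, 99, 94, 91, 97, 100, 93),
  value2   = c(57, 55, 53, 60, 52, 59, 56, 51, 58, 54)
)

res <- df %>%
  group_by(category) %>%
  summarise(
    date = dplyr::last(date),          # last observed date in each group
    across(where(is.numeric), mean),   # mean of every numeric column
    .groups = "drop"
  )

print(res)
```

Because across(where(is.numeric), mean) selects columns by type, the same call covers value1 to value10 without listing them.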
