Group by aggregate dynamic column name matching

Problem description

Is it possible to group_by columns whose names match a regex, using dplyr?

library(dplyr) # dplyr_0.5.0; R version 3.3.2 (2016-10-31)

# dummy data
set.seed(1)
df1 <-  sample_n(iris, 20) %>% 
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width = round(Sepal.Width))

Group by, static version (looks and works fine, but imagine having 10-20 columns):

df1 %>% 
  group_by(Sepal.Length, Sepal.Width) %>% 
  summarise(mySum = sum(Petal.Length))

Group by dynamic - "ugly" version:

df1 %>% 
  group_by_(.dots = colnames(df1)[ grepl("^Sepal", colnames(df1))]) %>% 
  summarise(mySum = sum(Petal.Length))
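
For readers on a newer dplyr than the 0.5.0 used in the question: group_by_() has since been deprecated. A minimal sketch of the same "build a vector of names, then group by it" idea, assuming dplyr 1.0.0 or later (for across(), all_of() and the .groups argument); the names sepal_cols and by_names are illustrative only:

```r
library(dplyr)

# Same dummy data as in the question
set.seed(1)
df1 <- iris %>%
  sample_n(20) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width  = round(Sepal.Width))

# Character vector of the column names that match the regex
sepal_cols <- grep("^Sepal", colnames(df1), value = TRUE)

# all_of() tells tidyselect to treat the vector as column names
by_names <- df1 %>%
  group_by(across(all_of(sepal_cols))) %>%
  summarise(mySum = sum(Petal.Length), .groups = "drop")
```

This keeps the regex in base R and only hands the resulting names to dplyr, which can be handy when the names are also needed elsewhere in the script.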

Ideally, something like this (doesn't work, as starts_with returns indices):

df1 %>% 
  group_by(starts_with("Sepal")) %>% 
  summarise(mySum = sum(Petal.Length))

Error in eval(expr, envir, enclos) : 
   wrong result size (0), expected 20 or 1

Expected output:

# Source: local data frame [6 x 3]
# Groups: Sepal.Length [?]
# 
#   Sepal.Length Sepal.Width mySum
#          <dbl>       <dbl> <dbl>
# 1            4           3   1.4
# 2            5           3  10.9
# 3            6           2   4.0
# 4            6           3  43.7
# 5            7           3  15.7
# 6            8           4   6.4


Note: this sounds very much like a duplicate post; kindly link the relevant posts, if any.

Solution

This feature will be implemented in a future release; see GitHub issue #2619.

The solution would be to use the group_by_at function:

df1 %>%
  group_by_at(vars(starts_with("Sepal"))) %>% 
  summarise(mySum = sum(Petal.Length))
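
In dplyr 1.0.0 and later, the scoped verbs such as group_by_at() are superseded, and the same selection is usually written with across() inside group_by(). A minimal sketch, assuming dplyr >= 1.0.0; the names res_across and res_regex are illustrative, and no printed output is shown because the rows drawn by sample_n() under set.seed(1) may differ between R versions:

```r
library(dplyr)

# Same dummy data as in the question
set.seed(1)
df1 <- iris %>%
  sample_n(20) %>%
  mutate(Sepal.Length = round(Sepal.Length),
         Sepal.Width  = round(Sepal.Width))

# Select grouping columns by prefix with a tidyselect helper
res_across <- df1 %>%
  group_by(across(starts_with("Sepal"))) %>%
  summarise(mySum = sum(Petal.Length), .groups = "drop")

# Select grouping columns by regular expression instead
res_regex <- df1 %>%
  group_by(across(matches("^Sepal"))) %>%
  summarise(mySum = sum(Petal.Length), .groups = "drop")
```

matches() takes a regular expression, so it covers the general "regex match on column names" case from the question, while starts_with() is enough when only a prefix is needed.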
