R data.frame: rowSums of selected columns by grouping vector

Problem description

I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) columns of characters. I want to obtain a new data frame that keeps the position of the irrelevant columns and adds the numeric columns to each other by a certain grouping vector (or applies some other row-wise function to the data frame, by group). Example:

sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D"))
> sample
  cha1 num1 num2 num3 num4 cha2
1    A    1    3   11   13    C
2    B    2    4   12   14    D

The goal is to obtain:

> goal
  cha1 X1 X2 cha2 
1    A  4 24    C
2    B  6 26    D

That is, I've summed the 4 numeric columns according to the grouping vector gl(2,2,4) = (1,1,2,2) [levels: 1,2].
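For readers unfamiliar with gl(), a quick check of what the grouping vector looks like and how it maps the four numeric columns onto groups. The column names are taken from the example; nothing else is assumed.

```r
# gl(n, k, length): a factor with n levels, each repeated k times, cut to 'length'
grp <- gl(2, 2, 4)
print(grp)
# [1] 1 1 2 2
# Levels: 1 2

# Which numeric column falls into which group
split(c("num1", "num2", "num3", "num4"), grp)
```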

For a purely numeric data frame I've found the following method:

sample_num = sample[,2:5] #select numeric columns
data.frame(t(apply(sample_num,1,function(row) tapply(row, INDEX=gl(2,2,4),sum))))
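A minimal sketch of combining that apply/tapply result with the character columns again. It hard-codes that cha1 and cha2 sit on either side of the numeric block, which is exactly the kind of manual bookkeeping the question would like to avoid; num_cols is an illustrative name.

```r
sample <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                     num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))
grp <- gl(2, 2, 4)
num_cols <- 2:5  # assumed known, as stated in the question

# Grouped row sums of the numeric block only (the apply/tapply method above)
summed <- data.frame(t(apply(sample[, num_cols], 1,
                             function(row) tapply(row, INDEX = grp, FUN = sum))))

# Re-insert the character columns around the result
goal <- cbind(sample["cha1"], summed, sample["cha2"])
goal
```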

I could combine this with re-inserting the character columns to give the intended result, but I'm really looking for a more elegant way. I'm particularly interested in a plyr method if there is one, as I'm trying to migrate to plyr for all my data frame manipulations. I imagine the first step would be to cast the data frame into long format, but I have no idea how to proceed from there.
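Since the question mentions going through long format, here is a base-R sketch of that route. It is not a plyr solution: it stacks the numeric columns into one value column tagged with the original row id and the column's group, takes grouped sums with tapply(), and spreads the result back to one column per group. The names long, wide and grp_lab are illustrative.

```r
sample <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                     num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))
num_cols <- c("num1", "num2", "num3", "num4")
grp <- gl(2, 2, 4)  # group of each numeric column, in column order

# Group labels X1, X2, ... kept in level order (not alphabetical order)
grp_lab <- factor(paste0("X", grp), levels = paste0("X", levels(grp)))

# Long format: one row per (original row id, numeric column)
long <- data.frame(
  id    = rep(seq_len(nrow(sample)), times = length(num_cols)),
  group = rep(grp_lab, each = nrow(sample)),
  value = unlist(sample[num_cols], use.names = FALSE)
)

# Grouped sum per (id, group), spread back to one column per group
wide <- tapply(long$value, list(long$id, long$group), sum)

goal <- cbind(sample["cha1"], as.data.frame(wide), sample["cha2"])
goal
```

Replacing sum in the tapply() call with another summary function gives other grouped row-wise results.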

One 'absolute' requirement is that I cannot do without the gl(n,k,l) method of grouping, as I need this to be applicable to a wide range of data frames and grouping factors.

For simplicity, assume that I know which columns are the relevant numeric columns. I'm not concerned with how to select them; I'm concerned with how to do my grouped sum without messing up the original data frame structure.
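The two requirements (keep the surrounding columns in place, and allow row-wise functions other than sum) can be wrapped in a small helper. group_row_apply is a hypothetical name, not an existing function. The sketch assumes the numeric columns are given as a contiguous block of integer indices, one group label per column, and that FUN returns a single number per row and group.

```r
# Hypothetical helper: replace the contiguous column block 'cols' of 'df' with one
# column per level of 'grp', holding FUN applied row-wise to that group's columns.
# Columns before and after the block keep their original positions.
group_row_apply <- function(df, cols, grp, FUN = sum) {
  stopifnot(is.numeric(cols), length(cols) == length(grp),
            all(diff(cols) == 1))  # assumption: a contiguous block of columns
  grp <- as.factor(grp)
  m <- as.matrix(df[cols])
  # One new column per group level, in level order
  res <- lapply(levels(grp), function(g) apply(m[, grp == g, drop = FALSE], 1, FUN))
  names(res) <- paste0("X", levels(grp))
  # Reassemble: columns before the block, the grouped results, columns after it
  out <- c(as.list(df[seq_len(min(cols) - 1)]), res,
           as.list(df[-seq_len(max(cols))]))
  as.data.frame(out, stringsAsFactors = FALSE)
}

sample <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                     num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))
goal <- group_row_apply(sample, 2:5, gl(2, 2, 4))
row_means <- group_row_apply(sample, 2:5, gl(2, 2, 4), FUN = mean)
goal
```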

Thanks!

Recommended answer

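Grpindex <- gl(2, 2, 4)
# rowsum() adds up rows sharing a group label, so transpose the numeric columns into rows and back
goal <- cbind.data.frame(sample["cha1"], t(rowsum(t(sample[, 2:5]), paste0("X", Grpindex))),
                         sample["cha2"])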

Output:

  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D
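One caveat with this answer: rowsum() sorts the group labels by default (reorder = TRUE), so with ten or more groups the character labels from paste0() sort as strings (X1, X10, X11, ..., X2) rather than numerically. Passing reorder = FALSE keeps the groups in order of first appearance, which for a gl() factor is level order. A sketch of the same answer with that change, plus a small 12-group illustration (g12 and x are illustrative names):

```r
sample <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                     num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))
Grpindex <- gl(2, 2, 4)

# Same approach as the answer, but keep groups in first-appearance order
sums <- t(rowsum(t(sample[, 2:5]), paste0("X", Grpindex), reorder = FALSE))
goal <- cbind.data.frame(sample["cha1"], sums, sample["cha2"])

# Why it matters: 12 single-column groups
g12 <- paste0("X", gl(12, 1))
x <- matrix(1:12, ncol = 1)
rownames(rowsum(x, g12))                   # sorted as strings, e.g. "X1" "X10" "X11" "X12" "X2" ...
rownames(rowsum(x, g12, reorder = FALSE))  # "X1" "X2" ... "X12"
```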
