R data.frame: rowSums of selected columns by grouping vector

Question

I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) character columns. I want to obtain a new data frame that keeps the irrelevant columns in position and sums the numeric columns according to a grouping vector (or applies some other row-wise function to the data frame, by group). Example:

sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D"))
> sample
  cha1 num1 num2 num3 num4 cha2
1    A    1    3   11   13    C
2    B    2    4   12   14    D

to obtain the goal:

> goal
  cha1 X1 X2 cha2 
1    A  4 24    C
2    B  6 26    D

i.e. I've summed the 4 numeric columns according to the grouping vector gl(2,2,4) = (1,1,2,2) [levels: 1,2]
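For readers unfamiliar with `gl`, here is a minimal sketch of what that grouping factor contains. The variable name `grp` is only illustrative and does not appear in the original post.

```r
# gl(n, k, length) builds a factor with n levels, each repeated k times,
# recycled up to the requested total length.
grp <- gl(2, 2, 4)

as.integer(grp)  # 1 1 2 2: numeric columns 1-2 go to group 1, columns 3-4 to group 2
levels(grp)      # "1" "2"
```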

For a purely numeric data frame I've found the following method:

sample_num = sample[,2:5] #select numeric columns
data.frame(t(apply(sample_num,1,function(row) tapply(row, INDEX=gl(2,2,4),sum))))

I could combine this with re-inserting the character columns to give the intended result, but I'm really looking for a more elegant way. I'm particularly interested in a plyr method if there is one, as I'm trying to migrate to plyr for all my data frame manipulations. I imagine the first step would be to cast the data frame into long format, but I have no idea how to proceed from there.

One 'absolute' requirement is that I cannot do without the gl(n,k,l) method of grouping, as I need this to be applicable to a wide range of data frames and grouping factors.

For simplicity, assume that I know which columns are the relevant numeric columns. I'm not concerned with how to select them; I'm concerned with how to do the grouped sum without messing up the original data frame structure.

Thanks!

Recommended answer

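# Grouping factor: numeric columns 1-2 form group 1, columns 3-4 form group 2
Grpindex <- gl(2, 2, 4)
# rowsum() sums rows by group, so transpose, sum, then transpose back
goal <- cbind.data.frame(sample["cha1"], t(rowsum(t(sample[, 2:5]), paste0("X", Grpindex))), sample["cha2"])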

Output:

  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D
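
Since the question asks for something that works for arbitrary `gl(n,k,l)` groupings, the same `rowsum` idea could be wrapped in a small function. This is only a sketch under stated assumptions: the helper name `grouped_row_sums`, its arguments, and the data frame name `sample_df` are hypothetical (not from the original answer), and it assumes the numeric columns form one contiguous block with character columns on both sides, as in the example.

```r
# Copy of the question's example data, renamed to avoid masking base::sample().
sample_df <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                        num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))

# Hypothetical helper: sum the contiguous columns `cols` of `df` within the
# groups given by `grp`, keeping the surrounding columns where they were.
grouped_row_sums <- function(df, cols, grp, prefix = "X") {
  stopifnot(length(cols) == length(grp))
  # rowsum() groups rows, so transpose to put the columns of interest in rows.
  # reorder = FALSE keeps groups in order of first appearance rather than
  # sorting the labels alphabetically (which would put "X10" before "X2").
  sums <- t(rowsum(t(as.matrix(df[, cols, drop = FALSE])),
                   paste0(prefix, grp), reorder = FALSE))
  before <- seq_len(min(cols) - 1)                    # columns left of the block
  after  <- setdiff(seq_len(ncol(df)), seq_len(max(cols)))  # columns right of it
  cbind(df[before], as.data.frame(sums), df[after])
}

res <- grouped_row_sums(sample_df, cols = 2:5, grp = gl(2, 2, 4))
print(res)
#   cha1 X1 X2 cha2
# 1    A  4 24    C
# 2    B  6 26    D
```

`rowsum` only computes sums; for another row-wise function, the `apply`/`tapply` combination from the question could be substituted for the `rowsum` line.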
