R - Group data but apply different functions to different columns


Question

I'd like to group this data, but apply different functions to different columns when grouping.

ID  type isDesc isImage
1   1    1      0
1   1    0      1
1   1    0      1
4   2    0      1
4   2    1      0
6   1    1      0
6   1    0      1
6   1    0      0

I want to group by ID. The columns isDesc and isImage can be summed, but I would like to keep the value of type as it is; type is always the same for all rows that share an ID. The result should look like this:

ID  type isDesc isImage
1   1    1      2
4   2    1      1
6   1    1      1

Currently I am using

library(plyr)
summarized = ddply(data, .(ID), numcolwise(sum))

but it simply sums all the numeric columns, including type. The solution doesn't have to use ddply, but if you think it's right for the job I'd like to stick with it. The data.table library is also an option.

Recommended answer

Using data.table:

library(data.table)
dt <- data.table(data, key="ID")
# Per ID: keep the first type and sum the two flag columns
dt[, list(type=type[1], isDesc=sum(isDesc),
          isImage=sum(isImage)), by=ID]

#    ID type isDesc isImage
# 1:  1    1      1       2
# 2:  4    2      1       1
# 3:  6    1      1       1

Using plyr:

ddply(data , .(ID), summarise, type=type[1], isDesc=sum(isDesc), isImage=sum(isImage))
#   ID type isDesc isImage
# 1  1    1      1       2
# 2  4    2      1       1
# 3  6    1      1       1

Using data.table's .SDcols: when many columns need to be summed and the remaining columns only need their first value, you can aggregate each set of columns separately and then join the two results:

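# Sum columns 3 and 4 (isDesc, isImage) within each ID
dt1 <- dt[, lapply(.SD, sum), by=ID, .SDcols=c(3,4)]
# Take the first value of column 2 (type) within each ID
dt2 <- dt[, lapply(.SD, head, 1), by=ID, .SDcols=c(2)]
# Join the two results on ID
dt2[dt1, on="ID"]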
#    ID type isDesc isImage
# 1:  1    1      1       2
# 2:  4    2      1       1
# 3:  6    1      1       1
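
The two-step aggregate-and-join above can probably be collapsed into a single grouped call by concatenating two lists inside j. The following is a minimal sketch, not part of the original answer: it rebuilds the question's sample data inline, and the names sum_cols and res are placeholders chosen for illustration.

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Columns to be summed (illustrative name, not from the original answer)
sum_cols <- c("isDesc", "isImage")

# c() concatenates the one-element list holding the first type with the
# list of per-column sums, so each ID yields a single row with all columns
res <- dt[, c(list(type = type[1]), lapply(.SD, sum)),
          by = ID, .SDcols = sum_cols]
print(res)
```

With this form, only sum_cols has to change when more columns need to be summed, and no join is required.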

.SDcols accepts either column names or column numbers. For example, .SDcols=c("type") is also valid.
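
If neither package is available, base R's aggregate may be enough for this particular case. The sketch below is an alternative that the original answer does not mention. It relies on the question's statement that type never varies within an ID, so grouping by both ID and type leaves the groups unchanged while keeping type in the output. The data frame is rebuilt from the question's sample, and res is an illustrative name.

```r
# Sample data copied from the question
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Group by ID and type together; assumes type is constant within each ID,
# otherwise an ID would be split into several rows
res <- aggregate(cbind(isDesc, isImage) ~ ID + type, data = data, FUN = sum)

# aggregate() orders rows by the grouping variables, so re-sort by ID
res <- res[order(res$ID), ]
print(res)
```

Unlike the type[1] approach, this version would return more than one row for an ID whose type differs between rows, which can also serve as a quick check of that assumption.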
