R group by aggregate


Question

In R (which I am relatively new to) I have a data frame consisting of many columns, including a numeric column that I need to aggregate according to groups determined by another column.

 SessionID   Price
 '1',       '624.99'
 '1',       '697.99'
 '1',       '649.00'
 '7',       '779.00'
 '7',       '710.00'
 '7',       '2679.50'

I need to group by SessionID and add the Min and Max of each group onto the original data frame, e.g.:

 SessionID   Price     Min     Max
 '1',       '624.99'   624.99  697.99
 '1',       '697.99'   624.99  697.99
 '1',       '649.00'   624.99  697.99
 '7',       '779.00'   710.00  2679.50
 '7',       '710.00'   710.00  2679.50
 '7',       '2679.50'  710.00  2679.50

Any ideas how to do this efficiently?

Recommended answer

Here's my solution using aggregate.

First, load the data:

df <- read.table(text = 
"SessionID   Price
'1'       '624.99'
'1'       '697.99'
'1'       '649.00'
'7'       '779.00'
'7'       '710.00'
'7'       '2679.50'", header = TRUE) 

Then aggregate and match the result back to the original data.frame:

tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))
df <- cbind(df, tmp[match(df$SessionID, tmp$SessionID), 2])
print(df)
#  SessionID   Price    Min     Max
#1         1  624.99 624.99  697.99
#2         1  697.99 624.99  697.99
#3         1  649.00 624.99  697.99
#4         7  779.00 710.00 2679.50
#5         7  710.00 710.00 2679.50
#6         7 2679.50 710.00 2679.50
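
As an aside, a different base R route to the same columns is ave(), which is not the method used in this answer. The following is only a minimal sketch: it rebuilds the question's data with data.frame() instead of read.table(), and it assumes Price is already numeric.

```r
# Rebuild the example data from the question (assumed to be numeric).
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

# ave() applies FUN within each group and returns a vector aligned
# with the original rows, so no separate match() step is needed.
df$Min <- ave(df$Price, df$SessionID, FUN = min)
df$Max <- ave(df$Price, df$SessionID, FUN = max)

print(df)
```

Because ave() returns one value per original row, the new columns can be assigned directly. Note that FUN has to be passed by name, since any unnamed arguments after the first are treated as grouping variables.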

EDIT: As per the comment below, you might wonder why this works. It is indeed somewhat weird. But remember that a data.frame is just a fancy list. Try calling str(tmp), and you'll see that the Price column itself is a 2 by 2 numeric matrix. This gets confusing because print.data.frame knows how to handle such a column, so print(tmp) looks as if there are 3 columns. Anyway, tmp[2] simply accesses the second column/entry of the data.frame/list and returns it as a one-column data.frame, while tmp[,2] accesses the second column and returns the object stored there (here, the matrix).
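
The matrix-valued column described above can be inspected directly. This is a small sketch that rebuilds the question's data with data.frame() (an assumption, in place of the read.table() call); the name flat is made up for illustration, and the flattening step at the end is an optional extra that is not part of the original answer.

```r
# Rebuild the example data from the question (assumed to be numeric).
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

tmp <- aggregate(Price ~ SessionID, df,
                 function(x) c(Min = min(x), Max = max(x)))

ncol(tmp)              # 2: SessionID plus one matrix-valued Price column
is.matrix(tmp$Price)   # TRUE
dim(tmp$Price)         # 2 2 (two groups, two statistics)

is.data.frame(tmp[2])  # TRUE: single-bracket list indexing keeps a data.frame
is.matrix(tmp[, 2])    # TRUE: matrix-style indexing returns the stored matrix

# Optional (hypothetical name `flat`): spread the matrix into ordinary columns.
flat <- do.call(data.frame, tmp)
ncol(flat)             # 3
```

Flattening like this can make tmp easier to merge or print in other contexts, although the cbind() call in the answer works without it.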
