R group by aggregate
Question
In R (which I am relatively new to) I have a data frame consisting of many columns, including a numeric column that I need to aggregate according to groups determined by another column.
SessionID Price
'1', '624.99'
'1', '697.99'
'1', '649.00'
'7', '779.00'
'7', '710.00'
'7', '2679.50'
I need to group by SessionID and add the Min and Max for each group onto the original data frame, e.g.:
SessionID Price Min Max
'1', '624.99' 624.99 697.99
'1', '697.99' 624.99 697.99
'1', '649.00' 624.99 697.99
'7', '779.00' 710.00 2679.50
'7', '710.00' 710.00 2679.50
'7', '2679.50' 710.00 2679.50
Any ideas how to do this efficiently?
Recommended answer
Here's my solution using aggregate.
First, load the data:
df <- read.table(text =
"SessionID Price
'1' '624.99'
'1' '697.99'
'1' '649.00'
'7' '779.00'
'7' '710.00'
'7' '2679.50'", header = TRUE)
Then aggregate and match it back to the original data.frame:
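# One row per SessionID; Price becomes a two-column matrix (Min, Max)
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))
# Look up each row's group in tmp and bind that group's Min/Max onto df
df <- cbind(df, tmp[match(df$SessionID, tmp$SessionID), 2])
print(df)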
# SessionID Price Min Max
#1 1 624.99 624.99 697.99
#2 1 697.99 624.99 697.99
#3 1 649.00 624.99 697.99
#4 7 779.00 710.00 2679.50
#5 7 710.00 710.00 2679.50
#6 7 2679.50 710.00 2679.50
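As a possible alternative that is not part of the original answer, base R's ave() returns a vector as long as its input, with each group's summary repeated for every row of that group, so no match step is needed. The sketch below rebuilds the example data with data.frame() for self-containment; treat it as one option under those assumptions rather than the recommended method.

```r
# Example data from the question, rebuilt so this block runs on its own
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

# ave() applies FUN within each SessionID group and recycles the
# single result across all rows of that group
df$Min <- ave(df$Price, df$SessionID, FUN = min)
df$Max <- ave(df$Price, df$SessionID, FUN = max)

print(df)
```

This keeps the original row order and adds plain numeric columns, which can be easier to work with than a matrix column.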
EDIT: As per the comment below, you might wonder why the aggregate approach above works. It is indeed somewhat weird. But remember that a data.frame is just a fancy list. Try calling str(tmp), and you'll see that the Price column itself is a 2 by 2 numeric matrix. It gets confusing because print.data.frame knows how to handle this, so print(tmp) looks like there are 3 columns. Anyway, tmp[2] simply accesses the second column/entry of the data.frame/list and returns it as a one-column data.frame, while tmp[,2] accesses the second column and returns the data type stored in it (here, the matrix).
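The structure described in the EDIT can be inspected directly. The following is a minimal sketch that rebuilds the example data with data.frame() (an assumption made for self-containment) and checks what each indexing form returns.

```r
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))

str(tmp)                # shows two columns; Price is a numeric matrix
ncol(tmp)               # 2, even though print(tmp) appears to show 3 columns
dim(tmp$Price)          # 2 groups by 2 statistics

is.data.frame(tmp[2])   # TRUE: list-style indexing keeps the data.frame wrapper
is.matrix(tmp[, 2])     # TRUE: matrix-style indexing returns the stored matrix
colnames(tmp[, 2])      # "Min" "Max"
```

Because tmp[, 2] is a matrix with named columns, cbind() spreads it into separate Min and Max columns when it is bound onto df.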