R group by aggregate
In R (which I am relatively new to), I have a data frame that consists of many columns, including a numeric column that I need to aggregate by groups determined by another column.
SessionID Price
'1', '624.99'
'1', '697.99'
'1', '649.00'
'7', '779.00'
'7', '710.00'
'7', '2679.50'
I need to group by SessionID and return the Max and Min for each group, appended ONTO the original data frame, e.g.:
SessionID Price Min Max
'1', '624.99' 624.99 697.99
'1', '697.99' 624.99 697.99
'1', '649.00' 624.99 697.99
'7', '779.00' 710.00 2679.50
'7', '710.00' 710.00 2679.50
'7', '2679.50' 710.00 2679.50
Any ideas on how to do this efficiently?
Here's my solution using aggregate.
First, load the data:
df <- read.table(text =
"SessionID Price
'1' '624.99'
'1' '697.99'
'1' '649.00'
'7' '779.00'
'7' '710.00'
'7' '2679.50'", header = TRUE)
Then aggregate and match the result back to the original data.frame:
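# Compute Min and Max of Price per SessionID; the Price column of tmp is a two-column matrix
tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))
# Look up each row's group in tmp and bind the matching Min/Max values onto df
df <- cbind(df, tmp[match(df$SessionID, tmp$SessionID), 2])
print(df)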
# SessionID Price Min Max
#1 1 624.99 624.99 697.99
#2 1 697.99 624.99 697.99
#3 1 649.00 624.99 697.99
#4 7 779.00 710.00 2679.50
#5 7 710.00 710.00 2679.50
#6 7 2679.50 710.00 2679.50
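As a possible alternative, base R's ave() can compute a per-group statistic that is already aligned with the original rows, which avoids the intermediate table and the match() step. The following is only a minimal sketch, assuming the same six rows as above and rebuilding them with data.frame() so the block runs on its own; whether it is faster on large data has not been measured here.

```r
# Same sample data as above, rebuilt directly so the example is self-contained
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

# ave() applies FUN within each SessionID group and returns a vector
# with one value per original row, so it can be assigned as a new column.
df$Min <- ave(df$Price, df$SessionID, FUN = min)
df$Max <- ave(df$Price, df$SessionID, FUN = max)

print(df)
```

Each call handles one statistic, so two calls are needed for Min and Max; the aggregate approach computes both in a single pass over the groups.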
EDIT: As per the comment below, you might wonder why this works. It is indeed somewhat weird. But remember that a data.frame is just a fancy list. Try calling str(tmp), and you'll see that the Price column itself is a 2 by 2 numeric matrix. This gets confusing because print.data.frame knows how to handle it, so print(tmp) looks like there are 3 columns. Anyway, tmp[2] simply accesses the second column/entry of the data.frame/list and returns it as a 1-column data.frame, while tmp[,2] accesses the second column and returns the data type stored.
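The matrix-column behaviour described in the EDIT can be checked directly. The sketch below assumes the same sample data, rebuilt with data.frame() so it is self-contained; the name flat and the do.call(data.frame, tmp) flattening step are illustrative additions and not part of the original answer.

```r
# Same sample data as above, rebuilt directly so the example is self-contained
df <- data.frame(
  SessionID = c(1, 1, 1, 7, 7, 7),
  Price     = c(624.99, 697.99, 649.00, 779.00, 710.00, 2679.50)
)

tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))

ncol(tmp)             # 2: SessionID and Price, although print(tmp) shows three
is.matrix(tmp$Price)  # TRUE: the Price column holds a matrix
dim(tmp$Price)        # 2 2: one row per group, columns Min and Max

is.data.frame(tmp[2])    # TRUE: list-style indexing keeps a 1-column data.frame
is.data.frame(tmp[, 2])  # FALSE: matrix-style indexing returns the stored matrix

# Optional (illustrative): flatten the matrix column into ordinary columns
flat <- do.call(data.frame, tmp)
names(flat)
```

Flattening can make a later merge() or column selection less surprising, since every column of flat is then a plain vector.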