R group by aggregate

Problem description

In R (which I am relatively new to), I have a data frame that consists of many columns, including a numeric column that I need to aggregate by groups determined by another column.

 SessionID   Price
 '1',       '624.99'
 '1',       '697.99'
 '1',       '649.00'
 '7',       '779.00'
 '7',       '710.00'
 '7',       '2679.50'

I need to group by SessionID and return the Max and Min for each group ONTO the original data frame, e.g.:

 SessionID   Price     Min     Max
 '1',       '624.99'   624.99  697.99
 '1',       '697.99'   624.99  697.99
 '1',       '649.00'   624.99  697.99
 '7',       '779.00'   710.00  2679.50
 '7',       '710.00'   710.00  2679.50
 '7',       '2679.50'  710.00  2679.50

Any ideas on how to do this efficiently?

Solution

Here's my solution using aggregate.

First, load the data:

df <- read.table(text = 
"SessionID   Price
'1'       '624.99'
'1'       '697.99'
'1'       '649.00'
'7'       '779.00'
'7'       '710.00'
'7'       '2679.50'", header = TRUE) 
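Because the values in the text are wrapped in single quotes, it may be worth confirming that both columns still come out numeric. A minimal check, assuming the same input text as above:

```r
# Load the example data exactly as in the answer.
df <- read.table(text =
"SessionID   Price
'1'       '624.99'
'1'       '697.99'
'1'       '649.00'
'7'       '779.00'
'7'       '710.00'
'7'       '2679.50'", header = TRUE)

# read.table()'s default quote = "\"'" treats the single quotes as string
# delimiters and removes them, so type.convert() then parses both columns
# as numbers: SessionID as integer, Price as numeric (double).
str(df)
sapply(df, class)
```

If Price were read as character instead, min() and max() would compare the values as strings, which is why this check matters before aggregating.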

Then aggregate and match it back to the original data.frame:

tmp <- aggregate(Price ~ SessionID, df, function(x) c(Min = min(x), Max = max(x)))
df <- cbind(df, tmp[match(df$SessionID, tmp$SessionID), 2])
print(df)
#  SessionID   Price    Min     Max
#1         1  624.99 624.99  697.99
#2         1  697.99 624.99  697.99
#3         1  649.00 624.99  697.99
#4         7  779.00 710.00 2679.50
#5         7  710.00 710.00 2679.50
#6         7 2679.50 710.00 2679.50
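A different base R technique, not part of the original answer, is ave(), which computes a per-group statistic and returns it already aligned with the original rows, so no intermediate aggregate() table or match() step is needed. A minimal sketch on the same example data:

```r
# Same example data as in the answer.
df <- read.table(text =
"SessionID   Price
'1'       '624.99'
'1'       '697.99'
'1'       '649.00'
'7'       '779.00'
'7'       '710.00'
'7'       '2679.50'", header = TRUE)

# ave(x, grouping, FUN = f) applies f to x within each group and returns a
# vector the same length as x, in the original row order.
df$Min <- ave(df$Price, df$SessionID, FUN = min)
df$Max <- ave(df$Price, df$SessionID, FUN = max)
print(df)
```

Each statistic needs its own ave() call here, whereas the aggregate() version computes Min and Max together in a single pass over the groups.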

EDIT: As per the comment below, you might wonder why this works. It is indeed somewhat weird. But remember that a data.frame is just a fancy list. Call str(tmp) and you'll see that the Price column is itself a 2-by-2 numeric matrix (one row per session, with columns Min and Max). This gets confusing because print.data.frame knows how to display such a column, so print(tmp) looks as if there are 3 columns. Anyway, tmp[2] simply accesses the second column/entry of the data.frame/list and returns it as a one-column data.frame, while tmp[, 2] accesses the second column and returns the object stored there (here, that matrix).
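The structure described in the EDIT can be inspected directly. The sketch below rebuilds tmp from the same data and checks each claim; the values in the comments follow from the two sessions (1 and 7) in the example.

```r
# Same example data as in the answer.
df <- read.table(text =
"SessionID   Price
'1'       '624.99'
'1'       '697.99'
'1'       '649.00'
'7'       '779.00'
'7'       '710.00'
'7'       '2679.50'", header = TRUE)

tmp <- aggregate(Price ~ SessionID, df,
                 function(x) c(Min = min(x), Max = max(x)))

# tmp has only two columns, SessionID and Price, but Price is itself a
# 2 x 2 matrix with columns "Min" and "Max" (one row per session).
str(tmp)
ncol(tmp)            # 2
dim(tmp$Price)       # 2 2

# List-style indexing returns a one-column data.frame ...
is.data.frame(tmp[2])    # TRUE
# ... while matrix-style indexing returns the stored matrix itself.
is.matrix(tmp[, 2])      # TRUE

# match() gives, for each original row, the row of tmp for its session.
# Row-indexing the matrix column with it repeats each group's Min/Max once
# per original row, which is why cbind() lines it up with df.
idx <- match(df$SessionID, tmp$SessionID)
idx                  # 1 1 1 2 2 2
tmp[idx, 2]
```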
