Using data.table/plyr in R
My data A looks like:
author_id paper_id prob
731 24943 1
731 24943 1
731 688974 1
731 964345 .8
731 1201905 .9
731 1267992 1
736 249 .2
736 6889 1
736 94345 .7
736 1201905 .9
736 126992 .8
The output I want is:
author_id paper_id
731 24943,24943,688974,1267992,1201905,964345
736 6889,1201905,126992,94345,249
That is, the paper_id values are arranged in decreasing order of prob within each author_id.
If I used a combination of SQL and R, I think the solution would be something like
statement<-"select * from A
GROUP BY author_id
ORDER BY prob"
and then, in R, using paste once the order is set for paper_id.
But I need the complete solution in R. How could this be done?
Thanks
If temp is your data set, then do
library(data.table)
setDT(temp)[order(-prob), list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]
## author_id paper_id
## 1: 731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2: 736 6889, 1201905, 126992, 94345, 249
Edit: 8/11/2014
As of data.table v1.9.4, you can use the very efficient setorder instead of order:
str(temp)
setorder(setDT(temp), -prob)[, list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]
## author_id paper_id
## 1: 731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2: 736 6889, 1201905, 126992, 94345, 249
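To make the difference between the two data.table variants concrete, here is a minimal self-contained sketch. It rebuilds the question's sample data and assumes the data.table package is installed; the names res_order, sorted and res_setorder are illustrative only and do not come from the original answer. It uses copy() so that setorder, which reorders by reference, does not change temp itself.

```r
library(data.table)

# Sample data copied from the question.
temp <- data.table(
  author_id = c(rep(731L, 6), rep(736L, 5)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1, 0.2, 1, 0.7, 0.9, 0.8)
)

# order() in i sorts only for this query; temp itself keeps its row order.
res_order <- temp[order(-prob),
                  list(paper_id = paste(paper_id, collapse = ", ")),
                  by = author_id]

# setorder() reorders a data.table by reference, so work on a copy here
# to leave temp untouched (illustrative choice, not part of the answer).
sorted <- setorder(copy(temp), -prob)
res_setorder <- sorted[, list(paper_id = paste(paper_id, collapse = ", ")),
                       by = author_id]

# Both routes should give the same collapsed strings.
print(res_order)
print(identical(res_order$paper_id, res_setorder$paper_id))
```

If modifying temp in place is acceptable, copy() can be dropped, which is where setorder saves memory on large tables.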
As a side note, this whole thing can also be done easily in base R (though that is not recommended for big data sets):
aggregate(paper_id ~ author_id, temp[order(-temp$prob), ], paste, collapse = ", ")
# author_id paper_id
# 1 731 24943, 24943, 688974, 1267992, 1201905, 964345
# 2 736 6889, 1201905, 126992, 94345, 249
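For anyone who wants to run the base R version end to end, the following is a small reproducible sketch using only base functions. The sample data is taken from the question; the intermediate names sorted and res are hypothetical and chosen just for illustration.

```r
# Rebuild the question's sample data as a plain data.frame.
temp <- data.frame(
  author_id = c(rep(731L, 6), rep(736L, 5)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1, 0.2, 1, 0.7, 0.9, 0.8)
)

# Sort rows by decreasing prob; rows with equal prob keep their original order.
sorted <- temp[order(-temp$prob), ]

# Collapse the already-sorted paper_id values into one string per author.
res <- aggregate(paper_id ~ author_id, data = sorted,
                 FUN = paste, collapse = ", ")
print(res)
```

The sort has to happen before aggregate, because aggregate only groups the rows and pastes them in the order it receives them.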