Using data.table/plyr in R

Views: 141 · Published: 2017/3/12 12:01:14 · Tags: r, data.table, plyr


Problem description

I have a data set A that looks like this:

author_id  paper_id  prob
731        24943     1
731        24943     1
731        688974    1
731        964345    .8
731        1201905   .9
731        1267992   1
736        249       .2
736        6889      1
736        94345     .7
736        1201905   .9
736        126992    .8

The output I want is:

author_id    paper_id
  731        24943,24943,688974,1201905,964345
  736        6889,1201945,126992,94345,249

That is, the paper_id values for each author are arranged in decreasing order of probability.

If I used a combination of SQL and R, I think the solution would be something like

statement <- "select * from A
              GROUP BY author_id
              ORDER BY prob"

and then, once the order is set for paper_id, I would use paste in R.

But I need a solution entirely in R. How could this be done?

Thanks
Solution
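If temp is your data set, then do

library(data.table)
# Convert temp to a data.table by reference, sort the rows by decreasing prob,
# then paste the paper_id values together within each author_id
setDT(temp)[order(-prob), list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]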
##    author_id                                       paper_id
## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2:       736              6889, 1201905, 126992, 94345, 249
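For a reproducible run, the sample data from the question can be rebuilt by hand. The sketch below assumes the columns are stored as integer, integer and numeric (the question does not show the column types), and the name res is only illustrative.

```r
library(data.table)

# Sample data from the question, typed in by hand (column types are an assumption)
temp <- data.frame(
  author_id = rep(c(731L, 736L), times = c(6L, 5L)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1,
                0.2, 1, 0.7, 0.9, 0.8)
)

# Sort by decreasing prob, then collapse paper_id within each author_id
res <- setDT(temp)[order(-prob),
                   list(paper_id = paste0(paper_id, collapse = ", ")),
                   by = author_id]
print(res)
```

Rows with equal prob keep their original relative order, which is why the two 24943 entries and 688974 come before 1267992 for author 731.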

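Edit: 8/11/2014

With data.table v1.9.4 or later, you can use the very efficient setorder instead of order

str(temp)  # optional: inspect the structure of temp first
# setorder sorts temp by reference (no copy is made); the grouping step is then chained on
setorder(setDT(temp), -prob)[, list(paper_id = paste0(paper_id, collapse=", ")), by = author_id]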
##    author_id                                       paper_id
## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2:       736              6889, 1201905, 126992, 94345, 249
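One thing to keep in mind is that setorder reorders its input by reference, so after the call above temp itself stays sorted by prob. If the original row order matters, a minimal sketch using data.table's copy() could look like the following. It uses a small made-up subset of the data, and the names sorted and res are illustrative.

```r
library(data.table)

# Small illustrative subset of the question's data (not the full data set)
temp <- data.table(
  author_id = c(731L, 731L, 736L, 736L),
  paper_id  = c(964345L, 24943L, 249L, 6889L),
  prob      = c(0.8, 1, 0.2, 1)
)

# Sort a copy so that the row order of temp itself is left alone
sorted <- setorder(copy(temp), -prob)
res <- sorted[, list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]

print(temp$paper_id)  # still in the original order
print(res)
```

The extra copy costs memory, so for large tables the in-place version is the cheaper choice when the original order is not needed afterwards.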

As a side note, this whole thing can also be done easily in base R (though this is not recommended for big data sets):

# Sort the rows by decreasing prob, then paste paper_id together within each author_id
aggregate(paper_id ~ author_id, temp[order(-temp$prob), ], paste, collapse = ", ")
#   author_id                                       paper_id
# 1       731 24943, 24943, 688974, 1267992, 1201905, 964345
# 2       736              6889, 1201905, 126992, 94345, 249
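To make the two steps of the base R version easier to follow, the sketch below splits them apart and runs them on a hand-typed copy of the question's data. The column types and the names ord and res are assumptions made for illustration.

```r
# Sample data from the question, typed in by hand (column types are an assumption)
temp <- data.frame(
  author_id = rep(c(731L, 736L), times = c(6L, 5L)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1,
                0.2, 1, 0.7, 0.9, 0.8)
)

# Step 1: reorder the rows by decreasing prob (ties keep their original order)
ord <- temp[order(-temp$prob), ]

# Step 2: within each author_id, paste the paper_id values in their current order;
# collapse = ", " is passed through aggregate() to paste()
res <- aggregate(paper_id ~ author_id, data = ord, FUN = paste, collapse = ", ")
print(res)
```

Unlike the data.table version, aggregate returns the groups sorted by author_id, not in order of first appearance; for this data the two happen to coincide.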


                        