Concatenate values by group in descending order


Problem description

I have a data frame A that looks like this:

author_id paper_id prob
   731    24943    1
   731    24943    1
   731   688974    1
   731   964345    .8
   731  1201905    .9
   731  1267992    1
   736    249      .2
   736   6889      1
   736   94345    .7
   736  1201905    .9
   736  126992    .8
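To run the code in this thread, you can enter the table above as a data frame. This is a minimal sketch that only transcribes the values shown. Storing author_id and paper_id as integers is an assumption. The answer calls the same data `temp`, so `temp <- A` makes the two match.

```r
# Transcription of the question's table A (6 rows for author 731, 5 for 736).
# Integer storage for the id columns is an assumption, not part of the question.
A <- data.frame(
  author_id = rep(c(731L, 736L), times = c(6, 5)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1, 0.2, 1, 0.7, 0.9, 0.8)
)
str(A)
```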

The output I want is:

author_id    paper_id
  731        24943,24943,688974,1267992,1201905,964345
  736        6889,1201905,126992,94345,249

That is, within each author_id, the paper_id values are arranged in decreasing order of prob.

If I used a combination of SQL and R, I think the solution would be

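# Sort rows by author, then by decreasing prob (no GROUP BY needed for ordering)
statement <- "SELECT author_id, paper_id, prob
              FROM A
              ORDER BY author_id, prob DESC"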

and then use paste in R once paper_id is in the right order.

But I need a complete solution in R. How can this be done?

Thanks
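The two-step plan described in the question (put the rows in order, then paste) also works without SQL. Below is a minimal base-R sketch. It assumes the data frame A from the question. `A_sorted`, `collapsed` and `result` are illustrative names, and `tapply()` stands in for the SQL grouping step.

```r
# Same data as in the question (author 731 has 6 rows, author 736 has 5).
A <- data.frame(
  author_id = rep(c(731L, 736L), times = c(6, 5)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1, 0.2, 1, 0.7, 0.9, 0.8)
)

# Step 1 (the ORDER BY part): sort by author, then by decreasing prob.
# order() leaves unresolved ties in their original order.
A_sorted <- A[order(A$author_id, -A$prob), ]

# Step 2 (the paste part): collapse paper_id within each author_id.
collapsed <- tapply(A_sorted$paper_id, A_sorted$author_id, paste, collapse = ",")

result <- data.frame(
  author_id = as.integer(names(collapsed)),
  paper_id  = as.vector(collapsed),
  stringsAsFactors = FALSE
)
print(result)
```

Rows with equal prob (for example the four prob = 1 papers of author 731) keep the order in which they appear in A.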

Recommended answer

If temp is your data set, then do

library(data.table)
# Sort by decreasing prob, then collapse paper_id within each author_id
setDT(temp)[order(-prob), list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]
##    author_id                                       paper_id
## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2:       736              6889, 1201905, 126992, 94345, 249


Update (August 11, 2014)

Since data.table v1.9.4, you can use the very efficient setorder (which reorders the rows by reference) instead of order:

# setorder() sorts temp in place by decreasing prob; then group and collapse as before
setorder(setDT(temp), -prob)[, list(paper_id = paste0(paper_id, collapse = ", ")), by = author_id]
##    author_id                                       paper_id
## 1:       731 24943, 24943, 688974, 1267992, 1201905, 964345
## 2:       736              6889, 1201905, 126992, 94345, 249

As a side note, the whole thing can also be done easily in base R (though this is not recommended for big data sets):

aggregate(paper_id ~ author_id, temp[order(-temp$prob), ], paste, collapse = ", ")
#   author_id                                       paper_id
# 1       731 24943, 24943, 688974, 1267992, 1201905, 964345
# 2       736              6889, 1201905, 126992, 94345, 249
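If you need the same operation for other columns, you can wrap the base-R steps in a small function. `concat_by_group_desc()` below is a hypothetical helper and is not part of the answer or of any package. It assumes `key` is numeric (so that `-key` gives a descending sort) and uses `split()` plus `vapply()` in place of the formula interface of `aggregate()`.

```r
# Hypothetical helper: collapse `value` within each `group`,
# after ordering rows by decreasing `key` (key must be numeric).
concat_by_group_desc <- function(df, group, value, key, sep = ", ") {
  # Sort once: by group, then by decreasing key; unresolved ties keep input order.
  df <- df[order(df[[group]], -df[[key]]), ]
  # split() keeps the sorted row order inside each group;
  # groups come out in sorted order of the group values.
  pieces <- split(df[[value]], df[[group]])
  out <- data.frame(
    g = sort(unique(df[[group]])),
    v = unname(vapply(pieces, paste, character(1), collapse = sep)),
    stringsAsFactors = FALSE
  )
  names(out) <- c(group, value)
  out
}

# Same data as in the question.
temp <- data.frame(
  author_id = rep(c(731L, 736L), times = c(6, 5)),
  paper_id  = c(24943L, 24943L, 688974L, 964345L, 1201905L, 1267992L,
                249L, 6889L, 94345L, 1201905L, 126992L),
  prob      = c(1, 1, 1, 0.8, 0.9, 1, 0.2, 1, 0.7, 0.9, 0.8)
)
print(concat_by_group_desc(temp, "author_id", "paper_id", "prob"))
```

With this data it produces the same two strings as the aggregate() call above.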
