Collapse text by group in a data frame

Problem description

How do I aggregate a data frame by the group column and collapse the text in the text column?

Sample data:

df <- read.table(header=TRUE, text="
group text
a a1
a a2
a a3
b b1
b b2
c c1
c c2
c c3
")

Required output (data frame):

group text
a     a1a2a3
b     b1b2
c     c1c2c3

What I have now:

sapply(unique(df$group), function(x) {
  paste0(df[df$group==x,"text"], collapse='')
})

This works to some extent: it returns the text correctly collapsed by group, but as a vector:

[1] "a1a2a3" "b1b2"   "c1c2c3"

I need a data frame with the group column as the result.
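
A minimal base-R sketch that keeps the sapply-style approach from the question and simply pairs each collapsed string with its group label in a data frame. The names groups, collapsed and result are illustrative only, and vapply is swapped in for sapply so the result is guaranteed to be a plain, unnamed character vector.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps text as character)
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "c", "c"),
  text  = c("a1", "a2", "a3", "b1", "b2", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Distinct groups, in order of first appearance
groups <- unique(df$group)

# One collapsed string per group; USE.NAMES = FALSE avoids a named vector
collapsed <- vapply(
  groups,
  function(g) paste(df$text[df$group == g], collapse = ""),
  character(1),
  USE.NAMES = FALSE
)

# Pair each group label with its collapsed text
result <- data.frame(group = groups, text = collapsed, stringsAsFactors = FALSE)
print(result)
```

Unlike aggregate, this keeps groups in order of first appearance rather than sorting them; with the sample data the two orders coincide.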

Recommended answer

Just use aggregate:

aggregate(df$text, list(df$group), paste, collapse="")
##   Group.1      x
## 1       a a1a2a3
## 2       b   b1b2
## 3       c c1c2c3
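
aggregate() forwards any extra arguments (here collapse = "") to FUN, so each group's text vector is passed to paste(..., collapse = ""). A small base-R illustration of why collapse, not sep, is the argument that joins the elements of one vector into a single string; the vector x is just sample input.

```r
x <- c("a1", "a2", "a3")

# sep only joins corresponding elements of *separate* arguments,
# so a single vector stays a vector of the same length
by_sep <- paste(x, sep = "")

# collapse joins all elements of the result into one string
by_collapse <- paste(x, collapse = "")

# Any separator string can be used with collapse
with_comma <- paste(x, collapse = ", ")

print(by_sep)       # [1] "a1" "a2" "a3"
print(by_collapse)  # [1] "a1a2a3"
print(with_comma)   # [1] "a1, a2, a3"
```

By the same mechanism, passing collapse = ", " to aggregate would give "a1, a2, a3" for group a.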

Or use plyr:

library(plyr)
ddply(df, .(group), summarize, text=paste(text, collapse=""))
##   group   text
## 1     a a1a2a3
## 2     b   b1b2
## 3     c c1c2c3

ddply is faster than aggregate if you have a large dataset.

EDIT: With the suggestion from @SeDur, using the formula interface:

aggregate(text ~ group, data = df, FUN = paste, collapse = "")
##   group   text
## 1     a a1a2a3
## 2     b   b1b2
## 3     c c1c2c3

To get the same result (with group and text as column names) using the earlier method, you have to name the lists:

aggregate(x=list(text=df$text), by=list(group=df$group), paste, collapse="")
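
The names given inside the x and by lists become the column names of the result, which is why this call returns group and text instead of Group.1 and x. A quick base-R check of the three aggregate calls from this answer, rebuilding the sample df so the block runs on its own (the variable names unnamed, named and via_formula are illustrative):

```r
df <- data.frame(
  group = c("a", "a", "a", "b", "b", "c", "c", "c"),
  text  = c("a1", "a2", "a3", "b1", "b2", "c1", "c2", "c3"),
  stringsAsFactors = FALSE
)

# Unnamed inputs: aggregate falls back to Group.1 and x
unnamed <- aggregate(df$text, list(df$group), paste, collapse = "")

# Named lists: the list names become the column names
named <- aggregate(x = list(text = df$text), by = list(group = df$group),
                   paste, collapse = "")

# Formula interface, for comparison
via_formula <- aggregate(text ~ group, data = df, FUN = paste, collapse = "")

print(names(unnamed))      # [1] "Group.1" "x"
print(names(named))        # [1] "group" "text"
print(names(via_formula))  # [1] "group" "text"
```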

EDIT2: Using data.table:

library("data.table")
dt <- as.data.table(df)
dt[, list(text = paste(text, collapse="")), by = group]
##    group   text
## 1:     a a1a2a3
## 2:     b   b1b2
## 3:     c c1c2c3
