Cumulatively paste (concatenate) values grouped by another variable

Views: 85 · Published: 2017/3/25 22:43:32 · Tags: r, dataframe

Problem description

I have a problem dealing with a data frame in R. I would like to paste the contents of cells in different rows together, based on the values of the cells in another column. The catch is that I want the output to be built up progressively (cumulatively), so the output vector must have the same length as the input vector. Here is a sample table similar to the one I am dealing with:

id <- c("a", "a", "a", "b", "b", "b")
content <- c("A", "B", "A", "B", "C", "B")
(testdf <- data.frame(id, content, stringsAsFactors=FALSE))
#  id content
#1  a       A
#2  a       B
#3  a       A
#4  b       B
#5  b       C
#6  b       B

And this is what I want the result to look like:

result <- c("A", "A B", "A B A", "B", "B C", "B C B") 
result

#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"

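I do not need something like this, which collapses each group into a single row:

library(plyr)  # provides ddply() and summarize()
ddply(testdf, .(id), summarize, content_concatenated = paste(content, collapse = " "))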

#  id content_concatenated
#1  a                A B A
#2  b                B C B

Recommended answer

You could define a "cumulative paste" function using Reduce:

cumpaste = function(x, .sep = " ") 
          Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

cumpaste(letters[1:3], "; ")
#[1] "a"       "a; b"    "a; b; c"

Reduce's loop avoids re-concatenating elements from the start, because each step extends the previous concatenation by the next element.
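To make the accumulating behaviour concrete, here is a minimal sketch. The input vectors are made up for illustration and are not from the original question. It shows that Reduce with accumulate = TRUE returns every intermediate result of the fold, in the same way that cumsum does for addition.

```r
# Illustrative input, not taken from the question
x <- c("A", "B", "C")

# accumulate = TRUE keeps every intermediate result of the fold,
# so each step reuses the previous concatenation
steps <- Reduce(function(acc, nxt) paste(acc, nxt), x, accumulate = TRUE)
steps
# [1] "A"     "A B"   "A B C"

# The same pattern with `+` yields the running sums that cumsum() gives
sums <- Reduce(`+`, 1:4, accumulate = TRUE)
all(sums == cumsum(1:4))
# [1] TRUE
```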

Apply it by group:

ave(as.character(testdf$content), testdf$id, FUN = cumpaste)
#[1] "A"     "A B"   "A B A" "B"     "B C"   "B C B"
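As a hedged illustration of why ave fits the "same length as the input" requirement, the sketch below uses a hypothetical data frame (df, with interleaved ids) that is not part of the question. ave applies the function within each group and writes the results back in the original row order, so the output can be stored directly as a new column.

```r
# Same cumulative-paste helper as defined in the answer
cumpaste <- function(x, .sep = " ")
  Reduce(function(x1, x2) paste(x1, x2, sep = .sep), x, accumulate = TRUE)

# Hypothetical data with interleaved group ids (assumption for illustration)
df <- data.frame(id = c("a", "b", "a", "b"),
                 content = c("A", "B", "C", "D"),
                 stringsAsFactors = FALSE)

# ave() runs FUN per id and returns values aligned with the input rows
df$cumulative <- ave(df$content, df$id, FUN = cumpaste)
df$cumulative
# [1] "A"   "B"   "A C" "B D"
```

Because the rows do not need to be sorted by id first, this works on data whose groups are not contiguous.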

Another idea is to concatenate the whole vector once at the start and then progressively take substrings of it:

cumpaste2 = function(x, .sep = " ")
{
    concat = paste(x, collapse = .sep)
    substring(concat, 1L, cumsum(c(nchar(x[[1L]]), nchar(x[-1L]) + nchar(.sep))))
}
cumpaste2(letters[1:3], " ;@-")
#[1] "a"           "a ;@-b"      "a ;@-b ;@-c"
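The position arithmetic inside cumpaste2 can be followed step by step. The sketch below uses illustrative inputs (x and sep are assumptions, not from the question) and exposes the intermediate end positions: the first prefix ends after the first element, and every later prefix is longer by one separator plus one element.

```r
# Illustrative inputs, not taken from the question
x <- c("ab", "c", "def")
sep <- "; "

# Full concatenation, built once
concat <- paste(x, collapse = sep)
concat
# [1] "ab; c; def"

# End position of each prefix: width of the first element, then each
# later element's width plus the separator width, accumulated
ends <- cumsum(c(nchar(x[1]), nchar(x[-1]) + nchar(sep)))
ends
# [1]  2  5 10

# substring() is vectorised over the end positions
prefixes <- substring(concat, 1L, ends)
prefixes
# [1] "ab"         "ab; c"      "ab; c; def"
```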

This seems to be somewhat faster:

set.seed(077)
X = replicate(1e3, paste(sample(letters, sample(0:5, 1), TRUE), collapse = ""))
identical(cumpaste(X, " --- "), cumpaste2(X, " --- "))
#[1] TRUE
microbenchmark::microbenchmark(cumpaste(X, " --- "), cumpaste2(X, " --- "), times = 30)
#Unit: milliseconds
#                  expr      min       lq     mean   median       uq      max neval cld
#  cumpaste(X, " --- ") 21.19967 21.82295 26.47899 24.83196 30.34068 39.86275    30   b
# cumpaste2(X, " --- ") 14.41291 14.92378 16.87865 16.03339 18.56703 23.22958    30  a

...which makes it the cumpaste_faster.
