data.table string concatenation of .SD columns by group values

Problem description

I have a large data set with many variables; it looks similar to this:

 > library(data.table)
 > dt <- data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
 > dt
     a b ID
  1: a A  1
  2: b B  1
  3: c C  1
  4: d D  2
  5: e E  2
  6: f F  2
  7: g G  2
  8: h H  3
  9: i I  3
 10: j J  3

For each value of ID, I want to concatenate all column values except ID, with a newline character between them, so the result should look like this:

     a b ID
  1: a A  1
     b B   
     c C   
  2: d D  2
     e E   
     f F   
     g G   
  3: h H  3
     i I   
     j J   

I found the question "R Dataframe: aggregating strings within column, across rows, by group", which shows how to do this for one column. How can I extend it to all columns in .SD?
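For reference, a typical one-column form of this in data.table looks like the sketch below. It assumes the sample table is stored as `dt` and is not necessarily the exact code from the linked post; it only collapses `a`, which is what the answer generalizes to every column in .SD.

```r
library(data.table)

# Sample data from the question, stored as `dt` (assumed name)
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# One column only: collapse `a` within each ID group; .() is data.table's alias for list()
one_col <- dt[, .(a = paste(a, collapse = ",")), by = ID]
print(one_col)
```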

To make it clearer, I changed the separator from a newline (\n) to a comma (,), and the result should look like this:

   a       b       ID
1: a,b,c   A,B,C   1
2: d,e,f,g D,E,F,G 2
3: h,i,j   H,I,J   3

Recommended answer

You can concatenate all columns in .SD using lapply.

dt[, lapply(.SD, paste0, collapse=" "), by = ID]
##    ID       a       b
## 1:  1   a b c   A B C
## 2:  2 d e f g D E F G
## 3:  3   h i j   H I J
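A minimal, self-contained sketch of the call above, assuming the data.table package is installed and the question's sample table is stored as `dt` (the name the answer uses). Inside `j`, .SD contains every column except the `by` column, so `lapply()` runs `paste0(..., collapse = " ")` on `a` and `b` separately within each ID group.

```r
library(data.table)

# Sample data from the question, stored as `dt`
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# .SD excludes the grouping column ID, so only `a` and `b` are collapsed per group
res <- dt[, lapply(.SD, paste0, collapse = " "), by = ID]
print(res)
```

Because data.table places the `by` column first, the result has ID as its first column rather than its last, as in the question's desired layout.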

Using a newline character as the collapse argument instead of " " does work, but the result does not print the way your desired output suggests.

dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
##    ID          a          b
## 1:  1    a\nb\nc    A\nB\nC
## 2:  2 d\ne\nf\ng D\nE\nF\nG
## 3:  3    h\ni\nj    H\nI\nJ
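To confirm that the collapsed values really contain newline characters, even though printing the table does not lay them out as in the desired output, you can pass a single cell to `writeLines()`, which renders the line breaks. This is a small sketch assuming the same `dt` as above; `res_nl` is just an illustrative name.

```r
library(data.table)

dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Collapse each column within each ID group using a real newline character
res_nl <- dt[, lapply(.SD, paste0, collapse = "\n"), by = ID]

# writeLines() outputs the string as-is, so the first group's `a` value
# appears on three separate lines
writeLines(res_nl$a[1])
```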

As @Frank pointed out in the comments, the question was changed to use , as the separator instead of \n. In that case, simply change the collapse argument to ",". If you also want a space after each comma (", "), then @DavidArenburg's solution using toString is preferable.

# plain comma, no space: "a,b,c"
dt[, lapply(.SD, paste0, collapse=","), by = ID]
# toString() separates with ", " (comma plus space): "a, b, c"
dt[, lapply(.SD, toString), by = ID]
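A side-by-side sketch of the two variants, assuming the same `dt` as in the question. `toString(x)` is equivalent to `paste(x, collapse = ", ")`, which is why it yields a comma followed by a space, while `collapse = ","` gives a bare comma.

```r
library(data.table)

dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Bare comma separator
res_comma <- dt[, lapply(.SD, paste0, collapse = ","), by = ID]
# Comma plus space, via toString()
res_tostr <- dt[, lapply(.SD, toString), by = ID]

res_comma$a  # c("a,b,c", "d,e,f,g", "h,i,j")
res_tostr$a  # c("a, b, c", "d, e, f, g", "h, i, j")
```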
