data.table: string concatenation of .SD columns by group


Question

I have a large data set with many variables that looks similar to this:

 > data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
     a b ID
  1: a A  1
  2: b B  1
  3: c C  1
  4: d D  2
  5: e E  2
  6: f F  2
  7: g G  2
  8: h H  3
  9: i I  3
 10: j J  3

For each value of ID, I want to concatenate the values of every column except ID, with a newline character between them, so the result should look like this:

     a b ID
  1: a A  1
     b B   
     c C   
  2: d D  2
     e E   
     f F   
     g G   
  3: h H  3
     i I   
     j J   

I found a related question, "R Dataframe: aggregating strings within column, across rows, by group", which explains how to do this for one column. How can I extend it to all columns in .SD?

To make it clear, I changed the separator from \n to , and the result should look like this:

   a       b       ID
1: a,b,c   A,B,C   1
2: d,e,f,g D,E,F,G 2
3: h,i,j   H,I,J   3


Recommended answer

You can concatenate all columns in .SD using lapply.

library(data.table)
dt <- data.table(a=letters[1:10],b=LETTERS[1:10],ID=c(1,1,1,2,2,2,2,3,3,3))
dt[, lapply(.SD, paste0, collapse=" "), by = ID]
##    ID       a       b
## 1:  1   a b c   A B C
## 2:  2 d e f g D E F G
## 3:  3   h i j   H I J
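
The ID column is not concatenated because, within each group, .SD holds every column except the ones named in by. A minimal sketch that makes this visible, assuming the example data from the question (`sd_cols` and `columns` are names introduced here for illustration):

```r
library(data.table)

# Example data from the question
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# For each group, list the column names that .SD contains.
# The grouping column ID does not appear among them.
sd_cols <- dt[, .(columns = toString(names(.SD))), by = ID]
print(sd_cols)
```

Every group reports only a and b, which is why lapply(.SD, ...) touches exactly the columns the question asks about.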

Using a newline character as the collapse argument instead of " " does work, but the result does not print the way you seem to expect in your desired output.

dt[, lapply(.SD, paste0, collapse="\n"), by = ID]
##    ID          a          b
## 1:  1    a\nb\nc    A\nB\nC
## 2:  2 d\ne\nf\ng D\nE\nF\nG
## 3:  3    h\ni\nj    H\nI\nJ
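
The escaped \n in the printed table is only how the strings are displayed; the stored values contain real newline characters. A minimal sketch, assuming the example data from the question (`res` is a name introduced here for illustration), writes one cell out raw with writeLines():

```r
library(data.table)

# Example data from the question
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Collapse every non-grouping column with a newline separator
res <- dt[, lapply(.SD, paste0, collapse = "\n"), by = ID]

# Printing the table escapes the newline; writeLines() emits it as an
# actual line break, so a, b and c appear on separate lines.
writeLines(res$a[1])

# Each cell is one string: 3 letters plus 2 newline characters
nchar(res$a[1])
```

So the concatenation itself is correct; only the tabular print method cannot lay a multi-line cell out the way the desired output shows.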

As @Frank pointed out in the comments, the question has been changed to use , as the separator instead of \n. You can simply change the collapse argument to ",". If you also want a space after each comma (", "), the toString solution by @DavidArenburg is preferable.

dt[, lapply(.SD, paste0, collapse=","), by = ID]
dt[, lapply(.SD, toString), by = ID]
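
The two calls differ only in the separator: toString(x) behaves like paste(x, collapse = ", "). A short sketch comparing them side by side, assuming the example data from the question (`comma` and `spaced` are names introduced here for illustration):

```r
library(data.table)

# Example data from the question
dt <- data.table(a = letters[1:10], b = LETTERS[1:10],
                 ID = c(1, 1, 1, 2, 2, 2, 2, 3, 3, 3))

# Comma with no space, as in the edited question
comma <- dt[, lapply(.SD, paste0, collapse = ","), by = ID]

# Comma followed by a space, via toString
spaced <- dt[, lapply(.SD, toString), by = ID]

comma$a   # "a,b,c"   "d,e,f,g"    "h,i,j"
spaced$a  # "a, b, c" "d, e, f, g" "h, i, j"
```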
