R collapse multiple rows into 1 row - same columns

Problem description

This piggybacks on a question I answered last night, as I am reconsidering how I'd like to format my data. I did search but couldn't find any applicable answer; I may be searching with the wrong terms.

I have a data table with many rows that I'd like to combine:

record_numb <- c(1,1,1,2,2,2)
col_a <- c(123,'','',987,'','')
col_b <- c('','234','','','765','')
col_c <- c('','','543','','','543')
df <- data.frame(record_numb,col_a,col_b,col_c)
library(data.table)
setDT(df)

record_numb    col_a    col_b    col_c
1              123
1                       234
1                                543
2              987
2                       765
2                                543

Each row will always have either col_a, col_b, or col_c populated. It will never have more than 1 of those 3 populated. I'd like to pivot(?) these into a single row per record so it appears like this:

record_numb    col_a    col_b    col_c
1              123      234      543
2              987      765      543
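Collapsing to one row per record relies on the constraint stated above: every row fills exactly one of col_a, col_b or col_c, and (as in the sample) each record fills each column at most once. A minimal base-R sketch for checking both conditions on the sample data, assuming empty strings mark the unpopulated cells; the names value_cols, populated, n_populated and per_record are illustrative, not from the question.

```r
# Sample data from the question ('' marks an empty cell)
record_numb <- c(1, 1, 1, 2, 2, 2)
col_a <- c(123, '', '', 987, '', '')
col_b <- c('', '234', '', '', '765', '')
col_c <- c('', '', '543', '', '', '543')
df <- data.frame(record_numb, col_a, col_b, col_c, stringsAsFactors = FALSE)

value_cols <- c("col_a", "col_b", "col_c")

# Logical matrix: TRUE where a value cell is non-empty
populated <- as.matrix(df[value_cols] != "")

# Check 1: each row has exactly one populated value column
n_populated <- rowSums(populated)
stopifnot(all(n_populated == 1))

# Check 2: within each record, each column is populated at most once
# (rowsum() needs numeric input, hence the * 1L)
per_record <- rowsum(populated * 1L, group = df$record_numb)
stopifnot(all(per_record <= 1))
per_record
```

If either stopifnot() fails, pasting values together per record would merge several values into one cell, so a different aggregation would be needed.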

I played with melt/cast a bit, but I'm such a novice at R that half of my issue is knowing what is available to use. There is just so much out there that I'm hoping one of you can point me to a package or function off the top of your head. The searches I performed pointed me to melt and cast, but I was unable to apply them to this case. I'm open to using any function or package.
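Since melt/cast came up: that route can work here if the empty strings are dropped between the two steps. A minimal sketch, assuming the data.table package (its melt() and dcast() methods) and the sample vectors from the question; the intermediate names long and wide, and the column names column and value, are illustrative choices.

```r
library(data.table)

record_numb <- c(1, 1, 1, 2, 2, 2)
col_a <- c(123, '', '', 987, '', '')
col_b <- c('', '234', '', '', '765', '')
col_c <- c('', '', '543', '', '', '543')
dt <- data.table(record_numb, col_a, col_b, col_c)

# melt: one row per (record, original column, value), 18 rows here
long <- melt(dt, id.vars = "record_numb",
             variable.name = "column", value.name = "value")

# drop the empty placeholders, leaving one real value per record and column
long <- long[value != ""]

# cast back: one row per record, one column per original column
wide <- dcast(long, record_numb ~ column, value.var = "value")
wide
```

The values stay character after the cast; converting them to numbers is a separate step.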

Recommended answer

Since you mentioned in your comment that you would like a data.table solution, you could use:

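library(data.table)
# build a data.table from the vectors defined in the question
df <- data.table(record_numb, col_a, col_b, col_c)

# within each record, paste each column's values together; the empty strings add nothing
df[, lapply(.SD, paste0, collapse = ""), by = record_numb]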
   record_numb col_a col_b col_c
1:           1   123   234   543
2:           2   987   765   543
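For comparison, the same per-record paste can be sketched in base R with aggregate(), if data.table is not an option. This assumes the question's sample vectors and a plain data.frame built with stringsAsFactors = FALSE; the name collapsed is illustrative.

```r
record_numb <- c(1, 1, 1, 2, 2, 2)
col_a <- c(123, '', '', 987, '', '')
col_b <- c('', '234', '', '', '765', '')
col_c <- c('', '', '543', '', '', '543')
df <- data.frame(record_numb, col_a, col_b, col_c, stringsAsFactors = FALSE)

# Split the value columns by record_numb and paste each group's values together;
# collapse = "" is passed through aggregate() to paste0
collapsed <- aggregate(df[c("col_a", "col_b", "col_c")],
                       by = list(record_numb = df$record_numb),
                       FUN = paste0, collapse = "")
collapsed
```

aggregate() returns one row per record_numb, sorted by the grouping value, with the pasted columns as character.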

.SD basically says "take all the variables in my data.table except those in the by argument." In @Frank's answer, he reduces the set of variables using .SDcols. If you want to convert the variables to numbers, you can still do this in one line. Here is a chaining method:

df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]

The second "chain" converts every column, record_numb included, to integer.
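.SDcols and the integer conversion can also be combined in a single grouped call. A sketch assuming data.table and the question's sample vectors; it illustrates the .SDcols argument and is not a copy of @Frank's answer. The names chained and one_pass are illustrative.

```r
library(data.table)

record_numb <- c(1, 1, 1, 2, 2, 2)
col_a <- c(123, '', '', 987, '', '')
col_b <- c('', '234', '', '', '765', '')
col_c <- c('', '', '543', '', '', '543')
df <- data.table(record_numb, col_a, col_b, col_c)

# Chained version from the answer: every column, record_numb included, becomes integer
chained <- df[, lapply(.SD, paste0, collapse = ""), by = record_numb][, lapply(.SD, as.integer)]

# One pass: .SDcols limits .SD to the value columns, and each column is
# pasted and converted to integer inside the same grouped call
one_pass <- df[, lapply(.SD, function(x) as.integer(paste0(x, collapse = ""))),
               by = record_numb, .SDcols = c("col_a", "col_b", "col_c")]

str(chained)
str(one_pass)
```

The practical difference is the grouping column: in one_pass, record_numb keeps its original (double) type, while the chain converts it along with everything else.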
