R collapse multiple rows into 1 row - same columns
Question
This is piggybacking on a question I answered last night, as I am reconsidering how I'd like to format my data. I did search but couldn't come up with any applicable answer; I may be searching with the wrong terms.
I have a data table with many rows that I'd like to combine:
record_numb <- c(1,1,1,2,2,2)
col_a <- c(123,'','',987,'','')
col_b <- c('','234','','','765','')
col_c <- c('','','543','','','543')
df <- data.frame(record_numb,col_a,col_b,col_c)
library(data.table)
setDT(df)
record_numb  col_a  col_b  col_c
          1    123
          1           234
          1                  543
          2    987
          2           765
          2                  543
Each row will always have either col_a, col_b, or col_c populated. It will never have more than 1 of those 3 populated. I'd like to pivot(?) these into a single row per record so it appears like this:
record_numb  col_a  col_b  col_c
          1    123    234    543
          2    987    765    543
I played with melt/cast a bit, but I'm such a novice at R that half of my issue is knowing what is available to use. There is just so much out there that I'm hoping one of you can point me to a package or function off the top of your head. The searches I performed pointed me to melt and cast and such, but I was unable to apply them to this case. I'm open to using any function or package.
Recommended answer
Since you mentioned in your comment that you would like a data.table solution, you could use:
library(data.table)
df <- data.table(record_numb,col_a,col_b,col_c)
df[, lapply(.SD, paste0, collapse=""), by=record_numb]
   record_numb col_a col_b col_c
1:           1   123   234   543
2:           2   987   765   543
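The collapse relies on the fact that, within each record, only one row per column holds a value and the others are empty strings, so concatenating them with an empty separator leaves just that value. A minimal base R sketch of that single step, using a made-up vector that mimics col_a for record 1:

```r
# Hypothetical vector standing in for col_a within record_numb == 1:
# one populated entry, the rest empty strings.
one_group <- c("123", "", "")

# paste0() with collapse = "" joins all elements into a single string;
# the empty strings contribute nothing, so only the value remains.
collapsed <- paste0(one_group, collapse = "")
print(collapsed)
```

Note that this only behaves as intended while at most one entry per group is populated; two populated entries would be glued together into one string.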
.SD basically says "take all the variables in my data.table" except those in the by argument. In @Frank's answer, he reduces the set of variables using .SDcols. If you want to convert the variables to numeric, you can still do this in one line. Here is a chaining method:
df[, lapply(.SD, paste0, collapse=""), by=record_numb][, lapply(.SD, as.integer)]
The second "chain" casts all the variables as integers.
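As a variation, the collapse and the integer conversion can be combined in one step while restricting the operation to the value columns with .SDcols. This is only a sketch under the assumptions of the question (at most one populated entry per column per record); the names dt, value_cols and res are made up for illustration.

```r
library(data.table)

# Same example data as in the question, with the value columns as character.
dt <- data.table(
  record_numb = c(1, 1, 1, 2, 2, 2),
  col_a = c("123", "", "", "987", "", ""),
  col_b = c("", "234", "", "", "765", ""),
  col_c = c("", "", "543", "", "", "543")
)

# Hypothetical helper: the columns to collapse.
value_cols <- c("col_a", "col_b", "col_c")

# For each record, collapse every column listed in .SDcols to one string
# and convert that string to integer in the same call.
res <- dt[, lapply(.SD, function(x) as.integer(paste0(x, collapse = ""))),
          by = record_numb, .SDcols = value_cols]
print(res)
```

Unlike the chained version, this leaves record_numb untouched, because the grouping column is never part of .SD.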