Merge Rows by ID and Date
Problem Description
I am a newbie at R and I have been searching for how to solve the following problem.
I have a df that looks like this:
id  Date        OB1  OB2  OB3
1   2017-01-01  1    0    0
2   2006-01-05  1    0    0
2   2007-04-19  0    1    0
3   2015-02-23  0    0    1
3   2015-02-23  1    0    0
What I need to achieve is this:
id  Date        OB1  OB2  OB3
1   2017-01-01  1    0    0
2   2006-01-05  1    0    0
2   2007-04-19  0    1    0
3   2015-02-23  1    0    1
That is, I want to combine rows by id and date.
If, for the same id, one row has OB3 = 1 and another row on the same date has OB1 = 1, the result must be a single row for that date with OB1 = 1 and OB3 = 1.
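The rule above amounts to a column-wise logical OR within each (id, Date) group. A minimal sketch for the two id 3 rows, assuming the flags are stored as numeric 0/1 (row_a and row_b are illustrative names, not from the question):

```r
# The two rows for id 3 on 2015-02-23, columns OB1, OB2, OB3
row_a <- c(OB1 = 0, OB2 = 0, OB3 = 1)
row_b <- c(OB1 = 1, OB2 = 0, OB3 = 0)

# For 0/1 values, the element-wise maximum is the same as a logical OR
merged <- pmax(row_a, row_b)
merged

# The same combination expressed with logicals
merged_lgl <- row_a == 1 | row_b == 1
merged_lgl
```

Merging the whole data frame is this same operation applied to every column of every (id, Date) group.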
I have been trying to apply some of the solutions explained in "Merge rows having same values in multiple columns", but they did not work.
OB1, OB2 and OB3 are boolean values. Thanks for your help!
EDIT 2: aggregate(. ~ ID + Date, df, any) works!
Input data:
structure(list(ID = c(-1L, 1L, 1L), Date = c("2008-01-15", "2011-01-21", "2011-01-21"), `OBS1` = c(0, 0, 0), `OBS2` = c(0, 0, 0), `OBS3` = c(0, 0, 0), `OBS4` = c(0, 0, 0), `OBS5` = c(0, 0, 0), `OBS6` = c(0, 1, 0)), .Names = c("ID", "Date", "OBS1", "OBS2", "OBS3", "OBS4", "OBS5", "OBS6"), row.names = c(NA, 3L), class = "data.frame")
Output data:
structure(list(ID = c(-1L, 1L), Date = c("2008-01-15", "2011-01-21"), `OBS1` = c(FALSE, FALSE), `OBS2` = c(FALSE, FALSE), `OBS3` = c(FALSE, FALSE), `OBS4` = c(FALSE, FALSE), `OBS5` = c(FALSE, FALSE), `OBS6` = c(FALSE, TRUE)), .Names = c("ID", "Date", "OBS1", "OBS2", "OBS3", "OBS4", "OBS5", "OBS6"), row.names = c(NA, -2L), class = "data.frame")
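A minimal sketch of the aggregate() call from EDIT 2 applied to the input data above, rebuilt here with data.frame() for readability. One assumption differs from the OP's call: plain any is replaced by function(x) any(x != 0), because the OBS columns are doubles and any() warns when it has to coerce them to logical; the resulting values are the same.

```r
# Input data from the question's dput(), rebuilt with data.frame()
df <- data.frame(
  ID   = c(-1L, 1L, 1L),
  Date = c("2008-01-15", "2011-01-21", "2011-01-21"),
  OBS1 = c(0, 0, 0), OBS2 = c(0, 0, 0), OBS3 = c(0, 0, 0),
  OBS4 = c(0, 0, 0), OBS5 = c(0, 0, 0), OBS6 = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# "." on the left-hand side means every column not used for grouping (OBS1..OBS6).
# x != 0 turns the numeric flags into logicals before any() sees them,
# which avoids the coercion warning any() gives for double input.
res <- aggregate(. ~ ID + Date, data = df, FUN = function(x) any(x != 0))
res
```

The two rows for ID 1 collapse into one, and OBS6 is TRUE for that group only, matching the output data above.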
Recommended Answer
The question has already been answered using base R's aggregate() function.
However, I felt challenged to turn the sample dataset as printed in the question into a reproducible example (before the OP edited the question to include the output of dput()).
In addition, the OP has mentioned having a "very large df", for which a data.table approach might be worth trying.
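library(magrittr)   # provides the %>% pipe
library(data.table)
# Read the table as printed in the question, replace each run of two or more
# dashes with a space, then let fread() parse the cleaned text
df <- readr::read_file(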
"id------------Date ------------OB1------ OB2----- OB3
1 ------- 2017-01-01 --------- 1 --------- 0--------- 0
2 ------- 2006-01-05 --------- 1 --------- 0--------- 0
2 ------- 2007-04-19 --------- 0 --------- 1--------- 0
3 ------- 2015-02-23 --------- 0 --------- 0--------- 1
3 ------- 2015-02-23 --------- 1 --------- 0--------- 0"
) %>% stringr::str_replace_all("[-]{2,}", " ") %>%
fread()
df
id Date OB1 OB2 OB3
1: 1 2017-01-01 TRUE FALSE FALSE
2: 2 2006-01-05 TRUE FALSE FALSE
3: 2 2007-04-19 FALSE TRUE FALSE
4: 3 2015-02-23 FALSE FALSE TRUE
5: 3 2015-02-23 TRUE FALSE FALSE
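Note that fread() has automatically recognised the boolean columns. (This depends on the data.table version: in current releases, columns containing only 0 and 1 are read as integer unless fread() is called with logical01 = TRUE.)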
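If readr, stringr and magrittr are not at hand, the same cleaning step can be sketched in base R alone (df_base is an illustrative name). Unlike the fread() result shown above, read.table() returns the OB columns as integer 0/1 rather than logical:

```r
# Sample table as printed in the question (dashes used as column spacers)
txt <- "id------------Date ------------OB1------ OB2----- OB3
1 ------- 2017-01-01 --------- 1 --------- 0--------- 0
2 ------- 2006-01-05 --------- 1 --------- 0--------- 0
2 ------- 2007-04-19 --------- 0 --------- 1--------- 0
3 ------- 2015-02-23 --------- 0 --------- 0--------- 1
3 ------- 2015-02-23 --------- 1 --------- 0--------- 0"

# Replace every run of two or more dashes with a space; the single dashes
# inside the dates are left untouched. read.table() then splits on whitespace.
df_base <- read.table(text = gsub("-{2,}", " ", txt), header = TRUE,
                      stringsAsFactors = FALSE)
str(df_base)
```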
library(data.table)
setDT(df)[, lapply(.SD, any), by = .(id, Date)]
id Date OB1 OB2 OB3
1: 1 2017-01-01 TRUE FALSE FALSE
2: 2 2006-01-05 TRUE FALSE FALSE
3: 2 2007-04-19 FALSE TRUE FALSE
4: 3 2015-02-23 TRUE FALSE TRUE
In case the OP expects integer values 0 and 1 instead of logical values, these can be created in one go:
setDT(df)[, lapply(.SD, function(x) as.integer(any(x))), by = .(id, Date)]
id Date OB1 OB2 OB3
1: 1 2017-01-01 1 0 0
2: 2 2006-01-05 1 0 0
3: 2 2007-04-19 0 1 0
4: 3 2015-02-23 1 0 1
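For comparison, a base R sketch that produces the same 0/1 result with aggregate() and max(): for columns holding only 0 and 1, the group maximum equals as.integer(any(x)). It assumes the flags are stored as integers. Note that aggregate() sorts the groups by the grouping variables, whereas data.table's by = keeps the order of first appearance, so the result is reordered at the end:

```r
# The question's table with the flags stored as integer 0/1
df <- data.frame(
  id   = c(1L, 2L, 2L, 3L, 3L),
  Date = c("2017-01-01", "2006-01-05", "2007-04-19", "2015-02-23", "2015-02-23"),
  OB1  = c(1L, 1L, 0L, 0L, 1L),
  OB2  = c(0L, 0L, 1L, 0L, 0L),
  OB3  = c(0L, 0L, 0L, 1L, 0L),
  stringsAsFactors = FALSE
)

# Group maximum of each OB column per (id, Date)
res <- aggregate(. ~ id + Date, data = df, FUN = max)

# Put the rows back in id, Date order and reset the row names
res <- res[order(res$id, res$Date), ]
rownames(res) <- NULL
res
```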