R - Concatenate cells in a dataframe by group, depending on another cell value
Problem description
I have a dataset of the following type (first row is the header):
- content is always text
- merge is always a logical
id1 id2 start_line end_line content merge
A B 1 1 "aaaa" TRUE
A B 4 4 "aa mm" TRUE
A B 5 5 "boool" TRUE
A B 6 6 "omw" TRUE
C D 6 6 "hear!" TRUE
C D 7 7 " me out!" TRUE
C D 21 21 "hello" FALSE
Problem: I need to merge rows according to very specific criteria:
- Rows that have merge = FALSE must remain as is.
- Rows that have the same id1, the same id2 and consecutive start_line values:
  - their content values need to be appended together in the content column;
  - the end_line value needs to change to that of the last row.
So, the expected result is:
id1 id2 start_line end_line content merge
A B 1 1 "aaaa" TRUE
A B 4 6 "aa mm boool omw" TRUE
C D 6 7 "hear! me out!" TRUE
C D 21 21 "hello" FALSE
Note in the example:
- The minimum merge is two rows (example: ids C-D, originally rows 6 and 7).
- Multiple rows can be merged (example: ids A-B, originally rows 2, 3 and 4).
I have attempted a very large and inefficient series of loops that only merges two lines. That is why I am not posting my attempt here.
Recommended answer
Using dplyr, you could try:

```r
library(dplyr)

df %>%
  group_by(id1, id2, grp = cumsum(c(TRUE, diff(start_line) > 1))) %>%
  summarise(start_line = first(start_line),
            end_line = last(end_line),
            content = paste(content, collapse = " "),
            merge = any(merge))

#  id1   id2     grp start_line end_line content         merge
#  <chr> <chr> <int>      <int>    <int> <chr>           <lgl>
#1 A     B         1          1        1 aaaa            TRUE
#2 A     B         2          4        6 aa mm boool omw TRUE
#3 C     D         2          6        7 hear! me out!   TRUE
#4 C     D         3         21       21 hello           FALSE
```
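The grouping key grp is what makes this answer work. A small illustration on the start_line column alone (values copied from the sample data) shows how it behaves: diff() gives the gap to the previous row, and cumsum() over the "gap > 1" flags increases the block number every time a gap appears.

```r
# start_line values from the sample data (A-B rows, then C-D rows)
start_line <- c(1L, 4L, 5L, 6L, 6L, 7L, 21L)

# Gap between each row and the row before it
gaps <- diff(start_line)           # values: 3 1 1 0 1 14

# A gap larger than 1 starts a new block; cumsum() numbers the blocks
grp <- cumsum(c(TRUE, gaps > 1))   # values: 1 2 2 2 2 2 3
```

The C-D rows 6 and 7 receive block 2, the same number as the A-B block, because diff() runs over the whole data frame rather than per id. They are still summarised separately because id1 and id2 are also grouping variables in group_by().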
Data

```r
df <- structure(list(id1 = c("A", "A", "A", "A", "C", "C", "C"),
                     id2 = c("B", "B", "B", "B", "D", "D", "D"),
                     start_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
                     end_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
                     content = c("aaaa", "aa mm", "boool", "omw", "hear!", " me out!", "hello"),
                     merge = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)),
                class = "data.frame", row.names = c(NA, -7L))
```
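The dplyr answer does not look at the merge column. On the sample data the merge = FALSE row ("hello") stays on its own only because its start_line (21) is not consecutive with the row before it. The following base-R sketch (no dplyr needed) also requires both neighbouring rows to have merge = TRUE before joining them. It rests on a few assumptions that are not part of the original answer: "consecutive" means start_line is exactly one more than the previous row's; rows are sorted by id1, id2 and start_line first; and each content piece is passed through trimws() before joining, so " me out!" does not produce a double space. The function name merge_runs is hypothetical.

```r
# Sample data from the question
df <- data.frame(
  id1 = c("A", "A", "A", "A", "C", "C", "C"),
  id2 = c("B", "B", "B", "B", "D", "D", "D"),
  start_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  end_line = c(1L, 4L, 5L, 6L, 6L, 7L, 21L),
  content = c("aaaa", "aa mm", "boool", "omw", "hear!", " me out!", "hello"),
  merge = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse runs of mergeable rows into one row each
merge_runs <- function(df) {
  n <- nrow(df)
  if (n == 0) return(df)
  # Assumption: rows must be ordered within each id pair
  df <- df[order(df$id1, df$id2, df$start_line), , drop = FALSE]

  # A row continues the previous run only if the ids match,
  # start_line goes up by exactly 1, and both rows have merge == TRUE
  same_ids    <- c(FALSE, df$id1[-1] == df$id1[-n] & df$id2[-1] == df$id2[-n])
  consecutive <- c(FALSE, diff(df$start_line) == 1)
  both_merge  <- c(FALSE, df$merge[-1] & df$merge[-n])
  run <- cumsum(!(same_ids & consecutive & both_merge))

  # Build one output row per run
  pieces <- lapply(split(seq_len(n), run), function(i) {
    data.frame(
      id1 = df$id1[i[1]],
      id2 = df$id2[i[1]],
      start_line = df$start_line[i[1]],
      end_line = df$end_line[i[length(i)]],   # end_line of the last row in the run
      content = paste(trimws(df$content[i]), collapse = " "),
      merge = df$merge[i[1]],
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

res <- merge_runs(df)
print(res)
```

Splitting row indices by a run id and rebuilding each run keeps the logic in plain vectors, so it avoids the nested loops mentioned in the question.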