R combine rows with similar values
Question
I have a data frame whose rows are sorted by value from smallest to largest. I compute the difference between adjacent rows, combine rows whose difference from the previous row is small (e.g., less than 1), and return the average value of each combined group. I could check each difference with a for loop, but that seems very inefficient. Is there a better way? Thanks.
library(dplyr)

DF <- data.frame(ID = letters[1:12],
                 Values = c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14))
DF <- DF %>%
  mutate(Diff = c(0, diff(Values)))
The expected output for DF is:
ID Values
a 1.0
b/c 2.6 # (2.2+3.0)/2
d 5.0
e/f/g 6.67 # (6.2+6.8+7.0)/3
h 8.5
i 10.0
j/k 12.6 # (12.2+13.0)/2
l 14.0
Recommended answer
Calculate the difference between the Values of consecutive rows and check whether each difference is >= 1. The cumulative sum of that >= 1 indicator gives every run of close rows its own group number, which you can then group by and summarise to get the desired result.
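To see why the cumulative sum yields group numbers, it may help to print the intermediate vectors. This is only an illustrative sketch in base R; the variable names `gaps`, `starts_new_group`, and `group_id` are made up here and do not appear in the answer.

```r
# Values from the question, already sorted in ascending order
values <- c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14)

# Gap to the previous row; the leading 1 forces the first row to open a group
gaps <- c(1, diff(values))

# TRUE wherever a row is at least 1 away from the row before it
starts_new_group <- gaps >= 1

# Every TRUE increments the counter, so rows in the same run share one id
group_id <- cumsum(starts_new_group)

print(data.frame(values, gaps = round(gaps, 2), starts_new_group, group_id))
```

Rows b and c share id 2, rows e, f, and g share id 4, and rows j and k share id 7, which matches the groups in the expected output.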
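library(dplyr)

DF %>%
  arrange(Values) %>%                                    # make sure rows are in ascending order
  group_by(Diff = cumsum(c(1, diff(Values)) >= 1)) %>%   # start a new group whenever the gap is >= 1
  summarise(ID = paste0(ID, collapse = "/"),             # join the IDs of each group
            Values = mean(Values)) %>%                   # average the values of each group
  ungroup() %>%
  select(-Diff)                                          # drop the helper grouping column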
# # A tibble: 8 x 2
# ID Values
# <chr> <dbl>
# 1 a 1.00
# 2 b/c 2.60
# 3 d 5.00
# 4 e/f/g 6.67
# 5 h 8.50
# 6 i 10.0
# 7 j/k 12.6
# 8 l 14.0
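If dplyr is not available, the same grouping idea can probably be expressed in base R. The following is a rough sketch, not part of the original answer; `grp` and `result` are names chosen here for illustration.

```r
DF <- data.frame(ID = letters[1:12],
                 Values = c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14))

# Sort by value, then build the same cumulative-sum group id as in the answer
DF <- DF[order(DF$Values), ]
grp <- cumsum(c(1, diff(DF$Values)) >= 1)

# tapply applies a function per group; as.vector drops the array attributes
result <- data.frame(
  ID     = as.vector(tapply(DF$ID, grp, paste, collapse = "/")),
  Values = as.vector(tapply(DF$Values, grp, mean))
)
print(result)
```

This avoids the package dependency at the cost of two separate `tapply` passes, one per output column.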