R combine rows with similar values

Views: 97 Published: 2020/10/3 2:14:00 Tags: r dataframe dplyr diff cluster-analysis

Question

I have a data frame whose rows are sorted by value from smallest to largest. I want to compute the difference between adjacent rows, combine consecutive rows whose difference is small (e.g., less than 1), and return the mean of each combined group. I could check each difference in a for loop, but that seems very inefficient. Is there a better way? Thanks.

library(dplyr)
DF <- data.frame(ID=letters[1:12],
                 Values=c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14))
DF <- DF %>%
   mutate(Diff=c(0, diff(Values)))

The expected output for DF is:

ID        Values
a         1.0
b/c       2.6  # (2.2+3.0)/2
d         5.0
e/f/g     6.67 # (6.2+6.8+7.0)/3
h         8.5
i         10.0
j/k       12.6 # (12.2+13.0)/2
l         14.0

Recommended answer

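Calculate the difference between the Values of consecutive rows and check whether each one is >= 1. The cumulative sum of those checks gives a distinct group id for each run of close rows, which you can then summarise to get the desired result. The leading 1 in c(1, diff(Values)) makes the first row start the first group.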

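library(dplyr)
DF %>% arrange(Values) %>%
  # a gap >= 1 from the previous row starts a new group; cumsum() numbers the groups
  group_by(Diff = cumsum(c(1,diff(Values)) >= 1) ) %>%
  # within each group, join the IDs with "/" and average the Values
  summarise(ID = paste0(ID, collapse = "/"), Values = mean(Values)) %>%
  ungroup() %>% select(-Diff)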

# # A tibble: 8 x 2
# ID    Values
# <chr>  <dbl>
# 1 a       1.00
# 2 b/c     2.60
# 3 d       5.00
# 4 e/f/g   6.67
# 5 h       8.50
# 6 i      10.0 
# 7 j/k    12.6 
# 8 l      14.0
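
To see how the grouping works step by step, here is a minimal base R sketch of the same idea. The helper name merge_close_rows and its threshold parameter are hypothetical, introduced only for illustration; they are not part of the original answer. It assumes the data frame has the ID and Values columns from the question.

```r
# Hypothetical helper (not from the original answer): a base R version of the
# cumsum-based grouping, with the gap threshold exposed as a parameter.
merge_close_rows <- function(df, threshold = 1) {
  df <- df[order(df$Values), ]                      # sort ascending first
  # TRUE marks a row that starts a new group: the first row, or any row whose
  # gap to the previous row is at least `threshold`
  starts_group <- c(TRUE, diff(df$Values) >= threshold)
  grp <- cumsum(starts_group)                       # running group id per row
  data.frame(
    ID     = as.vector(tapply(as.character(df$ID), grp, paste, collapse = "/")),
    Values = as.vector(tapply(df$Values, grp, mean))
  )
}

DF <- data.frame(ID = letters[1:12],
                 Values = c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14))

# Group ids for the sample data, one per row
cumsum(c(TRUE, diff(DF$Values) >= 1))
# 1 2 2 3 4 4 4 5 6 7 7 8

merge_close_rows(DF)
```

Because only adjacent gaps are compared, a long run of closely spaced values ends up in one group even if its first and last values differ by more than the threshold.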
