R combine rows with similar values


Problem description

I have a data frame whose rows are sorted by value from smallest to largest. I want to compute the differences between adjacent rows, combine consecutive rows whose difference is small (e.g., less than 1), and return the mean of each combined group. I could check each difference with a for loop, but that seems very inefficient. Is there a better way? Thanks.

library(dplyr)
DF <- data.frame(ID=letters[1:12],
                 Values=c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14))
DF <- DF %>%
   mutate(Diff=c(0, diff(Values)))

The expected output for DF is:

ID        Values
a         1.0
b/c       2.6  # (2.2+3.0)/2
d         5.0
e/f/g     6.67 # (6.2+6.8+7.0)/3
h         8.5
i         10.0
j/k       12.6 # (12.2+13.0)/2
l         14.0


Recommended answer

Calculate the difference between the Values of adjacent rows and check whether each difference is >= 1. The cumulative sum of that logical vector gives a distinct group number for each run of close values, and you can summarise over that group to get the desired result.

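library(dplyr)
DF %>%
  arrange(Values) %>%
  # a new group starts at the first row and wherever the gap to the previous row is >= 1
  group_by(Diff = cumsum(c(1, diff(Values)) >= 1)) %>%
  # collapse each group into one row: joined IDs and the mean of Values
  summarise(ID = paste0(ID, collapse = "/"), Values = mean(Values)) %>%
  ungroup() %>%
  select(-Diff)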

# # A tibble: 8 x 2
# ID    Values
# <chr>  <dbl>
# 1 a       1.00
# 2 b/c     2.60
# 3 d       5.00
# 4 e/f/g   6.67
# 5 h       8.50
# 6 i      10.0 
# 7 j/k    12.6 
# 8 l      14.0 
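
To make the grouping step easier to follow, here is a minimal base R sketch of the same idea that exposes the intermediate vectors. The names starts_group, group, and result are introduced here only for illustration and are not part of the original answer; it assumes the values are already sorted in ascending order.

```r
# Values and IDs from the question, already sorted in ascending order
values <- c(1, 2.2, 3, 5, 6.2, 6.8, 7, 8.5, 10, 12.2, 13, 14)
ids <- letters[1:12]

# TRUE marks a row that starts a new group:
# the first row, or any row whose gap to the previous row is >= 1
starts_group <- c(TRUE, diff(values) >= 1)

# Each TRUE increments the running total, so consecutive rows keep
# the same group number until the next gap of at least 1
group <- cumsum(starts_group)

# Collapse each group: join the IDs with "/" and average the values
result <- data.frame(
  ID = as.vector(tapply(ids, group, paste, collapse = "/")),
  Values = as.vector(tapply(values, group, mean))
)
print(result)
```

Note that this groups by the gap between neighbouring rows only. A long run of values that are each less than 1 apart ends up in a single group, even if the first and last values of that run differ by more than 1. If that is not what you want, a different grouping rule would be needed.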
