dplyr mutate column with nearest value in external list


Question

I'm trying to mutate a column and populate it with exact matches from a list if those occur, and if not, the closest match possible.

My data frame looks like this:

index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))

   index blockID
1      1     100
2      2     120
3      3     132
4      4     133
5      5     201
6      6     207
7      7     210
8      8     238
9      9     240
10    10     256

I want to mutate a new column that checks whether blockID is in a list. If it is, the column should just keep the value of blockID. If not, it should return the nearest value in blocklist:

blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256) 

So the new column should contain:

100 (match), 
120 (match), 
130 (no match for 132--nearest value is 130), 
130 (no match for 133--nearest value is 130), 
201, 
205 (no match for 207--nearest value is 205), 
210, 
238, 
240, 
256 

Here's what I tried:

df2 <- df %>% mutate(blockmatch = ifelse(blockID %in% blocklist, blockID, ifelse(match.closest(blockID, blocklist, tolerance = Inf), "missing")))

I just put in "missing" to complete the ifelse() statements, but it shouldn't actually be returned anywhere since the preceding cases will be fulfilled for every value of blockID. However, the resulting df2 just has "missing" in all the cells where it should have substituted the nearest number. I know there are base R alternatives to match.closest but I'm not sure that's the problem. Any ideas?

Recommended answer

You don't need if..else. Your rule can be simplified by saying that we always take the blocklist element with the least absolute difference from blockID. If the values match, the absolute difference is 0, which will always be the smallest.

With that, here's a simple base R solution:

df$blockmatch <- sapply(df$blockID, function(x) blocklist[order(abs(x - blocklist))][1])

   index blockID blockmatch
1      1     100        100
2      2     120        120
3      3     132        130
4      4     133        130
5      5     201        201
6      6     207        205
7      7     210        210
8      8     238        238
9      9     240        240
10    10     256        256
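
To see what the one-liner does for a single value, here is a small sketch that walks through blockID 132 step by step. It uses `which.min()` as an alternative to `order(...)[1]`; the helper name `nearest_in` is made up for illustration and is not part of the original answer.

```r
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

# Absolute distance from 132 to every candidate in blocklist
d <- abs(132 - blocklist)

# order(d)[1] and which.min(d) both give the position of the smallest distance
# (on a tie, both pick the earliest position in blocklist)
pos <- which.min(d)
blocklist[pos]

# Hypothetical helper wrapping the same idea; vapply() pins the return type
nearest_in <- function(x, candidates) {
  vapply(x, function(v) candidates[which.min(abs(v - candidates))], numeric(1))
}

nearest_in(c(100, 132, 133, 207), blocklist)
```

The helper could then be used as `df$blockmatch <- nearest_in(df$blockID, blocklist)`, which should give the same column as the `sapply()` version.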

Here are two dplyr versions:

df %>% 
  rowwise() %>% 
  mutate(
    blockmatch = blocklist[order(abs(blockID - blocklist))][1]
  )

df %>% 
  mutate(
    blockmatch = sapply(blockID, function(x) blocklist[order(abs(x - blocklist))][1])
  )
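
As for why the original attempt produced "missing": a likely explanation, assuming `match.closest()` (from the MALDIquant package, as far as I can tell) returns positions in `blocklist` rather than the values themselves, is that the inner `ifelse()` received those positions as its `test` argument and `"missing"` as its `yes` argument. Non-zero numbers are treated as TRUE, so `"missing"` comes back for every element. The sketch below reproduces this in base R with a hand-written stand-in vector `pos` instead of calling the package.

```r
blockID   <- c(100, 132, 207)
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

# Stand-in for what a closest-match lookup is assumed to return:
# positions in blocklist (1 -> 100, 3 -> 130, 6 -> 205), not the values
pos <- c(1L, 3L, 6L)

# Same shape as the inner call in the question: the positions land in `test`,
# "missing" lands in `yes`, and `no` is never needed because every test is TRUE
inner <- ifelse(pos, "missing")

# Outer ifelse: exact matches keep blockID, everything else gets "missing"
broken <- ifelse(blockID %in% blocklist, blockID, inner)

# Indexing blocklist by the positions gives the intended values instead
fixed <- blocklist[pos]
```

Under that assumption, wrapping the lookup as `blocklist[match.closest(blockID, blocklist)]` would probably have fixed the original code, and the outer `ifelse()` would no longer be needed.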

Thanks to @Onyambu, here's a faster way -

df$blockmatch <- blocklist[max.col(-abs(sapply(blocklist, '-', df$blockID)))]
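
A sketch of how that one-liner works, using `outer()` to build the same distance matrix more explicitly. One caveat: `max.col()` breaks ties at random by default, so the sketch passes `ties.method = "first"` to make equidistant cases deterministic. The value 125 is added here only to show a tie; it is not in the original data.

```r
blockID   <- c(100, 132, 207, 125)   # 125 is equidistant from 120 and 130
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

# Rows = blockID values, columns = blocklist candidates
dist_mat <- abs(outer(blockID, blocklist, "-"))

# max.col() finds the column of the row-wise maximum, so negate the distances;
# ties.method = "first" always picks the earlier blocklist entry on a tie
nearest_pos <- max.col(-dist_mat, ties.method = "first")

blockmatch <- blocklist[nearest_pos]
blockmatch
```

This builds the whole distance matrix in one vectorised step instead of looping over rows, at the cost of holding a `length(blockID)` by `length(blocklist)` matrix in memory.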
