dplyr mutate column with nearest value in external list
Question
I'm trying to mutate a column and populate it with exact matches from a list where those occur and, where they don't, with the closest match possible.
My data frame looks like this:
index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))
   index blockID
1      1     100
2      2     120
3      3     132
4      4     133
5      5     201
6      6     207
7      7     210
8      8     238
9      9     240
10    10     256
I want to mutate a new column that checks whether blockID is in a list. If it is, the column should just keep the value of blockID. If not, it should return the nearest value in blocklist:
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)
So the new column should contain:
100 (match),
120 (match),
130 (no match for 132--nearest value is 130),
130 (no match for 133--nearest value is 130),
201,
205 (no match for 207--nearest value is 205),
210,
238,
240,
256
Here's what I tried:
df2 <- df %>% mutate(blockmatch = ifelse(blockID %in% blocklist, blockID, ifelse(match.closest(blockID, blocklist, tolerance = Inf), "missing")))
I only put in "missing" to complete the ifelse() statements; it shouldn't actually be returned anywhere, since one of the preceding cases will be fulfilled for every value of blockID. However, the resulting df2 just has "missing" in all the cells where it should have substituted the nearest number. I know there are base R alternatives to match.closest, but I'm not sure that's the problem. Any ideas?
Recommended answer
You don't need if..else. Your rule can be simplified: for each blockID, always take the blocklist element with the least absolute difference from it. If the values match exactly, the absolute difference is 0, which will always be the least.
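To make the rule concrete, here is a small worked example for a single value, using the blocklist from the question. It is only a sketch: the names x and diffs are illustrative, and which.min() is used here in place of the order(...)[1] idiom. Both pick the first element with the smallest difference.

```r
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

# A value that is not in blocklist
x <- 132

# Absolute distance from x to every candidate
diffs <- abs(x - blocklist)
diffs
# 32 12 2 18 69 73 78 106 108 124

# Position of the smallest distance, then the candidate at that position
nearest <- blocklist[which.min(diffs)]
nearest
# 130

# An exact match has distance 0, so it is returned unchanged
exact <- blocklist[which.min(abs(120 - blocklist))]
exact
# 120
```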
With that, here's a simple base R solution:
df$blockmatch <- sapply(df$blockID, function(x) blocklist[order(abs(x - blocklist))][1])
   index blockID blockmatch
1      1     100        100
2      2     120        120
3      3     132        130
4      4     133        130
5      5     201        201
6      6     207        205
7      7     210        210
8      8     238        238
9      9     240        240
10    10     256        256
Here are two ways to do the same thing with dplyr:
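library(dplyr)

# Option 1: evaluate the expression one row at a time
df %>%
  rowwise() %>%
  mutate(
    blockmatch = blocklist[order(abs(blockID - blocklist))][1]
  ) %>%
  ungroup()  # drop the rowwise grouping afterwards

# Option 2: loop over the column with sapply() inside mutate()
df %>%
  mutate(
    blockmatch = sapply(blockID, function(x) blocklist[order(abs(x - blocklist))][1])
  )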
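If the lookup is needed in more than one place, it may read more clearly wrapped in a small helper. The sketch below assumes dplyr is installed; nearest_value() is a hypothetical name introduced for illustration, not a dplyr function. It uses vapply() so that the result is guaranteed to be a numeric vector.

```r
library(dplyr)

df <- data.frame(
  index = 1:10,
  blockID = c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
)
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

# Hypothetical helper: for each element of x, return the closest value in
# candidates. On a tie, which.min() keeps the candidate that appears first.
nearest_value <- function(x, candidates) {
  vapply(
    x,
    function(v) candidates[which.min(abs(v - candidates))],
    numeric(1)
  )
}

df2 <- df %>%
  mutate(blockmatch = nearest_value(blockID, blocklist))
```

Because the helper takes the candidate list as an argument, the same function can be reused with a different list without touching the mutate() call.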
Thanks to @Onyambu, here's a faster way:
df$blockmatch <- blocklist[max.col(-abs(sapply(blocklist, '-', df$blockID)))]
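The one-liner above builds a matrix with one row per blockID and one column per blocklist element, then picks the column with the smallest absolute difference in each row. The sketch below spells out those steps with outer() instead of sapply(); the intermediate names diffs and idx are illustrative. One thing to be aware of: max.col() breaks ties at random by default, so ties.method = "first" is passed here to make the result deterministic if a value ever sits exactly halfway between two candidates.

```r
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)

# diffs[i, j] is the distance between blockID[i] and blocklist[j]
diffs <- abs(outer(blockID, blocklist, "-"))

# max.col() finds the column of the row maximum, so negate the distances
# to get the column of the row minimum instead
idx <- max.col(-diffs, ties.method = "first")

blockmatch <- blocklist[idx]
blockmatch
# 100 120 130 130 201 205 210 238 240 256
```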