Distance from the closest non-NA value in a dataframe
Question
I have the following dataframe df, and I want to add a column giving, for each row, the distance to the closest non-NA value.
df <- data.frame(x = 1:20)
df[c(1, 3, 4, 5, 11, 14, 15, 16), "x"] <- NA
In other words, I am looking for the following values:
df$distance <- c(1, 0, 1, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0, 0, 0)
How can I do this automatically?
Recommended answer
Let x be the vector containing the NA values. Your problem is: given
a <- which(!is.na(x))
b <- which(is.na(x))
find min(abs(a - b[i])) for each b[i].
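To make the formulation concrete, here is a small sketch using the data from the question. It assumes x is simply the column df$x; that assignment is not spelled out in the answer itself.

```r
# Data from the question
df <- data.frame(x = 1:20)
df[c(1, 3, 4, 5, 11, 14, 15, 16), "x"] <- NA

x <- df$x                # assumption: x is the column of interest
a <- which(!is.na(x))    # positions holding a value
b <- which(is.na(x))     # positions holding NA

a                        # 2 6 7 8 9 10 12 13 17 18 19 20
b                        # 1 3 4 5 11 14 15 16

# Distance for a single NA position, e.g. b[4] (row 5): nearest value is in row 6
min(abs(a - b[4]))       # 1
```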
This type of task is not easy to accomplish efficiently in pure R code. Writing the loop in compiled code is generally a better choice, unless some package already provides a function that does this for us.
Some naive but straightforward solutions follow.
If x is not too long, we can use outer:
distance <- numeric(length(x))
distance[is.na(x)] <- apply(abs(outer(a, b, "-")), 2L, min)
If x is long and the memory usage of outer becomes a problem, we might do
distance <- numeric(length(x))
distance[is.na(x)] <- sapply(b, function (bi) min(abs(bi - a)))
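As a quick sanity check, the two approaches can be run side by side on the question's data and compared with the values the asker listed. The names d_outer, d_sapply and expected are introduced here for illustration only.

```r
# Data and expected result from the question
x <- 1:20
x[c(1, 3, 4, 5, 11, 14, 15, 16)] <- NA
expected <- c(1, 0, 1, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0, 0, 0)

a <- which(!is.na(x))
b <- which(is.na(x))

# outer(): builds a length(a) x length(b) matrix, then takes column minima
d_outer <- numeric(length(x))
d_outer[is.na(x)] <- apply(abs(outer(a, b, "-")), 2L, min)

# sapply(): handles one NA position at a time, no large matrix
d_sapply <- numeric(length(x))
d_sapply[is.na(x)] <- sapply(b, function(bi) min(abs(bi - a)))

isTRUE(all.equal(d_outer, expected))   # both should agree with the question
isTRUE(all.equal(d_sapply, expected))
```

The outer version allocates length(a) * length(b) values at once, which is where the memory concern comes from; the sapply version keeps only one vector of length(a) alive per iteration.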
Note that none of these methods is truly efficient from an algorithmic point of view.
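One possible way to avoid comparing every NA position with every non-NA position, not part of the original answer, is to exploit the fact that a is already sorted: base R's findInterval can then locate the nearest non-NA neighbour on each side of every NA position. The function name nearest_non_na_distance is hypothetical, and returning NA when x has no non-NA value at all is an assumption about the desired behaviour.

```r
# Hypothetical helper: distance from each position to the nearest non-NA position
nearest_non_na_distance <- function(x) {
  a <- which(!is.na(x))            # non-NA positions (sorted by construction)
  b <- which(is.na(x))             # NA positions
  distance <- numeric(length(x))   # non-NA positions keep distance 0
  if (length(a) == 0L) {           # assumption: distance is undefined here
    distance[] <- NA_real_
    return(distance)
  }
  k <- findInterval(b, a)          # count of non-NA positions left of each b
  # Distance to the left neighbour (Inf if there is none)
  left <- ifelse(k > 0L, b - a[pmax(k, 1L)], Inf)
  # Distance to the right neighbour (Inf if there is none)
  right <- ifelse(k < length(a), a[pmin(k + 1L, length(a))] - b, Inf)
  distance[b] <- pmin(left, right)
  distance
}

x <- 1:20
x[c(1, 3, 4, 5, 11, 14, 15, 16)] <- NA
nearest_non_na_distance(x)
# 1 0 1 2 1 0 0 0 0 0 1 0 0 1 2 1 0 0 0 0
```

This only looks at two candidates per NA position instead of all of a, so it should scale better on long vectors, though it has not been benchmarked here.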