Distance from the closest non-NA value in a dataframe

Question

I have the following dataframe df, and I want to add a column that holds, for each row, the distance to the closest non-NA value.

df <- data.frame(x = 1:20)
df[c(1, 3, 4, 5, 11, 14, 15, 16), "x"] <-  NA

In other words, I am looking for the following values:

df$distance <- c(1, 0, 1, 2, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 2, 1, 0, 0, 0, 0)

How can I do this automatically?

Recommended answer

Let x be the vector containing NA values (here, x <- df$x). With

a <- which(!is.na(x))
b <- which(is.na(x))

your problem is to find min(abs(a - b[i])) for each b[i].

This type of task is not easy to accomplish efficiently in plain R code. Writing the loop in compiled code is generally a better choice, unless some package already provides a function that does this for us.

Some naive but straightforward solutions follow.

If x is not too long, we can use outer:

distance <- numeric(length(x))
distance[is.na(x)] <- apply(abs(outer(a, b, "-")), 2L, min)
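
To tie the pieces together, here is a minimal end-to-end sketch of the outer approach on the data from the question. It assumes that x is taken from df$x; a and b follow the definitions given earlier, and d is just an illustrative name for the intermediate matrix.

```r
# Data from the question
df <- data.frame(x = 1:20)
df[c(1, 3, 4, 5, 11, 14, 15, 16), "x"] <- NA

x <- df$x                # assumption: x is the column of interest
a <- which(!is.na(x))    # positions of the non-NA values
b <- which(is.na(x))     # positions of the NA values

# |a[j] - b[i]| for every pair: rows index a, columns index b
d <- abs(outer(a, b, "-"))

distance <- numeric(length(x))      # non-NA rows keep distance 0
distance[b] <- apply(d, 2L, min)    # column-wise minimum = nearest non-NA position
df$distance <- distance
```

The matrix d has length(a) * length(b) entries, which is why this variant only suits short vectors.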

If x is long and the memory usage of outer becomes a problem, we might do:

distance <- numeric(length(x))
distance[is.na(x)] <- sapply(b, function (bi) min(abs(bi - a)))
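
As a quick sanity check, the following sketch runs both variants on randomly generated data and confirms that they agree. The seed, the vector length and the names d_outer and d_loop are arbitrary choices made for illustration.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
x <- sample(c(NA, 1), 1000, replace = TRUE)   # random mix of NA and non-NA values
a <- which(!is.na(x))
b <- which(is.na(x))

# outer(): materialises a length(a) x length(b) matrix
d_outer <- apply(abs(outer(a, b, "-")), 2L, min)

# sapply(): handles one NA position at a time, so only a vector
# of length(a) is needed per iteration
d_loop <- sapply(b, function(bi) min(abs(bi - a)))

stopifnot(all(d_outer == d_loop))             # both give the same distances
```

The sapply version trades the large intermediate matrix for an R-level loop over b; the total number of comparisons is the same.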

Note that neither method is truly efficient from an algorithmic point of view: both compare every NA position with every non-NA position, that is, length(a) * length(b) comparisons.
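
For comparison, here is a sketch of one way to avoid the all-pairs comparison while staying in base R. It is not part of the original answer. It relies on a being sorted (which() returns increasing positions), so findInterval can locate the nearest non-NA neighbour on each side of every NA position. The function name nearest_non_na_distance is hypothetical, and returning Inf when x has no non-NA value at all is an assumption chosen to mirror what min() returns on an empty vector.

```r
# Hypothetical helper: distance from each position to the nearest non-NA position
nearest_non_na_distance <- function(x) {
  a <- which(!is.na(x))              # sorted positions of non-NA values
  b <- which(is.na(x))               # sorted positions of NA values
  distance <- numeric(length(x))     # non-NA positions keep distance 0
  if (length(b) == 0L) return(distance)
  if (length(a) == 0L) {             # assumption: no non-NA value means Inf
    distance[] <- Inf
    return(distance)
  }
  idx <- findInterval(b, a)          # how many non-NA positions lie left of each b[i]
  a_pad <- c(-Inf, a, Inf)           # sentinels for a missing left/right neighbour
  left  <- a_pad[idx + 1L]           # nearest non-NA position to the left (or -Inf)
  right <- a_pad[idx + 2L]           # nearest non-NA position to the right (or Inf)
  distance[b] <- pmin(b - left, right - b)
  distance
}

# Usage with the data from the question
df <- data.frame(x = 1:20)
df[c(1, 3, 4, 5, 11, 14, 15, 16), "x"] <- NA
df$distance <- nearest_non_na_distance(df$x)
```

Each NA position is compared with at most two candidates instead of with every element of a.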
