Replacing NAs in R with nearest value

Views: 110 Published: 2020/5/9 23:12:47 Tags: r na missing-data

Question

I'm looking for something similar to na.locf() in the zoo package, but instead of always using the previous non-NA value I'd like to use the nearest non-NA value. Some example data:

dat <- c(1, 3, NA, NA, 5, 7)

Replacing NA with na.locf (3 is carried forward):

library(zoo)
na.locf(dat)
# 1 3 3 3 5 7

and na.locf with fromLast set to TRUE (5 is carried backwards):

na.locf(dat, fromLast = TRUE)
# 1 3 5 5 5 7

But I want the nearest non-NA value to be used. In my example this means that the 3 should be carried forward to the first NA, and the 5 should be carried backwards to the second NA:

1 3 3 5 5 7

I have a solution coded up, but wanted to make sure that I wasn't reinventing the wheel. Is there something already floating around?

FYI, my current code is as follows. If nothing else, perhaps someone can suggest how to make it more efficient. I feel like I'm missing an obvious way to improve this:

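f0 <- function(dat) {
  na.pos <- which(is.na(dat))
  # Nothing to fill, or nothing to fill from: return the input unchanged
  if (length(na.pos) %in% c(0, length(dat))) {
    return(dat)
  }
  non.na.pos <- setdiff(seq_along(dat), na.pos)
  # For each NA, index (into non.na.pos) of the closest non-NA position;
  # which.min() picks the left neighbour on ties
  nearest.non.na.pos <- sapply(na.pos, function(x) {
    return(which.min(abs(non.na.pos - x)))
  })
  dat[na.pos] <- dat[non.na.pos[nearest.non.na.pos]]
  return(dat)
}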

To answer smci's questions below:

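No, any entry can be NA.
If all entries are NA, leave them as is.
No. My current solution defaults to the nearest value on the left, but that doesn't matter.
These rows are typically a few hundred thousand elements long, so in theory the upper bound would be a few hundred thousand. In practice there would be no more than a few here and there, typically a single one.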
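
The left-hand default on ties mentioned above follows from how which.min() behaves: it returns the index of the first minimum. A minimal sketch of that behaviour, using a made-up three-element vector (the names tie.dat, nearest and tie.fill are only for illustration):

```r
# Position 2 is equally far from the non-NA values at positions 1 and 3
tie.dat    <- c(1, NA, 9)
na.pos     <- which(is.na(tie.dat))    # 2
non.na.pos <- which(!is.na(tie.dat))   # 1 3

# Both distances are 1; which.min() returns the FIRST minimum,
# so the neighbour on the left is selected
nearest  <- which.min(abs(non.na.pos - na.pos))
tie.fill <- tie.dat[non.na.pos[nearest]]
tie.fill
# [1] 1
```

So with this approach a tie is always resolved towards the value on the left, which matches the third answer above.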

Update: So it turns out that we're going in a different direction altogether, but this was still an interesting discussion. Thanks all!

Recommended answer

Here is a very fast one. It uses findInterval to find which two positions should be considered for each NA in your original data:

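f1 <- function(dat) {
  N <- length(dat)
  na.pos <- which(is.na(dat))
  # Nothing to fill, or nothing to fill from: return the input unchanged
  if (length(na.pos) %in% c(0, N)) {
    return(dat)
  }
  non.na.pos <- which(!is.na(dat))
  # Interval of non-NA positions that brackets each NA; all.inside = TRUE
  # puts leading and trailing NAs in the first and last interval
  intervals  <- findInterval(na.pos, non.na.pos,
                             all.inside = TRUE)
  # Clamp both indices to the valid range of non.na.pos
  left.pos   <- non.na.pos[pmax(1, intervals)]
  right.pos  <- non.na.pos[pmin(length(non.na.pos), intervals + 1)]
  left.dist  <- na.pos - left.pos
  right.dist <- right.pos - na.pos

  # Ties go to the left neighbour
  dat[na.pos] <- ifelse(left.dist <= right.dist,
                        dat[left.pos], dat[right.pos])
  return(dat)
}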
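
To make the role of findInterval more concrete, here is a step-by-step trace of the same idea on the example vector from the question. It is only a sketch of the intermediate values, not a replacement for the function; the name filled is introduced just for this illustration:

```r
# Example vector from the question
dat <- c(1, 3, NA, NA, 5, 7)

na.pos     <- which(is.na(dat))    # 3 4
non.na.pos <- which(!is.na(dat))   # 1 2 5 6

# Both NA positions fall between non.na.pos[2] = 2 and non.na.pos[3] = 5
intervals <- findInterval(na.pos, non.na.pos, all.inside = TRUE)  # 2 2

left.pos  <- non.na.pos[intervals]      # 2 2
right.pos <- non.na.pos[intervals + 1]  # 5 5

left.dist  <- na.pos - left.pos    # 1 2
right.dist <- right.pos - na.pos   # 2 1

# Position 3 is closer to the left value (3), position 4 to the right value (5)
filled <- dat
filled[na.pos] <- ifelse(left.dist <= right.dist,
                         dat[left.pos], dat[right.pos])
filled
# [1] 1 3 3 5 5 7
```

Because findInterval is vectorised over all NA positions at once, there is no per-NA scan over every non-NA position, which is presumably where most of the speed-up over the sapply version comes from.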

And here I test it:

# sample data, suggested by @JeffAllen
dat <- as.integer(runif(50000, min=0, max=10))
dat[dat==0] <- NA

# computation times
system.time(r0 <- f0(dat))    # your function
# user  system elapsed 
# 5.52    0.00    5.52
system.time(r1 <- f1(dat))    # this function
# user  system elapsed 
# 0.01    0.00    0.03
identical(r0, r1)
# [1] TRUE
