用最新的非NA值替换NA [英] Replacing NAs with latest non-NA value

查看:119
本文介绍了用最新的非NA值替换NA的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在data.frame(或data.table)中,我想填充具有最接近的非NA值的NA。使用向量(而不是 data.frame )的一个简单示例如下:

In a data.frame (or data.table), I would like to "fill forward" NAs with the closest previous non-NA value. A simple example, using vectors (instead of a data.frame) is the following:

> y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)

我想要一个函数 fill.NAs(),它允许我构造 yy

I would like a function fill.NAs() that allows me to construct yy such that:

> yy
[1] NA NA NA  2  2  2  2  3  3  3  4  4



需要对许多(总〜1Tb)小型 data.frame s(〜30-50Mb)重复此操作,其中行是NA,其所有条目。什么是处理问题的好方法?

I need to repeat this operation for many (total ~1 Tb) small sized data.frames (~30-50 Mb), where a row is NA is all its entries are. What is a good way to approach the problem?

我煮的丑陋的解决方案使用这个功能:

The ugly solution I cooked up uses this function:

last <- function (x){
    x[length(x)]
}    

fill.NAs <- function(isNA){
if (isNA[1] == 1) {
    isNA[1:max({which(isNA==0)[1]-1},1)] <- 0 # first is NAs 
                                              # can't be forward filled
}
isNA.neg <- isNA.pos <- isNA.diff <- diff(isNA)
isNA.pos[isNA.diff < 0] <- 0
isNA.neg[isNA.diff > 0] <- 0
which.isNA.neg <- which(as.logical(isNA.neg))
if (length(which.isNA.neg)==0) return(NULL) # generates warnings later, but works
which.isNA.pos <- which(as.logical(isNA.pos))
which.isNA <- which(as.logical(isNA))
if (length(which.isNA.neg)==length(which.isNA.pos)){
    replacement <- rep(which.isNA.pos[2:length(which.isNA.neg)], 
                                which.isNA.neg[2:max(length(which.isNA.neg)-1,2)] - 
                                which.isNA.pos[1:max(length(which.isNA.neg)-1,1)])      
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
} else {
    replacement <- rep(which.isNA.pos[1:length(which.isNA.neg)], which.isNA.neg - which.isNA.pos[1:length(which.isNA.neg)])     
    replacement <- c(replacement, rep(last(which.isNA.pos), last(which.isNA) - last(which.isNA.pos)))
}
replacement
}

函数 fill.NAs 使用如下:

y <- c(NA, 2, 2, NA, NA, 3, NA, 4, NA, NA)
isNA <- as.numeric(is.na(y))
replacement <- fill.NAs(isNA)
if (length(replacement)){
which.isNA <- which(as.logical(isNA))
to.replace <- which.isNA[which(isNA==0)[1]:length(which.isNA)]
y[to.replace] <- y[replacement]
} 

输出

Output

> y
[1] NA  2  2  2  2  3  3  3  4  4  4

这似乎工作。但是,男人,是丑陋!任何建议?

... which seems to work. But, man, is it ugly! Any suggestions?

推荐答案

您可能想使用 na.locf()函数从动物园包中继承最后一次观察结果以替换您的NA值。

You probably want to use the na.locf() function from the zoo package to carry the last observation forward to replace your NA values.

以下是帮助页面中其使用示例的开头:

Here is the beginning of its usage example from the help page:

> example(na.locf)

na.lcf> az <- zoo(1:6)

na.lcf> bz <- zoo(c(2,NA,1,4,5,2))

na.lcf> na.locf(bz)
1 2 3 4 5 6 
2 2 1 4 5 2 

na.lcf> na.locf(bz, fromLast = TRUE)
1 2 3 4 5 6 
2 1 1 4 5 2 

na.lcf> cz <- zoo(c(NA,9,3,2,3,2))

na.lcf> na.locf(cz)
2 3 4 5 6 
9 3 2 3 2 

这篇关于用最新的非NA值替换NA的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆