Replace NA with last non-NA in data.table by using only data.table
Question
I want to replace NA values with the last non-NA values in a data.table, using only data.table. I have one solution, but it's considerably slower than na.locf:
library(data.table)
library(zoo)
library(microbenchmark)
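# Reference: zoo::na.locf, keeping leading NAs so the length is unchanged
f1 <- function(x) {
  x[, X := na.locf(X, na.rm = FALSE)]
  x
}
# data.table only: start a new group at every non-NA value,
# then fill each group with its first element
f2 <- function(x) {
  cond <- !is.na(x[, X])
  x[, X := .SD[, X][1L], by = cumsum(cond)]
  x
}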
m1 <- data.table(X = rep(c(NA,NA,1,2,NA,NA,NA,6,7,8), 100))
m2 <- data.table(X = rep(c(NA,NA,1,2,NA,NA,NA,6,7,8), 100))
microbenchmark(f1(m1), f2(m2), times = 10)
#Unit: milliseconds
#   expr        min          lq      median          uq         max neval
# f1(m1)   2.648938    2.770792    2.959156    3.894635    6.032533    10
# f2(m2) 994.267610 1916.250440 1926.420436 1941.401077 2008.929024    10
I want to know why it's so slow and whether a faster solution exists.
Recommended answer
Here's a data.table-only solution, but it's slightly slower than na.locf:
m1[, X := X[1], by = cumsum(!is.na(X))]
m1
# X
# 1: NA
# 2: NA
# 3: 1
# 4: 2
# 5: 2
# ---
# 996: 2
# 997: 2
# 998: 6
# 999: 7
#1000: 8
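To see why grouping by cumsum(!is.na(X)) works, it can help to look at the group ids on a small table. The following is a minimal sketch on a made-up ten-row table (the names dt, grp and filled are illustrative, not from the original post). Each non-NA value starts a new group and the NAs that follow it share its id, so X[1] within a group is the last observed value.

```r
library(data.table)

# Toy data: one repetition of the pattern used in the benchmark
dt <- data.table(X = c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8))

# The counter goes up by one at every non-NA value;
# the NAs that follow inherit the same id
dt[, grp := cumsum(!is.na(X))]

# The first element of each group is the value to carry forward.
# Leading NAs fall into group 0, whose first element is NA, so they stay NA.
dt[, filled := X[1], by = grp]

print(dt)
```

The grp column makes the leading-NA case explicit: those rows are never preceded by an observed value, so there is nothing to carry forward.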
Speed test:
m1 <- data.table(X = rep(c(NA,NA,1,2,NA,NA,NA,6,7,8), 1e6))
f3 = function(x) x[, X := X[1], by = cumsum(!is.na(X))]
system.time(f1(copy(m1)))
# user system elapsed
# 3.84 0.58 4.62
system.time(f3(copy(m1)))
# user system elapsed
# 5.56 0.19 6.04
And here's a perverse way of making it faster, though I think it makes the code considerably less readable:
f4 = function(x) {
  x[, tmp := cumsum(!is.na(X))]
  setattr(x, "sorted", "tmp") # set the key without any checks
  x[x[!is.na(X)], X := i.X][, tmp := NULL]
}
system.time(f4(copy(m1)))
# user system elapsed
# 3.32 0.51 4.00
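The join inside f4 may be easier to follow on a small table. The sketch below is a variant, not the answer's code: it assumes a data.table version that supports the on= argument and uses it in place of the setattr() key trick, so it illustrates the mechanism rather than the speed. The names dt and expected are made up for the example.

```r
library(data.table)

dt <- data.table(X = c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8))

# Result of the grouped version, computed on a copy for comparison
expected <- copy(dt)[, X := X[1], by = cumsum(!is.na(X))]$X

# Same group counter as in f4
dt[, tmp := cumsum(!is.na(X))]

# dt[!is.na(X)] is a lookup table with one row per tmp >= 1: the observed values.
# Update join: every row of dt takes i.X from the lookup row with the same tmp.
# Rows with tmp == 0 (leading NAs) find no match and are left as NA.
dt[dt[!is.na(X)], on = "tmp", X := i.X]
dt[, tmp := NULL]

print(dt$X)
print(isTRUE(all.equal(dt$X, expected)))
```

Spelling out the join column with on= avoids depending on the "sorted" attribute, at the cost of whatever speed the unchecked key buys in f4.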