Replace NA with last non-NA in data.table by using only data.table


Question

I want to replace NA values with the last non-NA value in a data.table, using only data.table. I have one solution, but it's considerably slower than na.locf:

library(data.table)
library(zoo)
library(microbenchmark)

f1 <- function(x) {
    x[, X := na.locf(X, na.rm = F)]
    x
}

f2 <- function(x) {
    cond <- !is.na(x[, X])
    x[, X := .SD[, X][1L], by = cumsum(cond)]
    x
}

m1 <- data.table(X = rep(c(NA,NA,1,2,NA,NA,NA,6,7,8), 100))
m2 <- data.table(X = rep(c(NA,NA,1,2,NA,NA,NA,6,7,8), 100))

microbenchmark(f1(m1), f2(m2), times = 10)

#Unit: milliseconds
#   expr        min          lq      median          uq         max neval
# f1(m1)   2.648938    2.770792    2.959156    3.894635    6.032533    10
# f2(m2) 994.267610 1916.250440 1926.420436 1941.401077 2008.929024    10

I want to know why it's so slow, and whether a faster solution exists.
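A likely explanation for f2's slowness: the by= clause creates one group per non-NA value (plus one for the leading NAs), and in every group j evaluates `.SD[, X]`, which is a complete call to `[.data.table` with all of its argument processing. The sketch below uses base R only, on the question's input pattern as a plain vector, and simply counts those groups. With about 500 groups, the roughly 1.9 s median works out to a few milliseconds per group, which is the kind of per-call overhead `[.data.table` carries. Referencing `X` directly in j, as the answer below does, avoids that overhead.

```r
# Same input pattern as m2 in the question, as a plain vector.
x <- rep(c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8), 100)

# Grouping key used by f2: it increments at every non-NA value, so each
# non-NA value and the NAs after it share an id; leading NAs get id 0.
grp <- cumsum(!is.na(x))

# Number of groups that the by= clause in f2 has to process.
n_groups <- length(unique(grp))
print(n_groups)  # 501
```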

Answer

Here's a data.table-only solution, but it's slightly slower than na.locf:

m1[, X := X[1], by = cumsum(!is.na(X))]
m1
#       X
#   1: NA
#   2: NA
#   3:  1
#   4:  2
#   5:  2
#  ---   
# 996:  2
# 997:  2
# 998:  6
# 999:  7
#1000:  8
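To make the one-liner easier to follow, here is the same idea split into explicit steps on a single 10-row chunk of the question's data. The helper column name `grp` is only for illustration. `cumsum(!is.na(X))` gives every non-NA value a new run id and the NAs that follow it inherit that id, so `X[1L]` within a run is the non-NA value; the leading NAs sit in run 0, whose first element is NA, so they stay NA.

```r
library(data.table)

dt <- data.table(X = c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8))

# Run id: 0 0 1 2 2 2 2 3 4 5 (increments at each non-NA value).
dt[, grp := cumsum(!is.na(X))]

# Broadcast the first element of each run over the whole run, by reference.
dt[, X := X[1L], by = grp]

# Drop the illustrative helper column.
dt[, grp := NULL]
print(dt$X)  # NA NA 1 2 2 2 2 6 7 8
```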

Speed test:

m1 <- data.table(X = rep(c(NA,NA,1,2,NA,NA,NA,6,7,8), 1e6))
f3 = function(x) x[, X := X[1], by = cumsum(!is.na(X))]

system.time(f1(copy(m1)))
# user  system elapsed 
# 3.84    0.58    4.62 
system.time(f3(copy(m1)))
# user  system elapsed 
# 5.56    0.19    6.04 

And here's a perverse way of making it faster, though I think it makes the code considerably less readable:

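f4 = function(x) {
  x[, tmp := cumsum(!is.na(X))]  # run id: increments at each non-NA value
  setattr(x, "sorted", "tmp")    # set the key without any checks (tmp is already non-decreasing)
  # join each row to the non-NA row that opens its run, copy that value, then drop tmp
  x[x[!is.na(X)], X := i.X][, tmp := NULL]
}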

system.time(f4(copy(m1)))
# user  system elapsed 
# 3.32    0.51    4.00 
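On data.table 1.9.6 and later, the join in f4 can be written without the unchecked `setattr(x, "sorted", "tmp")` step by naming the join column with the `on=` argument. This is a variant of f4, not the answer's code, shown on one 10-row chunk so the result is easy to check; its timing relative to f4 is not measured here.

```r
library(data.table)

x <- data.table(X = c(NA, NA, 1, 2, NA, NA, NA, 6, 7, 8))

# Same run id as in f4: 0 0 1 2 2 2 2 3 4 5
x[, tmp := cumsum(!is.na(X))]

# Join every row of x to the non-NA row that opened its run (matched on tmp)
# and copy that value over; i.X is column X of the inner table. Rows in
# run 0 have no matching non-NA row, so they are left as NA.
x[x[!is.na(X)], on = "tmp", X := i.X]
x[, tmp := NULL]
print(x$X)  # NA NA 1 2 2 2 2 6 7 8
```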
