Replacing all missing values in R data.table with a value


Question

If you have an R data.table that has missing values, how do you replace all of them with, say, the value 0? E.g.

aa = data.table(V1=1:10,V2=c(1,2,2,3,3,3,4,4,4,4))
bb = data.table(V1=3:6,X=letters[1:4])
setkey(aa,V1)
setkey(bb,V1)
tt = bb[aa]

    V1  X V2
 1:  1 NA  1
 2:  2 NA  2
 3:  3  a  2
 4:  4  b  3
 5:  5  c  3
 6:  6  d  3
 7:  7 NA  4
 8:  8 NA  4
 9:  9 NA  4
10: 10 NA  4

Is there any way to do this in one line? If it were just a matrix, you could simply do:

tt[is.na(tt)] = 0

Recommended answer

is.na (being a primitive) has relatively little overhead and is usually quite fast. So you can just loop through the columns and use set to replace NA with 0.

Using <- to assign will result in a copy of all the columns, and this is not the idiomatic way of using data.table.
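One way to see the by-reference behaviour on a small scale is to compare a column's memory address before and after calling set. The following is a minimal sketch, assuming the data.table package is installed; the table contents and the names addr_before and addr_after are made up for illustration.

```r
library(data.table)

# Small illustrative table (hypothetical data, not from the question)
tt <- data.table(V1 = 1:5, V2 = c(1, NA, 3, NA, 5))

# Address of the V2 column vector before the update
addr_before <- address(tt[["V2"]])

# Replace the NAs in V2 by reference; value has the same type as the column
set(tt, i = which(is.na(tt[["V2"]])), j = "V2", value = 0)

# Address of the V2 column vector after the update
addr_after <- address(tt[["V2"]])

# TRUE means the column was modified in place rather than reallocated
same_address <- identical(addr_before, addr_after)
print(same_address)
print(tt)
```

If the two addresses match, the column vector was updated in place, which is what set is designed to do when the replacement value has the same type as the column.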

First I'll illustrate how to do it, and then show how slow the copying approach can get on huge data:

for (i in seq_along(tt)) set(tt, i=which(is.na(tt[[i]])), j=i, value=0)

You'll get a warning here that 0 is being coerced to character ("0") to match the type of column X. You can ignore it.
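If you would rather avoid the coercion warning than ignore it, one option is to pick a replacement value that matches each column's type. This is only a sketch of that idea, reusing the tt from the question; the variable names col, na_rows and fill, and the choice of "0" for character columns, are illustrative assumptions.

```r
library(data.table)

# Rebuild the example table from the question
aa <- data.table(V1 = 1:10, V2 = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 4))
bb <- data.table(V1 = 3:6, X = letters[1:4])
setkey(aa, V1)
setkey(bb, V1)
tt <- bb[aa]

for (j in seq_along(tt)) {
  col <- tt[[j]]
  na_rows <- which(is.na(col))
  if (length(na_rows) == 0L) next  # nothing to replace in this column

  # Choose a replacement of the same type as the column (assumed convention)
  fill <- if (is.character(col)) "0" else if (is.integer(col)) 0L else 0

  # Update by reference; no type coercion is needed, so no warning
  set(tt, i = na_rows, j = j, value = fill)
}

print(tt)
```

Whether "0" is a sensible placeholder for a character column depends on the data; an empty string or a label such as "missing" may be a better fit.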

# by reference - idiomatic way
set.seed(45)
tt <- data.table(matrix(sample(c(NA, rnorm(10)), 1e7*3, TRUE), ncol=3))
tracemem(tt)
# modifies value by reference - no copy
system.time({
for (i in seq_along(tt)) 
    set(tt, i=which(is.na(tt[[i]])), j=i, value=0)
})
#   user  system elapsed 
#  0.284   0.083   0.386 

# by copy - NOT the idiomatic way
set.seed(45)
tt <- data.table(matrix(sample(c(NA, rnorm(10)), 1e7*3, TRUE), ncol=3))
tracemem(tt)
# makes copy
system.time({tt[is.na(tt)] <- 0})
# a bunch of "tracemem" output showing the copies being made
#   user  system elapsed 
#  4.110   0.976   5.187 
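More recent versions of data.table (1.12.4 and later, to the best of my knowledge) also provide setnafill, which fills NAs by reference without an explicit loop. The sketch below assumes such a version is installed and restricts the call to double columns, since setnafill was introduced for numeric data; the table and the name num_cols are made up for illustration.

```r
library(data.table)

# Illustrative table with two double columns and one character column
tt <- data.table(
  V2 = c(1, NA, 3, NA, 5, 6),
  V3 = c(NA, 2, NA, 4, 5, NA),
  X  = c("a", NA, "c", "d", NA, "f")
)

# Select only the numeric columns; the character column is left untouched
num_cols <- names(tt)[vapply(tt, is.numeric, logical(1))]

# Fill NAs with the constant 0, by reference
setnafill(tt, type = "const", fill = 0, cols = num_cols)

print(tt)
```

Character columns such as X would still need the set loop shown above.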
