Replacing all missing values in R data.table with a value

Published: 2017/3/12 10:25:24 | Tags: r, data.table

Question

If you have an R data.table that has missing values, how do you replace all of them with, say, the value 0? E.g.

library(data.table)

aa = data.table(V1=1:10,V2=c(1,2,2,3,3,3,4,4,4,4))
bb = data.table(V1=3:6,X=letters[1:4])
setkey(aa,V1)
setkey(bb,V1)
tt = bb[aa]  # join: keeps every row of aa; X is NA where bb has no matching V1
tt

    V1  X V2
 1:  1 NA  1
 2:  2 NA  2
 3:  3  a  2
 4:  4  b  3
 5:  5  c  3
 6:  6  d  3
 7:  7 NA  4
 8:  8 NA  4
 9:  9 NA  4
10: 10 NA  4

Is there any way to do this in one line? If it were just a matrix, you could do:

tt[is.na(tt)] = 0

Recommended answer

is.na (being a primitive) has relatively little overhead and is usually quite fast. So you can just loop through the columns and use set to replace NA with 0.

Using <- to assign results in a copy of all the columns, and it is not the idiomatic way to do this with data.table.
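
To make the copy concrete, here is a minimal sketch on a hypothetical toy table (dt, dt2, ref and ref2 are illustrative names, not from the question). It assumes data.table's usual reference semantics: a plain assignment such as ref <- dt gives a second name for the same table, so in-place updates made through set are visible through both names, while the matrix-style <- assignment rebinds the name to a modified copy.

```r
library(data.table)

# Hypothetical toy table, not from the question
dt <- data.table(a = c(1, NA, 3), b = c(NA, 2, NA))
ref <- dt  # plain assignment: a second name for the same table, no copy

# set() updates the columns in place, so ref sees the change too
for (j in seq_along(dt)) set(dt, i = which(is.na(dt[[j]])), j = j, value = 0)
sum(is.na(ref))   # no NAs left when viewed through ref either

# Matrix-style assignment builds a modified copy and rebinds dt2 to it
dt2 <- data.table(a = c(1, NA, 3), b = c(NA, 2, NA))
ref2 <- dt2
dt2[is.na(dt2)] <- 0
sum(is.na(dt2))   # dt2 now points at the filled copy
sum(is.na(ref2))  # ref2 still points at the original table with its NAs
```

If you want an independent table before calling set, take one explicitly with copy(dt).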

First I'll illustrate how to do it, and then show how slow the copying approach can get on huge data:

for (i in seq_along(tt)) set(tt, i=which(is.na(tt[[i]])), j=i, value=0)

You'll get a warning here that the value 0 is being coerced to character to match the type of the column. You can ignore it; the NAs in the character column X simply become the string "0".
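
If you'd rather avoid the coercion warning, one option is to pick a fill value that matches each column's type. This is a sketch under my own assumptions rather than part of the loop above: the choice of "0" for character columns and 0L for integer columns is illustrative, and other types such as factor or logical columns would need their own handling.

```r
library(data.table)

# Same shape as the question's joined table: X is character, V1 is integer, V2 is double
tt <- data.table(V1 = 1:4, X = c(NA, "a", "b", NA), V2 = c(1, NA, 2, 3))

for (j in seq_along(tt)) {
  col <- tt[[j]]
  # Assumed fill values: "0" for character, 0L for integer, 0 for everything else
  fill <- if (is.character(col)) "0" else if (is.integer(col)) 0L else 0
  # which() returns integer(0) for columns without NAs, so those are left untouched
  set(tt, i = which(is.na(col)), j = j, value = fill)
}
tt
```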

# by reference - idiomatic way
set.seed(45)
tt <- data.table(matrix(sample(c(NA, rnorm(10)), 1e7*3, TRUE), ncol=3))
tracemem(tt)
# modifies value by reference - no copy
system.time({
for (i in seq_along(tt)) 
    set(tt, i=which(is.na(tt[[i]])), j=i, value=0)
})
#   user  system elapsed 
#  0.284   0.083   0.386 

# by copy - NOT the idiomatic way
set.seed(45)
tt <- data.table(matrix(sample(c(NA, rnorm(10)), 1e7*3, TRUE), ncol=3))
tracemem(tt)
# makes copy
system.time({tt[is.na(tt)] <- 0})
# a bunch of "tracemem" output showing the copies being made
#   user  system elapsed 
#  4.110   0.976   5.187
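
For numeric columns there is also a built-in by-reference helper, setnafill, which I believe was added in data.table 1.12.4, so treat its availability as an assumption about your installed version. As far as I know it only accepts numeric columns, so this sketch restricts it to those; the toy table and the name num_cols are illustrative.

```r
library(data.table)

# Hypothetical toy table with two numeric columns and one character column
tt <- data.table(V1 = c(1, NA, 3), X = c("a", NA, "c"), V2 = c(NA, 2.5, NA))

# Restrict the fill to numeric columns; the character column X is skipped
num_cols <- names(tt)[vapply(tt, is.numeric, logical(1))]
setnafill(tt, type = "const", fill = 0, cols = num_cols)
tt
```

The character column would still need the set loop (or a type-matched fill) if you want its NAs replaced as well.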
