NA replace function in R


Problem description

I'm trying to replace the NAs in a matrix, mat, with zeros. I'm using mat[is.na(mat)] <- 0. It works well when the matrix has 94531 observations of 18946 variables or fewer, but when I try it on a matrix of 112039 observations of 22752 variables, R shows an error:



Error in if (!nreplace) return(x) : missing value where TRUE/FALSE needed
In addition: Warning message:
In sum(i, na.rm = TRUE) : integer overflow - use sum(as.numeric(.))

I don't know what I'm doing wrong, and I don't understand the error.

Here is an example of the structure of my data.

Small data.matrix (made from a real data source):

> str(mat)
Classes 'data.table' and 'data.frame':  94531 obs. of  18946 variables:
 $ 6316506: num  1 0 NA NA NA NA NA NA NA NA ...
 $ 6794602: num  0 1 NA NA NA NA NA 0 0 0 ...
 $ 1008667: num  NA NA 0 1 0 NA NA 0 0 0 ...
 $ 6312454: num  NA NA 1 0 0 NA NA 0 0 0 ...
 $ 8009082: num  NA NA 0 0 1 NA NA NA NA NA ...
 $ 1023293: num  NA NA NA NA NA 1 NA NA NA NA ...
 $ 6740421: num  NA NA NA NA NA 1 NA 0 0 0 ...
 $ 6777805: num  NA NA NA NA NA NA 1 NA NA NA ...
 $ 1000558: num  NA NA NA NA NA NA NA 0 0 0 ...
 $ 1001682: num  NA NA NA NA NA NA NA 0 0 0 ...

The bigger one looks exactly the same.

Additional question:

Is there some way to use rbindlist(data, fill = TRUE) and fill with zeros instead of NAs?

Recommended answer

With a large data.table, the set function is usually the way to go for replacement within variables.
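As background on the error in the question, the warning message points at an integer overflow in a sum. The sketch below is only a plausibility check, assuming the sum counts the TRUE cells of the is.na() mask and that the larger table holds more NAs than a 32-bit integer can represent; the variable names are illustrative and not from the original post.

```r
# Number of cells in each table from the question (computed as doubles,
# so the products themselves cannot overflow).
small_cells <- 94531 * 18946
large_cells <- 112039 * 22752

# Only the smaller table stays under the 32-bit integer limit.
small_fits <- small_cells <= .Machine$integer.max
large_fits <- large_cells <= .Machine$integer.max

# An integer sum that exceeds the limit returns NA (with a warning) ...
overflowed <- suppressWarnings(sum(c(.Machine$integer.max, 1L)))

# ... and using that NA as an if() condition raises an error.
err <- tryCatch(
  if (!overflowed) "unchanged",
  error = function(e) conditionMessage(e)
)
```

Under these assumptions, an NA count for the smaller table can never overflow, while one for the larger table can, which would be consistent with the behaviour described in the question.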

In this application, you can get your desired outcome in two steps.


  1. Find the locations of the NAs for each variable and return them as a list.
  2. Use data.table's set function to replace the values.

I constructed a small example data.table:

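library(data.table)

set.seed(1234)
# 50 values drawn with replacement from NA and four random normals, arranged in 10 rows
dt <- data.table(matrix(sample(c(NA, rnorm(4)), size = 50, replace = TRUE), nrow = 10))

This looks like:
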
dt
            V1         V2         V3         V4         V5
 1:  1.0844412         NA -2.3456977 -2.3456977 -1.2070657
 2:  0.2774292 -1.2070657         NA -2.3456977  1.0844412
 3:  1.0844412 -1.2070657  0.2774292  0.2774292         NA
 4:  0.2774292 -1.2070657 -1.2070657  1.0844412 -1.2070657
 5: -1.2070657         NA -1.2070657 -1.2070657  1.0844412
 6: -2.3456977         NA  0.2774292  1.0844412  0.2774292
 7: -1.2070657 -1.2070657         NA -1.2070657         NA
 8: -2.3456977 -2.3456977  1.0844412  0.2774292  0.2774292
 9: -1.2070657  0.2774292 -1.2070657  1.0844412  0.2774292
10: -1.2070657 -2.3456977 -1.2070657  0.2774292  1.0844412

The first step is to find the NAs for each column.

myNAs <- lapply(dt, function(x) which(is.na(x)))
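To make the shape of that list concrete, here is a minimal sketch on a hypothetical three-column table (toy and toy_NAs are illustrative names, not part of the original answer). Each list element holds the row indices of the NAs in one column, and a column without NAs yields an empty integer vector.

```r
library(data.table)

# Hypothetical toy table: column c has no missing values.
toy <- data.table(a = c(1, NA, 3), b = c(NA, NA, 6), c = c(7, 8, 9))

# Same pattern as above: one vector of NA row indices per column.
toy_NAs <- lapply(toy, function(x) which(is.na(x)))

# Inspect the structure of the result.
str(toy_NAs)
```

The empty vector for column c is the case that the length check in the next step skips.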

Next, use a for loop to iterate over the columns and fill in the NA values with the efficient set function, after checking with if that the column contains missing values.

for (j in seq_along(dt)) if (length(myNAs[[j]]) > 0) set(dt, i = myNAs[[j]], j = j, value = 0)

set performs the replacement "in place" (without making any copies), so after this operation the former NAs in the data.table dt have been replaced with 0s.

dt
            V1         V2         V3         V4         V5
 1:  1.0844412  0.0000000 -2.3456977 -2.3456977 -1.2070657
 2:  0.2774292 -1.2070657  0.0000000 -2.3456977  1.0844412
 3:  1.0844412 -1.2070657  0.2774292  0.2774292  0.0000000
 4:  0.2774292 -1.2070657 -1.2070657  1.0844412 -1.2070657
 5: -1.2070657  0.0000000 -1.2070657 -1.2070657  1.0844412
 6: -2.3456977  0.0000000  0.2774292  1.0844412  0.2774292
 7: -1.2070657 -1.2070657  0.0000000 -1.2070657  0.0000000
 8: -2.3456977 -2.3456977  1.0844412  0.2774292  0.2774292
 9: -1.2070657  0.2774292 -1.2070657  1.0844412  0.2774292
10: -1.2070657 -2.3456977 -1.2070657  0.2774292  1.0844412
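
The two steps can also be folded into a single loop. The sketch below wraps them in a hypothetical helper, fill_na_zero (the name and the value argument are assumptions, not part of the original answer), and applies it to the result of rbindlist(..., fill = TRUE). This is one possible way to approach the additional question: bind first, then replace the NAs that the fill introduced.

```r
library(data.table)

# Hypothetical helper: replace the NAs in every column of a data.table
# by reference, one column at a time.
fill_na_zero <- function(DT, value = 0) {
  for (j in seq_along(DT)) {
    na_rows <- which(is.na(DT[[j]]))
    if (length(na_rows) > 0L) set(DT, i = na_rows, j = j, value = value)
  }
  invisible(DT)
}

# Two small tables with only partly overlapping columns.
parts <- list(data.table(a = 1, b = 2), data.table(b = 3, c = 4))

# fill = TRUE pads the missing columns with NA ...
combined <- rbindlist(parts, fill = TRUE)

# ... and the helper then replaces those NAs in place.
fill_na_zero(combined)
```

Because each column is scanned separately, no logical mask over the whole table is ever built. Recent versions of data.table also provide nafill() and setnafill() for numeric columns, which may be worth checking in the package documentation.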
