NA replace function in R
Question
I'm trying to replace the NAs in a matrix, mat, with zeros. I'm using mat[is.na(mat)] <- 0. It works well when the matrix has 94531 observations of 18946 variables or fewer, but when I try it on a matrix of 112039 observations of 22752 variables, R shows an error:
Error in if (!nreplace) return(x) : missing value where TRUE/FALSE needed
In addition: Warning message:
In sum(i, na.rm = TRUE) : integer overflow - use sum(as.numeric(.))
I don't know what I'm doing wrong and I don't understand the error.
Here is an example of the structure of my data.
Small data.matrix (made from the real data source):
> str(mat)
Classes 'data.table' and 'data.frame': 94531 obs. of 18946 variables:
$ 6316506: num 1 0 NA NA NA NA NA NA NA NA ...
$ 6794602: num 0 1 NA NA NA NA NA 0 0 0 ...
$ 1008667: num NA NA 0 1 0 NA NA 0 0 0 ...
$ 6312454: num NA NA 1 0 0 NA NA 0 0 0 ...
$ 8009082: num NA NA 0 0 1 NA NA NA NA NA ...
$ 1023293: num NA NA NA NA NA 1 NA NA NA NA ...
$ 6740421: num NA NA NA NA NA 1 NA 0 0 0 ...
$ 6777805: num NA NA NA NA NA NA 1 NA NA NA ...
$ 1000558: num NA NA NA NA NA NA NA 0 0 0 ...
$ 1001682: num NA NA NA NA NA NA NA 0 0 0 ...
The larger one looks exactly the same.
Additional question:
Is there a way to use rbindlist(data, fill = TRUE) and have it fill with zeros instead of NAs?
Recommended answer
With a large data.table, the set function is usually the way to go for replacement within variables.
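On the error itself, a likely explanation (inferred from the warning text, not confirmed in the question) is that assigning through a logical matrix on a data.frame counts the TRUE entries with sum(), which returns an integer. If the number of NAs exceeds the largest integer R can hold, sum() returns NA with the "integer overflow" warning, and the following if (!nreplace) check then fails. A quick check of the cell counts from the question shows that only the larger table can cross that limit; the variable names below are illustrative only.

```r
# Cell counts from the question, computed in double precision so they cannot overflow.
cells_small <- 94531 * 18946
cells_large <- 112039 * 22752

# Largest value an R integer can hold (2^31 - 1).
int_max <- .Machine$integer.max

# Only a table with more cells than int_max can have more NAs than int_max.
small_can_overflow <- cells_small > int_max
large_can_overflow <- cells_large > int_max
print(c(small = small_can_overflow, large = large_can_overflow))
```

The smaller table stays below the integer limit no matter how many of its cells are NA, which would explain why the same command works there.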
In this application, you can get your desired outcome in two steps.
- Find the locations of NAs for each variable and return a list.
- Use data.table's set function to replace the values.
I constructed a small example data.table:
library(data.table)
set.seed(1234)
dt <- data.table(matrix(sample(c(NA, rnorm(4)), size = 50, replace = TRUE), 10))
This looks like
dt
V1 V2 V3 V4 V5
1: 1.0844412 NA -2.3456977 -2.3456977 -1.2070657
2: 0.2774292 -1.2070657 NA -2.3456977 1.0844412
3: 1.0844412 -1.2070657 0.2774292 0.2774292 NA
4: 0.2774292 -1.2070657 -1.2070657 1.0844412 -1.2070657
5: -1.2070657 NA -1.2070657 -1.2070657 1.0844412
6: -2.3456977 NA 0.2774292 1.0844412 0.2774292
7: -1.2070657 -1.2070657 NA -1.2070657 NA
8: -2.3456977 -2.3456977 1.0844412 0.2774292 0.2774292
9: -1.2070657 0.2774292 -1.2070657 1.0844412 0.2774292
10: -1.2070657 -2.3456977 -1.2070657 0.2774292 1.0844412
The first step is to find the NAs for each column.
myNAs <- lapply(dt, function(x) which(is.na(x)))
Next, use a for loop to iterate over the columns and fill in the NA values with the very efficient set function, after checking with if that the column actually contains missing values.
for(j in seq_along(dt)) if(length(myNAs[[j]]) > 0) set(dt, myNAs[[j]], j, 0)
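The two steps can also be folded into a single loop that computes the NA positions one column at a time, so the full list of indices never has to be held in memory. The sketch below wraps that loop in a helper; replace_na_inplace and the toy table are hypothetical names used for illustration and are not part of data.table.

```r
library(data.table)

# Hypothetical helper: replace the NAs in every column of a data.table, in place.
replace_na_inplace <- function(dt, value = 0) {
  for (j in seq_along(dt)) {
    na_rows <- which(is.na(dt[[j]]))  # row indices of the NAs in column j
    if (length(na_rows) > 0) {
      set(dt, i = na_rows, j = j, value = value)
    }
  }
  invisible(dt)
}

# Small illustrative table (not the data from the question).
toy <- data.table(a = c(1, NA, 3), b = c(NA, NA, 6))
replace_na_inplace(toy)
print(toy)
```

Because set modifies its argument by reference, toy is changed even though the result of the function call is not assigned.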
set performs the replacement "in place" (without any copies), so following this operation, the data.table dt has the former NAs replaced with 0s.
dt
V1 V2 V3 V4 V5
1: 1.0844412 0.0000000 -2.3456977 -2.3456977 -1.2070657
2: 0.2774292 -1.2070657 0.0000000 -2.3456977 1.0844412
3: 1.0844412 -1.2070657 0.2774292 0.2774292 0.0000000
4: 0.2774292 -1.2070657 -1.2070657 1.0844412 -1.2070657
5: -1.2070657 0.0000000 -1.2070657 -1.2070657 1.0844412
6: -2.3456977 0.0000000 0.2774292 1.0844412 0.2774292
7: -1.2070657 -1.2070657 0.0000000 -1.2070657 0.0000000
8: -2.3456977 -2.3456977 1.0844412 0.2774292 0.2774292
9: -1.2070657 0.2774292 -1.2070657 1.0844412 0.2774292
10: -1.2070657 -2.3456977 -1.2070657 0.2774292 1.0844412
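As for the additional question, as far as I know the fill argument of rbindlist only switches filling on or off and does not let you choose the fill value, so one option is to bind first and then apply the same set loop to the result. The sketch below assumes numeric columns; parts and combined are illustrative names, not objects from the question.

```r
library(data.table)

# Two small tables with different columns (illustrative data only).
parts <- list(
  data.table(a = c(1, 0)),
  data.table(b = c(0, 1))
)

# fill = TRUE pads the columns missing from each table with NA.
combined <- rbindlist(parts, fill = TRUE)

# Replace the padded NAs with 0, column by column, in place.
for (j in seq_along(combined)) {
  na_rows <- which(is.na(combined[[j]]))
  if (length(na_rows) > 0) set(combined, na_rows, j, 0)
}
print(combined)
```

Note that this also replaces any NAs that were already present in the inputs, not only the ones introduced by the fill. If your data.table version provides setnafill, then setnafill(combined, fill = 0) may be a shorter alternative for numeric columns.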