Using `:=` in data.table to sum the values of two columns in R, ignoring NAs

Question

I have what I think is a very simple question about using data.table and the := operator. I don't think I quite understand the behaviour of :=, and I often run into similar problems.

Here is some example data:

 library(data.table)

 # Build the example with data.table() rather than a pasted structure() call,
 # whose row.names attribute (c(NA, 10L)) did not match the five rows shown.
 mat <- data.table(
               col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
               col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491))

which gives

> mat
         col1      col2
 1:        NA  0.003745
 2:  0.000000  0.007463
 3: -0.015038 -0.007407
 4:  0.003817 -0.003731
 5: -0.011407 -0.007491

I want to create a column called col3 that holds the sum of col1 and col2. If I use

mat[,col3 := col1 + col2]

#        col1      col2      col3
#1:        NA  0.003745        NA
#2:  0.000000  0.007463  0.007463
#3: -0.015038 -0.007407 -0.022445
#4:  0.003817 -0.003731  0.000086
#5: -0.011407 -0.007491 -0.018898

then I get an NA for the first row, but I want NAs to be ignored. So I tried instead

mat[,col3 := sum(col1,col2,na.rm=TRUE)]

#        col1      col2      col3
#1:        NA  0.003745 -0.030049
#2:  0.000000  0.007463 -0.030049
#3: -0.015038 -0.007407 -0.030049
#4:  0.003817 -0.003731 -0.030049
#5: -0.011407 -0.007491 -0.030049

which is not what I am after, since it gives me the sum of all elements of col1 and col2 in every row. I think I don't quite get :=. How can I get the element-wise sum of col1 and col2, ignoring NA values?

Not sure this is relevant, but here is my sessionInfo:

> sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.8.3


Recommended answer

It's not a lack of understanding of data.table but rather one regarding vectorized functions in R. You can define a dyadic operator that behaves differently from the "+" operator with regard to missing values:

 `%+na%` <- function(x,y) {ifelse( is.na(x), y, ifelse( is.na(y), x, x+y) )}

 mat[ , col3:= col1 %+na% col2]
#-------------------------------
#        col1      col2      col3
#1:        NA  0.003745  0.003745
#2:  0.000000  0.007463  0.007463
#3: -0.015038 -0.007407 -0.022445
#4:  0.003817 -0.003731  0.000086
#5: -0.011407 -0.007491 -0.018898
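
The point about vectorized functions can be seen without data.table at all. The following is a minimal sketch using two short illustrative vectors (x and y are made up here, taken from the first three rows of the example): "+" returns one value per position, while sum() reduces everything it is given to a single number, which := then recycles down the whole column.

```r
# Illustrative vectors only; they mirror the first three rows of the example.
x <- c(NA, 0, -0.015038)
y <- c(0.003745, 0.007463, -0.007407)

# Element-wise: the result has the same length as the inputs,
# and an NA in either operand propagates to that position.
elementwise <- x + y

# Reduction: all values from both vectors collapse into one number.
reduced <- sum(x, y, na.rm = TRUE)

length(elementwise)  # 3
length(reduced)      # 1
```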

You can use mrdwad's comment to do it with sum(..., na.rm=TRUE), grouping by row so that sum() is applied to one row at a time:

mat[ , col4 := sum(col1, col2, na.rm=TRUE), by=1:NROW(mat)]
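
Grouping by every row can be slow on large tables. A possible alternative, not part of the original answer, is base R's rowSums(), which is vectorized across rows. The sketch below rebuilds the example table so it runs on its own; the column name col5 is chosen here purely for illustration.

```r
library(data.table)

mat <- data.table(
  col1 = c(NA, 0, -0.015038, 0.003817, -0.011407),
  col2 = c(0.003745, 0.007463, -0.007407, -0.003731, -0.007491))

# .SD is restricted to the two input columns; rowSums() then adds them
# row by row in a single vectorized call, dropping NAs.
mat[, col5 := rowSums(.SD, na.rm = TRUE), .SDcols = c("col1", "col2")]
```

One difference to keep in mind: for a row where both columns are NA, rowSums(..., na.rm = TRUE) and the by-row sum() return 0, whereas the %+na% operator above returns NA. Which result is appropriate depends on what a fully missing row should mean in your data.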
