data.table and pmin with na.rm=TRUE argument

Problem description

I am trying to calculate the minimum across rows using the pmin function and data.table (similar to the post "row-by-row operations and updates in data.table"), but with a character vector of column names, using something like the with=FALSE syntax, and with the na.rm=TRUE argument.

library(data.table)

DT <- data.table(x = c(1,1,2,3,4,1,9),
                 y = c(2,4,1,2,5,6,6),
                 z = c(3,5,1,7,4,5,3),
                 a = c(1,3,NA,3,5,NA,2))

> DT
   x y z  a
1: 1 2 3  1
2: 1 4 5  3
3: 2 1 1 NA
4: 3 2 7  3
5: 4 5 4  5
6: 1 6 5 NA
7: 9 6 3  2

I can calculate the minimum across rows using columns directly:

DT[,min_val := pmin(x,y,z,a,na.rm=TRUE)]

which gives

> DT
   x y z  a min_val
1: 1 2 3  1       1
2: 1 4 5  3       1
3: 2 1 1 NA       1
4: 3 2 7  3       2
5: 4 5 4  5       4
6: 1 6 5 NA       1
7: 9 6 3  2       2
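To see why na.rm matters here, the small base R sketch below uses the a and y columns from the question as plain vectors. pmin() works element by element (row by row here): by default an NA in any argument makes that element NA, while with na.rm = TRUE an element is NA only when every input at that position is NA.

```r
# a and y copied from the question's DT as plain numeric vectors.
a <- c(1, 3, NA, 3, 5, NA, 2)
y <- c(2, 4, 1, 2, 5, 6, 6)

# Default na.rm = FALSE: an NA in either vector gives NA at that position.
without_rm <- pmin(a, y)             # 1 3 NA 2 5 NA 2

# na.rm = TRUE: NAs are skipped, so rows 3 and 6 fall back to y.
with_rm <- pmin(a, y, na.rm = TRUE)  # 1 3 1 2 5 6 2

# The result stays NA only when all inputs at a position are NA.
all_na <- pmin(NA_real_, NA_real_, na.rm = TRUE)
```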

However, I am trying to do this over a large, automatically generated set of columns, and I want to be able to do it for an arbitrary list of columns stored in a variable, col_names <- c("a", "y", "z").

I can do

DT[, col_min := do.call(pmin,DT[,col_names,with=FALSE])]

But this gives me NA values, and I can't figure out how to pass the na.rm=TRUE argument through do.call. I've tried defining the function as

DT[, col_min := do.call(function(x) pmin(x,na.rm=TRUE),DT[,col_names,with=FALSE])]

but this gives me an error. I also tried passing in the argument as an additional element in a list, but I think pmin (or do.call) gets confused between the DT non-standard evaluation of column names and the argument.
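The anonymous function fails because function(x) accepts exactly one argument, while do.call() supplies one named argument per column (a, y and z here), so R typically rejects the call with an "unused arguments" error. One alternative, sketched below and not the method used in the answer, is to forward every argument with `...`. The helper name pmin_na_rm is hypothetical, and the sketch assumes the data.table package is installed.

```r
library(data.table)

DT <- data.table(x = c(1,1,2,3,4,1,9),
                 y = c(2,4,1,2,5,6,6),
                 z = c(3,5,1,7,4,5,3),
                 a = c(1,3,NA,3,5,NA,2))
col_names <- c("a", "y", "z")

# Hypothetical helper: `...` accepts any number of column arguments
# and forwards all of them to pmin() together with na.rm = TRUE.
pmin_na_rm <- function(...) pmin(..., na.rm = TRUE)

# Same do.call() pattern as in the question, with the wrapper in place of pmin.
DT[, col_min := do.call(pmin_na_rm, DT[, col_names, with = FALSE])]
DT$col_min  # 1 3 1 2 4 5 2
```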

Any ideas?

Recommended answer

To get the minimum of each row across the whole dataset, call pmin through do.call on .SD, concatenating na.rm=TRUE onto .SD as an extra list element so that do.call passes it to pmin as a named argument:

DT[, col_min:= do.call(pmin, c(.SD, list(na.rm=TRUE)))]
DT
#   x y z  a col_min
#1: 1 2 3  1       1
#2: 1 4 5  3       1
#3: 2 1 1 NA       1
#4: 3 2 7  3       2
#5: 4 5 4  5       4
#6: 1 6 5 NA       1
#7: 9 6 3  2       2
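The key step is that do.call() turns each element of a list into one argument. Because .SD is itself a list of columns, c(.SD, list(na.rm = TRUE)) appends na.rm as one more named element, so pmin() receives it as its na.rm argument rather than as another column. The sketch below uses a plain named list as a stand-in for .SD (an assumption made so it runs in base R without data.table).

```r
# Stand-in for .SD: a named list of three short columns.
sd_like <- list(a = c(1, 3, NA), y = c(2, 4, 1), z = c(3, 5, 1))

# Append na.rm as one more named element.
pmin_args <- c(sd_like, list(na.rm = TRUE))
names(pmin_args)  # "a" "y" "z" "na.rm"

# do.call() expands this to pmin(a = ..., y = ..., z = ..., na.rm = TRUE).
result <- do.call(pmin, pmin_args)  # 1 3 1
```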

If we want to do this only for a subset of columns whose names are stored in col_names, use .SDcols:

DT[, col_min:= do.call(pmin, c(.SD, list(na.rm=TRUE))), 
                .SDcols= col_names]
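A self-contained version of the .SDcols approach, using the question's DT and col_names <- c("a", "y", "z"), might look like the sketch below; it assumes the data.table package is installed. Rows 3 and 6, where a is NA, take their minimum from y and z instead of returning NA.

```r
library(data.table)

DT <- data.table(x = c(1,1,2,3,4,1,9),
                 y = c(2,4,1,2,5,6,6),
                 z = c(3,5,1,7,4,5,3),
                 a = c(1,3,NA,3,5,NA,2))
col_names <- c("a", "y", "z")

# .SD holds only the columns listed in .SDcols; na.rm = TRUE is appended
# as an extra named element so do.call() passes it to pmin().
DT[, col_min := do.call(pmin, c(.SD, list(na.rm = TRUE))), .SDcols = col_names]

# Without na.rm = TRUE, rows 3 and 6 would be NA.
DT$col_min  # 1 3 1 2 4 5 2
```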
