How do I pass column-specific arguments to lapply in data.table .SD?

Question

I have seen examples of using .SD with lapply in data.table with a simple function, as below:

DT[ , c("b", "d", "e") := lapply(.SD, tan), .SDcols = c("b", "d", "e")]
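For reference, here is a minimal runnable version of that single-argument pattern. The table DT and its columns a, b, d, e are made-up toy data, assumed only so the snippet runs on its own.

```r
library(data.table)

# Toy table; the column names and values are made up for illustration
DT <- data.table(a = 1:3, b = c(0.1, 0.2, 0.3), d = c(0.5, 1.0, 1.5), e = c(0, 0.25, 0.5))
cols <- c("b", "d", "e")

# .SDcols restricts .SD to `cols`; lapply() then calls tan() on each of those columns,
# and (cols) on the left of := assigns the results back to the same columns by reference
DT[, (cols) := lapply(.SD, tan), .SDcols = cols]
print(DT)
```

Column a is not listed in .SDcols, so it is left untouched.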

But I'm unsure how to use column-specific arguments in a multiple-argument function. For instance, I have a winsorize function that I want to apply to a subset of columns in a data table, but using column-specific percentiles, e.g.

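library(DescTools)
wlevel <- list(b = list(lower = 0.01, upper = 0.99), c = list(lower = 0.02, upper = 0.95))
# zzz stands for "the column currently being processed"; this is the part I can't express
DT[ , c("b", "c") := lapply(.SD, function(x) {
  winsorize(x, wlevel$zzz$lower, wlevel$zzz$upper)
}), .SDcols = c("b", "c")]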

Here zzz would be the column currently being iterated over. I have also seen threads on passing varying arguments to lapply, but not in the context of a data.table with .SDcols.

Is this possible?

This is a toy example; I am looking to generalize to an arbitrarily large number of columns. Looping is always an option, but I am trying to see if there is a more elegant/efficient solution.
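For comparison, the loop the question mentions might look like the sketch below. It assumes a hypothetical winsorize(x, lower, upper) that clamps x to its own sample quantiles (a stand-in for the asker's function, not DescTools' implementation) and made-up data. Inside the loop, wlevel[[col]] does exactly what wlevel$zzz was meant to do.

```r
library(data.table)

# Hypothetical stand-in for the asker's winsorize(): clamps x to its own
# lower/upper sample quantiles (default quantile type 7)
winsorize <- function(x, lower, upper) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Made-up data with an outlier in each column
DT <- data.table(b = c(1, 2, 3, 4, 100), c = c(-50, 1, 2, 3, 4))
wlevel <- list(b = list(lower = 0.01, upper = 0.99), c = list(lower = 0.02, upper = 0.95))

# Loop over the column names; wlevel[[col]] plays the role of wlevel$zzz,
# and set() overwrites each column in place without copying the table
for (col in names(wlevel)) {
  set(DT, j = col, value = winsorize(DT[[col]], wlevel[[col]]$lower, wlevel[[col]]$upper))
}
print(DT)
```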

Recommended answer

How do I use column-specific arguments in a multiple-argument function?

Use mapply(FUN, dat, params1, params2, ...) where each of params1, params2, ... can be a list or vector; mapply iterates over each of dat, params1, params2, ... in parallel.

Note that unlike the rest of the apply/lapply/sapply family, with mapply the function argument comes first, then the data and parameter(s).
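A small base-R illustration of that argument order and the parallel iteration. clamp() is a hypothetical helper made up for this example. It also shows why SIMPLIFY matters: with the default SIMPLIFY = TRUE, results of equal length are collapsed into a matrix instead of being returned as a list.

```r
# Hypothetical helper used only for this illustration: clamp x into [lo, hi]
clamp <- function(x, lo, hi) pmin(pmax(x, lo), hi)

# Function first, then the vectors/lists to walk in parallel:
# call 1 is clamp(1:5, 2, 4), call 2 is clamp(6:10, 7, 9)
res <- mapply(clamp, list(1:5, 6:10), c(2, 7), c(4, 9), SIMPLIFY = FALSE)
str(res)

# With the default SIMPLIFY = TRUE, the two length-5 results collapse into a 5 x 2 matrix
mat <- mapply(clamp, list(1:5, 6:10), c(2, 7), c(4, 9))
dim(mat)

# Map() is shorthand for mapply(..., SIMPLIFY = FALSE)
res_map <- Map(clamp, list(1:5, 6:10), c(2, 7), c(4, 9))
```

Map() is often the more convenient choice when the result is going to be assigned with :=, because := expects a list of columns.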

In your case, the approach would look something like this (pseudo-code; you'll need to tweak it to get it to run).

Instead of your nested list wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95)), it is probably easier to unpack the bounds into two parallel lists:

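w_lower <- list(b=0.01, c=0.02)
w_upper <- list(b=0.99, c=0.95)

# mapply() walks .SD, w_lower and w_upper in parallel, one column at a time;
# SIMPLIFY = FALSE keeps the result a list of columns, which is what := expects
DT[ , c('b','c') := mapply(function(x, w_lower_col, w_upper_col) { winsorize(x, w_lower_col, w_upper_col) },
  .SD, w_lower, w_upper, SIMPLIFY = FALSE), .SDcols = c('b', 'c')]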
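A self-contained sketch of this mapply() approach, under stated assumptions: winsorize() is a hypothetical stand-in that clamps each column to its own sample quantiles, and the data are made up. SIMPLIFY = FALSE is passed so that mapply() returns a list of columns rather than a matrix, which is the shape := expects.

```r
library(data.table)

# Hypothetical winsorize(): clamps x to its own lower/upper sample quantiles.
# It stands in for the asker's function; swap in your own implementation.
winsorize <- function(x, lower, upper) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Made-up data with an outlier in each column
DT <- data.table(b = c(1, 2, 3, 4, 100), c = c(-50, 1, 2, 3, 4))
w_lower <- list(b = 0.01, c = 0.02)
w_upper <- list(b = 0.99, c = 0.95)
cols <- c("b", "c")

# Column i of .SD is passed to winsorize() together with w_lower[[i]] and w_upper[[i]];
# SIMPLIFY = FALSE keeps mapply()'s result as a list of columns for :=
DT[, (cols) := mapply(winsorize, .SD, w_lower, w_upper, SIMPLIFY = FALSE), .SDcols = cols]
print(DT)
```

Because winsorize() already has the signature (x, lower, upper), it can be passed to mapply() directly; the anonymous wrapper function is only needed if the argument order differs.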

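We shouldn't need to index into the list by column name (your zzz); mapply() simply iterates over the lists as-is, pairing elements by position. This means w_lower and w_upper must list their values in the same order as .SDcols.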
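Because mapply() and Map() pair elements by position rather than by name, a parameter list written in a different order than the columns silently mismatches them. The sketch below (made-up values, with cols standing in for the .SDcols vector) shows the pitfall and one possible guard: subsetting the parameter list by the column names before passing it in.

```r
cols <- c("b", "c")                     # order of the columns passed via .SDcols
w_lower <- list(c = 0.02, b = 0.01)     # same values, but listed in a different order

# Pairing purely by position: column "b" would receive c's bound (0.02)
by_position <- Map(function(col, lo) lo, cols, w_lower)

# Subsetting by name first realigns the parameters with the columns
by_name <- Map(function(col, lo) lo, cols, w_lower[cols])
str(by_name)
```

In the data.table call this amounts to passing w_lower[cols] and w_upper[cols] instead of w_lower and w_upper.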
