How do I pass column-specific arguments to lapply in data.table .SD?
Question
I have seen examples of using .SD with lapply in data.table with a simple function, as below:
cols <- c("b", "d", "e")
DT[ , (cols) := lapply(.SD, tan), .SDcols = cols]
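For reference, here is a minimal runnable version of that single-argument pattern. It assumes the data.table package is installed; the table DT and its columns a, b, d, e are made up for illustration:

```r
library(data.table)

# Toy table; column names and values are illustrative only
DT <- data.table(a = 1:3, b = c(0, 0.5, 1), d = c(1, 2, 3), e = c(-1, 0, 1))

cols <- c("b", "d", "e")
# lapply() calls tan() once per column of .SD and returns a list of columns;
# := then assigns that list back to the columns named in cols, by reference
DT[, (cols) := lapply(.SD, tan), .SDcols = cols]

print(DT)
```

Column a is not in .SDcols, so it is left untouched.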
But I'm unsure how to use column-specific arguments in a multiple-argument function. For instance, I have a winsorize function that I want to apply to a subset of columns in a data table, using column-specific percentiles, e.g.
library(DescTools)
wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95))
DT[ , c("b","c") := lapply(.SD, function(x)
  {winsorize(x,wlevel$zzz$lower,wlevel$zzz$upper)}), .SDcols = c("b","c")]
Where zzz would be the respective column being iterated over. I have also seen threads on passing varying arguments with lapply, but not in the context of data.table with .SDcols.
Is this possible?
This is a toy example; I'm looking to generalize to an arbitrarily large number of columns. Looping is always an option, but I'm trying to see if there's a more elegant/efficient solution...
Answer
How to use column-specific arguments in a multiple argument function?
Use mapply(FUN, dat, params1, params2, ...), where each of params1, params2, ... can be a list or vector; mapply iterates over dat, params1, params2, ... in parallel, i.e. the i-th call to FUN receives the i-th element of each.
Note that unlike the rest of the apply/lapply/sapply family, with mapply the function argument comes first, followed by the data and parameter(s).
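A small base-R illustration of both points: FUN is the first argument, and the remaining list/vector arguments are walked in lockstep. The list cols, the bound vectors lo and hi, and the clamping function are illustrative only:

```r
# Three "columns" and one lower/upper bound per column (matched by position)
cols <- list(b = c(1, 5, 9), c = c(2, 4, 6), d = c(0, 10, 20))
lo   <- c(2, 3, 5)    # lower bounds for b, c, d
hi   <- c(8, 5, 15)   # upper bounds for b, c, d

# FUN comes first; call i is FUN(cols[[i]], lo[i], hi[i])
clamped <- mapply(function(x, l, u) pmin(pmax(x, l), u),
                  cols, lo, hi, SIMPLIFY = FALSE)

str(clamped)
```

Without SIMPLIFY = FALSE, mapply() would collapse these equal-length results into a matrix; keeping a list (one element per column) is the shape that := expects on its right-hand side.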
In your case, something like the following (pseudo-code; you'll need to tweak it to get it to run):
Instead of your nested list wlevel <- list(b=list(lower=0.01,upper=0.99), c=list(lower=0.02,upper=0.95)), it's probably easier to unpack it into:
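library(data.table)
w_lower <- list(b=0.01, c=0.02)   # lower cutoff per column, same order as .SDcols
w_upper <- list(b=0.99, c=0.95)   # upper cutoff per column, same order as .SDcols
# mapply() walks .SD, w_lower and w_upper in lockstep; SIMPLIFY = FALSE keeps
# the result a list (one element per column), which is what := expects
DT[ , c('b','c') := mapply(function(x, w_lower_col, w_upper_col) { winsorize(x, w_lower_col, w_upper_col) },
                           .SD, w_lower, w_upper, SIMPLIFY = FALSE), .SDcols = c('b', 'c')]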
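A self-contained sketch of the same approach, assuming data.table is installed. The winsorize() below is a hypothetical base-R stand-in (it clamps each column at its lower/upper quantiles) for whatever winsorizing function you actually use; it is not the DescTools API, whose Winsorize() has its own argument list, so check its documentation before swapping it in. The sketch also unpacks the nested wlevel list programmatically, which helps when there are many columns:

```r
library(data.table)

# Hypothetical helper: clamp x at its `lower` and `upper` quantiles.
winsorize <- function(x, lower, upper) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Toy data (illustrative only)
set.seed(1)
b_raw <- rnorm(100)
c_raw <- rexp(100)
DT <- data.table(a = 1:100, b = b_raw, c = c_raw)

# Nested per-column settings, as in the question
wlevel <- list(b = list(lower = 0.01, upper = 0.99),
               c = list(lower = 0.02, upper = 0.95))

# Unpack into flat per-column lists; works for any number of columns
cols    <- names(wlevel)
w_lower <- lapply(wlevel, `[[`, "lower")
w_upper <- lapply(wlevel, `[[`, "upper")

# The i-th column of .SD is paired with the i-th bounds;
# SIMPLIFY = FALSE returns a list of columns for :=
DT[, (cols) := mapply(winsorize, .SD, w_lower, w_upper, SIMPLIFY = FALSE),
   .SDcols = cols]

summary(DT)
```

Taking cols from names(wlevel) keeps .SDcols and the unpacked bound lists in the same order by construction.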
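We shouldn't need to use column names (your zzz) to index into the lists: mapply() iterates over them as-is, pairing elements by position. The i-th column of .SD is matched with the i-th elements of w_lower and w_upper, so those lists must be in the same order as .SDcols; their names are not used for matching.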
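mapply() pairs elements by position, not by name, so a parameter list written in a different order from .SDcols would silently apply the wrong bounds. One simple safeguard, sketched here in base R with illustrative names, is to index the parameter list by the column names before handing it to mapply():

```r
cols    <- c("b", "c")                 # the order used in .SDcols
w_lower <- list(c = 0.02, b = 0.01)    # deliberately written in the other order

# Show which bound each column would receive
show_pair <- function(col, lo) sprintf("%s: %g", col, lo)

# Positional pairing: column b gets the bound meant for c
unaligned <- mapply(show_pair, cols, w_lower)

# Indexing by name reorders the bounds to match cols
aligned <- mapply(show_pair, cols, w_lower[cols])
```

In the data.table call, the same idea reads mapply(your_fun, .SD, w_lower[cols], w_upper[cols], SIMPLIFY = FALSE) together with .SDcols = cols.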