data.table column order when using lapply and get


Question

Can someone help me understand why the two versions of the lapply operation below, with and without get(), don't produce the same result? When using get(), the result columns get mixed up.

library(data.table)
dt <- data.table(v1 = c(1,2), v2 = c(3,4), type = c('A', 'B'))

   v1 v2 type
1:  1  3    A
2:  2  4    B

col_in <- c('v2', 'v1')
col_out <- paste0(col_in, '.new')

Accessing 'type' the hard-coded way

dt[, (col_out) := lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]

produces the expected result:

   v1 v2 type v2.new v1.new
1:  1  3    A      9      1
2:  2  4    B     12      2

However, when accessing 'type' via get()

dt[, (col_out) := lapply(.SD, function(x){x * min(x[get('type') == 'A'])}), .SDcols = col_in]

the expected values for v1.new end up in v2.new and vice versa:

   v1 v2 type v2.new v1.new
1:  1  3    A      1      9
2:  2  4    B      2     12

Note: This is a minimal toy example that I distilled from a more complex operation I'm trying to implement. The name of the 'type' variable is given as an input parameter, which is why I need get().

Recommended answer

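Interesting! Thanks for sharing! It seems that using get() triggers some internal reordering (a bug?). Judging by the outputs in this post, .SD appears to come back in the table's column order (v1, v2) rather than in the order given by .SDcols (v2, v1), so assigning the results to col_out by position mislabels them.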
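
One way to check whether the column order of .SD is what changes is to compare names(.SD) with and without a get() call in j. This is only a diagnostic sketch: the dummy get('type') call is an assumption about what triggers the behaviour, and the second result may differ between data.table versions, so no output is shown for it.

```r
library(data.table)

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c('A', 'B'))
col_in <- c('v2', 'v1')

# Column order of .SD when j contains no get() call
names_plain <- dt[, names(.SD), .SDcols = col_in]

# Same query, but j also contains a (discarded) get() call
names_get <- dt[, {get('type'); names(.SD)}, .SDcols = col_in]

print(names_plain)
print(names_get)
```

If the two vectors differ in order, any assignment that relies on the position of the .SD columns will be affected.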

Two ways to avoid this:

  1. Move the type == 'A' part outside the dt[, lapply(...)] call, so that j no longer contains get():

# either of the next two lines yields the indices of the rows where type is 'A'
referenceRows <- which(dt[, type == 'A'])
referenceRows <- which(dt[, get('type') == 'A'])
dt[, (col_out) := lapply(.SD, function(x){x * min(x[referenceRows])}), .SDcols = col_in]

   v1 v2 type v2.new v1.new
1:  1  3    A      9      1
2:  2  4    B     12      2

  2. First create the new columns as a separate table, then use setnames to give them the proper column names. Because setnames maps old names to new names by name, the result does not depend on the order in which .SD returns the columns. Finally, bind the two parts together with cbind:

    dtNew <- dt[, lapply(.SD, function(x){x * min(x[type == 'A'])}), .SDcols = col_in]
    setnames(dtNew, col_in, col_out)
    cbind(dt, dtNew)
    
    
       v1 v2 type v2.new v1.new
    1:  1  3    A      9      1
    2:  2  4    B     12      2
    

    With get(), the values are the same, although the new columns appear in a different order:

        dtNew <- dt[, lapply(.SD, function(x){x * min(x[get('type') == 'A'])}), .SDcols = col_in]
        setnames(dtNew, col_in, col_out)
        cbind(dt, dtNew)
    
    
           v1 v2 type v1.new v2.new
        1:  1  3    A      1      9
        2:  2  4    B      2     12
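
Since the question mentions that the name of the 'type' column arrives as an input parameter, the two workarounds can be combined into one small function. The sketch below is an illustration under assumptions, not the answer's own code: add_scaled_cols and its arguments (type_col, ref_value, suffix) are hypothetical names. It builds the row mask with dt[[type_col]] so that j contains no get(), and it derives the new column names from the result itself instead of relying on the order of col_in.

```r
library(data.table)

# Hypothetical helper (not part of data.table): multiplies each column in
# col_in by its minimum over the rows where type_col equals ref_value.
add_scaled_cols <- function(dt, col_in, type_col, ref_value = 'A', suffix = '.new') {
  # Build the reference-row mask once, outside j, so no get() is needed
  is_ref <- dt[[type_col]] == ref_value

  res <- dt[, lapply(.SD, function(x) x * min(x[is_ref])), .SDcols = col_in]

  # Name the new columns after the columns actually returned,
  # so the assignment is by name rather than by position
  new_names <- paste0(names(res), suffix)
  dt[, (new_names) := res]

  invisible(dt)
}

dt <- data.table(v1 = c(1, 2), v2 = c(3, 4), type = c('A', 'B'))
add_scaled_cols(dt, col_in = c('v2', 'v1'), type_col = 'type')
```

After the call, dt$v1.new holds 1 and 2 and dt$v2.new holds 9 and 12. Note that := modifies dt by reference, so the function changes the table passed in rather than returning a copy.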
    
