在多个条件下加入时的奇怪行为 [英] Odd behavior when joining with multiple conditions

查看:146
本文介绍了在多个条件下加入时的奇怪行为的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

回答此问题中有关使用 data.table

In answering this question about rolling joins with the data.table package, I've run into some odd behavior when using multiple conditions.

考虑以下数据集:

dt <- data.table(t_id = c(1,4,2,3,5), place = c("a","a","d","a","d"), num = c(5.1, 5.1, 6.2, 5.1, 6.2), key=c("place"))
dt_lu <- data.table(f_id = c(rep(1,4),rep(2,3)), place = c("a","b","c","d","a","d","a"), num = c(6,7,8,9,6,7,8), key=c("place"))


$ b b

当我想用 dt_lu 加入 dt ,其中只有 dt_lu ,并且 dt_lu $ num 高于 dt $ num 如下:

When I want to join dt with dt_lu where only those cases of dt_lu that have the same place and where dt_lu$num is higher than dt$num as follows:

dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num],
               fid = f_id),
      by = .EACHI]

我得到所需的结果:

    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 3:     a   1  5.1    8   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 6:     a   4  5.1    8   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
 9:     a   3  5.1    8   2
10:     d   2  6.2    9   1
11:     d   2  6.2    7   2
12:     d   5  6.2    9   1
13:     d   5  6.2    7   2

当我想添加一个附加条件时,我可以通过链接附加条件如下:

When I want to add an additional condition, I can get the desired result easily by chaining that additional conditions as follows:

dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num],
               fid = f_id),
      by = .EACHI][fnum - tnum < 2]

这会给我:

   place tid tnum fnum fid
1:     a   1  5.1    6   1
2:     a   1  5.1    6   2
3:     a   4  5.1    6   1
4:     a   4  5.1    6   2
5:     a   3  5.1    6   1
6:     a   3  5.1    6   2
7:     d   2  6.2    7   2
8:     d   5  6.2    7   2

但是当我添加额外的条件(即:差异必须小于 2 )如下:

However when I add the extra condition (i.e.: the difference has to be less than 2) as follows:

dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[i.num < num & num - i.num < 2],
               fid = f_id),
      by = .EACHI]

我没有得到预期结果:

    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 3:     a   1  5.1    6   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 6:     a   4  5.1    6   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
 9:     a   3  5.1    6   2
10:     d   2  6.2    7   1
11:     d   2  6.2    7   2
12:     d   5  6.2    7   1
13:     d   5  6.2    7   2

警告消息: $。

Moreover, I get the following warning message:


警告消息:在 [。data.table (dt_lu, dt,list(tid = i.t_id,tnum
= i.num,fnum = num [i.num < :组1的结果的列3是长度2,但该结果中最长的列是3.再循环离开
剩余的1个项目。

Warning message: In [.data.table(dt_lu, dt, list(tid = i.t_id, tnum = i.num, fnum = num[i.num < : Column 3 of result for group 1 is length 2 but the longest column in this result is 3. Recycled leaving remainder of 1 items. This warning is once only for the first group with this issue.

预期的结果是:

    place tid tnum fnum fid
 1:     a   1  5.1    6   1
 2:     a   1  5.1    6   2
 4:     a   4  5.1    6   1
 5:     a   4  5.1    6   2
 7:     a   3  5.1    6   1
 8:     a   3  5.1    6   2
11:     d   2  6.2    7   2
13:     d   5  6.2    7   2

我故意保留示例显示在最终结果(与工作解决方案相同)中必须维护哪些行。

I deliberately kept the rownumbers from the first example to show which rows have to be maintained in the final result (which is the same as the working solution).

作为这个答案显示,应该可以在连接操作中使用多个条件。

As this answer shows, it should be possible to use multiple conditions inside the join operation.

我尝试了以下替代方法,但是它们都不起作用:

I tried the following alternatives, but both of them don't work:

dt_lu[dt, list(tid = i.t_id,
               tnum = i.num,
               fnum = num[(i.num < num) & (num - i.num < 2)],
               fid = f_id),
      by = .EACHI]

dt_lu[dt, {
  val = num[(i.num < num) & (num - i.num < 2)];
  list(tid = i.t_id,
       tnum = i.num,
       fnum = val,
       fid = f_id)},
  by = .EACHI]

有人可以解释一下为什么我在连接操作中没有得到多个条件的期望结果? / p>

Could someone explain me why I don't get the desired result with multiple conditions inside the join operation?

推荐答案

警告消息给出了问题。此外,使用 print()在这里非常有用。

The warning message gives away the issue. Also, using print() is quite helpful here.

dt_lu[dt, print(i.num < num & num - i.num < 2), by=.EACHI]
# [1]  TRUE  TRUE FALSE
# [1]  TRUE  TRUE FALSE
# [1]  TRUE  TRUE FALSE
# [1] FALSE  TRUE
# [1] FALSE  TRUE
# Empty data.table (0 rows) of 3 cols: place,place,num

考虑第一种情况,条件计算结果为 TRUE, FALSE 。这组有3个观察。您的 j-expression 包含:

Consider the first case where the condition evaluates to TRUE, TRUE, FALSE. There are 3 observations for this group. And your j-expression contains:

.(tid = i.t_id,
  tnum = i.num,
  fnum = num[i.num < num & num - i.num < 2],
  fid = f_id)

i.t_id i。 num 长度为1(因为它们来自 dt )。但 num [.. condn ..] 将返回length = 2,而 f_id 将返回length =长度= 1和长度= 2的项目将被回收到最长项目/矢量的长度= 3。这导致不正确的结果。由于3不能完全被2整除,它返回警告。

i.t_id and i.num are of length 1 (as they come from dt). But num[..condn..] will return length = 2, whereas f_id will return length = 3. Both the length=1 and length=2 items will be recycled to the length of the longest item/vector = 3. That leads to incorrect result. Since 3 isn't perfectly divisible by 2, it returns the warning.

您打算做什么:

.(tid = i.t_id,
  tnum = i.num,
  fnum = num[i.num < num & num - i.num < 2],
  fid = f_id[i.num < num & num - i.num < 2])

p>

or equivalently:

{  
  idx = i.num < num & num - i.num < 2
  .(tid  = i.t_id, tnum = i.num, fnum = num[idx], fid  = f_id[idx])
}

组合在一起:

dt_lu[dt, 
       {
         idx = i.num < num & num - i.num < 2
        .(tid  = i.t_id, tnum = i.num, fnum = num[idx], fid  = f_id[idx])
       }, 
by = .EACHI]
#    place tid tnum fnum fid
# 1:     a   1  5.1    6   1
# 2:     a   1  5.1    6   2
# 3:     a   4  5.1    6   1
# 4:     a   4  5.1    6   2
# 5:     a   3  5.1    6   1
# 6:     a   3  5.1    6   2
# 7:     d   2  6.2    7   2
# 8:     d   5  6.2    7   2

这篇关于在多个条件下加入时的奇怪行为的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆