如何将数据表滚动连接条件从弱不等式更改为严格不等式? [英] How do you change the data table rolling join condition from weak inequality to strict inequality?

查看:72
本文介绍了如何将数据表滚动连接条件从弱不等式更改为严格不等式?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

请考虑以下两个数据集,其中的' time '列代表一般时间戳,为简化示例说明而使用了整数:

Consider the two following two datasets, with a 'time' column which represents a general timestamp, integers are used for simplicity of illustrating the example:

library(data.table)

df_test_1 <- 
    data.table(time = c(1:10, seq(20, 30, by = 5))) %>%
df_test_1$values <- -df_test_1$time
df_test_1 <- setkey(df_test_1, time)

df_test_2 <-
    data.table(time = c(15, 20, 26, 28, 31))
df_test_2 <- setkey(df_test_2, time)

因此:

> df_test_1
    time values
 ...
 5:    5     -5
 6:    6     -6
 7:    7     -7
 8:    8     -8
 9:    9     -9
10:   10    -10
11:   20    -20
12:   25    -25
13:   30    -30

和:

> df_test_2

   time
1:   15
2:   20
3:   26
4:   28
5:   31

滚动连接 df_test_1 [df_test_2,roll = -Inf] 产生:

> df_test_1[df_test_2, roll = -Inf]
   time values
1:   15    -20
2:   20    -20
3:   26    -30
4:   28    -30
5:   31     NA

即,每次值在 df_test_1 中,找到所有时间值在 df_test_2 小于或等于它,并将相应的关联到 df_test_2 。例如, df_test_1 $ time == 20 匹配时间值 15 20 df_test_2 $ time 中,因此相应的 -20 值与这些 df_test_2

That is, for each time value in df_test_1, find all time values in df_test_2 smaller than or equal to it, and associate the corresponding value to this row of df_test_2. For example, df_test_1$time == 20 matches the time values 15 and 20 in df_test_2$time, thus the corresponding value of -20 is associated to these rows of df_test_2.

我想将联接条件(上面的粗体)更改为小于其,即产生的答案应该是:

I would like to change the join condition (in bold above) to smaller than it, that is, the answer produced should be:

   time values
1:   15    -20
2:   20    -25
3:   26    -30
4:   28    -30
5:   31     NA

这里的区别是 df_test_1 $ time == 25 的值应与<$ c $的行匹配c> df_test_2 其中 df_test_2 $ time == 20

The difference here is that the value at df_test_1$time == 25 should be matched to the row of df_test_2 where df_test_2$time == 20.

另一种方法产生期望的结果将是从时间中减去一小部分:

An alternate way of producing the desired result would be to take away a minuscule portion from time:

df_test_3 <-
    df_test_1 %>%
    mutate(time = time - 0.1) %>%
    setkey(time)

因此:

> df_test_3[df_test_2, roll = -Inf]
   time values
1:   15    -20
2:   20    -25
3:   26    -30
4:   28    -30
5:   31     NA


推荐答案

当前开发版本的data.table中新的 non-equi 加入了功能,这很简单:

Using the new non-equi joins feature in the current development version of data.table, this is straightforward:

# v1.9.7+
df_test_1[df_test_2, on=.(time > time), mult="first"]

键联接只能执行 equi 联接。 on 参数对于条件连接至关重要。

Keyed joins are capable of only equi joins. on argument is essential for conditional joins.

请注意,如果在,则不需要对data.tables进行键输入使用参数。即使您希望对data.tables进行键输入,在上指定也会更好,因为这样有助于在以后立即理解代码。

Note that there's no need for data.tables to be keyed if the on argument is used. Even if you wish to key the data.tables, specifying on is better as it helps understand the code immediately at a later point.

请参见开发版本此处的安装说明。

See the installation instructions for devel version here.

这篇关于如何将数据表滚动连接条件从弱不等式更改为严格不等式?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆