如何将数据表滚动连接条件从弱不等式更改为严格不等式? [英] How do you change the data table rolling join condition from weak inequality to strict inequality?
问题描述
请考虑以下两个数据集,其中的' time
'列代表一般时间戳,为简化示例说明而使用了整数:
Consider the two following two datasets, with a 'time
' column which represents a general timestamp, integers are used for simplicity of illustrating the example:
library(data.table)
df_test_1 <-
data.table(time = c(1:10, seq(20, 30, by = 5))) %>%
df_test_1$values <- -df_test_1$time
df_test_1 <- setkey(df_test_1, time)
df_test_2 <-
data.table(time = c(15, 20, 26, 28, 31))
df_test_2 <- setkey(df_test_2, time)
因此:
> df_test_1
time values
...
5: 5 -5
6: 6 -6
7: 7 -7
8: 8 -8
9: 9 -9
10: 10 -10
11: 20 -20
12: 25 -25
13: 30 -30
和:
> df_test_2
time
1: 15
2: 20
3: 26
4: 28
5: 31
滚动连接 df_test_1 [df_test_2,roll = -Inf]
产生:
> df_test_1[df_test_2, roll = -Inf]
time values
1: 15 -20
2: 20 -20
3: 26 -30
4: 28 -30
5: 31 NA
即,每次次
值在 df_test_1
中,找到所有时间
值在 df_test_2
小于或等于它,并将相应的值
关联到 df_test_2
。例如, df_test_1 $ time == 20
匹配时间值 15
和 20
在 df_test_2 $ time
中,因此相应的 -20
值与这些 df_test_2
。
That is, for each time
value in df_test_1
, find all time
values in df_test_2
smaller than or equal to it, and associate the corresponding value
to this row of df_test_2
. For example, df_test_1$time == 20
matches the time values 15
and 20
in df_test_2$time
, thus the corresponding value of -20
is associated to these rows of df_test_2
.
我想将联接条件(上面的粗体)更改为小于其,即产生的答案应该是:
I would like to change the join condition (in bold above) to smaller than it, that is, the answer produced should be:
time values
1: 15 -20
2: 20 -25
3: 26 -30
4: 28 -30
5: 31 NA
这里的区别是 df_test_1 $ time == 25
的值应与<$ c $的行匹配c> df_test_2 其中 df_test_2 $ time == 20
。
The difference here is that the value at df_test_1$time == 25
should be matched to the row of df_test_2
where df_test_2$time == 20
.
另一种方法产生期望的结果将是从时间
中减去一小部分:
An alternate way of producing the desired result would be to take away a minuscule portion from time
:
df_test_3 <-
df_test_1 %>%
mutate(time = time - 0.1) %>%
setkey(time)
因此:
> df_test_3[df_test_2, roll = -Inf]
time values
1: 15 -20
2: 20 -25
3: 26 -30
4: 28 -30
5: 31 NA
推荐答案
当前开发版本的data.table中新的 non-equi
加入了功能,这很简单:
Using the new non-equi
joins feature in the current development version of data.table, this is straightforward:
# v1.9.7+
df_test_1[df_test_2, on=.(time > time), mult="first"]
键联接只能执行 equi 联接。 on
参数对于条件连接至关重要。
Keyed joins are capable of only equi joins. on
argument is essential for conditional joins.
请注意,如果在上
,则不需要对data.tables进行键输入使用参数。即使您希望对data.tables进行键输入,在上指定
也会更好,因为这样有助于在以后立即理解代码。
Note that there's no need for data.tables to be keyed if the on
argument is used. Even if you wish to key the data.tables, specifying on
is better as it helps understand the code immediately at a later point.
请参见开发版本此处的安装说明。
See the installation instructions for devel version here.
这篇关于如何将数据表滚动连接条件从弱不等式更改为严格不等式?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!