滚动联接由data.table中的第二个变量分组 [英] Rolling join grouped by a second variable in data.table

查看:63
本文介绍了滚动联接由data.table中的第二个变量分组的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

你好,我想使用 data.table 包在R中执行滚动联接。加入日期列时有多个匹配项,因此我想在 data.table by 参数>在字段列上,以防止将来自不同字段的数据连接在一起。

Hello I would like to perform a rolling join in R using the data.table package. There are multiple matches when joining on the "Date" column so I would like to use the by argument in data.table on the "Field" column to keep data from different fields from being joined together.

示例数据

d1<-structure(list(Field = c("6", "W62", "6", "6", "12S", "19-1", 
"6", "6", "12S", "7", "6", "12S", "W62", "6", "12S", "W62", "12S", 
"6", "6", "7", "12S", "12S", "W62", "7", "12S", "6", "12S", "7", 
"12S", "7", "6", "7", "12S", "7", "6", "6", "6", "6", "12S", 
"7", "7", "6", "6", "12S", "7", "12S", "12S", "12S", "19-1", 
"6"), Date = structure(c(16994, 17240, 17240, 17401, 17048, 17417, 
17387, 17394, 17382, 17414, 17029, 17403, 17045, 17359, 17179, 
17281, 17152, 16972, 16987, 17042, 17282, 17415, 17281, 17266, 
17179, 17190, 17057, 17380, 17280, 17178, 17178, 17343, 17373, 
17043, 17190, 17343, 17253, 16981, 17079, 17043, 17270, 17366, 
16981, 17357, 17366, 17415, 17079, 17190, 17385, 17008), class = "Date"), 
    NlbsAcre = c(NA, 18874.6557383659, 2477.08251404958, NA, 
    NA, 19658.0054165823, NA, NA, 12621.0827111083, NA, NA, 16764.41968227, 
    16764.9745173044, NA, 7671.24950330348, 21341.6444661863, 
    5197.26333885612, NA, NA, NA, 39560.8958554292, 18162.4040880297, 
    22578.1487456647, 15842.9161753361, 3613.95523726973, 2601.07083566694, 
    17766.9873538952, NA, 44728.1837479613, 2279.60909695434, 
    2014.7720270382, NA, 14847.7006686211, NA, 3082.31758038481, 
    NA, 2427.53558465175, NA, 23641.2999848709, NA, NA, NA, NA, 
    5928.31591997149, NA, 22162.2028819815, 18972.2228621189, 
    6534.4257935542, 12630.9231775315, NA)), .Names = c("Field", 
"Date", "NlbsAcre"), class = c("data.table", "data.frame"), row.names = c(NA, 
-50L), .internal.selfref = <pointer: 0x0000000006540788>)

d2<-structure(list(Field = c("6", "W62", "7", "12S", "19-1", "12S", 
"6", "6", "19-1", "19-1", "6", "7", "W62", "19-1", "12S", "7", 
"19-1", "7", "12S", "12S", "12S", "7", "6", "7", "6", "7", "W62", 
"19-1", "6", "6", "12S", "12S", "6", "6", "12S", "6", "12S", 
"19-1", "6", "W62", "W62", "6", "7", "7", "6", "19-1", "W62", 
"6", "12S", "7"), Date = structure(c(16993, 17140, 17208, 17443, 
17063, 16948, 17415, 16926, 17316, 16922, 16981, 17043, 17219, 
17252, 17392, 17244, 17179, 17017, 17042, 17031, 17013, 17104, 
17273, 16954, 17364, 16993, 17168, 17028, 17208, 16966, 17241, 
16945, 17038, 17169, 17379, 17183, 17238, 17054, 17244, 16952, 
17044, 17359, 17219, 17303, 17007, 17151, 16926, 17178, 17382, 
17364), class = "Date"), TotN = c(79.244802845739, 94.193700050628, 
21.075505564932, 692.152760834712, 224.689064446728, 172.576578578436, 
47.406177406404, 102.53239575903, 818.80997295717, 476.174916307807, 
125.828033450364, 58.270026966444, 75.465909993456, 435.049246131543, 
337.913876678769, 31.714327953234, 305.353940577156, 72.621457768224, 
393.815453005314, 428.540114240892, 318.97091713563, 73.888113736431, 
79.0380747113805, 147.493527174027, 65.5311189906495, 59.269732271703, 
119.390398108236, 110.706003557451, 21.96790939404, 149.060445984684, 
128.143343232486, 208.621943093862, 75.770138571561, 47.496596179338, 
132.723654607278, 43.92222198012, 145.150910469252, 215.88105225024, 
21.393670871196, 72.969536052, 86.335878117078, 103.524169592979, 
19.920230115264, 44.968722966108, 62.244487239885, 338.593490463303, 
96.7285416279375, 45.537296152302, 422.630318314444, 58.5336350807685
)), .Names = c("Field", "Date", "TotN"), class = c("data.table", 
"data.frame"), row.names = c(NA, -50L), .internal.selfref = <pointer: 0x0000000006540788>)

我尝试过的事情

这是我尝试在日期列上进行滚动联接的方法,该列按字段列的 分组。显然,我可以按字段拆分数据并分别处理,但我想避免使用该选项。

This is my attempt to perform a rolling join on the "Date" column, grouped by the "Field" column. Obviously I can split the data by "Field" and process separately but I would like to avoid that option.

d1[d2, roll = "nearest", on = .(Date), by = .(Field)]

Error in `[.data.table`(d1, d2, roll = "nearest", on = .(Date), by = .(Field)) : 
  'by' or 'keyby' is supplied but not j


推荐答案

您快到了。

您可以同时加入多个列。因此,除了日期,您还可以在 on 子句中包括字段。但是请注意?data.table roll 参数的描述:

You can join on multiple columns simultaneously. So, in addition to "Date", you can include "Field" in the on clause. But please note the description of the roll argument in ?data.table:


滚动联接应用于 last 联接列

因此,要使日期用于滚动连接,请将其指定为 on 上的 last 变量:

Thus, for "Date" to be used for the rolling join, specify it as the last variable in on:

library(data.table)
d1[d2, roll = "nearest", on = .(Field, Date)]

为了更好的验证,可以对结果进行排序

For better verification, the result can be ordered

d1[d2, roll = "nearest", on = .(Field, Date)][order(Field, Date)]




    Field       Date  NlbsAcre      TotN
 1:   12S 2016-05-24        NA 208.62194
 2:   12S 2016-05-27        NA 172.57658
 3:   12S 2016-07-31        NA 318.97092
 4:   12S 2016-08-18        NA 428.54011
 5:   12S 2016-08-29        NA 393.81545
 6:   12S 2017-03-13 44728.184 145.15091
 7:   12S 2017-03-16 44728.184 128.14334
 8:   12S 2017-08-01 12621.083 132.72365
 9:   12S 2017-08-04 12621.083 422.63032
10:   12S 2017-08-14 12621.083 337.91388
11:   12S 2017-10-04 22162.203 692.15276
12:  19-1 2016-05-01 12630.923 476.17492
13:  19-1 2016-08-15 12630.923 110.70600
14:  19-1 2016-09-10 12630.923 215.88105
15:  19-1 2016-09-19 12630.923 224.68906
16:  19-1 2016-12-16 12630.923 338.59349
17:  19-1 2017-01-13 12630.923 305.35394
18:  19-1 2017-03-27 12630.923 435.04925
19:  19-1 2017-05-30 12630.923 818.80997
20:     6 2016-05-05        NA 102.53240
21:     6 2016-06-14        NA 149.06045
22:     6 2016-06-29        NA 125.82803
23:     6 2016-06-29        NA 125.82803
24:     6 2016-07-11        NA  79.24480
25:     6 2016-07-25        NA  62.24449
26:     6 2016-08-25        NA  75.77014
27:     6 2017-01-03  2014.772  47.49660
28:     6 2017-01-12  2014.772  45.53730
29:     6 2017-01-17  2014.772  43.92222
30:     6 2017-02-11  3082.318  21.96791
31:     6 2017-03-19  2477.083  21.39367
32:     6 2017-04-17  2427.536  79.03807
33:     6 2017-07-12        NA 103.52417
34:     6 2017-07-17        NA  65.53112
35:     6 2017-09-06        NA  47.40618
36:     7 2016-06-02        NA 147.49353
37:     7 2016-07-11        NA  59.26973
38:     7 2016-08-04        NA  72.62146
39:     7 2016-08-30        NA  58.27003
40:     7 2016-08-30        NA  58.27003
41:     7 2016-10-30        NA  73.88811
42:     7 2017-02-11  2279.609  21.07551
43:     7 2017-02-22  2279.609  19.92023
44:     7 2017-03-19 15842.916  31.71433
45:     7 2017-05-17        NA  44.96872
46:     7 2017-07-17        NA  58.53364
47:   W62 2016-05-05 16764.975  96.72854
48:   W62 2016-05-31 16764.975  72.96954
49:   W62 2016-08-31 16764.975  86.33588
50:   W62 2016-12-05 16764.975  94.19370
51:   W62 2017-01-02 18874.656 119.39040
52:   W62 2017-02-22 18874.656  75.46591
    Field       Date  NlbsAcre      TotN


这篇关于滚动联接由data.table中的第二个变量分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆