滚动联接由data.table中的第二个变量分组 [英] Rolling join grouped by a second variable in data.table
问题描述
你好,我想使用 data.table
包在R中执行滚动联接。加入日期列时有多个匹配项,因此我想在 data.table
by 参数>在字段列上,以防止将来自不同字段的数据连接在一起。
Hello I would like to perform a rolling join in R using the data.table
package. There are multiple matches when joining on the "Date" column so I would like to use the by
argument in data.table
on the "Field" column to keep data from different fields from being joined together.
示例数据
d1<-structure(list(Field = c("6", "W62", "6", "6", "12S", "19-1",
"6", "6", "12S", "7", "6", "12S", "W62", "6", "12S", "W62", "12S",
"6", "6", "7", "12S", "12S", "W62", "7", "12S", "6", "12S", "7",
"12S", "7", "6", "7", "12S", "7", "6", "6", "6", "6", "12S",
"7", "7", "6", "6", "12S", "7", "12S", "12S", "12S", "19-1",
"6"), Date = structure(c(16994, 17240, 17240, 17401, 17048, 17417,
17387, 17394, 17382, 17414, 17029, 17403, 17045, 17359, 17179,
17281, 17152, 16972, 16987, 17042, 17282, 17415, 17281, 17266,
17179, 17190, 17057, 17380, 17280, 17178, 17178, 17343, 17373,
17043, 17190, 17343, 17253, 16981, 17079, 17043, 17270, 17366,
16981, 17357, 17366, 17415, 17079, 17190, 17385, 17008), class = "Date"),
NlbsAcre = c(NA, 18874.6557383659, 2477.08251404958, NA,
NA, 19658.0054165823, NA, NA, 12621.0827111083, NA, NA, 16764.41968227,
16764.9745173044, NA, 7671.24950330348, 21341.6444661863,
5197.26333885612, NA, NA, NA, 39560.8958554292, 18162.4040880297,
22578.1487456647, 15842.9161753361, 3613.95523726973, 2601.07083566694,
17766.9873538952, NA, 44728.1837479613, 2279.60909695434,
2014.7720270382, NA, 14847.7006686211, NA, 3082.31758038481,
NA, 2427.53558465175, NA, 23641.2999848709, NA, NA, NA, NA,
5928.31591997149, NA, 22162.2028819815, 18972.2228621189,
6534.4257935542, 12630.9231775315, NA)), .Names = c("Field",
"Date", "NlbsAcre"), class = c("data.table", "data.frame"), row.names = c(NA,
-50L), .internal.selfref = <pointer: 0x0000000006540788>)
d2<-structure(list(Field = c("6", "W62", "7", "12S", "19-1", "12S",
"6", "6", "19-1", "19-1", "6", "7", "W62", "19-1", "12S", "7",
"19-1", "7", "12S", "12S", "12S", "7", "6", "7", "6", "7", "W62",
"19-1", "6", "6", "12S", "12S", "6", "6", "12S", "6", "12S",
"19-1", "6", "W62", "W62", "6", "7", "7", "6", "19-1", "W62",
"6", "12S", "7"), Date = structure(c(16993, 17140, 17208, 17443,
17063, 16948, 17415, 16926, 17316, 16922, 16981, 17043, 17219,
17252, 17392, 17244, 17179, 17017, 17042, 17031, 17013, 17104,
17273, 16954, 17364, 16993, 17168, 17028, 17208, 16966, 17241,
16945, 17038, 17169, 17379, 17183, 17238, 17054, 17244, 16952,
17044, 17359, 17219, 17303, 17007, 17151, 16926, 17178, 17382,
17364), class = "Date"), TotN = c(79.244802845739, 94.193700050628,
21.075505564932, 692.152760834712, 224.689064446728, 172.576578578436,
47.406177406404, 102.53239575903, 818.80997295717, 476.174916307807,
125.828033450364, 58.270026966444, 75.465909993456, 435.049246131543,
337.913876678769, 31.714327953234, 305.353940577156, 72.621457768224,
393.815453005314, 428.540114240892, 318.97091713563, 73.888113736431,
79.0380747113805, 147.493527174027, 65.5311189906495, 59.269732271703,
119.390398108236, 110.706003557451, 21.96790939404, 149.060445984684,
128.143343232486, 208.621943093862, 75.770138571561, 47.496596179338,
132.723654607278, 43.92222198012, 145.150910469252, 215.88105225024,
21.393670871196, 72.969536052, 86.335878117078, 103.524169592979,
19.920230115264, 44.968722966108, 62.244487239885, 338.593490463303,
96.7285416279375, 45.537296152302, 422.630318314444, 58.5336350807685
)), .Names = c("Field", "Date", "TotN"), class = c("data.table",
"data.frame"), row.names = c(NA, -50L), .internal.selfref = <pointer: 0x0000000006540788>)
我尝试过的事情
这是我尝试在日期列上进行滚动联接的方法,该列按字段列的 分组。显然,我可以按字段拆分数据并分别处理,但我想避免使用该选项。
This is my attempt to perform a rolling join on the "Date" column, grouped by
the "Field" column. Obviously I can split the data by "Field" and process separately but I would like to avoid that option.
d1[d2, roll = "nearest", on = .(Date), by = .(Field)]
Error in `[.data.table`(d1, d2, roll = "nearest", on = .(Date), by = .(Field)) :
'by' or 'keyby' is supplied but not j
推荐答案
您快到了。
您可以同时加入多个列。因此,除了日期,您还可以在 on
子句中包括字段。但是请注意?data.table
中 roll
参数的描述:
You can join on multiple columns simultaneously. So, in addition to "Date", you can include "Field" in the on
clause. But please note the description of the roll
argument in ?data.table
:
滚动联接应用于 last 联接列
因此,要使日期用于滚动连接,请将其指定为 on
上的 last 变量:
Thus, for "Date" to be used for the rolling join, specify it as the last variable in on
:
library(data.table)
d1[d2, roll = "nearest", on = .(Field, Date)]
为了更好的验证,可以对结果进行排序
For better verification, the result can be ordered
d1[d2, roll = "nearest", on = .(Field, Date)][order(Field, Date)]
Field Date NlbsAcre TotN
1: 12S 2016-05-24 NA 208.62194
2: 12S 2016-05-27 NA 172.57658
3: 12S 2016-07-31 NA 318.97092
4: 12S 2016-08-18 NA 428.54011
5: 12S 2016-08-29 NA 393.81545
6: 12S 2017-03-13 44728.184 145.15091
7: 12S 2017-03-16 44728.184 128.14334
8: 12S 2017-08-01 12621.083 132.72365
9: 12S 2017-08-04 12621.083 422.63032
10: 12S 2017-08-14 12621.083 337.91388
11: 12S 2017-10-04 22162.203 692.15276
12: 19-1 2016-05-01 12630.923 476.17492
13: 19-1 2016-08-15 12630.923 110.70600
14: 19-1 2016-09-10 12630.923 215.88105
15: 19-1 2016-09-19 12630.923 224.68906
16: 19-1 2016-12-16 12630.923 338.59349
17: 19-1 2017-01-13 12630.923 305.35394
18: 19-1 2017-03-27 12630.923 435.04925
19: 19-1 2017-05-30 12630.923 818.80997
20: 6 2016-05-05 NA 102.53240
21: 6 2016-06-14 NA 149.06045
22: 6 2016-06-29 NA 125.82803
23: 6 2016-06-29 NA 125.82803
24: 6 2016-07-11 NA 79.24480
25: 6 2016-07-25 NA 62.24449
26: 6 2016-08-25 NA 75.77014
27: 6 2017-01-03 2014.772 47.49660
28: 6 2017-01-12 2014.772 45.53730
29: 6 2017-01-17 2014.772 43.92222
30: 6 2017-02-11 3082.318 21.96791
31: 6 2017-03-19 2477.083 21.39367
32: 6 2017-04-17 2427.536 79.03807
33: 6 2017-07-12 NA 103.52417
34: 6 2017-07-17 NA 65.53112
35: 6 2017-09-06 NA 47.40618
36: 7 2016-06-02 NA 147.49353
37: 7 2016-07-11 NA 59.26973
38: 7 2016-08-04 NA 72.62146
39: 7 2016-08-30 NA 58.27003
40: 7 2016-08-30 NA 58.27003
41: 7 2016-10-30 NA 73.88811
42: 7 2017-02-11 2279.609 21.07551
43: 7 2017-02-22 2279.609 19.92023
44: 7 2017-03-19 15842.916 31.71433
45: 7 2017-05-17 NA 44.96872
46: 7 2017-07-17 NA 58.53364
47: W62 2016-05-05 16764.975 96.72854
48: W62 2016-05-31 16764.975 72.96954
49: W62 2016-08-31 16764.975 86.33588
50: W62 2016-12-05 16764.975 94.19370
51: W62 2017-01-02 18874.656 119.39040
52: W62 2017-02-22 18874.656 75.46591
Field Date NlbsAcre TotN
这篇关于滚动联接由data.table中的第二个变量分组的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!