Rolling join grouped by a second variable in data.table


Problem description


Hello, I would like to perform a rolling join in R using the data.table package. There are multiple matches when joining on the "Date" column, so I would like to use the by argument in data.table on the "Field" column to keep data from different fields from being joined together.

Example Data

d1<-structure(list(Field = c("6", "W62", "6", "6", "12S", "19-1", 
"6", "6", "12S", "7", "6", "12S", "W62", "6", "12S", "W62", "12S", 
"6", "6", "7", "12S", "12S", "W62", "7", "12S", "6", "12S", "7", 
"12S", "7", "6", "7", "12S", "7", "6", "6", "6", "6", "12S", 
"7", "7", "6", "6", "12S", "7", "12S", "12S", "12S", "19-1", 
"6"), Date = structure(c(16994, 17240, 17240, 17401, 17048, 17417, 
17387, 17394, 17382, 17414, 17029, 17403, 17045, 17359, 17179, 
17281, 17152, 16972, 16987, 17042, 17282, 17415, 17281, 17266, 
17179, 17190, 17057, 17380, 17280, 17178, 17178, 17343, 17373, 
17043, 17190, 17343, 17253, 16981, 17079, 17043, 17270, 17366, 
16981, 17357, 17366, 17415, 17079, 17190, 17385, 17008), class = "Date"), 
    NlbsAcre = c(NA, 18874.6557383659, 2477.08251404958, NA, 
    NA, 19658.0054165823, NA, NA, 12621.0827111083, NA, NA, 16764.41968227, 
    16764.9745173044, NA, 7671.24950330348, 21341.6444661863, 
    5197.26333885612, NA, NA, NA, 39560.8958554292, 18162.4040880297, 
    22578.1487456647, 15842.9161753361, 3613.95523726973, 2601.07083566694, 
    17766.9873538952, NA, 44728.1837479613, 2279.60909695434, 
    2014.7720270382, NA, 14847.7006686211, NA, 3082.31758038481, 
    NA, 2427.53558465175, NA, 23641.2999848709, NA, NA, NA, NA, 
    5928.31591997149, NA, 22162.2028819815, 18972.2228621189, 
    6534.4257935542, 12630.9231775315, NA)), .Names = c("Field", 
"Date", "NlbsAcre"), class = c("data.table", "data.frame"), row.names = c(NA, 
-50L))

d2<-structure(list(Field = c("6", "W62", "7", "12S", "19-1", "12S", 
"6", "6", "19-1", "19-1", "6", "7", "W62", "19-1", "12S", "7", 
"19-1", "7", "12S", "12S", "12S", "7", "6", "7", "6", "7", "W62", 
"19-1", "6", "6", "12S", "12S", "6", "6", "12S", "6", "12S", 
"19-1", "6", "W62", "W62", "6", "7", "7", "6", "19-1", "W62", 
"6", "12S", "7"), Date = structure(c(16993, 17140, 17208, 17443, 
17063, 16948, 17415, 16926, 17316, 16922, 16981, 17043, 17219, 
17252, 17392, 17244, 17179, 17017, 17042, 17031, 17013, 17104, 
17273, 16954, 17364, 16993, 17168, 17028, 17208, 16966, 17241, 
16945, 17038, 17169, 17379, 17183, 17238, 17054, 17244, 16952, 
17044, 17359, 17219, 17303, 17007, 17151, 16926, 17178, 17382, 
17364), class = "Date"), TotN = c(79.244802845739, 94.193700050628, 
21.075505564932, 692.152760834712, 224.689064446728, 172.576578578436, 
47.406177406404, 102.53239575903, 818.80997295717, 476.174916307807, 
125.828033450364, 58.270026966444, 75.465909993456, 435.049246131543, 
337.913876678769, 31.714327953234, 305.353940577156, 72.621457768224, 
393.815453005314, 428.540114240892, 318.97091713563, 73.888113736431, 
79.0380747113805, 147.493527174027, 65.5311189906495, 59.269732271703, 
119.390398108236, 110.706003557451, 21.96790939404, 149.060445984684, 
128.143343232486, 208.621943093862, 75.770138571561, 47.496596179338, 
132.723654607278, 43.92222198012, 145.150910469252, 215.88105225024, 
21.393670871196, 72.969536052, 86.335878117078, 103.524169592979, 
19.920230115264, 44.968722966108, 62.244487239885, 338.593490463303, 
96.7285416279375, 45.537296152302, 422.630318314444, 58.5336350807685
)), .Names = c("Field", "Date", "TotN"), class = c("data.table", 
"data.frame"), row.names = c(NA, -50L))

What I have tried

This is my attempt to perform a rolling join on the "Date" column, grouped by the "Field" column. Obviously I could split the data by "Field" and process each field separately, but I would like to avoid that option.

d1[d2, roll = "nearest", on = .(Date), by = .(Field)]

Error in `[.data.table`(d1, d2, roll = "nearest", on = .(Date), by = .(Field)) : 
  'by' or 'keyby' is supplied but not j

Solution

You were almost there.

You can join on multiple columns simultaneously. So, in addition to "Date", you can include "Field" in the on clause. But please note the description of the roll argument in ?data.table:

Rolling joins apply to the last join column

Thus, for "Date" to be used for the rolling join, specify it as the last variable in on:

library(data.table)
d1[d2, roll = "nearest", on = .(Field, Date)]
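
If you also want to see which date from d1 each row of d2 was matched to, you can refer to d1's join column with the x. prefix in j. This is only a small sketch on top of the join above; matched_date is just an illustrative column name:

# Date is d2's date; x.Date is the d1 date that the rolling join matched to it
d1[d2, .(Field, Date, matched_date = x.Date, NlbsAcre, TotN),
   on = .(Field, Date), roll = "nearest"]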

For easier verification, the result of the join can also be ordered:

d1[d2, roll = "nearest", on = .(Field, Date)][order(Field, Date)]

    Field       Date  NlbsAcre      TotN
 1:   12S 2016-05-24        NA 208.62194
 2:   12S 2016-05-27        NA 172.57658
 3:   12S 2016-07-31        NA 318.97092
 4:   12S 2016-08-18        NA 428.54011
 5:   12S 2016-08-29        NA 393.81545
 6:   12S 2017-03-13 44728.184 145.15091
 7:   12S 2017-03-16 44728.184 128.14334
 8:   12S 2017-08-01 12621.083 132.72365
 9:   12S 2017-08-04 12621.083 422.63032
10:   12S 2017-08-14 12621.083 337.91388
11:   12S 2017-10-04 22162.203 692.15276
12:  19-1 2016-05-01 12630.923 476.17492
13:  19-1 2016-08-15 12630.923 110.70600
14:  19-1 2016-09-10 12630.923 215.88105
15:  19-1 2016-09-19 12630.923 224.68906
16:  19-1 2016-12-16 12630.923 338.59349
17:  19-1 2017-01-13 12630.923 305.35394
18:  19-1 2017-03-27 12630.923 435.04925
19:  19-1 2017-05-30 12630.923 818.80997
20:     6 2016-05-05        NA 102.53240
21:     6 2016-06-14        NA 149.06045
22:     6 2016-06-29        NA 125.82803
23:     6 2016-06-29        NA 125.82803
24:     6 2016-07-11        NA  79.24480
25:     6 2016-07-25        NA  62.24449
26:     6 2016-08-25        NA  75.77014
27:     6 2017-01-03  2014.772  47.49660
28:     6 2017-01-12  2014.772  45.53730
29:     6 2017-01-17  2014.772  43.92222
30:     6 2017-02-11  3082.318  21.96791
31:     6 2017-03-19  2477.083  21.39367
32:     6 2017-04-17  2427.536  79.03807
33:     6 2017-07-12        NA 103.52417
34:     6 2017-07-17        NA  65.53112
35:     6 2017-09-06        NA  47.40618
36:     7 2016-06-02        NA 147.49353
37:     7 2016-07-11        NA  59.26973
38:     7 2016-08-04        NA  72.62146
39:     7 2016-08-30        NA  58.27003
40:     7 2016-08-30        NA  58.27003
41:     7 2016-10-30        NA  73.88811
42:     7 2017-02-11  2279.609  21.07551
43:     7 2017-02-22  2279.609  19.92023
44:     7 2017-03-19 15842.916  31.71433
45:     7 2017-05-17        NA  44.96872
46:     7 2017-07-17        NA  58.53364
47:   W62 2016-05-05 16764.975  96.72854
48:   W62 2016-05-31 16764.975  72.96954
49:   W62 2016-08-31 16764.975  86.33588
50:   W62 2016-12-05 16764.975  94.19370
51:   W62 2017-01-02 18874.656 119.39040
52:   W62 2017-02-22 18874.656  75.46591
    Field       Date  NlbsAcre      TotN
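
If the final goal is simply to attach the nearest NlbsAcre value to each row of d2 (rather than to build a combined table), the same rolling match can also be written as an update join. A minimal sketch along the lines of the join above, not part of the original answer:

# add the NlbsAcre value from the nearest d1 date (within the same Field) to d2 by reference
d2[, NlbsAcre := d1[d2, x.NlbsAcre, on = .(Field, Date), roll = "nearest"]]

This leaves d2's rows and existing columns unchanged; only the new NlbsAcre column is added by reference.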
