Joining data frames by lubridate date %within% intervals
Question
I've been practicing wrangling R data frames with columns that contain lubridate data types, such as the example problem in my other question.
Now I am trying to do the equivalent of joining two data frames, but joining them by whether a timestamp in one data frame falls within an interval in the other data frame. For example:
Here is df1:
> glimpse(df1)
Observations: 6,160
Variables: 4
$ upload_id <int> 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, ...
$ site_id <int> 2, 2, 2, 2, 2, 4, 4, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, ...
$ segment_id <int> 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, ...
$ interval <S4: Interval> 2015-04-12 UTC--2015-04-19 UTC, 2015-04-19 UTC--201...
Here there is a set of lubridate time intervals, each with a corresponding unique combination of upload_id, site_id, and segment_id.
Here is df2:
> glimpse(df2)
Observations: 32,385
Variables: 3
$ sequence_id <int> 2047, 2067, 2069, 2072, 2075, 2081, 2086, 2091, 2096, 2104,...
$ upload_id <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 5, 5,...
$ taken <dttm> 2015-04-11 23:09:59, 2015-04-15 19:17:10, 2015-04-16 07:42...
Here there is a series of timestamps in column taken, each with a corresponding unique combination of sequence_id and upload_id.
Essentially, I want to left_join(df2, df1) where the needed by argument considers two things: (1) the shared upload_id column; and (2) whether taken in df2 falls within interval in df1. Any given taken might fall %within% multiple intervals, and vice versa, so I want to use upload_id as a unique identifier for each taken, so that each taken in df2 is matched to only one row in df1. After the join operation, I expect the new data frame to have six columns: sequence_id, taken, upload_id, site_id, segment_id, and interval. How can this be done in a tidy way?
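To make the matching rule concrete, here is a minimal sketch on made-up values (they are not rows from df1 or df2, and the variable names are only illustrative). It checks the two conditions for three hypothetical candidate pairs: equal upload_id, and taken falling %within% the interval.

```r
library(lubridate)

# Hypothetical candidate pairs: one timestamp from df2 paired with one row of df1
taken      <- ymd_hms(c("2015-04-15 19:17:10",
                        "2015-04-11 23:09:59",
                        "2015-04-16 07:42:00"), tz = "UTC")
upload_df2 <- c(2L, 2L, 2L)   # upload_id on the df2 side of each pair
upload_df1 <- c(2L, 2L, 3L)   # upload_id on the df1 side of each pair

# A single illustrative interval, recycled against every timestamp
ivl <- interval(ymd("2015-04-12", tz = "UTC"), ymd("2015-04-19", tz = "UTC"))

# Condition (2): does each timestamp fall inside the interval?
in_interval <- taken %within% ivl

# Conditions (1) and (2) together: a pair counts as a match only if both hold
keep <- upload_df2 == upload_df1 & in_interval
```

Only the first pair satisfies both conditions: the second timestamp falls before the interval starts, and the third pair has mismatched upload_id values.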
Edit: A comment suggested that uploading .Rdata files may be untrustworthy, and another stated that it is against the policy here. So I removed the .Rdata files and instead took a subset of each data frame via dput() (300 rows of df1 and 200 rows of df2). Here is df1:
structure(list(upload_id = c(1050L, 1582L, 2336L, 2665L, 1007L,
2148L, 275L, 2738L, 1501L, 64L, 2737L, 1547L, 2146L, 2596L, 457L,
2141L, 2790L, 362L, 2835L, 2741L, 575L, 914L, 2820L, 2572L, 2791L,
2157L, 1117L, 1535L, 2738L, 794L, 1335L, 2737L, 2570L, 1597L,
300L, 460L, 1701L, 2142L, 274L, 339L, 2109L, 500L, 2184L, 2837L,
1238L, 2837L, 2727L, 1175L, 1524L, 303L, 1714L, 1412L, 1894L,
340L, 1495L, 869L, 995L, 2438L, 1974L, 2762L, 205L, 1581L, 1527L,
2818L, 1617L, 2537L, 1956L, 638L, 1808L, 2151L, 771L, 2709L,
2185L, 2015L, 2511L, 1163L, 2557L, 1377L, 2213L, 2560L, 1417L,
1934L, 1860L, 2772L, 2614L, 2698L, 421L, 2609L, 1418L, 2355L,
463L, 2697L, 347L, 1531L, 1427L, 2548L, 2218L, 2781L, 1962L,
396L, 234L, 2846L, 4L, 2742L, 2838L, 1676L, 1635L, 2810L, 1990L,
2514L, 2809L, 1354L, 2668L, 2737L, 1606L, 764L, 1176L, 1442L,
519L, 2584L, 1021L, 352L, 2314L, 2662L, 1368L, 1043L, 2207L,
2792L, 684L, 1806L, 2743L, 2557L, 1971L, 1510L, 418L, 1866L,
1569L, 1717L, 1992L, 1629L, 2189L, 316L, 2030L, 2840L, 2307L,
1506L, 1962L, 1249L, 2791L, 670L, 592L, 236L, 2781L, 793L, 2790L,
2640L, 2517L, 855L, 626L, 1303L, 2241L, 1541L, 910L, 155L, 1617L,
29L, 916L, 732L, 2006L, 2742L, 2788L, 2830L, 2664L, 1455L, 1062L,
937L, 1543L, 781L, 737L, 901L, 2633L, 194L, 1000L, 1170L, 1567L,
2826L, 73L, 801L, 970L, 1327L, 2688L, 1538L, 2306L, 2170L, 1977L,
2367L, 186L, 1990L, 2606L, 2000L, 2818L, 396L, 696L, 630L, 2835L,
2067L, 1540L, 51L, 511L, 2587L, 2737L, 1961L, 594L, 1867L, 1042L,
116L, 1532L, 760L, 2662L, 2814L, 2585L, 2596L, 2837L, 1870L,
1971L, 73L, 2595L, 1955L, 692L, 2062L, 2742L, 2084L, 1098L, 2205L,
1404L, 2627L, 809L, 2684L, 2570L, 322L, 2605L, 2016L, 2782L,
54L, 2254L, 1165L, 655L, 532L, 732L, 534L, 2664L, 1880L, 1444L,
1920L, 477L, 2728L, 2640L, 1434L, 100L, 2587L, 1545L, 250L, 282L,
1756L, 940L, 2826L, 1005L, 2835L, 2152L, 203L, 1970L, 579L, 1234L,
2682L, 1050L, 2594L, 199L, 945L, 758L, 1262L, 796L, 2156L, 921L,
1961L, 817L, 486L, 982L, 394L, 1928L, 2237L, 2570L, 2144L, 2386L,
325L, 2729L, 2685L, 901L, 2042L, 141L, 2248L), site_id = c(184L,
278L, 73L, 364L, 231L, 244L, 72L, 364L, 74L, 52L, 350L, 248L,
223L, 306L, 117L, 223L, 350L, 115L, 357L, 295L, 113L, 74L, 350L,
348L, 364L, 267L, 74L, 248L, 364L, 198L, 73L, 350L, 347L, 260L,
103L, 134L, 271L, 223L, 72L, 120L, 73L, 145L, 214L, 350L, 74L,
350L, 361L, 227L, 160L, 73L, 73L, 237L, 292L, 110L, 267L, 205L,
230L, 74L, 306L, 295L, 47L, 261L, 44L, 357L, 280L, 355L, 199L,
119L, 160L, 73L, 186L, 348L, 214L, 295L, 348L, 160L, 306L, 74L,
191L, 350L, 73L, 191L, 191L, 364L, 306L, 364L, 74L, 73L, 74L,
74L, 155L, 350L, 54L, 248L, 260L, 114L, 241L, 360L, 292L, 31L,
36L, 73L, 7L, 360L, 364L, 74L, 262L, 361L, 292L, 350L, 360L,
256L, 73L, 350L, 280L, 184L, 44L, 258L, 146L, 347L, 217L, 44L,
113L, 357L, 191L, 233L, 245L, 360L, 156L, 293L, 360L, 306L, 292L,
226L, 74L, 36L, 73L, 73L, 199L, 244L, 241L, 110L, 295L, 361L,
248L, 251L, 292L, 113L, 364L, 74L, 160L, 105L, 360L, 202L, 350L,
306L, 351L, 201L, 160L, 247L, 320L, 248L, 213L, 54L, 280L, 41L,
198L, 187L, 74L, 360L, 357L, 287L, 350L, 44L, 234L, 105L, 248L,
200L, 174L, 198L, 73L, 54L, 217L, 236L, 277L, 361L, 63L, 194L,
160L, 73L, 361L, 248L, 320L, 74L, 293L, 73L, 68L, 292L, 350L,
199L, 357L, 31L, 166L, 165L, 357L, 312L, 248L, 42L, 148L, 350L,
350L, 147L, 116L, 248L, 174L, 47L, 226L, 74L, 357L, 73L, 348L,
306L, 350L, 293L, 292L, 63L, 348L, 298L, 174L, 316L, 360L, 312L,
227L, 319L, 237L, 350L, 160L, 348L, 347L, 108L, 306L, 293L, 361L,
54L, 74L, 74L, 73L, 56L, 187L, 74L, 350L, 199L, 74L, 271L, 56L,
360L, 306L, 226L, 72L, 350L, 248L, 90L, 91L, 74L, 44L, 361L,
217L, 357L, 73L, 55L, 191L, 73L, 226L, 347L, 184L, 357L, 95L,
218L, 196L, 249L, 197L, 74L, 74L, 147L, 199L, 145L, 217L, 136L,
295L, 73L, 347L, 223L, 113L, 47L, 350L, 350L, 198L, 310L, 23L,
74L), segment_id = c(3L, 1L, 1L, 1L, 1L, 2L, 1L, 5L, 1L, 1L,
7L, 1L, 2L, 7L, 1L, 1L, 3L, 3L, 7L, 1L, 2L, 1L, 8L, 2L, 11L,
1L, 1L, 3L, 6L, 1L, 1L, 8L, 2L, 2L, 4L, 5L, 3L, 1L, 1L, 1L, 1L,
3L, 1L, 17L, 1L, 3L, 4L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L,
5L, 1L, 1L, 2L, 1L, 1L, 2L, 7L, 4L, 2L, 3L, 1L, 1L, 1L, 3L, 3L,
1L, 6L, 2L, 2L, 5L, 1L, 2L, 5L, 1L, 2L, 3L, 2L, 4L, 3L, 1L, 1L,
2L, 1L, 4L, 13L, 3L, 2L, 1L, 2L, 3L, 6L, 5L, 5L, 3L, 1L, 2L,
7L, 10L, 1L, 1L, 1L, 7L, 4L, 2L, 2L, 1L, 9L, 1L, 1L, 1L, 10L,
3L, 4L, 6L, 1L, 4L, 9L, 1L, 1L, 1L, 10L, 2L, 1L, 4L, 4L, 1L,
1L, 1L, 1L, 1L, 1L, 8L, 1L, 1L, 1L, 7L, 15L, 2L, 8L, 7L, 3L,
6L, 1L, 1L, 1L, 8L, 1L, 23L, 4L, 3L, 2L, 2L, 2L, 2L, 4L, 1L,
1L, 3L, 2L, 5L, 1L, 1L, 6L, 5L, 1L, 12L, 2L, 2L, 1L, 1L, 3L,
1L, 2L, 1L, 2L, 5L, 2L, 1L, 6L, 4L, 2L, 1L, 1L, 1L, 3L, 1L, 1L,
2L, 1L, 4L, 5L, 5L, 7L, 4L, 17L, 1L, 2L, 2L, 1L, 1L, 1L, 3L,
1L, 18L, 4L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 8L, 6L, 2L, 1L,
6L, 1L, 1L, 2L, 1L, 1L, 10L, 1L, 1L, 1L, 2L, 10L, 1L, 15L, 4L,
4L, 3L, 4L, 12L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 11L, 1L, 1L, 2L,
2L, 2L, 7L, 3L, 1L, 2L, 4L, 2L, 2L, 1L, 2L, 16L, 2L, 4L, 1L,
2L, 1L, 1L, 2L, 14L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L,
1L, 1L, 3L, 1L, 2L, 1L, 7L, 2L, 1L, 2L, 2L, 15L, 6L, 1L, 1L,
1L), interval = new("Interval", .Data = c(604800, 86400, 86400,
259200, 604800, 604800, 604800, 604800, 86400, 86400, 604800,
604800, 518400, 604800, 86400, 604800, 604800, 604800, 604800,
518400, 604800, 86400, 604800, 604800, 259200, 604800, 86400,
604800, 604800, 518400, 172800, 604800, 604800, 604800, 172800,
432000, 604800, 604800, 259200, 432000, 86400, 604800, 432000,
604800, 86400, 604800, 604800, 604800, 604800, 86400, 86400,
604800, 604800, 604800, 604800, 172800, 604800, 345600, 518400,
604800, 345600, 604800, 86400, 86400, 604800, 604800, 604800,
604800, 604800, 86400, 86400, 604800, 518400, 604800, 604800,
86400, 604800, 86400, 86400, 604800, 604800, 432000, 604800,
604800, 604800, 604800, 86400, 86400, 259200, 86400, 604800,
604800, 259200, 604800, 604800, 604800, 259200, 604800, 604800,
604800, 604800, 86400, 604800, 604800, 604800, 172800, 604800,
604800, 604800, 432000, 604800, 604800, 86400, 604800, 604800,
518400, 518400, 604800, 604800, 604800, 172800, 604800, 604800,
86400, 604800, 604800, 604800, 604800, 86400, 518400, 604800,
604800, 604800, 518400, 518400, 604800, 86400, 86400, 172800,
604800, 604800, 259200, 604800, 604800, 604800, 604800, 432000,
604800, 604800, 86400, 604800, 432000, 604800, 604800, 604800,
604800, 604800, 86400, 518400, 604800, 604800, 604800, 604800,
518400, 604800, 604800, 604800, 604800, 172800, 604800, 86400,
604800, 604800, 604800, 345600, 604800, 604800, 604800, 604800,
604800, 86400, 86400, 345600, 172800, 172800, 604800, 604800,
518400, 604800, 86400, 604800, 604800, 604800, 172800, 604800,
86400, 86400, 604800, 604800, 604800, 604800, 432000, 604800,
604800, 604800, 172800, 604800, 345600, 604800, 604800, 604800,
604800, 604800, 604800, 172800, 604800, 172800, 86400, 604800,
86400, 604800, 604800, 604800, 604800, 604800, 604800, 604800,
604800, 604800, 86400, 518400, 259200, 604800, 604800, 604800,
604800, 432000, 604800, 604800, 86400, 604800, 604800, 604800,
259200, 86400, 86400, 86400, 518400, 86400, 86400, 604800, 604800,
259200, 345600, 604800, 604800, 604800, 604800, 172800, 604800,
604800, 259200, 604800, 86400, 86400, 604800, 604800, 604800,
86400, 172800, 604800, 86400, 604800, 604800, 604800, 172800,
432000, 604800, 518400, 345600, 518400, 86400, 86400, 604800,
604800, 604800, 604800, 172800, 604800, 86400, 604800, 518400,
86400, 604800, 604800, 518400, 172800, 259200, 86400, 86400),
start = structure(c(1463097600, 1479081600, 1499817600, 1511654400,
1464912000, 1493337600, 1440028800, 1514073600, 1478995200,
1438128000, 1507593600, 1475193600, 1491782400, 1507593600,
1445212800, 1487462400, 1505174400, 1445731200, 1519084800,
1515456000, 1449964800, 1463529600, 1508198400, 1504483200,
1517702400, 1485648000, 1468195200, 1476403200, 1514678400,
1460073600, 1472860800, 1508198400, 1504483200, 1475798400,
1444348800, 1451692800, 1481587200, 1488153600, 1439769600,
1445126400, 1492732800, 1449446400, 1494201600, 1513641600,
1470441600, 1505174400, 1510704000, 1469145600, 1478563200,
1444780800, 1483228800, 1475280000, 1485129600, 1444867200,
1477267200, 1462492800, 1464652800, 1503532800, 1488931200,
1516060800, 1441584000, 1475884800, 1479772800, 1519084800,
1478908800, 1505952000, 1486598400, 1444608000, 1485216000,
1493942400, 1459814400, 1505088000, 1494201600, 1488240000,
1504483200, 1469491200, 1506384000, 1474502400, 1495411200,
1506384000, 1475366400, 1487548800, 1485734400, 1512259200,
1505779200, 1512864000, 1448496000, 1509494400, 1475884800,
1500422400, 1448582400, 1511222400, 1444348800, 1474416000,
1475193600, 1506038400, 1495411200, 1513036800, 1487548800,
1439856000, 1441497600, 1519948800, 1428192000, 1513641600,
1517097600, 1481673600, 1475884800, 1508889600, 1488758400,
1505779200, 1510617600, 1471305600, 1511913600, 1508803200,
1477094400, 1457481600, 1469577600, 1473206400, 1449187200,
1505692800, 1465776000, 1444694400, 1497744000, 1511827200,
1473465600, 1465516800, 1494892800, 1515456000, 1454803200,
1485216000, 1511827200, 1505779200, 1485129600, 1478649600,
1447977600, 1465516800, 1479945600, 1483315200, 1489622400,
1479340800, 1494201600, 1444867200, 1488844800, 1517356800,
1495756800, 1477785600, 1488758400, 1468800000, 1514678400,
1455753600, 1452556800, 1442534400, 1514246400, 1456617600,
1517270400, 1505779200, 1505606400, 1462147200, 1453852800,
1471824000, 1495584000, 1477008000, 1462579200, 1439596800,
1478304000, 1433808000, 1462492800, 1457395200, 1489881600,
1513036800, 1517875200, 1518912000, 1510617600, 1476230400,
1466121600, 1463443200, 1475193600, 1458432000, 1457395200,
1460678400, 1510617600, 1441324800, 1465171200, 1469491200,
1477872000, 1511913600, 1439510400, 1460332800, 1464134400,
1472774400, 1508889600, 1476403200, 1494979200, 1494460800,
1485820800, 1501027200, 1441324800, 1487548800, 1506384000,
1489017600, 1517270400, 1447113600, 1455580800, 1453680000,
1516060800, 1491264000, 1475193600, 1437696000, 1449446400,
1503964800, 1514246400, 1487030400, 1452124800, 1485216000,
1464825600, 1438905600, 1479772800, 1459641600, 1506988800,
1518739200, 1508112000, 1506988800, 1504569600, 1485216000,
1488153600, 1437696000, 1503878400, 1487808000, 1455321600,
1489881600, 1515456000, 1491609600, 1466121600, 1494201600,
1471651200, 1509408000, 1460592000, 1512345600, 1505692800,
1445040000, 1505174400, 1487030400, 1515542400, 1437868800,
1496620800, 1469577600, 1455235200, 1450224000, 1.458e+09,
1450828800, 1510012800, 1485388800, 1476835200, 1487894400,
1447977600, 1510617600, 1507593600, 1474934400, 1438905600,
1504569600, 1477008000, 1443312000, 1443312000, 1484524800,
1464048000, 1517961600, 1463356800, 1517270400, 1494028800,
1441238400, 1488758400, 1452643200, 1470700800, 1511740800,
1461888000, 1508803200, 1441238400, 1463616000, 1455062400,
1471478400, 1460073600, 1494115200, 1463616000, 1488240000,
1460073600, 1448236800, 1463961600, 1447372800, 1485820800,
1496102400, 1507507200, 1489968000, 1499126400, 1444176000,
1504569600, 1512432000, 1463097600, 1490745600, 1440028800,
1496448000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
tzone = "UTC")), row.names = c(NA, -300L), class = c("tbl_df",
"tbl", "data.frame"))
Here is df2:
structure(list(sequence_id = c(10545297L, 5696697L, 26853675L,
26800598L, 5477912L, 3564676L, 11545989L, 26788357L, 26790778L,
4682984L, 12887744L, 4254651L, 6472328L, 18236650L, 26829066L,
26784117L, 26886686L, 797197L, 26820954L, 26791541L, 11657412L,
3960964L, 10189029L, 21286407L, 12914356L, 26793531L, 26802965L,
12435451L, 5484298L, 26827162L, 26853752L, 25711869L, 9030699L,
14386264L, 26802894L, 26377583L, 13291447L, 1851672L, 26790782L,
9900386L, 26797667L, 6561255L, 26818879L, 11648069L, 14259988L,
26809952L, 26809264L, 15071783L, 26791374L, 26853008L, 6762100L,
26853620L, 26880265L, 26878102L, 26809279L, 26787754L, 5502014L,
17810813L, 18236753L, 5568166L, 9252741L, 26786093L, 18418962L,
1218679L, 26801395L, 16954415L, 26853619L, 26800113L, 26817488L,
26811724L, 26809375L, 26809666L, 5869152L, 7681085L, 26894216L,
15810230L, 26829083L, 26817434L, 26789887L, 26785533L, 26796803L,
26786930L, 26825007L, 26784040L, 26810066L, 26853657L, 18236660L,
26797322L, 26825026L, 4103811L, 26878149L, 10545137L, 26784075L,
26902434L, 3948950L, 26816568L, 11453844L, 26826969L, 26813846L,
26897750L, 26802715L, 26790888L, 26815971L, 26797683L, 4726015L,
4617411L, 26797067L, 9252726L, 26797067L, 26785670L, 26789320L,
26901211L, 26894241L, 499985L, 26825082L, 21774171L, 26803324L,
26815122L, 56056L, 18236919L, 5425808L, 13209778L, 4726052L,
14386262L, 5477952L, 5564830L, 9756473L, 26894173L, 7136912L,
26792378L, 26878986L, 7726907L, 26903079L, 9517618L, 10730383L,
21774142L, 26901299L, 15071807L, 26786514L, 26901389L, 26903784L,
26802651L, 7817686L, 26805379L, 4617432L, 21624158L, 9656749L,
26789389L, 25399602L, 26901650L, 26797702L, 9900332L, 10965877L,
15268795L, 26896376L, 26787716L, 26851798L, 15810222L, 12887738L,
26827055L, 16102402L, 26796994L, 26784422L, 14725739L, 26901257L,
26853712L, 26785221L, 26793075L, 11658007L, 26823570L, 26791524L,
26797467L, 26796972L, 8501567L, 26799777L, 5572466L, 26787249L,
18385461L, 4791179L, 15810380L, 26808430L, 10239023L, 26790569L,
26805358L, 18158022L, 15810244L, 26878116L, 10623114L, 267502L,
9517623L, 16102411L, 26377567L, 8230310L, 13076594L, 26878082L,
415271L, 13833529L, 26823199L, 2410L, 26900200L), upload_id = c(851L,
592L, 2314L, 1799L, 546L, 357L, 925L, 299L, 1611L, 465L, 976L,
424L, 641L, 1249L, 2274L, 1436L, 2556L, 157L, 2166L, 1666L, 928L,
388L, 836L, 1405L, 977L, 1698L, 1928L, 961L, 547L, 2261L, 2316L,
1486L, 774L, 1038L, 1920L, 1503L, 993L, 229L, 1611L, 819L, 1767L,
651L, 2151L, 927L, 1034L, 2049L, 2028L, 1074L, 1629L, 2302L,
666L, 2314L, 2434L, 2387L, 2028L, 392L, 557L, 1217L, 1249L, 564L,
783L, 883L, 1265L, 179L, 1846L, 1159L, 2314L, 1783L, 2138L, 2079L,
2035L, 2045L, 594L, 736L, 2569L, 1102L, 2277L, 2089L, 52L, 1025L,
1746L, 669L, 2230L, 1506L, 2055L, 2314L, 1249L, 1757L, 2230L,
406L, 2387L, 851L, 1506L, 2787L, 385L, 2128L, 922L, 2251L, 2102L,
2711L, 1907L, 1605L, 2125L, 1767L, 459L, 458L, 1746L, 783L, 1746L,
1000L, 98L, 2750L, 2569L, 122L, 2230L, 1416L, 1929L, 2110L, 41L,
1249L, 542L, 985L, 459L, 1038L, 546L, 563L, 815L, 2569L, 681L,
1665L, 2419L, 738L, 2821L, 792L, 879L, 1416L, 2751L, 1074L, 779L,
2755L, 2849L, 1904L, 740L, 1951L, 458L, 1399L, 810L, 98L, 1479L,
2760L, 1767L, 819L, 891L, 1086L, 2693L, 440L, 2292L, 1102L, 976L,
2257L, 1106L, 1746L, 1442L, 1055L, 2751L, 2314L, 1400L, 1680L,
929L, 2194L, 1661L, 1765L, 1746L, 769L, 1774L, 570L, 572L, 1264L,
473L, 1102L, 2009L, 838L, 1586L, 1951L, 1235L, 1102L, 2387L,
864L, 95L, 792L, 1106L, 1503L, 762L, 984L, 2387L, 120L, 1012L,
1681L, 5L, 2722L), taken = structure(c(1461607098, 1357440699,
1497946386, 1480535568, 1450529748, 1446385695, 1463741872, 1444334424,
1479280400, 1449136788, 1462488333, 1448183687, 1454753449, 1467598406,
1497333513, 1475588136, 1507455271, 1440251873, 1494085620, 1481115392,
1463814473, 1441262063, 1461931738, 1471111946, 1462814426, 1482484495,
1488369500, 1463341759, 1451394079, 1496897690, 1499171773, 1478337380,
1459646439, 1465542945, 1487492476, 1478507314, 1465151499, 1440878596,
1479297148, 1461237979, 1484471493, 1455032917, 1493960869, 1462284996,
1465967563, 1490769440, 1490547948, 1458713033, 1480133603, 1498456304,
1454837375, 1497347897, 1502541854, 1499517904, 1490563199, 1443806209,
1451728803, 1469188230, 1468317942, 1452000085, 1459446443, 1462629579,
1469694294, 1438787731, 1486631809, 1469203046, 1497347627, 1485346076,
1493760152, 1491737060, 1490640549, 1490971607, 1452390124, 1458148243,
1506439827, 1465194751, 1497427230, 1493546423, 1437499385, 1465909309,
1479587401, 1455275863, 1494462120, 1475150180, 1486585139, 1497692625,
1467632404, 1483992126, 1494818410, 1443259589, 1499966514, 1461252282,
1476463125, 1517825105, 1439276459, 1492732155, 1463060151, 1496495881,
1492443646, 1513698078, 1487699018, 1478033857, 1493459209, 1484574255,
1445463014, 1445377602, 1482270132, 1459068085, 1482270132, 1465324190,
1437645893, 1516448011, 1506768001, 1439499230, 1495154336, 1475995917,
1487326465, 1492842646, 1437512735, 1471084135, 1451331488, 1464596049,
1445487433, 1465542768, 1450654515, 1450251138, 1458756627, 1505539318,
1456158745, 1481191991, 1502958079, 1456851898, 1519301621, 1460132323,
1462246721, 1475745018, 1516537759, 1459318655, 1460122320, 1514916703,
1520412137, 1488024066, 1458195162, 1487453288, 1445389049, 1474006970,
1459754632, 1438269539, 1477661255, 1516007192, 1484753445, 1461136855,
1463031275, 1466667291, 1509613313, 1441042946, 1497589967, 1465033581,
1462417047, 1496682390, 1467178192, 1481293492, 1469788770, 1462814225,
1516529474, 1498386350, 1470051133, 1481928052, 1463302826, 1495262048,
1480681123, 1483683739, 1481041639, 1459773430, 1484652813, 1451208417,
1451471584, 1467788032, 1445564488, 1466521584, 1490178592, 1461418924,
1478867863, 1486761277, 1470424975, 1465375208, 1499603574, 1462529520,
1438348434, 1460184847, 1467258314, 1478446800, 1457830628, 1464092571,
1499339617, 1439448916, 1465530027, 1491299676, 1431043226, 1511424274
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA,
-200L), class = c("tbl_df", "tbl", "data.frame"))
The problem with these subsets is that I'm not sure how much overlap remains between the two of them for the join, but hopefully there will be some. I tried to filter() one to include upload_ids from the other, but I get an error saying:
Error in filter_impl(.data, quo) : Column `interval` classes Period and Interval from lubridate are currently not supported.
Sorry if this sounds complicated; please let me know if I can clarify this question further. I am truly grateful for your help!
Recommended answer
You can use the fuzzyjoin package:
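# interval_inner_join() requires the IRanges package from Bioconductor,
# which can be installed with BiocManager::install("IRanges")
library(lubridate)
library(fuzzyjoin)
# Rename taken to start so both data frames share the join column names
colnames(df2) <- c("sequence_id", "upload_id", "start")
# Split each interval into explicit start and end columns
df1$start <- int_start(df1$interval)
df1$end <- int_end(df1$interval)
# Treat each timestamp as a zero-length interval
df2$end <- df2$start
# Join rows of df1 and df2 whose [start, end] ranges overlap
df3 <- interval_inner_join(df1, df2, by = c("start", "end"))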
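Note that the join above matches on time only: it does not require upload_id to agree, and it is an inner join rather than the left join the question asks for. As an alternative sketch (a swapped-in technique, not part of the answer above), dplyr 1.1.0 and later can express both conditions directly with join_by(). The toy data frames df1_toy and df2_toy and their values are hypothetical, and the interval is assumed to have already been split into start and end columns as in the answer.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Hypothetical stand-in for df1, with the interval stored as start/end columns
df1_toy <- tibble(
  upload_id  = c(1L, 1L, 2L),
  site_id    = c(10L, 10L, 20L),
  segment_id = c(1L, 2L, 1L),
  start = as.POSIXct(c("2015-04-12", "2015-04-19", "2015-04-12"), tz = "UTC"),
  end   = as.POSIXct(c("2015-04-19", "2015-04-26", "2015-04-19"), tz = "UTC")
)

# Hypothetical stand-in for df2
df2_toy <- tibble(
  sequence_id = c(101L, 102L, 103L),
  upload_id   = c(1L, 1L, 2L),
  taken = as.POSIXct(c("2015-04-13 08:00:00",
                       "2015-04-20 09:30:00",
                       "2015-05-01 00:00:00"), tz = "UTC")
)

# Left join: upload_id must be equal AND taken must lie between start and end
result <- left_join(
  df2_toy, df1_toy,
  by = join_by(upload_id, between(taken, start, end))
)
```

Rows of df2_toy with no matching interval are kept with NA in the df1 columns. By default between() treats both bounds as inclusive, so a timestamp that lands exactly on a shared boundary between two consecutive intervals would match both; its bounds argument (for example bounds = "[)") may help if that matters for the real data.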