Joining data frames by lubridate date %within% intervals

Question

I've been practicing and learning how to wrangle R data frames with columns that contain lubridate data types, as in an example problem in my other question.

Now I am trying to join two data frames on whether a timestamp in one data frame falls within an interval in the other. For example:

Here is df1:

> glimpse(df1)
Observations: 6,160
Variables: 4
$ upload_id  <int> 2, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, ...
$ site_id    <int> 2, 2, 2, 2, 2, 4, 4, 7, 7, 7, 7, 7, 7, 7, 7, 8, 8, 8, 8, 8, ...
$ segment_id <int> 1, 2, 3, 4, 5, 1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 5, ...
$ interval   <S4: Interval> 2015-04-12 UTC--2015-04-19 UTC, 2015-04-19 UTC--201...

It holds a set of lubridate time intervals, each with a unique combination of upload_id, site_id, and segment_id.

Here is df2:

> glimpse(df2)
Observations: 32,385
Variables: 3
$ sequence_id <int> 2047, 2067, 2069, 2072, 2075, 2081, 2086, 2091, 2096, 2104,...
$ upload_id   <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 5, 5,...
$ taken       <dttm> 2015-04-11 23:09:59, 2015-04-15 19:17:10, 2015-04-16 07:42...

It holds a series of timestamps in the column taken, each with a unique combination of sequence_id and upload_id.

Essentially, I want to left_join(df2, df1) where the by argument considers two conditions: (1) equality on the shared upload_id column, and (2) whether taken in df2 falls within interval in df1. Both are needed because a given taken value might fall %within% several intervals (and vice versa); matching on upload_id as well should ensure that each taken in df2 is matched to only one row in df1. After the join, I expect the new data frame to have six columns: sequence_id, taken, upload_id, site_id, segment_id, and interval. How can this be done in a tidy way?
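To pin down the intended semantics, here is a minimal base-R sketch on hypothetical toy data (the values are made up, loosely modeled on the glimpse() output above). It assumes the Interval column has been replaced by two POSIXct columns, start and end, so neither lubridate nor dplyr is involved: first an equi-join on upload_id, then a filter that keeps rows where taken lies in the closed range [start, end], the same inclusive test that %within% applies. This is an inner join, so df2 rows that match no interval are dropped.

```r
# Hypothetical toy data; df1's Interval is represented by two POSIXct columns
df1 <- data.frame(
  upload_id  = c(2L, 2L, 3L),
  site_id    = c(2L, 2L, 4L),
  segment_id = c(1L, 2L, 1L),
  start = as.POSIXct(c("2015-04-12", "2015-04-19", "2015-04-12"), tz = "UTC"),
  end   = as.POSIXct(c("2015-04-19", "2015-04-26", "2015-04-19"), tz = "UTC")
)
df2 <- data.frame(
  sequence_id = c(2047L, 2067L, 2069L, 2072L),
  upload_id   = c(2L, 2L, 3L, 2L),
  taken = as.POSIXct(c("2015-04-11 23:09:59", "2015-04-15 19:17:10",
                       "2015-04-16 07:42:00", "2015-04-21 08:00:00"), tz = "UTC")
)

# Step 1: equi-join on the shared key; each df2 row is paired with
# every df1 row that has the same upload_id
cand <- merge(df2, df1, by = "upload_id")

# Step 2: keep only pairs whose timestamp lies in the closed range [start, end]
matched <- cand[cand$taken >= cand$start & cand$taken <= cand$end, ]
matched
```

Sequence 2069 also falls inside the first upload-2 interval in time, but it is only paired with the upload-3 interval because the equi-join on upload_id runs first; sequence 2047 precedes every interval and disappears.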

Edit: A comment suggested that uploading .Rdata files may be untrustworthy, and another stated that it is against the policy here. So I removed the .Rdata files and instead took a subset of each data frame via dput() (300 rows of df1 and 200 rows of df2). Here is df1:

structure(list(upload_id = c(1050L, 1582L, 2336L, 2665L, 1007L, 
2148L, 275L, 2738L, 1501L, 64L, 2737L, 1547L, 2146L, 2596L, 457L, 
2141L, 2790L, 362L, 2835L, 2741L, 575L, 914L, 2820L, 2572L, 2791L, 
2157L, 1117L, 1535L, 2738L, 794L, 1335L, 2737L, 2570L, 1597L, 
300L, 460L, 1701L, 2142L, 274L, 339L, 2109L, 500L, 2184L, 2837L, 
1238L, 2837L, 2727L, 1175L, 1524L, 303L, 1714L, 1412L, 1894L, 
340L, 1495L, 869L, 995L, 2438L, 1974L, 2762L, 205L, 1581L, 1527L, 
2818L, 1617L, 2537L, 1956L, 638L, 1808L, 2151L, 771L, 2709L, 
2185L, 2015L, 2511L, 1163L, 2557L, 1377L, 2213L, 2560L, 1417L, 
1934L, 1860L, 2772L, 2614L, 2698L, 421L, 2609L, 1418L, 2355L, 
463L, 2697L, 347L, 1531L, 1427L, 2548L, 2218L, 2781L, 1962L, 
396L, 234L, 2846L, 4L, 2742L, 2838L, 1676L, 1635L, 2810L, 1990L, 
2514L, 2809L, 1354L, 2668L, 2737L, 1606L, 764L, 1176L, 1442L, 
519L, 2584L, 1021L, 352L, 2314L, 2662L, 1368L, 1043L, 2207L, 
2792L, 684L, 1806L, 2743L, 2557L, 1971L, 1510L, 418L, 1866L, 
1569L, 1717L, 1992L, 1629L, 2189L, 316L, 2030L, 2840L, 2307L, 
1506L, 1962L, 1249L, 2791L, 670L, 592L, 236L, 2781L, 793L, 2790L, 
2640L, 2517L, 855L, 626L, 1303L, 2241L, 1541L, 910L, 155L, 1617L, 
29L, 916L, 732L, 2006L, 2742L, 2788L, 2830L, 2664L, 1455L, 1062L, 
937L, 1543L, 781L, 737L, 901L, 2633L, 194L, 1000L, 1170L, 1567L, 
2826L, 73L, 801L, 970L, 1327L, 2688L, 1538L, 2306L, 2170L, 1977L, 
2367L, 186L, 1990L, 2606L, 2000L, 2818L, 396L, 696L, 630L, 2835L, 
2067L, 1540L, 51L, 511L, 2587L, 2737L, 1961L, 594L, 1867L, 1042L, 
116L, 1532L, 760L, 2662L, 2814L, 2585L, 2596L, 2837L, 1870L, 
1971L, 73L, 2595L, 1955L, 692L, 2062L, 2742L, 2084L, 1098L, 2205L, 
1404L, 2627L, 809L, 2684L, 2570L, 322L, 2605L, 2016L, 2782L, 
54L, 2254L, 1165L, 655L, 532L, 732L, 534L, 2664L, 1880L, 1444L, 
1920L, 477L, 2728L, 2640L, 1434L, 100L, 2587L, 1545L, 250L, 282L, 
1756L, 940L, 2826L, 1005L, 2835L, 2152L, 203L, 1970L, 579L, 1234L, 
2682L, 1050L, 2594L, 199L, 945L, 758L, 1262L, 796L, 2156L, 921L, 
1961L, 817L, 486L, 982L, 394L, 1928L, 2237L, 2570L, 2144L, 2386L, 
325L, 2729L, 2685L, 901L, 2042L, 141L, 2248L), site_id = c(184L, 
278L, 73L, 364L, 231L, 244L, 72L, 364L, 74L, 52L, 350L, 248L, 
223L, 306L, 117L, 223L, 350L, 115L, 357L, 295L, 113L, 74L, 350L, 
348L, 364L, 267L, 74L, 248L, 364L, 198L, 73L, 350L, 347L, 260L, 
103L, 134L, 271L, 223L, 72L, 120L, 73L, 145L, 214L, 350L, 74L, 
350L, 361L, 227L, 160L, 73L, 73L, 237L, 292L, 110L, 267L, 205L, 
230L, 74L, 306L, 295L, 47L, 261L, 44L, 357L, 280L, 355L, 199L, 
119L, 160L, 73L, 186L, 348L, 214L, 295L, 348L, 160L, 306L, 74L, 
191L, 350L, 73L, 191L, 191L, 364L, 306L, 364L, 74L, 73L, 74L, 
74L, 155L, 350L, 54L, 248L, 260L, 114L, 241L, 360L, 292L, 31L, 
36L, 73L, 7L, 360L, 364L, 74L, 262L, 361L, 292L, 350L, 360L, 
256L, 73L, 350L, 280L, 184L, 44L, 258L, 146L, 347L, 217L, 44L, 
113L, 357L, 191L, 233L, 245L, 360L, 156L, 293L, 360L, 306L, 292L, 
226L, 74L, 36L, 73L, 73L, 199L, 244L, 241L, 110L, 295L, 361L, 
248L, 251L, 292L, 113L, 364L, 74L, 160L, 105L, 360L, 202L, 350L, 
306L, 351L, 201L, 160L, 247L, 320L, 248L, 213L, 54L, 280L, 41L, 
198L, 187L, 74L, 360L, 357L, 287L, 350L, 44L, 234L, 105L, 248L, 
200L, 174L, 198L, 73L, 54L, 217L, 236L, 277L, 361L, 63L, 194L, 
160L, 73L, 361L, 248L, 320L, 74L, 293L, 73L, 68L, 292L, 350L, 
199L, 357L, 31L, 166L, 165L, 357L, 312L, 248L, 42L, 148L, 350L, 
350L, 147L, 116L, 248L, 174L, 47L, 226L, 74L, 357L, 73L, 348L, 
306L, 350L, 293L, 292L, 63L, 348L, 298L, 174L, 316L, 360L, 312L, 
227L, 319L, 237L, 350L, 160L, 348L, 347L, 108L, 306L, 293L, 361L, 
54L, 74L, 74L, 73L, 56L, 187L, 74L, 350L, 199L, 74L, 271L, 56L, 
360L, 306L, 226L, 72L, 350L, 248L, 90L, 91L, 74L, 44L, 361L, 
217L, 357L, 73L, 55L, 191L, 73L, 226L, 347L, 184L, 357L, 95L, 
218L, 196L, 249L, 197L, 74L, 74L, 147L, 199L, 145L, 217L, 136L, 
295L, 73L, 347L, 223L, 113L, 47L, 350L, 350L, 198L, 310L, 23L, 
74L), segment_id = c(3L, 1L, 1L, 1L, 1L, 2L, 1L, 5L, 1L, 1L, 
7L, 1L, 2L, 7L, 1L, 1L, 3L, 3L, 7L, 1L, 2L, 1L, 8L, 2L, 11L, 
1L, 1L, 3L, 6L, 1L, 1L, 8L, 2L, 2L, 4L, 5L, 3L, 1L, 1L, 1L, 1L, 
3L, 1L, 17L, 1L, 3L, 4L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 3L, 3L, 
5L, 1L, 1L, 2L, 1L, 1L, 2L, 7L, 4L, 2L, 3L, 1L, 1L, 1L, 3L, 3L, 
1L, 6L, 2L, 2L, 5L, 1L, 2L, 5L, 1L, 2L, 3L, 2L, 4L, 3L, 1L, 1L, 
2L, 1L, 4L, 13L, 3L, 2L, 1L, 2L, 3L, 6L, 5L, 5L, 3L, 1L, 2L, 
7L, 10L, 1L, 1L, 1L, 7L, 4L, 2L, 2L, 1L, 9L, 1L, 1L, 1L, 10L, 
3L, 4L, 6L, 1L, 4L, 9L, 1L, 1L, 1L, 10L, 2L, 1L, 4L, 4L, 1L, 
1L, 1L, 1L, 1L, 1L, 8L, 1L, 1L, 1L, 7L, 15L, 2L, 8L, 7L, 3L, 
6L, 1L, 1L, 1L, 8L, 1L, 23L, 4L, 3L, 2L, 2L, 2L, 2L, 4L, 1L, 
1L, 3L, 2L, 5L, 1L, 1L, 6L, 5L, 1L, 12L, 2L, 2L, 1L, 1L, 3L, 
1L, 2L, 1L, 2L, 5L, 2L, 1L, 6L, 4L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 
2L, 1L, 4L, 5L, 5L, 7L, 4L, 17L, 1L, 2L, 2L, 1L, 1L, 1L, 3L, 
1L, 18L, 4L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 8L, 6L, 2L, 1L, 
6L, 1L, 1L, 2L, 1L, 1L, 10L, 1L, 1L, 1L, 2L, 10L, 1L, 15L, 4L, 
4L, 3L, 4L, 12L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 11L, 1L, 1L, 2L, 
2L, 2L, 7L, 3L, 1L, 2L, 4L, 2L, 2L, 1L, 2L, 16L, 2L, 4L, 1L, 
2L, 1L, 1L, 2L, 14L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 6L, 
1L, 1L, 3L, 1L, 2L, 1L, 7L, 2L, 1L, 2L, 2L, 15L, 6L, 1L, 1L, 
1L), interval = new("Interval", .Data = c(604800, 86400, 86400, 
259200, 604800, 604800, 604800, 604800, 86400, 86400, 604800, 
604800, 518400, 604800, 86400, 604800, 604800, 604800, 604800, 
518400, 604800, 86400, 604800, 604800, 259200, 604800, 86400, 
604800, 604800, 518400, 172800, 604800, 604800, 604800, 172800, 
432000, 604800, 604800, 259200, 432000, 86400, 604800, 432000, 
604800, 86400, 604800, 604800, 604800, 604800, 86400, 86400, 
604800, 604800, 604800, 604800, 172800, 604800, 345600, 518400, 
604800, 345600, 604800, 86400, 86400, 604800, 604800, 604800, 
604800, 604800, 86400, 86400, 604800, 518400, 604800, 604800, 
86400, 604800, 86400, 86400, 604800, 604800, 432000, 604800, 
604800, 604800, 604800, 86400, 86400, 259200, 86400, 604800, 
604800, 259200, 604800, 604800, 604800, 259200, 604800, 604800, 
604800, 604800, 86400, 604800, 604800, 604800, 172800, 604800, 
604800, 604800, 432000, 604800, 604800, 86400, 604800, 604800, 
518400, 518400, 604800, 604800, 604800, 172800, 604800, 604800, 
86400, 604800, 604800, 604800, 604800, 86400, 518400, 604800, 
604800, 604800, 518400, 518400, 604800, 86400, 86400, 172800, 
604800, 604800, 259200, 604800, 604800, 604800, 604800, 432000, 
604800, 604800, 86400, 604800, 432000, 604800, 604800, 604800, 
604800, 604800, 86400, 518400, 604800, 604800, 604800, 604800, 
518400, 604800, 604800, 604800, 604800, 172800, 604800, 86400, 
604800, 604800, 604800, 345600, 604800, 604800, 604800, 604800, 
604800, 86400, 86400, 345600, 172800, 172800, 604800, 604800, 
518400, 604800, 86400, 604800, 604800, 604800, 172800, 604800, 
86400, 86400, 604800, 604800, 604800, 604800, 432000, 604800, 
604800, 604800, 172800, 604800, 345600, 604800, 604800, 604800, 
604800, 604800, 604800, 172800, 604800, 172800, 86400, 604800, 
86400, 604800, 604800, 604800, 604800, 604800, 604800, 604800, 
604800, 604800, 86400, 518400, 259200, 604800, 604800, 604800, 
604800, 432000, 604800, 604800, 86400, 604800, 604800, 604800, 
259200, 86400, 86400, 86400, 518400, 86400, 86400, 604800, 604800, 
259200, 345600, 604800, 604800, 604800, 604800, 172800, 604800, 
604800, 259200, 604800, 86400, 86400, 604800, 604800, 604800, 
86400, 172800, 604800, 86400, 604800, 604800, 604800, 172800, 
432000, 604800, 518400, 345600, 518400, 86400, 86400, 604800, 
604800, 604800, 604800, 172800, 604800, 86400, 604800, 518400, 
86400, 604800, 604800, 518400, 172800, 259200, 86400, 86400), 
    start = structure(c(1463097600, 1479081600, 1499817600, 1511654400, 
    1464912000, 1493337600, 1440028800, 1514073600, 1478995200, 
    1438128000, 1507593600, 1475193600, 1491782400, 1507593600, 
    1445212800, 1487462400, 1505174400, 1445731200, 1519084800, 
    1515456000, 1449964800, 1463529600, 1508198400, 1504483200, 
    1517702400, 1485648000, 1468195200, 1476403200, 1514678400, 
    1460073600, 1472860800, 1508198400, 1504483200, 1475798400, 
    1444348800, 1451692800, 1481587200, 1488153600, 1439769600, 
    1445126400, 1492732800, 1449446400, 1494201600, 1513641600, 
    1470441600, 1505174400, 1510704000, 1469145600, 1478563200, 
    1444780800, 1483228800, 1475280000, 1485129600, 1444867200, 
    1477267200, 1462492800, 1464652800, 1503532800, 1488931200, 
    1516060800, 1441584000, 1475884800, 1479772800, 1519084800, 
    1478908800, 1505952000, 1486598400, 1444608000, 1485216000, 
    1493942400, 1459814400, 1505088000, 1494201600, 1488240000, 
    1504483200, 1469491200, 1506384000, 1474502400, 1495411200, 
    1506384000, 1475366400, 1487548800, 1485734400, 1512259200, 
    1505779200, 1512864000, 1448496000, 1509494400, 1475884800, 
    1500422400, 1448582400, 1511222400, 1444348800, 1474416000, 
    1475193600, 1506038400, 1495411200, 1513036800, 1487548800, 
    1439856000, 1441497600, 1519948800, 1428192000, 1513641600, 
    1517097600, 1481673600, 1475884800, 1508889600, 1488758400, 
    1505779200, 1510617600, 1471305600, 1511913600, 1508803200, 
    1477094400, 1457481600, 1469577600, 1473206400, 1449187200, 
    1505692800, 1465776000, 1444694400, 1497744000, 1511827200, 
    1473465600, 1465516800, 1494892800, 1515456000, 1454803200, 
    1485216000, 1511827200, 1505779200, 1485129600, 1478649600, 
    1447977600, 1465516800, 1479945600, 1483315200, 1489622400, 
    1479340800, 1494201600, 1444867200, 1488844800, 1517356800, 
    1495756800, 1477785600, 1488758400, 1468800000, 1514678400, 
    1455753600, 1452556800, 1442534400, 1514246400, 1456617600, 
    1517270400, 1505779200, 1505606400, 1462147200, 1453852800, 
    1471824000, 1495584000, 1477008000, 1462579200, 1439596800, 
    1478304000, 1433808000, 1462492800, 1457395200, 1489881600, 
    1513036800, 1517875200, 1518912000, 1510617600, 1476230400, 
    1466121600, 1463443200, 1475193600, 1458432000, 1457395200, 
    1460678400, 1510617600, 1441324800, 1465171200, 1469491200, 
    1477872000, 1511913600, 1439510400, 1460332800, 1464134400, 
    1472774400, 1508889600, 1476403200, 1494979200, 1494460800, 
    1485820800, 1501027200, 1441324800, 1487548800, 1506384000, 
    1489017600, 1517270400, 1447113600, 1455580800, 1453680000, 
    1516060800, 1491264000, 1475193600, 1437696000, 1449446400, 
    1503964800, 1514246400, 1487030400, 1452124800, 1485216000, 
    1464825600, 1438905600, 1479772800, 1459641600, 1506988800, 
    1518739200, 1508112000, 1506988800, 1504569600, 1485216000, 
    1488153600, 1437696000, 1503878400, 1487808000, 1455321600, 
    1489881600, 1515456000, 1491609600, 1466121600, 1494201600, 
    1471651200, 1509408000, 1460592000, 1512345600, 1505692800, 
    1445040000, 1505174400, 1487030400, 1515542400, 1437868800, 
    1496620800, 1469577600, 1455235200, 1450224000, 1.458e+09, 
    1450828800, 1510012800, 1485388800, 1476835200, 1487894400, 
    1447977600, 1510617600, 1507593600, 1474934400, 1438905600, 
    1504569600, 1477008000, 1443312000, 1443312000, 1484524800, 
    1464048000, 1517961600, 1463356800, 1517270400, 1494028800, 
    1441238400, 1488758400, 1452643200, 1470700800, 1511740800, 
    1461888000, 1508803200, 1441238400, 1463616000, 1455062400, 
    1471478400, 1460073600, 1494115200, 1463616000, 1488240000, 
    1460073600, 1448236800, 1463961600, 1447372800, 1485820800, 
    1496102400, 1507507200, 1489968000, 1499126400, 1444176000, 
    1504569600, 1512432000, 1463097600, 1490745600, 1440028800, 
    1496448000), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    tzone = "UTC")), row.names = c(NA, -300L), class = c("tbl_df", 
"tbl", "data.frame"))

And here is df2:

structure(list(sequence_id = c(10545297L, 5696697L, 26853675L, 
26800598L, 5477912L, 3564676L, 11545989L, 26788357L, 26790778L, 
4682984L, 12887744L, 4254651L, 6472328L, 18236650L, 26829066L, 
26784117L, 26886686L, 797197L, 26820954L, 26791541L, 11657412L, 
3960964L, 10189029L, 21286407L, 12914356L, 26793531L, 26802965L, 
12435451L, 5484298L, 26827162L, 26853752L, 25711869L, 9030699L, 
14386264L, 26802894L, 26377583L, 13291447L, 1851672L, 26790782L, 
9900386L, 26797667L, 6561255L, 26818879L, 11648069L, 14259988L, 
26809952L, 26809264L, 15071783L, 26791374L, 26853008L, 6762100L, 
26853620L, 26880265L, 26878102L, 26809279L, 26787754L, 5502014L, 
17810813L, 18236753L, 5568166L, 9252741L, 26786093L, 18418962L, 
1218679L, 26801395L, 16954415L, 26853619L, 26800113L, 26817488L, 
26811724L, 26809375L, 26809666L, 5869152L, 7681085L, 26894216L, 
15810230L, 26829083L, 26817434L, 26789887L, 26785533L, 26796803L, 
26786930L, 26825007L, 26784040L, 26810066L, 26853657L, 18236660L, 
26797322L, 26825026L, 4103811L, 26878149L, 10545137L, 26784075L, 
26902434L, 3948950L, 26816568L, 11453844L, 26826969L, 26813846L, 
26897750L, 26802715L, 26790888L, 26815971L, 26797683L, 4726015L, 
4617411L, 26797067L, 9252726L, 26797067L, 26785670L, 26789320L, 
26901211L, 26894241L, 499985L, 26825082L, 21774171L, 26803324L, 
26815122L, 56056L, 18236919L, 5425808L, 13209778L, 4726052L, 
14386262L, 5477952L, 5564830L, 9756473L, 26894173L, 7136912L, 
26792378L, 26878986L, 7726907L, 26903079L, 9517618L, 10730383L, 
21774142L, 26901299L, 15071807L, 26786514L, 26901389L, 26903784L, 
26802651L, 7817686L, 26805379L, 4617432L, 21624158L, 9656749L, 
26789389L, 25399602L, 26901650L, 26797702L, 9900332L, 10965877L, 
15268795L, 26896376L, 26787716L, 26851798L, 15810222L, 12887738L, 
26827055L, 16102402L, 26796994L, 26784422L, 14725739L, 26901257L, 
26853712L, 26785221L, 26793075L, 11658007L, 26823570L, 26791524L, 
26797467L, 26796972L, 8501567L, 26799777L, 5572466L, 26787249L, 
18385461L, 4791179L, 15810380L, 26808430L, 10239023L, 26790569L, 
26805358L, 18158022L, 15810244L, 26878116L, 10623114L, 267502L, 
9517623L, 16102411L, 26377567L, 8230310L, 13076594L, 26878082L, 
415271L, 13833529L, 26823199L, 2410L, 26900200L), upload_id = c(851L, 
592L, 2314L, 1799L, 546L, 357L, 925L, 299L, 1611L, 465L, 976L, 
424L, 641L, 1249L, 2274L, 1436L, 2556L, 157L, 2166L, 1666L, 928L, 
388L, 836L, 1405L, 977L, 1698L, 1928L, 961L, 547L, 2261L, 2316L, 
1486L, 774L, 1038L, 1920L, 1503L, 993L, 229L, 1611L, 819L, 1767L, 
651L, 2151L, 927L, 1034L, 2049L, 2028L, 1074L, 1629L, 2302L, 
666L, 2314L, 2434L, 2387L, 2028L, 392L, 557L, 1217L, 1249L, 564L, 
783L, 883L, 1265L, 179L, 1846L, 1159L, 2314L, 1783L, 2138L, 2079L, 
2035L, 2045L, 594L, 736L, 2569L, 1102L, 2277L, 2089L, 52L, 1025L, 
1746L, 669L, 2230L, 1506L, 2055L, 2314L, 1249L, 1757L, 2230L, 
406L, 2387L, 851L, 1506L, 2787L, 385L, 2128L, 922L, 2251L, 2102L, 
2711L, 1907L, 1605L, 2125L, 1767L, 459L, 458L, 1746L, 783L, 1746L, 
1000L, 98L, 2750L, 2569L, 122L, 2230L, 1416L, 1929L, 2110L, 41L, 
1249L, 542L, 985L, 459L, 1038L, 546L, 563L, 815L, 2569L, 681L, 
1665L, 2419L, 738L, 2821L, 792L, 879L, 1416L, 2751L, 1074L, 779L, 
2755L, 2849L, 1904L, 740L, 1951L, 458L, 1399L, 810L, 98L, 1479L, 
2760L, 1767L, 819L, 891L, 1086L, 2693L, 440L, 2292L, 1102L, 976L, 
2257L, 1106L, 1746L, 1442L, 1055L, 2751L, 2314L, 1400L, 1680L, 
929L, 2194L, 1661L, 1765L, 1746L, 769L, 1774L, 570L, 572L, 1264L, 
473L, 1102L, 2009L, 838L, 1586L, 1951L, 1235L, 1102L, 2387L, 
864L, 95L, 792L, 1106L, 1503L, 762L, 984L, 2387L, 120L, 1012L, 
1681L, 5L, 2722L), taken = structure(c(1461607098, 1357440699, 
1497946386, 1480535568, 1450529748, 1446385695, 1463741872, 1444334424, 
1479280400, 1449136788, 1462488333, 1448183687, 1454753449, 1467598406, 
1497333513, 1475588136, 1507455271, 1440251873, 1494085620, 1481115392, 
1463814473, 1441262063, 1461931738, 1471111946, 1462814426, 1482484495, 
1488369500, 1463341759, 1451394079, 1496897690, 1499171773, 1478337380, 
1459646439, 1465542945, 1487492476, 1478507314, 1465151499, 1440878596, 
1479297148, 1461237979, 1484471493, 1455032917, 1493960869, 1462284996, 
1465967563, 1490769440, 1490547948, 1458713033, 1480133603, 1498456304, 
1454837375, 1497347897, 1502541854, 1499517904, 1490563199, 1443806209, 
1451728803, 1469188230, 1468317942, 1452000085, 1459446443, 1462629579, 
1469694294, 1438787731, 1486631809, 1469203046, 1497347627, 1485346076, 
1493760152, 1491737060, 1490640549, 1490971607, 1452390124, 1458148243, 
1506439827, 1465194751, 1497427230, 1493546423, 1437499385, 1465909309, 
1479587401, 1455275863, 1494462120, 1475150180, 1486585139, 1497692625, 
1467632404, 1483992126, 1494818410, 1443259589, 1499966514, 1461252282, 
1476463125, 1517825105, 1439276459, 1492732155, 1463060151, 1496495881, 
1492443646, 1513698078, 1487699018, 1478033857, 1493459209, 1484574255, 
1445463014, 1445377602, 1482270132, 1459068085, 1482270132, 1465324190, 
1437645893, 1516448011, 1506768001, 1439499230, 1495154336, 1475995917, 
1487326465, 1492842646, 1437512735, 1471084135, 1451331488, 1464596049, 
1445487433, 1465542768, 1450654515, 1450251138, 1458756627, 1505539318, 
1456158745, 1481191991, 1502958079, 1456851898, 1519301621, 1460132323, 
1462246721, 1475745018, 1516537759, 1459318655, 1460122320, 1514916703, 
1520412137, 1488024066, 1458195162, 1487453288, 1445389049, 1474006970, 
1459754632, 1438269539, 1477661255, 1516007192, 1484753445, 1461136855, 
1463031275, 1466667291, 1509613313, 1441042946, 1497589967, 1465033581, 
1462417047, 1496682390, 1467178192, 1481293492, 1469788770, 1462814225, 
1516529474, 1498386350, 1470051133, 1481928052, 1463302826, 1495262048, 
1480681123, 1483683739, 1481041639, 1459773430, 1484652813, 1451208417, 
1451471584, 1467788032, 1445564488, 1466521584, 1490178592, 1461418924, 
1478867863, 1486761277, 1470424975, 1465375208, 1499603574, 1462529520, 
1438348434, 1460184847, 1467258314, 1478446800, 1457830628, 1464092571, 
1499339617, 1439448916, 1465530027, 1491299676, 1431043226, 1511424274
), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(NA, 
-200L), class = c("tbl_df", "tbl", "data.frame"))

The problem with these subsets is that I'm not sure how much overlap remains between them for the join, but hopefully there is some. I tried to filter() one of them to keep only the upload_ids found in the other, but I get this error:


Error in filter_impl(.data, quo) : Column interval classes Period and Interval from lubridate are currently not supported.

Sorry if this sounds complicated; please let me know if I can clarify the question further. I am truly grateful for your help!
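To check how much the two subsets overlap, the shared upload_id values can be computed with base R vector operations. These never touch the interval column, so they cannot trigger the filter() error quoted above. A minimal sketch; the toy upload_id values are hypothetical stand-ins for the real columns:

```r
# Hypothetical stand-ins; only the upload_id columns matter here
df1 <- data.frame(upload_id = c(2L, 3L, 4L, 4L))
df2 <- data.frame(upload_id = c(2L, 2L, 5L, 3L))

# Distinct upload_ids present in both data frames
shared <- intersect(df1$upload_id, df2$upload_id)

# Number of df2 rows that could possibly find a partner in df1
n_candidates <- sum(df2$upload_id %in% df1$upload_id)
```

If shared comes back empty for the dput() subsets, no join on upload_id can produce any rows, whatever the interval logic.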

Recommended answer

You can use the fuzzyjoin package:

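library(lubridate)
library(dplyr)
library(fuzzyjoin)  # interval_inner_join() also needs the Bioconductor package IRanges

# df2: rename taken to start and treat each timestamp as a zero-length interval
colnames(df2) <- c("sequence_id", "upload_id", "start")
df2$end <- df2$start

# df1: split the lubridate Interval into plain POSIXct bounds and drop the S4 column,
# which older dplyr versions cannot handle
df1$start <- int_start(df1$interval)
df1$end <- int_end(df1$interval)
df1$interval <- NULL

# Overlap join on [start, end]; it ignores upload_id, so enforce that afterwards
df3 <- interval_inner_join(df1, df2, by = c("start", "end")) %>%
  filter(upload_id.x == upload_id.y)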
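As an alternative, dplyr 1.1.0 and later can express both conditions in a single left_join() through join_by(), which also keeps df2 rows that match no interval (an inner join drops them). A minimal sketch on hypothetical toy data, assuming the Interval column has already been split into POSIXct start and end columns with int_start() and int_end():

```r
library(dplyr)  # join_by() with between() requires dplyr >= 1.1.0

# Hypothetical toy data; df1's Interval already split into POSIXct bounds
df1 <- tibble(
  upload_id  = c(2L, 2L, 3L),
  site_id    = c(2L, 2L, 4L),
  segment_id = c(1L, 2L, 1L),
  start = as.POSIXct(c("2015-04-12", "2015-04-19", "2015-04-12"), tz = "UTC"),
  end   = as.POSIXct(c("2015-04-19", "2015-04-26", "2015-04-19"), tz = "UTC")
)
df2 <- tibble(
  sequence_id = c(2047L, 2067L, 2069L, 2072L),
  upload_id   = c(2L, 2L, 3L, 2L),
  taken = as.POSIXct(c("2015-04-11 23:09:59", "2015-04-15 19:17:10",
                       "2015-04-16 07:42:00", "2015-04-21 08:00:00"), tz = "UTC")
)

# Equality on upload_id plus start <= taken <= end; every df2 row is kept,
# with NA in the df1 columns when no interval contains taken
df3 <- left_join(df2, df1, by = join_by(upload_id, between(taken, start, end)))
```

between() uses closed bounds by default, mirroring %within%. If consecutive intervals share an endpoint, a timestamp exactly on that boundary matches both; passing bounds = "[)" makes the intervals half-open so each timestamp matches at most one of them.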
