R data.table join/subsetting/match by group and by a condition


Question

I am trying to subset/match data by group from two data.tables and cannot figure out how to do this in R. I have the following data.table, which has a City_ID and a timestamp (column name = Time).

library(data.table)
timetable <- data.table(City_ID=c("12","9"),
                        Time=c("12-29-2013-22:05:03","12-29-2013-11:59:00")) 

I have a second data.table with several observations per city and timestamp (plus additional data). The table looks like this:

DT = data.table(City_ID =c("12","12","12","9","9","9"),
                Time= c("12-29-2013-13:05:13","12-29-2013-22:05:03",
                        "12-28-2013-13:05:13","12-29-2013-11:59:00",
                        "01-30-2013-10:05:03","12-28-2013-13:05:13"), 
                Other=1:6)

Now I need to find, for each city, the observations in DT whose Time is >= the Time for that city in the other data.table, "timetable" (which is basically the match table). Only those records should be kept, including the columns that are not used in the comparison (the column "Other" in this example). The result I want looks like this:

desiredresult = data.table(City_ID=c("12","9"),
                           Time= c("12-29-2013-22:05:03","12-29-2013-11:59:00"),
                           Other=c("2","4"))

I tried the following:

setkey(DT, City_ID, Time)  
setkey(timetable, City_ID)  
failedresult = DT[,Time >= timetable[Time], by=City_ID]  
failedresult2 = DT[,Time >= timetable, by=City_ID]  

BTW: I know it may be better to additionally split date and time, but that would make the example even more complex (and when I tested finding a minimum in the timestamps through data.table, it seemed to work).
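A note on the "seemed to work" observation: with the month-first layout used here, comparing the timestamps as plain strings is lexicographic, so it only agrees with chronological order within a single year. A minimal base R sketch, using two made-up values (not taken from the tables above) and assuming UTC as the time zone:

```r
# Two hypothetical timestamps in the same month-day-year layout as above
earlier <- "12-29-2013-22:05:03"
later   <- "01-30-2014-10:05:03"

# Character comparison is lexicographic, so "01..." sorts before "12..."
string_says_later <- later > earlier

# Parsing to POSIXct compares actual points in time
fmt <- "%m-%d-%Y-%H:%M:%S"
parsed_earlier <- as.POSIXct(earlier, format = fmt, tz = "UTC")
parsed_later   <- as.POSIXct(later,   format = fmt, tz = "UTC")
time_says_later <- parsed_later > parsed_earlier

print(string_says_later)  # FALSE
print(time_says_later)    # TRUE
```

This is why converting the column to a date-time class before comparing is the safer choice.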

Recommended answer

Here's an approach for this task:

# 1) transform string to POSIXct object
DT[ , Time := as.POSIXct(strptime(Time, "%m-%d-%Y-%X"))]
timetable[ , Time := as.POSIXct(strptime(Time, "%m-%d-%Y-%X"))]

# 2) set key
setkey(DT, City_ID)
setkey(timetable, City_ID)

# 3) join tables
DT2 <- DT[timetable]

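# 4) extract rows and columns
#    (current data.table versions name the clashing column from timetable
#     i.Time; older versions called it Time.1)
DT2[Time >= i.Time, .SD, .SDcols = names(DT)]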

#    City_ID                Time Other
# 1:      12 2013-12-29 22:05:03     2
# 2:       9 2013-12-29 11:59:00     4
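As an alternative to joining first and filtering afterwards, the condition can be folded into the join itself with a non-equi join. This is a sketch rather than part of the original answer. It assumes a reasonably recent data.table (as far as I know, non-equi conditions in `on=` arrived in 1.9.8 and `nomatch = NULL` somewhat later), and the `fmt` variable and the UTC time zone are choices made here for illustration.

```r
library(data.table)

# Same example data as in the question
timetable <- data.table(City_ID = c("12", "9"),
                        Time = c("12-29-2013-22:05:03", "12-29-2013-11:59:00"))
DT <- data.table(City_ID = c("12", "12", "12", "9", "9", "9"),
                 Time = c("12-29-2013-13:05:13", "12-29-2013-22:05:03",
                          "12-28-2013-13:05:13", "12-29-2013-11:59:00",
                          "01-30-2013-10:05:03", "12-28-2013-13:05:13"),
                 Other = 1:6)

# Parse the strings to POSIXct so that >= compares points in time
fmt <- "%m-%d-%Y-%H:%M:%S"
DT[, Time := as.POSIXct(Time, format = fmt, tz = "UTC")]
timetable[, Time := as.POSIXct(Time, format = fmt, tz = "UTC")]

# Non-equi join: match on City_ID and keep DT rows whose Time is >= the
# Time in timetable. x.Time requests DT's own timestamp; a plain Time in
# the join result would carry the value from timetable instead.
result <- DT[timetable,
             on = .(City_ID, Time >= Time),
             .(City_ID, Time = x.Time, Other),
             nomatch = NULL]
print(result)
```

No keys are needed in this form because `on=` names the join columns directly, and `nomatch = NULL` drops cities in timetable that have no qualifying row in DT.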
