Remove a range in data.table


Question

I am trying to exclude some rows from a data.table based on day and month, for example summer holidays that always begin on 15 June and end on the 15th of the following month. I can extract those days from the Date column, but as.Date is awfully slow, so I keep separate integer columns for Month and Day and want to use only those.

It is easy to select those days with:

DT[Month==6][Day>=15]
DT[Month==7][Day<=15]

Is there a way to take the "difference" of the two data.tables (the original one and the rows I selected)? (Why not a plain subset? Maybe I am missing something simple, but I don't want to exclude days like 10/6 or 31/7.)

I know of:

setkey(DT, Month, Day)
DT[-DT[J(Month,Day), which= TRUE]]

Can anyone help me solve this in a more general way?

Recommended answer

Great question. I've edited the question title to match the question.

A simple approach that avoids as.Date and reads nicely:

DT[!(Month*100L+Day) %between% c(0615L,0715L)]
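
To see the one-liner at work, here is a minimal sketch on toy data. The table contents are an assumption made up for illustration (one row per day across June and July); only the Month and Day column names come from the question.

```r
library(data.table)

# Toy data (assumed): one row per day from 1 June to 31 July
DT <- data.table(
  Month = c(rep(6L, 30), rep(7L, 31)),
  Day   = c(1:30, 1:31)
)

# Month*100L + Day encodes a date as one integer, e.g. 15 June -> 615,
# so the holiday is the closed integer interval [615, 715]
kept <- DT[!(Month * 100L + Day) %between% c(615L, 715L)]

# 10 June and 31 July are kept; 15 June through 15 July are dropped
kept[Month == 6L, range(Day)]   # 1 14
kept[Month == 7L, range(Day)]   # 16 31
```

The encoding works because, within a single year, mmdd values sort in calendar order, so a date range becomes an ordinary integer range.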

That's probably fast enough in many cases. If you have a lot of different ranges, then you may want to step up a gear:

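DT[,mmdd:=Month*100L+Day]
setkey(DT,mmdd)    # the J() lookups below need DT keyed (sorted) by mmdd
from = DT[J(615L),mult="first",which=TRUE]    # first row of 15 June
to = DT[J(715L),mult="last",which=TRUE]    # last row of 15 July, so duplicates of that day go too
DT[-(from:to)]    # drop every row in between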

That's a bit long and error prone because it's DIY. So one idea is that a list column in an i table would represent a range query (FR#203, like a binary search %between%). Then a not-join (also not yet implemented, FR#1384) could be combined with the list column range query to do exactly what you asked:

setkey(DT,mmdd)
DT[-J(list(0615,0715))]

That would extend to multiple different ranges, or the same range for many different ids, in the usual way; i.e., more rows added to i.
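
The "one row per range" idea can be sketched with tools that exist today, without relying on the syntax proposed above. This example assumes an installed data.table version that provides inrange(); the ranges table, its from/to column names, and the toy calendar are all hypothetical and chosen only for illustration.

```r
library(data.table)

# Toy data (assumed): one row per day of a non-leap year
days_in_month <- c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L, 30L, 31L, 30L, 31L)
DT <- data.table(
  Month = rep(1:12, times = days_in_month),
  Day   = sequence(days_in_month)
)
DT[, mmdd := Month * 100L + Day]

# Hypothetical holiday ranges, one row per range, bounds inclusive
ranges <- data.table(
  from = c(615L, 1220L),
  to   = c(715L, 1231L)
)

# inrange() is TRUE where mmdd falls inside any of the ranges;
# negating it keeps only the rows outside all of them
kept <- DT[!inrange(mmdd, ranges$from, ranges$to)]
```

Adding another holiday is then a matter of adding a row to ranges. A range that wraps around the year end (say 20 December to 5 January) would have to be split into two rows, since mmdd values only sort in calendar order within one year.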
