Locking the contents of data.table tables


Problem description

Is it possible to make a data.table static (i.e. non-updatable)? The lockBinding() function prevents the variable from being reassigned, but the columns of the data.table can still be edited by reference. Example:

> dt = data.table( x = 1:5 )
> lockBinding( "dt", env = environment() )
> dt = 1
Error: cannot change value of locked binding for 'dt'
> dt[ , x := 1 ]
> dt[ , x ]
[1] 1 1 1 1 1

I guess this is related to how data.tables are referenced (:= updates columns in place rather than rebinding the variable). Still, it would be useful to be able to lock the contents of the data.table as well. (I often have shared reference tables that I don't want to update by accident.)

Recommended answer

This is kind of tricky. One way to do it is to hijack the [ function to disallow the use of := on the object. To mark a data.table as bound, we can add a class to it, like so:

boundDT <- function(dt){
  class(dt) <- c("bound.data.table", class(dt))
  dt
}

Result:

library(data.table)
dt = data.table( x = 1:5 )
bound <- boundDT(dt)
class(bound)
[1] "bound.data.table" "data.table"       "data.frame"   

If we then define a new indexing method for the bound.data.table class, we can do our thing:

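`[.bound.data.table` <- function(dt, ...){
  # Look at the unevaluated arguments (everything after the function name and dt)
  # and refuse to continue if any of them is a call to `:=`
  if(any(unlist(sapply(match.call()[-(1:2)], function(x) if(length(x) > 1)as.character(x[1]) == ":=")))){
    stop("Can't use `:=` on this object.")
  }
  # Drop the bound class from the local copy so the regular data.table method runs
  class(dt) <- class(dt)[-1]
  dt[...]
}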

This checks whether := is used in the call and throws an error if it is. Otherwise it removes the bound class from the function's local copy of the data.table and calls the regular [ function.

bound[, x := 1]
 Error in `[.bound.data.table`(bound, , `:=`(x, 1)) : 
  Can't use `:=` on this object. 
bound[, x]
[1] 1 2 3 4 5
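
To make the := check easier to follow, here is a minimal base R sketch of the same idea applied to quoted expressions. The helper name uses_walrus() is hypothetical and not part of data.table; it only illustrates how the head of an unevaluated call can be compared against :=.

```r
# Hypothetical helper: TRUE if the expression is a call whose function is `:=`
uses_walrus <- function(expr) {
  is.call(expr) && identical(expr[[1]], as.name(":="))
}

uses_walrus(quote(x := 1))      # a `:=` call written in infix form
uses_walrus(quote(`:=`(x, 1)))  # the same call written in prefix form
uses_walrus(quote(x + 1))       # a call, but to `+`
uses_walrus(quote(x))           # a bare symbol, not a call
```

The first two calls should return TRUE and the last two FALSE. Note that this only looks at the top level of each argument, as the method above does, so a := nested inside another call would not be detected.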

One caveat:

When := is used in a join, this does not help if the bound table is not the base table, i.e. if it is passed as the i argument of another table's [ call:

dt = data.table( x = 1:5 , y = 5:1)
bound <- boundDT(dt)
dt[bound, y := 1, on = .(x = x)]
bound
   x y
1: 1 1
2: 2 1
3: 3 1
4: 4 1
5: 5 1

However, with the bound table as the base table, the error is raised:

bound[dt, y := 1, on = .(x = x)]
 Error in `[.bound.data.table`(bound, dt, `:=`(y, 1), on = .(x = x)) : 
  Can't use `:=` on this object.
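
A likely reason for the first result is that boundDT() only adds a class, so bound and dt probably still share the same column vectors, and an update by reference to dt shows up in bound. A possible workaround, sketched below under that assumption, is to take a deep copy with data.table's copy() before adding the class. The name boundDT_copy() is hypothetical, and the price is the memory for a second copy of the table.

```r
library(data.table)

# Hypothetical variant of boundDT(): deep-copy first so no columns are shared
boundDT_copy <- function(dt) {
  out <- copy(dt)
  class(out) <- c("bound.data.table", class(out))
  out
}

dt <- data.table(x = 1:5, y = 5:1)
bound <- boundDT_copy(dt)

# Update dt by reference in a join where bound is only the i table
dt[bound, y := 1L, on = "x"]

dt$y     # updated
bound$y  # unaffected, because bound owns its own columns
```

This does not stop := from being applied to bound itself; it only keeps updates to the original table from leaking into the bound one.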


Preventing the use of set*

With most issues around the := operator out of the way, we can focus on preventing the use of set* on our object.

When the bound data.table is accessed, we can check the call stack to see whether any set* function is in progress before handing over the data.table.

bindDT <- function(dt){
  bound <- boundDT(dt)
  function(){
    calls <- sys.calls()
    forbidden <- c("set", "set2key", "set2keyv", "setattr", "setcolorder", "setdiff", "setDT", 
                   "setDTthreads", "setequal", "setindex", "setindexv", "setkey", "setkeyv", 
                   "setnames", "setNumericRounding", "setorder", "setorderv")
    matches <- unlist(lapply(calls, function(x) as.character(x)[1] %in% forbidden))
    if(any(matches)){
      stop(paste0("Can't use function ", calls[[which(matches)[1]]][1], " on bound data.table."))
    }

    bound
  }
}

This function binds the data.table like before, but instead of returning it, it returns a function. When called, this function checks the call stack for set* functions and throws an error if it finds any. I got this list from the data.table help page, so it should be complete.

You can use an active binding, here via pryr, to avoid having to call the data.table as a function on each use:

library(data.table)
library(pryr)

dt = data.table( x = 1:5 , y = 5:1)
bound %<a-% (bindDT(dt))()

setkey(bound, x)
Error in (bindDT(dt))() : Can't use function setkey on bound data.table.
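
If you would rather not depend on pryr, base R's makeActiveBinding() offers the same mechanism. The sketch below is only an illustration with hypothetical names (lookup, ref, n_reads) and a plain data.frame: it shows that the bound function runs on every access, which is what lets a check like the one in bindDT() fire each time the table is used.

```r
env <- new.env()
lookup <- data.frame(x = 1:5, y = 5:1)
n_reads <- 0L

# The function is re-run every time `ref` is read from `env`
makeActiveBinding("ref", function() {
  n_reads <<- n_reads + 1L  # count accesses, in place of a real check
  lookup
}, env)

first <- env$ref
second <- env$ref
n_reads  # number of times the binding was read
```

After the two reads above, n_reads should be 2. In the data.table setting, the function returned by bindDT(dt) could be passed to makeActiveBinding() in the same way.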
