Evaluating function arguments to pass to data.table

Question

I have this piece of code that I'd like to wrap in a function

indata <- data.frame(id = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L), 
                     fid = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L), 
                     mid = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))
library(data.table)
DT <- as.data.table(indata)

DT[, msib:=.(list(id)), by = mid][                                              
   ,msibs := mapply(setdiff, msib, id)][
   ,fsib  := .(list(id)), by = fid][
   ,fsibs := mapply(setdiff, fsib, id)][
   ,siblist  := mapply(union, msibs, fsibs)][
   ,c("msib","msibs", "fsib", "fsibs") := NULL] 

So far so good. Works as desired. Now it should be wrapped in a function, where I can pass alternative variable names (without quoting if possible), and here's my first try.

f <- function(DT, id, fid, mid) {

    DT[, msib:=.(list(id)), by = mid][                                              
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib  := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist  := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL] 
}

I know this isn't working, but let's look at the error it throws

indata2 <- indata
names(indata2) <- c("A", "B", "C")  # Give new names
DT2 <- as.data.table(indata2)
f(DT2, A, B, C)

Error in as.vector(x, "list") : cannot coerce type 'closure' to vector of type 'list'

That makes sense. Now to make sure that the promises are evaluated correctly I tried this

f <- function(DT, id, fid, mid) {
    mid <- deparse(substitute(mid))
    id <- deparse(substitute(id))
    fid <- deparse(substitute(fid))

    DT[, msib:=.(list(id)), by = mid][                                              
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib  := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist  := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL] 
}

That doesn't throw an error but also does not work. The output looks like this

f(DT2, A, B, C)
    A  B  C siblist
1:  1 NA  0        
2:  2  9 NA        
3:  3  1  2        
4:  4  1  2        
5: 12  7  6        
6: 13  5  6        
7: 14  5  6        
8: 15  5  8   

The siblist column is empty, which it shouldn't be, and it isn't empty when I run the code manually. I also tried this version (converting the arguments to character strings) to see if that worked:

f <- function(DT, id, fid, mid){
    mid <- as.character(substitute(mid))
    id <- as.character(substitute(id))
    fid <- as.character(substitute(fid))
    DT[, msib:=.(list(id)), by = mid][ # Siblings through the mother
       ,msibs := mapply(setdiff, msib, id)][
       ,fsib  := .(list(id)), by = fid][
       ,fsibs := mapply(setdiff, fsib, id)][
       ,siblist  := mapply(union, msibs, fsibs)][
       ,c("msib","msibs", "fsib", "fsibs") := NULL] # Removed unused
}

But that doesn't work either: same output as above. I think it may be because the promises in the j part of the data.table are evaluated in the wrong environment, but I am not sure. How can I fix my function?
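
A likely reason both attempts return an empty column can be reproduced in base R alone. The sketch below assumes that, because DT2 has no column called id, the id seen inside j is the function's local variable, which by then holds the string "A" rather than the values of column A. capture_name is a hypothetical helper that only mirrors the first lines of the attempted f.

```r
# Hypothetical helper mirroring `id <- deparse(substitute(id))` in the attempt
capture_name <- function(arg) deparse(substitute(arg))

id <- capture_name(A)               # the string "A", not the values of column A
msib <- list(id)                    # what `.(list(id))` would then store per row
leftover <- setdiff(msib[[1]], id)  # removing "A" from "A" leaves nothing

print(id)
print(length(leftover))
```

For a bare symbol, as.character(substitute(arg)) yields the same string, which would explain why the second variant behaves identically.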

Recommended answer

If you expect an object to have a certain structure or hold certain data, then defining a class can really help. And with S3, it's simple.

as.relationship <- function(DT, id, fid, mid) {
  out <- DT[, c(id, fid, mid), with = FALSE]
  setnames(out, c("id", "fid", "mid"))
  setattr(out, "class", c("relationship", class(out)))
  out
}

Then you can write a function to work on that class with the safety of knowing where everything is.

f <- function(DT, id, fid, mid) {
  relatives <- as.relationship(DT, id, fid, mid)
  relatives[
    relatives,
    on = "fid",
    allow.cartesian = TRUE
  ][
    relatives,
    on = "mid",
    allow.cartesian = TRUE
  ][
    ,
    {
      siblings    <- union(i.id, i.id.1)
      except_self <- setdiff(siblings, .BY[["id"]])
      list(siblist = list(except_self))
    },
    by = "id"
  ]
}

This function takes the column names as strings. So you'd call it like this:

f(DT, "id", "fid", "mid")
#    id  siblist
# 1:  1         
# 2:  2         
# 3:  3        4
# 4:  4        3
# 5: 12    13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15    13,14

setnames(DT, c("A", "B", "C"))
f(DT, "A", "B", "C")
#    id  siblist
# 1:  1         
# 2:  2         
# 3:  3        4
# 4:  4        3
# 5: 12    13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15    13,14

If you're worried about performance, don't be. If you create a data.table from entire columns of another data.table, they're smart enough not to actually copy the data. They share it. So there's no real performance penalty to making another object.
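
If unquoted column names are still wanted, as the question asked, one possible alternative is to capture the names with substitute() and then look the columns up explicitly with get() and by = c(...). This is only a sketch under stated assumptions, not the answer's method: siblings, id_col, fid_col and mid_col are hypothetical names, and it assumes DT has no columns called id_col, fid_col, mid_col, msib or fsib.

```r
library(data.table)

# Hypothetical helper: takes unquoted column names, e.g. siblings(DT2, A, B, C)
siblings <- function(DT, id, fid, mid) {
  # Turn the unquoted arguments into strings, e.g. A -> "A"
  id_col  <- deparse(substitute(id))
  fid_col <- deparse(substitute(fid))
  mid_col <- deparse(substitute(mid))

  out <- copy(DT)  # work on a copy so the caller's table is left untouched

  # get() fetches the column whose name is stored in the string;
  # by = c(...) tells data.table the value is a vector of column names
  out[, msib := list(list(get(id_col))), by = c(mid_col)]
  out[, fsib := list(list(get(id_col))), by = c(fid_col)]

  # Per row: everyone sharing a mother or a father, minus the row's own id
  out[, siblist := list(Map(
    function(m, f, self) setdiff(union(m, f), self),
    msib, fsib, get(id_col)
  ))]

  out[, c("msib", "fsib") := NULL]
  out[]
}

DT2 <- data.table(A = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L),
                  B = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L),
                  C = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))
res <- siblings(DT2, A, B, C)
print(res)
```

The copy() call is a deliberate choice so that := does not modify the input by reference; it can be dropped if updating the caller's table in place is intended.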
