Evaluating function arguments to pass to data.table
Question
I have this piece of code that I'd like to wrap in a function:
indata <- data.frame(id = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L),
fid = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L),
mid = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))
library(data.table)
DT <- as.data.table(indata)
DT[, msib:=.(list(id)), by = mid][
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
So far so good; it works as desired. Now I want to wrap it in a function to which I can pass alternative variable names (unquoted, if possible). Here's my first try:
f <- function(DT, id, fid, mid) {
DT[, msib:=.(list(id)), by = mid][
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
}
I know this doesn't work, but let's look at the error it throws:
indata2 <- indata
names(indata2) <- c("A", "B", "C") # Give new names
DT2 <- as.data.table(indata2)
f(DT2, A, B, C)
Error in as.vector(x, "list") : cannot coerce type 'closure' to vector of type 'list'
That makes sense. To make sure the promises are evaluated correctly, I then tried this:
f <- function(DT, id, fid, mid) {
mid <- deparse(substitute(mid))
id <- deparse(substitute(id))
fid <- deparse(substitute(fid))
DT[, msib:=.(list(id)), by = mid][
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL]
}
That doesn't throw an error, but it doesn't work either. The output looks like this:
f(DT2, A, B, C)
A B C siblist
1: 1 NA 0
2: 2 9 NA
3: 3 1 2
4: 4 1 2
5: 12 7 6
6: 13 5 6
7: 14 5 6
8: 15 5 8
The siblist column is empty, which it shouldn't be (and isn't when I run the code manually). I also tried this version, converting the arguments to character strings, to see if that worked:
f <- function(DT, id, fid, mid){
mid <- as.character(substitute(mid))
id <- as.character(substitute(id))
fid <- as.character(substitute(fid))
DT[, msib:=.(list(id)), by = mid][ # Siblings through the mother
,msibs := mapply(setdiff, msib, id)][
,fsib := .(list(id)), by = fid][
,fsibs := mapply(setdiff, fsib, id)][
,siblist := mapply(union, msibs, fsibs)][
,c("msib","msibs", "fsib", "fsibs") := NULL] # Removed unused
}
But that doesn't work either; the output is the same as above. I think it may be because the promises in the j part of the data.table call are evaluated in the wrong environment, but I'm not sure. How can I fix my function?
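A minimal sketch of what seems to be happening in the last two attempts, assuming the columns are named A, B and C as in DT2. After deparse(substitute(id)) or as.character(substitute(id)), the local variable id holds the column name as a string. Since DT2 has no column called id, the j expression picks up that string instead of the column's values, and the same steps can be reproduced without data.table:

```r
# `id` now holds the column *name*, not the column's values
id <- "A"

msib  <- list(id)                    # list("A"), not the ids in DT2$A
msibs <- mapply(setdiff, msib, id)   # setdiff("A", "A") has no elements

lengths(msibs)                       # 0
```

Every row therefore ends up with an empty set of siblings, which matches the empty siblist column.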
Recommended answer
If you expect an object to have a certain structure or hold certain data, then defining a class can really help. And with S3, it's simple.
as.relationship <- function(DT, id, fid, mid) {
out <- DT[, c(id, fid, mid), with = FALSE]
setnames(out, c("id", "fid", "mid"))
setattr(out, "class", c("relationship", class(out)))
out
}
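Because an S3 class is just an attribute, downstream code can check for it before relying on the column layout. The guard below is a hypothetical helper (is.relationship is not part of data.table or of this answer), sketched on a small made-up table:

```r
library(data.table)

# Hypothetical guard: TRUE only for objects tagged "relationship"
# that also carry the three expected columns
is.relationship <- function(x) {
  inherits(x, "relationship") && all(c("id", "fid", "mid") %in% names(x))
}

# Made-up example data, tagged the same way as.relationship() tags its result
rel <- data.table(id = 1:3, fid = c(NA, 1L, 1L), mid = c(NA, 2L, 2L))
setattr(rel, "class", c("relationship", class(rel)))

is.relationship(rel)                 # TRUE
is.relationship(data.table(a = 1))   # FALSE
```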
Then you can write a function to work on that class with the safety of knowing where everything is.
f <- function(DT, id, fid, mid) {
relatives <- as.relationship(DT, id, fid, mid)
relatives[
relatives,
on = "fid",
allow.cartesian = TRUE
][
relatives,
on = "mid",
allow.cartesian = TRUE
][
,
{
siblings <- union(i.id, i.id.1)
except_self <- setdiff(siblings, .BY[["id"]])
list(siblist = list(except_self))
},
by = "id"
]
}
This function takes the column names as strings, so you'd call it like this:
f(DT, "id", "fid", "mid")
# id siblist
# 1: 1
# 2: 2
# 3: 3 4
# 4: 4 3
# 5: 12 13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15 13,14
setnames(DT, c("A", "B", "C"))
f(DT, "A", "B", "C")
# id siblist
# 1: 1
# 2: 2
# 3: 3 4
# 4: 4 3
# 5: 12 13,14
# 6: 13 14,15,12
# 7: 14 13,15,12
# 8: 15 13,14
If you're worried about performance, don't be. If you create a data.table from entire columns of another data.table, they're smart enough not to actually copy the data. They share it. So there's no real performance penalty to making another object.
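If unquoted column names are still wanted, the same idea of moving to fixed names first can be combined with substitute(). The following is only a sketch: sibling_list is a hypothetical name, it reuses the pipeline from the question rather than the joins above, and it adds SIMPLIFY = FALSE so that mapply always returns a list.

```r
library(data.table)

# Hypothetical helper; accepts column names unquoted (A) or quoted ("A")
sibling_list <- function(DT, id, fid, mid) {
  # as.character(substitute(x)) turns both A and "A" into the string "A"
  cols <- c(as.character(substitute(id)),
            as.character(substitute(fid)),
            as.character(substitute(mid)))
  stopifnot(length(cols) == 3L, all(cols %in% names(DT)))

  # Fresh table with fixed column names, so no further name lookup is needed
  rel <- data.table(id  = DT[[cols[1L]]],
                    fid = DT[[cols[2L]]],
                    mid = DT[[cols[3L]]])

  rel[, msib := .(list(id)), by = mid][
    , msibs := mapply(setdiff, msib, id, SIMPLIFY = FALSE)][
    , fsib := .(list(id)), by = fid][
    , fsibs := mapply(setdiff, fsib, id, SIMPLIFY = FALSE)][
    , siblist := mapply(union, msibs, fsibs, SIMPLIFY = FALSE)]

  rel[, .(id, siblist)]
}

DT2 <- data.table(A = c(1L, 2L, 3L, 4L, 12L, 13L, 14L, 15L),
                  B = c(NA, 9L, 1L, 1L, 7L, 5L, 5L, 5L),
                  C = c(0L, NA, 2L, 2L, 6L, 6L, 6L, 8L))

res <- sibling_list(DT2, A, B, C)
res$siblist[[3]]         # 4
sort(res$siblist[[6]])   # 12 14 15
```

As in the function above, the result always names its first column id, whatever the input column was called, and the input table is left unmodified.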