Get a user-defined function to work in data.table
I would like to know how to use a user-defined function with data.table.
I wrote the following code using data.table to calculate the percentage of 'b' responses among all valid responses ('a' or 'b') by two groups, grp1 and grp2:
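The data (lengths are given explicitly so that no column is recycled with a remainder, which warns or errors depending on the data.table version):
library(data.table)
# 100 rows: grp1 cycles over 4 values, grp2 over 3, Q1 over 5
dt = data.table(grp1 = rep(c("I", "II", "III", "IV"), length.out = 100),
                grp2 = rep(c("A", "B", "C"), length.out = 100),
                Q1 = rep(c("a", "a", "b", "b", "b"), 20))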
The code to calculate % respondents:
dt[, sum(Q1 %in% "b")/sum(!is.na(Q1))*100, by = grp1:grp2][order(grp1, grp2)]
This produces what I need (thanks to @Frank for the help in "Calculate % respondents by more than one group for a survey data"):
grp1 grp2 V1
1: I A 55.55556
2: I B 62.50000
3: I C 62.50000
4: II A 62.50000
5: II B 55.55556
6: II C 62.50000
7: III A 50.00000
8: III B 62.50000
9: III C 66.66667
10: IV A 66.66667
11: IV B 62.50000
12: IV C 50.00000
What I would like to do is create a function and use it to calculate the equivalent set of values for 50 other items. I wrote the following function, hoping to cut down on the repetition:
test = function(question, groupA, groupB){
dt[, sum(get(question) %in% "b")/sum(!is.na(get(question)))*100, by = eval((c(groupA, groupB)))][order(groupA, groupB)]
}
test(question = "Q1", groupA = "grp1", groupB ="grp2")
However, this returns only the top row:
grp1 grp2 V1
1: I A 55.55556
I've read other questions on Stack Overflow (e.g. "Using data.table i and j arguments in functions") and tried other code, but I haven't been able to find a way to get it to work.
I'm new to R and would very much appreciate any feedback you may have.
The issue is in the way you specify the by argument. Also, we can use keyby instead of by to do the sorting in one step:
test = function(question, groupA, groupB){
dt[, sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100,
keyby = c(groupA, groupB)]
}
ans = test(question = "Q1", groupA = "grp1", groupB ="grp2")
# grp1 grp2 V1
# 1: I A 55.55556
# 2: I B 62.50000
# 3: I C 62.50000
# 4: II A 62.50000
# 5: II B 55.55556
# 6: II C 62.50000
# 7: III A 50.00000
# 8: III B 62.50000
# 9: III C 66.66667
# 10: IV A 66.66667
# 11: IV B 62.50000
# 12: IV C 50.00000
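One detail worth spelling out about the original function: inside it, order(groupA, groupB) is handed the two strings "grp1" and "grp2" rather than the columns they name, so it returns a single index and the chained subset keeps only one row. The sketch below shows this, and then applies a keyby-based helper to several question columns. It is a minimal sketch under assumptions: pct_b, the Q2 column, and passing the table in as the data argument are hypothetical choices made for illustration, not part of the original answer.

```r
library(data.table)

# Same 100-row layout as in the question, with explicit lengths so that
# no column is recycled with a remainder.
dt <- data.table(
  grp1 = rep(c("I", "II", "III", "IV"), length.out = 100),
  grp2 = rep(c("A", "B", "C"), length.out = 100),
  Q1   = rep(c("a", "a", "b", "b", "b"), 20)
)
# Hypothetical second item, present only to demonstrate the loop below.
dt[, Q2 := rev(Q1)]

# order() on two length-1 strings sorts the strings, not the columns:
order("grp1", "grp2")
# [1] 1

# Hypothetical helper: same calculation as in the answer, but the table is
# an argument instead of a global, and the result column is named pct.
pct_b <- function(data, question, groupA, groupB) {
  data[, .(pct = sum(get(question) %in% "b") / sum(!is.na(get(question))) * 100),
       keyby = c(groupA, groupB)]
}

# Apply the helper to each question column and stack the results;
# idcol records which question each block of rows came from.
questions <- c("Q1", "Q2")
res <- rbindlist(
  lapply(setNames(questions, questions),
         function(q) pct_b(dt, q, "grp1", "grp2")),
  idcol = "question"
)
```

With 50 items, questions could instead hold the names of all item columns; each item then contributes one block of 12 rows to res.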