ifelse assignment in data.table
Question
I am a teacher and would like to use the data.table package in R correctly to grade student answers in a log file automatically, i.e. add a column called correct that is 1 if the student's answer to a particular question is a correct answer to that question, and 0 otherwise. I can do this easily if each question has only one answer, but I get tripped up when a question has multiple possible answers (questions and their possible correct answers are stored in another table).
Below is an MWE:
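library(data.table)

set.seed(123)
# One row per accepted answer; a question id may appear more than once
question_table <- data.table(id = c(1, 1, 2, 2, 3, 4),
                             correct_ans = sample(1:4, 6, replace = TRUE))
# One row per answer a student submitted
log <- data.table(student = sample(letters[1:3], 10, replace = TRUE),
                  question_id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                  student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))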
My question is: what is the correct data.table way to use ifelse in j, especially if we depend on another table?
log[,correct:=ifelse(student_answer %in%
question_table[log$question_id %in% id]$correct_ans,1,0)]
As can be seen below, questions 1 and 2 both have multiple possible correct answers.
> question_table
   id correct_ans
1:  1           2
2:  1           4
3:  2           2
4:  2           4
5:  3           4
6:  4           1
While the correct column is calculated without errors, something isn't right: e.g. when student b answers a question, he is awarded a correct score even though he answered incorrectly. Only some entries of the correct column are off, which leads me to believe there is something I am not getting about how variables are scoped.
> log
    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       1 <- ?
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       1 <- ?
10:       c           4              1       1
I considered making a helper column with the correct answer in the log table by joining with question_table, but that does not work since the key is not unique in the latter.
Any and all help would be appreciated. Thanks in advance.
Recommended answer
You can use a join:
# initialize to zero
log[, correct := 0L ]
# update to 1 if matched
log[question_table, on=c(question_id = "id", student_answer = "correct_ans"),
correct := 1L ]
    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       0
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       0
10:       c           4              1       1
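For readers who want to reproduce this, here is a minimal self-contained sketch. It types the two tables out from the printed output instead of calling sample(), because the default behaviour of sample() changed in R 3.6.0 and set.seed(123) may therefore give different tables on newer R versions. The pooled column is a hypothetical name introduced only for illustration: it tests each answer against the set of all correct answers regardless of question, which appears to be what the attempt in the question effectively computes, since it gives the same values as the correct column shown there.

```r
library(data.table)

# Tables typed out from the printed output above (not generated by sample())
question_table <- data.table(id = c(1, 1, 2, 2, 3, 4),
                             correct_ans = c(2, 4, 2, 4, 4, 1))
log <- data.table(student = c("b", "c", "b", "b", "c", "b", "c", "b", "a", "c"),
                  question_id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                  student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))

# Pooled check (hypothetical column): is the answer correct for ANY question?
log[, pooled := as.integer(student_answer %in% question_table$correct_ans)]

# Update join: a row counts as correct only if question AND answer match together
log[, correct := 0L]
log[question_table,
    on = c(question_id = "id", student_answer = "correct_ans"),
    correct := 1L]

# Rows where the two approaches disagree
log[pooled != correct]
```

The last line should return the two rows flagged with `<- ?` in the question: the answers given there are correct for some other question, but not for the question that was actually asked.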
How it works. The syntax for an update join is X[Y, on=cols, xvar := z]:
- If column names differ between X and Y, use on=c(xcol = "ycol", xcol2 = "ycol2") or, in version 1.9.7+, on=.(xcol = ycol, xcol2 = ycol2).
- xvar := z only operates on the rows of X that are matched. Sometimes it is also useful to use by=.EACHI here, depending on how many rows of X are matched by each row in Y and how complicated the expression for z is.
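The two points above can be sketched on the same data. This is an illustration under assumptions rather than part of the original answer: the tables are typed out by hand from the printed output, and hits is a hypothetical name. The first join repeats the update using the .() form of on=. The second uses by=.EACHI without := to show what "each row in Y" means, by counting how many rows of log each row of question_table matches.

```r
library(data.table)

question_table <- data.table(id = c(1, 1, 2, 2, 3, 4),
                             correct_ans = c(2, 4, 2, 4, 4, 1))
log <- data.table(question_id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                  student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))

# Same update join as above, written with the .() form of on=
log[, correct := 0L]
log[question_table,
    on = .(question_id = id, student_answer = correct_ans),
    correct := 1L]

# by = .EACHI evaluates j once per row of question_table (the i table);
# .N is then the number of log rows matched by that row
hits <- log[question_table,
            on = .(question_id = id, student_answer = correct_ans),
            .N, by = .EACHI]
hits
```

Here every accepted answer is matched by exactly one row of log, so grouping by .EACHI makes no difference to the update itself. It starts to matter when z has to be computed separately for each row of Y.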
See ?data.table for full documentation on the syntax.