ifelse assignment in data.table

Question

I am a teacher and would like to use the data.table package in R to automatically grade student answers in a log file, i.e. add a column called correct that is 1 if the student's answer to a particular question is a correct answer to that question, and 0 otherwise. I can do this easily if each question has only one answer, but I am getting tripped up when a question has multiple possible answers (questions and their possible correct answers are stored in another table).

Here is an MWE:

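library(data.table)

# Note: the printed tables below match the pre-3.6.0 R sampler; newer R
# versions draw different values from the same seed.
set.seed(123)
question_table <- data.table(id = c(1,1,2,2,3,4), correct_ans = sample(1:4, 6, replace = TRUE))
log <- data.table(student = sample(letters[1:3], 10, replace = TRUE),
                  question_id = c(1,1,1,2,2,2,3,3,4,4),
                  student_answer = c(2,4,1,3,2,4,4,5,2,1))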

My question is: what is the correct data.table way to use ifelse in j, especially when the condition depends on another table?

log[,correct:=ifelse(student_answer %in% 
                          question_table[log$question_id %in% id]$correct_ans,1,0)]

As can be seen below, questions 1 and 2 both have multiple possible correct answers.

> question_table
   id correct_ans
1:  1           2
2:  1           4
3:  2           2
4:  2           4
5:  3           4
6:  4           1

While the correct column is computed without errors, something isn't right: e.g. student b answers question 1 with 1 (row 3) and is awarded a correct score, even though the answer is wrong. Only some entries of the correct column are off, which leads me to believe there is something I am not getting about how variables are scoped.

> log
    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       1   <- ?
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       1   <- ?
10:       c           4              1       1

I considered making a helper column with the correct answer in the log table by joining with question_table, but that does not work since the key is not unique in the latter.

Any and all help would be appreciated. Thanks in advance.

Recommended answer

You can use an update join:

# initialize to zero
log[, correct := 0L ]

# update to 1 if matched
log[question_table, on=c(question_id = "id", student_answer = "correct_ans"),
   correct := 1L ] 

    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       0
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       0
10:       c           4              1       1

How it works. The syntax for an update join is X[Y, on=cols, xvar := z]:

  • If column names differ between X and Y, use on=c(xcol = "ycol", xcol2 = "ycol2") or, in version 1.9.7+, on=.(xcol = ycol, xcol2 = ycol2).
  • xvar := z only operates on the rows of X that are matched. Sometimes it is also useful to add by=.EACHI here, depending on how many rows of X are matched by each row of Y and how complicated the expression for z is.
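As a rough illustration of the two points above, here is a small sketch on made-up data. The tables answers and answer_key, and the columns points, score and score2, are hypothetical and not part of the question. It assumes a data.table version recent enough to accept the on=.() form, and uses the i. prefix to refer to a column of the table in the i position.

```r
library(data.table)

# Hypothetical tables for illustration only
answers <- data.table(student = c("a", "b", "c", "a"),
                      q       = c(1, 1, 2, 2),
                      ans     = c(2, 3, 4, 4))
answer_key <- data.table(id          = c(1, 1, 2),
                         correct_ans = c(2, 4, 4),
                         points      = c(1, 0.5, 2))

# Character-vector form of on=: names are columns of X, values are columns of Y
answers[, score := 0]
answers[answer_key, on = c(q = "id", ans = "correct_ans"), score := i.points]

# List form of on=: same join, written without quotes.
# score2 is created by the join itself, so unmatched rows of X stay NA.
answers[answer_key, on = .(q = id, ans = correct_ans), score2 := i.points]

print(answers)
```

Only rows of answers that find a match in answer_key are touched by :=, which is why score keeps its initial 0 for student b while score2 is left as NA there.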

See ?data.table for full documentation on the syntax.
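As for why the ifelse attempt in the question marks rows 3 and 9 as correct, one plausible reading is that the inner condition log$question_id %in% id is TRUE for every row, so it never restricts the answers to the current question, and the membership test effectively runs against all correct answers pooled together. The sketch below reproduces that effect in base R, with the two tables typed out by hand from the printed output rather than re-sampled; the names pooled_check and per_question_check are introduced here for illustration.

```r
library(data.table)

# Tables copied by hand from the printed output in the question
question_table <- data.table(id          = c(1, 1, 2, 2, 3, 4),
                             correct_ans = c(2, 4, 2, 4, 4, 1))
log <- data.table(student        = c("b", "c", "b", "b", "c", "b", "c", "b", "a", "c"),
                  question_id    = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                  student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))

# Every question_id in log exists in question_table, so this filters nothing
all(log$question_id %in% question_table$id)

# Pooled check: is the answer correct for ANY question?
pooled_check <- as.integer(log$student_answer %in% question_table$correct_ans)

# Per-question check: is the answer correct for THIS row's question?
per_question_check <- mapply(
  function(q, a) as.integer(a %in% question_table$correct_ans[question_table$id == q]),
  log$question_id, log$student_answer
)

# Rows where the two checks disagree
which(pooled_check != per_question_check)
```

The pooled check reproduces the correct column shown in the question, and the rows where it disagrees with the per-question check are the two flagged with "<- ?". The update join above performs the per-question comparison directly, because it matches on question and answer together.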
