ifelse assignment in data.table

Question

I am a teacher and would like to use the data.table package in R to automatically grade student answers in a log file, i.e. add a column called correct that is 1 if the student's answer to a particular question is a correct answer to that question, and 0 otherwise. I can do this easily if each question has only one answer, but I am getting tripped up when a question has multiple possible answers (questions and their possible correct answers are stored in another table).

Here is a MWE:

library(data.table)

set.seed(123)
question_table <- data.table(id = c(1, 1, 2, 2, 3, 4),
                             correct_ans = sample(1:4, 6, replace = TRUE))
log <- data.table(student = sample(letters[1:3], 10, replace = TRUE),
                  question_id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4),
                  student_answer = c(2, 4, 1, 3, 2, 4, 4, 5, 2, 1))

My question is: what is the correct data.table way to use ifelse in j, especially when the result depends on another table?

log[,correct:=ifelse(student_answer %in% 
                          question_table[log$question_id %in% id]$correct_ans,1,0)]

As can be seen below, questions 1 and 2 both have multiple possible correct answers.

> question_table
   id correct_ans
1:  1           2
2:  1           4
3:  2           2
4:  2           4
5:  3           4
6:  4           1

While the correct column is calculated without errors, something isn't right: e.g. when student b answers question 1 with 1 (row 3), he is awarded a correct score, even though he answered incorrectly. Only some entries of the correct column are off, which leads me to believe there is something I am not getting about how variables are scoped.

> log
    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       1   <- ?
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       1   <- ?
10:       c           4              1       1

I considered making a helper column with the correct answer in the log table by joining with question_table, but that does not work since the key is not unique in the latter.

Any and all help would be appreciated. Thanks in advance.

Recommended answer

You can use a join:

# initialize to zero
log[, correct := 0L ]

# update to 1 if matched
log[question_table, on=c(question_id = "id", student_answer = "correct_ans"),
   correct := 1L ] 

    student question_id student_answer correct
 1:       b           1              2       1
 2:       c           1              4       1
 3:       b           1              1       0
 4:       b           2              3       0
 5:       c           2              2       1
 6:       b           2              4       1
 7:       c           3              4       1
 8:       b           3              5       0
 9:       a           4              2       0
10:       c           4              1       1
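A possible explanation for the discrepancy in the question, offered as a sketch rather than a definitive diagnosis: %in% is not evaluated row by row against that row's own question, so an answer that is correct for any question counts as correct everywhere. The example below hard-codes the question_table and log values printed in the question (instead of relying on sample()), and the column name pooled is introduced here purely for illustration.

```r
library(data.table)

# Values copied from the tables printed in the question.
question_table <- data.table(id = c(1L, 1L, 2L, 2L, 3L, 4L),
                             correct_ans = c(2L, 4L, 2L, 4L, 4L, 1L))
log <- data.table(student = c("b", "c", "b", "b", "c", "b", "c", "b", "a", "c"),
                  question_id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 4L, 4L),
                  student_answer = c(2L, 4L, 1L, 3L, 2L, 4L, 4L, 5L, 2L, 1L))

# Pooled check (illustrative): is the answer correct for *any* question?
log[, pooled := as.integer(student_answer %in% question_table$correct_ans)]

# Pairwise check: update join on (question, answer) pairs, as in the answer.
log[, correct := 0L]
log[question_table, on = c(question_id = "id", student_answer = "correct_ans"),
    correct := 1L]

# Rows where the two checks disagree.
mismatch <- log[pooled != correct]
print(mismatch)
```

With this data the two checks disagree on rows 3 and 9, the same two rows flagged in the question.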

How it works. The syntax for an update join is X[Y, on = cols, xvar := z]:

  • If col names differ between X and Y, use on=c(xcol = "ycol", xcol2 = "ycol2") or, in version 1.9.7+, .(xcol = ycol, xcol2 = ycol2).
  • xvar := z will only operate on the rows of X that are matched. Sometimes, it is also useful to use by=.EACHI here, depending on how many rows of X are matched by each in Y and how complicated the expression for z is.
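The two points above can be sketched on a small example. The tables answers and answer_key and their contents are hypothetical, chosen so that one key row matches two answer rows; this is a minimal illustration under those assumptions, not part of the original answer. The first join uses the .() form of on= to map differently named columns, and the second uses by = .EACHI without := simply to inspect how many rows of X each row of Y matches.

```r
library(data.table)

# Hypothetical data: X is `answers`, Y is `answer_key`.
answer_key <- data.table(id = c(1L, 1L, 2L, 2L, 3L, 4L),
                         correct_ans = c(2L, 4L, 2L, 4L, 4L, 1L))
answers <- data.table(question_id = c(1L, 1L, 1L, 2L, 3L),
                      student_answer = c(2L, 2L, 1L, 2L, 5L))

# Column names differ between X and Y, so on= maps xcol = ycol.
answers[, correct := 0L]
answers[answer_key, on = .(question_id = id, student_answer = correct_ans),
        correct := 1L]

# by = .EACHI evaluates j once per row of Y; here .N is the number of
# rows of X matched by that row of Y (0 when nothing matches).
matches <- answers[answer_key,
                   on = .(question_id = id, student_answer = correct_ans),
                   .N, by = .EACHI]
print(matches)
```

Inspecting the per-row match counts this way is one way to judge whether by = .EACHI is worth using in the update itself.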

See ?data.table for full documentation on the syntax.
