R: By group, check if for each unique value of one var, there is at least one observation where the value of the var equals the value of another var
I think I am on the right track with this code, but I am not quite there yet.
I tried searching Google and SE, but I could not phrase the question in a way that surfaces the answer I am looking for.
I could write a for-loop that, for each id, compares each unique value of a against b row by row, but I want to improve my understanding of R and would therefore like to avoid loops.
library(data.table)
id <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5)
a <- c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6)
b <- c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
dt <- data.table(id, a, b)
dt
# For each id: is each unique value of a found anywhere among that id's b values?
tmp <- dt[, unique(a) %in% b, by = id]
tmp
# IDs for which at least one unique value of a is not found in b
tmp$id[tmp$V1 == FALSE]
In my example, IDs 2, 3 and 5 should be the result. The decision rule is: "By id, check whether for each unique value of a there is at least one observation where the value of b equals the value of a."
However, my code only outputs IDs 2 and 5, but not 3. This is because for ID 3, the value a = 4 is matched against the b = 4 of the previous observation (where a = 3), not against the b on its own row.
The result should either output the IDs for which the condition is not met, or add a dummy variable to the original table that indicates whether the condition is met for the ID.
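To make the mismatch for ID 3 concrete, here is a small sketch that compares the two tests on that group alone. The names grp3, in_any_row and in_same_row are made up for illustration and are not part of the original code.

```r
library(data.table)

dt <- data.table(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  a  = c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6),
  b  = c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
)

# Look at ID 3 only: a = 3, 3, 4 and b = 3, 4, 5
grp3 <- dt[id == 3]

# Membership test: asks whether each unique a occurs anywhere in b,
# regardless of which row that b value sits on
in_any_row <- unique(grp3$a) %in% grp3$b

# Row-aligned test: only b values on rows where a == b are allowed to match
in_same_row <- unique(grp3$a) %in% grp3$b[grp3$a == grp3$b]

in_any_row   # TRUE TRUE   (a = 4 is matched by the b = 4 on the a = 3 row)
in_same_row  # TRUE FALSE  (no row has a = 4 and b = 4)
```

The first test is what the attempt above does per group, which is why ID 3 slips through.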
How about:
dt[, all(sapply(unique(a), function(i) any(a == i & b == i))), by = id]
# id V1
#1: 1 TRUE
#2: 2 FALSE
#3: 3 FALSE
#4: 4 TRUE
#5: 5 FALSE
If you want to add a dummy variable to the original table, you can modify it like this:
dt[, check:=all(sapply(unique(a), function(i) any(a == i & b == i))), by = id]
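As a possible variation (not part of the answer above), the same rule can probably be written without sapply by restricting the match to rows where a equals b. This is a sketch under that assumption; the names res, ok and failing_ids are hypothetical.

```r
library(data.table)

dt <- data.table(
  id = c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5),
  a  = c(1,1,1,2,2,2,3,3,4,4,4,5,5,5,6),
  b  = c(1,2,3,3,3,4,3,4,5,4,4,5,6,7,8)
)

# a[a == b] keeps only the values of a that are matched on their own row;
# the condition holds for an id when every unique a appears among them
res <- dt[, .(ok = all(unique(a) %in% a[a == b])), by = id]

# IDs for which the condition is not met
failing_ids <- res[ok == FALSE, id]
failing_ids  # 2 3 5

# Same check added as a dummy column; := modifies dt by reference
dt[, check := all(unique(a) %in% a[a == b]), by = id]
```

On the example data this gives the same TRUE/FALSE pattern per id as the sapply version, and failing_ids directly lists the IDs asked for in the question.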