R data.table: apply a function to all pairs of columns
Question
Say I have a data.table such as the following (it could equally contain numbers and NAs):
library(data.table)
temp <- data.table(M=c(NA,T,T,F,F,F,NA,NA,F),
                   P=c(T,T,T,F,F,F,NA,NA,NA),
                   S=c(T,F,NA,T,F,NA,NA,NA,NA))
    M     P     S
   NA  TRUE  TRUE
 TRUE  TRUE FALSE
 TRUE  TRUE    NA
FALSE FALSE  TRUE
FALSE FALSE FALSE
FALSE FALSE    NA
   NA    NA    NA
   NA    NA    NA
FALSE    NA    NA
I want to check whether, whenever one variable is NA, a second variable is always NA as well. The aim is to find out which variables are linked to one another.
For example, whenever P is NA, S is NA too.
This code works for a single pair of columns:
temp[is.na(P),all(is.na(S))]
gives TRUE, and
temp[is.na(S),all(is.na(P))]
gives FALSE, because in the sixth row S is NA but P is not.
Now my question.
I would like to generalize this: check all pairs of columns in my data.table and print which pairs are "linked".
I'd prefer to print only the results that are TRUE and ignore the FALSE ones, because most pairs in my real data.table won't be linked and I have 550 variables.
I tried this code:
temp[, lapply(.SD, function(x) temp[is.na(x),
lapply(.SD, function(y) all(is.na(y)) )]]
and I get this error:
Error: unexpected ']' in: "temp[, lapply(.SD, function(x) temp[is.na(x), lapply(.SD, function(y) all(is.na(y)) )]]"
I could try a for loop, but I'd prefer typical data.table syntax. Any suggestion is welcome.
I would also like to know how to refer to two different .SD objects when nesting data.table calls.
Recommended answer
For pairwise combinations, crossprod is useful.
We only care about whether each value is NA:
NAtemp = is.na(temp)
Count how often NAs co-occur in each pair of columns:
crossprod(NAtemp)
#  M P S
#M 3 2 2
#P 2 3 3
#S 2 3 5
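To make the meaning of that matrix concrete, here is a minimal sketch that recomputes one entry by hand. The names co_na and by_hand are illustrative only and do not appear in the original answer.

```r
library(data.table)

temp <- data.table(M = c(NA, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, FALSE),
                   P = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, NA),
                   S = c(TRUE, FALSE, NA, TRUE, FALSE, NA, NA, NA, NA))
NAtemp <- is.na(temp)   # logical matrix, TRUE where a value is missing

# crossprod(X) is t(X) %*% X; TRUE/FALSE are treated as 1/0, so
# entry [i, j] counts the rows in which columns i and j are both NA.
co_na <- crossprod(NAtemp)

# The same count for the pair (P, S), computed directly (illustrative check).
by_hand <- sum(NAtemp[, "P"] & NAtemp[, "S"])
stopifnot(co_na["P", "S"] == by_hand)
```

The diagonal of the matrix is each column paired with itself, which is why it matches the per-column NA counts.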
The number of NAs in each column:
colSums(NAtemp)
#M P S
#3 3 5
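Comparing the two tells us which pairs are linked. The vector colSums(NAtemp) is recycled down the rows, so entry [i, j] is TRUE when every NA in column i coincides with an NA in column j: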
ans = crossprod(NAtemp) == colSums(NAtemp)
ans
#      M     P     S
#M  TRUE FALSE FALSE
#P FALSE  TRUE  TRUE
#S FALSE FALSE  TRUE
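As a sanity check, the same matrix can be rebuilt by repeating the two-column test from the question for every ordered pair. This is only a sketch for cross-checking: the name direct is hypothetical, and the nested sapply is not part of the original answer.

```r
library(data.table)

temp <- data.table(M = c(NA, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, FALSE),
                   P = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, NA),
                   S = c(TRUE, FALSE, NA, TRUE, FALSE, NA, NA, NA, NA))
NAtemp <- is.na(temp)
ans <- crossprod(NAtemp) == colSums(NAtemp)

# direct[i, j]: among the rows where column i is NA, is column j always NA?
# The inner sapply runs over i (rows of the result), the outer over j (columns).
direct <- sapply(names(temp), function(j)
  sapply(names(temp), function(i) all(is.na(temp[[j]][is.na(temp[[i]])]))))

# Both approaches should agree entry by entry.
stopifnot(all(direct == ans))
```

The nested version does one vector comparison per pair, which is easy to read but slow for many columns; the crossprod version does all pairs in a single matrix product.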
Then use the convenient as.data.frame.table method to format the result, dropping each column's pairing with itself:
subset(as.data.frame(as.table(ans)), Var1 != Var2)
#  Var1 Var2  Freq
#2    P    M FALSE
#3    S    M FALSE
#4    M    P FALSE
#6    S    P FALSE
#7    M    S FALSE
#8    P    S  TRUE
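Since the question asks for only the linked pairs, here is one possible way to keep just the TRUE entries. It is a sketch under two assumptions that the original answer does not make: a column with no NAs at all should not count as "linked" (otherwise it is trivially linked to every column), and the names n_na, linked, na_in and implies_na_in are made up for illustration.

```r
library(data.table)

temp <- data.table(M = c(NA, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, FALSE),
                   P = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, NA),
                   S = c(TRUE, FALSE, NA, TRUE, FALSE, NA, NA, NA, NA))

NAtemp <- is.na(temp)
n_na <- colSums(NAtemp)
ans <- crossprod(NAtemp) == n_na

# Assumption: a column without any NA satisfies the test vacuously, so drop it.
ans[n_na == 0, ] <- FALSE
# A column is always linked to itself, which is not informative.
diag(ans) <- FALSE

# Row/column positions of the remaining TRUE entries.
idx <- which(ans, arr.ind = TRUE)
linked <- data.table(na_in         = rownames(ans)[idx[, "row"]],
                     implies_na_in = colnames(ans)[idx[, "col"]])
print(linked)
```

For the sample data this leaves a single pair: an NA in P implies an NA in S.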