R data.table: apply a function to all pairs of columns

Question

Say I have a data.table such as the following (it could equally hold numbers and NAs):

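library(data.table)
temp <- data.table(M = c(NA, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, FALSE),
                   P = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, NA),
                   S = c(TRUE, FALSE, NA, TRUE, FALSE, NA, NA, NA, NA))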

    M     P     S
   NA  TRUE  TRUE
 TRUE  TRUE FALSE
 TRUE  TRUE    NA
FALSE FALSE  TRUE
FALSE FALSE FALSE
FALSE FALSE    NA
   NA    NA    NA
   NA    NA    NA
FALSE    NA    NA

I want to check whether one variable being NA implies that a second variable is also NA in all of those rows, in order to find out whether some variables are linked to others.

For example, whenever P is NA, S is NA as well.

This code works properly for a single pair of columns:

temp[is.na(P),all(is.na(S))]

gives TRUE

temp[is.na(S),all(is.na(P))]

gives FALSE, because in the sixth row S is NA but P is not.

Now my question: I would like to generalize this by checking all pairs of columns in my data.table and printing which pairs are "linked".
I'd prefer to print only the results that are TRUE and ignore the FALSE ones, because most pairs in my real data.table won't be linked, and I have 550 variables.

I tried the following code:

temp[, lapply(.SD, function(x) temp[is.na(x), 
                 lapply(.SD, function(y)  all(is.na(y)) )]]

I get this error:

Error: unexpected ']' in: "temp[, lapply(.SD, function(x) temp[is.na(x), lapply(.SD, function(y) all(is.na(y)) )]]"

I could try a for loop, but I'd prefer typical data.table syntax. Any suggestion is welcome.

I would also like to know how to refer to two different .SD objects when nesting data.table calls.

Recommended answer

For pairwise combinations, crossprod is once again useful.

We only care about whether each value is NA:

NAtemp = is.na(temp)

Count how often NAs co-occur in each pair of columns:

crossprod(NAtemp)
#  M P S
#M 3 2 2
#P 2 3 3
#S 2 3 5

Count the NAs in each column:

colSums(NAtemp)
#M P S 
#3 3 5

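Then compare the two. The column counts are recycled down the rows, so entry [i, j] is TRUE when every NA in column i coincides with an NA in column j: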

ans = crossprod(NAtemp) == colSums(NAtemp)
ans
#      M     P     S
#M  TRUE FALSE FALSE
#P FALSE  TRUE  TRUE
#S FALSE FALSE  TRUE
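
As a sanity check, the matrix comparison can be verified against the direct pairwise definition from the question. This is only a minimal sketch in base R: it uses a plain data.frame instead of a data.table so that it runs without extra packages (is.na() should give the same logical matrix either way), and the helper name implies_na is made up for illustration.

```r
# Same data as in the question, as a base data.frame (assumption: chosen only
# so the sketch runs without data.table installed).
temp <- data.frame(M = c(NA, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, FALSE),
                   P = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, NA),
                   S = c(TRUE, FALSE, NA, TRUE, FALSE, NA, NA, NA, NA))
NAtemp <- is.na(temp)

# Matrix approach from the answer: colSums(NAtemp) is recycled down the rows,
# so entry [i, j] compares the co-occurrence count with the NA count of column i.
ans <- crossprod(NAtemp) == colSums(NAtemp)

# Hypothetical helper: TRUE when every NA in column i is also an NA in column j.
implies_na <- function(i, j) all(NAtemp[NAtemp[, i], j])

# Direct pairwise check: rows index the column that is NA (i),
# columns index the column it is tested against (j).
direct <- sapply(colnames(NAtemp), function(j)
  sapply(colnames(NAtemp), function(i) implies_na(i, j)))

# Both approaches should agree on every pair.
agree <- all(ans == direct)
agree
```

The direct version makes one pass per pair, while crossprod does all pairs in a single matrix product, which is likely the better fit for several hundred columns.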

And use the convenient as.data.frame.table to format the result:

subset(as.data.frame(as.table(ans)), Var1 != Var2)
#  Var1 Var2  Freq
#2    P    M FALSE
#3    S    M FALSE
#4    M    P FALSE
#6    S    P FALSE
#7    M    S FALSE
#8    P    S  TRUE
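
Since the question asks for only the linked pairs, the same table can be filtered on the Freq column as well. The following is a small sketch under the same assumptions as the answer, again using a base data.frame in place of the data.table so that it is self-contained; the variable names pairs and linked are illustrative.

```r
# Same data as in the question, as a base data.frame (assumption: used here
# only to avoid depending on data.table being installed).
temp <- data.frame(M = c(NA, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, FALSE),
                   P = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, NA, NA, NA),
                   S = c(TRUE, FALSE, NA, TRUE, FALSE, NA, NA, NA, NA))
NAtemp <- is.na(temp)
ans <- crossprod(NAtemp) == colSums(NAtemp)

# Long format: Var1 is the column that is NA, Var2 the column it implies.
pairs <- as.data.frame(as.table(ans))

# Drop the trivial diagonal and keep only the pairs flagged TRUE.
linked <- subset(pairs, Var1 != Var2 & Freq)
linked
#   Var1 Var2 Freq
# 8    P    S TRUE
```

Each remaining row reads as "whenever Var1 is NA, Var2 is NA too", so here only P implies S.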
