Subset data.table based on all possible combinations of two or more variables

Question

I want to subset a data.frame based on whether some variables are all positive, all negative, or some combination in between. For n variables this should lead to 2^n possible combinations.

I think combn can be used to achieve this, but I'm struggling to do it properly.

Sample data:

library(data.table)
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))

What I want:

dt[x < 0 & y < 0 & z < 0, ]
dt[x < 0 & y < 0 & z > 0, ]
dt[x < 0 & y > 0 & z < 0, ]
dt[x < 0 & y > 0 & z > 0, ]
dt[x > 0 & y < 0 & z < 0, ]
dt[x > 0 & y < 0 & z > 0, ]
dt[x > 0 & y > 0 & z < 0, ]
dt[x > 0 & y > 0 & z > 0, ]

What I've tried so far:

combinator <- function(z){
  cnames <- colnames(z)
  combinations <- t(combn(c(rep("<", ncol(z)), rep(">", ncol(z))),ncol(z)))

  retval <- t(sapply(1:nrow(combinations), function(p){
    sapply(1:ncol(z), function(q) paste(cnames[q], combinations[p,q], 0))
  }))

  return(apply(retval, 1, paste, collapse = " & "))
}

Output:

> l <- combinator(dt)
> l
 [1] "x < 0 & y < 0 & z < 0" "x < 0 & y < 0 & z > 0" "x < 0 & y < 0 & z > 0" "x < 0 & y < 0 & z > 0"
 [5] "x < 0 & y < 0 & z > 0" "x < 0 & y < 0 & z > 0" "x < 0 & y < 0 & z > 0" "x < 0 & y > 0 & z > 0"
 [9] "x < 0 & y > 0 & z > 0" "x < 0 & y > 0 & z > 0" "x < 0 & y < 0 & z > 0" "x < 0 & y < 0 & z > 0"
[13] "x < 0 & y < 0 & z > 0" "x < 0 & y > 0 & z > 0" "x < 0 & y > 0 & z > 0" "x < 0 & y > 0 & z > 0"
[17] "x < 0 & y > 0 & z > 0" "x < 0 & y > 0 & z > 0" "x < 0 & y > 0 & z > 0" "x > 0 & y > 0 & z > 0"

> l[1]
[1] "x < 0 & y < 0 & z < 0"

> subset(dt, eval(l[1]))
Error in subset.data.table(dt, eval(l[1])) : 
  'subset' must evaluate to logical

Also, the following shows that I'm not listing all the desired combinations:

> unique(l)
[1] "x < 0 & y < 0 & z < 0" "x < 0 & y < 0 & z > 0" 
[3] "x < 0 & y > 0 & z > 0" "x > 0 & y > 0 & z > 0"

The output should have 8 unique results instead of the 4 shown above.

Recommended answer

Just run dt[, sign_combi := do.call(paste, lapply(dt, sign))], and you can then split on that column or pass it to by = as needed, e.g., split(dt, dt$sign_combi). Trying to paste together code is a Bad Idea.
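As background on why the attempt in the question yields only 4 unique conditions: combn picks 3 items, in their original order, from c("<", "<", "<", ">", ">", ">"), so a "<" can never follow a ">". Enumerating every pattern needs a Cartesian product, which base R's expand.grid provides. The subset error, in turn, arises because eval() of a character string simply returns that string. The sketch below assumes the three columns x, y, z from the question; the names vars, patterns, and subsets are illustrative only. It builds all 2^n sign patterns and filters with logical masks instead of pasted code.

```r
library(data.table)

set.seed(1)  # arbitrary seed, chosen only for this illustration
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))
vars <- c("x", "y", "z")

# All 2^n sign patterns as a Cartesian product, one row per pattern
patterns <- expand.grid(setNames(rep(list(c(-1, 1)), length(vars)), vars))
nrow(patterns)

# One subset per pattern: a row is kept when every variable has the required sign
subsets <- lapply(seq_len(nrow(patterns)), function(i) {
  wanted <- unlist(patterns[i, ])
  keep <- Reduce(`&`, Map(function(v, s) sign(dt[[v]]) == s, vars, wanted))
  dt[keep]
})
```

This produces the same kind of partition as the sign_combi approach, just in an explicit, pattern-by-pattern form; the one-column version is shorter and is usually the more convenient choice.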

For example, using the sign_combi column:

set.seed(47) # setting seed for reproducibility
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))

# create combination column (you could keep it separate if you prefer)
dt[, sign_combi := do.call(paste, lapply(dt, sign))]

# split original data by sign combinations
result = split(dt, dt$sign_combi)

# list of 8 resulting data tables
length(result)
# [1] 8

# peeking at the first three rows of the first three tables:
lapply(head(result, 3), head, 3)
# $`-1 -1 -1`
#             x          y          z sign_combi
# 1: -0.5713038 -0.7103555 -0.6873705   -1 -1 -1
# 2: -0.1407803 -0.8371153 -0.3686299   -1 -1 -1
# 3: -0.6478446 -0.7629461 -0.7458949   -1 -1 -1
# 
# $`-1 -1 1`
#             x          y         z sign_combi
# 1: -0.8070969 -0.3952283 0.9212030    -1 -1 1
# 2: -0.1190934 -0.4969318 0.8082232    -1 -1 1
# 3: -0.6536104 -0.3280965 0.6880454    -1 -1 1
# 
# $`-1 1 -1`
#              x         y          z sign_combi
# 1: -0.78789241 0.8577848 -0.7586369    -1 1 -1
# 2: -0.04442825 0.4736388 -0.3354734    -1 1 -1
# 3: -0.22105744 0.3012645 -0.4160631    -1 1 -1
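
The answer also mentions by =. As a minimal sketch on freshly simulated data (the summary columns n and mean_x and the name summary_dt are illustrative choices, not part of the original answer), grouping on sign_combi summarises each sign pattern without materialising separate tables:

```r
library(data.table)

set.seed(47)
dt <- data.table(x = runif(100, -1, 1), y = runif(100, -1, 1), z = runif(100, -1, 1))

# Label each row with its sign pattern; .SDcols restricts the computation to x, y, z
dt[, sign_combi := do.call(paste, lapply(.SD, sign)), .SDcols = c("x", "y", "z")]

# Row count and mean of x within each sign pattern
summary_dt <- dt[, .(n = .N, mean_x = mean(x)), by = sign_combi]
summary_dt
```

Note that sign() returns 0 for exact zeros, so data containing zeros can produce more than 2^n groups.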
