用于计算R中集合的幂集(所有可能的子集)的算法 [英] Algorithm to calculate power set (all possible subsets) of a set in R

查看:460
本文介绍了用于计算R中集合的幂集(所有可能的子集)的算法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在任何地方都找不到答案,所以这是我的解决方法.

I couldn't find an answer to this anywhere, so here's my solution.

问题是:如何计算R中的功效?

The question is: how can you calculate a power set in R?

可以通过命令"2^as.set(c(1,2,3,4))"使用库"sets"执行此操作,该命令产生输出{{}, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}.但是,这使用的是递归算法,速度很慢.

It is possible to do this with the library "sets", with the command 2^as.set(c(1,2,3,4)), which yields the output {{}, {1}, {2}, {3}, {4}, {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}, {1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}, {1, 2, 3, 4}}. However, this uses a recursive algorithm, which is rather slow.

这是我想出的算法.

它是非递归的,因此它比现有的其他解决方案要快得多(并且在我的机器上比"sets"包中的算法快100倍).速度仍然是O(2 ^ n).

It's non-recursive, so it's much faster than some of the other solutions out there (and ~100x faster on my machine than the algorithm in the "sets" package). The speed is still O(2^n).

此算法的概念基础如下:

The conceptual basis for this algorithm is the following:

for each element in the set:
    for each subset constructed so far:
        new subset = (subset + element)

这是R代码:

这是相同概念的某种更快的版本;我的原始算法在这篇文章的第三条评论中.在一组长度为19的机器上,这比我的机器快30%.

here's a somewhat faster version of the same concept; my original algorithm is in the third comment to this post. This one is 30% faster on my machine for a set of length 19.

powerset = function(s){
    len = length(s)
    l = vector(mode="list",length=2^len) ; l[[1]]=numeric()
    counter = 1L
    for(x in 1L:length(s)){
        for(subset in 1L:counter){
            counter=counter+1L
            l[[counter]] = c(l[[subset]],s[x])
        }
    }
    return(l)
}

此版本通过在起始位置起始向量的最终长度并跟踪保存新子集的位置的"counter"变量来节省时间.也可以通过分析来计算位置,但这要慢一些.

This version saves time by initiating the vector with its final length at the start and keeping track with the "counter" variable of the position at which to save new subsets. It's also possible to calculate the position analytically, but that was slightly slower.

推荐答案

子集可以看作是布尔向量,指示元素是否在not的子集中. 这些布尔向量可以看成是以二进制形式写的数字. 枚举1:n的所有子集 因此,等效于枚举02^n-1的数字.

A subset can be seen as a boolean vector, indicating whether an element is in the subset of not. Those boolean vectors can be seen as numbers written in binary. Enumerating all the subsets of 1:n is therefore equivalent to enumerating the numbers from 0 to 2^n-1.

f <- function(set) { 
  n <- length(set)
  masks <- 2^(1:n-1)
  lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
}
f(LETTERS[1:4])

这篇关于用于计算R中集合的幂集(所有可能的子集)的算法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆