为什么这个 powerset 生成器需要按位运算符? [英] Why is bitwise operator needed in this powerset generator?

查看:25
本文介绍了为什么这个 powerset 生成器需要按位运算符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我目前正在关注 MITx 的 6.00.2x,我们被要求想出底部那个的动力组发生器的变体.

I am currently following MITx's 6.00.2x, and we are asked to come up with a variant of power set generator of the one at the bottom.

但在我开始研究变体之前,我什至不明白给定的生成器发生了什么.具体:

But before I can work on the variant, I do not even understand what's going on with the given generator. Specifically:

  1. (i >> j) % 2 == 1 以及实际上整个 for j in range(N): 块做了什么?我明白 i >>j 移位i 乘以 j 的二进制,然后返回的十进制表示那个移位的二进制数.但我完全不知道为什么二进制甚至首先需要在 powerset 发电机中,更不用说这个条件的必要性.
  2. 我知道对于任何给定的集合 A 的基数 n,其幂集的基数是 2**n - 因为对于 A 的每个子集每个成员要么在要么不在,我们重复n次.

  1. What does (i >> j) % 2 == 1, and in fact the whole for j in range(N): block do? I understand that i >> j shifts the binary of i by j, then returns the decimal representation of that shifted binary number. But I have absolutely no clue why binary is even needed in a powerset generator in the first place, let alone the necessity of this conditional.
  2. I understand that for any given set A a cardinality n, the cardinality of its powerset is 2**n - because for every subset of A every member is either in or not, and we repeat that for n times.

这就是 for i in range(2**N): 正在做的吗?即遍历 2**n 个子集并包含或不包含集合中的任何给定成员?

Is that what for i in range(2**N): is doing? i.e. going over 2**n subsets and either include or not include any given member of the set?

我尝试使用 items=['apple,'banana','orange']items=[1,2,3] 运行它,并且都返回一个空列表,这让它变得更加混乱.

I tried running it with items=['apple,'banana','orange'] and items=[1,2,3], and both returned an empty list, which makes it all the more confusing.

def powerSet(items):
    # generate all combinations of N items, items is a Python list
    N = len(items)
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        return combo

推荐答案

所以这里的算法从一个观察开始:{1,...,N} 的任何子集都可以看作是一个function f:{1,...,N}->{0,1},即特征函数.这个怎么运作?好吧,如果 A{1,...,N} 的子集,那么 ff(x) 给出=0 如果 x 不在 A 中,否则 f(x)=1.

So the algorithm here starts with an observation that any subset of {1,...,N} can be seen as a function f:{1,...,N}->{0,1}, i.e. the characteristic function. How it works? Well, if A is a subset of {1,...,N} then f is given by f(x)=0 if x not in A and f(x)=1 otherwise.

现在另一个观察结果是,任何函数 f:{1,...,N}->{0,1} 都可以编码为 N 的二进制数代码>位:如果f(j)=1,第j位为1,否则为0.

Now another observation is that any function f:{1,...,N}->{0,1} can be encoded as a binary number of N bits: j-th bit is 1 if f(j)=1 and 0 otherwise.

因此,如果我们想要生成 {1,..,N} 的所有子集,生成长度为 N 的所有二进制数就足够了.那么这样的数字有多少呢?当然2**N.由于 02**N - 1 之间的每个数字(-1 因为我们从 0 开始计数)唯一对应于 {1,...,N} 的某个子集,然后我们可以简单地循环遍历它们.这就是 for i in range(2**N): 循环的来源.

And so if we want to generate all subsets of {1,..,N} it is enough to generate all binary numbers of length N. So how many such numbers are there? Of course 2**N. And since every number between 0 and 2**N - 1 (-1 since we count from 0) uniquely corresponds to some subset of {1,...,N} then we can simply loop through them. That's where the for i in range(2**N): loop comes from.

但我们不是简单地处理{1,...,N}的子集,我们实际上有一些未知的集合/列表items长度N.所以如果 A{1,...,N} 的子集,意味着 A0 之间的数字code> 和 2**N - 1 那么我们如何将其转换为 items 的子集?好吧,我们再次使用位1 对应于已设置"而位0 对应于未设置"的事实.这就是 (i >> j) % 2 == 1 的来源.它只是表示如果第 j 位为 1",结果会导致第 j 个元素应该在子集中".

But we don't simply deal with subsets of {1,...,N}, we actually have some unknown set/list items of length N. So if A is a subset of {1,...,N}, meaning A is a number between 0 and 2**N - 1 then how do we convert it to a subset of items? Well, again, we use the fact that the bit 1 corresponds to "is in set" and the bit 0 corresponds to "is not in set". And that's where (i >> j) % 2 == 1 comes from. It simply means "if j-th bit is 1" which in the consequence leads to "j-th element should be in the subset".

您的代码有一个小问题.你也许应该屈服而不是回报:

There's a slight issue with your code. You should maybe yield instead of return:

def powerSet(items):
    N = len(items)
    for i in range(2**N):
        combo = []  # <-- this is our subset
        for j in range(N):
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo  # <-- here we yield it to caller

subsets = list(powerSet(["apple", "banana", "pear"]))

<小时>

以下是这种子集二进制编码的示例.假设你有一个清单


Here's an example of this binary encoding of subsets. Say you have a list

[苹果"、香蕉"、梨"]

["apple", "banana", "pear"]

它有 3 个元素,所以我们正在查看(二进制)长度为 3 的数字.所以这里是所有可能的子集及其循环"顺序中的编码:

It has 3 elements so we are looking at numbers of (binary) length 3. So here are all possible subsets and their encodings in the "loop" order:

000 == []

001 == ["苹果"]

001 == ["apple"]

010 == ["香蕉"]

010 == ["banana"]

011 == [苹果",香蕉"]

011 == ["apple", "banana"]

100 == ["梨"]

100 == ["pear"]

101 == [苹果",梨"]

101 == ["apple", "pear"]

110 == [香蕉",梨"]

110 == ["banana", "pear"]

111 == [苹果"、香蕉"、梨"]

111 == ["apple", "banana", "pear"]

这篇关于为什么这个 powerset 生成器需要按位运算符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆