为什么这个 powerset 生成器需要按位运算符? [英] Why is bitwise operator needed in this powerset generator?
问题描述
我目前正在关注 MITx 的 6.00.2x,我们被要求想出底部那个的动力组发生器的变体.
I am currently following MITx's 6.00.2x, and we are asked to come up with a variant of power set generator of the one at the bottom.
但在我开始研究变体之前,我什至不明白给定的生成器发生了什么.具体:
But before I can work on the variant, I do not even understand what's going on with the given generator. Specifically:
(i >> j) % 2 == 1
以及实际上整个for j in range(N):
块做了什么?我明白i >>j
移位i
乘以j
的二进制,然后返回的十进制表示那个移位的二进制数.但我完全不知道为什么二进制甚至首先需要在 powerset 发电机中,更不用说这个条件的必要性.我知道对于任何给定的集合 A 的基数 n,其幂集的基数是 2**n - 因为对于 A 的每个子集每个成员要么在要么不在,我们重复n次.
- What does
(i >> j) % 2 == 1
, and in fact the wholefor j in range(N):
block do? I understand thati >> j
shifts the binary ofi
byj
, then returns the decimal representation of that shifted binary number. But I have absolutely no clue why binary is even needed in a powerset generator in the first place, let alone the necessity of this conditional. I understand that for any given set A a cardinality n, the cardinality of its powerset is 2**n - because for every subset of A every member is either in or not, and we repeat that for n times.
这就是 for i in range(2**N):
正在做的吗?即遍历 2**n 个子集并包含或不包含集合中的任何给定成员?
Is that what for i in range(2**N):
is doing? i.e. going over 2**n subsets and either include or not include any given member of the set?
我尝试使用 items=['apple,'banana','orange']
和 items=[1,2,3]
运行它,并且都返回一个空列表,这让它变得更加混乱.
I tried running it with items=['apple,'banana','orange']
and items=[1,2,3]
, and both returned an empty list, which makes it all the more confusing.
def powerSet(items):
# generate all combinations of N items, items is a Python list
N = len(items)
# enumerate the 2**N possible combinations
for i in range(2**N):
combo = []
for j in range(N):
# test bit jth of integer i
if (i >> j) % 2 == 1:
combo.append(items[j])
return combo
推荐答案
所以这里的算法从一个观察开始:{1,...,N}
的任何子集都可以看作是一个function f:{1,...,N}->{0,1}
,即特征函数.这个怎么运作?好吧,如果 A
是 {1,...,N}
的子集,那么 f
由 f(x) 给出=0
如果 x
不在 A
中,否则 f(x)=1
.
So the algorithm here starts with an observation that any subset of {1,...,N}
can be seen as a function f:{1,...,N}->{0,1}
, i.e. the characteristic function. How it works? Well, if A
is a subset of {1,...,N}
then f
is given by f(x)=0
if x
not in A
and f(x)=1
otherwise.
现在另一个观察结果是,任何函数 f:{1,...,N}->{0,1}
都可以编码为 N
的二进制数代码>位:如果f(j)=1
,第j位为1,否则为0.
Now another observation is that any function f:{1,...,N}->{0,1}
can be encoded as a binary number of N
bits: j-th bit is 1 if f(j)=1
and 0 otherwise.
因此,如果我们想要生成 {1,..,N}
的所有子集,生成长度为 N
的所有二进制数就足够了.那么这样的数字有多少呢?当然2**N
.由于 0
和 2**N - 1
之间的每个数字(-1
因为我们从 0
开始计数)唯一对应于 {1,...,N}
的某个子集,然后我们可以简单地循环遍历它们.这就是 for i in range(2**N):
循环的来源.
And so if we want to generate all subsets of {1,..,N}
it is enough to generate all binary numbers of length N
. So how many such numbers are there? Of course 2**N
. And since every number between 0
and 2**N - 1
(-1
since we count from 0
) uniquely corresponds to some subset of {1,...,N}
then we can simply loop through them. That's where the for i in range(2**N):
loop comes from.
但我们不是简单地处理{1,...,N}
的子集,我们实际上有一些未知的集合/列表items
长度N
.所以如果 A
是 {1,...,N}
的子集,意味着 A
是 0
之间的数字code> 和 2**N - 1
那么我们如何将其转换为 items
的子集?好吧,我们再次使用位1
对应于已设置"而位0
对应于未设置"的事实.这就是 (i >> j) % 2 == 1
的来源.它只是表示如果第 j 位为 1",结果会导致第 j 个元素应该在子集中".
But we don't simply deal with subsets of {1,...,N}
, we actually have some unknown set/list items
of length N
. So if A
is a subset of {1,...,N}
, meaning A
is a number between 0
and 2**N - 1
then how do we convert it to a subset of items
? Well, again, we use the fact that the bit 1
corresponds to "is in set" and the bit 0
corresponds to "is not in set". And that's where (i >> j) % 2 == 1
comes from. It simply means "if j-th bit is 1" which in the consequence leads to "j-th element should be in the subset".
您的代码有一个小问题.你也许应该屈服而不是回报:
There's a slight issue with your code. You should maybe yield instead of return:
def powerSet(items):
N = len(items)
for i in range(2**N):
combo = [] # <-- this is our subset
for j in range(N):
if (i >> j) % 2 == 1:
combo.append(items[j])
yield combo # <-- here we yield it to caller
subsets = list(powerSet(["apple", "banana", "pear"]))
<小时>
以下是这种子集二进制编码的示例.假设你有一个清单
Here's an example of this binary encoding of subsets. Say you have a list
[苹果"、香蕉"、梨"]
["apple", "banana", "pear"]
它有 3 个元素,所以我们正在查看(二进制)长度为 3 的数字.所以这里是所有可能的子集及其循环"顺序中的编码:
It has 3 elements so we are looking at numbers of (binary) length 3. So here are all possible subsets and their encodings in the "loop" order:
000 == []
001 == ["苹果"]
001 == ["apple"]
010 == ["香蕉"]
010 == ["banana"]
011 == [苹果",香蕉"]
011 == ["apple", "banana"]
100 == ["梨"]
100 == ["pear"]
101 == [苹果",梨"]
101 == ["apple", "pear"]
110 == [香蕉",梨"]
110 == ["banana", "pear"]
111 == [苹果"、香蕉"、梨"]
111 == ["apple", "banana", "pear"]
这篇关于为什么这个 powerset 生成器需要按位运算符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!