为什么创建从 0 到 log(len(list), 2) 的范围这么慢? [英] Why is creating a range from 0 to log(len(list), 2) so slow?
问题描述
我不知道为什么会发生这种情况.我弄乱了一些列表,我需要一个从 0 到 log(n, 2)
的 for
循环,其中 n 是列表的长度.但是代码非常慢,所以经过一番研究后我发现问题出在范围生成上.演示示例代码:
I don't have a clue why is this happening. I was messing with some lists, and I needed a for
loop going from 0 to log(n, 2)
where n was the length of a list. But the code was amazingly slow, so after a bit a research I found that the problem is in the range generation. Sample code for demonstration:
n = len([1,2,3,4,5,6,7,8])
k = 8
timeit('range(log(n, 2))', number=2, repeat=3) # Test 1
timeit('range(log(k, 2))', number=2, repeat=3) # Test 2
输出
2 loops, best of 3: 2.2 s per loop
2 loops, best of 3: 3.46 µs per loop
测试次数很少(我不希望它运行超过 10 分钟),但它已经表明 range(log(n, 2))
是数量级比仅使用整数的对数的对应物慢.这真的很令人惊讶,我不知道为什么会这样.可能是我电脑的问题,可能是 Sage 问题或 Python 错误(我没有在 Python 上尝试过同样的问题).
The number of tests is low (I didn't want this to be running more than 10 minutes), but it already shows that range(log(n, 2))
is orders of magnitude slower than the counterpart using just the logarithm of an integer. This is really surprising and I don't have any clue on why is this happening. Maybe is a problem on my PC, maybe a Sage problem or a Python bug (I didn't try the same on Python).
使用 xrange
而不是 range
也无济于事.此外,如果您使用 .n()
获得数字,则测试 1 以与 2 相同的速度运行.
Using xrange
instead of range
doesn't help either. Also, if you get the number with .n()
, test 1 runs at the same speed of 2.
有人知道会发生什么吗?谢谢!
Does anybody know what can be happening? Thanks!
推荐答案
真遗憾——我认得这个.它与我的一个有关,trac #12121.首先,由于无聊的原因,使用 Python int
而不是 Sage Integer
会带来额外的开销:
Good grief -- I recognize this one. It's related to one of mine, trac #12121. First, you get extra overhead from using a Python int
as opposed to a Sage Integer
for boring reasons:
sage: log(8, 2)
3
sage: type(log(8, 2))
sage.rings.integer.Integer
sage: log(8r, 2)
log(8)/log(2)
sage: type(log(8r, 2))
sage.symbolic.expression.Expression
sage: %timeit log(8, 2)
1000000 loops, best of 3: 1.4 us per loop
sage: %timeit log(8r, 2)
1000 loops, best of 3: 404 us per loop
(r
后缀表示原始",并防止 Sage 预解析器将文字 2
包装到 Integer(2)
中)
(The r
suffix means "raw", and prevents the Sage preparser from wrapping the literal 2
into Integer(2)
)
然后就变得很奇怪了.为了生成一个 int 供 range
使用,Sage 必须弄清楚如何将 log(8)/log(2)
变成 3,结果证明她做了最坏的事情.抄袭我原来的诊断(比照):
And then it gets weird. In order to produce an int for range
to consume, Sage has to figure out how to turn log(8)/log(2)
into 3, and it turns out that she does the worst thing possible. Plagiarizing my original diagnosis (mutatis mutandis):
首先她检查这个对象是否有自己的方法来获取一个int,但它没有.所以她用 log(8)/log(2) 构建了一个 RealInterval 对象,结果证明这是她能做的最糟糕的事情!她检查音程的下部和上部是否一致 [在地板上,我的意思是](以便她确定地板是什么).但在这种情况下,因为它确实是一个整数!这总是看起来像:
First she checks to see if this object has its own way to get an int, and it doesn't. So she builds a RealInterval object out of log(8)/log(2), and it turns out that this is about the worst thing she could do! She checks to see whether the lower and upper parts of the interval agree [on the floor, I mean] (so that she knows for certain what the floor is). But in this case, because it really is an integer! this is always going to look like:
sage: y = log(8)/log(2)
sage: rif = RealIntervalField(53)(y)
sage: rif
3.000000000000000?
sage: rif.endpoints()
(2.99999999999999, 3.00000000000001)
这两个边界有不相等的下限,所以 Sage 认为她还没有解决这个问题,她不断将精度提高到 20000 位,看看她是否能证明它们是..但是通过建设它永远不会工作.最后她放弃并尝试简化它,结果成功了:
These two bounds have floors which aren't aren't equal, so Sage decides she hasn't solved the problem yet, and she keeps increasing the precision to 20000 bits to see if she can prove that they are.. but by construction it's never going to work. Finally she gives up and tries to simplify it, which succeeds:
sage: y.simplify_full()
3
无需文字即可证明它是完全可分情况的反常性质:
Proof without words that it's a perverse property of the exactly divisible case:
sage: %timeit range(log(8r, 2))
1 loops, best of 3: 2.18 s per loop
sage: %timeit range(log(9r, 2))
1000 loops, best of 3: 766 us per loop
sage: %timeit range(log(15r, 2))
1000 loops, best of 3: 764 us per loop
sage: %timeit range(log(16r, 2))
1 loops, best of 3: 2.19 s per loop
这篇关于为什么创建从 0 到 log(len(list), 2) 的范围这么慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!