为什么“1000000000000000在范围内(1000000000000001)"?在 Python 3 中这么快? [英] Why is "1000000000000000 in range(1000000000000001)" so fast in Python 3?

查看:68
本文介绍了为什么“1000000000000000在范围内(1000000000000001)"?在 Python 3 中这么快?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

据我所知,range() 函数实际上是 Python 3 中的一种对象类型,动态生成其内容,类似于生成器.

It is my understanding that the range() function, which is actually an object type in Python 3, generates its contents on the fly, similar to a generator.

在这种情况下,我预计以下行会花费过多的时间,因为为了确定 1 千万亿是否在范围内,必须生成千万亿值:

This being the case, I would have expected the following line to take an inordinate amount of time because, in order to determine whether 1 quadrillion is in the range, a quadrillion values would have to be generated:

1000000000000000 in range(1000000000000001)

此外:似乎无论我添加多少个零,计算或多或少需要相同的时间(基本上是瞬时的).

Furthermore: it seems that no matter how many zeroes I add on, the calculation more or less takes the same amount of time (basically instantaneous).

我也尝试过这样的事情,但计算仍然几乎是即时的:

I have also tried things like this, but the calculation is still almost instant:

1000000000000000000000 in range(0,1000000000000000000001,10) # count by tens

如果我尝试实现自己的 range 函数,结果就不那么好了!!

If I try to implement my own range function, the result is not so nice!!

def my_crappy_range(N):
    i = 0
    while i < N:
        yield i
        i += 1
    return

range() 对象在幕后做了什么使其如此之快?

What is the range() object doing under the hood that makes it so fast?

Martijn Pieters 的答案 是因为其完整性,但也请参阅 abarnert 的第一个答案 很好地讨论了 range 在 Python 中成为一个成熟的序列意味着什么3,以及关于 Python 实现中 __contains__ 函数优化的潜在不一致的一些信息/警告.abarnert 的其他答案 更详细,并为那些对 Python 3 优化背后的历史感兴趣的人提供了链接(并且缺乏Python 2 中 xrange 的优化.通过戳通过wim回答 为有兴趣的人提供相关的 C 源代码和解释.

Martijn Pieters' answer was chosen for its completeness, but also see abarnert's first answer for a good discussion of what it means for range to be a full-fledged sequence in Python 3, and some information/warning regarding potential inconsistency for __contains__ function optimization across Python implementations. abarnert's other answer goes into some more detail and provides links for those interested in the history behind the optimization in Python 3 (and lack of optimization of xrange in Python 2). Answers by poke and by wim provide the relevant C source code and explanations for those who are interested.

推荐答案

Python 3 range() 对象不会立即产生数字;它是一个智能的序列对象,可以生成数字按需.它包含的只是您的开始、停止和步长值,然后当您迭代对象时,每次迭代都会计算下一个整数.

The Python 3 range() object doesn't produce numbers immediately; it is a smart sequence object that produces numbers on demand. All it contains is your start, stop and step values, then as you iterate over the object the next integer is calculated each iteration.

该对象还实现了object.__contains__ hook,然后计算您的号码是否在其范围内.计算是一个(接近)恒定时间操作*.永远不需要扫描范围内所有可能的整数.

The object also implements the object.__contains__ hook, and calculates if your number is part of its range. Calculating is a (near) constant time operation *. There is never a need to scan through all possible integers in the range.

来自 range() 对象文档:

From the range() object documentation:

range 类型相对于常规 listtuple 的优势在于范围对象将始终采用相同(小)数量内存,无论它代表的范围有多大(因为它只存储 startstopstep 值,计算单个项目和子范围根据需要).

The advantage of the range type over a regular list or tuple is that a range object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the start, stop and step values, calculating individual items and subranges as needed).

因此,您的 range() 对象至少可以:

So at a minimum, your range() object would do:

class my_range:
    def __init__(self, start, stop=None, step=1, /):
        if stop is None:
            start, stop = 0, start
        self.start, self.stop, self.step = start, stop, step
        if step < 0:
            lo, hi, step = stop, start, -step
        else:
            lo, hi = start, stop
        self.length = 0 if lo > hi else ((hi - lo - 1) // step) + 1

    def __iter__(self):
        current = self.start
        if self.step < 0:
            while current > self.stop:
                yield current
                current += self.step
        else:
            while current < self.stop:
                yield current
                current += self.step

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if i < 0:
            i += self.length
        if 0 <= i < self.length:
            return self.start + i * self.step
        raise IndexError('my_range object index out of range')

    def __contains__(self, num):
        if self.step < 0:
            if not (self.stop < num <= self.start):
                return False
        else:
            if not (self.start <= num < self.stop):
                return False
        return (num - self.start) % self.step == 0

这仍然缺少一些真正的 range() 支持的东西(例如 .index().count()方法、散列、相等性测试或切片),但应该给你一个想法.

This is still missing several things that a real range() supports (such as the .index() or .count() methods, hashing, equality testing, or slicing), but should give you an idea.

我还简化了 __contains__ 实现,只关注整数测试;如果你给一个真正的 range() 对象一个非整数值(包括 int 的子类),就会启动慢速扫描以查看是否存在匹配,就像如果您针对所有包含值的列表使用包含测试.这样做是为了继续支持其他数字类型,这些类型恰好支持整数等式测试,但预计也不支持整数算术.请参阅实施遏制测试的原始 Python 问题.

I also simplified the __contains__ implementation to only focus on integer tests; if you give a real range() object a non-integer value (including subclasses of int), a slow scan is initiated to see if there is a match, just as if you use a containment test against a list of all the contained values. This was done to continue to support other numeric types that just happen to support equality testing with integers but are not expected to support integer arithmetic as well. See the original Python issue that implemented the containment test.

* Near 恒定时间,因为 Python 整数是无界的,所以数学运算也会随着 N 的增长而随时间增长,这使得它成为 O(log N) 运算.由于所有这些都在优化的 C 代码中执行,并且 Python 将整数值存储在 30 位块中,因此在您看到由于此处涉及的整数的大小而产生的任何性能影响之前,您已经耗尽了内存.

* Near constant time because Python integers are unbounded and so math operations also grow in time as N grows, making this a O(log N) operation. Since it’s all executed in optimised C code and Python stores integer values in 30-bit chunks, you’d run out of memory before you saw any performance impact due to the size of the integers involved here.

这篇关于为什么“1000000000000000在范围内(1000000000000001)"?在 Python 3 中这么快?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆