加速 Python 中的位串/位操作? [英] Speed up bitstring/bit operations in Python?

查看:26
本文介绍了加速 Python 中的位串/位操作?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用 Sieve of Eratosthenes 和 Python 3.1 编写了一个素数生成器.代码在 ideone.com 上以 0.32 秒正确且优雅地运行,以生成最多 1,000,000 的素数.

I wrote a prime number generator using Sieve of Eratosthenes and Python 3.1. The code runs correctly and gracefully at 0.32 seconds on ideone.com to generate prime numbers up to 1,000,000.

# from bitstring import BitString

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...    
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5) 
    flags = [False, False] + [True] * (limit - 2)   
#     flags = BitString(limit)
    # Step through all the odd numbers
    for i in range(3, limit, 2):       
        if flags[i] is False:
#        if flags[i] is True:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*3, limit, i<<1):
                flags[j] = False
#                flags[j] = True

问题是,当我尝试生成高达 1,000,000,000 的数字时,内存不足.

The problem is, I run out of memory when I try to generate numbers up to 1,000,000,000.

    flags = [False, False] + [True] * (limit - 2)   
MemoryError

如您所想,分配 10 亿个布尔值(1 byte 4 或 8 个字节(见注释)在 Python 中每个)确实不可行,所以我研究了 bitstring.我想,为每个标志使用 1 位会更节省内存.然而,该程序的性能急剧下降 - 运行时间为 24 秒,对于高达 1,000,000 的质数.这可能是由于bitstring的内部实现造成的.

As you can imagine, allocating 1 billion boolean values (1 byte 4 or 8 bytes (see comment) each in Python) is really not feasible, so I looked into bitstring. I figured, using 1 bit for each flag would be much more memory-efficient. However, the program's performance dropped drastically - 24 seconds runtime, for prime number up to 1,000,000. This is probably due to the internal implementation of bitstring.

您可以注释/取消注释三行以查看我更改为使用 BitString 的内容,如上面的代码片段.

You can comment/uncomment the three lines to see what I changed to use BitString, as the code snippet above.

我的问题是,有没有办法加速我的程序,不管有没有位串?

请在发布前自己测试代码.我自然不能接受运行速度比我现有代码慢的答案.

Please test the code yourself before posting. I can't accept answers that run slower than my existing code, naturally.

再次

我在我的机器上编制了一份基准测试列表.

推荐答案

您的版本有几个小的优化.通过颠倒 True 和 False 的角色,您可以将if flags[i] is False:"更改为if flags[i]:".并且第二个range 语句的起始值可以是i*i 而不是i*3.您的原始版本在我的系统上需要 0.166 秒.通过这些更改,以下版本在我的系统上需要 0.156 秒.

There are a couple of small optimizations for your version. By reversing the roles of True and False, you can change "if flags[i] is False:" to "if flags[i]:". And the starting value for the second range statement can be i*i instead of i*3. Your original version takes 0.166 seconds on my system. With those changes, the version below takes 0.156 seconds on my system.

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = [True, True] + [False] * (limit - 2)
    # Step through all the odd numbers
    for i in range(3, limit, 2):
        if flags[i]:
            continue
        yield i
        # Exclude further multiples of the current prime number
        if i <= sub_limit:
            for j in range(i*i, limit, i<<1):
                flags[j] = True

不过,这对您的记忆问题没有帮助.

This doesn't help your memory issue, though.

进入 C 扩展的世界,我使用了 gmpy 的开发版本.(免责声明:我是维护者之一.)开发版本称为 gmpy2,支持称为 xmpz 的可变整数.使用 gmpy2 和以下代码,我的运行时间为 0.140 秒.1,000,000,000 的限制的运行时间为 158 秒.

Moving into the world of C extensions, I used the development version of gmpy. (Disclaimer: I'm one of the maintainers.) The development version is called gmpy2 and supports mutable integers called xmpz. Using gmpy2 and the following code, I have a running time of 0.140 seconds. Running time for a limit of 1,000,000,000 is 158 seconds.

import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    current = 0
    while True:
        current += 1
        current = oddnums.bit_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            for j in range(2*current*(current+1), limit>>1, prime):
                oddnums.bit_set(j)

推动优化并牺牲清晰度,我使用以下代码获得了 0.107 和 123 秒的运行时间:

Pushing optimizations, and sacrificing clarity, I get running times of 0.107 and 123 seconds with the following code:

import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    # Actual number is 2*bit_position + 1.
    oddnums = gmpy2.xmpz(1)
    f_set = oddnums.bit_set
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        # Exclude further multiples of the current prime number
        if prime <= sub_limit:
            list(map(f_set,range(2*current*(current+1), limit>>1, prime)))

基于这个练习,我修改了 gmpy2 以接受 xmpz.bit_set(iterator).使用以下代码,Python 2.7 中所有小于 1,000,000,000 的素数的运行时间为 56 秒,Python 3.2 为 74 秒.(如评论中所述,xrangerange 快.)

Based on this exercise, I modified gmpy2 to accept xmpz.bit_set(iterator). Using the following code, the run time for all primes less 1,000,000,000 is 56 seconds for Python 2.7 and 74 seconds for Python 3.2. (As noted in the comments, xrange is faster than range.)

import gmpy2

try:
    range = xrange
except NameError:
    pass

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    oddnums = gmpy2.xmpz(1)
    f_scan0 = oddnums.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            oddnums.bit_set(iter(range(2*current*(current+1), limit>>1, prime)))

编辑 #2:再试一次!我修改了 gmpy2 以接受 xmpz.bit_set(slice).使用以下代码,Python 2.7 和 Python 3.2 的所有小于 1,000,000,000 的素数的运行时间约为 40 秒.

Edit #2: One more try! I modified gmpy2 to accept xmpz.bit_set(slice). Using the following code, the run time for all primes less 1,000,000,000 is about 40 seconds for both Python 2.7 and Python 3.2.

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    yield 2
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    # pre-allocate the total length
    flags.bit_set((limit>>1)+1)
    f_scan0 = flags.bit_scan0
    current = 0
    while True:
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        if prime > limit:
            break
        yield prime
        if prime <= sub_limit:
            flags.bit_set(slice(2*current*(current+1), limit>>1, prime))

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

编辑 #3:我已经更新了 gmpy2 以正确支持在 xmpz 的位级别进行切片.性能没有变化,但 API 非常好.我做了一些调整,我把时间缩短到大约 37 秒.(请参阅编辑 #4 以了解 gmpy2 2.0.0b1 中的更改.)

Edit #3: I've updated gmpy2 to properly support slicing at the bit level of an xmpz. No change in performance but a much nice API. I have done a little tweaking and I've got the time down to about 37 seconds. (See Edit #4 to changes in gmpy2 2.0.0b1.)

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = True
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = True
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

编辑 #4:我在 gmpy2 2.0.0b1 中做了一些改变,打破了前面的例子.gmpy2 不再将 True 视为提供无限位源的特殊值.应该使用 -1 代替.

Edit #4: I made some changes in gmpy2 2.0.0b1 that break the previous example. gmpy2 no longer treats True as a special value that provides an infinite source of 1-bits. -1 should be used instead.

from __future__ import print_function
import time
import gmpy2

def prime_numbers(limit=1000000):
    '''Prime number generator. Yields the series
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29 ...
    using Sieve of Eratosthenes.
    '''
    sub_limit = int(limit**0.5)
    flags = gmpy2.xmpz(1)
    flags[(limit>>1)+1] = 1
    f_scan0 = flags.bit_scan0
    current = 0
    prime = 2
    while prime <= sub_limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1
        flags[2*current*(current+1):limit>>1:prime] = -1
    while prime <= limit:
        yield prime
        current += 1
        current = f_scan0(current)
        prime = 2 * current + 1

start = time.time()
result = list(prime_numbers(1000000000))
print(time.time() - start)

编辑 #5:我对 gmpy2 2.0.0b2 做了一些改进.您现在可以遍历所有设置或清除的位.运行时间提高了约 30%.

Edit #5: I've made some enhancements to gmpy2 2.0.0b2. You can now iterate over all the bits that are either set or clear. Running time has improved by ~30%.

from __future__ import print_function
import time
import gmpy2

def sieve(limit=1000000):
    '''Returns a generator that yields the prime numbers up to limit.'''

    # Increment by 1 to account for the fact that slices do not include
    # the last index value but we do want to include the last value for
    # calculating a list of primes.
    sieve_limit = gmpy2.isqrt(limit) + 1
    limit += 1

    # Mark bit positions 0 and 1 as not prime.
    bitmap = gmpy2.xmpz(3)

    # Process 2 separately. This allows us to use p+p for the step size
    # when sieving the remaining primes.
    bitmap[4 : limit : 2] = -1

    # Sieve the remaining primes.
    for p in bitmap.iter_clear(3, sieve_limit):
        bitmap[p*p : limit : p+p] = -1

    return bitmap.iter_clear(2, limit)

if __name__ == "__main__":
    start = time.time()
    result = list(sieve(1000000000))
    print(time.time() - start)
    print(len(result))

这篇关于加速 Python 中的位串/位操作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆