Large list generation optimization

Question

I needed a Python function that takes a list of strings of the form:

seq = ['A[0]','B[2:5]','A[4]']

and returns a new list of "expanded" elements, preserving order, like so:

expanded = ['A[0]', 'B[2]', 'B[3]', 'B[4]', 'B[5]', 'A[4]']

To achieve this, I wrote this simple function:

def expand_seq(seq):
    #['item[i]' for item in seq for xrange in item]
    return ['%s[%s]'%(item.split('[')[0],i) for item in seq for i in xrange(int(item.split('[')[-1][:-1].split(':')[0]),int(item.split('[')[-1][:-1].split(':')[-1])+1)]

It works well for sequences that expand to fewer than 500k items, but it slows down considerably when generating very large lists (more than 1 million items). For example:

import time
# let's generate 10 million items!
seq = ['A[1:5000000]','B[1:5000000]']
t1 = time.clock()
seq = expand_seq(seq)
t2 = time.clock()
print round(t2-t1, 3)
# RESULT: 9.541 seconds
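
For readers on Python 3, where `xrange` and `time.clock` no longer exist, a comparable measurement can be sketched as follows. This is only an illustration under stated assumptions: `time_call` is a hypothetical helper, the input is deliberately smaller than the 10 million items above, and no particular timing is implied.

```python
import time


def expand_seq(seq):
    # Same logic as the one-liner above, with range in place of xrange.
    return ['%s[%s]' % (item.split('[')[0], i)
            for item in seq
            for i in range(int(item.split('[')[-1][:-1].split(':')[0]),
                           int(item.split('[')[-1][:-1].split(':')[-1]) + 1)]


def time_call(func, *args):
    # Hypothetical helper: returns (result, elapsed seconds).
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Smaller input than in the question so the sketch runs quickly.
expanded, elapsed = time_call(expand_seq, ['A[1:50000]', 'B[1:50000]'])
print(len(expanded), round(elapsed, 3))
```

`time.perf_counter` is used here because `time.clock` was removed in Python 3.8.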

I'm looking for ways to improve this function and, hopefully, speed it up when dealing with large lists. If anyone has suggestions, I would love to hear them!

Recommended answer

The following seems to give a 35% speedup:

import re

r = re.compile(r"(\w+)\[(\d+)(?::(\d+))?\]")

def expand_seq(seq):
    result = []
    for item in seq:
        m = r.match(item)
        name, start, end = m.group(1), int(m.group(2)), m.group(3)
        rng = xrange(start, int(end) + 1) if end else (start,)  # end index is inclusive
        t = name + "["
        result.extend(t + str(i) + "]" for i in rng)
    return result

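With this code:

  • We compile the regular expression once, outside the function, and reuse it on every call.
  • We concatenate the strings directly instead of using % formatting.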
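
The two points above can be sketched for Python 3 as follows. This is a hedged adaptation rather than the answerer's exact code: `range` stands in for `xrange`, the name `ITEM_RE` and the `ValueError` for malformed items are additions made for illustration, and the end index is treated as inclusive so that the result matches the expected output given in the question.

```python
import re

# Compiled once at import time, so the pattern is not re-parsed on each call.
# ITEM_RE is an illustrative name, not taken from the original answer.
ITEM_RE = re.compile(r"(\w+)\[(\d+)(?::(\d+))?\]")


def expand_seq(seq):
    result = []
    for item in seq:
        m = ITEM_RE.match(item)
        if m is None:
            # Assumption: reject malformed items rather than fail later.
            raise ValueError("unrecognised item: %r" % (item,))
        name, start, end = m.group(1), int(m.group(2)), m.group(3)
        # A missing end means a single index; otherwise the end is inclusive.
        stop = int(end) if end else start
        prefix = name + "["
        # Plain concatenation instead of '%s[%s]' formatting.
        result.extend(prefix + str(i) + "]" for i in range(start, stop + 1))
    return result


print(expand_seq(['A[0]', 'B[2:5]', 'A[4]']))
```

Any speedup will depend on the interpreter and the input, so it is worth measuring on your own data.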
