A Faster Nested Tuple to List and Back


Problem Description

I'm trying to perform tuple to list and list to tuple conversions on nested sequences of unknown depth and shape. The calls are being made hundreds of thousands of times, which is why I'm trying to squeeze out as much speed as possible.

Any help is greatly appreciated.

Here's what I have so far...

def listify(self, seq, was, toBe):
  temp = []
  a = temp.append
  for g in seq:
    if type(g) == was:
      a(self.listify(g, was, toBe))
    else:
      a(g)
  return toBe(temp)

and call the tuple-to-list conversion like this:

self.listify((...), tuple, list)
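For reference, here is a standalone Python 3 sketch of the same conversion (the `self` parameter is dropped, since the original appears to be a method on some class not shown here):

```python
def listify(seq, was, to_be):
    """Recursively convert every `was` container in `seq` into `to_be`."""
    temp = []
    append = temp.append          # cache the bound method, as in the original
    for item in seq:
        if type(item) == was:     # exact type check, matching the original
            append(listify(item, was, to_be))
        else:
            append(item)
    return to_be(temp)

nested = (1, (2, (3, 4)), 5)
print(listify(nested, tuple, list))   # [1, [2, [3, 4]], 5]
```

The same function runs in both directions: pass `list, tuple` to freeze a nested list back into tuples.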

Yeah, I totally missed the enumerate (left over from an old implementation) and forgot to type the else branch.

Thanks for the help both of you. I'll probably go with the coroutines.

Recommended Answer

I have been working with coroutines quite a lot lately. The advantage is that you reduce the overhead of the method calls: sending a new value into a coroutine is faster than calling a function. You cannot make a recursive coroutine (it throws ValueError: generator already executing), but you can make a pool of coroutine workers, with one worker for every level of the tree. I have made some test code that works, but have not looked at the timing issues yet.

def coroutine(func):
    """ A helper function decorator from Beazley"""
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        g.next()  # Python 2 syntax; on Python 3, use next(g)
        return g
    return start

@coroutine
def cotuple2list():
    """This does the work"""
    result = None
    while True:
        (tup, co_pool) = (yield result)
        result = list(tup)
        # I don't like using append. So I am changing the data in place.
        for (i,x) in enumerate(result):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,tuple):
                result[i] = co_pool[0].send((x, co_pool[1:]))


@coroutine
def colist2tuple():
    """This does the work"""
    result = None
    while True:
        (lst, co_pool) = (yield result)
        # I don't like using append so I am changing the data in place...
        for (i,x) in enumerate(lst):
            # consider using "if hasattr(x,'__iter__')"
            if isinstance(x,list):
                lst[i] = co_pool[0].send((x, co_pool[1:]))
        result = tuple(lst)

Pure-Python alternative from HYRY's post:

def list2tuple(a):
    return tuple((list2tuple(x) if isinstance(x, list) else x for x in a))
def tuple2list(a):
    return list((tuple2list(x) if isinstance(x, tuple) else x for x in a))
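A quick round-trip check of these two helpers (restated here self-contained, with Python 3 prints):

```python
def list2tuple(a):
    return tuple(list2tuple(x) if isinstance(x, list) else x for x in a)

def tuple2list(a):
    return list(tuple2list(x) if isinstance(x, tuple) else x for x in a)

nested = [1, [2, [3, 4]], 5]
frozen = list2tuple(nested)           # freeze every level into tuples
print(frozen)                         # (1, (2, (3, 4)), 5)
print(tuple2list(frozen) == nested)   # True: the round trip is lossless
```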

Make a pool of coroutines - this is a hack of a pool, but it works:

# Make Coroutine Pools
colist2tuple_pool = [colist2tuple() for i in xrange(20) ]
cotuple2list_pool = [cotuple2list() for i in xrange(20) ]
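To see the pool in action end to end, here is a Python 3 restatement of the list-to-tuple direction (`next(g)` instead of `g.next()`, `range` instead of `xrange`); it assumes, as above, that the tree is no deeper than the pool size:

```python
def coroutine(func):
    """Prime a generator so it is ready to receive .send()."""
    def start(*args, **kwargs):
        g = func(*args, **kwargs)
        next(g)               # advance to the first yield
        return g
    return start

@coroutine
def colist2tuple():
    result = None
    while True:
        lst, co_pool = (yield result)
        # Mutate the list in place; each nested list is handed to the
        # next worker in the pool, one worker per tree level.
        for i, x in enumerate(lst):
            if isinstance(x, list):
                lst[i] = co_pool[0].send((x, co_pool[1:]))
        result = tuple(lst)

pool = [colist2tuple() for _ in range(20)]
data = [1, [2, [3, 4]], 5]
print(pool[0].send((data, pool[1:])))   # (1, (2, (3, 4)), 5)
```

Note that the input list is mutated in place along the way, which is exactly the property discussed in the timing analysis below.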

Now do some timing, comparing against the pure-Python version:

def make_test(m, n):
    # Test data function taken from HYRY's post!
    return [[range(m), make_test(m, n-1)] for i in range(n)]
import timeit
t = make_test(20, 8)
%timeit list2tuple(t)
%timeit colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))

Results - notice the 'u' next to the 's' in the second line :-)

1 loops, best of 3: 1.32 s per loop
1 loops, best of 3: 4.05 us per loop

Really seems too fast to believe. Anybody know if timeit works with coroutines? Here is the old-fashioned way:

tic = time.time()
t1 = colist2tuple_pool[0].send((t, colist2tuple_pool[1:]))
toc = time.time()
print toc - tic

Result:

0.000446081161499

Newer versions of IPython and %timeit give a warning:

The slowest run took 9.04 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 317 ns per loop

After some further investigation: Python generators are not magic, and send is still a function call. The reason my generator-based method appeared to be faster is that I was doing an in-place operation on the lists, which resulted in fewer function calls.
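The same in-place trick works in a plain recursive function, with no coroutine machinery at all; a minimal sketch (the function name is my own, not from the original post):

```python
def list2tuple_inplace(lst):
    """Replace nested lists with tuples in place, then freeze the top level.

    Unlike the generator-expression version, this reuses each existing list
    instead of building a fresh sequence per level.
    """
    for i, x in enumerate(lst):
        if isinstance(x, list):
            lst[i] = list2tuple_inplace(x)
    return tuple(lst)

data = [1, [2, [3, 4]], 5]
print(list2tuple_inplace(data))   # (1, (2, (3, 4)), 5)
```

Note the trade-off: the caller's list is destroyed in the process, so this only suits cases where the original nested list is no longer needed.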

I wrote all this out with lots of additional detail in a recent talk.

Hope this helps someone looking to play with generators.
