Replacing 3 lists with 2 generators

Question

I want to optimize my application by using generators: instead of creating 3 lists, I want to use 2 generators. Here is a short outline of the app in its current version:

1) Load data from a binary file -> 1st list

self.stream_data = [ struct.unpack(">H", data_file.read(2))[0] for foo in
                       xrange(self.columns*self.rows) ]

2) Create the so-called non-zero-suppressed data (all data, zeros included) -> 2nd list

self.NZS_data = list()
for row in xrange(self.rows):
    # Row-major layout: each row holds self.columns values, so that is the stride.
    self.NZS_data.append( [ self.stream_data[column + row * self.columns]
                          for column in xrange(self.columns) ] )

3) Create the zero-suppressed data (zeros dropped, coordinates kept) -> 3rd list

self.ZS_data = list()
for row in xrange(self.rows):
    for column in xrange(self.columns):
        if self.NZS_data[row][column]:
            self.ZS_data.append( [ column, row, self.NZS_data[row][column] ] )

(I know that this could be squeezed into a single list comprehension using itertools.product.)

4) Save the ZS_data list into a file.
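
The single-comprehension variant of step 3 mentioned in the parenthetical could look roughly like the sketch below. It is written for Python 3 (under Python 2, as in the question, `range` would be `xrange`); the function name `zero_suppress` and the sample grid are made up for illustration.

```python
import itertools


def zero_suppress(nzs_data, rows, columns):
    """Return [column, row, value] for every non-zero cell of a nested list."""
    return [
        [column, row, nzs_data[row][column]]
        for row, column in itertools.product(range(rows), range(columns))
        if nzs_data[row][column]
    ]


# Hypothetical 2 x 3 grid, used only to illustrate the output shape.
grid = [[0, 5, 0],
        [7, 0, 9]]
zs_data = zero_suppress(grid, rows=2, columns=3)
```

This only removes the explicit nested loops; it still indexes the nested list twice per cell, so it should not be expected to be faster on its own.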

I used Python's cProfile profiler, and most of the time (apart from reading and unpacking) is spent creating these two lists (NZS_data and ZS_data). Because I only need them for saving data into a file, I've been thinking about using 2 generators:

1) Create a generator for reading the file -> 1st generator

self.stream_data = ( struct.unpack(">H", data_file.read(2))[0] for foo in
                       xrange(self.columns*self.rows) )

2) Create the ZS_data generator (I don't really need the NZS data)

self.ZS_data = ( [column, row, self.stream_data.next()]
                 for row, column in itertools.product(xrange(self.rows),
                 xrange(self.columns))
                 if self.stream_data.next() )

This of course doesn't work properly, because the two calls to next() give me two different values from the generator.

3) Save the data into a file using the generator.
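
One way the three generator steps could fit together is sketched below. The double `next()` problem goes away if each value is pulled from the stream exactly once, by pairing it with its coordinates through `zip`, and then tested and emitted from the same variable. This is a Python 3 sketch (under Python 2, `zip` would be `itertools.izip` and `range` would be `xrange`). The helper names and the output record layout (three big-endian unsigned 16-bit integers per record) are assumptions, since the question does not say how the file is written.

```python
import io
import itertools
import struct


def read_values(data_file, count):
    """Lazily yield `count` big-endian unsigned 16-bit values from a file."""
    for _ in range(count):
        yield struct.unpack(">H", data_file.read(2))[0]


def zero_suppressed(stream, rows, columns):
    """Yield [column, row, value] for each non-zero value, in row-major order."""
    coords = itertools.product(range(rows), range(columns))
    # zip consumes exactly one stream value per coordinate pair.
    for (row, column), value in zip(coords, stream):
        if value:
            yield [column, row, value]


def write_records(out_file, records):
    """Write each record as three big-endian uint16s (assumed layout)."""
    written = 0
    for column, row, value in records:
        out_file.write(struct.pack(">3H", column, row, value))
        written += 1
    return written


# Hypothetical in-memory files standing in for the real input and output.
source = io.BytesIO(struct.pack(">6H", 0, 5, 0, 7, 0, 9))
target = io.BytesIO()
count = write_records(target, zero_suppressed(read_values(source, 6), rows=2, columns=3))
```

Nothing is materialized as a list here, so memory use stays flat, but each value still costs one `read` and one `unpack` call; whether that is faster than building the lists would need to be measured.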

I wonder how this could be done. Do you have any other ideas for optimizing this application?

Added
Generator-based solution:

def create_ZS_data(self):
    # Row-major layout: each row holds self.columns values, so that is the stride.
    self.ZS_data = ( [column, row, self.stream_data[column + row * self.columns]]
                     for row, column in itertools.product(xrange(self.rows), xrange(self.columns))
                     if self.stream_data[column + row * self.columns] )

Profiler info:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3257    1.117    0.000   71.598    0.022 decode_from_merlin.py:302(create_ZS_file)
   463419   67.705    0.000   67.705    0.000 decode_from_merlin.py:86(<genexpr>)

Jon's solution:

def create_ZS_data(self):
    self.ZS_data = list()
    for rowno, cols in enumerate(self.stream_data[i:i+self.columns] for i in xrange(0, len(self.stream_data), self.columns)):
        for colno, col in enumerate(cols):
            # col == value, (rowno, colno) = index
            if col:
                self.ZS_data.append([colno, rowno, col])


Profiler info:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3257   18.616    0.006   19.919    0.006 decode_from_merlin.py:83(create_ZS_data)
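
For reference, tables in this format can be collected with the standard `cProfile` and `pstats` modules. The sketch below is for Python 3 and profiles a stand-in workload; `build_rows` is a made-up function, not part of the application.

```python
import cProfile
import io
import pstats


def build_rows(rows, columns):
    """Stand-in workload: build a rows x columns nested list."""
    return [[row * columns + column for column in range(columns)]
            for row in range(rows)]


profiler = cProfile.Profile()
profiler.enable()
grid = build_rows(200, 200)
profiler.disable()

# Render the statistics into a string, sorted by cumulative time.
report_stream = io.StringIO()
stats = pstats.Stats(profiler, stream=report_stream)
stats.sort_stats("cumulative").print_stats()
report = report_stream.getvalue()
```

Printing `report` shows the same `ncalls / tottime / percall / cumtime` columns as the listings above.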

Recommended answer

You could possibly make the unpacking more efficient...

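# struct needs the raw bytes, not the file object: read the whole payload (2 bytes per value) first
self.data_stream = struct.unpack('>{}H'.format(self.rows*self.columns), data_file.read(2*self.rows*self.columns))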

Then reduce the looping to something like:

for rowno, cols in enumerate(self.data_stream[i:i+self.columns] for i in xrange(0, len(self.data_stream), self.columns)):
    for colno, col in enumerate(cols):
        # col == value, (rowno, colno) = index
        if col == 0:
            pass # do something
        else:
            pass # do something else

Note: untested.
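
A self-contained version of the idea in this answer might look like the following Python 3 sketch: read the whole payload in one call, unpack it in one call, then walk it one row-sized slice at a time. The function names and the sample data are assumptions made for illustration.

```python
import io
import struct


def load_values(data_file, rows, columns):
    """Read rows * columns big-endian uint16 values with a single unpack call."""
    count = rows * columns
    payload = data_file.read(2 * count)  # 2 bytes per value
    return struct.unpack(">{}H".format(count), payload)


def zero_suppress_rows(values, columns):
    """Return [colno, rowno, value] for every non-zero value."""
    result = []
    for start in range(0, len(values), columns):
        rowno = start // columns
        for colno, value in enumerate(values[start:start + columns]):
            if value:
                result.append([colno, rowno, value])
    return result


# Hypothetical in-memory file holding a 2 x 3 grid.
raw = struct.pack(">6H", 0, 5, 0, 7, 0, 9)
values = load_values(io.BytesIO(raw), rows=2, columns=3)
zs_data = zero_suppress_rows(values, columns=3)
```

The single `read` plus single `unpack` replaces one pair of calls per value, which is where the question's profile shows most of the remaining time going.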
