Python object overhead?


Problem description


I'm trying to use Python to work with large pipe ('|') delimited data
files. The files range in size from 25 MB to 200 MB.

Since each line corresponds to a record, what I'm trying to do is
create an object from each record. However, it seems that doing this
causes the memory overhead to go up two or three times.

See the two examples below: running each on the same input file
results in 3x the memory usage for Example 2. (Memory usage is
checked using top.)

This happens for both Python 2.4.3 on Gentoo Linux (64bit) and Python
2.3.4 on CentOS 4.4 (64bit).

Is this "just the way it is" or am I overlooking something obvious?

Thanks,
Matt
Example 1: read lines into list:
# begin readlines.py
import sys, time
filedata = list()
file = open(sys.argv[1])
while True:
    line = file.readline()
    if len(line) == 0: break # EOF
    filedata.append(line)
file.close()
print "data read; sleeping 20 seconds..."
time.sleep(20) # gives time to check top
# end readlines.py
Example 2: read lines into objects:
# begin readobjects.py
import sys, time
class FileRecord:
    def __init__(self, line):
        self.line = line
records = list()
file = open(sys.argv[1])
while True:
    line = file.readline()
    if len(line) == 0: break # EOF
    rec = FileRecord(line)
    records.append(rec)
file.close()
print "data read; sleeping 20 seconds..."
time.sleep(20) # gives time to check top
# end readobjects.py
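
For reference, the extra memory in Example 2 comes largely from the per-instance attribute dictionary that every FileRecord carries on top of the string it wraps, plus the instance object itself. A minimal sketch that makes those pieces visible, assuming Python 2.6 or later (sys.getsizeof() does not exist on the 2.3/2.4 versions used above):

# begin sizeof_sketch.py
# Rough illustration only: sizes vary per interpreter, and getsizeof()
# does not follow references, so the three pieces are reported separately.
# A new-style class is used here; the original example used a classic
# class, but the idea is the same.
import sys

class FileRecord(object):
    def __init__(self, line):
        self.line = line

line = "field1|field2|field3|field4\n"
rec = FileRecord(line)

print(sys.getsizeof(line))          # the string itself
print(sys.getsizeof(rec))           # the bare instance
print(sys.getsizeof(rec.__dict__))  # the per-instance attribute dict
# end sizeof_sketch.py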

Recommended answer

On Fri, 23 Mar 2007 15:11:35 -0600, Matt Garman wrote:

> Is this "just the way it is" or am I overlooking something obvious?




Matt,

If you instantiate even the smallest object a large number of times, it
will be costly compared to a simple list append.

I don't think you can get around some overhead with the objects.

However, in terms of general efficiency not specifically related to
object instantiation, you should look into xreadlines().

I'd suggest doing the following instead of that while loop:

for line in open(sys.argv[1]).xreadlines():
..
--
Mark Nenadov -skype: marknenadov, web: http://www.marknenadov.com
-"Glory is fleeting, but obscurity is forever." -- Napoleon Bonapart


On Fri, 23 Mar 2007 18:27:25 -0300, Mark Nenadov
<ma**@freelance-developer.com> wrote:

> I'd suggest doing the following instead of that while loop:
>
> for line in open(sys.argv[1]).xreadlines():



Poor xreadlines method had a short life: it was born on Python 2.1 and got
deprecated on 2.3 :(

A file is now its own line iterator:

f = open(...)
for line in f:
    ...

--
Gabriel Genellina
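
For completeness, a sketch of Example 2's loop using the file object as its own line iterator, which works on the 2.3/2.4 interpreters in question and remains the idiomatic form today:

# begin readobjects_iter.py
# Example 2 iterating over the file object directly, per Gabriel's note;
# the file is still closed explicitly once the loop is done.
import sys

class FileRecord:
    def __init__(self, line):
        self.line = line

records = []
f = open(sys.argv[1])
for line in f:
    records.append(FileRecord(line))
f.close()
# end readobjects_iter.py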


On Fri, 23 Mar 2007 19:11:23 -0300, Gabriel Genellina wrote:

> Poor xreadlines method had a short life: it was born on Python 2.1 and got
> deprecated on 2.3 :(
>
> A file is now its own line iterator:
>
> f = open(...)
> for line in f:
>     ...




Gabriel,

Thanks for pointing that out! I had completely forgotten about
that!

I've tested them before. readlines() is very slow. The deprecated
xreadlines() is close in speed to open() as an iterator. In my particular
test, I found the following:

readlines()     - 32 "time units"
xreadlines()    - 0.7 "time units"
open() iterator - 0.41 "time units"

--
Mark Nenadov -skype: marknenadov, web: http://www.marknenadov.com
-"They need not trust me right away simply because the British say
that I am O.K.; but they are so ridiculous. Microphones everywhere
and planted so obviously. Why, if I bend over to smell a bowl of
flowers, I scratch my nose on a microphone."
-- Tricycle (Dushko Popov) on American Intelligence
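
A rough sketch of how such a comparison might be run, assuming a Python 2.x interpreter to match the thread (xreadlines() is gone in Python 3); the "time units" above are from Mark's own test, not from this script:

# begin timing_sketch.py
# Times three ways of walking the lines of the file named on the command
# line; results are wall-clock seconds and will differ from run to run.
import sys, time

def drain(iterable):
    # Consume the iterable so each approach does the same amount of work.
    count = 0
    for _ in iterable:
        count += 1
    return count

def time_it(label, make_iterable):
    start = time.time()
    drain(make_iterable())
    print("%-16s %.3f seconds" % (label, time.time() - start))

path = sys.argv[1]
time_it("readlines()", lambda: open(path).readlines())
time_it("xreadlines()", lambda: open(path).xreadlines())
time_it("open() iterator", lambda: open(path))
# end timing_sketch.py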

