Is there no compression support for large sized strings in Python?


Problem description

What started as a simple test of whether it is better to load uncompressed
data directly from the hard disk, or to load compressed data and decompress
it (Windows XP SP 2, Pentium 4 3.0 GHz system with 3 GByte RAM), seems to
show that none of the compression libraries available in Python really
works for large (i.e. 500 MByte) strings.

Test the provided code and see for yourself.

At least on my system:
zlib fails to decompress, raising a memory error
pylzma fails to decompress, running endlessly and consuming 99% of CPU time
bz2 fails to compress, running endlessly and consuming 99% of CPU time

The same code works with a 10 MByte string without any problem.

So what? Is there no compression support for large sized strings in Python?
Am I doing something wrong here?
Is there an upper limit, and if yes, what is the theoretical upper limit on
the string size that each of the compression libraries can process?

The only limit I know about is 2 GByte for the python.exe process itself,
but that does not seem to be the actual problem in this case.
There are also some other strange effects when trying to create large
strings using the following code:
m = 'm'*1048576
# str1024MB = 1024*m # fails with memory error, but:
str512MB_01 = 512*m # works ok
# str512MB_02 = 512*m # fails with memory error, but:
str256MB_01 = 256*m # works ok
str256MB_02 = 256*m # works ok
and so on, down to allocating each single MByte as a separate string,
pushing python.exe to the observed upper limit of memory available to it,
reported by the Windows Task Manager as 2,065,352 KByte.
Is the question of why the str1024MB = 1024*m instruction fails, when the
memory is apparently there and the target size of 1 GByte can be reached,
out of the scope of this discussion thread, or is it the same problem that
causes the compression libraries to fail? And why is no memory error
raised in that case?

Any hints towards understanding what is going on, and why, and/or towards
a workaround are welcome.

Claudio

============================================================
# HDvsArchiveUnpackingSpeed_WriteFiles.py

strSize10MB = '1234567890'*1048576 # 10 MB
strSize500MB = 50*strSize10MB
fObj = file(r'c:\strSize500MB.dat', 'wb')
fObj.write(strSize500MB)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.zlib', 'wb')
import zlib
strSize500MBCompressed = zlib.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.pylzma', 'wb')
import pylzma
strSize500MBCompressed = pylzma.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

fObj = file(r'c:\strSize500MBCompressed.bz2', 'wb')
import bz2
strSize500MBCompressed = bz2.compress(strSize500MB)
fObj.write(strSize500MBCompressed)
fObj.close()

print
print ' Created files: '
print ' %s \n %s \n %s \n %s' % (
    r'c:\strSize500MB.dat'
    ,r'c:\strSize500MBCompressed.zlib'
    ,r'c:\strSize500MBCompressed.pylzma'
    ,r'c:\strSize500MBCompressed.bz2'
    )

raw_input(' EXIT with Enter /> ')

============================================================
# HDvsArchiveUnpackingSpeed_TestSpeed.py
import time

startTime = time.clock()
fObj = file(r'c:\strSize500MB.dat', 'rb')
strSize500MB = fObj.read()
fObj.close()
print
print 'loading uncompressed data from file: %7.3f seconds' % (time.clock()-startTime,)

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.zlib', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print
print 'loading compressed data from file: %7.3f seconds' % (time.clock()-startTime,)
import zlib
try:
    startTime = time.clock()
    strSize500MB = zlib.decompress(strSize500MBCompressed)
    print 'decompressing zlib data: %7.3f seconds' % (time.clock()-startTime,)
except:
    print 'decompressing zlib data FAILED'

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.pylzma', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print
print 'loading compressed data from file: %7.3f seconds' % (time.clock()-startTime,)
import pylzma
try:
    startTime = time.clock()
    strSize500MB = pylzma.decompress(strSize500MBCompressed)
    print 'decompressing pylzma data: %7.3f seconds' % (time.clock()-startTime,)
except:
    print 'decompressing pylzma data FAILED'

startTime = time.clock()
fObj = file(r'c:\strSize500MBCompressed.bz2', 'rb')
strSize500MBCompressed = fObj.read()
fObj.close()
print
print 'loading compressed data from file: %7.3f seconds' % (time.clock()-startTime,)
import bz2
try:
    startTime = time.clock()
    strSize500MB = bz2.decompress(strSize500MBCompressed)
    print 'decompressing bz2 data: %7.3f seconds' % (time.clock()-startTime,)
except:
    print 'decompressing bz2 data FAILED'

raw_input(' EXIT with Enter /> ')

Recommended answer

Claudio Grondi wrote:
> So what? Is there no compression support for large sized strings in
> Python?

You're probably measuring Windows' memory management rather than the
compression libraries themselves (Python delegates all memory allocations
>256 bytes to the system).

I suggest using incremental (streaming) processing instead; from what I
can tell, all three libraries support that.

</F>
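
As a minimal sketch of what such incremental processing could look like
with zlib's streaming objects (zlib.compressobj and zlib.decompressobj are
part of the standard library; the chunk size and the output file name
strSize500MB.out below are only illustrative, not from the original post):

# Incremental (streaming) zlib processing -- a sketch, not the original
# poster's code. Data is handled 1 MByte at a time, so no 500 MByte
# string is ever built in memory.
import zlib

CHUNK = 1048576  # 1 MByte per read; the size is an arbitrary example

# compress c:\strSize500MB.dat incrementally
compressor = zlib.compressobj()
fIn = file(r'c:\strSize500MB.dat', 'rb')
fOut = file(r'c:\strSize500MBCompressed.zlib', 'wb')
while True:
    data = fIn.read(CHUNK)
    if not data:
        break
    fOut.write(compressor.compress(data))
fOut.write(compressor.flush())  # write out whatever is still buffered
fIn.close()
fOut.close()

# decompress it incrementally
decompressor = zlib.decompressobj()
fIn = file(r'c:\strSize500MBCompressed.zlib', 'rb')
fOut = file(r'c:\strSize500MB.out', 'wb')
while True:
    data = fIn.read(CHUNK)
    if not data:
        break
    fOut.write(decompressor.decompress(data))
fOut.write(decompressor.flush())
fIn.close()
fOut.close()

Whether pylzma exposes an equivalent streaming interface would have to be
checked against its own documentation.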



On this system (Linux 2.6.x, AMD64, 2 GB RAM, python2.4) I am able to
construct a 1 GB string by repetition, as well as compress a 512MB
string with gzip in one gulp.


cat claudio.py
s = '1234567890'*(1048576*50)

import zlib
c = zlib.compress(s)
print len(c)
open("/tmp/claudio.gz", "wb").write(c)
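
For the bz2.compress() call that ran endlessly in the original test, the
same incremental idea can be sketched with the standard library's
bz2.BZ2Compressor (the chunk size and file names are only illustrative):

# Incremental bz2 compression -- a sketch, so bz2 never sees the whole
# 500 MByte string at once.
import bz2

compressor = bz2.BZ2Compressor()
fIn = file(r'c:\strSize500MB.dat', 'rb')
fOut = file(r'c:\strSize500MBCompressed.bz2', 'wb')
while True:
    chunk = fIn.read(1048576)  # 1 MByte at a time; size is arbitrary
    if not chunk:
        break
    fOut.write(compressor.compress(chunk))
fOut.write(compressor.flush())  # flush() returns the final compressed block
fIn.close()
fOut.close()

bz2.BZ2Decompressor works the same way for reading the file back in chunks.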

