pickle alternative

Question

I've written a simple module which serializes these Python types:

IntType, TupleType, StringType, FloatType, LongType, ListType, DictType

It is available for perusal here:

http://aspn.activestate.com/ASPN/Coo.../Recipe/415503

It appears to work faster than pickle; however, the decode process is
much slower (5x) than the encode process. Has anyone got any tips on
ways I might speed this up?

Sw.

Answer

simonwittber wrote:
> I've written a simple module which serializes these python types:
>
> IntType, TupleType, StringType, FloatType, LongType, ListType, DictType

For simple data types consider "marshal" as an alternative to "pickle".
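
A minimal sketch of the marshal suggestion in Python 3. The sample dictionary is made up for illustration and only uses the kinds of values listed in the question (Python 3 folds `long` into `int`). Timing both loaders on your own data with `timeit` is a fairer test than any general claim about which is faster.

```python
import marshal
import pickle
import timeit

# Made-up sample covering the listed types (Python 3 has no separate long type).
sample = {
    "ints": [1, -2, 2**70],
    "pair": (3.5, "text"),
    "nested": {"k": [1, 2, 3]},
}

blob = marshal.dumps(sample)
pickled = pickle.dumps(sample)

# Both modules round-trip these built-in types unchanged.
assert marshal.loads(blob) == sample
assert pickle.loads(pickled) == sample

# Compare decode time; the numbers depend on the machine and the data.
marshal_time = timeit.timeit(lambda: marshal.loads(blob), number=10000)
pickle_time = timeit.timeit(lambda: pickle.loads(pickled), number=10000)
print("marshal.loads: %.4fs  pickle.loads: %.4fs" % (marshal_time, pickle_time))
```

One design caveat: the marshal documentation says its format may change between Python versions, so it suits transient data better than long-term storage.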

> It appears to work faster than pickle, however, the decode process is
> much slower (5x) than the encode process. Has anyone got any tips on
> ways I might speed this up?

> def dec_int_type(data):
>     value = int(unpack('!i', data.read(4))[0])
>     return value

That 'int' isn't needed -- unpack returns an int, not a string
representation of the int.
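
A quick check of that point in Python 3: `struct.unpack` always returns a tuple, and for the `"!i"` format its single element is already an `int`, so indexing with `[0]` (or tuple unpacking) is all that is needed.

```python
import struct

# Four big-endian bytes encoding 42 as a signed 32-bit integer.
raw = b"\x00\x00\x00\x2a"

result = struct.unpack("!i", raw)
print(result)          # (42,)
value = result[0]
print(type(value))     # <class 'int'>

# Tuple-unpacking syntax avoids the explicit index as well.
(same,) = struct.unpack("!i", raw)
assert same == value == 42
```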

BTW, your code won't work on 64 bit machines.
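
The thread does not say exactly why. One plausible reading, assumed here, is that every `IntType` is packed with the 4-byte `"!i"` format, while a Python int on a 64-bit build can hold values that need 8 bytes. The sketch shows that overflow and an 8-byte `"!q"` alternative; `encode_int` is a hypothetical helper, not part of the recipe.

```python
import struct

def encode_int(value):
    """Hypothetical helper: always emit 8 bytes, big-endian, signed."""
    return struct.pack("!q", value)

big = 2**40  # fits in 64 bits, but not in 32

try:
    struct.pack("!i", big)
except struct.error as exc:
    # The 4-byte format rejects values outside the signed 32-bit range.
    print("4-byte format fails:", exc)

packed = encode_int(big)
assert len(packed) == 8
assert struct.unpack("!q", packed)[0] == big
```

With the `!` prefix, `struct` uses standard sizes rather than the platform's native ones (`"!i"` is always 4 bytes, `"!q"` always 8), so the byte count stays the same on every machine.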

> def enc_long_type(obj):
>     return "%s%s%s" % ("B", pack("!L", len(str(obj))), str(obj))

There's no need to compute str(long) twice -- for large longs
it takes a lot of work to convert to base 10. For that matter,
it's faster to convert to hex, and the hex form is more compact.
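
A minimal Python 3 sketch of both suggestions, keeping the recipe's layout of a `"B"` type code, a 4-byte `"!L"` length, then the digits. The digits are computed once, in hexadecimal via `format(obj, "x")`, and decoded with `int(text, 16)`. `dec_long_type` and its `read` argument are illustrative names, not taken from the recipe.

```python
import io
import struct

def enc_long_type(obj):
    # Convert the integer to text once; hex avoids the costly base-10 conversion.
    digits = format(obj, "x").encode("ascii")  # e.g. 255 -> b"ff", -255 -> b"-ff"
    return b"B" + struct.pack("!L", len(digits)) + digits

def dec_long_type(read):
    # Called after the "B" type code has been consumed; read is a file-like .read.
    length = struct.unpack("!L", read(4))[0]
    return int(read(length).decode("ascii"), 16)

n = 2**100
stream = io.BytesIO(enc_long_type(n))
assert stream.read(1) == b"B"
assert dec_long_type(stream.read) == n

# Compare digit counts for the same value in hex and in base 10.
print(len(format(n, "x")), len(str(n)))
```

Hex has a further advantage in current Python: since CPython 3.11, decimal conversion of very large integers is capped (4300 digits by default), while power-of-two bases such as hex are exempt.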

Every decode you do requires several function calls. While
less elegant, you'll likely get better performance (test it!)
if you minimize that; try something like this:

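from io import BytesIO
import struct

def decode(data):
    return _decode(BytesIO(data).read)

# Binding struct.unpack as a default argument turns a global lookup into a local one.
def _decode(read, unpack=struct.unpack):
    code = read(1)
    if not code:
        raise IOError("reached the end of the file")
    if code == b"I":
        return unpack("!i", read(4))[0]
    if code == b"F":
        return unpack("!f", read(4))[0]
    if code == b"L":
        count = unpack("!i", read(4))[0]  # unpack returns a tuple
        return [_decode(read) for i in range(count)]
    if code == b"D":
        count = unpack("!i", read(4))[0]
        # Assumes each entry is stored as key, then value.
        return dict([(_decode(read), _decode(read)) for i in range(count)])
    # ... (remaining type codes)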
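
The decoder only makes sense against a matching byte layout. Below is a hypothetical encoder for that layout (one-byte type codes `I`, `F`, `L`, `D`, with 4-byte big-endian `"!i"` integers and counts). The dict branch assumes each entry is written as key then value, which the thread does not specify. Floats use the 4-byte `"!f"` format as in the decoder, which rounds Python's double-precision floats; `"!d"` would keep them exact.

```python
import struct

def encode(obj):
    # Hypothetical encoder; type codes and count format follow the decoder sketch.
    if isinstance(obj, int):
        return b"I" + struct.pack("!i", obj)
    if isinstance(obj, float):
        return b"F" + struct.pack("!f", obj)
    if isinstance(obj, list):
        body = b"".join(encode(item) for item in obj)
        return b"L" + struct.pack("!i", len(obj)) + body
    if isinstance(obj, dict):
        # Assumed entry layout: encoded key immediately followed by encoded value.
        body = b"".join(encode(k) + encode(v) for k, v in obj.items())
        return b"D" + struct.pack("!i", len(obj)) + body
    raise TypeError("no type code for %s" % type(obj).__name__)

print(encode([1, 0.5]))
```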

Andrew
da***@dalkescientific.com


> For simple data types consider "marshal" as an alternative to "pickle".
From the marshal documentation: "Warning: The marshal module is not intended to be secure against
erroneous or maliciously constructed data. Never unmarshal data
received from an untrusted or unauthenticated source."

> BTW, your code won't work on 64 bit machines.

Any idea how this might be solved? The number of bytes used has to be
consistent across platforms. I guess this means I cannot use the struct
module?

> There's no need to compute str(long) twice -- for large longs
> it takes a lot of work to convert to base 10. For that matter,
> it's faster to convert to hex, and the hex form is more compact.

Thanks for the tip.

Sw.


simonwittber wrote:
> From the marshal documentation: Warning: The marshal module is not intended to be secure against
> erroneous or maliciously constructed data. Never unmarshal data
> received from an untrusted or unauthenticated source.

Ahh, I had forgotten that. Though I can't recall what an attack
might be, I think it's because the C code hasn't been fully vetted
for unexpected error conditions.

> Any idea how this might be solved? The number of bytes used has to be
> consistent across platforms. I guess this means I cannot use the struct
> module?

How do you want to solve it? Should a 64 bit machine be able to read
a data stream made on a 32 bit machine? What about vice versa? How
are floats interconverted?

You could preface the output stream with a description of the encoding
used: version number, size of float, size of int (which should always
be sizeof float these days, I think). Read these then use that
information to figure out which decode/dispatch function to use.
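
A minimal sketch of that idea. The three-byte header (version, size of int, size of float), the `VERSION` constant, and the format tables are all assumptions made for illustration. The reader uses the header to pick `struct` formats, which with the `!` prefix have the same standard sizes on every platform; a fuller version might select whole decode functions the same way.

```python
import io
import struct

VERSION = 1       # hypothetical format version
HEADER = "!BBB"   # version, size of int, size of float: one unsigned byte each

INT_FORMATS = {4: "!i", 8: "!q"}
FLOAT_FORMATS = {4: "!f", 8: "!d"}

def write_header(int_size, float_size):
    return struct.pack(HEADER, VERSION, int_size, float_size)

def read_header(read):
    """Return the (int_format, float_format) pair announced by the stream."""
    version, int_size, float_size = struct.unpack(HEADER, read(struct.calcsize(HEADER)))
    if version != VERSION:
        raise ValueError("unsupported stream version %d" % version)
    return INT_FORMATS[int_size], FLOAT_FORMATS[float_size]

# A stream written with 8-byte ints and floats, followed by one of each.
payload = write_header(8, 8) + struct.pack("!q", 2**40) + struct.pack("!d", 0.1)
stream = io.BytesIO(payload)
int_fmt, float_fmt = read_header(stream.read)
number = struct.unpack(int_fmt, stream.read(struct.calcsize(int_fmt)))[0]
fraction = struct.unpack(float_fmt, stream.read(struct.calcsize(float_fmt)))[0]
assert (number, fraction) == (2**40, 0.1)
```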

Andrew
da***@dalkescientific.com

