Fatest standard way to sum bytes (and their squares)?


Problem description

For a file hashing system (finding similar files, rather than identical
ones), I need to be able to efficiently and quickly sum the ordinals of
the bytes of a file and their squares. Because of the nature of the
application, it's a requirement that I do it in Python, or only with
standard library modules (if such facilities exist) that might assist.

So far the fastest way I've found is using the `sum` builtin and
generators::

    ordinalSum = sum(ord(x) for x in data)
    ordinalSumSquared = sum(ord(x)**2 for x in data)

This is about twice as fast as an explicit loop, but since it's going to
be processing massive amounts of data, the faster the better. Are there
any tricks I'm not thinking of, or perhaps helper functions in other
modules that I'm not thinking of?

--
Erik Max Francis && ma*@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 20 N 121 53 W && AIM, Y!M erikmaxfrancis
Whoever named it necking was a poor judge of anatomy.
-- Groucho Marx
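
For readers who want to check the "about twice as fast as an explicit loop" figure on their own machine, here is a minimal timing sketch. It is not from the thread: it assumes Python 3, where iterating a `bytes` object already yields integers (so `ord` is not needed), and the names `sums_loop`, `sums_genexp` and `compare` are made up for illustration.

```python
import timeit


def sums_loop(data):
    """Explicit loop: accumulate both sums in a single pass."""
    total = total_sq = 0
    for b in data:  # iterating bytes yields ints in Python 3
        total += b
        total_sq += b * b
    return total, total_sq


def sums_genexp(data):
    """The approach from the post: two sum() calls over generators."""
    return sum(b for b in data), sum(b * b for b in data)


def compare(data, repeat=5):
    """Return the best-of-`repeat` wall time in seconds for each variant."""
    return {
        fn.__name__: min(timeit.repeat(lambda: fn(data), number=1, repeat=repeat))
        for fn in (sums_loop, sums_genexp)
    }
```

Calling `compare(open(path, "rb").read())` on a representative file (with `path` being whatever file you want to test) returns a dict of timings. The ratio depends heavily on the interpreter version, so the post's figure may not reproduce.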

Solution

And of course I meant fastest, not "fatest." Fattest wouldn't be good,
either.


Erik Max Francis <ma*@alcyone.com> writes:

> For a file hashing system (finding similar files, rather than identical ones),
> I need to be able to efficiently and quickly sum the ordinals of the bytes of
> a file and their squares. Because of the nature of the application, it's a
> requirement that I do it in Python, or only with standard library modules (if
> such facilities exist) that might assist.
>
> So far the fastest way I've found is using the `sum` builtin and generators::
>
>     ordinalSum = sum(ord(x) for x in data)
>     ordinalSumSquared = sum(ord(x)**2 for x in data)
>
> This is about twice as fast as an explicit loop, but since it's going to be
> processing massive amounts of data, the faster the better. Are there any
> tricks I'm not thinking of, or perhaps helper functions in other modules that
> I'm not thinking of?

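Is this any faster?

    # One pass over the data: the real part accumulates the byte values and
    # the imaginary part their squares. Both results come back as floats.
    ordSum, ordSumSq = (lambda c: (c.real, c.imag))(
        sum(complex(ord(x), ord(x)**2) for x in data))

'as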


Erik Max Francis wrote:

> For a file hashing system (finding similar files, rather than identical
> ones), I need to be able to efficiently and quickly sum the ordinals of
> the bytes of a file and their squares. Because of the nature of the
> application, it's a requirement that I do it in Python, or only with
> standard library modules (if such facilities exist) that might assist.
>
> So far the fastest way I've found is using the `sum` builtin and
> generators::
>
>     ordinalSum = sum(ord(x) for x in data)
>     ordinalSumSquared = sum(ord(x)**2 for x in data)
>
> This is about twice as fast as an explicit loop, but since it's going to
> be processing massive amounts of data, the faster the better. Are there
> any tricks I'm not thinking of, or perhaps helper functions in other
> modules that I'm not thinking of?

Two ideas:

Use a lookup-table for ord(c)**2
Use array.array()
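
The two ideas above could be combined roughly as follows. This is only a sketch under stated assumptions, not the poster's code: it assumes Python 3 and that `data` is a bytes-like object, and the names `SQUARES` and `byte_sums` are hypothetical. `array.array("B", data)` exposes the bytes as unsigned integers, so no per-character `ord` call is needed, and the squares come from a 256-entry table instead of being recomputed.

```python
from array import array

# Lookup table: the square of every possible byte value (0..255).
SQUARES = [b * b for b in range(256)]


def byte_sums(data):
    """Return (sum of byte values, sum of squared byte values) for bytes-like data."""
    values = array("B", data)  # unsigned bytes; iterating yields ints
    total = sum(values)
    # map() with the table's __getitem__ keeps the per-byte work in C.
    total_sq = sum(map(SQUARES.__getitem__, values))
    return total, total_sq
```

For example, `byte_sums(b"abc")` returns `(294, 28814)`. Whether this actually beats the generator version has to be measured on the target interpreter and data.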

