Python speed-up
Problem description
Hi all,

I am working on a Huffman encoding exercise, but it is kinda slow. This is
not a big problem, I do this to educate myself :)

So I started profiling the code, and the slowdown was actually taking place
in places where I didn't expect it.

The first is after I have created a lookup-table dictionary with encodings like
{'d': '0110', 'e': '01', etc.}, when I encode the original text like this:

    for c in original_text:
        encoded_text += table[c]

I appreciate that the text is long, but that isn't a problem for
character frequency counting, for example. Why is this slow?

The second place where the slowdown occurs is when I try to chop the encoded
string of 0's and 1's into pieces of eight, like this:

    chr_list = []  # resulting list
    while 1:
        chr_list.append(encoded_text[:8])  # take 8 bits from the string and put them in the list
        encoded_text = encoded_text[8:]  # truncate the string
        if len(encoded_text) < 8:  # end of string reached
            chr_list.append(encoded_text)
            break

I hope someone can tell me why these are slow.

Regards,
Guyon

Recommended answer
On Wed, 22 Sep 2004 16:06:04 +0200, Guyon Morée wrote:

> Hi all,
>
> I am working on a Huffman encoding exercise, but it is kinda slow. This is
> not a big problem, I do this to educate myself :)
>
> So I started profiling the code, and the slowdown was actually taking place
> in places where I didn't expect it.
>
> The first is after I have created a lookup-table dictionary with encodings like
> {'d': '0110', 'e': '01', etc.}, when I encode the original text like this:
>
>     for c in original_text:
>         encoded_text += table[c]

Hi Guyon,

this is slow. Try this:

    e_t = []
    for c in original_text:
        e_t.append(table[c])
    e_t = ''.join(e_t)

Your solution creates a new string for every +=.

HTH,
Thomas

The statements

    s += t

and

    s = s[j:]

(strings s and t; integer j) both take time that grows with the length of s.
The first creates a new string, copies len(s)+len(t) characters into it,
and then assigns the new string to s. The second creates a new string,
copies len(s)-j characters into it, and assigns the new string to s.
This is mostly a consequence of string immutability, though it's possible
that sneaky optimizations might be able to make these operations faster
in a future version of Python.
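
To make that cost concrete, here is a rough sketch that counts how many characters get copied under the simple model just described, where every `s += t` copies len(s)+len(t) characters. The function names and the sample input are made up for illustration, and the model deliberately ignores any in-place optimization an interpreter might apply.

```python
def chars_copied_by_concat(parts):
    """Characters copied if every `s += t` builds a brand-new string."""
    copied = 0
    length = 0  # current length of the accumulated string s
    for t in parts:
        length += len(t)  # the new string holds len(s) + len(t) characters
        copied += length  # all of them are copied into the new string
    return copied


def chars_copied_by_join(parts):
    """Characters copied if the parts are joined once at the end."""
    return sum(len(t) for t in parts)


parts = ["01"] * 1000  # hypothetical input: 1000 two-bit codes
print(chars_copied_by_concat(parts))  # grows roughly with the square of the count
print(chars_copied_by_join(parts))    # grows linearly with the count
```

Under this model the repeated concatenation copies about 500 times as many characters as the single join for 1000 parts, and the ratio keeps growing as the text gets longer.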

You might write the first loop as

    encoded_text = "".join([table[c] for c in original_text])

which can be written out as

    encoded_text_parts = []
    for c in original_text:
        encoded_text_parts.append(table[c])
    encoded_text = "".join(encoded_text_parts)

Accumulating string parts in a list and then joining them is one common
way to increase performance of code that repeatedly joins strings.
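
If you want to check this on your own machine, a small timing sketch along these lines may help. The code table and the input text are hypothetical stand-ins, not the ones from the question, and the numbers will vary by interpreter: some CPython versions can extend a string in place in simple cases, so the measured gap may be smaller than the copy-counting argument suggests.

```python
import timeit

# Hypothetical code table and input text, for illustration only.
table = {"d": "0110", "e": "01", "a": "11", "b": "000"}
original_text = "deadbead" * 20000


def encode_concat(text, table):
    """Build the result with repeated += (the approach from the question)."""
    encoded_text = ""
    for c in text:
        encoded_text += table[c]
    return encoded_text


def encode_join(text, table):
    """Collect the parts in a list and join them once."""
    return "".join([table[c] for c in text])


# Both approaches must produce the same encoded string.
assert encode_concat(original_text, table) == encode_join(original_text, table)

for func in (encode_concat, encode_join):
    seconds = timeit.timeit(lambda: func(original_text, table), number=5)
    print(func.__name__, round(seconds, 4))
```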

For the second, consider writing

    chr_list = [encoded_text[i:i+8] for i in range(0, len(encoded_text), 8)]

which can be written out as

    chr_list = []
    for i in range(0, len(encoded_text), 8):
        chr_list.append(encoded_text[i:i+8])

Instead of whittling away encoded_text, just take the 8 characters
you're interested in each time.

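
As a quick check, the sketch below wraps both versions in functions (the function names are mine, not from the thread) and runs them on two made-up bit strings. One thing it seems to show: the two agree when a partial piece is left over, but the loop from the question appends an extra empty string when the length is an exact multiple of 8.

```python
def chunks_of_eight(bits):
    """Slice out each 8-character piece by index, leaving `bits` untouched."""
    return [bits[i:i + 8] for i in range(0, len(bits), 8)]


def chunks_by_whittling(bits):
    """The loop from the question, with the same behaviour."""
    chr_list = []
    while 1:
        chr_list.append(bits[:8])
        bits = bits[8:]  # copies the remainder of the string every time
        if len(bits) < 8:
            chr_list.append(bits)
            break
    return chr_list


print(chunks_of_eight("0110010101100"))         # 13 bits: one full piece, one partial
print(chunks_by_whittling("0110010101100"))     # same result
print(chunks_of_eight("0110010101100110"))      # 16 bits: two full pieces
print(chunks_by_whittling("0110010101100110"))  # two pieces plus a trailing ''
```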
Jeff

String concatenation in Python might be slower than you think it is,
because it requires building a new string, which involves a memory
allocation and a copy. Suggested reading:

<http://www.skymind.com/~ocrow/python_string/>

For the second part, try replacing

    encoded_text = encoded_text[8:]

with

    del encoded_text[:8]

(this only works if encoded_text is a list rather than a string, since
strings do not support item deletion). Or, use an index variable and don't
mutate the list at all.
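
Here is a minimal sketch of both suggestions, using a made-up 13-bit input. The variable names `pos`, `bits` and `chr_list_del` are mine. Note that deleting from the front of a list still has to shift the remaining items, so the index version is probably the safer bet for long inputs.

```python
encoded_text = "0110010101100"  # hypothetical 13-bit input

# Slice deletion is rejected on a str.
try:
    del encoded_text[:8]
except TypeError as exc:
    print("del on a str fails:", exc)

# Option 1: walk an index over the string and never modify it.
chr_list = []
pos = 0
while pos < len(encoded_text):
    chr_list.append(encoded_text[pos:pos + 8])
    pos += 8

# Option 2: keep the bits in a list, which does support del.
bits = list(encoded_text)
chr_list_del = []
while bits:
    chr_list_del.append("".join(bits[:8]))
    del bits[:8]  # removes the first 8 items in place

print(chr_list)
print(chr_list_del)
```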