Why is str.translate much faster in Python 3.5 compared to Python 3.4?


Question

I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.

The minimal code is:

import sys 
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))
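
As a side note, the mapper above is built by scanning every code point. A minimal sketch of a more direct construction follows, assuming the three-argument form of str.maketrans (the third argument lists characters to delete). The names scanned_mapper and direct_mapper are only illustrative.

```python
import sys

s = 'abcde12345@#@$#%$'

# Approach from the question: scan the code points and keep the three to delete.
scanned_mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')

# Same table built directly from the characters to delete.
direct_mapper = str.maketrans('', '', '@#$')

# Both are dicts mapping ordinals to None, which tells translate() to drop them.
same_table = scanned_mapper == direct_mapper
cleaned = s.translate(direct_mapper)
print(same_table, cleaned)
```

Only the construction of the table differs here. The s.translate(...) call that the question times receives the same dict either way.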

It works as expected. However, the same program shows a large difference in running time between Python 3.4 and Python 3.5.

The command used to measure the timings is:

python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); "   "s.translate(mapper)"

On Python 3.4 this takes 1.3 ms, whereas the same program on Python 3.5 takes only 26.4 μs.

What has improved in Python 3.5 that makes it so much faster than Python 3.4?

Recommended answer

TL;DR - ISSUE 21118

The long story

Josh Rosenberg found that str.translate() was very slow compared to bytes.translate() and raised an issue, stating that:

In Python 3, str.translate() is usually a performance pessimization, not optimization.

Why was str.translate() slow?

The main reason str.translate() was so slow is that every lookup went through a Python dictionary.

Using maketrans made this problem worse. The corresponding approach for bytes builds a C array of 256 items for fast table lookup. Hence the use of a higher-level Python dict made str.translate() in Python 3.4 very slow.
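
To make the contrast concrete, here is a small sketch of the bytes side. It shows that bytes.maketrans returns a flat 256-entry table indexed by byte value, not a dict. The sample data and variable names are only illustrative.

```python
# bytes.maketrans returns a flat 256-byte table: index = input byte, value = output byte.
table = bytes.maketrans(b'abc', b'xyz')
table_size = len(table)

data = b'abcde12345@#@$#%$'
# The second argument to bytes.translate lists the bytes to delete.
translated = data.translate(table, b'@#$')
print(table_size, translated)
```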

The first approach was a small patch, translate_writer, but the speed increase was not that satisfying. Soon another patch, fast_translate, was tested, and it yielded very good results, with a speedup of up to 55%.

The main change, as can be seen from the patch, is that the Python dictionary lookup is replaced by a C-level lookup.
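
The idea of swapping repeated dict lookups for a small table lookup can be sketched in pure Python. This is only a conceptual illustration under my own assumptions, not the CPython code: translate_ascii_cached is a hypothetical helper that consults the dict once per distinct ASCII character and afterwards reads a 128-slot list. In pure Python it will not be faster than the built-in. It only shows the shape of the technique.

```python
def translate_ascii_cached(text, mapping):
    """Conceptual sketch: consult the dict at most once per distinct ASCII character."""
    cache = [None] * 128      # one slot per ASCII code point
    filled = [False] * 128    # marks which slots have been looked up already
    out = []
    for ch in text:
        code = ord(ch)
        if code >= 128:
            # Outside the table: fall back to a plain dict lookup.
            repl = mapping.get(code, code)
        else:
            if not filled[code]:
                cache[code] = mapping.get(code, code)
                filled[code] = True
            repl = cache[code]
        if repl is None:
            continue          # None means "delete this character"
        out.append(chr(repl) if isinstance(repl, int) else repl)
    return ''.join(out)


s = 'abcde12345@#@$#%$'
mapper = str.maketrans('', '', '@#$')
result = translate_ascii_cached(s, mapper)
print(result)
```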

The speed is now almost the same as that of bytes:

                                unpatched           patched

str.translate                   4.55125927699919    0.7898181750006188
str.translate from bytes trans  1.8910855210015143  0.779950579000797


A small note here: the performance improvement is prominent only for ASCII strings.

As J.F. Sebastian mentions in a comment below, before 3.5 translate worked the same way for both ASCII and non-ASCII input. From 3.5 onward, however, the ASCII case is much faster.

Earlier, ASCII and non-ASCII performance was almost the same; now there is a large difference between the two, as is also noted in a linked answer.

The following commands demonstrate this:

python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop

python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop

Tabulated results (μs per loop):

             Python 3.4    Python 3.5
ASCII         101           2.3
Non-ASCII     91.2          117