Remove characters except digits from string using Python?


Question

How can I remove all characters except digits from a string?

Solution

In Python 2.*, by far the fastest approach is the .translate method:

>>> x='aaa12333bb445bb54b5b52'
>>> import string
>>> all=string.maketrans('','')
>>> nodigs=all.translate(all, string.digits)
>>> x.translate(all, nodigs)
'1233344554552'

string.maketrans makes a translation table (a string of length 256) which in this case is the same as ''.join(chr(x) for x in range(256)) (just faster to make;-). .translate applies the translation table (which here is irrelevant since all essentially means identity) AND deletes characters present in the second argument -- the key part.

.translate works very differently on Unicode strings (and strings in Python 3 -- I do wish questions specified which major-release of Python is of interest!) -- not quite this simple, not quite this fast, though still quite usable.
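For readers on Python 3, the two-argument form shown above survives on bytes objects rather than on str. The following is a minimal sketch, assuming the data is ASCII and already held as bytes; the names identity, nodigs and result are illustrative and not part of the original answer.

```python
import string

# All 256 byte values: the Python 3 counterpart of string.maketrans('', '') in 2.*
identity = bytes(range(256))
digits = string.digits.encode("ascii")

# Every byte value that is NOT an ASCII digit
nodigs = identity.translate(None, digits)

x = b"aaa12333bb445bb54b5b52"
# table=None means "no remapping"; the second argument lists the bytes to delete
result = x.translate(None, nodigs)
print(result)  # b'1233344554552'
```

Text held as str would have to be encoded first, which is only reasonable when the input is known to be plain ASCII.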

Back to 2.*, the performance difference is impressive...:

$ python -mtimeit -s'import string; all=string.maketrans("", ""); nodig=all.translate(all, string.digits); x="aaa12333bb445bb54b5b52"' 'x.translate(all, nodig)'
1000000 loops, best of 3: 1.04 usec per loop
$ python -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 7.9 usec per loop

Speeding things up by 7-8 times is hardly peanuts, so the translate method is well worth knowing and using. The other popular non-RE approach...:

$ python -mtimeit -s'x="aaa12333bb445bb54b5b52"' '"".join(i for i in x if i.isdigit())'
100000 loops, best of 3: 11.5 usec per loop

is 50% slower than RE, so the .translate approach beats it by over an order of magnitude.
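One caveat when carrying these two alternatives over to Python 3 text strings: they do not agree on what counts as a digit. The sketch below is an illustration under that assumption (Python 3, str input); the sample string and variable names are made up for the example.

```python
import re

# 'a', '1', SUPERSCRIPT TWO, ARABIC-INDIC DIGIT THREE, 'b'
s = "a1\u00b2\u0663b"

# str.isdigit() accepts superscripts and non-ASCII decimal digits
by_isdigit = "".join(c for c in s if c.isdigit())

# \D on a str pattern is Unicode-aware: it keeps decimal digits only
by_regex = re.sub(r"\D", "", s)

# re.ASCII restricts \d and \D to the characters 0-9
by_ascii_regex = re.sub(r"\D", "", s, flags=re.ASCII)

# An explicit membership test is another way to keep only 0-9
by_ascii_filter = "".join(c for c in s if c in "0123456789")

print(ascii(by_isdigit), ascii(by_regex), ascii(by_ascii_regex), ascii(by_ascii_filter))
```

If only 0-9 should survive, the re.ASCII flag or the explicit membership test is the safer choice.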

In Python 3, or for Unicode, you need to pass .translate a mapping (with ordinals, not characters directly, as keys) that returns None for what you want to delete. Here's a convenient way to express this for deletion of "everything but" a few characters:

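import string

class Del:
  """Mapping for str.translate that keeps only the characters in `keep`."""
  def __init__(self, keep=string.digits):
    # Keys are ordinals (ints), as str.translate requires; values are the kept characters.
    self.comp = dict((ord(c),c) for c in keep)
  def __getitem__(self, k):
    # Any ordinal not in `keep` yields None, which tells translate to delete it.
    return self.comp.get(k)

DD = Del()

x='aaa12333bb445bb54b5b52'
x.translate(DD)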

also emits '1233344554552'. However, putting this in xx.py we have...:

$ python3.1 -mtimeit -s'import re;  x="aaa12333bb445bb54b5b52"' 're.sub(r"\D", "", x)'
100000 loops, best of 3: 8.43 usec per loop
$ python3.1 -mtimeit -s'import xx; x="aaa12333bb445bb54b5b52"' 'x.translate(xx.DD)'
10000 loops, best of 3: 24.3 usec per loop

...which shows that, for this kind of "deletion" task, the performance advantage disappears and becomes a performance decrease.
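Since these numbers come from Python 3.1 and may look different on other interpreters, it can be worth re-measuring locally. The script below is a rough sketch of such a comparison, not the author's benchmark: KeepOnly is a hypothetical dict-based variant of the Del class above (it relies on `__missing__` instead of a custom `__getitem__`), and the function names and loop counts are arbitrary choices.

```python
import re
import string
import timeit

x = "aaa12333bb445bb54b5b52"


class KeepOnly(dict):
    """Hypothetical helper: kept ordinals map to themselves, all others to None."""

    def __init__(self, keep=string.digits):
        super().__init__((ord(c), c) for c in keep)

    def __missing__(self, key):
        # None tells str.translate to delete the character
        return None


KEEP_DIGITS = KeepOnly()
NON_DIGIT = re.compile(r"\D")


def with_translate(s):
    return s.translate(KEEP_DIGITS)


def with_regex(s):
    return NON_DIGIT.sub("", s)


def with_join(s):
    return "".join(c for c in s if c.isdigit())


candidates = {
    "translate": with_translate,
    "re.sub": with_regex,
    "join/isdigit": with_join,
}

# All approaches should agree on this ASCII input before timing them
results = {name: fn(x) for name, fn in candidates.items()}

# Best of 3 runs of 2000 calls each, reported in seconds
timings = {
    name: min(timeit.repeat(lambda fn=fn: fn(x), number=2000, repeat=3))
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:>12}: {seconds:.6f} s for 2000 calls -> {results[name]!r}")
```

The printed timings depend on the interpreter version and the machine, so they are best read as a relative ranking for your own setup rather than compared with the figures quoted above.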
