How do you filter a string such that only characters in your list are returned?


Question

Imagine a string like 'Agh#$%#%2341- -!zdrkfd'. I want to perform some operation on it such that only the lowercase letters are returned (as an example), which in this case would yield 'ghzdrkfd'.

How do you do this in Python? The obvious way would be to create a list of the characters 'a' through 'z', then iterate over the characters in my string and build a new string, character by character, from only those that are in my list. This seems primitive.

I was wondering whether regular expressions are appropriate. Replacing unwanted characters seems problematic, and I tend to prefer whitelisting over blacklisting. The .match function does not seem appropriate. I have looked over the relevant page on the Python site, but have not found a method that seems to fit.

If regular expressions are not appropriate and the correct approach is looping, is there a simple function which "explodes" a string into a list? Or am I just hitting another for loop there?

Recommended answer

If you are looking for efficiency, using the translate function is the fastest you can get.

It can be used to quickly replace characters, delete them, or both.

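import string

# Python 2 only: string.maketrans returns a 256-character table.
# Mapping each lowercase letter to a space leaves a string holding every
# character except the lowercase letters, i.e. the characters to delete.
delete_table = string.maketrans(
    string.ascii_lowercase, ' ' * len(string.ascii_lowercase)
)
# Identity table: no character is replaced.
table = string.maketrans('', '')

"Agh#$%#%2341- -!zdrkfd".translate(table, delete_table)  # 'ghzdrkfd'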

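In Python 2.6 and later, you no longer need the second table:

import string

delete_table = string.maketrans(
    string.ascii_lowercase, ' ' * len(string.ascii_lowercase)
)
# Passing None as the table skips replacement and only deletes.
"Agh#$%#%2341- -!zdrkfd".translate(None, delete_table)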

This method is much faster than any other. Of course, you need to store the delete_table somewhere and reuse it. But even if you don't store it and rebuild it every time, it is still going to be faster than the other methods suggested so far.
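The snippets above rely on Python 2's string.maketrans and the two-argument form of str.translate, neither of which exists in Python 3. As a minimal sketch, assuming Python 3, the same whitelist idea can be expressed with a mapping that sends every unwanted character to None. The helper name keep_only and the constant ALLOWED are hypothetical, introduced only for illustration, and the speed claims in this answer were measured on Python 2, so they may not carry over.

```python
import string

# Hypothetical whitelist for illustration: lowercase ASCII letters.
ALLOWED = string.ascii_lowercase


def keep_only(text, allowed=ALLOWED):
    """Return text with every character not in allowed removed."""
    # In Python 3, str.translate takes a single mapping of code points;
    # mapping a code point to None deletes that character.
    delete_map = {ord(c): None for c in set(text) - set(allowed)}
    return text.translate(delete_map)


result = keep_only('Agh#$%#%2341- -!zdrkfd')
print(result)  # ghzdrkfd
```

This version builds the deletion mapping from the input on each call, which keeps it correct for arbitrary Unicode text at the cost of some setup work. If the set of possible input characters is known in advance, the mapping could be built once and reused, as the answer suggests for delete_table.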

To confirm my claims, here are the results:

for i in xrange(10000):
    ''.join(c for c in s if c.islower())

real    0m0.189s
user    0m0.176s
sys 0m0.012s

While running the regular expression solution:

for i in xrange(10000):
    re.sub(r'[^a-z]', '', s)

real    0m0.172s
user    0m0.164s
sys 0m0.004s

[Upon request] If you pre-compile the regular expression:

r = re.compile(r'[^a-z]')
for i in xrange(10000):
    r.sub('', s)

real    0m0.166s
user    0m0.144s
sys 0m0.008s

Running the translate method the same number of times took:

real    0m0.075s
user    0m0.064s
sys 0m0.012s
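The timings above come from Python 2 runs measured with the shell's time command, and the test string s is not shown. To repeat the comparison, a rough sketch using the timeit module might look like the following. It assumes Python 3, uses the sample string from the question as s, and uses bytes.translate (which still accepts a delete argument) as the closest analogue of the Python 2 call. The function names are hypothetical, and the numbers will differ by machine and interpreter version.

```python
import re
import string
import timeit

s = 'Agh#$%#%2341- -!zdrkfd'  # sample string from the question (assumed)
s_bytes = s.encode('ascii')

pattern = re.compile(r'[^a-z]')
# Every byte value except the lowercase ASCII letters is to be deleted.
delete_bytes = bytes(set(range(256)) - set(string.ascii_lowercase.encode()))


def with_join():
    # Generator expression keeping lowercase characters.
    return ''.join(c for c in s if c.islower())


def with_regex():
    # Pre-compiled regular expression removing everything outside a-z.
    return pattern.sub('', s)


def with_translate():
    # bytes.translate(None, delete) removes the listed bytes in one pass.
    return s_bytes.translate(None, delete_bytes).decode('ascii')


for fn in (with_join, with_regex, with_translate):
    seconds = timeit.timeit(fn, number=10000)
    print('%-15s %.4f s' % (fn.__name__, seconds))
```

Note that c.islower() also accepts non-ASCII lowercase letters, whereas the regular expression and the byte table only keep a through z, so the three variants agree only on ASCII input such as the sample string.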
