如何过滤一个字符串,以便只返回列表中的字符? [英] How do you filter a string such that only characters in your list are returned?

查看:128
本文介绍了如何过滤一个字符串,以便只返回列表中的字符?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

设想一个字符串,比如'Agh#$%#%2341- - !zdrkfd',我只希望对它进行一些操作,使得只返回小写字母(作为一个例子),在这种情况下带来'ghzdrkfd'。



你如何在Python中做到这一点?显而易见的方法是创建一个字符列表a到z,然后迭代我的字符串中的字符,然后逐个字符地构建一个新的字符串。这似乎是原始的。



我想知道正则表达式是否合适。替换不需要的字符似乎有问题,我倾向于将白名单列入黑名单。 .match 函数似乎不太合适。我已经查看了Python网站上的适当页面,但还没有找到一个似乎适合的方法。



如果正则表达式不合适,并且正确的方法是循环,是否有一个简单的函数,将一个字符串爆炸成一个列表?或者我只是打另一个for循环呢?

解决方案

如果你正在寻找效率。使用翻译功能是最快速的。



它可以用来快速替换字符和/或删除它们。

  import字符串
delete_table = string.maketrans(
string.ascii_lowercase,''len(string.ascii_lowercase)

table = string.maketrans('','')

agh#$%#%2341- - !zdrkfd.translate(table,delete_table)

在Python 2.6中:不再需要第二个表

  import string 
delete_table = string.maketrans(
string.ascii_lowercase,''len(string.ascii_lowercase)

Agh#$%#%2341- - !zdrkfd .translate(None,delete_table)

这种方法比其他任何方法都快。当然,你需要将delete_table存储在某个地方并使用它。但是,即使你不存储它,并且每次都建立它,它仍然会比其他建议的方法到目前为止更快。



确认我的说法是结果:

  for xrange(10000):
''.join(c for c in s if c.islower())

real 0m0.189s
user 0m0.176s
sys 0m0.012s

运行正则表达式解决方案:

  ):
re.sub(r'[^ az]','',s)

real 0m0.172s
user 0m0.164s
sys 0m0。 004s

[请求时] 如果预编译正则表达式:


$ b $

  r = re.compile(r'[^ az]')
for xrange(10000)
r.sub('',s)

real 0m0.166s
user 0m0.144s
sys 0m0.008s





$ b

运行翻译方法的次数相同p> real 0m0.075s
user 0m0.064s
sys 0m0.012s


Imagine a string, like 'Agh#$%#%2341- -!zdrkfd' and I only wish to perform some operating on it such that only the lowercase letters are returned (as an example), which in this case would bring 'ghzdrkfd'.

How do you do this in Python? The obvious way would be to create a list, of characters, 'a' through 'z', then iterate over the characters in my string and build a new string, character by character, of those in my list only. This seems primitive.

I was wondering if regular expressions are appropriate. Replacing unwanted characters seems problematic and I tend to prefer whitelisting over blacklisting. The .match function does not seem appropriate. I have looked over the appropriate page on the Python site, but have not found a method which seems to fit.

If regular expressions are not appropriate and the correct approach is looping, is there a simple function which "explodes" a string into a list? Or am I just hitting another for loop there?

解决方案

If you are looking for efficiency. Using the translate function is the fastest you can get.

It can be used to quickly replace characters and/or delete them.

import string
delete_table  = string.maketrans(
    string.ascii_lowercase, ' ' * len(string.ascii_lowercase)
)
table = string.maketrans('', '')

"Agh#$%#%2341- -!zdrkfd".translate(table, delete_table)

In python 2.6: you don't need the second table anymore

import string
delete_table  = string.maketrans(
    string.ascii_lowercase, ' ' * len(string.ascii_lowercase)
)
"Agh#$%#%2341- -!zdrkfd".translate(None, delete_table)

This is method is way faster than any other. Of course you need to store the delete_table somewhere and use it. But even if you don't store it and build it every time, it is still going to be faster than other suggested methods so far.

To confirm my claims here are the results:

for i in xrange(10000):
    ''.join(c for c in s if c.islower())

real    0m0.189s
user    0m0.176s
sys 0m0.012s

While running the regular expression solution:

for i in xrange(10000):
    re.sub(r'[^a-z]', '', s)

real    0m0.172s
user    0m0.164s
sys 0m0.004s

[Upon request] If you pre-compile the regular expression:

r = re.compile(r'[^a-z]')
for i in xrange(10000):
    r.sub('', s)

real    0m0.166s
user    0m0.144s
sys 0m0.008s

Running the translate method the same number of times took:

real    0m0.075s
user    0m0.064s
sys 0m0.012s

这篇关于如何过滤一个字符串,以便只返回列表中的字符?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆