How can I remove non-ASCII characters but leave periods and spaces?
Question
I'm working with a .txt file. I want a string of the text from the file with no non-ASCII characters. However, I want to keep spaces and periods. At present, my code strips those too. Here's the code:
def onlyascii(char):
    if ord(char) < 48 or ord(char) > 127: return ''
    else: return char

def get_my_string(file_path):
    f = open(file_path, 'r')
    data = f.read()
    f.close()
    # Python 2: filter() over a str returns a str
    filtered_data = filter(onlyascii, data)
    filtered_data = filtered_data.lower()
    return filtered_data
How should I modify onlyascii() to leave spaces and periods? I imagine it's not too complicated but I can't figure it out.
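A minimal sketch of one possible change, assuming the goal is to keep the original 48 to 127 range and additionally allow spaces and periods. The space (code point 32) and the period (46) both fall below the ord(char) < 48 cutoff, so they are let through explicitly. Simply lowering the bound to 32 would also work, but it keeps all of !"#$%&'()*+,-./ as well. The helper filter_text is hypothetical and stands in for the filtering step of get_my_string.

```python
def onlyascii(char):
    # Explicitly keep spaces and periods, which fall below the 48 cutoff.
    if char in ' .':
        return char
    # Otherwise apply the original range check (48 to 127 inclusive).
    if ord(char) < 48 or ord(char) > 127:
        return ''
    return char


def filter_text(data):
    # Hypothetical helper: filter() keeps each char for which onlyascii
    # returns a non-empty (truthy) string; ''.join() turns the result
    # back into a str on both Python 2 and Python 3.
    return ''.join(filter(onlyascii, data)).lower()
```

For example, filter_text("Hello, World. caf\u00e9!") drops the comma, the accented e and the exclamation mark, leaving "hello world. caf".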
Recommended answer
You can remove every character that is not in string.printable (which contains only ASCII characters), like this:
>>> s = "some\x00string. with\x15 funny characters"
>>> import string
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'somestring. with funny characters'
string.printable on my machine contains:
0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c
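On Python 3, string.printable is built by concatenating four other constants from the string module, which explains the listing above. The trailing characters are string.whitespace, so filtering on string.printable also keeps tabs, newlines, carriage returns, vertical tabs and form feeds. If only letters, digits, spaces and periods should survive, a narrower allow-list is a reasonable alternative; the names allowed and keep_allowed below are hypothetical, chosen for illustration.

```python
import string

# string.printable is digits + ascii_letters + punctuation + whitespace.
assert string.printable == (string.digits + string.ascii_letters
                            + string.punctuation + string.whitespace)

# Hypothetical narrower allow-list: ASCII letters, digits, space and period.
allowed = set(string.ascii_letters + string.digits + ' .')


def keep_allowed(s):
    # Keep only characters from the allow-list, preserving their order.
    return ''.join(c for c in s if c in allowed)
```

With this allow-list, keep_allowed("a\tb. c,d\n") returns "ab. cd": the tab, comma and newline are removed even though they are all in string.printable.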
On Python 3, filter() returns a lazy filter object (an iterator) rather than a string, so the correct way to get a string back is to join the result:
''.join(filter(lambda x: x in printable, s))
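Putting this together, a Python 3 version of the question's get_my_string might look like the sketch below. It assumes the file is UTF-8 encoded; with errors='replace', any undecodable byte becomes U+FFFD, which is not in string.printable and is therefore dropped along with the other non-ASCII characters.

```python
import string

PRINTABLE = set(string.printable)


def get_my_string(file_path):
    # Assumption: the file is UTF-8; bad bytes become U+FFFD and are filtered out.
    with open(file_path, encoding='utf-8', errors='replace') as f:
        data = f.read()
    # filter() is lazy on Python 3, so join the kept characters into a str.
    return ''.join(filter(lambda c: c in PRINTABLE, data)).lower()
```

Using a with block closes the file even if reading fails, which replaces the manual f.close() in the original code.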