How can I remove non-ASCII characters but leave periods and spaces?

Question

I'm working with a .txt file. I want the text from the file as a string with all non-ASCII characters removed. However, I want to keep spaces and periods, and at present I'm stripping those too. Here's the code:

def onlyascii(char):
    if ord(char) < 48 or ord(char) > 127: return ''
    else: return char

def get_my_string(file_path):
    f=open(file_path,'r')
    data=f.read()
    f.close()
    filtered_data=filter(onlyascii, data)
    filtered_data = filtered_data.lower()
    return filtered_data

How should I modify onlyascii() to leave spaces and periods? I imagine it's not too complicated but I can't figure it out.

Recommended answer

You can filter out every character that is not printable by testing membership in string.printable, like this (a Python 2 session, where filter() applied to a string returns a string):

>>> s = "some\x00string. with\x15 funny characters"
>>> import string
>>> printable = set(string.printable)
>>> filter(lambda x: x in printable, s)
'somestring. with funny characters'
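If you would rather keep the question's own range-based approach, spaces and periods disappear because onlyascii() rejects everything below 48 ('0'), while ord(' ') is 32 and ord('.') is 46. A minimal Python 3 sketch of a direct fix, assuming you want exactly the original 48 to 127 range plus spaces and periods (the names keep_char and strip_chars are illustrative, not from the question):

```python
def keep_char(char):
    """Return True for characters to keep.

    Spaces (32) and periods (46) fall below the original lower bound of 48,
    so they are allowed explicitly; everything else must be in 48..127.
    """
    return char in ' .' or 48 <= ord(char) <= 127


def strip_chars(text):
    # In Python 3, filter() returns an iterator, so join it back into a str.
    return ''.join(filter(keep_char, text))


print(strip_chars("Caf\u00e9 au lait. 100%"))  # prints: Caf au lait. 100
```

This still drops other ASCII punctuation below 48, such as commas and apostrophes, just as the original function did.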

string.printable on my machine contains:

0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c

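Because string.printable also includes the whitespace characters \t, \n, \r, \x0b and \x0c, tabs and line breaks from the file survive this filter. If you want only the space plus visible ASCII characters (code points 32 to 126), one option, assuming Python 3.7 or later for str.isascii(), is to combine str.isascii() with str.isprintable(). The sample text below is made up for illustration:

```python
import string

text = "line one.\tTab\nline two\u2026 done."

# Membership in string.printable keeps tabs and newlines.
kept_printable = ''.join(c for c in text if c in string.printable)

# isascii() rejects code points above 127; isprintable() rejects control
# characters such as '\t', '\n' and '\x7f', but accepts the plain space.
kept_visible = ''.join(c for c in text if c.isascii() and c.isprintable())

print(repr(kept_printable))  # 'line one.\tTab\nline two done.'
print(repr(kept_visible))    # 'line one.Tabline two done.'
```

If words on adjacent lines should stay separated, replace '\n' with ' ' before applying the stricter filter.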
On Python 3, filter() returns an iterator rather than a string, so join the result to get a string back:

''.join(filter(lambda x: x in printable, s))
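Putting this together, here is a sketch of the question's get_my_string() for Python 3. It swaps in a different technique for the non-ASCII part: opening the file with encoding='ascii' and errors='ignore' makes the decoder skip every byte above 127, which assumes the file uses an ASCII-compatible encoding such as UTF-8 or Latin-1 (it would not work for UTF-16). The string.printable filter then removes leftover control characters. The PRINTABLE constant and the temporary-file demo are illustrative additions, not part of the original answer:

```python
import os
import string
import tempfile

PRINTABLE = set(string.printable)


def get_my_string(file_path):
    # Decoding as ASCII with errors='ignore' silently drops bytes 0x80-0xFF,
    # assuming an ASCII-compatible encoding such as UTF-8 or Latin-1.
    with open(file_path, encoding='ascii', errors='ignore') as f:
        data = f.read()
    # Remove remaining ASCII control characters such as '\x00' and '\x15';
    # spaces and periods are in string.printable, so they are kept.
    filtered_data = ''.join(ch for ch in data if ch in PRINTABLE)
    return filtered_data.lower()


# Demo: write UTF-8 bytes containing a non-ASCII letter and a NUL byte.
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write("Caf\u00e9 Menu.\x00 Open Daily.".encode('utf-8'))

result = get_my_string(tmp.name)
os.remove(tmp.name)
print(result)  # caf menu. open daily.
```

Using a with block also guarantees the file is closed even if reading fails, which the original open()/close() pair does not.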
