Python - Replace non-ASCII character in string (»)


Question

I need to replace the character "»" in a string with a whitespace, but I keep getting an error. This is the code I use:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup

# other code

soup = BeautifulSoup(data, 'lxml')
mystring = soup.find('a').text.replace(' »','')




UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in position 13: ordinal not in range(128)

But if I test it with this other script:

# -*- coding: utf-8 -*-
a = "hi »"
b = a.replace('»','') 

It works. Why is that?

Recommended answer

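Your second script works because, in Python 2, both a and '»' are UTF-8 byte strings, so replace() compares raw bytes and no implicit ASCII conversion takes place. To replace text with str.replace() in general (Python 2, where str holds bytes), first decode the string to Unicode, then replace the text, and finally encode the result back to the original encoding: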

>>> a = "hi »"
>>> a.decode('utf-8').replace("»".decode('utf-8'), "").encode('utf-8')
'hi '
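
The snippet above relies on Python 2, where str has a decode() method. As a rough sketch of the same decode, replace, encode round trip in Python 3 (where only bytes can be decoded), something like the following should work. The helper name replace_in_utf8_bytes is hypothetical and only used for illustration, and the input is assumed to be valid UTF-8:

```python
# Sketch: decode -> replace -> encode, written for Python 3.
# `replace_in_utf8_bytes` is a hypothetical helper, not part of any library.
# Assumption: `raw` contains valid UTF-8 bytes.

def replace_in_utf8_bytes(raw, old, new):
    """Decode UTF-8 bytes to text, replace `old` with `new`, re-encode."""
    text = raw.decode('utf-8')      # bytes -> str (Unicode text)
    text = text.replace(old, new)   # replacement works on code points
    return text.encode('utf-8')     # str -> bytes again

raw = 'hi \u00bb'.encode('utf-8')   # b'hi \xc2\xbb', i.e. "hi »" as UTF-8
result = replace_in_utf8_bytes(raw, '\u00bb', '')
print(result)                       # b'hi '
```

If the value is already a Python 3 str (for example, text extracted by BeautifulSoup), the decode and encode steps are unnecessary and a plain text.replace('\u00bb', ' ') is enough.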

You can use the following regex to remove all non-ASCII characters from the string:

>>> import re
>>> re.sub(r'[^\x00-\x7f]',r'', 'hi »')
'hi '
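
For reuse, the same pattern can be compiled once and wrapped in a small function. This is a minimal sketch for Python 3 text strings. The names NON_ASCII and strip_non_ascii are made up for this example, and the optional replacement argument is an addition that the original answer does not include:

```python
import re

# Matches any single character outside the ASCII range (code points 0-127).
NON_ASCII = re.compile(r'[^\x00-\x7f]')

def strip_non_ascii(text, replacement=''):
    """Return `text` with every non-ASCII character replaced by `replacement`."""
    return NON_ASCII.sub(replacement, text)

cleaned = strip_non_ascii('hi \u00bb')                 # "hi »" with » removed
marked = strip_non_ascii('caf\u00e9 \u00bb menu', '?') # each non-ASCII char -> '?'
print(repr(cleaned))                                   # 'hi '
print(marked)                                          # caf? ? menu
```

Note that this removes every non-ASCII character, including accented letters, so it is a blunter tool than replacing "»" alone.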
